English Pages 368 Year 2018
Scott Krig
Synthetic Vision
Using Volume Learning and Visual DNA
ALL SOFTWARE HEREIN IS © 2013 SCOTT A. KRIG - ALL RIGHTS RESERVED. PATENTS MAY APPLY TO CERTAIN CONCEPTS EMBODIED IN THE SOFTWARE AND THE DESCRIPTIVE INFORMATION HEREIN. SOFTWARE IS AVAILABLE UNDER LICENSE AGREEMENT WITH KRIG RESEARCH FOR BINARY AND OPENSOURCE LICENSING. SEE HTTP://KRIGRESEARCH.COM FOR LICENSING AND DOWNLOAD INFORMATION SEE ALSO https://www.degruyter.com/view/product/486481 FOR LICENSING AND DOWNLOAD INFORMATION
ISBN 978-1-5015-1517-0 e-ISBN (PDF) 978-1-5015-0596-6 e-ISBN (EPUB) 978-1-5015-0629-1 Library of Congress Control Number: 2018947667 Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2018 Scott Krig Published by Walter de Gruyter Inc., Boston/Berlin Printing and binding: CPI books GmbH, Leck Typesetting: MacPS, LLC, Carmel www.degruyter.com
About De|G PRESS Five Stars as a Rule De|G PRESS, the startup born out of one of the world’s most venerable publishers, De Gruyter, promises to bring you an unbiased, valuable, and meticulously edited work on important topics in the fields of business, information technology, computing, engineering, and mathematics. By selecting the finest authors to present, without bias, information necessary for their chosen topic for professionals, in the depth you would hope for, we wish to satisfy your needs and earn our five-star ranking. In keeping with these principles, the books you read from De|G PRESS will be practical, efficient and, if we have done our job right, yield many returns on their price. We invite businesses to order our books in bulk in print or electronic form as a best solution to meeting the learning needs of your organization, or parts of your organization, in a most cost-effective manner. There is no better way to learn about a subject in depth than from a book that is efficient, clear, well organized, and information rich. A great book can provide life-changing knowledge. We hope that with De|G PRESS books you will find that to be the case.
DOI 10.1515/9781501505966-201
Acknowledgments Thanks to Jeffrey Pepper at De Gruyter Publishers for taking this project and making it possible; thanks to Uma Gadamsetty for technical editorial feedback and interesting applications discussions; and thanks to several other reviewers, most notably Chris Nelson, Jaya Dalal, and many people at De Gruyter for editorial touches during the publications process. Thanks to those who have graciously provided permissions to use illustrations, as well as numerous conversations on various points with several people, too many to mention. As usual, thanks to my wife for advising me and managing my schedule, so that I pace myself and do not burn out as rapidly as I am able to by over working and enjoying it (maybe too much). And most thanks to the Father of Lights, who shines wisdom and knowledge on us all, Anno Domini 2018. Scott Krig
DOI 10.1515/9781501505966-202
Contents Chapter 1: Synthetic Vision Using Volume Learning and Visual DNA 1 Overview 1 Synthetic Visual Pathway Model 3 Visual Genome Model 4 Volume Learning 6 Classifier Learning and Autolearning Hulls 7 Visual Genome Project 8 Master Sequence of Visual Genomes and VDNA 10 VGM API and Open Source 11 VDNA Application Stories 11 Overcoming DNN Spoofing with VGM 11 Inspection and Inventory Using VDNA 17 Other Applications for VDNA 18 Background Trends in Synthetic Intelligence 19 Background Visual Pathway Neuroscience 22 Feature and Concept Memory Locality 22 Attentional Neural Memory Research 23 HMAX and Visual Cortex Models 24 Virtually Unlimited Feature Memory 24 Genetic Preexisting Memory 25 Neurogenesis, Neuron Size, and Connectivity 25 Bias for Learning New Memory Impressions 26 Synthetic Vision Pathway Architecture 27 Eye/LGN Model 28 VDNA Synthetic Neurobiological Machinery 28 Memory Model 30 Learning Centers and Reasoning Agent Models 31 Deep Learning vs. Volume Learning 32 Summary 37 Chapter 2: Eye/LGN Model 39 Overview 39 Eye Anatomy Model 39 Visual Acuity, Attentional Region (AR) 41 LGN Model 43 LGN Image Assembly 44 LGN Image Enhancements 46 LGN Magno and Parvo Channel Model 47 Magno and Parvo Feature Metric Details 48 Scene Scanning Model 49 DOI 10.1515/9781501505966-203
Eye/LGN Visual Genome Sequencing Phases 52 Magno Image Preparation 53 Parvo Image Preparation 53 Segmentation Rationale 54 Saccadic Segmentation Details 56 Processing Pipeline Flow 59 Processing Pipeline Output Files and Unique IDs 62 Feature Metrics Generation 63 Summary 65 Chapter 3: Memory Model and Visual Cortex 67 Overview 67 Visual Cortex Feedback to LGN 69 Memory Impressions and Photographic Memory 70 CAM Neurons, CAM Features, and CAM Neural Clusters 71 Visual Cortex and Memory Model Architecture 72 CAM and Associative Memory 73 Multivariate Features 74 Primal Shapes, Colors, Textures, and Glyphs 75 Feature VDNA 75 Volume Feature Space, Metrics, Learning 77 Visual DNA Compared to Human DNA 78 Spatial Relationship Processing Centers 79 Strand and Bundle Models 80 Strand Feature Topology 81 Strand Learning Example 82 Bundles 84 Visual Genome Sequencing 84 Visual Genome Format and Encodings 86 Summary 86 Chapter 4: Learning and Reasoning Agents 87 Overview 87 Machine Learning and AI Background Survey 88 Learning Models 89 Training Protocols 95 Reasoning and Inference 96 Synthetic Learning and Reasoning Model Overview 96 Conscious Proxy Agents in the PFC 97 Volume Learning 98 VGM Classifier Learning 99 Qualifier Metrics Tuning 99
Genetically Preexisting Learnings and Memory 100 Continuous Learning 101 Associative Learning 101 Object Learning vs. Category Learning 102 Agents as Dedicated Proxy Learning Centers 102 Agent Learning and Reasoning Styles 103 Autolearning Hull Threshold Learning 108 Correspondence Permutations and Autolearning Hull Families 110 Hull Learning and Classifier Family Learning 114 Autolearning Hull Reference/Target Differences 116 Structured Classifiers Using MCC Classifiers 117 VDNA Sequencing and Unique Genome IDs 118 Correspondence Signature Vectors (CSV) 119 Alignment Spaces and Invariance 121 Agent Architecture and Agent Types 121 Custom Agents 122 Master Learning Controller: Autogenerated C++ Agents 123 Default CSV Agents 124 Agent Ecosystem 125 Summary 125 Chapter 5: VGM Platform Overview 127 Overview 127 Feature Metrics, Old and New 128 Invariance 129 Visual Genomes Database 130 Global Unique File ID and Genome ID 131 Neuron Encoder and QoS Profiles 132 Agent Registry 132 Image Registry 134 Strand Registry 135 Segmenter Intermediate Files 136 Visual Genome Metrics Files 137 Base Genome Metrics 137 Genome Compare Scores 138 Agent Management 140 Sequencer Controller 141 Correspondence Controller 142 Master Learning Controller (MLC) 144 CSV Agents 144 Correspondence Signature Vectors (CSVs) 146 Group Metric Classifiers (GMCs) 147
Strand Topological Distance 149 Interactive Training and Strand Editing 149 Metric Combination Classifiers (MCCs) 150 MCC Function Names 151 MCC Best Metric Search 152 Metric Combination Classifier (MCC) Summary 153 VGM Platform Controllers 157 Image Pre-Processing and Segmenter: lgn 158 Genome Image Splitter: gis 159 Compute Visual Genomes: vg 159 Comparing and Viewing Metrics: vgc 160 Agent Testing and Strand Management: vgv 162 Summary 167 Chapter 6: Volume Projection Metrics 169 Overview 169 Memory Structure: 3x3 vs. 3x1 170 CAM Feature Spaces 172 CAM Neural Clusters 174 Volume Projection Metrics 175 Quantization Space Pyramids 176 Strand CAM Cluster Pyramids 177 Volume Metric Details 179 Volume Impression Recording 179 Volume Metrics Functions 179 Volume Metrics Memory Size Discussion 181 Magno and Parvo Low-Level Feature Tiles 184 Realistic Values for Volume Projections 186 Quantized Volume Projection Metric Renderings 187 Summary 194 Chapter 7: Color 2D Region Metrics 195 Overview 195 Background Research 195 Color Spaces 197 RGB Color 198 LUMA, RGBI, CIELab Intensity 199 HSL Hue and Saturation 199 Eye Model Color Ranging 199 Squinting Model and Sliding Histograms 200 Sliding Contrast over Cumulative Histograms 201 Sliding Lightness over Normal Histograms 201
Sliding Metrics, Centroid, and Best Match 202 Static Color Histogram Metrics 203 LGN Model Color Leveling 204 Color Level Raw 204 Color Level Centered 204 Color Level CIELab Constant 205 Color Level HSL Saturation Boosting 206 LGN Model Dominant Colors 207 Leveled Histogram Distance, Moments 208 Popularity Colors 208 Standard Colors 214 Color Metrics Functions 216 Summary 218 Chapter 8: Shape Metrics 219 Overview 219 Strand Topological Shape Metrics 219 Single-Image vs. Multiple-Image Strands 219 Strand Local Vector Coordinate System 220 Strand Vector Metrics 223 Strand Set Metrics 225 Strand Shape Metrics: Ellipse and Fourier Descriptors 226 Volume Projection Shape Metrics 227 Statistical Metrics 227 Ratio Metrics 230 Genome Structure Shape Metrics 231 Genome Structure Local Feature Tensor Space 231 Genome Structure Correspondence Metrics 233 Shape Metric Function List 233 Summary 236 Chapter 9: Texture Metrics 239 Overview 239 Volume Projection Metrics for CAM Clusters 240 3x1 RGBI Component Textures 240 3x3 RGB Textures 242 Volume Metric Distance Functions 244 Haralick Features 248 Haralick Metrics 250 SDMX Features 253 SDMX Metrics 254 Haralick and SDMX Metric Comparison Graphs 257
Texture Similarity Graphs (Match < 1.0) 257 Texture Dissimilarity Graphs (Nonmatch > 1.0) 261 MCC Texture Functions 264 CSV Texture Functions 267 Summary 267 Chapter 10: Region Glyph Metrics 269 Overview 269 Color SIFT 270 Color Component R,G,B,I G-SURF 271 Color Component R,G,B,I ORB 271 RGB DNN 272 MCC Functions for Glyph Bases 272 Glyph Base CSV Agent Function 274 Summary 274 Chapter 11: Applications, Training, Results 275 Overview 275 Test Application Outline 276 Strands and Genome Segmentations 278 Building Strands 279 Parvo Strand Example 280 Magno Strand Example 281 Discussion on Segmentation Problems and Work-arounds 281 Strand Alternatives: Single-image vs. Multi-image 282 Testing and Interactive Reinforcement Learning 282 Hierarchical Parallel Ensemble Classifier 283 Reinforcement Learning Process 284 Test Genomes and Correspondence Results 285 Selected Uniform Baseline Test Metrics 286 Test Genome Pairs 292 Compare Leaf : Head (Lo-res) Genomes 292 Compare Front Squirrel : Stucco Genomes 294 Compare Rotated Back : Brush Genomes 297 Compare Enhanced Back : Rotated Back Genomes 299 Compare Left Head : Right Head Genomes 301 Test Genome Correspondence Scoring Results 304 Scoring Results Discussion 305 Scoring Strategies and Scoring Criteria 306 Unit Test for First Order Metric Evaluations 307 Unit Test Groups 307 Unit Test Scoring Methodology 307
MATCH Unit Test Group Results 309 NOMATCH Unit Test Group Results 313 CLOSE Unit Test Group Results 316 Agent Coding 320 Summary 322 Chapter 12: Visual Genome Project 323 Overview 323 VGM Model and API Futures 324 VGM Cloud Server, API, and iOS App 325 Licensing, Sponsors, and Partners 325 Bibliography 327 Index 337
Preface This work began in 2014 as I prototyped a synthetic model of the visual pathway, including volume learning, visual genomes, and visual DNA concepts, with details published in Appendix E of my previous book Computer Vision Metrics: Survey, Taxonomy and Analysis of Computer Vision, Visual Neuroscience, and Deep Learning, Textbook Edition (Springer) [1]. The motivation and hypothesis for this work is to create a complete visual pathway model from the eyes through the lateral geniculate nucleus (LGN) through the visual cortex through the higher-level learning and reasoning centers—a Synthetic Vision Model—based on the best neuroscience and computer vision research to date. Various ad hoc vision systems have been devised over recent history, combining machine learning and deep learning, artificial intelligence (AI), neuroscience concepts, computer vision, and image processing, with varying success [1]. However, no complete synthetic vision model has been introduced, so the opportunity to research and develop such a model is attractive. While no model can claim to be complete, the journey to this current model points the way to future models (see Chapter 12 for details). The approach taken for this work is to first identify the most plausible visual neuroscience research and concepts to model, to duplicate what is known in a synthetic model, including computer vision and image processing methods enhanced with novel approaches taken herein. In summary, decades of my experience along with about 1,000 fundamental references have been consulted for this work, including the substantial survey, taxonomy, and analysis of the literature in my work [1], to identify the most promising concepts to guide the development of a complete synthetic vision model, as well as talks with friends, engineers, and scientists in the field. 
The goal for this work is to lay a foundation for future research and collaboration via a Visual Genome Project to catalog all possible instances of chosen visual features taken from a massive collection of millions of images (as a start), enabling collaboration in vision science research and applications, to classify and study visual features. The starting point for a visual genome project is described in this work, and I hope to continue the work and enable a community of researchers and developers to work together in a common framework, to share knowledge, and to advance visual learning science. Some inspiration for this work comes from the Human Genome Project, which catalogs human genetic materials to allow for ongoing research into genetic science, to identify common genes, and perhaps to improve medicine and our quality of life. The Visual Genome Project has a similar goal: to catalog visual impressions, visual features, and visual learning as a basis for future research and commercial applications. This work points the way toward a future with complete synthetic vision systems, acting as components within synthetic biological systems such as robots, rather than ad hoc applications of computer vision and deep learning as are prominent today. The synthetic vision model developed here provides a working prototype pointing to the future.
DOI 10.1515/9781501505966-204
Chapter 1 Synthetic Vision Using Volume Learning and Visual DNA Whence arises all that order and beauty we see in the world?
―Isaac Newton
Overview Imagine a synthetic vision model, with a large photographic memory, that learns all the separate features in each image it has ever seen, which can recall features on demand and search for similarities and differences, and learn continually. Imagine all the possible applications, which can grow and learn over time, only limited by available storage and compute power. This book describes such a model. This is a technical and visionary book, not a how-to book, describing a working synthetic vision model of the human visual system, based on the best neuroscience research, combined with artificial intelligence (AI), deep learning, and computer vision methods. References to the literature are cited throughout the book, allowing the reader to dig deeper into key topics. As a technical book, math concepts, equations, and some code snippets are liberally inserted describing key algorithms and architecture. In a nutshell, the synthetic vision model divides each image scene into logical parts similar to multidimensional puzzle pieces, and each part is described by about 16,000 different Visual DNA metrics (VDNA) within a volume feature space. VDNA are associated into strands—like visual DNA genes—to represent higher-level objects. Everything is stored in the photographic visual memory, nothing is lost. A growing set of learning agents continually retrain on new and old VDNA to increase visual knowledge. The current model is a first step, still growing and learning. Test results are included, showing the potential of the synthetic vision model. This book does not make any claims for fitness for a particular purpose. Rather, it points to a future when synthetic vision is a commodity, together with synthetic eyes, ears, and other intelligent life. You will be challenged to wonder how the human visual system works and perhaps find ways to criticize and improve the model presented in this book. 
That’s all good, since this book is intended as a starting point: to propose a visual genome project to allow for collaborative research to move the synthetic model forward. The visual genome project proposed herein allows for open source code development and joint research, as well as commercial development spin-offs, similar to the Human Genome Project funded by the US government, which motivates this work. The visual genome project will catalog Visual DNA on a massive scale, encouraging collaborative research and commercial application development for sponsors and partners.
DOI 10.1515/9781501505966-001
Be prepared to learn new terminology and concepts, since this book breaks new ground in the area of computer vision and visual learning. New terminology is introduced as needed to describe the concepts of the synthetic vision model. Here are some of the key new terms and concepts herein:
– Synthetic vision: A complete model of the human visual system; not just image processing, deep learning, or computer vision, but rather a complete model of the visual pathway and learning centers (see Figure 1.1). In the 1960s, aerospace and defense companies developed advanced cockpit control systems described as synthetic vision systems, including flight controls, target tracking, fire controls, and flight automation. This work is not directed toward aerospace or defense, although it could be applied there; rather, we focus on modeling the human visual pathway.
– Volume learning: No single visual feature is a panacea for all applications, so we use a multivariate, multidimensional feature volume, not just a deep hierarchy of monovariate features as in DNN (Deep Neural Network) gradient weights. The synthetic model currently uses over 16,000 different types of features. By contrast, a DNN uses monovariate feature weights representing edge gradients, built up using 3×3 or n×n kernels from a training set of images. Other computer vision methods use trained feature descriptors such as the scale-invariant feature transform (SIFT), or basis functions such as Fourier features and Haar wavelets.
– Visual DNA: We use human DNA strands as the model and inspiration to organize visual features into higher-level objects. Human DNA is composed of four bases: (A) Adenine, (T) Thymine, (G) Guanine, and (C) Cytosine, combined in a single strand, divided into genes containing related DNA bases. Likewise, we represent the volume of multivariate features as strands of visual DNA (VDNA) across several bases, such as (C) Color, (T) Texture, (S) Shape, and (G) Glyphs, including icons, motifs, and other small complex local features.
Many other concepts and terminology are introduced throughout this work, as we break new ground and push the boundaries of visual system modeling. So enjoy the journey through this book, take time to wonder how the human visual system works, and hopefully add value to your expertise along the way.
Synthetic Visual Pathway Model
Figure 1.1: The synthetic visual pathway model. The model is composed of (1) an eye/LGN model for optical and early vision processing, (2) a memory model containing groups of related features emulating neural clusters of VDNA found in the visual cortex processing centers V1-V4-Vn, and (3) a learning and reasoning model using agents to perform high-level visual reasoning. Agents create top-level classifiers using a set of multivariate MCC classifiers (discussed in Chapters 4–6).
Synthetic Visual Pathway Model The synthetic vision model, shown in Figure 1.1, includes a biologically plausible Eye/LGN Model for early vision processing and image assembly as discussed in Chapter 2, a Memory Model that includes several regional processing centers with local memory for specific types of features and objects discussed in Chapter 3, and a Learning/Reasoning Model composed of agents that learn and reason as discussed in Chapter 4. Synthetic Neural Clusters, discussed in Chapter 6, represent a group of low-level edges describing objects in multiple color spaces and metric spaces, following the standard Hubel and Wiesel theories [1] that inspired DNNs. The neural clusters allow low-level synthetic neural concepts to be compared. See Figures 1.2, 1.3, and 1.4 as we go. In this chapter, we provide several overview sections describing the current synthetic visual pathway model, with model section details following in subsequent Chapters 2–11. The resulting volume of features and visual learning agents residing within the synthetic model are referred to as the visual genome model (VGM). Furthermore, we propose and discuss a visual genome project enabled by the VGM to form a common basis for collaborative research to move synthetic vision science forward and enable new applications, as well as spin off new commercial products, discussed in Chapter 12. We note that the visual genome model and corresponding project described herein are unrelated to the work of Krishna et al. [163][164], who use crowd-sourced volunteers to create a large multilabeled training set of annotated image features, which they call a visual genome. Some computer vision practitioners may compare the VGM discussed in this book to earlier research such as parts models or bag-of-feature models [1]; however, the VGM is of necessity much richer in order to emulate the visual pathway. The fundamental concepts of the VGM are based on the background research in the author’s prior book, Computer Vision Metrics: Survey, Taxonomy, and Analysis of Computer Vision, Visual Neuroscience, and Deep Learning, Textbook Edition (Springer-Verlag, 2016), which includes nearly 900 references into the literature. The reader is urged to have a copy on hand.
Visual Genome Model The VGM is shown in Figure 1.2 and Figure 1.4; it consists of a hierarchy of multivariate feature types (Magno, Parvo, Strands, Bundles) and is much more holistic than existing feature models in the literature (see the survey in [1]). The microlevel features are referred to as visual DNA or VDNA. Each VDNA is a feature metric, described in Chapters 4–10. At the bottom level is the eye and LGN model, described in Chapters 2 and 3. At the top level is the learned intelligence—agents—as discussed in Chapter 4. Each agent is a proxy for a specific type of learned intelligence, and the number of agents is not limited, unlike most computer vision systems which rely on a single trained classifier. Agents evaluate the visual features during training, learning, and reasoning. Agents can cooperate and learn continually, as described in Chapter 4. Visual genomes are composed together from the lower level VDNA features into sequences (i.e. VDNA strands and bundles of strands), similar to a visual DNA chain, to represent higher-level concepts. Note that as shown in Figure 1.2, some of the features (discussed in Chapter 6) are stored as content-addressable memory (CAM) as neural clusters residing in an associative memory space, allowing for a wide range of feature associations. The feature model including the LGN magno and parvo features is discussed in detail in Chapters 2, 3, and 4.
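The hierarchy described above (VDNA metrics grouped into genomes, genomes into strands, strands into bundles) can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the VGM API: the class names, field names, and the metric name used in the example are hypothetical, and the four bases (Color, Texture, Shape, Glyph) are the ones named earlier in this chapter.

```python
from dataclasses import dataclass, field

# The four VDNA bases named in this chapter: Color, Texture, Shape, Glyph.
BASES = ("C", "T", "S", "G")

@dataclass
class Genome:
    """One segmented region with its VDNA metrics (hypothetical layout)."""
    genome_id: str
    channel: str                                 # "parvo" or "magno"
    vdna: dict = field(default_factory=dict)     # (base, metric_name) -> value

    def add_metric(self, base, name, value):
        if base not in BASES:
            raise ValueError(f"unknown base {base!r}")
        self.vdna[(base, name)] = value

@dataclass
class Strand:
    """A midlevel group of related genomes."""
    label: str
    genomes: list = field(default_factory=list)

@dataclass
class Bundle:
    """A high-level group of strands describing one concept."""
    label: str
    strands: list = field(default_factory=list)

    def genome_count(self):
        return sum(len(strand.genomes) for strand in self.strands)

# Usage: two strands bundled into one higher-level concept.
head = Genome("g1", "parvo")
head.add_metric("C", "mean_hue", 0.4)            # hypothetical metric name
tail = Genome("g2", "magno")
squirrel = Bundle("squirrel", [Strand("head", [head]), Strand("tail", [tail])])
```

Keying each metric by its base keeps the multivariate character of the model visible: an agent could select only the color metrics, or only the texture metrics, of the same genome.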
Figure 1.2: The hierarchical visual genome model (VGM). Illustration Copyright © Springer International Publishing 2016. Used by permission (see [166]).
The VGM follows the neurobiological concepts of local receptive fields and a hierarchy of features, similar to the Hubel and Wiesel model [146][147] using simple cells and complex cells. As shown in Figure 1.2, the hierarchy consists of microlevel VDNA parvo features, which are detailed and high resolution; low-resolution magno features; midlevel strands of features; and high-level bundles of strands, as discussed below. Each feature is simply a metric computed from the memory record of the visual inputs stored in groups of neuron memory. This is in contrast to the notion of designing a feature descriptor to represent a group of pixels in a compressed or alternative metric space as done in traditional computer vision methods. VGM stores low-level parvo and magno features as raw memory records or pixel impressions and groups the low-level VDNA memory records as strands and groups of strands, which are associated together as bundles describing higher-level concepts. For texture-like emulation of Hubel and Wiesel neuron cells, the raw input pixel values of local receptive fields are used to compose a feature address vector referencing a huge virtual multidimensional address space (for more details see Chapter 6, Figures 6.1, 6.2, 6.3). For CAM features, the address is the feature. The intent of using the raw pixel values concatenated into a CAM address is to enable storage of the raw visual impressions directly in a virtually unlimited feature memory with no intervening processing, following the view-based model of neurobiology. The bit precision of the address determines the size of the memory space. The bit precision and coarseness of the address is controlled by a quantization parameter, discussed later along with the VGM neural model. So, the VGM operates in a quantization space, revealing more or less detail about the features as per the quantization level in view.
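To make the idea of "the address is the feature" concrete, the following minimal sketch concatenates the quantized pixel values of a small receptive field into one integer address and counts impressions in a sparse table. It is an illustration only: the function names, the packing order, the 8-bit input depth, and the use of a right shift for quantization are assumptions, not the VGM implementation described in Chapter 6.

```python
from collections import Counter

def cam_address(patch, bits=8):
    """Concatenate quantized pixel values into a single integer address.

    patch: flat sequence of 8-bit pixel values (e.g. a 3x3 receptive field).
    bits:  bits kept per pixel; fewer bits give a coarser address.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be in 1..8")
    shift = 8 - bits                      # drop low-order bits to quantize
    address = 0
    for value in patch:
        if not 0 <= value <= 255:
            raise ValueError("pixel values must be 8-bit")
        address = (address << bits) | (value >> shift)
    return address

def address_space_size(num_pixels, bits):
    """Number of distinct addresses for a given patch size and precision."""
    return 2 ** (num_pixels * bits)

# Sparse stand-in for feature memory: address -> number of impressions.
memory = Counter()
patch_a = [10, 12, 11, 200, 201, 199, 90, 91, 92]
patch_b = [11, 13, 10, 202, 203, 198, 88, 93, 94]
for patch in (patch_a, patch_b):
    memory[cam_address(patch, bits=8)] += 1
```

At full 8-bit precision the two similar patches land at different addresses, while at 4 bits per pixel they collapse to the same address, which is one way to read the statement that the quantization level reveals more or less detail. A 3×3 patch at 8 bits spans 2^72 possible addresses, so any practical memory of this kind has to be sparse.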
Volume Learning The idea of volume learning is to collect, or learn, a large volume of different features and metrics within a multivariate space, including color, shape, texture, and conceptual glyphs, rather than collecting only a nonvolumetric 1D list of SIFT descriptors, or a hierarchical (i.e. deep) set of monolithic gradient features such as DNN convolutional weight templates. Volume learning implies a multidimensional, multivariate metric space with a wide range of feature types and feature metrics, modeled as VDNA. We suggest and discuss a robust baseline of VDNA feature metrics in great detail in Chapters 5–10, with examples tested in Chapter 11. Of course, over time more feature metrics will be devised beyond the baseline. Proxy agents, discussed in detail in Chapter 4, are designed to learn and reason based on the volume of features. So, many styles of learning and reasoning are enabled by volume learning. In fact, agents can model some parts of human consciousness which seem to be outside of the neurobiology itself. Such higher-level consciousness, as emulated in agents, can direct the brain to explore or learn based on specific goals and hypotheses. In this respect, the VGM provides a synthetic biological model of a brain enabling proxy agents with varying IQs and training to examine the same visual impressions and reach their own conclusions. Each proxy agent encodes learnings and values similar to recipes, algorithms, priorities, likes, dislikes, and other heuristics. Proxy agents exist in the higher-level PFC regions of the brain, along with strands, bundles, and other higher-level structures specific to each proxy agent. Thus, domain-specific or application learning is supported in the VGM, built on top of the visual impressions in the VGM. Volume learning is also concerned with collecting a large volume of genomes and features over time—continuous learning. 
So, volume learning is analogous to the Human Genome Project: sequence first, label and classify later. It is expected that billions of images can be sequenced to identify the common visual genomes and VDNA; at that point, labeling and understanding can begin in earnest. Even prior to reaching saturation of all known visual genomes (if possible), classification and labeling can occur on a grand scale. Figure 1.3 illustrates the general process of learning new genomes from new visual impressions over time: as the number of impressions in memory grows, the number of unique new features encountered decreases. That is where the visual genome project gets interesting.
Figure 1.3: The general process of learning new genomes from new visual impressions. As the number of impressions in memory grows, the number of unique new features encountered decreases.
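The trend in Figure 1.3 can be measured on any stream of feature identifiers by counting how many previously unseen features each new batch of impressions contributes. The sketch below is a generic illustration with a hand-made stream; the function name and the toy identifiers are hypothetical, and no claim is made about the actual saturation rate of real visual genomes.

```python
def new_features_per_batch(impressions, batch_size):
    """Count features never seen before in each successive batch."""
    seen = set()
    counts = []
    for start in range(0, len(impressions), batch_size):
        batch = impressions[start:start + batch_size]
        fresh = set(batch) - seen         # features not already in memory
        counts.append(len(fresh))
        seen |= fresh
    return counts

# Toy stream of feature identifiers in which repeats accumulate over time.
stream = ["a", "b", "a", "c", "a", "b"]
print(new_features_per_batch(stream, batch_size=2))   # prints [2, 1, 0]
```

The sum of the per-batch counts equals the number of distinct features stored so far, so the same bookkeeping also gives the size of the catalog at any point.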
Volume learning and the VGM assume that the sheer number of features is more critical than the types of features chosen, as evidenced by the feature learning architecture survey in [1] Chapters 6 and 10, which discusses several example systems that achieve similar accuracy based on a high number of features using differing feature types and feature hierarchies. It is not clear from convolutional neural network (CNN) research that the deep hierarchy itself is the major key to success. For example, large numbers of simple image pixel rectangles have been demonstrated by Gu et al. [148] to be very capable image descriptors for segmentation, detection, and classification. Gu organizes the architecture as a robust bag of overlapped features, using Hough voting to identify hypothesis object locations, followed by a classifier, to achieve state-of-the-art accuracy on several tests. The VGM follows the intuition that the sheer number of features is valuable and offers over 16,000 feature dimensions per genome and over 56,000 feature comparison metrics for each reference/target genome comparison, using various distance functions as discussed in Chapters 2–10.
Classifier Learning and Autolearning Hulls Classifier learning is a novel method developed in the VGM that allows several independent classifiers to be learned in separate agents from the large volume of VDNA features. Classifier learning takes place after the volume feature learning, based on a chosen training protocol used by an agent. Several agents may each follow a different training protocol and then operate together in an ensemble. VGM supports complex, structured classifiers as discussed in Chapter 4. Many traditional vision systems rely on a simple final classifier stage, such as a softmax or support vector machine (SVM), to perform matching and inference [1]. Some classifier models, such as the Inception DNNs [158], instead use a branching DNN structure with classifiers placed at the branch termini. However, the common element in typical DNN classifiers is that the classifier acts on a set of learned feature weights, but the classifier is not learned: the classifier method used is a design decision, and the classifier is manually tuned to operate on accumulated feature match thresholds during testing. As discussed in [1], Feed Forward (FF) DNNs develop a hierarchical model consisting of a 1D vector of uncorrelated monovariate weights (for example, double precision numbers), but VDNA metrics enable a volume learning model consisting of a multivariate feature set. A volume of individual metrics is expressive, allowing for a richer classifier. Autolearning hulls are a form of metric learning discussed in detail in Chapter 4. Each autolearning hull provides a biologically plausible feature match hull threshold for each metric to assist in building up structured multivariate classifiers from sets of metrics. First, as discussed in Chapters 2 and 3, each feature metric is computed within the various biological eye/LGN model spaces: raw, sharp, retinex, histeq, and blur. Currently, this yields ~16,000 independent feature metrics per genome region. Next, a hull representing the biologically plausible variance of each of the ~16,000 feature metrics within the eye/LGN spaces is computed. During learning, the hulls are sorted to find the metrics with the tightest hull variation, which serve as a first-order set of metrics. These are then tuned via a form of agent-based reinforcement learning that optimizes the metric hulls for the best correspondence with the training set (see Chapter 4). The VGM allows for classifiers to be learned and implemented in separate agents, similar to the way humans manage a specific knowledge domain using domain-specific contextual rules, allowing each agent to define the learning and reasoning domain for a given task, rather than relying on a single classifier.
Therefore, the autolearning method establishes a biologically plausible first-order default hull range around each feature metric for correspondence and matching. The default first-order hulls are used by an agent as a starting point to establish a first-order classifier, and then the first-order metrics are tuned via a variant of reinforcement learning and composed into an optimized classifier during training. Several classifiers may be learned by separate agents for each genome, based on various robustness criteria, training protocols, or other heuristics. See the discussions in Chapters 4 and 5.
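The hull computation and sorting described above can be sketched in a few lines. This is a minimal illustration and not the VGM implementation (Chapter 4 covers the actual method): it assumes a hull is simply the [min, max] range of one metric across the five eye/LGN spaces and ranks hulls by range width. The function names, metric names, and sample values are hypothetical.

```python
# Illustrative sketch only: assumes a hull is the [min, max] range of a metric
# measured across the eye/LGN spaces. All names and values are hypothetical.
SPACES = ("raw", "sharp", "retinex", "histeq", "blur")

def compute_hulls(metrics_by_space):
    """Map each metric name to its (low, high) range over all eye/LGN spaces."""
    hulls = {}
    for space in SPACES:
        for name, value in metrics_by_space[space].items():
            low, high = hulls.get(name, (value, value))
            hulls[name] = (min(low, value), max(high, value))
    return hulls

def tightest_metrics(hulls, count):
    """Return the `count` metric names with the narrowest hull width."""
    return sorted(hulls, key=lambda n: hulls[n][1] - hulls[n][0])[:count]

def within_hull(hulls, name, value):
    """True when a candidate metric value falls inside the learned hull."""
    low, high = hulls[name]
    return low <= value <= high

sample = {
    "raw":     {"color_mean": 0.50, "texture_entropy": 0.30},
    "sharp":   {"color_mean": 0.52, "texture_entropy": 0.45},
    "retinex": {"color_mean": 0.49, "texture_entropy": 0.20},
    "histeq":  {"color_mean": 0.51, "texture_entropy": 0.60},
    "blur":    {"color_mean": 0.50, "texture_entropy": 0.10},
}
hulls = compute_hulls(sample)
first_order = tightest_metrics(hulls, 1)  # candidate first-order metric set
```

In this toy data the color metric varies least across the spaces, so it is selected as the first-order metric. The reinforcement-style tuning of the hulls that follows in the VGM is not modeled here.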
Visual Genome Project The visual genome project will be an open repository of visual knowledge to apply and leverage the visual genome model (VGM) within a common software infrastructure and platform, enabling a consortium of vision scientists and engineers to collaboratively move vision science forward by cataloging unique visual features, and visual learning agents, see Figure 1.4. The synthetic visual pathway model is informed by the research and survey into human vision, neuroscience, computer vision models, artificial intelligence, deep learning, and statistical analysis in the author’s text Computer Vision Metrics, Textbook Edition [1]. The synthetic visual pathway model
includes plausible models of the biology and mechanisms of human vision. More background research is covered later in this chapter as well. The synthetic visual pathway model concepts are developed in detail within Chapters 2–10. The learning agents model is developed in Chapter 4. In Chapter 11 we illustrate the models in a complete synthetic vision system test; to sequence visual genome features for a specific environment; train learning agents to find the specific visual genomes; and provide results, analysis, discussion. Chapter 12 points to future work and details on the visual genome project. We take inspiration from the Human Genome Project and the Human Connectome Project, discussed later in this chapter, which demonstrate some of the best attributes of collaborative science projects. We expect synthetic vision to provide benefits analogous to other sciences such as prosthetics and robotics, where specific attributes of human biology and intelligence are analyzed, modeled, and deployed. Synthetic vision is another step along the trend in artificial biology and artificial intelligence. The models described herein are considered to be among the most novel and comprehensive in the literature and form the basis and starting point for future work into better visual pathway models, allowing an entire ecosystem of open source agents to be built by developers on a standard visual genome platform, using a common set of VDNA. Major goals of the visual genome project include: – Common VDNA Catalog: One goal is to create a catalog and database of VDNA and sequences of VDNA, including strands, bundles, objects, and genes as described in Chapter 3 (for specific application domains, see Figure 1.4). Once a set of specific VDNA is catalogued in the database, the VDNA is used by agents for comparison, correspondence, associations, learning, and reasoning. 
In addition, catalogued VDNA can be assigned a unique VDNA_ID number, providing memory and communications bandwidth savings for some agent methods and applications. Large numbers of VDNA will be catalogued for specific application domains, enabling learning agents to leverage VDNA that are already known. Over time, the VDNA catalog can be made as large as possible, providing high-level advantages to vision science to develop a massive VDNA sequence, similar to the Human Genome Project. This is a realistic goal, given that the resolution of the data is sufficiently compact, and commodity data storage capacity is increasing. – Common Learning Agents: Another goal is to create a repository of a common set of learning agents trained for specific application domains. The visual genome format enables the learning agents to operate within a common feature space and metric space, allows for extensions, and enables the agents to work together and to be shared in the repository. – Common Application Programming Interface (API) and Open Developer Ecosystem: Both commercial use and collaborative research use of the synthetic vision model as an ecosystem can occur simultaneously, based on an open, standardized platform and API. Researchers can create and share a very large collection of visual genomes and agents for open access, while commercialized agents and VDNA can be developed as well. Effectively, the ecosystem is like the movie The Matrix, where knowledge and understanding (agents) can be deployed and downloaded for specific purposes.
Figure 1.4: The visual genome platform environment, which provides a common model for understanding visual information. Note that all the agents and all the VDNA are shared and persist, allowing for incremental increases in the VDNA catalog, and incremental agent advancements and new agents.
As shown in Figure 1.4, the term CSTG refers to the multivariate VGM feature bases Color (C), Shape (S), Texture (T), and Glyph (G), discussed in detail in Chapter 3 (see especially Figure 3.6).
Master Sequence of Visual Genomes and VDNA The end goal of the visual genome project is to sequence every image possible, create a master list of all the genomes and VDNA, and assign each genome a unique 16-byte sequence number initially in a 1 petabyte (PB) genome space. Once sequenced, each genome and associated metrics will be kept in a master database or VDNA catalog keyed by sequence numbers. It is envisioned that 1PB of unique genome IDs will be adequate for sequencing a massive set of visual images; assuming 3000 genome IDs per 12 megapixel image, 1PB of genome ID space is adequate for sequencing about 333
gigaimages. Images which contain genomes matching previously cataloged genomes can share the same genome IDs and storage space, so eventually finding common genomes will reduce storage requirements. Like the Human Genome Project, the end goal is a complete master sequence of all possible VDNA, which will reveal common genomes and accelerate image analysis and correlation. Storage capability challenging today’s largest file systems will be required. The Phase 1 visual genome project is estimating 3–5 petabytes of storage required to sequence 1 million images and store all the image data, the visual DNA catalog, and the learning agent catalog. Phase 1 will store comprehensive metrics for research purposes. Application specific VDNA catalogs will be space optimized, containing only application essential VDNA metrics, reduced to a small memory footprint to fit common computing devices.
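The capacity figures in this section are simple arithmetic and can be checked directly. The sketch below assumes that "1PB of genome ID space" means 10^15 distinct IDs and that the Phase 1 storage estimate uses decimal petabytes; the variable names are illustrative only.

```python
# Sketch of the capacity estimates; assumes 1 PB = 10**15 (decimal).
GENOME_ID_SPACE = 10**15      # assumed number of unique genome IDs in 1 PB
IDS_PER_IMAGE = 3000          # genome IDs per 12-megapixel image (from the text)

images = GENOME_ID_SPACE // IDS_PER_IMAGE
giga_images = images / 10**9  # about 333 gigaimages

# Phase 1: 3 to 5 PB of storage for 1 million images.
PHASE1_IMAGES = 10**6
bytes_per_image_low = 3 * 10**15 // PHASE1_IMAGES    # 3 GB per image
bytes_per_image_high = 5 * 10**15 // PHASE1_IMAGES   # 5 GB per image
```

Under these assumptions the Phase 1 research catalog budgets roughly 3 to 5 GB per image, which is consistent with the statement that Phase 1 stores comprehensive metrics while application catalogs are space optimized.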
VGM API and Open Source The VGM API provides controls for the synthetic visual pathway model and is available as open source code with an API for local development, or as a cloud-based API for remote systems to leverage a complete VGM in the cloud. The API includes methods to create private genome spaces to analyze specific sets of images, as well as public genome spaces for collaborative work. The cloud-based API is a C++ library, rather than complete open source code. The entire C++ library code will be open sourced once the key concepts are developed and proven useful to the research community. Chapter 5 provides an overview of the VGM API. Chapter 12 provides open source details.
VDNA Application Stories In this section, we provide background on some applications for the synthetic vision model, focusing on two key applications: (1) overcoming DNN spoofing by using DNN glyphs (i.e. trained weight models) in the VDNA model and (2) visual inspection and inventory for objects such as a car, a room, or a building by collecting genomes from the subject and detecting new or changed genomes to add to the subject VDNA set. We also suggest other applications for VDNA at the end of the section.
Overcoming DNN Spoofing with VGM Current research into DNN models includes understanding why DNNs are easily spoofed and fooled in some cases (see Figure 1.5). We cover some of the background here from current DNN research to lay the foundation for using VDNA to prevent DNN spoofing. One
problem with a basic DNN that enables spoofing is that each DNN model weight is disconnected and independent, with no spatial relationships encoded in the model. Also, there is limited invariance in the model, except for any invariance that can be represented in the training set. Invariance attributes are a key to overcoming spoofing (see [1]). Part of the DNN spoofing problem relates to the simple monovariate features that are not spatially related, and the other part of the problem relates to the simple classifier that just looks for a critical mass of matching features to exceed a threshold. The VGM model overcomes these problems. Note that the recently developed CapsNet [170] provides a method to overcome some invariance issues (and by coincidence eliminate some spoofing) by grouping neurons (e.g. feature weights) to represent a higher-level feature or capsule, using a concept referred to as inverse graphics to model groups of neurons as a feature with associated graphics transformation matrices such as scale and rotation. CapsNet currently is able to recognize 2D monochrome handwriting character features in the MNIST dataset at state-of-the-art levels. CapsNet does not address the most complex problems of 3D scene recognition, 3D transforms, or scene depth. However, well-known 3D reconstruction methods [1] have been successfully used with monocular, stereo, and other 3D cameras to make depth maps and identify objects in 3D scene space and create 3D graphics models (i.e. inverse graphics), as applied to real estate for mapping houses and rooms, as well as other applications. In any case, CapsNet is one attempt to add geometric invariance to DNNs, and we expect that related invariance methods for DNNs will follow; for example, an ensemble with two models: (1) the DNN weight model and (2) a corresponding spatial relationship model between DNN weights based on objects marked in the training data. 
CapsNet points to a future where we will see more DNN feature spatial relationships under various 2D and 3D transform spaces. The VGM can also be deployed with DNNs in an ensemble to add invariance to scene recognition as explained below. Gens and Domingos [99][1] describe a DNN similar to CapsNet, using deep symmetry networks to address the fundamental invariance limitations of DNNs, by projecting input patches into a 6-dimensional affine feature space. Trained DNN models are often brittle and overtrained to the training set to such an extent that simple modifications to test images (even one-pixel modifications similar to noise) can cause the DNN inference to fail. DNNs are therefore susceptible to spoofing using specially prepared adversarial images. While DNNs sometimes work very well and even better than humans at some tasks, DNNs are also known to fail at inference in catastrophic ways as discussed next. Research into spoofing by Nguyen et al. [5] shows the unexpected effects of DNN classification failure (see Figure 1.5). DNN failure modes are hard to predict: in some cases the images recognized by DNNs can look like noise (Figure 1.5 left) or like patterns (Figure 1.5 right) rather than the natural images they are trained on. Nguyen’s work uses DNNs trained on ImageNet data and MNIST data and then generates perturbed images using a gradient descent-based image generation algorithm that misleads the trained DNNs into mislabeling the images with high confidence.
Figure 1.5: These adversarial images fooled state-of-the-art DNNs, which incorrectly classified each image with over 99% confidence. The DNNs were trained using the ImageNet training sets. Images © 2015 Anh Nguyen, Jason Yosinski, and Jeff Clune, "Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images." CVPR 2015.
Some researchers believe that generating adversarial images using generative adversarial networks, or GANs, to test and prod other DNN models (see Goodfellow et al. [10]) has opened up a door to a new type of generative intelligence or unsupervised learning, with the goal of little to no human intervention required for DNN-based learning. Generative intelligence is a form of reinforcement learning, at the forefront of current DNN research, where a DNN can generate a new DNN model by itself that refines its own model according to reinforcement or feedback criteria from another DNN. In other words, a DNN makes a new and better DNN all by itself by trying to fool another DNN. Generative intelligence will allow a student DNN to create models independently by studying a target DNN to (1) learn to emulate the target DNN, (2) learn to fool the target DNN, and then (3) learn to improve a new model of the target DNN. GANs can be used to analyze training sets or reduce the size of the training set, as well as produce adversarial images to fool a DNN. Visual genomes can be used to add a protective layer to an existing DNN to harden the classifier against spoofing, in part by using VDNA strands of features with spatial information including local coordinates and angles as discussed in Chapter 8. The visual genome platform solution to DNN spoofing is possible since the VDNA model incorporates shape (S) and feature metric spatial relationships, color (C), texture (T), and glyph (G) features together, as well as complex agent classifiers. Therefore, the final classification does not depend solely on thresholding feature correspondence from the simple spatially unaware DNN model weights but rather incorporates a complete VDNA model designed to be tolerant of various robustness criteria. 
To overcome DNN spoofing using VDNA, a DNN can be used together with the VGM model by associating VDNA with overlapping DNN features—overlaying a spatial dimension to the DNN model. Since DNN correspondence can be visualized spatially (see Zeiler and Fergus [11]) by both rendering the DNN weight matrices as images and also by locating the best corresponding pixel matrices in the corresponding
image and highlighting the locations in the image, the genome region VDNA can be evaluated together with the DNN model score, providing additional levels of correspondence checking beyond the DNN weights (see Figure 1.6).
Figure 1.6: VDNA can be applied to prevent DNN model spoofing. (Top) An image is input to a DNN model for inference, (Center) DNN correspondence is visualized, and (Bottom) the corresponding regions form a genome region, which is then compared to the VDNA catalog using all bases CSTG to mitigate dependence on only the DNN Glyph G feature.
DNNs often produce glyph-like, realistic looking icons of image parts, resembling dithered average feature icons of parts of the training images (see [1], Figure 9.2), like learned correlation templates. While these glyphs are proven to be effective in DNN inference, there are also extreme limitations and failures, attributable to the primitive nature of correlation templates, as discussed in [1]. Also, the gradient-based training nature of DNNs favors gradients of individual weights of pixels, not gradients of solid regions of color or even texture. Furthermore, the DNN correlation templates are not spatially aware of each other, which is a severe DNN model limitation. Visual genomes overcome the limitations of DNNs by providing a rich set of VDNA shape (S), color (C), texture (T), and glyph (G) bases, as well as spatial relationships between features supported in the VDNA model as illustrated in Figure 3.1. Using DNNs and the VGM together as an ensemble can increase DNN accuracy.
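One way to picture the ensemble cross-check is sketched below. This is a hypothetical illustration, not the VGM API: it assumes each non-glyph base (C, S, T) yields a correspondence score in [0, 1] for the genome region against the catalog entry for the DNN's predicted label, and that the label is accepted only when the DNN score and a minimum number of bases pass their thresholds. The function name, thresholds, and scores are all assumptions.

```python
def cross_check(dnn_score, base_scores, dnn_threshold=0.9,
                base_threshold=0.6, min_agreeing=2):
    """Accept a DNN label only when other VDNA bases corroborate it.

    base_scores: dict with keys 'C', 'S', 'T' holding scores in [0, 1].
    """
    if dnn_score < dnn_threshold:
        return False
    # Count the color, shape, and texture bases that also match the label.
    agreeing = sum(1 for base in ("C", "S", "T")
                   if base_scores.get(base, 0.0) >= base_threshold)
    return agreeing >= min_agreeing

# A natural image: the DNN (glyph base) and the other bases all agree.
natural = cross_check(0.97, {"C": 0.8, "S": 0.7, "T": 0.9})
# A spoofed image: high DNN confidence, but color and shape do not match.
adversarial = cross_check(0.99, {"C": 0.1, "S": 0.2, "T": 0.7})
```

The point of the design is that a very high DNN score alone is never sufficient: the spoofed example is rejected despite 0.99 DNN confidence because only one of the three other bases corroborates the label.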
DNN failures and spoofing have been overcome by a few methods cited here; for example, using voting ensembles of similarly trained DNNs to weed out false inferences as covered by Baluja, Covell, and Sukthankar [3]. Also Wan [4] trains a hybrid DNN model + structured parts model together as an ensemble. The visual genome method includes hybrid feature models and is therefore similar to Wan’s method but uses a richer feature space with over 16,000 base features. While many methods exist to perturb images to fool a DNN, one method studied by Baluja and Fischer [15] uses specially prepared gradient patterns taken from a trained DNN model combined with a test image. From a human perspective, the adversarial images may be easily recognized, but a DNN model may not have been trained to recognize such images. As shown in the paper, an image of a dog is combined with gradient information from a soccer ball, resulting in the DNN model recognizing the image as a soccer ball instead of a dog. Note that specific adversarial images may fool a specific trained DNN model, but will not necessarily fool other trained DNN models. Quite a bit of research is in progress to understand the spoofing phenomenon. In many cases, adversarial images generated by a computer are not realistic and may never be found in natural images, so the adversarial threat can be dismissed as unlikely in some applications. However, Kurakin et al. [19] show that adversarial images generated by computer and then printed onto paper and viewed on a mobile phone are effective in spoofing DNNs. While this work on adversarial images may seem academic, the results demonstrate how easily a malicious actor can develop adversarial images to spoof traffic signs, print them out, and place them over the real traffic signs to fool DNNs. Another related DNN spoofing method to alter traffic signs is described and tested by Evtimov et al. 
[76] where certain patterns and symbols can be placed on a traffic sign, similar to decals, and the DNN can be spoofed 100% of the time according to their tests. In fact, just adding a Post-it sticker to a traffic sign is enough to fool the best DNN, as demonstrated in the Badnets paper [107] by Garg et al. (see also Evtimov et al. [107]). Clearly, DNNs are susceptible to spoofing, like other computer vision methods. Papernot et al. [6] implement a method of generating adversarial images to fool a specific DNN model by (1) querying a target DNN with perturbed images generated by Goodfellow’s method and Papernot’s method sent to the target DNN for classification and (2) using the classification results returned from the target DNN to build up an adversarial training set used to train up a second adversarial DNN model, which can generate perturbed adversarial images that do not fool humans, but fool a target DNN most of the time. The method assumes no advance knowledge of the target DNN architecture, model, or training images. Another noteworthy approach to generating adversarial networks is developed by Goodfellow et al. [10]. Other related methods include [33][34].
Methods also exist to harden a DNN against adversarial attacks, so we discuss a few representative examples in the next few paragraphs. DNN distillation [1][8] can be applied to prevent spoofing. Distillation was originally conceived as a method of DNN network and model compression or refactoring, to reduce the size of the DNN model to the smallest equivalent size. However, distillation has other benefits as well for anti-spoofing. The basic idea of distillation involves retraining a DNN recursively, by starting subsequent training using the prior DNN model as the starting point, looking for ways to produce a smaller DNN architecture with fewer connections and fewer weights in the model (i.e. eliminating weights which are ~0, and pruning corresponding connections between layers), while still preserving acceptable accuracy. Distillation is demonstrated as an effective anti-spoofing method by Papernot et al. [7] by training a DNN model more than once, starting each subsequent training cycle by using the prior trained DNN model weights. A variety of DNN model initialization methods have been used by practitioners, such as starting training on a random-number initialized weight model [1], or by transfer learning [1], which transfers trained DNN model weights from another trained DNN model as the starting point and then implementing a smaller DNN which distills the DNN model into a smaller network. The basic idea of distilling a large DNN using a deeper DNN architecture pipeline and a larger model weight space into a small DNN architecture with fewer levels and a smaller DNN weight space was first suggested by Ba and Caruana [8]; however, their inquiry was simply on DNN architecture complexity reductions and model size reductions. 
Papernot found that the distillation process of re-using a DNN’s model to recursively train itself also yielded the benefit of reducing susceptibility to adversarial misclassification. A related distillation method intended mainly to reduce DNN model complexity, DNN architecture size, and to optimize performance is demonstrated by Song et al. [9] and is called deep compression, which is highly effective at DNN model reduction via (1) pruning weights from the trained DNN model which are near zero and hardly affect accuracy, (2) pruning the corresponding DNN connection paths from the architecture layers to synthesize a smaller DNN (a type of distillation, or sparse model generation), (3) weight sharing and Huffman coding of weights to reduce memory footprint, and (4) retraining the distilled network using the pruned DNN weight model and connections. This approach has enabled commercially effective DNN acceleration by NVIDIA and others. Model size reduction of 35–49x is demonstrated, network architecture connections are reduced 9–13x, with DNN model accuracy remaining effectively unchanged.
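Step (1) of the deep compression pipeline above, pruning near-zero weights, can be illustrated with a small magnitude-pruning sketch. This is not the implementation from [9]: it assumes the weights are a plain list of floats and that "near zero" means an absolute value below a chosen threshold. The function name and threshold are hypothetical, and weight sharing, Huffman coding, and retraining are omitted.

```python
def prune_weights(weights, threshold=0.01):
    """Zero out weights with |w| < threshold.

    Returns the pruned list and the fraction of weights that are zero.
    """
    pruned = [w if abs(w) >= threshold else 0.0 for w in weights]
    zeros = sum(1 for w in pruned if w == 0.0)
    return pruned, zeros / len(weights)

weights = [0.8, -0.003, 0.0005, -0.42, 0.009, 0.15, -0.0001, 0.3]
pruned, sparsity = prune_weights(weights)

# Remaining nonzero weights determine the size of a sparse representation.
remaining = [w for w in pruned if w != 0.0]
reduction = len(weights) / len(remaining)
```

In this toy example half of the weights fall below the threshold, so a sparse encoding would store four values instead of eight. In practice the pruned network is retrained, as in step (4), so the surviving weights can compensate for the removed ones.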
Inspection and Inventory Using VDNA An ideal application for the VGM is an inventory and inspection system, which decomposes the desired scene into a set of VDNA and stores the VDNA structured into VDNA strands as the base set, or inventory. The initial inventory is a catalog of known VDNA. As discussed in detail in Chapter 3, VDNA can be associated into strands and bundles of strands, including shape metrics for spatial relationships within VDNA bases, to compose higher-level objects and a complete genome for each application area. As shown in Figure 1.7, the initial inspection is used to form a baseline catalog of known VDNA. Subsequent inspections can verify the presence of previously catalogued VDNA, and also detect unknown VDNA, which in this case represent scratches to the guitar finish.
Figure 1.7: The basic process of visual inventory, based on an initial baseline inventory to collect known VDNA for inspection purposes. Subsequent inspection of objects may reveal new VDNA (in this case a scratch to the guitar finish), and the newly discovered unknown VDNA can then be added to the catalog and labeled as “defect at position (x,y) in genome.”
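The baseline-then-inspect flow can be sketched with set operations. The sketch assumes each VDNA has already been reduced to a hashable identifier (such as a VDNA_ID) and that matching is exact; actual correspondence in the VGM relies on metric hulls rather than equality. The function name and identifiers are hypothetical.

```python
def inspect(baseline, observed):
    """Compare an inspection against the baseline inventory.

    Returns (missing, unknown) sets of VDNA identifiers.
    """
    missing = baseline - observed   # catalogued VDNA not found this time
    unknown = observed - baseline   # new VDNA, for example a scratch
    return missing, unknown

# Baseline inventory of a guitar, then a later inspection with one new region.
baseline = {"body_top", "neck", "headstock", "bridge"}
observed = {"body_top", "neck", "headstock", "bridge", "region_17"}
missing, unknown = inspect(baseline, observed)

# Label the newly discovered VDNA and add it to the catalog.
catalog_labels = {vdna_id: "defect" for vdna_id in unknown}
baseline |= unknown
```

After the update, the next inspection treats the labeled defect as known VDNA, so only further changes are reported.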
Other Applications for VDNA Here we provide a summary pointing out a few applications where the VGM may be particularly effective. The visual genome project will enable many other applications as well. See Table 1.1.
Table 1.1: Visual genome platform applications summary (VGM Application: Description)
– Defect Inspection for Airlines: Airline maintenance staff collects baseline VDNA inventories of all aircraft. At intervals, new inventories are taken and compared to the baseline. Wear and tear are added to the VDNA catalog and labeled with META DATA for maintenance history and repair tracking. Note: VGM can be applied to general inspection tasks including rental car or personal automobile inspection.
– DNN Anti-Spoofing: Create VDNA of all image segments which compose a DNN glyph model label, and use all VDNA bases (CSTG) as well as spatial VDNA strand relationships to cross-check DNN model correspondence (G base) against correspondence to the other bases (CST).
– Insurance claims: Agent takes detailed visual genome of an insured object, such as a musical instrument. The VDNA are organized into strands and bundles. Genomes are collected from multiple angles (front, back, sides, top, bottom) to form a complete catalog of all VDNA for the insured object.
– Factory inspection: Wall-mounted cameras in a factory are initially used to collect a set of VDNA for all objects that should be present in the factory (factory baseline configuration) prior to starting up machinery and allowing workers to enter. Regions of VDNA can be defined into a strand to define “NO ACCESS AREAS” as well. At intervals, subsequent VDNA inventories are taken from the wall-mounted cameras to make sure the factory configuration is correct. Also, NO ACCESS AREAS can be monitored and flagged.
– Target tracking: A baseline set of VDNA can be composed into strands to describe a target. At intervals, the target can be located using the VDNA.
– GIS learning systems: Current GIS systems map the surface of the earth, so applying VDNA and visual genome processing steps to the GIS data allows for GIS learning, flagging movements, changes and new items over time.
Background Trends in Synthetic Intelligence To put the visual genome project in perspective, here we discuss scientific trends and projects where collaboration and widespread support are employed to solve grand challenges to advance science. An exponential increase in scientific knowledge and information of all types is in progress now, enabled by the exponential increases in data collection, computer memory data storage capacity, compute processing power, global communications network bandwidth, and an increasing number of researchers and engineers across the globe mining the data. Data is being continuously collected and stored in vast quantities in personal, corporate, and government records. In the days when scientific knowledge was stored in libraries and books and in the minds of a few scientists, knowledge was a key advantage, but not anymore. Knowledge alone is no longer much of an advantage in a new endeavor, since barriers to knowledge are falling via search engines and other commercial knowledge sources so that knowledge is a commodity. Raw knowledge alone, even vast quantities, is not an advantage until it is developed into actionable intelligence. Rather the advantage belongs to those who can analyze the vast amounts of knowledge and reach strategic and tactical conclusions, enabled by the exponentially increasing compute, memory, and networking power. The trend is to automate the analysis of ever increasing amounts of data using artificial intelligence and statistical methods. With the proliferation of image sensors, visual information is one of the key sources of data—perhaps the largest source of data given the size of images and videos. The term big data is one of the current buzz-words in play describing this opportunity. Scientific advancement, in many cases, is now based on free knowledge, rather than proprietary and hidden knowledge. 
In fact, major corporations often develop technology and then give it away as open source code, or license hardware/software (HW/SW) for a very low cost, to dominate and gain control over markets. The free and low cost items are the new trend forming building blocks for scientific advancement. Examples include free open source software and CPUs that cost pennies. Computer vision is following this trend also. Governments are taking great interest in collaborative big-data science projects to mine raw information and turn it into actionable intelligence. For visual information, there is still much more that can be done to enable scientific progress and collaborative big-data analysis; thus the opportunity for large-scale collaborative visual information research such as the visual genome project. Here we quickly look at a few examples of collaborative scientific analysis projects, which inspire the visual genome project. The Human Genome Project managed by the National Human Genome Research Institute (https://www.genome.gov/) aims to collect and analyze the entire set of human DNA composing the human genome. The sequenced genomic data is shared in a standard data format, enabling collaborative scientific research. The
initial genome sequencing and data collection was completed between 1990 and 2003 via international collaborative work, yielding a complete set of human DNA for research and analysis. Other genomics research projects are creating equally vast data sets for future analysis. Yet comparatively little is actually known about the human genome, aside from sparse correlations between a few specific DNA regions and a few specific genetic characteristics. Genomics science is currently analyzing the genome information for scientific and commercial purposes, such as drug research and predictive diagnosis of physical, mental, and behavioral traits present in the DNA, which can inform medical diagnosis and gene editing. For example, Figure 1.8 shows a small part of a human genome sequence, which may or may not contain any cipherable genetic information. In other words, the field of genomics and DNA analysis has only just begun and will continue for hundreds, if not thousands of years, since the genome is so complex, and the analysis is perhaps exponentially more complex. Automating genomic analysis of the genomic sequence database via compute-intensive AI methods is currently in vogue.
Figure 1.8: A small section of a human genome sequence, which must be analyzed to determine the DNA-encoded genetic expressions and trait associations. Image courtesy of U.S. Department of Energy Genomic Science program, http://genomicscience.energy.gov.
Of the roughly 3 billion base pairs of DNA in the human genome, so far, the best genomic analysis has uncovered only scant correlations with select diseases and with physical and mental traits. The permutations, combinations, and genetic expressions of DNA are vast and still expanding in variety from our common human root parent “Mitochondrial Eve” as discussed by Cann et al. in the seminal research paper [2] “Mitochondrial DNA and Human Evolution.” Commercial DNA analysis tests which trace personal DNA trees corroborate the common root of all mankind, illustrating the vast
variation of DNA expression within the single human genome, and the opportunity for research and development. Another collaborative science example is the US government’s (USG) National Institutes of Health’s (NIH) Human Connectome Project [145], which aims to create maps (i.e. connectomes) of the connection pathways in the brain using a variety of imaging modalities that include MRI and fMRI to measure blood flow revealing relationships between neural regions (see Figure 1.9). Connectomes enable neuroscientists to map and understand brain anatomy for specific stimuli; for example, by taking a connectome signature while a subject is viewing, hearing, thinking, or performing a specific action. The connectome signatures may assist in diagnoses and possibly even treatment of neurological problems or perhaps serve as very basic forms of mind reading such as lie detection. The Human Connectome Project is funded by the USG and collaborating private academic research institutes. Paul Allen’s privately funded Allen Brain Atlas project (http://www.brain-map.org/) provides functional mapping of neurological regions to complement connectome maps. Also, various well-funded BRAIN initiatives have begun across governments in the USA, Europe, and world-wide to invest heavily in the grand challenges of neuroscience.
Figure 1.9: A connectome image of a human brain [145], revealing real-time imaged brain communications pathways. Image courtesy of the USC Laboratory of Neuro Imaging and Athinoula A. Martinos Center for Biomedical Imaging, Consortium of the Human Connectome Project, www.humanconnectomeproject.org.
Visual information is one of the most demanding areas for data analysis, given the memory storage and processing requirements, which is inspiring much research and industrial investment in various computer vision and artificial intelligence methods. The trend is to automate the analysis of the raw image information using computer vision and artificial intelligence methods, such as DNNs, recurrent neural networks (RNNs), and other AI models [1]. The visual genome project is timely and well suited to increase visual scientific knowledge.
Background Visual Pathway Neuroscience
The major components of the synthetic visual pathway model are inspired by and based on key neuroscience research, referenced as we go along. For more background, see the author’s prior work [1], especially the bibliography in Appendix C [1] for a list of neuroscience-related research journals and [1] Chapter 9 for deeper background on neuroscience topics that inform and inspire computer vision and deep learning. Also, see the standard texts from Brodal [14] and Kandel et al. [12], and the neuroscience summary by Behnke [13] from a computer vision perspective.
Feature and Concept Memory Locality
Neuroscience research shows that the visual pathway stores related concepts in contiguous memory regions [131][132], suggesting a view-based model [133] for vision. Under the view-based model, new memory records, rather than invariant features, are created to store variations of similar items for a concept. Related concepts are stored in a local region of memory proximate to similar objects. The mechanism for creating new memory features is likely based on an unknown learning motivation or bias, as directed by higher layers of reasoning in the visual pathway. The stored memories do not appear to be individually invariant; rather, the invariance is built up conceptually by collecting multiple scene views together with geometric or lighting variations. Brain mapping research supports the view-based model hypothesis. Research using functional MRI (fMRI) scans shows that brain mapping can be applied to forensics by mapping the brain regions that are activated while viewing or remembering visual concepts, as reported by Langleben et al. [134]. In fact, Nature has reported that limited mind reading is possible [131][132][135] using brain mapping via MRI-type imaging modalities, showing specific regions of the cerebral cortex that are electrically activated while viewing a certain subject, evaluating a certain conceptual hypothesis, or responding to verbal questions. (Of related interest, according to some researchers, brain mapping reveals cognitive patterns that can be interpreted to reveal raw intelligence levels. Brain mapping also has been used to record human
cognitive EMI fingerprints which can be remotely sensed and are currently fashionable within military and government security circles.) New memory impressions will remain in short-term memory for evaluation of a given hypothesis and may be subsequently forgotten unless classified and committed to long-term memory by the higher-level reasoning portions of the visual pathway. The higher-level portions of the visual pathway consciously direct classification using a set of hypotheses, either to classify incoming data in short-term memory or to reclassify long-term memory. The higher-level portions of the visual pathway are controlled perhaps independently of the biology by higher-level consciousness of the soul. The eye and retina may be directed by the higher-level reasoning centers to adjust the contrast and focus of the incoming regions, to evaluate a hypothesis. In summary, the VGM model is based on local feature metrics and memory impressions, stored local to each V1–Vn processing center. The VGM model includes dedicated visual processing centers along the visual pathway, such as the specialized feature processing centers of the visual cortex. The VGM also supports memory to store agents and higher-level intelligence in the prefrontal cortex (PFC) (see Figure 4.1).
Attentional Neural Memory Research
Baddeley [149] and others have shown that the human learning and reasoning processes typically keep several concepts at attention simultaneously at the request of the central executive, which is directing the reasoning task at hand. VGM models the central executive as a set of proxy agents containing specific learned intelligence, as discussed in Chapter 4. The central executive concept assumes that inputs may come in at different times; thus several concepts need to be at attention at a given time. Research suggests that perhaps up to seven concepts can be held at attention by the human brain at once; thus Bell Labs initially created phone numbers using seven digits. Selected concepts are kept at attention in a working memory or short-term memory (i.e. attention memory, or concept memory), as opposed to a long-term memory from the past that is not relevant to the current task. As shown by Goldman-Rakic [152], the attention memory or concepts may be accessed at different rates (for example, checked constantly, or not at all) during delay periods while the central executive is pursuing the task at hand and accessing other parts of memory. The short-term memory will respond to various cues and loosely resembles the familiar associative memory or CAM used for caching in some CPUs. The VGM contains a feature model similar to a CAM model and allows the central executive, via agents, to determine feature correspondence on demand. Internally, VGM does not distinguish between short-term and long-term memory, nor does it limit short-term memory size or freshness, so agents are free to create such distinctions.
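The CAM analogy can be made concrete with a minimal sketch of cue-based recall, in which stored concepts are retrieved by content rather than by address. This is an illustration only, not the VGM API: the class name `ConceptMemory`, its methods, and the cue-overlap scoring are all hypothetical assumptions.

```python
class ConceptMemory:
    """Toy content-addressable store: items are recalled by cue, not by address."""

    def __init__(self):
        self._items = []  # list of (label, set of cue strings)

    def store(self, label, cues):
        self._items.append((label, set(cues)))

    def recall(self, cues):
        """Return labels ranked by how many of the given cues they share."""
        query = set(cues)
        scored = [(len(query & item_cues), label) for label, item_cues in self._items]
        # Keep only items sharing at least one cue; strongest match first.
        matches = [(score, label) for score, label in scored if score > 0]
        matches.sort(key=lambda pair: -pair[0])  # stable: ties keep storage order
        return [label for _, label in matches]


memory = ConceptMemory()
memory.store("bird", ["wings", "feathers", "beak"])
memory.store("plane", ["wings", "engine", "metal"])
memory.store("car", ["wheels", "engine", "metal"])

ranked = memory.recall(["wings", "engine"])  # "plane" shares both cues
```

An agent playing the role of the central executive could call `recall` on demand with whatever cues are currently at attention; nothing in the sketch limits memory size or freshness.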
HMAX and Visual Cortex Models
The HMAX model (hierarchical model of vision) (see [1] Chapter 10) is designed after the visual cortex, which clearly shows a hierarchy of concepts. HMAX uses hardwired features for the lower levels such as Gabor or Gaussian functions, which resemble the oriented edge response of neurons observed in the early stages of the visual pathway as reported by Tanaka [137], Logothetis [150, 151], and others. Logothetis found that some groups of neurons along the hierarchy respond to specific shapes similar to Gabor-like basis functions at the low levels, and to object-level concepts such as faces at higher levels. HMAX builds higher-level concepts on the lower-level features, following research showing that higher levels of the visual pathway (i.e. the IT) are receptive to highly view-specific patterns such as faces, shapes, and complex objects; see Perrett [138][139] and Tanaka [140]. In fact, clustered regions of the visual pathway IT region are reported by Tanaka [137] to respond to similar clusters of objects, suggesting that neurons grow and connect to create semantically associated view-specific feature representations as needed for view-based discrimination. HMAX provides a viewpoint-independent model that is invariant to scale and translation, leveraging a MAX pooling operator over scale and translation for all inputs. The pooling units feed the higher-level S2, C2, and VTU units, resembling lateral inhibition, which has been observed between competing neurons, allowing the strongest activation to shut down competing lower-strength activations. Lateral inhibition may also be accentuated via feedback from the visual cortex to the LGN, causing LGN image enhancements to accentuate edge-like structures to feed into V1–V4 (see Brody [17]).
HMAX also allows for sharing of low-level features and interpolations between them as they are combined into higher-level viewpoint-specific features.
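The role of the MAX pooling operator can be illustrated with a minimal sketch. It covers pooling over translation only (pooling over scale is omitted), and the function name `max_pool` and the sample response maps are assumptions for illustration rather than part of HMAX itself.

```python
def max_pool(responses, size=2):
    """Non-overlapping size x size MAX pooling over a 2D list of responses."""
    rows, cols = len(responses), len(responses[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [responses[r + i][c + j]
                      for i in range(size) for j in range(size)]
            # Only the strongest activation in the window survives.
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Low-level feature responses for one view of a pattern.
original = [
    [0, 1, 0, 0],
    [0, 9, 0, 2],
    [3, 0, 0, 0],
    [0, 0, 0, 5],
]

# The same strong responses, each shifted by one position within its window.
shifted = [
    [9, 1, 2, 0],
    [0, 0, 0, 0],
    [0, 3, 5, 0],
    [0, 0, 0, 0],
]

pooled_original = max_pool(original)
pooled_shifted = max_pool(shifted)
```

Both inputs pool to the same output, which is the sense in which the strongest activation suppresses weaker neighbors and small translations are absorbed.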
Virtually Unlimited Feature Memory
The brain contains perhaps 100 billion neurons, or 100 giga neurons (GN) (estimates vary), and each neuron is connected to perhaps 10,000 other neurons on average (estimates vary), yielding over 100 trillion connections [141], compared to the estimated 200–400 billion stars in the Milky Way galaxy. Apparently, there are plenty of neurons to store information in the human brain, so the VGM assumes that there is no need to reduce the size of the memory or feature set and supports virtually unlimited feature memory. Incidentally, for unknown reasons, the brain apparently uses only a portion of the available neurons; estimates range from 10% to 25% (10GN–25GN). Perhaps with longer life spans of 1,000 years, most or all of the neurons could be activated into use. VGM feature memory is represented in a quantization space where the bit resolution of the features is adjusted to expand or reduce precision, which is useful for practical implementations. Lower quantization of the pixel bits is useful for a quick-scan,
and higher quantization is useful for detailed analysis. The LGN is likely able to provide images in a quantization space (see Chapter 2). In effect, the size of the virtual memory for all neurons is controlled by the numeric precision of the pixels. Visual genomes represent features at variable resolution to produce either coarse or fine results in a quantization space (discussed in Chapter 2 and Chapter 6).
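As a rough illustration of a quantization space, the sketch below reduces 8-bit pixel values to a chosen bit resolution by dropping low-order bits. The function name `quantize` and the bit-shift approach are assumptions made for this example; the actual VGM quantization method is described in Chapter 2 and Chapter 6.

```python
def quantize(pixels, bits):
    """Reduce 8-bit pixel values to `bits` bits by dropping low-order bits."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    shift = 8 - bits
    return [p >> shift for p in pixels]


row = [255, 128, 67, 7]

coarse = quantize(row, 2)  # 4 levels per pixel: suited to a quick scan
fine = quantize(row, 5)    # 32 levels per pixel: more detail retained
full = quantize(row, 8)    # unchanged 8-bit values
```

Each step down in bit resolution halves the number of distinct values a feature can take, which shrinks the effective feature space in exchange for precision.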
Genetic Preexisting Memory
More and more research shows that DNA may contain memory impressions or genetic memory such as instincts and character traits (see [142]; many more references can be cited). Other research shows that DNA can be modified via memory impressions [136] that are passed on to subsequent generations via the DNA. Neuroscience suggests that some visual features and learnings are preexisting in the neocortex at birth; for example, memories and other learnings from ancestors may be imprinted into the DNA, along with other behaviors pre-wired in the basic human genome, designed into the DNA, and not learned at all. It is well known that DNA can be modified by experiences, for better or worse, and passed to descendants by inheritance. So, the DNN training notion of feature learning by initializing weights to random values and averaging the response over training samples is a primitive approximation at best and a rabbit trail following the evolutionary assumptions of time + chance = improvement. In other words, we observe that visible features and visual cortex processing are both recorded and created by genetic design, and not generated by random processes. The VGM model allows for preexisting memory to be emulated using transfer learning to initialize the VGM memory space, which can be subsequently improved by recording new impressions from a training set or visual observation on top of the transferred features. The VGM allows visual memory to be created, never erased, and grown without limit. Specifically, some of the higher-level magno, strand, and bundle features can be initialized to primal basis sets (for example, shapes or patterns) to simulate inherited genetic primal shape features or to provide experience-based learning.
Neurogenesis, Neuron Size, and Connectivity
As reported by Bergami et al. [143][144] as well as many other researchers, the process of neurogenesis (i.e. neural growth) is regulated by experience. Changes to existing neural size and connectivity, as well as entirely new neuron growth, take place in reaction to real or perceived experiences. As a result, there is no fixed neural architecture for low-level features; rather the architecture grows. Even identical twins (i.e. DNA clones) develop different neurobiological structures based on experience, leading to different behavior and outlook. Various high-level structures have been identified within the visual pathway, as revealed by brain mapping [131][132], such as conceptual reasoning centers and high-level communications pathways [145] (see also [1] Figure 9.1 and Figure 9.11). Neurogenesis occurs in a controlled manner within each structural region. Neurogenesis includes both growth and shrinkage, and both neurons and dendrites have been observed to grow significantly in size in short bursts, as well as shrink over time. Neural size and connectivity seem to represent memory freshness and forgetting, so perhaps forgetting may be biologically expressed as neuronal shrinkage accompanied by disappearing dendrite connections. Neurogenesis is reported by Lee et al. [144] to occur throughout the lifetime of adults, and especially during the early formative years. To represent neurogenesis in parvo texture T base structures, VGM represents neural size and connectivity by the number of times a feature impression is detected in volumetric projection metrics, as discussed in Chapter 6, which can be interpreted as (a) a new neuron for each single impression or (b) a larger neuron for multiple impression counts (it is not clear from neuroscience if either a OR b, or both a AND b, are true). Therefore, as discussed in Chapter 9, neurogenesis is reflected in terms of the size and connectivity of each neuron in VGM for T texture bases and can also be represented for other bases by agents as needed.
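The two interpretations of impression counts can be sketched with a simple counter. This is a hedged illustration only: the feature identifiers (`"t17"` and so on) and the function name `record_impressions` are hypothetical, and the actual volumetric projection metrics are covered in Chapter 6.

```python
from collections import Counter


def record_impressions(features):
    """Count how many times each feature impression is detected."""
    return Counter(features)


# Hypothetical texture feature identifiers detected across a training image.
counts = record_impressions(["t17", "t03", "t17", "t17", "t42"])

# Interpretation (a): one new synthetic neuron for every single impression.
neurons_a = sum(counts.values())

# Interpretation (b): one neuron per distinct feature, sized by its count.
neurons_b = dict(counts)
```

Under either reading the stored count is the same; only the agent's interpretation of that count as neuron number or neuron size differs.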
Bias for Learning New Memory Impressions
Neuroscience suggests that the brain creates new memory impressions of important items under the view-based theories surveyed in the HMAX section in [1] Chapter 10, rather than averaging and dithering visual impressions together as in DNN backpropagation training. Many computer vision feature models are based on the notion that features should be designed to be invariant to specific robustness criteria, such as scale, rotation, occlusion, and other factors discussed in Chapter 5 of [1], which may be an artificial notion only partially expressed in the neurobiology of vision. Although bias is assumed during learning, VGM does not model a bias factor, but weights are provided for some distance functions for similar effect. Many artificial neural models include a bias factor for matrix method convenience, but usually the bias is ignored or fixed. Bias can account for the observation that people often see what they believe, rather than believing what they see, and therefore bias seems problematic to model.
Synthetic Vision Pathway Architecture
As discussed above in the “Visual Pathway Neuroscience” section, we follow the view-based learning theories in developing the synthetic vision pathway model, where the visual cortex is assumed to record what it sees photographically and store related concepts together, rather than storing prototype compressed master features such as local feature descriptors and DNN models.
Figure 1.10: The synthetic visual pathway model overlaid on a common model of the human visual pathway (LGN, V1-V4, PIT, AIT, PFC, PMC, MC), after Behnke [13].
Note that Figure 1.10 provides rough estimates on the processing time used in each portion of the visual pathway (see Behnke [13]). It is interesting to note that the total hypothesis testing time can be measured round trip in the visual pathway to less than a second. Since the normal reasoning process may include testing of a set of hypotheses (is it a bird, or is it a plane?), the synthetic learning model uses agents to implement hypothesis testing in serial or parallel, as discussed in Chapters 4 and 11. As shown in Figure 1.10, the synthetic vision model encompasses four main areas summarized next.
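The idea of agents testing a set of hypotheses in serial or in parallel can be sketched as follows. The agent functions, the cue sets, and the overlap-based scoring are all hypothetical stand-ins chosen for illustration; the actual agent model is described in Chapters 4 and 11.

```python
from concurrent.futures import ThreadPoolExecutor


def bird_agent(features):
    """Hypothetical agent: scores the 'bird' hypothesis by cue overlap."""
    return ("bird", len(features & {"wings", "feathers", "beak"}) / 3)


def plane_agent(features):
    """Hypothetical agent: scores the 'plane' hypothesis by cue overlap."""
    return ("plane", len(features & {"wings", "engine", "metal"}) / 3)


def run_serial(agents, features):
    """Evaluate each hypothesis one after another."""
    return [agent(features) for agent in agents]


def run_parallel(agents, features):
    """Evaluate all hypotheses concurrently; map() preserves agent order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda agent: agent(features), agents))


def best_hypothesis(results):
    """Pick the label with the highest score."""
    return max(results, key=lambda result: result[1])[0]


observed = {"wings", "engine", "metal", "windows"}
agents = [bird_agent, plane_agent]

serial_results = run_serial(agents, observed)
parallel_results = run_parallel(agents, observed)
winner = best_hypothesis(parallel_results)
```

Because each agent only reads the observed features, the serial and parallel runs give the same results; the choice between them is a matter of latency rather than outcome.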
Eye/LGN Model
For the eye model, we follow the standard reference text on human vision by Williamson and Cummins [16]. The basic capabilities of the eye are included in the model, based on the basic anatomy of the human eye such as rods, cones, magno and parvo cells, focus, sharpness, color, motion tracking, and saccadic dithering for fine detail. Image pre-processing which the eye cannot perform, such as geometric transforms (scale, rotation, warp, perspective), is not included in the model. However, note that computer vision training protocols often use (with some success) geometric transforms which are not biologically plausible, such as scaling, rotations, and many other operations. The stereo depth vision pathway is not supported in the current VGM, but is planned for a future version. Note that stereo depth processing is most acute for near-vision analysis up to about 20 feet, and stereo vision becomes virtually impossible as distance increases, due to the small baseline distance between the left and right eye. However, the human visual system uses other visual cues for distance processing beyond 20 feet (see 3D processing in [1] Chapter 1 for background on 3D and stereo processing). From the engineering design perspective, we rely on today’s excellent commodity camera engineering embodied in imaging sensors and digital camera systems. See [1] for more details on image sensors and camera systems. Commodity cameras take care of nearly all of the major problems with modeling the eye, including image assembly and processing. Relying on today’s commodity cameras eliminates the complexity of creating a good eye model, including a realistic LGN model, which we assume to be mostly involved in assembling and pre-processing RGB color images from the optics.
See Chapter 2 for a high-level introduction to the eye and the LGN highlighting the anatomical inspiration for the synthetic model, as well as model design details and API.
VDNA Synthetic Neurobiological Machinery
Within the VGM model, each VDNA can be viewed as a synthetic neurobiological machine representing a visual impression or feature, with dedicated synthetic neurobiological machinery growing and learning from the impression, analogous to the neurological machinery used to store impressions in neurons and grow dendrite connections. In a similar way, VDNA are bound to synthetic neural machinery, as discussed throughout this work and illustrated in Figure 1.11.
Figure 1.11: The VDNA synthetic neurobiological machinery in the VGM. Each metric includes a memory impression, metric function, distance functions, and an autolearning hull function to direct activation or firing on correspondence.
Shown in Figure 1.11 is a diagram of the low-level synthetic neurobiological machinery in the VGM for each memory impression. Note that the functions implementing this model are discussed throughout this work. Basic model concepts include the following:
$v$ → memory impression $v$ from visual pathway LGN
$\alpha_s = m_s(v)$ → metric function of memory impression $v$ in metric spaces $[s..s']$
$a_s = h_s(\alpha_s)$ → autolearning hull function of metric $\alpha$ in metric spaces $[s..s']$
$d_s(\alpha_s^r, \alpha_s^r, \alpha_s^t)$ → distance function, reference and target metrics to reference hull
Note that each type of CSTG metric will be represented differently; for example, popularity color metrics may be computed into metric arrays of integers, and a volume shape centroid will be computed as an x, y, z triple stored as three floating point numbers. Likewise, the autolearning hull for each metric will be computed by an appropriate hull range function, and each metric will be compared using an appropriate distance function according to the autolearning hull or heuristic measure used. For more details, see Chapter 4 in the section “VGM Classifier Learning” and the section “AutoLearning Hull Threshold Learning,” and Chapter 5, which summarizes VGM metrics functions.
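The relationship between a memory impression, its metric function, its autolearning hull, and the distance function can be sketched as follows. Everything concrete here is an assumption for illustration: the mean-value metric, the fixed tolerance band used as a hull, the normalized distance, and the reading of the distance function's inputs as reference metric, reference hull, and target metric. The actual VGM functions are summarized in Chapters 4 and 5.

```python
def metric_mean(impression):
    """Stand-in for m_s(v): here simply the mean pixel value."""
    return sum(impression) / len(impression)


def hull_range(alpha, tolerance=0.1):
    """Stand-in for h_s(alpha): a +/- tolerance band around the metric."""
    margin = abs(alpha) * tolerance
    return (alpha - margin, alpha + margin)


def distance(alpha_ref, hull_ref, alpha_target):
    """Stand-in for d_s: target-to-reference gap in units of hull half-width."""
    low, high = hull_ref
    half_width = (high - low) / 2
    if half_width == 0:
        return 0.0 if alpha_target == alpha_ref else float("inf")
    return abs(alpha_target - alpha_ref) / half_width


def corresponds(d):
    """Fire on correspondence when the target falls inside the hull."""
    return d <= 1.0


reference = [100, 110, 90, 100]   # reference memory impression
similar = [104, 106, 100, 102]    # target close to the reference
different = [150, 150, 150, 150]  # target far from the reference

alpha_r = metric_mean(reference)
hull_r = hull_range(alpha_r)

d_similar = distance(alpha_r, hull_r, metric_mean(similar))
d_different = distance(alpha_r, hull_r, metric_mean(different))
```

Different metric types would swap in different functions at each step (an integer array for color popularity, an x, y, z triple for a centroid), while the overall shape of metric, hull, and distance stays the same.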
Memory Model
The memory model includes visual memory coupled with a visual cortex model. As shown in Figure 1.11, the synthetic memory contains VDNA features local to associated feature metric processing centers, following neuroscience research showing localized memory and processing centers (see [131][132]). We assume that the V1–V4 portions of the visual cortex control the visual memory and manage the visual memory as an associative memory (a photographic permanent memory). All visual information is retained in the visual memory: “the visual information is the feature,” nothing is compressed to lose information, and difference metrics are computed as needed for hypothesis evaluations and correspondence in the V1–V4 visual processing centers. Many computer vision models assume that feature compression, or model compression, is desirable or even necessary for computability. For example, DNNs rely on compressing information into a finite-sized set of weights to represent a training set, and some argue that the DNN model compression is the foundation of the success of the DNN model. Furthermore, DNNs typically resize all input images to a uniform size such as 300x300 pixels to feed the input pipeline for computability reasons, to make the training time shorter, and also to make the features in the images more uniform in size. NOTE: the resizing eliminates fine detail in the images when downsizing from 12MP images to 300x300 images, for example. However, the synthetic memory model does not compress visual information and uses 8-5-4-3-2 bit color pixel information supporting a quantization space, with float and integer numbers used to represent various metrics. For example, 5-bit color is adequate for color space reductions for popularity algorithms, and integers and floats are used for various metrics as appropriate.
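To put the loss of detail in perspective, a quick worked calculation helps. It assumes a 12MP image contains exactly 12,000,000 pixels and that the target size is 300x300, as in the example above:

```latex
\[
\frac{300 \times 300}{12 \times 10^{6}}
  = \frac{90{,}000}{12{,}000{,}000}
  = 0.0075
\]
```

Under these assumptions only about 0.75% of the original pixels survive the resize, a reduction of roughly 133 to 1.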
In summary, the VGM visual memory model is a hybrid of pixel impressions and float and integer metrics computed on the fly as memory is accessed and analyzed. More details on the memory model and VDNA are provided in Chapter 3. Currently, no one really knows just how powerful a single neuron is. See Figure 1.12 for a guess at compute capacity; then imagine several billion neurons learning, growing, and connecting in parallel.
Figure 1.12: Illustrating a guess at the processing capabilities of a typically sized neuron, easily room for several million 3D stacked 5nm transistors, sufficient for a basic CPU, interconnects, I/O, 1MB memory.
Learning Centers and Reasoning Agent Models
The higher-level reasoning centers are modeled as agents, which learn and evaluate visual features, residing outside the visual cortex in the PFC, PMC, and MC as shown in Figure 1.1. The agents are self-contained models of a specific knowledge domain, similar to downloads in the movie The Matrix. Each agent is an expert on some topic, and each has been specially designed and trained for a purpose. The agent abstraction allows for continuous learning and refinement to encompass more information over time, or in a reinforcement learning style given specific goals. Agents may run independently for data mining and exploratory model development purposes, or as voting committees. For example, an exploratory agent may associatively catalog relationships in new data without looking for a specific object, such as identifying strands of VDNA with similar color, texture, shape, or glyphs (i.e. agents may define associative visual memory structures). Exploratory agents form the foundation of continual learning, associative learning, and catalog learning, where knowledge associations are built up over time. Except for GANs and similar refinement learning methods, DNNs do not continually learn: the DNN learns in a one-shot learning process to develop a compressed model. However, the VDNA model is intended to grow along associative dimensions. In summary, the VGM learning agents support continual learning and associative learning and preserve all training data within the model at full resolution, while other
computer vision methods support a simple one-shot learning paradigm and then ignore all the training data going forward. See Table 1.2 for a comparison of the VGM learning model to DNN models.

Table 1.2: Basic synthetic visual pathway model characteristics compared to DNN models

|  | Synthetic Visual Pathway | DNN |
| --- | --- | --- |
| Full resolution pixel data | Yes: all pixels are preserved and used in the model | No: the pixel data is rescaled to fit a common input pipeline size, such as 300x300 pixels |
| Spoofing defenses | Yes: the multivariate feature model ensures multiple robustness and invariance criteria must be met for correspondence | No*: DNN spoofing vulnerabilities are well documented and demonstrated in the literature (*this is an active area of DNN research) |
| Multivariate multidimensional features | Yes: over 16,000 dimensions supported as variants of the bases CSTG | No: DNNs are a one-dimensional orderless vector of edge-like gradient features |
| Associative learning, associative memory | Yes: associations across over 16,000 feature dimensions are supported; all learning models can be associated together over the same training data | No: each DNN model is a unique one-dimensional orderless gradient vector of weights |
| Continual learning refinement | Yes: all full-resolution training data is preserved in visual memory for continual agent learning | No: each DNN model is unique, but transfer learning can be used to train new models based on prior models; training data is not in the model |
| Model association, model comparison | Yes: agents can compare models together as part of the associative memory model | No: each model is a compressed representation of one set of training data |
| Image reconstruction | Yes: all pixels are preserved and used in the model; full reconstruction is possible | No: training data is discarded, but a facsimile reconstruction of the entire training set into a single image is possible |
Deep Learning vs. Volume Learning
It is instructive to compare standard deep learning models to volume learning in our synthetic vision model, since the training models and feature sets are vastly different. We provide a discussion here to help the reader ground the concepts, summarized in Table 1.2 and Table 1.3. At a simplistic level, DNNs implement a laborious one-shot learning model resulting in a one-dimensional set of weights compressed from the average of the
training data, while volume learning creates a photographic, multivariate, and multidimensional continuous learning model, as compared in Table 1.3. The learned weights are grouped into final 1D weight vectors as classification layers (i.e. fully connected layers) containing several thousand weights (see [1] Chapter 9 for more background). No spatial relationships are known between weights. However, note that Reinforcement Learning and GANs, as discussed earlier, go further and retrain the weights.
Volume learning identifies a multidimensional multivariate volume of over 16,000 metrics covering four separate bases (Color, Shape, Texture, and Glyph) to support multivariate analysis of VDNA features. A complete VDNA feature volume is collected from overlapping regions of each training image. All VDNA feature metrics are saved; none are compressed or discarded. Volume learning is multivariate, multidimensional, and volumetric, rather than a one-dimensional monovariate gradient scale hierarchy like DNNs. And agents can continually learn over time from the volume of features.
Deep learning is 1D hierarchical (or deep in DNN parlance) and monovariate, meaning that only a single type of feature is used: a gradient-tuned weight in a hierarchy. The idea of deep networks is successive refinement of features over a scale range, starting from low-level pixel features and abstracting the scale higher to mid- and high-level features. DNN features are commonly built using 3x3 or 5x5 correlation template weight masks and finally assembled into fully connected unordered network layers for correspondence, with no spatial relationships between the features in the model (see [1] Chapters 9 and 10). The DNN’s nxn features represent edges and gradients, following the Hubel and Wiesel edge-sensitive models of neurons, and usually the DNN models contain separate feature weights for each RGB color.
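To make the nxn correlation template idea concrete, the sketch below slides a 3x3 edge template across a small image. The template values are fixed by hand here (a Sobel-like vertical edge mask) purely for illustration, whereas a DNN would learn its weights by training; the function name `correlate` and the sample image are likewise assumptions.

```python
def correlate(image, template):
    """Slide an n x n template over a 2D image (no padding, stride 1)."""
    n = len(template)
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - n + 1):
        out_row = []
        for c in range(cols - n + 1):
            total = 0
            for i in range(n):
                for j in range(n):
                    total += image[r + i][c + j] * template[i][j]
            out_row.append(total)
        out.append(out_row)
    return out


# Hand-set vertical-edge template; a DNN would learn such weights instead.
vertical_edge = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# 4 x 5 image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

response = correlate(image, vertical_edge)
```

The response is zero over the flat dark region and strong wherever the template straddles the dark-to-bright boundary, which is the kind of edge and gradient feature the text describes.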
For comparison, some computer vision systems may first employ an interest point detector to locate and anchor candidate features in an image, followed by running a feature descriptor like SIFT for each candidate interest point. Deep learning uses nxn weight templates similar to correlation templates as features and scans the entire image looking for each feature with no concern for spatial position, while VDNA uses a large volume of different feature types and leverages spatial information between features. Volume learning is based on a vision file format or common platform containing a large volume of VDNA metrics. The VDNA are sequenced from the image data into metrics stored in visual memory, and any number of learning agents can perform analysis on the VDNA to identify visual genomes. Learning time is minimal. Training DNNs commonly involves compute-intensive gradient descent algorithms, obscure heuristics to adjust learning parameters, trial and error, several tens of thousands (even hundreds of thousands) of training samples, and days and weeks of training time on the fastest computers. Note that advancements in DNN training time reduction are being made by eliminating gradient descent (see [165][167]). Also,
to reduce training time, transfer learning is used: training starts with a pretrained DNN model, which is then retrained using a single new image or a smaller set of images (see [41]). DNNs learn an average representation of all of the gradients present in the training images. Every single gradient in the training images contributes to the model, even background features unrelated to the object class. The feature weights collected and averaged together in this manner form a very brittle model, prone to over-fitting. DNN feature weights are unordered and contain no spatial awareness or spatial relationships. As discussed earlier, adversarial images can be designed to fool DNNs, and this is a serious security problem with DNNs. We note that VDNA does not share the same type of susceptibility to adversarial images, as discussed in the earlier “VDNA Application Stories” section entitled “Overcoming Adversarial Images with VDNA.”

Table 1.3: Comparing volume learning to DNN one-shot learning
|  | Volume Learning and Visual Genomes | DNN/CNN One-Shot Learning |
| --- | --- | --- |
| Computability | Computable on standard hardware. The volume learning model takes minutes to compute, depending on the image size. For a [...]MP image, inference time is ~[...]% of model learning time or less. | Compute intensive to train; inference is simpler. Model computability is a major concern; image size must be limited to small sizes (300x300 for example); large numbers of training images must be available ([...]–[...] or more); the DNN network architecture must be limited, or else DNNs are not computable or practical without exotic hardware architectures. |
| Category objective | Generic category recognition is supported by inferring between known positiveIDs; only a select handful of training images are needed. | Generic category recognition (class recognition) is achieved by training with [...]–[...] or more training samples. |
| PositiveID objective | PositiveID of a specific object from a single image region, similar to photographic memory. | Possible by training on a large training set of modified images from the original image (morphed, geometric transforms, illumination transforms, etc.). |
| Training time | Milliseconds to minutes. | Weeks, or maybe days with powerful compute. |
| Training images | [...] minimum for positiveID; [...]-[...] for category recognition. | [...]–[...] or more, manually labeled (takes months). |
| Image size | No limit, typically full-size RGB. *Testing in this work uses consumer-grade camera resolutions such as [...]MP = [...]x[...] RGB. | 300x300 RGB typical (small images). *Severe loss of image detail. DNNs use small images due to: 1) compute load, memory, IO; 2) the available training images are mixed resolution, so must be rescaled to a lowest common denominator size to fit the pipeline; 3) the need to keep the images at a uniform scale and size, since [...]x[...]=[...]MB training images would not train the same as [...]x[...] images. |
| Region shapes containing features | No restriction, any polygon shape. | nxn basic correlation template patterns. |
| Learning protocol | Select an object region from image. Learn in seconds; record multivariate metrics in synthetic photographic memory. Agent-based learning; objectives may vary. | Feed in thousands of training images, possibly enhanced images from the original images. Learning takes billions and trillions of iterations via back-propagation using ad-hoc learning parameter settings for gradient descent, yielding a compressed, average set of weights representing the training images. |
| Deterministic model | Yes. The system learns and remembers exactly what is presented using the neurological theories of view-based learning models. | No. Back-propagation produces different learned feature models based on initial feature weights, training image order, training image pre-processing, DNN architecture, and training parameters. There is no absolute DNN model. |
| Feature approximation | No. Visual memory stores exact visual impressions of individual features. | Yes. Feature weights are a compressed representation of similar patterns in all training features. |
Chapter 1: Synthetic Vision Using Volume Learning and Visual DNA Table 1.3 (continued)
Feature model
Volume Learning and Visual Genomes
DNN/CNN One-Shot Learning
Volume feature set of over , metrics
Square x, x, nxn weight matrices sometimes combined into D linear arrays (i.e. fully connected layers; see [] Chapter )
Multivariate features include color, shape, texture, glyph, and spatial/topological features. Recognition
Multivariate with structured classifi- D vector Inference ers An inference is made by comparing a Using the volume feature set, multi- reference and target d vector of variate composition and correspond- weights, using a selected statistical ence is used to define and locate ob- classifier jects with several classifiers
Spatial relationships Yes Topology is learned for strands of visual features
No Each feature weight in a fully connected classification layer is dimensionless and independent of other features
Invariance attributes Some invariance attributes are built into the model; each metric offers different invariance attributes, including scale, rotation, occlusion, illumination, color, mirroring; agents can emphasize specific invariance attributes
Limited, depends entirely on quality of training data and image pre-processing applied to test images, but invariance is weak
Eye/LGN model
No
Yes
A plausible biological model of the Ad-hoc training protocols use various eye/LGN is used for retinal assembly image enhancements and recording into visual memory Visual memory model
Yes
No
View-based memory model creates synthetic neuron clusters of metrics and data as photographic visual impressions
Ad-hoc weight matrices, ad-hoc layer architecture, and lossy compressed feature weights
Feature pruning
Yes
Yes
Any features not used by agents may Features space can be pruned or rebe pruned out of the model, reducduced perhaps x (see [] “Deep ing the memory footprint Compression”) with no apparent change in model accuracy
Summary
Table 1.3 (continued) Volume Learning and Visual Genomes Continuous learning Yes
DNN/CNN One-Shot Learning No
Agents may learn in a discovery mode, and discover new features and feature relationships
Once the features are learned (i.e. back propagation errors are sufficiently low based on some heuristic), features are frozen, but the features can be used as Learning agents are shareable, apthe basis for retraining (i.e. transfer plication-specific visual learning ma- learning; see [] Chapter ) chines, which can run in parallel or ensembles
Summary

This chapter introduces the visual genome model (VGM) as the basis for a visual genome project, analogous to the Human Genome Project funded by U.S. government agencies such as the NIH. We discuss visual genomes and visual DNA (VDNA), which are used for volume learning of multivariate metrics and are analogous to human genes and DNA. The metrics are cataloged into VDNA using image feature sequencing methods analogous to human DNA sequencing methods. We lay the foundation for a visual genome project to sequence VDNA and visual genomes on a massive scale, to enable research collaboration and move synthetic vision science forward. Learning agents are introduced, providing novel tools for building and learning complex, structured classifiers. Background neuroscience research that motivates and informs the VGM is surveyed. A complete first version of the synthetic visual pathway model architecture is introduced, with model details provided in subsequent chapters. The synthetic model is compared and contrasted against DNNs and other computer vision methods, showing the advantages of the VGM.
Chapter 2: Eye/LGN Model

The eye is not satisfied with seeing.
—Solomon
Overview

In this chapter we model a synthetic eye and a synthetic lateral geniculate nucleus (LGN), following standard eye anatomy [16][1] and basic neuroscience covered in [1] Chapters 9 and 10. First, we develop the eye model emphasizing key eye anatomy features, along with types of image processing operations that are biologically plausible. Next, we discuss the LGN and its primary functions, such as assembling optical information from the eyes into images and plausible LGN image processing enhancements performed in response to feedback from the visual cortex. We posit that the LGN performs image assembly and local region segmentations, which we implement in the VGM.
Eye Anatomy Model

As shown in Figure 2.1, a basic eye model provides variable focus, blur, sharpness, lightness, and color corrections. The eye model does not include geometric or scale transforms. Fortunately, digital cameras are engineered very well and solve most of the eye modeling problems for us. Standard image processing operations can be used for other parts of the eye model. The major parts used in the synthetic eye model are:
– An aperture (pupil) for light control.
– A lens for focus.
– A retina focal plane to gather RGBI colors into pixel raster images via rods and cones.
– Standard image processing and a few additional novel algorithms. These are provided in the API to perform plausible enhancements to color and resolution used for higher-level agent hypothesis testing, as discussed in Chapter 6.
DOI 10.1515/9781501505966-002
Figure 2.1: The basic eye anatomy used for the synthetic eye model. Note that the monochromatic light sensitive rods are in the periphery of the retina and never in focus; the red and green cones in the foveal center may be in focus; while the blue cones are few and scattered in with the rods.
Here we describe the anatomical features in Figure 2.1.

Iris: The iris muscle is the colored part of the eye and ranges in color from blue to brown to green and rarer colors. The iris controls the aperture (pupil), or amount of light entering the eye. By squinting the eye to different degrees, a set of images can be observed at differing light levels, similar to the high dynamic range (HDR) methods used to create image sets in digital cameras.

Lens: The lens muscle controls the focal plane of the eye, along with the iris muscle, allowing for a set of images to be observed at different focal planes, similar to the all-in-focus methods used by digital cameras. The lens can be used to change the depth of field and focal plane.

Retina: The retina is the optical receptor plane of the eye, collecting light passing through the iris and lens. The retina contains an arrangement of optical cells (magno, parvo, konio) containing rods and cones for gross luminance and red, green, blue (RGB) color detection. The retina encodes the color information into ganglion cells which feed the left/right optic nerves for transport into the left/right LGN processing centers.

Fovea: The fovea is the region in the center of the retinal focal plane, containing color-sensitive red and green cone cells. The blue cones are in the periphery region of the retina outside the center foveal region.

Rods: Rods are more sensitive to low light and fast moving transients than cones and track motion. Rods are sensitive to lightness (L), which overlaps the entire RGB spectrum. Rods are the most common visual receptors. Rods are almost insensitive to red (the longest wavelength, lowest frequency RGB light) but highly sensitive to blue (the shortest wavelength, highest frequency RGB light). The rods are located almost entirely in the periphery of the retina, along with the blue cones, and perform peripheral vision. About 120 million rods are located outside the primary foveal center of the retina. Rods are about 10x–15x smaller than cones.

Cones: Cones are sensitive to color (RGB). Cones are most sensitive when the eye is focused, that is, when the eye is saccading (dithering) about a focal point. There are approximately 7 million cones, of which about 64% are red cones, 32% green cones, and 2% blue cones. The red and green cones are located almost entirely within the foveal central region. However, the blue cones are located in the peripheral region of the retina along with the rods. The blue cones are highly light sensitive compared to the red and green cones. However, the human visual system somehow performs RGB processing in such a way as to allow for true RGB color perception. Cones are 10x larger than rods. See Figure 2.2.
Figure 2.2: An illustration of (right) the relative, exaggerated size and distribution of rods and cones in the retina according to [16], and (left) number of rods and cones distributed across the retina. Note that rods are about 10x–15x smaller in surface area than cones. The blue cones are not in the foveal center of the retina along with the red and green cones and instead are scattered in the periphery retinal areas with the rods.
Visual Acuity, Attentional Region (AR)

We are interested in modeling the size, in pixels, of the attentional region (AR) of focus at the retina’s fovea, which of course varies based on the image subject topology, distance, and environmental conditions such as lighting. Since an entire visual acuity model is quite complex, we only seek a simple model here to guide the image segmentation region sizes. For information on visual acuity in the human visual system we are informed by the “Visual Acuity Measurement Standard,” Consilium Ophthalmologicum Universale [22]. For practical purposes, the AR is used for model calibration to determine a useful minimum size for segmented regions containing metrics and feature descriptors. We use the AR to model the eye anatomy of the smallest size region of pixels currently under focus. As shown below for several parvo scale image examples, we work out the size of a typical AR to be ~30x30 pixels, depending on image size and quality. This value helps guide parameterization of segmentation and image pre-processing, as well as culling out segmentations that are far too large or too small.

The highest density of rods and cones is toward the retina’s very center at the fovea. Rod and cone density falls off toward the periphery of the field of view (FOV), as shown previously in Figure 2.2. Visual acuity is a measure of a person’s ability to distinguish fine detail at a distance, and it varies widely among subjects (as in nearsightedness and farsightedness). In general, the region of acuity is limited in size to allow for focus on textural details and color, and to define the size of segmented genome regions. The foveal area that is under attention in the center of the FOV subtends an angle of perhaps 0.02–0.03 degrees in the oval retinal field of view, which is a small percentage of the entire FOV. The magno cells take in the rest of the image field of view with less detail. For our simplified model, we define the foveal region as perhaps 0.01% of the total FOV in a rectangular image.
The rest of the image is not in best focus; it is viewed mainly by rod cells rather than by the highest-detail cone cells. We call the center of the FOV under focus the attentional region (AR). To simulate the AR in our simplified visual model, we take the total parvo image size and compute an AR value of about 0.01% of the total image pixels. The AR then guides segmentation by setting the minimum segment size; smaller segmented regions are candidates for culling. For parvo scale images:

2592 x 1936 = 5,018,112 pixels; 0.01% AR ≈ 502 pixels; √502 ≈ 22, a ~22 x 22 pixel region
3264 x 2448 = 7,990,272 pixels; 0.01% AR ≈ 799 pixels; √799 ≈ 28, a ~28 x 28 pixel region
4032 x 3024 = 12,192,768 pixels; 0.01% AR ≈ 1219 pixels; √1219 ≈ 35, a ~35 x 35 pixel region
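The AR arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not part of the VGM API: the function name `attentional_region_side` and the `ar_fraction` parameter are hypothetical, and the AR is assumed to be a square with the same area as 0.01% of the image.

```python
import math

def attentional_region_side(width, height, ar_fraction=0.0001):
    """Side length in pixels of a square attentional region (AR).

    ar_fraction is the AR area as a fraction of all image pixels;
    0.0001 corresponds to the 0.01% default discussed in the text.
    """
    ar_area = width * height * ar_fraction  # AR area in pixels
    return math.sqrt(ar_area)               # side of an equal-area square

# The three parvo scale image sizes used in the text.
for w, h in [(2592, 1936), (3264, 2448), (4032, 3024)]:
    side = attentional_region_side(w, h)
    print(f"{w} x {h}: AR is about {side:.1f} x {side:.1f} pixels")
```

Changing `ar_fraction` makes it easy to explore other candidate AR sizes for a given camera resolution.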
Note that if segmentations are carried out in magno space, the minimum region sizes should be considered as 5x smaller in each dimension than the corresponding parvo scale ARs shown above, so magno segmentations are scaled up 5x to match parvo scale prior to using the segmented regions on the parvo image to compute metrics. In other words, the magno segmentations are the borders; the parvo image pixels fill in the borders for the segmented region. However, segmentations can be computed directly on the parvo scale image, and then a magno scan can be used together with a parvo scan in the model. Since no single segmentation is best for all images, a magno and parvo segmentation used together is a reasonable baseline. However, currently we prefer only parvo segmentations for metrics computations, since color information and full resolution detail are included.

In practice, we have found interesting features in regions smaller than 0.01% of the total image pixels and have seen useful features in AR regions ranging from 0.005% to 0.02%. However, this is purely an empirical exercise. In future work, each segmented region will be resegmented to reveal embedded higher-resolution segments. Also note that the shape of the regions varies depending on the segmentation algorithm and parameters used, so area alone is not always helpful. For example, the bounding box size and other shape factors can be used to guess at shape. We therefore have empirically arrived at a minimum useful parvo segment size parameter for the AR which varies between 0.01% and 0.015% of the total pixels in the image, depending of course on the image content.

Note that our default region size was determined empirically using a range of images, pre-processing methods, and segmentation methods. However, our empirical AR value matches fairly well with our mathematical model of the retina and fovea regions. At about 30cm from the eye, the best human visual acuity of 6/6 can resolve about 300 DPI (here taken as pixels per inch), which is about 0.05% of the available pixels in a 5MP image at 30cm. At 1 meter, the value becomes an area of about 0.005% of the total pixels, which approximates our default foveal region size chosen and tested with empirical methods.
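As an illustration of the magno-to-parvo scaling and minimum-size culling described above, here is a minimal Python sketch. It assumes a segmentation is stored as a 2D list of integer region labels, uses nearest-neighbor upscaling by the 5x factor, and treats label 0 as background; the function names and the label convention are hypothetical and are not taken from the VGM API.

```python
def upscale_mask(mask, factor=5):
    """Nearest-neighbor upscaling of a 2D label mask (list of rows)."""
    out = []
    for row in mask:
        wide = [label for label in row for _ in range(factor)]  # repeat columns
        out.extend(list(wide) for _ in range(factor))            # repeat rows
    return out

def region_areas(mask):
    """Count the pixels that carry each region label."""
    areas = {}
    for row in mask:
        for label in row:
            areas[label] = areas.get(label, 0) + 1
    return areas

def cull_small_regions(mask, min_fraction=0.0001, background=0):
    """Relabel regions smaller than min_fraction of the image as background."""
    min_area = len(mask) * len(mask[0]) * min_fraction
    keep = {label for label, area in region_areas(mask).items() if area >= min_area}
    return [[label if label in keep else background for label in row] for row in mask]

# A tiny magno-scale segmentation with three labeled regions.
magno_mask = [[1, 1, 2],
              [1, 1, 3]]
parvo_mask = upscale_mask(magno_mask)                   # 10 rows x 15 columns
culled = cull_small_regions(parvo_mask, min_fraction=0.2)
print(region_areas(parvo_mask), region_areas(culled))
```

The large `min_fraction` in the usage lines is only there to make culling visible on a toy mask; on a real image the 0.01% default would apply.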
The human eye also dithers and saccades rapidly between adjacent regions to increase the level of detail and test hypotheses such as color, texture, and shape. So, several segments are kept under attention simultaneously in order to arrive at a synergistic or summary classification of a collection of segments.
LGN Model

For the synthetic model, we model the LGN (lateral geniculate nucleus) to perform the following functions:
– Camera: The LGN acts as a digital camera to assemble rods and cones into raster images.
– Image assembly and representation: The LGN assembles images from its camera to feed into the magno and parvo pathways which feed the visual cortex, including stereo image assembly and color image assembly.
– Enhancements: The LGN performs high-level image processing and enhancements as directed during hypothesis testing, such as contrast corrections and other items discussed below.
– Segmentations: We model segmentation in the LGN for convenience and posit that the LGN performs image processing related to segmentation of the image into regions based on desired similarity criteria, such as color or texture, in cooperation with the visual cortex.
Next we will discuss each LGN function in more detail.
LGN Image Assembly

The LGN basically assembles optical information together from the eyes into RGBI images: RGB comes from the cones, and I from the rods. The LGN evidently performs some interesting image processing such as superresolution via saccadic dithering, high dynamic range imaging (HDR), and all-in-focus images. Our purposes do not demand a very detailed synthetic model implementation for the LGN camera section, since by the time we have pixels from a digital camera, we are downstream in the visual pathway beyond the LGN. In other words, for our purposes a digital camera models most of the eye and LGN capabilities very well.

The LGN aggregates the parallel optic nerve pathways from the eye’s magno neuron cells (monochrome lightness rods I), parvo neuron cells (red and green cones R, G), and konio neuron cells (blue cones B). The aggregation is similar to compression or averaging. According to Brody [17], there is likely feedback provided to the LGN from V1, in order to control low-level edge enhancements, which we discuss more along with the V1–V4 model in Chapter 3. Little is known about feedback into the LGN. Feedback into the LGN seems useful for enhancements during hypothesis testing round-trips through the visual pathway. We postulate that the LGN is involved in low-level segmentations and receives feedback from the visual cortex during segmentation. The VGM API provides a wide range of biologically plausible enhancements emulating feedback into the LGN for color and lighting enhancements, plausible edge enhancements, focus, and contrast enhancements allowed by eye biology as discussed in Chapters 6, 7, 8, and 9.

As shown in Figure 2.3, gray-scale rods are aggregated together by magno cells, RG cones are aggregated together into parvo cells, and the blue cones are aggregated together in konio cells. The magno, parvo, and konio cells send aggregated color information down the optical pathway into the LGN.
Note that rods are about 10x–15x smaller than cones, as shown in Figure 2.3. The larger magno cells aggregate several rods, and smaller parvo cells aggregate fewer cones.
Figure 2.3: The relative sizes of magno, parvo, and konio neuron cells in the retina, which aggregate rods and cones to feed over the optic nerves into the LGN. Rods are about 90% smaller than cones. Note that magno cells are the largest aggregator cells and aggregate many rods compared to the parvo cells, which aggregate fewer cones.
Magno neuron cells (L): Magno cells, located in the LGN, aggregate a large group of light sensitive rods into strong transient signals. Magno cells are 5x–10x larger than the parvo cells. Perhaps magno cells are linked with blue cones as well as rods, since the blue cones are located interspersed with the rods. The magno cells therefore integrate about 5x–10x more retinal surface area than the parvo cells and can thus aggregate more light together than a single parvo cell. Magno cells are therefore more sensitive to fast moving objects, and perhaps contribute more to low-light vision.

Parvo neuron cells (R,G): Parvo cells, located in the LGN, are aggregators of red and green cones in the central fovea region, creating a more sustained in-focus signal. Parvo cells are smaller than magno cells, and aggregate fewer receptors than magno cells. Somewhere downstream, the visual pathway combines the red, green, and blue cone signals to provide color perception, and apparently amplifies and compensates for the lack of blue receptors and for chromatic aberrations. Perhaps since red light has a longer wavelength (lower frequency) than green, more red cones (64%) are needed for color sensitivity than green cones (32%), since they are refreshed less often at the lower frequency.

Konio neuron cells (B): Konio cells, located in the LGN, are aggregators of the blue cones distributed in the peripheral vision of the retina, not in the central foveal region with the red and green cones. Konio cells are less understood than the magno and parvo cells. Blue cones are much more light sensitive than red and green cones, especially to faster-moving transients, but since the blue cones are outside the foveal focus region, they are not in focus like the red and green cones, seemingly contributing chromatic aberrations between R, G, and B. The visual system somehow corrects for all chromatic aberrations. Perhaps the short wavelength (high frequency) of blue light allows for fewer blue cones (2%) to be needed, since the blue cones are refreshed more often at the higher frequency. Also, the rods have a spike in sensitivity to blue, so evidently the LGN color processing centers have enough blue stimulus between the magno and the konio cells.
LGN Image Enhancements

Here we discuss the major image enhancements provided by the LGN in our model.
– Saccadic dithering superresolution: The eye continually saccades or dithers around a central location when focusing on fine detail, which is a technique also used by high-resolution imagers and radar systems to effectively increase the sensor resolution. The LGN apparently aggregates all the saccaded images together into a higher-resolution summary image to pass downstream to the visual cortex. Perhaps the LGN plays a role in controlling the saccades.
– High dynamic range (HDR) aggregation images: The LGN likely plays a central role in performing image processing to aggregate and compress rapid sequences of light-varying images together into a single HDR image, similar to methods used by digital cameras. The finished HDR images are delivered downstream to the visual cortex for analysis. Likely, there is feedback into the LGN from the visual cortex for HDR processing.
– All-in-focus images: Perhaps the LGN also aggregates sequences of images using different focal planes together into a final all-in-focus image, similar to digital camera methods, in response to feedback controls from the visual cortex.
– Stereo depth processing: The synthetic model does not support depth processing at this time. However, for completeness, we provide a brief overview here. The visual pathway includes separate left/right (L/R) processing pathways for depth processing which extract depth information from the cones. It is not clear if the stereo processing occurs in the LGN or the visual cortex. Perhaps triangulation between magno cells occurs in the LGN similar to stereo methods (see [1] Chapter 1). We anticipate an RGB-DV model in a future version of the synthetic model, which includes RGB, depth D, and a surface normal vector V.
Depth processing in the human visual system is accomplished in at least two ways: (1) using stereo depth processing within a L/R stereo processing pathway in the visual cortex and (2) using other 2D visual cues associated together at higher-level reasoning centers in the visual pathway. As discussed in [1], the human visual system relies on stereo processing to determine close range depth out to perhaps 10 meters, and then relies on other 2D visual cues like shadows and spatial relationships to determine long range depth, since stereo information from the human eye is not available at increasing distances due to the short baseline distance between the eyes. In addition, stereo depth processing is affected by a number of key problems. These include occlusion and holes in the depth map due to the position of objects in the field of view, and ambiguity within the horopter region, where several points in space may appear to be fused together at the same location, requiring complex approximations in the visual system. We assume that modern depth cameras can be used when we enhance the synthetic model, providing a depth map image and surface normal vector image to simulate the LGN L/R processing.
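The effect of a short baseline on long range depth can be illustrated with the standard rectified pinhole stereo relation Z = f·B/d, where Z is depth, f is the focal length in pixels, B is the baseline, and d is the disparity in pixels. The sketch below is not part of the synthetic model (which does not support depth at this time); the focal length of 1000 pixels is a hypothetical camera value, and the 0.065 m baseline is an assumed approximation of the distance between human eyes.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified pinhole stereo pair."""
    return focal_px * baseline_m / disparity_px

def depth_error(focal_px, baseline_m, depth_m, disparity_step_px=1.0):
    """Approximate depth change caused by one disparity step.

    Differentiating Z = f * B / d gives |dZ| ~= Z**2 * |dd| / (f * B),
    so the error grows with the square of the distance.
    """
    return depth_m ** 2 * disparity_step_px / (focal_px * baseline_m)

FOCAL_PX = 1000.0    # hypothetical focal length in pixels
BASELINE_M = 0.065   # assumed eye-to-eye baseline in meters

for z in (1.0, 10.0, 50.0):
    err = depth_error(FOCAL_PX, BASELINE_M, z)
    print(f"depth {z:5.1f} m: one-pixel disparity step changes depth by about {err:.2f} m")
```

Under these assumptions the depth uncertainty is a few centimeters at 1 meter but exceeds a meter at 10 meters, which is consistent with the text's point that stereo cues fade at range and other 2D cues take over.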
LGN Magno and Parvo Channel Model

As shown in Figure 2.4, there are two types of cells in the LGN which collect ganglion cell inputs (rods, cones) from the optic nerve: magno cells and parvo cells. Of the approximately 1 million ganglion cells leaving the retina, about 80%–90% are smaller parvo cells with smaller receptive fields, and about 10%–20% are larger magno cells with larger receptive fields. The magno cells track gross movement in 3D and identify “where” objects are in the visual field; they are sensitive to contrast, luminance, and coarse details (i.e. the receptive field is large and integrates coarse scene changes). The parvo cells are slower to respond, perhaps since the receptive field is small, and represent color and fine details corresponding to “what” the object is. Magno cells are spread out across the retina and provide the gross low-resolution outlines, and parvo cells are concentrated in the center of the retina and respond most during the saccadic dithering process to increase effective resolution and fill in the details within the magno outlines.

The magno and parvo cell resolution differences suggest a two-level spatial pyramid arrangement built into the retina: magno-subsampled low resolution and parvo high resolution. In addition, the visual pathway contains two separate parallel processing pathways: a fast magno monochrome pathway for shape tracking and a slow parvo pathway for color and texture. Following the magno and parvo concepts, the synthetic model provides (1) low-resolution luminance genomes for coarse shapes and segmentations (magno features) and (2) higher-resolution color and texture genomes (parvo features).
Figure 2.4: The parietal and temporal visual pathways composed of larger magno cells with lower resolution, and smaller parvo cells with higher resolution. Parvo cells are 10%–20% as large as magno cells. Illustration © 2016 Springer International Publishing. Used by permission (see [166]).
Following the dual parvo and magno pathways in the human visual system, we model parvo features as microlevel RGB color and texture tiles at higher resolution and magno features as low-level luminance channels at lower resolution, such as primitive shapes with connectivity and spatial relationships. The magno features correspond mostly to the rods in the retina which are sensitive to luminance and fast moving shapes, and the parvo features correspond mostly to the cones in the retina which are color sensitive to RGB and capture low-level details with spatial acuity. The central foveal region of the retina is exclusively RG cones optimized to capture finer detail and contains the highest density of cells in the retina. Blue cones B are located outside the fovea mixed in with the rods (as shown in Figure 2.2). Magno cell density becomes sparser toward the edge of the field of view. The LGN assembles all the RGBI information into images.
Magno and Parvo Feature Metric Details

VGM provides an input processing model consisting of a set of separate input images as shown in Figure 2.5, which reflect the processing capabilities of actual vision biology at the eye and into the LGN.
– Raw luminance images
– Raw RGB color images
– Local contrast enhanced images to emulate saccadic dithering using the biologically inspired Retinex method (see Scientific American, May 1959 and December 1977)
– Global contrast normalization similar to histogram equalization to mitigate shadow and saturation effects and provide higher dynamic range
– Sharpened images emulating focus and dithering for superresolution
– Blurred images to emulate out-of-focus regions
– *NOTE: LBP (local binary pattern, see [1]) images are supported on each RGBI channel, although perhaps not biologically plausible in the visual pathway

Following the magno and parvo cell biology, VGM computes two types of features and two types of images in a two-level feature hierarchy:
– Parvo feature metrics: Parvo features are modeled as RGB features with full resolution detail, following the design of parvo cells. They take input images at full resolution (100%), use RGB color, and represent color and texture features. Parvo cells are slower to respond to changes and integrate detail during saccades.
– Magno feature metrics: Magno features are modeled as lower resolution luminance features, following the magno cell biology and chosen to be scaled at 20% of full resolution (as a default approximation to magno cell biology). We posit that the larger magno cells integrate and subsample a larger retinal area; therefore we model magno cells at a lower resolution image suited for the rapid tracking of shapes, contours, and edges for masks and cues.
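The two-level parvo/magno image pair can be sketched as follows. This is a minimal illustration under stated assumptions, not the VGM implementation: the parvo image is a nested list of (R, G, B) tuples, luminance uses the Rec. 601 luma weights (an assumed choice), and the 20% magno scale is produced by averaging 5 x 5 pixel blocks. The function names are hypothetical.

```python
def luminance(rgb):
    """Luminance of one (R, G, B) pixel using Rec. 601 luma weights (assumed)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def magno_from_parvo(parvo, factor=5):
    """Build a low-resolution luminance image from a full-resolution RGB image.

    Each output pixel is the mean luminance of a factor x factor block,
    so factor=5 yields an image at 20% of the parvo resolution.
    Rows and columns that do not fill a whole block are dropped.
    """
    height, width = len(parvo), len(parvo[0])
    magno = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [luminance(parvo[y + dy][x + dx])
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        magno.append(row)
    return magno

# A uniform 10 x 10 gray parvo image reduces to a 2 x 2 magno image.
parvo_image = [[(100, 100, 100)] * 10 for _ in range(10)]
magno_image = magno_from_parvo(parvo_image)
print(len(magno_image), "x", len(magno_image[0]))
```

Block averaging loosely mirrors the idea that larger magno cells integrate a larger retinal area; other resampling filters would serve equally well for the sketch.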
Figure 2.5: Magno and parvo feature channel images, which reflect the processing capabilities of the eye and LGN. Illustration © Springer International Publishing 2016. Used by permission (see [166]).
The parvo texture features are collected in four oriented genome shapes A, B, C, D (introduced in Chapter 6) within overlapped input windows, simulating the Hubel and Wiesel [146,147] low-level primitive edges found in local receptive fields and also corresponding to Brodal’s oriented edge shapes [14]. The A, B, C, D oriented shapes are also discussed in detail in Chapter 9 as low-level textures.
Scene Scanning Model

To guide the synthetic model design and implementation, we assume that the human visual system scans a scene in a series of recursive stages, arranged in a variable manner going back and forth between stages according to the current task at hand and as directed by the agents in the higher learning centers. See Figure 2.6.
Figure 2.6: Visual scene processing stages in the synthetic model. Note that the stages are traversed in an arbitrary order, usually from magno scene scans down to parvo nano stares.
Scene Feature Hierarchy

The scene scanning processing stages result in a feature hierarchy across the bases CSTG:
1. Magno object features
2. Parvo object features
3. Parvo micro object features
4. Parvo nano object features

Here is an overview of the modeled feature scanning steps.
– Magno scene scan: This is a gross scene scan where the eye is roving across the scene primarily gathering luminance contrast, edge, and shape information; focus may be generally hazy and color information is not precise (global boundary scan). The eye is moving and covering a lot of ground.
– Magno object segmentation: The eye slows down to analyze a smaller region, with increased focus to gather specific region boundary, edge, and texture information (local boundary scan). The eye is moving more slowly around a specific object. Decisions are made to categorize the object based on gross luma-contrast-based edge shape and contrast. The VGM records a shape mask (segmentation) representing this stage, which can be further analyzed into shape moments and other metrics. Magno object VDNA metrics are computed and stored in the visual cortex memory.
– Parvo object segmentation: Using the culled and pruned segments from the magno segmentation, a color segmentation is performed on each preserved object segment with better focus and better parvo resolution. NOTE: Many parvo segmentations may be useful. During this stage, the eye is quickly moving and studying RGB features of each object segment. Decisions may be made at this level to categorize the object into a strand based mainly on color or texture metrics. Parvo object VDNA metrics are computed and stored in the visual cortex memory.
– Parvo interior scan: This scan produces a detailed color segmentation of microregions within the object, such as color textures, color space identification, color labeling, and small color boundaries. The eye is in fair focus over a small region covering perhaps a few percent of the center of the field of view, which contains a feature of interest such as a motif. The eye is moving slowly within the interest region but stopping briefly at key points to verify a feature. This stage is similar to the interest point and feature descriptor stage proposed in many computer vision models (see [1]). Parvo micro object VDNA metrics are computed and stored in the visual cortex memory.
– Parvo nano stare: The retinex and histeq images are used in this stage, corresponding to the saccadic stage where the retina is closely studying, perhaps staring at, a specifically focused point, dithering the focal point of the field of view in the AR, which contains the smallest identifiable motif or detail of interest, such as a character, logo, or other small feature. Nano stares may also include recursive boundary scans to determine nano feature shape. This stage is similar to the interest point and feature descriptor stage proposed in many computer vision models (see [1]). Parvo nano object VDNA metrics are computed and stored in the visual cortex memory.
– Strands: A proxy agent examines the genome region metrics from visual cortex memory and groups related VDNA into one or more strands. Also, an existing strand can be decomposed or edited based on proxy agent hypothesis evaluations to add or remove segmentations and genome features. A strand is a view-based construct which can grow and increase in level of detail.
– Bundles: Bundles contain related strands, as composed by proxy agent classification results, and bundles can be edited to add or remove strands and features.

As illustrated in Figure 2.6, the scanning occurs in semi-random order, so VDNA, strands, and bundles may change, since the proxy agent may direct the visual pipeline through various processing steps as needed, according to the current hypothesis under evaluation.
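The strand and bundle containers described above can be pictured as simple editable collections. The following is only a minimal sketch: the class names, fields, and methods are hypothetical and are not taken from the VGM API; it merely shows that a strand groups genome IDs and a bundle groups strands, and that both can be edited as hypotheses change.

```python
from dataclasses import dataclass, field


@dataclass
class Strand:
    """Hypothetical container: a named group of related genome region IDs."""
    name: str
    genome_ids: list = field(default_factory=list)

    def add(self, genome_id):
        # Avoid duplicates so repeated scans do not inflate the strand.
        if genome_id not in self.genome_ids:
            self.genome_ids.append(genome_id)

    def remove(self, genome_id):
        if genome_id in self.genome_ids:
            self.genome_ids.remove(genome_id)


@dataclass
class Bundle:
    """Hypothetical container: a named group of related strands."""
    name: str
    strands: list = field(default_factory=list)

    def add(self, strand):
        if strand not in self.strands:
            self.strands.append(strand)

    def remove(self, strand):
        if strand in self.strands:
            self.strands.remove(strand)


# Usage: an agent grows a strand, then revises it after a new hypothesis.
apple = Strand("apple")
apple.add(0x9100910ED407C200)
apple.add(0x9100910ED407C201)
apple.remove(0x9100910ED407C201)
fruit = Bundle("fruit")
fruit.add(apple)
```

In the model itself the editing decisions come from proxy agent hypothesis evaluations; here they are simply explicit method calls.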
Based on the scene scanning model and assumptions just described, we proceed below to describe more specific model details for scene processing.
Eye/LGN Visual Genome Sequencing Phases

This section describes the phases, or steps, of our synthetic LGN model and posits that the visual pathway operates in four main phases which are recursive and continual (see also Figure 2.6):
1. Gross scene scanning at magno resolution
2. Saccadic dithering and fine detail scanning at parvo resolution
3. Segmenting similar regions at both magno and parvo resolution
4. Feature metric extraction from segments into visual cortex memory

By analogy with the Human Genome Project, we use the term visual genome sequencing for the process of disentangling the individual genome regions and VDNA, so segmentation is the primary step in sequencing to collect regions of pixel VDNA together. To emulate the eye and LGN, we carefully prepare a set of images, as shown in Figure 2.7, representing the eye/LGN space of biologically plausible image enhancements. The eye and LGN can actually perform similar processing steps to those modeled in Figure 2.7, as discussed above in the section “LGN Image Enhancements.” Next, we go deeper to describe each sequencing phase and outline the algorithms used in the synthetic model.
Figure 2.7: The relative scale of parvo and magno image input. The scale difference corresponds to the magno cell area vs. parvo cell area, where parvo cells are 10%–20% as large as magno cells. The parvo cells are full resolution images and the magno cells are 5:1 downsampled images. Images from left to right are raw, sharp, blur, histeq, and retinex. Illustration © 2016 Springer International Publishing. Used by permission (see [166]).
Magno Image Preparation

We model magno response as a set of luminance images at 20% of full scale (see Figure 2.7). Apparently the visual system first identifies regions of interest via the magno features, such as shapes and patterns, during a scanning phase, where the eye is looking around the scene and not focused on a particular area. During the scanning phase, the eye and LGN model is not optimized for a particular object or feature, except perhaps for controlling gross lighting and focus via the iris and lens. The magno features are later brought into better focus and dynamic range is optimized when the eye focuses on a specific region for closer evaluation.

For the scanning phase and magno features, we assume a much simpler model than the parvo cells, since the magno cells are lower resolution and are attuned to fast-moving objects, which implies that the LGN does not change the magno features as much from impression to impression. We take the magno features to be preliminary outlines only, which are filled in by details in the parvo cells. Our scanning model assumes the LGN perhaps changes parameters three times during scanning: (1) a global scene scan at constant settings, in which magno cells are filled in with very low-resolution information; (2) a pause at a specific location to focus and fill in more parvo RGB details; and (3) a contrast-enhanced nano stare scan at the paused position, performed by the LGN during saccades so that fine detail is captured from the parvo cells.

Since magno cell regions contain several rods and are predominantly attuned to faster-moving luminance changes, our model subsamples luminance from the full resolution raw parvo images 5:1 (i.e. 20% scale), which seems to be a reasonable emulation of magno cell biology.
Perhaps due to the larger size of the magno regions and the smaller size of the rods, (1) it is faster to accumulate a low-light response over all the cells in the magno region, and (2) the magno cell output is lower resolution due to the subsampling and accumulation of all the cells within the magno cell region.
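The 5:1 luminance subsampling can be sketched as follows. This is an illustration only: the text does not state which resampling filter the VGM uses, so block averaging is assumed here (it loosely mirrors the idea of accumulating the response of all cells within a magno region), and luminance is assumed to be the plain mean of R, G, and B. The function name `magno_luma` is hypothetical.

```python
def magno_luma(rgb, factor=5):
    """Return a luminance image reduced by `factor` in each dimension.

    rgb is a list of rows; each row is a list of (r, g, b) tuples.
    Each output value is the mean luminance of one factor x factor tile
    (assumed filter; the source does not specify one).
    """
    height = len(rgb) // factor * factor      # ignore ragged right/bottom edges
    width = len(rgb[0]) // factor * factor
    out = []
    for y0 in range(0, height, factor):
        row = []
        for x0 in range(0, width, factor):
            total = 0.0
            for y in range(y0, y0 + factor):
                for x in range(x0, x0 + factor):
                    r, g, b = rgb[y][x]
                    total += (r + g + b) / 3.0   # assumed luminance: channel mean
            row.append(total / (factor * factor))
        out.append(row)
    return out


# Usage: a 10 x 5 gray ramp becomes a 2 x 1 magno image.
parvo = [[(x * 10, x * 10, x * 10) for x in range(10)] for _ in range(5)]
magno = magno_luma(parvo)
print(len(magno[0]), len(magno))  # 2 1
```

A 4000x3000 parvo image reduced this way yields an 800x600 image, which matches the roughly 0.5MP magno size quoted later in the chapter.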
Parvo Image Preparation

We model parvo response as a set of RGB images at full scale (see Figure 2.7). For closer evaluation, the visual system inspects interesting magno shapes and patterns to identify parvo features and then attempts to identify larger concepts. The eye saccades over a small active region (AR) within the larger segmentation regions. During saccades, the eye adjustments and LGN image enhancements may change several times according to the current hypothesis under attention by the high-level agent; for example, focus and depth of field may be changed at a particular point to test a hypothesis.
Segmentation Rationale

It also seems necessary to carefully mask out extraneous information from the training samples in a segmentation phase in order to isolate regions from which more precise feature metrics can be created. For example, masking out only the apple in a labeled image of apples allows for recording the optimal feature impressions for the apple, and we seek this feature isolation in the synthetic model. Human vision holds segmented regions at attention, allowing for further scanning and saccading as desired to study features. Segmented regions are therefore necessary to model the vision process. We choose to model segmentation in the LGN but assume feedback input comes from the visual cortex to guide segmentation.

Note that DNN training often does not include segmented regions, perhaps since collecting hundreds of thousands of images to train a DNN is hard enough without adding the burden of segmenting or matting images to isolate the labeled training regions. However, some systems (e.g. Clarify) claim that masked, or segmented, training images work best when extraneous information is eliminated. Furthermore, some practitioners may argue that DNN training on segmented images is not wanted and even artificial, preferring objects embedded in their natural surroundings during training. The synthetic model does not require hundreds of thousands of images for training. Segmentation is therefore a simple concept to model, and desirable to effectively add biologically plausible training data samples into the model.

DNN semantic segmentation methods are improving, and this is an active area of research. However, DNN architecture limitations on image size preclude their use today in the current VGM, since 12MP images and larger are supported in the VGM. DNNs rescale all input images to a common size (perhaps 300x300) due to performance limitations. For more on DNN segmentation research, see [173][174].
Segmentation seems to be ongoing and continuous, as the eye saccades and dithers around the scene responding to each new hypothesis from the PFC Executive. To emulate human visual biology and the LGN, there is no single segmentation method to rely on; multiple segmentations are necessary. The VGM assumes that segmentation is perhaps the earliest step in the vision pipeline after image assembly. We scan the image a number of times, similar to the way the eye flickers around interesting regions to focus and gather information. Regions may be segmented based on a variety of criteria such as edge arrangement, shape, texture, color, LBP patterns, or known motifs similar to letters or logos which can be considered primal regions. We assume that the size of each region is initially small and close to the size of the AR, which enables larger segmentations to be built up from smaller connected segmentations. There is no limit to the number of segmentations that can be used; however, more than one method to produce overlapping segmented regions is desirable, since segmentation results vary from image to image and no algorithm is a clear winner.
Effective segmentation is critical. Since we are not aware of a single segmentation algorithm which mimics the human visual system, we use several algorithms and parameter settings to create a pool of segmented regions to classify, including a morphological method to identify larger segments, and a superpixel method to create smaller, more uniformly sized segments. Both morphological and superpixel methods yield similar yet different results based on tuning parameters. In the future we intend to add an LBP segmentation method.

We have tried segmentation on a variety of color space luma channels, such as RGB-I, Lab-L, and HSV-V. We have tried a variety of image pre-processing steps prior to segmentation. We have also tried segmentation on color popularity images, derived from 5-bit RGB colors. And of course, we have also tried segmentation on the raw images with no image pre-processing. We have not yet found the best input image format and processing steps to improve segmentations. Based on experience, segmentation is most effective using a selected combination of superpixel methods [153–157], and for some images we like the morphological segmentation developed by Legland and Arganda-Carreras [20] as implemented in MorphoLibJ.

Image segmentation is an active field of research, perhaps the most interesting part of computer vision. See Chapter 3 of [1] for a survey of methods and also the University of California, Berkeley Segmentation Dataset Benchmark competition results and related research. In addition, image matting and trimap segmentation methods can be useful; see the survey by Xu et al. [23]. A few notable papers on trimap segmentation and matting include [21][22]. A useful survey of the field of segmentation is found in [25]. Open source code for several color palette quantization methods and color popularity algorithms to segment and shrink the color space to common RGB elements is found at [26].
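To make the idea of a color popularity image concrete, here is a small sketch. It assumes that "5-bit RGB" means each channel is reduced to its top 5 bits, and it simply counts how often each reduced color occurs; the function names are hypothetical and this is not the code referenced at [26].

```python
from collections import Counter


def quantize_5bit(pixel):
    """Reduce an 8-bit (r, g, b) pixel to 5 bits per channel (assumed meaning)."""
    r, g, b = pixel
    return (r >> 3, g >> 3, b >> 3)


def color_popularity(pixels, top_n):
    """Return the top_n most frequent quantized colors with their counts."""
    counts = Counter(quantize_5bit(p) for p in pixels)
    return counts.most_common(top_n)


# Usage: three near-identical reds collapse into one popular color.
pixels = [(255, 0, 0), (250, 3, 1), (0, 0, 255), (249, 7, 7)]
popular = color_popularity(pixels, 1)
print(popular)  # [((31, 0, 0), 3)]
```

Mapping each pixel to its nearest popular color gives an image with far fewer distinct colors, which is one way to simplify the input handed to a segmenter.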
The MorphoLibJ segmentation package is based on unique morphological segmentation methods and is available as a plugin for ImageJ Fiji [27], and a superpixel segmentation plugin for ImageJ Fiji can be found at [28]. Related methods to learn colors from images include [29][30]. See also [35] for a novel method of “selective search” using several region partitioning methods together for segmenting likely objects from images.

Segmentation Parameter Automation

While we provide a basic model for selecting segmentation parameters based on the image, future research is planned to develop a more extensive automated segmentation pipeline, which first analyzes the image to determine necessary image enhancements and segmentation parameters, based on a hypothesis parameter set (HPS) controlled by the agent. For example, each HPS could be composed to test a hypothesis under agent control, such as a color-saturation invariant test. The HPS follows the idea that the human visual system directs focus, dynamic range compensations, and other types of eye/LGN processing depending on the quality of each scene image and the hypothesis at hand. As explained earlier, the current VGM uses a few different types of processing such as histeq, sharpen, blur, retinex, and raw data at different stages in the model to mimic capabilities of the human visual system. Currently, we use a modest and simple default HPS model, discussed next in the “Saccadic Segmentation Details” section.
Saccadic Segmentation Details

We optimize segmentation parameters to generate between 300 and 1,500 segments per image, depending on image resolution, as discussed later in this section. We cull out the segments which have a pixel area less than the AR size, and compute metrics over each remaining region. Region size minimums of 200–400 pixels have proven to be useful, depending on the image resolution, as discussed earlier in the section “Visual Acuity, Attentional Region (AR).” The details on the two basic segmentations we perform are discussed below:
– Magno LUMA segmentation: A segmentation over the pre-processed raw LUMA using magno 20% scale images, which are then scaled up to 100% and used as the segmentation mask region over an RGB raw parvo scale image.
– Parvo RGB segmentation: A segmentation on the global histogram equalization parvo full-scale RGB image.

Using multiple segmentations together mimics the saccadic dithering process. Saccadic dithering likely involves integrating several segmentations over several saccades, so two segmentations is just a model default. Note that segmentation by itself seems to be the most important feature of the early visual pathway, since the segmentations allow region association and region classification. We have experimented with a model to emulate the magno pathway using only the luminance information for the segmentation boundaries, but the results do not seem satisfactory. Somewhere likely in V1 or higher, the magno and parvo color channels are apparently combined into a single segmentation to evaluate color- and texture-based segmentations. In the VGM, we can combine magno and parvo segmentations using the agents. However, we postulate that the visual pathway contains a processing center (likely V1 or Vn) which is dedicated to merging the magno and parvo information into segments.
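The magno LUMA segmentation step above scales a 20% mask back up to 100% so it can be overlaid on the parvo image. The sketch below assumes nearest-neighbor replication, which keeps segment labels intact because no new in-between values are created; the text does not say which interpolation the VGM uses, and `upscale_mask` is a hypothetical name.

```python
def upscale_mask(mask, factor=5):
    """Enlarge a 2D label mask by pixel replication (nearest neighbor)."""
    out = []
    for row in mask:
        # Repeat each label `factor` times horizontally...
        wide = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row `factor` times vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


# Usage: a 2 x 2 magno mask becomes a 10 x 10 parvo-scale mask.
magno_mask = [[0, 1],
              [1, 0]]
parvo_mask = upscale_mask(magno_mask)
print(len(parvo_mask), len(parvo_mask[0]))  # 10 10
```

Interpolating filters such as bilinear would blend neighboring label values at segment borders, which is why replication is the safer assumption for masks.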
At this time, we are still trying various methods to produce combined segmentations and have experimented with feeding HSV-H images, Lab-L images, HSI-S images, color popularity-based images, and others to the segmenter, but without finding a satisfactory winner, foolproof rules, or consistently winning results. There are several statistical methods we have tried to analyze image texture to guide image pre-processing and segmentation parameter settings. For simplicity, the default model uses the Haralick Inverse Difference Moment (IDM) computed over a co-occurrence matrix (see Chapter 9):
$$\mathrm{IDM} = \sum_{i} \sum_{j} \frac{p(i,j)}{1 + (i - j)^{2}}$$
Figure 2.8 shows five different images and their corresponding Haralick IDMs. We sort the IDM values into “texture threshold bands” (see Figure 2.8) to understand the fine texture details for parameter setting guidance.
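As a worked illustration of the IDM, the sketch below builds a normalized co-occurrence matrix p(i, j) for one pixel offset and sums p(i, j) / (1 + (i - j)^2). It is a plain-Python sketch under stated assumptions, not the ImageJ GLCM plugin: the matrix is not made symmetric, the offset is one pixel to the right, and gray levels are assumed to be small integers already.

```python
def glcm(gray, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for offset (dx, dy).

    gray is a list of rows of integer gray levels in [0, levels).
    Assumes the image is large enough to contain at least one pixel pair.
    """
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(gray), len(gray[0])
    total = 0
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                counts[gray[y][x]][gray[y2][x2]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]


def inverse_difference_moment(p):
    """Haralick IDM: sum over i, j of p(i, j) / (1 + (i - j)^2)."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


# Usage: a flat patch is maximally homogeneous; a checkerboard is not.
flat = [[2, 2, 2], [2, 2, 2]]
checker = [[0, 3, 0], [3, 0, 3]]
idm_flat = inverse_difference_moment(glcm(flat, 4))
idm_checker = inverse_difference_moment(glcm(checker, 4))
```

Here `idm_flat` is 1.0 and `idm_checker` is about 0.1, which matches the way the threshold bands treat large IDM values as low contrast and small values as high contrast.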
Figure 2.8: The Haralick Inverse Difference Moment values for five images and the threshold bands used for segmentation and image pre-processing parameter settings.
Magno and parvo resolution segmentations are optimized separately, depending on the algorithm used (superpixel or morphological segmentation) and on the image resolution. Currently for 12MP images (4000x3000 pixels) we optimize the target number of segmentations as follows:
– For magno segmentation (~0.5MP images), we aim for 400 segments per image, and we cull out the segments which have a pixel area less than the AR size, or larger than a max size, yielding about 300–400 valid segments per image, each of which is considered a genome region.
– For parvo images (8MP–12MP images), we aim for about 2,000 segmentations and cull out small and large regions to yield about 1,000–1,500 segmentations.

From experience, we prefer superpixel segmentation methods, and more than one overlapping segmentation is valuable for best results, since no single segmentation is optimal. More work needs to be done for future versions of the model, and we describe possible enhancements as we go along.

As shown in Figure 2.9, each segmented region is stored in a rectangular bounding box as a black/white mask region file and a separate RGB pixel region file. We use the rectangular bounding box size of the segmentation as the file size and set all pixels outside the segmented region to 0 (black), so valid pixels are either RGB values or 0xff for mask files. The segmentation masks from the raw image can be overlaid on all images to define the segmentation boundaries, so a full set of metrics is computed over each segmented region on each image: raw, sharp, retinex, global histeq, and blurred images. We refer to segmented regions as visual genomes, which contain over 16,000 different VDNA metrics. The raw RGB image metrics are the baseline for computing the autolearning hull, using retinex and sharp image metrics to define the hull boundary for matching features, as discussed in Chapter 4. Autolearning is similar to more laborious training methods, but only uses a small set of training images that are plausible within the eye/LGN model.
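The culling and bounding-box storage described above can be sketched as follows. This is an assumption-laden illustration rather than the VGM code: it takes a label image (one integer label per pixel), drops regions whose pixel area falls outside hypothetical `min_area` and `max_area` limits, and for each surviving region produces a bounding-box mask (0xFF inside, 0 outside) and a bounding-box RGB crop with outside pixels set to black.

```python
def extract_regions(labels, rgb, min_area, max_area):
    """Cull regions by area and cut each survivor out to its bounding box."""
    boxes = {}  # label -> [x0, y0, x1, y1, area]
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            box = boxes.setdefault(label, [x, y, x, y, 0])
            box[0] = min(box[0], x)
            box[1] = min(box[1], y)
            box[2] = max(box[2], x)
            box[3] = max(box[3], y)
            box[4] += 1

    regions = {}
    for label, (x0, y0, x1, y1, area) in boxes.items():
        if not (min_area <= area <= max_area):
            continue  # too small (below the AR size) or too large
        xs, ys = range(x0, x1 + 1), range(y0, y1 + 1)
        mask = [[0xFF if labels[y][x] == label else 0 for x in xs] for y in ys]
        pixels = [[rgb[y][x] if labels[y][x] == label else (0, 0, 0) for x in xs]
                  for y in ys]
        regions[label] = {
            "bbox": (x0, y0, x1 - x0 + 1, y1 - y0 + 1),  # x, y, dx, dy
            "mask": mask,
            "pixels": pixels,
        }
    return regions


# Usage: label 1 (3 pixels) survives; label 2 (5) and label 3 (1) are culled.
labels = [[1, 1, 2],
          [1, 2, 2],
          [3, 2, 2]]
rgb = [[(10, 20, 30)] * 3 for _ in range(3)]
regions = extract_regions(labels, rgb, min_area=2, max_area=4)
```

With real images the minimum would be on the order of the 200–400 pixel AR sizes mentioned earlier; the tiny limits here only keep the example readable.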
Figure 2.9: The segmentation and feature metric generation process. Note that several files are saved to contain the segmentation masks, segmentation RGB pixel regions, metrics files, and autolearning files.
Processing Pipeline Flow

The basic processing flow for segmentation and metrics generation is shown in Figure 2.9 and described next, using a combination of ImageJ script code and some descriptive pseudocode. Note that the synthetic model is mostly implemented in C++ code using massively parallel multithreading (not shown here).

Magno/Parvo Image Preparation Pipeline Algorithm

Magno image preparation is the same as parvo, except the raw image is first downsampled 5:1 (scaled by 0.2).

Pseudocode:
    Read raw RGB image
    Save raw RGB into .png file (1 file)
    Split raw RGB channels into separate R,G,B,I files (5 files total)
    Sharpen raw RGB image
    Save sharp RGB into .png file (1 file)
    Split sharp RGB channels into separate R,G,B,I files (5 files total)
    Blur raw RGB image
    Save blur RGB into .png file (1 file)
    Split blur RGB channels into separate R,G,B,I files (5 files total)
    Retinex RGB image
    Save retinex RGB into .png file (1 file)
    Split retinex RGB channels into separate R,G,B,I files (5 files total)
    Normalize global contrast RGB image
    Save histeq RGB into .png file (1 file)
    Split histeq RGB channels into separate R,G,B,I files (5 files total)
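The file bookkeeping implied by the pseudocode above can be summarized in a few lines. The sketch assumes that "5 files total" means one RGB file plus the four R, G, B, I channel files per image space; the file naming pattern is invented for illustration and is not the VGM naming scheme.

```python
IMAGE_SPACES = ["raw", "sharp", "blur", "retinex", "histeq"]
CHANNELS = ["R", "G", "B", "I"]


def planned_outputs(stem):
    """List the image files one preparation pass would write (hypothetical names)."""
    files = []
    for space in IMAGE_SPACES:
        files.append(f"{stem}_{space}_RGB.png")        # the composite RGB image
        for channel in CHANNELS:
            files.append(f"{stem}_{space}_{channel}.png")  # one file per channel
    return files


# Usage: one pass at parvo scale and one at magno scale.
parvo_files = planned_outputs("scene_parvo")
magno_files = planned_outputs("scene_magno")
print(len(parvo_files), len(magno_files))  # 25 25
```

Under this reading, each input image produces 25 prepared files per scale, or 50 across the magno and parvo passes, before any segmentation output is written.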
Segmentation Pre-Processing Pipeline

The Haralick IDM and other texture metrics are determined first, then the segmentation parameters and pre-processing parameters are set. We present the following code from ImageJ Fiji to illustrate the process.

1) Compute the Haralick Metrics

    run("GLCM Texture", "enter=1 select=[0 degrees] angular contrast correlation inverse entropy");
    // save the Haralick IDM for texture analysis and pre-processing guidance
    global_contrast_g = Haralick_IDM;
Next, set the segmentation parameters for both the jSLIC and the morphological segmentation methods separately, using the texture threshold bands as shown previously in Figure 2.8.

ImageJ script code:

    if (global_contrast_g > 0.5) {
        print("Very-lo-contrast > 0.5");
        global_contrast_g = 0; // very low
        jslic_grid_A_g = 30; jslic_regularize_A_g = 0.1;
        jslic_grid_B_g = 35; jslic_regularize_B_g = 0.04;
        morpho_radius_A_g = 1; morpho_tolerance_A_g = 10;
        morpho_radius_B_g = 2; morpho_tolerance_B_g = 4;
    } else if (global_contrast_g > 0.4 || sdm_contrast_g == 0.0) {
        print("Lo-contrast > 0.4");
        global_contrast_g = 1; // lo
        jslic_grid_A_g = 30; jslic_regularize_A_g = 0.15;
        jslic_grid_B_g = 40; jslic_regularize_B_g = 0.05;
        morpho_radius_A_g = 1; morpho_tolerance_A_g = 10;
        morpho_radius_B_g = 2; morpho_tolerance_B_g = 5;
    } else if (global_contrast_g > 0.3) {
        print("Med-contrast > 0.3");
        global_contrast_g = 2; // med
        jslic_grid_A_g = 30; jslic_regularize_A_g = 0.20;
        jslic_grid_B_g = 40; jslic_regularize_B_g = 0.07;
        morpho_radius_A_g = 1; morpho_tolerance_A_g = 11;
        morpho_radius_B_g = 2; morpho_tolerance_B_g = 7;
    } else if (global_contrast_g > 0.25) {
        print("Hi-contrast > 0.25");
        global_contrast_g = 3; // hi
        jslic_grid_A_g = 30; jslic_regularize_A_g = 0.10;
        jslic_grid_B_g = 30; jslic_regularize_B_g = 0.07;
        morpho_radius_A_g = 1; morpho_tolerance_A_g = 11;
        morpho_radius_B_g = 2; morpho_tolerance_B_g = 10;
    } else {
        print("Extreme-contrast <= 0.25");
        global_contrast_g = 4; // extreme
        jslic_grid_A_g = 30; jslic_regularize_A_g = 0.15;
        jslic_grid_B_g = 30; jslic_regularize_B_g = 0.05;
        morpho_radius_A_g = 2; morpho_tolerance_A_g = 12;
        morpho_radius_B_g = 3; morpho_tolerance_B_g = 10;
    }
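The same band selection can be written as a lookup table, which may be easier to extend than a chain of conditionals. The sketch below copies the thresholds and parameter values from the ImageJ script above; the table layout and the name `select_band` are illustrative assumptions, and the script's extra `sdm_contrast_g == 0.0` condition is deliberately left out because that variable is not explained in the text.

```python
# (lower IDM bound, band index, label,
#  jSLIC A (grid, regularize), jSLIC B (grid, regularize),
#  morpho A (radius, tolerance), morpho B (radius, tolerance))
BANDS = [
    (0.50, 0, "very-lo", (30, 0.10), (35, 0.04), (1, 10), (2, 4)),
    (0.40, 1, "lo",      (30, 0.15), (40, 0.05), (1, 10), (2, 5)),
    (0.30, 2, "med",     (30, 0.20), (40, 0.07), (1, 11), (2, 7)),
    (0.25, 3, "hi",      (30, 0.10), (30, 0.07), (1, 11), (2, 10)),
]
# Fallback band for IDM values at or below the last threshold.
EXTREME = (None, 4, "extreme", (30, 0.15), (30, 0.05), (2, 12), (3, 10))


def select_band(idm):
    """Return the parameter band for a Haralick IDM value."""
    for band in BANDS:
        if idm > band[0]:
            return band
    return EXTREME


# Usage: a fairly homogeneous image falls in the low-contrast band.
_, index, label, jslic_a, jslic_b, morpho_a, morpho_b = select_band(0.45)
print(index, label, jslic_a, morpho_b)  # 1 lo (30, 0.15) (2, 5)
```

Keeping the values in one table also makes it straightforward to compare bands side by side when tuning them.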
2) Perform Image Pre-processing to Adjust for Image Texture Prior to Segmentation

Depending on the Haralick IDM contrast band stored in global_contrast_g, we perform appropriate image pre-processing. We have found the following image pre-processing steps to be generally helpful prior to running the segmentations; however, future work remains to improve the effectiveness.

ImageJ script code:

    if (global_contrast_g == 0) {
        print("Enhance Local Contrast (CLAHE)", "blocksize=32 histogram=256 maximum=2.2 mask=*None*fast");
        run("Enhance Local Contrast (CLAHE)", "blocksize=32 histogram=256 maximum=2.2 mask=*None*fast");
    }
    if (global_contrast_g == 1) {
        print("Enhance Local Contrast (CLAHE)", "blocksize=32 histogram=256 maximum=2 mask=*None*fast");
        run("Enhance Local Contrast (CLAHE)", "blocksize=32 histogram=256 maximum=2 mask=*None*fast");
    }
    if (global_contrast_g == 2) {
        print("Enhance Local Contrast (CLAHE)", "blocksize=32 histogram=256 maximum=1.8 mask=*None*fast");
        run("Enhance Local Contrast (CLAHE)", "blocksize=32 histogram=256 maximum=1.8 mask=*None*fast");
    }
    if (global_contrast_g == 3) {
        print("Enhance Local Contrast (CLAHE)", "blocksize=32 histogram=256 maximum=1.5 mask=*None*fast");
        run("Enhance Local Contrast (CLAHE)", "blocksize=32 histogram=256 maximum=1.5 mask=*None*fast");
    }
    if (global_contrast_g == 4) {
        run("Enhance Contrast...", "saturated=0.3 equalize");
    }
3) Compute Segmentations

Segmentations are computed for each raw, sharp, retinex, histeq, and blur image space, so features can be computed separately in each space. Depending on the desired results, we can choose which segmenter to use (jSLIC superpixels or morphological). See Figure 2.10 and note the difference between superpixel methods and morphological methods: superpixel methods produce more uniformly sized regions, while morphological methods produce a wider range of segmented region sizes. It should be noted that segmentation region size is important, since some metrics computed over large regions can be denser than metrics computed over smaller regions, which tend to be sparser. We use a range of normalization methods and other specialized algorithms to compute and compare the metrics with region size in mind, as explained in subsequent chapters where we describe each metric.

A fruitful area for future research planned for subsequent versions of the VGM is to provide more segmentations, as well as dynamic segmentations of particular regions using a range of segmentation types and parameters across (1) multiple color spaces and (2) multiple LGN pre-processed images. Based on current test results, using multiple segmentations as described will increase accuracy. Because multiple segmentations will result in more genomes and more metrics processing, the increased compute load points to the need for dedicated hardware for segmentation, or at least software-optimized segmentation algorithms. Here is some sample ImageJ script code used to run each segmenter.
    function morphoj_segment(filedir, new_file, radius, tolerance, connectivity) {
        // build the call line that is appended to a copy of the segmentation script
        parms = toString(" morphoj_segment( \""+filedir+"\", \""+new_file+"\", \""+radius+"\", \""+tolerance+"\", \""+connectivity+"\" );");
        // copy the script into the Fiji macros directory, then append the call
        command = toString("cp "+mophoj_segmentation_js_g+" /Applications/Fiji.app/macros/morpho_script.js");
        exec("sh", "-c", command);
        exec("sh", "-c", "printf '"+parms+"' >> /Applications/Fiji.app/macros/morpho_script.js");
        rv = runMacro("morpho_script.js");
        print("Done.");
    }

    function create_jslic_parvo_mask(parm1, parm2, file, type) {
        // note: n1, parvo, mask, and segmentation are not defined in this excerpt
        run("jSLIC superpixels 2D", "init.=["+n1+"] regularisation=["+parm2+"] export overlap=none indexed save save="+filedir_g+"_textdata");
        // add a space at the beginning and end of the line to help with numeric substitutions later
        jslic_mask_textdata_g = toString(parvo+"_"+n1+"_JSLIC_segmentation_"+type+"_"+filenameonly_g+"_textdata");
        exec("sh", "-c", "cat "+filedir_g+"_textdata | sed -e 's/^\\(.*\\)$/ \\1 /' >"+filedir_g+jslic_mask_textdata_g);
        selectWindow(segmentation);
        saveAs("PNG", filedir_g+parvo+mask+n1+"_JSLIC_"+file);
    }
Figure 2.10: Morphological methods vs superpixel methods. (Top left) a typical morphological segmentation producing a range of small to large image regions, especially effective for identifying larger connected regions; (Top right) a superpixel segmentation, showing smaller, more regularly sized regions; (Bottom left) morphological segmentation overlay on image; (Bottom right) superpixel segmentation overlay on image. There is no ideal segmentation method.
Processing Pipeline Output Files and Unique IDs

Each segmenter outputs a master segmentation file, which we process to split up the segmented image into separate files, one file per segmented region, as shown in Figure 2.11. Each genome region file name incorporates the coordinates and bounding box and the type of segmenter used. We refer to each segment as a visual genome.
Figure 2.11: A variety of segmented regions as mask files. Black pixels are the background area outside the segmentations.
As shown in Figure 2.11, segmentations are RGB pixel regions contained in a bounding box and surrounded by a 0x00 black mask area; the mask delineates the boundary of the genome region in the bounding box. When the segmentation completes, each segmented pixel region is named using a unique 64-bit ID for the image and another ID for the genome region within the image. Genome IDs are unsigned 64-bit numbers such as 0x9100910ed407c200. For example, a metrics file name is composed of two 64-bit IDs: metrics___.metrics.
See Table 2.1 in the following section for a typical list of names. A 64-bit ID space provides 2^64 (about 1.8 × 10^19) distinct values, which is ample range for image and genome identifiers.
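The naming idea can be sketched as follows. The text does not say how the VGM generates its IDs or exactly how the two IDs are arranged in a file name, so both the random generation and the `metrics_<image>_<genome>.metrics` layout below are assumptions made only for illustration.

```python
import secrets


def new_id():
    """Return a random unsigned 64-bit ID (assumed generation scheme)."""
    return secrets.randbits(64)


def format_id(value):
    """Format an ID as 0x followed by 16 lowercase hex digits."""
    return f"0x{value:016x}"


def metrics_filename(image_id, genome_id):
    """Compose a metrics file name from two IDs (hypothetical layout)."""
    return f"metrics_{format_id(image_id)}_{format_id(genome_id)}.metrics"


# Usage: a fixed image ID and a freshly generated genome ID.
image_id = 0x9100910ED407C200
genome_id = new_id()
name = metrics_filename(image_id, genome_id)
```

Zero-padding to 16 hex digits keeps every name the same length, so names sort and parse predictably.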
Feature Metrics Generation

After the genome pixel regions are separated into files (genome_*.png), one file for each raw, sharp, retinex, histeq, and blur space, metrics are computed from each genome file. As discussed in Chapter 4 in the “Agent Architecture” section, the sequencer controller agent computes the feature metrics for each genome, stores them in a set of metrics files, and also calls any agents registered to perform any special processing on the genome regions. A summary of the files created for each genome is shown in Table 2.1. The metrics files include all the feature metrics, the names of all the associated files, and the coordinates and size of the bounding box for each genome. Other metrics are also computed into separate files.
Details on the metrics file internal formats are provided in Chapter 5. Feature metric details are discussed separately for each color, shape, texture, and glyph feature base in Chapters 6–10.

Table 2.1: Files created for each genome

genome_xydxdy________jslic_RGB_maskfile___FLAG_JSLIC_F.png
    The bounding box image containing the genome region RGB pixels, background pixel values = 0. This file can be converted to a binary mask if needed.
MASTERGENOME__ dbb_edc_RAW_IMAGE.genome.Z
    The D texture file for the RAW IMAGE genome region, contains ,,,, bit scale pyramid
MASTERGENOME__ dbb_edc_RETINEX_IMAGE.genome.Z
    The D texture file for the RETINEX IMAGE genome region, contains ,,,, bit scale pyramid
MASTERGENOME__ dbb_edc_SHARP_IMAGE.genome.Z
    The D texture file for the SHARP IMAGE genome region, contains ,,,, bit scale pyramid
MASTERGENOME__ dbb_edc_BLUR_IMAGE.genome.Z
    The D texture file for the BLUR IMAGE genome region, contains ,,,, bit scale pyramid
MASTERGENOME__ dbb_edc_HISTEQ_IMAGE.genome.Z
    The D texture file for the HISTEQ IMAGE genome region, contains ,,,, bit scale pyramid
METRICS__ dbb_edc.metrics
    Cumulative metrics for the genome region, discussed in subsequent chapters.
AUTO_LEARNING__ dbb_fbb.comparisonmetrics
    The learned ranges of acceptable values (HULL values) for each metric to determine correspondence, based on the collective metrics from the raw, retinex, sharp, and blur data, discussed in detail in Chapter 4.
Summary
In this chapter we discuss the synthetic eye model and LGN model, which are mostly taken care of by the fine engineering found in common digital cameras and image processing library functions. We develop the concept of visual acuity on an attentional region (AR) to guide selection of a minimum segmentation region size to compose visual genomes. We define a simple model of the LGN, which includes magno features for low resolution and parvo features for full resolution. We discuss how saccadic dithering of the eye is used to increase resolution. We discuss how to automatically set up segmentation parameters and image pre-processing parameters based on image features such as texture, and discuss future work to automate segmentation and image pre-processing to emulate the eye and LGN biology. Finally, some low-level code details are provided to illustrate the model implementation.
Chapter 3
Memory Model and Visual Cortex

I know who I am. I remember everything.
—Jason Bourne
Overview

We model the visual cortex as a smart memory with dedicated processing centers, and for convenience sometimes refer to the entire visual cortex and associated memory as the memory model. In the visual genome model (VGM), visual feature memory is located in separate regions of the visual cortex, while agents that model learned intelligence reside in the higher-level PFC memory. We follow neurobiological research, as discussed throughout this chapter, to model the visual cortex as separate regions containing tightly coupled local processing centers with local memory, so each region performs specialized feature metric learning and comparisons. The local memory and coupled processing structure follow the visual cortex biology discussed throughout this chapter.
Figure 3.1: Some of the known processing centers within the visual cortex. Current research [1] lists over 30 processing centers.
DOI 10.1515/9781501505966-003
We show question marks in Figure 3.1 since nobody is sure of the visual cortex structure. But we know that visual processing happens there. How many processing centers are in the visual cortex? Wandell states [40]:

    As more responsive sites are identified, a theory based on brain centers becomes untenable. Localization results, in which a theory means identifying functional centers, do not advance understanding of the computational mechanisms needed to achieve the behavioral results (Zeki 1993, Posner & Raichle 1994).
In other words, the visual cortex is extremely complex and defies simple mappings to a fixed set of processing centers. Contemporary research (see the Journal of Neuroscience, for example) has identified many visual cortex processing centers, such as those for recognizing human body parts, faces (which appears to be a genetically designed region), animals, common objects, chairs, tools, buildings, and many more regions. Perhaps both memory and additional neural processing centers are developed over time according to need and experience. According to the synthetic model, each specialized region would include a CAM associative memory region, with specialized agents. However, for purposes of creating a synthetic model, we model the visual cortex processing centers as a set of over 16,000 processing centers, implemented as metrics functions. See Chapters 6–10 for details on each metric.

We further postulate that visual cortex processing centers can be created and learned over time based on experience, as suggested by [171] and many other researchers; thus, not all processing centers are predefined by genetics. Future versions of the VGM will therefore include more metrics and agents. For our purposes, the number of model layers V1–Vn does not matter, since the VDNA model provides more than enough dimensions (currently over 16,000) to emulate almost any reasonable model designed in the agents. And we assume that each Vn region contains or has access to dedicated neural processing necessary to compute the desired metrics in any dimension. Also, since the actual metrics used by neural processing regions are not known, the current API and the extendable nature of the VDNA seem like a plausible place to start for implementing plausible models and metrics.
As illustrated in Figure 3.1, the visual cortex performs specialized neural processing in separate regions, as confirmed by observations using fMRI, connectome neural pathway images (see Figure 1.9), and other imaging modalities. Many models of the visual cortex have been proposed. Visual cortex models may assume V1 (low-level edges), V2 (midlevel edges), V3 (edge orientation), V4 (color), V5 (motion), and V6 (depth). Still, many functions of the visual pathway are not well understood, such as those of the PIT and AIT regions, which Figure 3.1 shows outside of the visual cortex. Rather than using a complete biological model of the visual pathway, feedforward DNNs make hierarchical model assumptions using Hubel and Wiesel’s
feedforward concepts to model the V1–Vn layers with increasing levels of detail in each network layer. DNNs model a feedforward hierarchy of features; for example, the DNN layers could be viewed as V1 (lowest-level-edge glyphs), V2 (low-level-edge glyphs), V3 (midlevel-edge glyphs), V4 (high-level-edge glyphs), V5 (highest-level-edge glyphs), and V6 (fully connected layer of glyphs). See [1], Chapters 9 and 10, for more detailed discussions of DNN model assumptions. Also note that DNN models typically provide separate model weights for R, G, B, and perhaps I (gray scale) edges. However, the DNN model weights represent only a single feature type (i.e. edge gradient) built up from a training set.

DNN models assume a concept called feature learning for the feature weights, which differs from Brody’s model and the HMAX model. DNNs assume that all features must be learned and otherwise do not exist, while Brody and HMAX assume that some features are pre-wired edge patterns in the visual cortex created from the DNA/RNA genetic design instructions. VDNA assumes that the visual memory itself is the feature (including preexisting or genetic features in memory by genetic design), allowing the VDNA base metrics to be combined into strands and bundles, or recipes describing a visual concept.

Table 3.1: Comparing feature genesis models: (1) feature learning vs. (2) hard-wired preexisting features

Model | Feature Genesis
HMAX | Some preexisting features are hard-wired into the visual cortex, and some are learned by experience.
Brody | Preexisting features are hard-wired into the visual cortex.
DNN | No preexisting features (except if using transfer learning from a pretrained model as the training starting point); all features must be learned by averaging over a set of training images into a common hierarchical set of nxn patterns.
VGM and VDNA | The visual memory itself is the feature: over 16,000 VDNA feature metrics model the V1–Vn visual cortex processing centers, and agents learn higher-level concepts by associating base feature metrics into VDNA strands.
Visual Cortex Feedback to LGN

According to Brody [17], the LGN passes local visual regions, such as circular regions of rods or cones, through a center-surround filter followed by a nonlinearity function, similar to the methods often employed in DNN activation functions (see [1], Chapters 9 and 10). Brody also suggests that the V1 cells are organized into a structured processing center to provide a set of oriented edge filters similar to a steerable filter bank (see [1], Chapter 3) and that the V1 filters are used to reinforce the edge features into a feedback dendrite to re-excite the LGN to possibly perform additional pre-processing. Also note that the HMAX visual pathway model (see [1], Chapter 10) is similar to Brody’s model and includes a few layers of oriented edge patterns at different scales as basis features for correlation.

We posit that PIT and AIT, separate from the LGN and the visual cortex, are used to provide feedback between the LGN and the visual cortex to possibly parameterize object recognition processing. In the VGM, we do not explicitly model the PIT and AIT regions; see Riesenhuber [171] for relevant research. Research suggests [171] that the AIT and PIT are the penultimate stage in visual processing after the V1–Vn stages, responsible for structuring the visual information into higher-level patterns. However, too little is known to provide guidance for a synthetic model, so in the VGM, agents model the details as they wish.
Memory Impressions and Photographic Memory The visual information is the feature. We postulate that visual impressions from the eye and LGN result in recordings made in the visual cortex memory; for example, pixels at the low level, along with higher-level structures such as shapes, edges, and various types of metrics and feature descriptors at higher levels. The full resolution visual genome pixel regions retained in memory allow an agent to reevaluate the pixels at any time. The VGM supports photographic memory. It is well known that some people have better memory than others in very specific ways. The flexibility of the synthetic model allows for designing genius memory for specific applications. For example, an artist may have a photographic visual memory, enabling them to remember and even render very precise renditions of a scene or object. A musician or composer may have very precise and complete auditory memory for musical sound, enabling them to hear and recreate music very well. Other types of photographic memory seem plausible as well. The VGM supports genetic memory, which is equivalent to transfer learning (see [1], Chapter 9). We consider genetically encoded features to be primal features. A primal feature may or may not be contained in a genome region and may require scanning across the image and each genome region for location, similar to DNN tile scanning, to find a match. Currently, primal feature scanning is performed within the genome regions, but an agent may scan the entire image to look for primal features.
CAM Neurons, CAM Features, and CAM Neural Clusters The synthetic model includes a type of low-level neural model, called the CAM neuron, as discussed in Chapter 6. The CAM neuron takes input directly from LGN-assembled images from the magno and parvo pathways and produces CAM features.
Figure 3.2: Volume projection metrics rendered as CAM neural clusters from CAM features (center and right), taken from the LGN-assembled features of a genome region (left).
CAM features are a form of associative memory, or content-addressable memory (CAM). Each CAM feature is assembled into an address (see Chapter 6 for full details). All CAM features in a genome can be rendered together into a CAM neural cluster, or volume, to represent a genome in a higher-dimensional space, referred to as volume projection metrics (discussed in Chapter 6). Correspondence between genomes can occur in volume projection space, and some texture and shape metrics are computed in volume projection space as well (see Chapters 8 and 9).
Visual Cortex and Memory Model Architecture Here we provide a brief overview of the synthetic model architecture and engineering details, which is expanded in subsequent chapters. We model memory inside the visual cortex. A growing list of over 30 specific visual cortex processing centers have been identified according to the latest neuroscience (which is outside the scope of this work), with more visual cortex processing centers being investigated as science progresses. One method used to identify processing centers in the visual cortex, as proposed by Wandell [40], provides a specific visual pattern to the eye and then maps the response in the visual cortex using fMRI to view visual cortex activity, revealing a staggering amount of visual cortex response that seems difficult to localize into processing regions. As shown in Figure 3.3, the VGM visual cortex model includes processing and memory. The major features of the model include tight coupling between the distributed processing centers V1–Vn, their respective local memory, and a global memory across the visual cortex. See also Figure 1.11 and the “VDNA Synthetic Neurobiological Machinery” section in Chapter 1 for details on the processing centers shown in Figure 3.3.
Figure 3.3: The visual cortex model, where each processing center V1–Vn contains processing logic, photographic memory, and feature memory, which is content-addressable (CAM) and supports associative memory.
Here is a summary of visual cortex and memory model features.
– Persistent photographic memory of each genome region segment, containing all the pixels at full resolution (i.e. the actual visual impression from the LGN).
– Processing centers V1–Vn for computing specific visual attributes and feature metrics.
– Local memory coupled with each visual cortex V1–Vn processing center.
– Global content-addressable memory (CAM) throughout the visual cortex across all V1–Vn processing centers, supporting associative memory across the visual cortex available for higher-level learning agents.
– Visual DNA (VDNA) feature metrics stored in V1–Vn local memory. The local memory also exists within the global memory space and is fully associative and content addressable.
– Strands of VDNA and bundles of strands stored in global memory.
– The visual cortex model includes, for convenience, the AIT and PIT, but we allow agents to model the details since little is actually known about them (see [171]).
According to neuroscience, the visual memory model appears to be both local to each processing center, and associative, allowing for directed queries from higher-level reasoning centers to look for similar or different objects within V1–Vn local memory. The locality of the visual cortex processing centers and local memory is visible in fMRI images (see for example Cox and Savoy [39]). The local memories for each V1–Vn processing center are seemingly distributed within a global memory. The synthetic model uses strands and bundles of VDNA as the key data structures to model multi-VDNA and multigenome associativity, allowing the agents to identify similar or associated VDNA in any part of visual memory, including the current image as well as stored images. The visual memory feature description and query metrics are implemented analogous to V1–Vn regions in the API as processing metrics functions emulating plausible V1–Vn regions. The API is covered in Chapters 5–10. Note that the idea of associative memory coupled with dedicated neural processing centers provides a flexible and powerful model for learning and hypothesis testing in the agents. We do not support the stereo vision pathway in the current visual cortex model, but we provide a high-level overview in Chapter 2 of how it might work in a future version.
CAM and Associative Memory

We model CAM memory in the visual cortex as a type of feature memory (discussed in detail in Chapter 6). The CAM address reveals the contents, similar to a database key. We borrow the acronym CAM from computer science (content-addressable memory) to describe a memory access method used to retrieve items based on contents rather than a memory address. CAM memory is employed in many CPU designs as a form of cache memory. CAM may sometimes be called CAM memory, despite the obvious redundancy, or associative memory. For a CAM memory there is no need to know the address of an item in memory, since the CAM allows the entire memory to be searched and accessed based on memory contents.

A CAM memory can be very useful, and a custom CAM can be made for a variety of separate applications. For example, a CAM can be devised to return all the memory words that contain integers greater than +1,000,000, as well as return the addresses. Another CAM may be devised to return all the addresses of character strings containing the substring “hello.” In software parlance, a CAM may be referred to as associative memory and implemented as a linked list or hash table. In the synthetic model, agents are used to implement custom CAM memory, and may use strands to store lists of associated feature metrics, such as all the genomes in an image that contain a specific texture.
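The two example queries above (integers greater than 1,000,000 and strings containing “hello”) can be sketched in software as a scan driven by a content predicate. This is only a minimal illustration and not the VGM implementation: the names cam_word_t, cam_predicate_t, and cam_query are hypothetical, and the linear scan stands in for the parallel search a hardware CAM would perform.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical CAM word: an address paired with its stored contents. */
typedef struct {
    size_t address;
    long value;
    const char *text;
} cam_word_t;

/* A query is a test on the contents of a word, never on its address. */
typedef int (*cam_predicate_t)(const cam_word_t *word);

int value_above_one_million(const cam_word_t *word) {
    return word->value > 1000000L;
}

int text_contains_hello(const cam_word_t *word) {
    return word->text != NULL && strstr(word->text, "hello") != NULL;
}

/* Search the entire memory by contents and collect the addresses of all
   matching words. Returns the number of matches found; at most max_hits
   addresses are written to hit_addresses. */
size_t cam_query(const cam_word_t *memory, size_t count,
                 cam_predicate_t matches,
                 size_t *hit_addresses, size_t max_hits) {
    size_t hits = 0;
    for (size_t i = 0; i < count; i++) {
        if (matches(&memory[i])) {
            if (hits < max_hits) {
                hit_addresses[hits] = memory[i].address;
            }
            hits++;
        }
    }
    return hits;
}
```

A caller passes one of the predicates to cam_query and receives the matching addresses; an agent could supply its own predicate in the same way, for example one that tests a texture metric.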
Multivariate Features

The visual cortex appears to be organized with very specific processing center regions to support multivariate feature types, following a genetic design. Many types of features seem plausible. As discussed in Chapter 2, the VGM synthetic model provides for a multivariate feature space of VDNA, with one feature space in a local memory for each visual cortex processing center. The synthetic model postulates over 16,000 feature dimensions. Feature processing centers include (1) magno low-resolution segmented regions, (2) parvo segmented regions, (3) parvo micro features, and (4) parvo nano features, projected across four bases: color (C), shape (S), texture (T), and glyphs (G). The VDNA model bases CSTG, shown in Table 3.2, seem to be conceptually validated by neuroscience research as mapping into specific V1–Vn regions. Therefore, the synthetic visual model and particularly the VDNA seem to be plausible visual cortex model starting points. Compare Table 3.2 below with Figure 3.1.

Table 3.2: Comparing VDNA bases with visual cortex processing centers

VDNA Base | Visual Cortex Processing Center
C Color base | V, V
S Shape base | V, V
T Texture base | V (low-level edges)
G Glyph base | PIT, AIT (higher-level patterns)
Primal Shapes, Colors, Textures, and Glyphs

We refer to primal colors, primal textures, primal shapes, and primal glyphs as being genetically encoded and present from birth, embodied in dedicated visual processing centers. We posit this as an obvious extension of the Hubel and Wiesel notion of simple and complex cells, which are examples of primitive, primal, genetically encoded edge shapes. Also, as shown in Table 3.1, the HMAX model posits primal shapes, including oriented edges at different scales, as being genetically encoded in the visual cortex as well (see [1]).

The notion of primal colors implies that separate colors are genetically encoded at birth and biologically implemented in the visual processing centers for RGB color recognition. This implies that various forms of color blindness are genetically encoded as well. People recognize color without being taught how to assemble different colors from independent color primaries: we recognize colors from birth and learn the color labels later. However, as noted by Jameson et al. [72], color sensitivity (i.e. the set of primal colors which can be recognized) varies genetically between individuals and between the sexes; females typically have a much higher number of color-sensitive cones in the retina and so are typically more sensitive to color variations.

Primal regions, or shape features, are found during the magno scanning phase of vision, where the scene is explored to identify familiar objects at a gross segmentation level. The segmentation could be based on a contrasting shape, texture, or color. The segmentation defines a bounding region; for example, a floor, ceiling, walls, furniture, door, and window outlines for a room.
Primal textures are different between males and females, similar to color acuity variations between the sexes, since visual acuity is determined in part by the number of cones; therefore, women may be more able to see fine-grained colors, details, and textures, and men are more able to track fast-moving objects. The number of rods is usually higher in men [72] who are more able to track fast-moving objects as primal shapes or regions. Primal glyphs may be genetically encoded as well, such as geometric primitives like rectangles, circles, and edges. Higher-level shapes such as faces, eyes, and hands are likely also genetically encoded in the DNA. The VGM model allows for agents to model any primal features.
Feature VDNA

The fundamental feature stored in the visual memory is called a visual DNA or VDNA, which persists in a multivariate and multidimensional feature space composed from the original segmented regions of pixels in a range of formats according to eye/LGN anatomy. Formats include raw, sharp, blur, retinex, and histeq. VDNA feature metrics are computed from the feature bases color (C), shape (S), texture (T), and glyphs (G); related VDNA can be associated into strands, and strands can be collected into bundles to represent higher-level concepts (see Figure 3.4). Feature memory is associative and can be searched by agents to locate features with similar traits such as color, texture, shape, or glyphs. The combination of feature metrics used during correspondence is determined by each agent, since not all features are needed for each application. For example, an agent may scan the value of several hundred VDNA feature attributes to compare two VDNA, such as selected color components in many color spaces, looking for the best, average, or worst matches. The agent then processes the comparison values in a customized classifier.
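As a rough illustration of an agent comparing two VDNA over a chosen subset of metrics, the sketch below summarizes the per-metric differences as best, average, and worst values. The names vdna_match_t and compare_vdna are hypothetical and not part of the VGM API, and the sketch assumes each VDNA is a flat array of metrics already normalized to comparable ranges.

```c
#include <stddef.h>

/* Summary of per-metric differences between two VDNA (0.0 = identical). */
typedef struct {
    double best;    /* smallest difference among the selected metrics */
    double average; /* mean difference over the selected metrics */
    double worst;   /* largest difference among the selected metrics */
} vdna_match_t;

/* Compare only the metrics whose indices are listed in `selected`.
   Assumption: a and b hold metrics normalized to comparable ranges. */
vdna_match_t compare_vdna(const double *a, const double *b,
                          const size_t *selected, size_t n_selected) {
    vdna_match_t m = { 0.0, 0.0, 0.0 };
    for (size_t i = 0; i < n_selected; i++) {
        double d = a[selected[i]] - b[selected[i]];
        if (d < 0.0) {
            d = -d; /* absolute difference */
        }
        if (i == 0 || d < m.best) {
            m.best = d;
        }
        if (i == 0 || d > m.worst) {
            m.worst = d;
        }
        m.average += d;
    }
    if (n_selected > 0) {
        m.average /= (double)n_selected;
    }
    return m;
}
```

The three summary values would then be handed to whatever classifier the agent defines; the choice of an absolute difference as the distance is ours, made only to keep the sketch short.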
Figure 3.4: Various features are stored together in memory in an associative fashion for parvo, magno, strand, and bundle features. Strands are also primal shapes segmenting and collecting lists of the underlying magno and parvo tile textures from the RGB and Luma regions. Bundles are high-level concepts composed of the primal shape strands. Copyright © 2016 Springer International Publishing. Used by permission (see [166]).
A set of metrics or VDNA are learned from the genome regions and recorded in memory. Each metric includes autolearning (discussed in Chapter 4, with examples provided in Chapter 11) to establish a plausible hull or threshold value around each of over 16,000 VDNA feature metrics, enabling correspondence to be determined within the hull region. For example, autolearning hulls express variation within texture and color metrics, for matching color shifts or texture range magnitude shifts of similar VDNA. The VDNA memory model becomes more useful when the autolearning-based correspondence of any of the bases CSTG is combined with spatial relationships. Agents combine any combination of feature metrics and spatial relationships, emulating the hypothesis testing which occurs during the round trips through the visual pathway during high-level reasoning. For example, an agent may issue several queries: “How well does the color match?” “How well does the texture match?” “Is the spatial relationship between the genomes in the object consistent with the learned relationship?”
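The idea of correspondence within a hull can be sketched as a tolerance test around each learned metric value. How the hull is actually established is the subject of Chapter 4; here we simply assume, for illustration, that the hull is a symmetric absolute tolerance, and the names learned_metric_t, within_hull, and all_within_hull are hypothetical.

```c
#include <stddef.h>

/* Hypothetical learned metric: a reference value plus an autolearned hull,
   modeled here (as an assumption) as a symmetric absolute tolerance. */
typedef struct {
    double learned_value;
    double hull;
} learned_metric_t;

/* A candidate value corresponds if it falls inside the hull. */
int within_hull(const learned_metric_t *metric, double candidate) {
    double d = candidate - metric->learned_value;
    if (d < 0.0) {
        d = -d;
    }
    return d <= metric->hull;
}

/* Agent-style hypothesis test: every queried metric (for example one color
   metric and one texture metric) must fall within its own hull. */
int all_within_hull(const learned_metric_t *metrics,
                    const double *candidate, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!within_hull(&metrics[i], candidate[i])) {
            return 0;
        }
    }
    return 1;
}
```

An agent could equally well count how many metrics pass, or weight them, rather than requiring all of them; the all-or-nothing rule is only the simplest choice.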
Volume Feature Space, Metrics, Learning

The volume feature space is large, multivariate and multidimensional, in contrast to 1D DNN feature hierarchies of a single feature type: gradient weights. We refer to the method of agents using the multidimensional feature volume as volume learning, discussed in more detail in Chapter 4. However, volume learning also includes sequencing (learning) feature metrics from each genome impression into a volumetric or multidimensional memory structure. Unlike a DNN, the VGM has no strict feature hierarchy from low- through mid- to high-level features. The multidimensional feature space defines multiple branches from each base CSTG. As shown in Figure 3.5, the feature volume branches out across the bases CSTG, derived as metrics from each of the input images which emulate the eye and LGN capabilities as explained in Chapter 2.

However, a hierarchy of feature detail is supported with the VGM within the volumetric model. For example, the quantization space discussed in Chapter 6 provides a pyramid-like level of detail space. Also, as shown in Figure 2.6, there is a hierarchy of scene processing and feature detail in the LGN model, including gross magno features and four levels of parvo feature detail:
– Magno object features (LUMA 5x downsampled)
– Parvo high-level features (RGBI blur full scale)
– Parvo raw features (RGBI raw full scale)
– Parvo micro object features (RGBI sharp full scale)
– Parvo nano object features (RGBI retinex full scale)

In a DNN, the computed feature weights are referred to as the model, which is a black box after it is computed, and it is used for correspondence as a monolithic 1D feature vector. However, the VGM is not a monolithic black box, but rather a multivariate and multidimensional feature space model where each feature is independently available to agents.
Figure 3.5: The feature volume containing over 16,000 features. Note that some of the features are precomputed and stored in visual cortex local and global memory, and other feature metrics are computed on demand via the VGM API.
Visual DNA Compared to Human DNA

Visual DNA is analogous to human DNA. DNA can be combined to represent genes (genetic traits or concepts), and likewise VDNA can be combined into strands and bundles to represent traits and concepts. Visual DNA, or VDNA, is modeled as metrics functions, discussed in subsequent chapters. As shown in Figure 3.6, VDNA bases are associated together into linear strands, similar to human DNA strands. A strand is a list containing VDNA with structural and spatial information. Strands may be (1) interactively created during a training phase or (2) produced automatically by agents designed to look for association between VDNA based on a set of criteria such as spatial relationships or associative relationships between VDNA bases.

We loosely refer to a region of pixels as a visual genome, as expressed in the ~16,000 VDNA feature metrics. The VDNA model does not restrict the shape of the pixel region. Note that a color VDNA metric may be expressed in multiple color spaces, shape may be expressed in 2D or 3D, and texture may be expressed in 2D or 3D. A glyph can be within a 2D or 3D segmented region, a DNN feature weight nxn template representing a higher-level concept, or any other feature descriptor such as SIFT. VDNA types are expected to increase as new feature metrics are discovered.
Figure 3.6: Visualizing the analogy of visual DNA (VDNA) and human DNA. The top image shows a simplified DNA strand of A, T, G, C bases, and the bottom strand shows the VDNA model using CSTG bases. (The human DNA diagram included at the top is in the public domain from the USG NIH.)
Spatial Relationship Processing Centers While neuroscience research is sketchy, the author believes the visual cortex must contain interfeature spatial relationship processing centers to track the spatial distance vectors between features. We expect this should be validated as neuroscience progresses, since spatial and topological reasoning are required parts of vision. Perhaps the V3a and V5 are spatially aware (see Figure 3.1). As shown in Figure 3.7, a VDNA strand includes spatial genome relationship vectors. We discuss the synthetic model for dealing with spatial relationships and topology in the next section as strands and bundles.
Figure 3.7: A VDNA strand with spatial vector relationships. Each VDNA region in the strand is shown with a green circle centroid, connected to white startpoint vectors and black endpoint vectors, defining a local coordinate reference system. Occlusion is supported in the strand model.
Note that the idea of spatial relationships between related feature descriptors is obvious and is likely found in several systems the author is unaware of. However, one noteworthy example is provided by Sivic et al. [32] where SIFT descriptors are composed into a bag-of-descriptors model (similar to a bag-of-words model), with spatial associations between pairs of descriptors in a local region. We discuss spatial relationships more in the following section on strands and bundles.
Strand and Bundle Models

A strand is a list of related genomes; for example, all the genomes forming the image of a squirrel. A strand is analogous to a gene, which is a chain of contiguous DNA. The relationship of genomes within a strand can be learned by an agent according to various learning criteria (see Chapter 4), or else the strand can be composed manually using the strand editor to select related genomes. The strand is modeled as a list of genomes as follows:
– A primary or first genome in the list
– A list of subsequent genomes in any order
– A terminating or last genome in the list
– Spatial relationships between the genomes in the list, encoded in the strand
The strand structure has many advantages, notably many forms of invariance and a local strand coordinate system. Strand comparison metrics are discussed in detail in Chapter 8. We introduce the strand model next.
Strand Feature Topology

The strand incorporates a strand-relative local coordinate system for recording genome relationships as vectors. Therefore, the strand is a topology descriptor (see Chapter 8 for details). The relative coordinate system is based on primary and terminal genome reference points: genome (x,y) centroids in the image. Alternatively, instead of composing strands from a list of genomes, strands can be built using GLYPH base feature descriptors such as SURF (see Chapter 8). The strand topology descriptor is defined as a list of vectors between the primary, terminating, and each intermediate element in the strand. By default, the first descriptor in the strand list is the primary, and the last descriptor is the terminating. Each genome in the strand can be weighted as to importance during classification or declared to be an optional genome in the strand, providing a form of occlusion invariance. The data structures used to contain the strand shown in Figure 3.7 are shown below.

#include <stdbool.h>

// U64, STRAND_NAME_LENGTH, and STRAND_ID_LIST_LENGTH are assumed to be
// defined elsewhere (not shown here).

typedef struct topological_descriptor_element_tag {
    bool this_genome_is_optional_in_the_strand;
    double genome_feature_weight; // default: 1.0 range [0.0 to 1.0]
    double vector_length;
    double vector_angle;
} topological_descriptor_element_t;

typedef struct strand_genome_element_t {
    U64 strand_genome_ID;    // list of genomes
    int strand_genome_index; // each index corresponds to each strand_genome_ID
} strand_genome_element_t;

typedef struct strand_tag {
    bool strand_completed;
    char strand_name[STRAND_NAME_LENGTH];
    U64 file_id;
    int strand_genome_count;
    strand_genome_element_t list[STRAND_ID_LIST_LENGTH];
    int strand_type; // [CENTROID | GLYPH]
    // unit distance between primary and terminating genome centroids in list
    double coordinates_unit_length;
    // unit angle between primary and terminating genome centroids in list
    double coordinates_unit_angle;
    // topological descriptions
    topological_descriptor_element_t primary_genome[STRAND_ID_LIST_LENGTH];
    topological_descriptor_element_t terminating_genome[STRAND_ID_LIST_LENGTH];
} strand_t;
To define the strand-relative local coordinate system, the distance between the primary and terminating genome centroids is taken to be the unit length (1.0) for the vectors describing the genome topology descriptor (strand_t), and the angle between the primary and terminating genomes is taken to be 0 degrees (unit angle 0) in a Cartesian plane. The unit length and unit angle are used to normalize all the vectors in the strand-relative coordinate space. Thus, the topological descriptor for the strand provides strand-relative scale, rotation, and occlusion invariance for genomes within the strand. All other genomes in the strand are represented as vectors relative to the unit length and unit angle defined by the primary and terminating genomes. The unit length and unit angle provide for a relative vector space and coordinate system within the strand. All genome relationships within the strand for angles and distances are normalized according to unit length/angle and recorded in the strand. For redundancy, the same strand can be built twice, using different primary and terminating genome points, in case of occlusion where a primary or terminating genome is not found. Strand metrics are provided to compare strands using a variety of distance functions and criteria and are discussed in detail in Chapter 8. We provide an example next to illustrate the concepts.
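The normalization just described can be sketched as a division by the unit length and a subtraction of the unit angle. The type strand_vector_t and the function to_strand_relative are hypothetical names used only for this illustration, and the sketch assumes the image-space length and angle of each vector have already been measured. The unit angles printed in the strand learning example that follows appear consistent with such a subtraction: each Cartesian angle differs from its unit angle by a fixed offset of roughly 255.6 degrees.

```c
/* Hypothetical vector from a reference genome centroid to another genome
   centroid: length in pixels and angle in degrees. */
typedef struct {
    double length;
    double angle;
} strand_vector_t;

/* Convert an image-space vector into strand-relative coordinates.
   unit_length: image-space distance between the primary and terminating
                centroids (must be greater than 0).
   unit_angle:  image-space angle between the primary and terminating
                centroids, in degrees.
   The resulting angle is not wrapped to a fixed range. */
strand_vector_t to_strand_relative(strand_vector_t v,
                                   double unit_length, double unit_angle) {
    strand_vector_t relative;
    relative.length = v.length / unit_length; /* primary-to-terminating = 1.0 */
    relative.angle = v.angle - unit_angle;    /* primary-to-terminating = 0.0 */
    return relative;
}
```

Because both quantities are expressed relative to the primary-to-terminating vector, a strand that is uniformly scaled and rotated in the image yields the same relative vectors, which is the source of the scale and rotation invariance claimed above.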
Strand Learning Example Figure 3.8 shows a rendering of a “PalmTrunkStrand.strand” file created during a manual training session. (NOTE: Similar strands can be created automatically by an agent.) Each genome in the strand is colored blue, and the centroid of each genome is shown as a yellow circle with the strand list sequence number label. However, the primary genome (first genome in the list) is colored with a green genome centroid and labeled as a green P, and the last genome in the list is the terminating genome, labeled with a red T and a red centroid circle. The vectors between each genome and the primary genome are drawn in green, and the vectors between each genome and the terminating genome are drawn in red.
Figure 3.8: A strand of genomes clustered at the top of a palm tree.
Below is a text description produced by the vgv tool for the genomes in the strand from Figure 3.8, showing position, sequence in list, coordinates of centroid, and angles to primary and terminating genomes. The text description shows (1) the Euclidean length of each genome from primary to terminating, (2) the Cartesian angle of each genome between primary and terminating, and (3) the strand local coordinate system angle of each genome between primary and terminating. See Chapter 8 for details on strands as shape descriptors.

Enter a descriptive name for this strand : PalmTrunkTop
compute_strand_descriptor(PalmTrunkTop :: file_id 0) - list size 11
*** Highlighting strand centroids and lines
*** 929 :: P,T Vector Length 262.7 329.1 Angle: 355.9 23.8 (Vector Unit Angles) 100.2 -231.8
*** 857 :: P,T Vector Length 65.2 150.1 Angle: 327.5 51.2 (Vector Unit Angles) 71.9 -204.4
*** 885 :: P,T Vector Length 148.7 210.8 Angle: 342.8 30.8 (Vector Unit Angles) 87.2 -224.8
*** 916 :: P,T Vector Length 221.9 271.9 Angle: 345.7 20.9 (Vector Unit Angles) 90.0 -234.7
*** 820 :: P,T Vector Length 38.1 133.1 Angle: 209.9 87.4 (Vector Unit Angles) -45.7 -168.2
*** 799 :: P,T Vector Length 112.3 104.1 Angle: 214.1 121.2 (Vector Unit Angles) -41.5 -134.4
*** 921 :: P,T Vector Length 261.3 284.6 Angle: 337.3 10.3 (Vector Unit Angles) 81.7 -245.3
*** 867 :: P,T Vector Length 132.8 127.6 Angle: 307.0 21.1 (Vector Unit Angles) 51.4 -234.5
*** 866 :: P,T Vector Length 133.2 125.0 Angle: 305.8 20.6 (Vector Unit Angles) 50.2 -235.0
*** 838 :: P,T Vector Length 71.3 87.5 Angle: 265.2 67.8 (Vector Unit Angles) 9.6 -187.8
The topological strand descriptor therefore provides some invariance properties that are useful during classification to maintain accuracy under variation (discussed more in Chapter 4). Some invariance properties of strands include the following:
– Scale invariance: A scale factor can be determined from the distance between the primary and terminating genomes in the strand and then applied to the relative vector lengths between each remaining genome and the primary/terminating genomes.
– Rotational invariance: Relative angles between each genome and primary/terminating genomes can be determined and passed/failed based on some criteria set up in the agent.
– Occlusion invariance: Missing genomes are allowed.
– Inclusion invariance: Each genome is weighted, so inclusion in the strand can be required or optional, depending on the weight.
Strands may be devised and used together or independently by agents. For example, strands may be devised to represent learnings, as discussed in Chapter 4, such as:
– Set invariance and presence: Search for the presence of any set (one or more) of strand features; for example, just look for the squirrel head, or the head and the tail.
– Set topology: Verify the vector relationships among sets of genomes in the strand.
– Set priority: Genomes in the strand can be weighted to prioritize importance and scoring. For example, agents may enforce that specific genomes be found or a minimum number of genomes be found to assist during classification.

As another example, a learning agent may create a strand associating VDNA together which have a similar color, shape, or texture. Another agent may be trained to associate a set of strands together as a bundle to represent a complete object during an interactive training session. Another agent may simply explore the genome database recorded in visual memory and look for new associations based on a particular goal, such as connected regions of a similar color or other metric, similar to a high-level segmentation algorithm.
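The set presence and set priority ideas can be sketched as a weighted score over the genomes of a strand, where optional genomes may be missing and required genomes may not. This is one possible scoring rule chosen for illustration, not the strand metrics of Chapter 8; the names strand_member_t and score_strand are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-genome record used while matching a strand. */
typedef struct {
    int optional;   /* nonzero: the genome may be missing (occlusion) */
    double weight;  /* importance in the range [0.0, 1.0] */
    int found;      /* nonzero: a corresponding genome was located */
} strand_member_t;

/* Returns the weighted fraction of strand members that were found, in
   [0.0, 1.0]. Returns -1.0 when the strand is rejected: a required member
   is missing, fewer than min_found members were found, or the total
   weight is zero. */
double score_strand(const strand_member_t *members, size_t n,
                    size_t min_found) {
    double total_weight = 0.0;
    double matched_weight = 0.0;
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        total_weight += members[i].weight;
        if (members[i].found) {
            matched_weight += members[i].weight;
            found++;
        } else if (!members[i].optional) {
            return -1.0; /* a required genome is missing */
        }
    }
    if (found < min_found || total_weight <= 0.0) {
        return -1.0;
    }
    return matched_weight / total_weight;
}
```

An agent looking only for the squirrel head, say, would mark the head genomes as required and the remaining genomes as optional, then threshold the returned score.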
Bundles A bundle is a set of strands. Bundle organization and applications are devised by agents. For example, bundles may represent higher-level objects, or simply collections of related strands. Bundles are modeled within global visual cortex memory (see Figure 3.3).
Visual Genome Sequencing
We view VDNA sequencing as the process of encoding an image into a set of VDNA metrics for each genome or pixel region in an image. VDNA are sequenced one 2D genome pixel region at a time into the VGM. VDNA sequencing for the visual genome project is analogous to the Human Genome Project, where a human genome is inspected along a DNA chain (i.e. sequenced) one base pair at a time into a set of files containing all the DNA base pairs. Researchers subsequently identify related segments (i.e. 1D segmentations) of DNA along the chain as genomes when a relationship between the genomes can be established, such as a genetic trait like hair color. The hard part in human DNA analysis is grouping DNA into genetically related segments, and likewise, the hard part in the VGM model is to group 2D pixel regions into related segmentations.
To sequence the visual genomes into VDNA and build up a useful set, billions of images can be presented to the system. Then the unique genomes are identified and stored. Genomes from incoming visual impressions are first compared against the existing learned visual genomes prior to committing to memory: if there is a match between new impressions and stored impressions, then the genome need not be stored again and can instead be referred to by reference (similar to a pointer or ID number). This is the central idea behind visual genome sequencing: to catalog visual genomes by metrics. However, this goal will only be realized slowly over time, as billions of visual genomes are sequenced and recorded. Global learning takes place by first choosing sets of genomes to search for, and then agents identify all the images containing the chosen genomes. Once genomes are stored in memory, they are considered to be learned and are given a unique ID.
Segmentation is the common problem between human DNA analysis and visual DNA analysis. For human DNA analysis, the segmentation is along a 1D vector to isolate genomes of related DNA. For visual genome analysis, the segmentation is a 2D region of pixels, translated into multiple image and color spaces. Once the segmentations are known, the door is open toward more fruitful testing and analysis. If the proposed segmentations are wrong, the analysis likewise goes in the wrong direction.
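The compare-before-commit step described above can be sketched as follows. This is a rough illustration, not the VGM implementation: the class `GenomeCatalog`, the flat tuple of metrics, and the per-metric tolerance test are all assumptions made for the example.

```python
class GenomeCatalog:
    """Hypothetical catalog that stores each unique genome once and
    hands out IDs, so repeated impressions are kept by reference."""

    def __init__(self, tolerance=0.05):
        self.tolerance = tolerance   # assumed per-metric match tolerance
        self._genomes = {}           # genome ID -> tuple of metrics
        self._next_id = 0

    def _matches(self, a, b):
        # Two genomes match if every metric agrees within the tolerance.
        return len(a) == len(b) and all(
            abs(x - y) <= self.tolerance for x, y in zip(a, b)
        )

    def commit(self, metrics):
        """Return (genome_id, is_new) for an incoming genome."""
        metrics = tuple(metrics)
        for genome_id, stored in self._genomes.items():
            if self._matches(metrics, stored):
                return genome_id, False      # already learned: reuse ID
        genome_id = self._next_id
        self._next_id += 1
        self._genomes[genome_id] = metrics   # new genome: store and label
        return genome_id, True

    def __len__(self):
        return len(self._genomes)
```

The linear scan is chosen only for clarity. A catalog holding billions of genomes would need an index or hashing scheme, which this sketch does not attempt.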
Since the VDNA feature model and VDNA sequencing are analogous to human DNA and genome sequencing, we provide some discussion here on the rationale and vocabulary used in the synthetic model.
– VDNA: Independent feature metrics computed on each genome.
– Genome: A region of pixels. The pixels may be a segmented region as discussed in the LGN section of Chapter 2 or a primal feature representing genetic memory features, which may be represented by transfer learning.
– VDNA bases: As human DNA contains four bases (A, C, G, and T), we serendipitously define four bases for VDNA: color C, shape S, texture T, and glyphs G (CSTG). Details on each VDNA base are provided in Chapters 6–10.
– Visual genome format: The collection of all metrics from pixel region genomes, stored in a common file and data structure format, enabling the Visual Genome Project and API. See especially Chapters 5–10.
Visual Genome Format and Encodings
We imagine that the visual genome format is the container used when an image is sequenced into genomes, and the genomes are further sequenced into VDNA metrics in a genome_compare structure and stored in files. Several encodings are made for each genome (as covered in detail in Chapter 5) and stored in separate structures and files including (1) the genome pixel region, (2) a pixel region mask, (3) basic metrics in the global_metrics_t structure, (4) the autolearning structure, and (5) comparison metrics created on demand by agents. Any portion of the database can be encoded at various levels of detail; for example, only selected metrics can be kept to reduce the data size (see Chapter 5).
Summary
This chapter discusses the synthetic memory model, which includes a visual memory tightly coupled with visual processing centers (i.e. a smart memory) that implement feature metric functions within a synthetic visual cortex. The visual memory is a permanent content-addressable memory (CAM): associative, photographic, and full resolution. Nothing is compressed or thrown away. VDNA sequencing is the process of encoding an image into the VGM genome region format and creating base feature metrics. The impressions in visual memory do not need to be rescaled, which would incur a loss of fine detail for VDNA analysis (i.e. there is no need to reduce a 4000x3000 12MP image to 300x300 pixels to fit into a DNN pipeline and throw away most of the pixel information). Visual memory can be viewed or projected into any combination of over 16,000 metric spaces derived from the CSTG bases, for consumption by agents in learning and reasoning tasks.
Chapter 4 Learning and Reasoning Agents Will Turner: Oh, so that's the reason for all the. . . . Mr. Gibbs: Reason's got nothin’ to do with it.
—Pirates of the Caribbean
Overview In this chapter, we present the architecture and rationale of the synthetic learning and reasoning agents in detail. The model provides a plausible emulation of human learning and reasoning mechanisms of the PFC executive region of the brain as thought controls using a set of agents, each specialized for a specific domain of intelligence, as shown in Figure 4.1.
Figure 4.1: The high-level consciousness model in the PFC thought control center, containing agents which control motors and vision.
DOI 10.1515/9781501505966-004
The synthetic agent model follows what we know about human learning, allowing independent specialized tasks to be learned one at a time. It is more flexible than DNNs and other computer vision systems which reason using simple classifier stages employing support vector machines (SVMs) and statistical functions (see [1]) to make an inference for correspondence. The major parts of the agent model are concerned with PFC executive thought controls and vision controls in the simple model in Figure 4.1:
– PFC (thought controls): The prefrontal cortex executive region of the brain performs high-level controls, which are not necessarily purely biological, but appear to be partially at a higher level of consciousness. However, the agents model all PFC activities, as a proxy for consciousness and real intelligence.
– MC (motor controls): The body motors (muscles, etc.) are controlled by the agents, and by analogy a synthetic model could use robotics or prosthetics.
– VC (vision controls): This covers vision-related biological machinery, including the visual cortex, LGN, and eye models.
Before reviewing the agent model details, we set the context by surveying learning and reasoning background topics in machine learning and artificial intelligence literature. We look at the following general areas:
– Learning models: Includes extracting features from the training set images and images under test, and various ways of representing the features to meet the learning objective, which may be specific, such as learning a particular face, or general, such as learning the average features of a category (such as dogs).
– Training protocols: Includes collecting images for a topic, creating training and test sets, and devising a training protocol for sending the images to the learning system, which may include image pre-processing, randomized training image presentation order, and related topics (see [1]).
– Reasoning and inference: Includes analyzing features from training images or test images and performing an automated process to classify the features as (1) a specific object (i.e. a specific person’s face) or (2) a category object (some sort of face, but no one in particular).
Machine Learning and AI Background Survey Perhaps the best reference to artificial intelligence and related topics is Schmidhuber [172], which we cannot hope to duplicate. However, our goals in this section are much more modest. In order to present the synthetic learning model, we first proceed through a basic comparative survey of machine learning and AI topics with the following goals:
– Set the context: Survey relevant machine vision methods to set the context for presenting details of the synthetic vision learning and reasoning model.
– Compare and contrast methods: Present the synthetic model by comparison and contrast in the context of current machine learning methods.
Since the synthetic model terminology and goals differ in some respects from conventional machine learning and AI models, we look at conventional learning models and terminology as we proceed through the survey and then introduce new terminology for the synthetic model in context. Our working definition of visual machine learning is: Visual machine learning systems take input images, then extract and organize features into models suitable for correspondence.
What is machine learning? What is artificial intelligence? What is visual learning, and how is it different from other forms of learning? Since computer vision and neuroscience are merging in many universities, interdisciplinary research across neuroscience, machine learning, computer vision, psychology, imaging, and ophthalmology is valuable to review to learn how researchers are answering questions about visual learning, so we consult a range of sources here (see also [42]). Our survey is divided between learning models, training protocols, and reasoning and inference models.
Learning Models We review common learning models here to create a context within which to compare the VGM synthetic learning and reasoning model. Learning models are designed for a range of goals. For example, some learning models are designed for category recognition (Is this a dog? Is this an airplane?). Other models are designed for specific object recognition (Whose face is this? What type of airplane is this?). The following survey and discussion covers a range of learning models, since the VGM supports a range of models. Bengio [69] outlines a variety of research topics pointing out the importance of a good model and data representation; he refers to this under the topic of representational learning, identifying several attributes of good representations in the context of neural networks. Bengio points out that the model and data representation can help or hinder the learning. In fact, the model can force a specific type of learning in order to process the data and find correspondence. For this reason, the VGM model is careful to divide the learning model into separate parts—(1) VDNA and (2) agents—to unbind learning from any specific representation.
The learning model, or representation of features and their organization, heavily influences the effectiveness of the learning system. Typical visual learning models provide the following attributes:
– Extract features from image data: DNNs typically start with 3x3 or nxn regions of pixels as the feature and progressively transform the features via convolution coupled with other functions into higher levels of abstraction as 3x3 or nxn features. Feature descriptor methods, such as SIFT, FREAK, and ORB [1], build more complex local structure. This step is also called feature learning and includes some training protocol, covered in the “Training Protocols” section.
– Organize features into a model: DNNs organize features into a hierarchy of unordered features, and local feature descriptors such as SIFT may organize individual features into lists or other structures of related features. Many feature organization methods are commonly used (see Table 4.1). The goal is to model features to represent concepts, such as a hierarchy or list of related features, to allow classification of unknown features against the model.
– Classify and match features: This is where the intelligence appears. Many mathematical and statistical methods are used to compare features (see Table 4.1 below). The goal is to find similarities and differences for feature recognition, using a combination of mathematical functions and programmed heuristic logic.
The end result of the various approaches is a learning model that can be trained to hold desirable features, which enables recognition of unknown features via classification within the model. Practitioners may refer to this as machine learning or artificial intelligence. Often, a single classifier, such as an SVM, is used to make the intelligent classification. These are primitive models.
The VGM learning model is intended to be broad, employing a common VDNA data format with over 16,000 metrics in a multivariate and multidimensional representation, using agents to model correspondence with no constraints to follow a particular learning model. In the next few sections, we look at several topics pertaining to learning models found in the literature (see [1]) including: – Priors, transfer learning, and refinement learning – Mathematical models and statistical and SVM learning – Feature descriptor learning – Supervised and unsupervised learning – Reinforcement learning
Priors, Transfer Learning, and Refinement Learning We refer to learning based on priors as refinement learning, since the goal is to refine the prior assumptions into a more accurate representation. In statistics, a prior is a bias or initial assumption, such as a probability distribution, or what is believed to be true. Some computer vision and learning models are based on priors and default values. Bengio [69] makes the case that incorporating priors into machine learning models is a next step in future research toward model improvements to disentangle meaning from the feature data. Some aspects of human learning appear to be based on priors. Some statistical models and neural network models are based on the use of priors as well. We discuss a few types of learning here that are based on priors used for specializing and refining a prior model into a new model. In DNNs, weight priors are often modeled via transfer learning, where a pretrained DNN model is used as the starting point for retraining the model for a specific instance; for example, retraining a generic face model into a new model for a specific person’s face. DNNs can also be constructed as deep belief networks, which learn to reconstruct their inputs in a probabilistic manner (see Hinton et al. [64]). In biology and genetics, we find genetically encoded intelligence which is passed via DNA and then constructed in the cortex—pre-wired prior intelligence. As discussed briefly in Chapter 1 and elsewhere [43], some learnings are pre-wired and genetically encoded into dedicated learning centers in the brain; such images and visual processing capabilities are not learned personally, but passed down in the DNA. Life experiences (i.e. learnings) can in fact alter DNA which is passed on to descendants (see the research at Cold Spring Harbor Laboratory’s DNA Learning Center). 
Experience can also alter DNA expression; in other words, the DNA can express a wide range of traits, and experience can bias the DNA to express certain traits. Perhaps genetic images and features, such as primal features or even larger images, are genetically encoded into the visual cortex. So, there is a neurological basis for emulating priors and refinement in a synthetic learning and reasoning model. The VGM model supports refinement learning via agents and the VGM VDNA, as well as many types of priors in the feature metrics.
Mathematical and Statistical Learning and SVM
Researchers often use standard engineering tools to model learning problems. For example, feature clustering may be used to create groups of related features, or clustering may be used to eliminate redundant features which are very close in value. Features may be projected into new metric spaces (i.e. similar to Fourier projections of spatial features into the frequency domain) to disentangle similar features to enable easier correspondence. Various distance and similarity metrics are used for comparing feature metrics; we discuss several distance metrics in Chapters 5–10. Many computer vision and machine learning models employ heuristic and rule-based expert system models written up in software (see [1]). In Table 4.1, we collect and summarize some of the tools used in machine learning, in an intentionally compact list since these topics are covered very well in the literature and references provided.
Table 4.1: Illustrating various tools and methods applied to machine learning after [1]
Style | Details (for references, see Table . in the book [])
Database searching, sorting | Features are collected into a database for searching and sorting; a very primitive way to compare features
Bayesian probability learning | Naive Bayesian, randomized trees, FERNS
Semantic probabilistic learning | Probabilistic latent semantic analysis (pLSA), latent Dirichlet allocation (LDA), hidden Markov models (HMM)
Kernel machines, kernel methods, SVM | Find feature relationships by projecting features into alternative metric spaces, kernel machines, various types of kernels, PCA, SVM
Neural network methods | Models linear function learning, and sequence learning
Single feature distance metrics | SSD, SAD, cosine, EMD
Clustering features into groups based on local distance | K-nearest neighbor: clusters features based on distance using some metric, such as size, edge strength, etc.
Feature group outlier removal | RANSAC, PROSAC, Levenberg-Marquardt
Cluster polygon shapes based on centroids | K-means, Voronoi tessellation, Delaunay triangulation, hierarchical k-means, Nister trees
Connect clusters together | Hierarchical clustering
Combination of features from a feature group | Gaussian mixture models
Heuristic programming, expert systems | Learning rules and heuristics are written into SW/HW (see [])
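Three of the single feature distance metrics listed in Table 4.1 (SSD, SAD, and cosine) have standard definitions and can be written directly. The sketch below assumes features are plain equal-length lists of numbers; the function names are chosen for this example only.

```python
import math


def _check(a, b):
    # Feature vectors must have the same, nonzero length to be compared.
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("feature vectors must be non-empty and equal length")


def ssd(a, b):
    """Sum of squared differences: smaller means more similar."""
    _check(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sad(a, b):
    """Sum of absolute differences: smaller means more similar."""
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    _check(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

SSD penalizes large differences more heavily than SAD, while cosine similarity ignores overall magnitude and compares only direction, so the choice depends on which kind of variation the feature should tolerate.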
Feature Learning, Feature Descriptors In the literature, feature learning is usually reserved to describe how a DNN learns and builds up a set of simple feature descriptors as weights in the shape of rectangular templates from training data. However, many other feature learning approaches have been taken (see [1] for a complete survey and discussion). Once features are learned, they are associated together into a model to represent a higher-level object—perhaps modeled as a list of related features. Feature learning is considered to be a form of
machine learning in some circles and produces a learned basis set of features to represent higher-level objects. Some of the better local feature descriptors are based on biology and neuroscience—such as SIFT and FREAK—and are actually learned and trained from the training data. Local feature descriptors have proven to be very effective for specific applications. See [1] for a discussion and the OpenCV open source library for working code. In summary, feature learning creates a basis set of features for a given object from training images, and the features are associated together via some sort of structure, such as a weight vector, feature list or feature tree. By this definition, the VGM supports a most complex, varied, and detailed feature learning system. Supervised and Unsupervised Learning The terms supervised, semisupervised, and unsupervised learning have been in use perhaps since the 1960s to describe machine learning models and training, and these terms no longer reflect very much useful information, as the state of the art has advanced. However, these terms are still in widespread use among practitioners, so we provide a typical example of use of terms as context. – Supervised learning example: A known labeled input image of a cat is decomposed into a cat model, which can be a feature set model or a functional model. By comparing the cat model to a pretrained “Master Cat model,” an inference is made to the trained model, such as the cat model is 55% similar to the “Master Cat model.” – Unsupervised learning example: An unknown, unlabeled input image is trained into a new model, and since the new model label is unknown, random inferences to random models can be made after training. – Semisupervised learning example: Arbitrary proportions of labeled and unlabeled images are used as training samples, which is intended to produce a more accurate model. 
Basically, this terminology centers on whether or not the contents of an input image are known or unknown, and whether or not inference is performed during training. The VGM model is not particularly supervised or unsupervised. Volume learning supports both supervised and unsupervised learning. For example, any images can be presented in an unsupervised fashion, and the genomes will be learned regardless. In the VGM, labeling comes later under the guidance of a proxy agent, and genomes can be compared within the volume multivariate space. The image label and genome label are of no importance initially to the VGM. Reinforcement Learning Survey Reinforcement learning is a term used to describe a series of methods where one DNN is trained by trying to copy or improve upon another DNN with little to no human
intervention; in other words, DNNs learning from DNNs. We will survey a few methods and some of the history in this section.
NOTE: The VGM use of the term reinforcement learning is not in the context of DNNs, but rather in the context of volume learning. So the mechanics are different but some of the goals are the same. In the VGM, learning methods are often combined and may even overlap, as discussed in this chapter.
Reinforcement learning injects the notion of reward and punishment into the training process, similar to dog training, where the dog receives a doggie biscuit when he performs the action correctly. For example, the trainer says “Spot, roll over,” and either: (1) Spot receives the doggie biscuit as a reward for rolling over, or else (2) Spot receives no reward (or worse, an insult from the dog trainer) if Spot fails to roll over.
Reinforcement learning is one of the more interesting and popular topics in DNN research today. Much current research on reinforcement learning is within the context of DNNs, which informs this discussion. In the form surveyed here, reinforcement learning assumes two DNNs: a learner (student) DNN and a teacher DNN. Additional reinforcement feedback control logic is also needed between the teacher and student; for example, model visibility between DNNs and the ability of the student DNN to generate test images for submission to the teacher DNN. The student is required to iterate over time to improve the model, based on test images and feedback controls. The feedback controls can be automated between DNNs without human intervention: DNNs learning from DNNs, and DNNs teaching DNNs.
One influential concept is generative adversarial networks (GANs) developed by Goodfellow [10], where a student DNN generates images intended for a target DNN, with the goal of spoofing the target DNN model and eventually learning how to mimic it.
Goodfellow’s GAN method is probably one of the best known methods for reinforcement learning. Goodfellow’s research points toward a future where DNN models are routinely improved by other DNNs. See also the section “Overcoming DNN Spoofing with VGM” in Chapter 1, which surveys DNN spoofing related learning research, including GANs. Bucilla [65] developed a method similar to reinforcement learning. With this method, a DNN ensemble is first trained to label a training set, and then labeled training set output labels and images are used as input to train a single DNN to emulate the ensemble. Hinton et al. [66] developed a generalized method based on Bucilla’s model which included an objective function to tune the classifier. Romero et al. [67] develop a method using “hints” to guide the teacher/student training process. Romero’s work is based in part on Bengio’s earlier work on curriculum learning [68], which uses a progressive refinement training protocol by retraining models in series— allowing training to proceed via generalizations from simple to complex objects.
Training Protocols
Part of machine learning is developing a training model, or protocol, to build up a feature and higher-level object model. We consider a training protocol to be all steps in the training process, such as image selection, pre-processing, tuning, and training parameters. See [1] for more on training protocols. Common components of a training protocol include:
– Training image set: Choosing the right images can be critical. For DNNs, initial model training usually requires collecting tens to hundreds of thousands of labeled images, which can be a daunting task. The VGM model is trained from a set of unique image impressions producing a set of VDNA and visual genomes. Agents may further associate image impressions together based on some criteria, such as labels (“it’s a dog”) or strands of similar VDNA features.
– Training image pre-processing: For DNNs, the training set is often modified using some arbitrary combination of scaling, clipping, rotation, sharpening, and other image processing functions. In the VGM model, we present images for training that emulate the eye/LGN model as explained in Chapter 2.
– Training image presentation order: During training, images can be presented to the system in several ways. For example, a DNN is trained by making several iterations over the training set, which may be presented (1) in chosen batches of images, (2) in a random order, or (3) in the same order for each training iteration. Methods such as drop-out [1] can also be applied to the training set to omit different images in each training iteration. The VGM model trains using parallel trailing images which emulate the eye/LGN model (see Chapter 2).
– Training parameters: Depending on the vision system, widely different training parameters are needed. Training parameters may change as the model is built, based on feedback from training results.
We refer the interested reader to [1] for more bibliographic references to learn about specific training protocols and parameters. To illustrate the concept, consider a DNN training over several million iterations to converge feature weights at local minima. A momentum parameter may be used to control the size of weight adjustments, similar to a linear filter. The momentum parameter slows down the rate of model weight changes, to smooth out transient values and prevent wild oscillations (see [1]). The momentum parameter can be generated from a history buffer of weight deltas to implement a 1D smoothing filter to prevent oscillations (overshoot and undershoot) to control the speed of convergence toward desirable local minima. In the VGM, agents may implement complex interactive training protocols, similar to the way an apprentice learns from a master. The protocol includes interaction. For example, an intelligent learning agent may make an initial analysis of image genomes
as directed by a master trainer (human or another agent), producing summary analysis in the form of canonical strands of similar genomes across the CSTG bases. Then, the master trainer may direct the intelligent learning agent to autogenerate code to create a classifier over specific CSTG bases, using specific metric functions, and set up API call parameters to get the best result. The master/apprentice training process may include several iterations and testing.
In summary, the VGM model allows for a wide range of training protocols once the initial genome sequencing is done and the VDNA are stored in visual memory. The VGM training model differs from other training protocols that may be more ad hoc and heuristically tuned for best results. Many training protocols (such as DNN training protocols) support a one-shot learning model with little integration between the learning system and a human trainer, besides approving the initial training set and perhaps adjusting training parameters during training. VGM also supports continuous learning once the VDNA are in memory, which the author believes to be unique to the VGM, so that agents can go back and re-explore memory to mine new results, or refine prior results.
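Returning to the momentum parameter described earlier in this section, the smoothing effect can be illustrated with the classical momentum update, in which a running velocity accumulates past weight deltas. This is a minimal sketch on a one-dimensional quadratic, not a DNN training loop; the function name, learning rate, momentum value, and step count are assumptions chosen for the example.

```python
def sgd_momentum(grad_fn, w0, lr=0.1, momentum=0.9, steps=300):
    """Minimize a 1D function given its gradient, using classical momentum.

    The velocity v is an exponentially decaying sum of past weight
    deltas, so each step is a smoothed version of the raw gradient step.
    """
    w, v = w0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad_fn(w)   # blend history with new delta
        w += v                               # apply the smoothed delta
    return w


def grad(w):
    # Gradient of f(w) = (w - 3)^2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)


w_final = sgd_momentum(grad, w0=0.0)
```

Setting `momentum=0.0` reduces the update to plain gradient descent, which makes it easy to compare the two behaviors on the same function.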
Reasoning and Inference
Common machine learning vision systems use a classifier, such as a softmax or SVM, to compare unknown image features to learned image features, yielding an inference or prediction. There are many types of classifiers designed according to the data representation and the classification goals, so there is no standard classifier and no guidance to follow. However, see Table 4.1 earlier in this chapter for a list of common mathematical and statistical tools used to create classifiers for machine vision. The end result of classification is an inference or prediction; for example, “the unknown set of features is 80% similar to the known features of a teapot.” In the VGM, agents may implement any combination of simple classifiers and heuristic classifiers and create hierarchies and structures of classifiers, which is believed to be unique to the VGM. Agents can also leverage the visual genomes stored in strands, as well as the VDNA in visual memory, to take advantage of prior learnings.
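As a small illustration of how a softmax classifier turns raw class scores into a prediction, consider the sketch below. The scores and labels are made up for the example, and `classify` is a hypothetical helper, not part of any particular library.

```python
import math


def softmax(scores):
    """Convert raw scores into values that are positive and sum to 1."""
    m = max(scores)                        # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return the most likely label and its softmax value."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Assumed raw scores from some upstream feature comparison.
label, confidence = classify([2.0, 0.5, 0.1], ["teapot", "cup", "bowl"])
```

The returned value is a normalized score over the candidate labels, which is usually what is reported as the confidence of the prediction; it is not a direct measure of feature similarity.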
Synthetic Learning and Reasoning Model Overview The synthetic model takes a departure from standard learning models. Instead, the synthetic model emulates the PFC executive (the center of personality) to allow for many high-level visual learning and reasoning agents to learn in various styles. There is no limit to the number of agents or the tasks which agents perform. Agents may be nested in hierarchies or ensembles. We introduce a wide range of learning topics supported within the VGM in this section.
There is no intent to limit the VGM learning styles. In addition to learning, agents may perform reasoning, similar to the way the PFC executive directs hypothesis testing to locate objects and make decisions. In Chapter 11 we provide examples of agents employing hypothesis testing using correspondence signature vectors (CSVs), explained later in this chapter, to aggregate groups of metric comparisons for classification. A set of default CSV agents is available in the API to create CSVs using adaptive, parameterized API calls (discussed in Chapter 5). Standard classifier systems using weight thresholding cannot provide much reasoning. Agents are able to perform almost any high-level reasoning task.
The synthetic vision model does not use a single classifier; instead, each agent implements one or more classifiers to model a specific task for the PFC executive. The PFC executive originates and directs visual hypothesis parameter settings (round trip tests through the visual pathway) and controls test criteria. The PFC analyzes thoughts, resolves conflicts, poses questions and looks for answers, predicts outcomes (i.e. adds bias into the parameters), holds expectations, makes decisions, and is at the center of conscious thought. Agents are a proxy model for the PFC.
Hull learning is a novel method introduced in this work to define a set of optimal classifiers for each metric (see the “Autolearning Hull Threshold Learning” section later in this chapter). Hulls are a biologically plausible variation in each feature metric. Autolearning, discussed below, computes and stores hull variations for each metric in all supported metric spaces, such as the image spaces (raw, sharp, retinex, histeq, blur), color spaces, and texture spaces. Hull learning tries to learn an optimal classifier for each metric; then the hulls are further tuned via a variant of reinforcement learning, and can be associated together to build stronger classifiers.
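The hull idea is defined in detail in the “Autolearning Hull Threshold Learning” section. As a rough illustration only, the sketch below treats a hull as the range of values one metric takes across the image spaces named above, and accepts a candidate value that falls inside that range. The range-based definition, the `margin` parameter, and the function names are assumptions made for this example and should not be read as the VGM algorithm.

```python
def learn_hull(values_by_space):
    """Return (low, high) for one metric measured in several image spaces."""
    values = list(values_by_space.values())
    if not values:
        raise ValueError("at least one metric value is required")
    return min(values), max(values)


def within_hull(value, hull, margin=0.0):
    """True if a candidate metric value lies inside the (widened) hull."""
    low, high = hull
    return low - margin <= value <= high + margin


def hull_votes(candidate, hulls, margin=0.0):
    """Fraction of metrics whose candidate value falls inside its hull."""
    hits = sum(
        within_hull(candidate[name], hull, margin)
        for name, hull in hulls.items()
    )
    return hits / len(hulls)


# Assumed values of a single metric for one genome in five image spaces.
hull = learn_hull(
    {"raw": 0.40, "sharp": 0.46, "retinex": 0.43, "histeq": 0.41, "blur": 0.38}
)
```

In this toy form each hull acts as a simple per-metric threshold classifier, and `hull_votes` shows one way several of them could be combined into a stronger decision.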
The synthetic model enables the grand goal of the VGM: to assemble a catalog of all known VDNA—a huge collection of strands of associative visual memory connecting all related VDNA in a multidimensional metric space mesh of strands and bundles acting as associative memory—and a collection of application-specific learning agents available for common use. Such a VDNA catalog of multidimensional metrics, associations, and agents is enabled by the visual genome project (see Figure 1.4). In summary, the VGM learning and reasoning model supports what is known about the PFC executive, is based on continuous learning using an associative memory, and allows for unlimited learning and reasoning agents to operate independently—in groups, in serial, and in parallel.
Conscious Proxy Agents in the PFC We model consciousness in the prefrontal cortex (PFC) in the synthetic model as a set of agents. The PFC generally acts as the executive control center of the brain as shown in Figure 4.1. The premotor cortex (PMC) and motor cortex (MC) are controlled by the PFC. The vision controls (VC) of the visual cortex are under the control of the PFC.
Chapter 4: Learning and Reasoning Agents Agents model intelligences within PFC. See the section “Agent Architecture” later in this chapter for details. As shown in Figure 4.1, the VGM separates the synthetic biology from the synthetic consciousness as follows: – Biological machinery: Includes the eye/LGN and the V1–Vn visual pathway and visual memory. – Consciousness: Proxy agents model the higher-level synthetic consciousness and thinking controls in the PFC, such as inquiry and heuristic learning. Each proxy agent is designed to solve a domain application, such as localization within the current surroundings (Where am I? What is my position? What are my surroundings?). A proxy agent manages hypothesis testing and classification. The proxy agents use the VG memory system to create and store classification structures and heuristics, which may be learned and refined over time. Any number of proxy agents may simultaneously use the VGM, and each proxy agent represents learned behavior. The conscious proxy agents attach meaning to sets of visual genomes. In fact, the same genomes may be a part of several higher-level concepts. For example, many of the same genomes composing facial features, such as the nose, can be shared among several higher-level concepts such as happy face, sad face, and fearful face—the nose is a common feature and does not change appearance under such different labels.
Volume Learning We envision volume learning as a method of learning from the multivariate feature metric space—providing a large volume of feature metrics collected during image sequencing (see Figure 3.6). The multivariate volume of features is encoded within the synthetic model automatically by the synthetic learning centers V1–Vn as discussed with multivariate features in Chapter 3. Agents may implement a wide range of learning styles using the rich feature volume. The VGM provides the basis for combining features together to create custom classifiers. The idea of using multiple features together is obvious and in widespread use. For example, Zhang et al. [31] study various combinations of local feature descriptors to increase accuracy, and many other references could be cited. However, the VGM takes the concept to the extreme and provides a large volume of features—currently over 16,000 features per genome and over 56,000 feature comparison metrics between genomes.
VGM Classifier Learning Classifier learning is a step beyond feature learning, since feature learning (see [1]) is concerned with learning a set of features from the training data. The VGM supports classifier learning to automatically select a set of optimal metrics, tune the metrics, and add conditional relationships between metrics to build a classifier. In most systems, the classifier is selected as a design decision (not learned) and then tuned during training. Common classifiers collect features and evaluate them through a simple classifier such as a SOFTMAX or SVM. Classifier learning allows for building a classifier based on learning how to select, qualify, and weight each feature metric. Classifier learning also allows for structured classifiers (such as deep sequential classifier networks, tree classifier structures, and parallel classifiers), as discussed throughout this chapter. In VGM classifier learning, feature metrics are selected, learned, and tuned by agents during training to set up the classifier. Many classifiers can be learned for the same task. In some cases, the genome compare metrics do not need to be tuned; for example, the volume shape centroid 8-bit metric is accurate and useful for most classifiers (see Chapter 8). However, some of the feature metric comparisons are not useful as is, which can be observed by studying the first-order metric comparison results in the test results in Chapter 11. To build the classifier, a set of metrics must be selected, tuned, and structured to establish reliable correspondence, since a single metric may not resolve well. In the VGM, training reveals which metrics are reliable qualifiers to place at the top level of the classifier to tune other metrics. Qualifier Metrics Tuning, discussed in the next section, is one method used in the VGM to tune metric pairs for the classifier, similar to other boosting methods in the literature [1].
Qualifier Metrics Tuning The basic tuning approach in VGM classifier learning involves (1) learning qualifier metrics that are more trusted, (2) learning dependent metrics that are less trusted, and (3) learning a feature tuning weight: the trusted metric is used to qualify and tune the dependent metric via a feature weight. Trusted metrics and dependent metrics seem to be a normal, obvious part of the visual reasoning process. Qualifier metric tuning is similar to the ADABOOST approach (see [1], Chapter 4, especially the section on Boosting, Weighting). For qualifier metrics tuning, the trusted metrics and dependent metrics can be chosen during interactive training using the MLC, or can be chosen automatically by the MLC using a criterion such as the best feature match metric score found during training and testing; other criteria are possible. Currently, the default CSV agents use qualifier metrics for tuning parameters and some scoring heuristics; the basic idea is shown here.
\[
\begin{aligned}
Q_m &: \text{Qualifier Metric (more trusted)} \\
D_m &: \text{Dependent Metric (less trusted)} \\
W_t &: \text{Tuning Weight}, \quad W_t = f(Q_m) \\
T_m &: \text{Tuned Metric}, \quad T_m = D_m \cdot W_t
\end{aligned}
\]
For example, if a qualifier metric comparison score $Q_m$ is at the desired score level such as 0.5, the dependent metric $D_m$ is tuned using the tuning weight $W_t$ to correlate correspondence. The qualifier and dependent metric hierarchy can be arbitrarily deep and implements a simple boosting and prejudice method shown below. A simple two-metric example is shown here for illustration:

//
// BOOSTING to handle false negatives: STEP1: if_qualify, STEP2: weight_tune
//
if (metric_1 < 1.0) metric_2 = metric_2 * BOOSTING_WEIGHT_TUNER;
//
// PREJUDICE to handle false positives: STEP1: if_qualify, STEP2: weight_tune
//
if (metric_1 > 1.0) metric_2 = metric_2 * PREJUDICE_WEIGHT_TUNER;
//
// Example scoring strategies (choose one)
//
score = AVE(metric_1, metric_2);
score = MIN(metric_1, metric_2);
score = MAX(metric_1, metric_2);
In addition to explicit dependent metric tuning heuristic code as shown here, metrics can be qualified and tuned using weight parameters provided to MCC and GMC functions. The VGM master learning controller discussed in Chapter 5 is designed to adjust weights for each metric as needed using a simple qualification based on training session feedback; weight factors can be determined during interactive testing and also used in agents. For more details on function weighting and scoring options, see Chapter 5 in the “Group Metrics Classifier (GMC)” and “Metrics Correspondence Classifiers (MCC)” sections, and later in this chapter in the “Correspondence Permutations and Autolearning Hull Families” and “MCC Scoring Reference/Target Correspondence” sections.
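The qualifier and dependent metric relationship described in this section can be sketched as a small C fragment. This is only an illustration under stated assumptions, not the VGM implementation: the function names, the two weight values, and the rule that a score below 1.0 indicates correspondence are all hypothetical choices made for the sketch.

```c
/* Hypothetical tuning weights; the values are illustrative only. */
#define BOOSTING_WEIGHT_TUNER  0.5   /* pulls a dependent score toward a match */
#define PREJUDICE_WEIGHT_TUNER 2.0   /* pushes a dependent score away from a match */

/* W_t = f(Q_m): choose the tuning weight from the qualifier score.
 * Assumption: scores below 1.0 indicate correspondence. */
double tuning_weight(double qualifier_score)
{
    if (qualifier_score < 1.0) return BOOSTING_WEIGHT_TUNER;
    if (qualifier_score > 1.0) return PREJUDICE_WEIGHT_TUNER;
    return 1.0;  /* exactly on the threshold: leave the dependent metric alone */
}

/* T_m = D_m * W_t: the trusted qualifier tunes the less trusted dependent metric. */
double tuned_metric(double qualifier_score, double dependent_score)
{
    return dependent_score * tuning_weight(qualifier_score);
}

/* One possible scoring strategy: average of the qualifier and the tuned dependent. */
double ave_score(double qualifier_score, double dependent_score)
{
    return (qualifier_score + tuned_metric(qualifier_score, dependent_score)) / 2.0;
}
```

With these assumed weights, a qualifier score of 0.5 boosts a dependent score of 1.5 down to 0.75, while a qualifier score of 1.5 penalizes a dependent score of 0.75 up to 1.5. A deeper hierarchy could be built by feeding a tuned metric in as the qualifier for the next dependent metric.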
Genetically Preexisting Learnings and Memory Neuroscience is finding that some learnings are encoded in the DNA; therefore, processing centers and memories are in place at birth, apparently pre-wired (according to some scientists) to perform some amazing tasks—such as the ability to walk at birth in newborn horses and other mammals, or the ability of a butterfly to take to the air
after emerging from the cocoon. In the VGM, agents model genetic learnings and memories. Some visual learning problems such as face recognition seem to require a very specialized visual processing model. In fact, face recognition seems to be a genetically designed processing center based on fMRI and other imaging modes, since face recognition operates in the same location within the visual cortex from birth. Also, many V1–Vn processing centers seem to operate in the same location of the visual cortex in all humans. Thus, specific visual learning centers seem to be genetically programmed in humans, rather than via memory impressions or training alone. Thus, the VGM decouples visual learning from visual memory, which seems to follow biological neuroscience.
Continuous Learning Agents support continuous learning, inference, and reasoning, as well as visual data mining. Since the visual image impressions are permanently stored in VDNA and visual genomes in the memory model, new and improved agents can re-explore and relearn from the memory over time, creating new strands, bundles, and associations. So, agents allow recognition and learning tasks to be specialized, refined, and ongoing. The VGM allows for learning to be continuous to process new visual information, or reprocess existing information to look for additional meaning, or reevaluate an older hypothesis. In the VGM, the order of learning is reversed and driven by proxy agent inquiry: features are learned first, meaning and labeling come later. A proxy agent is an agent of conscious inquiry, so many proxy agents are useful for finding meaning. The synthetic model first learns the primal features, or genomes, and fills the memory with visual impressions and metrics prior to attempting to learn or label anything, which is biologically plausible and experientially validated for many learned behaviors.
Associative Learning One of the most prominent features of human learning is associative memory: the ability of the PFC to locate associated features in memory; for example, using a color or texture to find similarity. The synthetic model fully supports associative visual memory (see Chapter 3). Criteria for association are encoded in the agents. For example, an agent may associate existing strands according to a topic such as “large mountains” by data mining the entire visual strand memory, using some selection criteria encoded in the agent, and assembling a bundle of related strands or VDNA.
Besides associative learning and memory, the synthetic model allows for many outcomes from learning and reasoning. Examples such as data mining for pure pleasure and storing the results as fond memories in new strands and bundles are discussed later in this chapter in the section “Agent Learning and Reasoning Styles.”
Object Learning vs. Category Learning VGM is primarily designed to allow for positive identification of an object, which is a genome or strand of genomes. Category recognition is a separate problem often addressed well by DNN learning models, which typically train to identify a category of objects as measured by tests such as ImageNet [1] datasets using tens of thousands of training samples per category, each of which may or may not be an optimal example. Instead, VGM learning relies on optimal ground truth. For category recognition, VGM can learn several specific category reference objects and then extrapolate target object distance to reference objects, like cluster distance between similar objects. DNN training protocols have been developed with related ideas to reduce training set size to just a few samples (see [159]).
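The idea of extrapolating a target's distance to a few learned reference objects can be sketched as a nearest-reference lookup. This is a minimal sketch under assumptions: the struct, the function name, and the convention that a smaller distance means a closer match are hypothetical and are not taken from the VGM API.

```c
#include <stddef.h>

/* Hypothetical record: a learned reference object's category label and its
 * comparison distance to the current target (0.0 = identical). */
typedef struct {
    const char *category;
    double      distance_to_target;
} reference_match_t;

/* Return the category of the closest reference object, or NULL when no
 * reference lies within max_distance of the target. */
const char *nearest_category(const reference_match_t *refs, size_t count,
                             double max_distance)
{
    const char *best = NULL;
    double best_distance = max_distance;
    for (size_t i = 0; i < count; i++) {
        if (refs[i].distance_to_target <= best_distance) {
            best_distance = refs[i].distance_to_target;
            best = refs[i].category;
        }
    }
    return best;
}
```

Returning NULL when nothing is close enough lets a calling agent treat the target as an unknown object rather than forcing it into the nearest category.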
Agents as Dedicated Proxy Learning Centers Each agent is like a conscious entity, a dedicated learning center, designed to perform a specific purpose within the PFC executive. We refer to agents as proxy agents, since each agent is a proxy for a particular aspect of intelligence. Like a proxy voter in an election, agents are trained and then delegated the task of voting (inferencing), so we are using the word proxy to illustrate the role of the agent. The VGM model encourages many agents to be developed separately or in groups. Agents may interact with each other in learning and reasoning tasks as needed. An agent is a PFC thought controller, which may interface to motor controls and vision controls and may produce strands and bundles representing logic and learnings stored in persistent associative memory. In this way, VGM classification is a distributed, iterative process according to the design of agents, rather than expecting a single classifier or master agent to be omniscient. The learnings are persistent and shared among all agents. Agents may also store logical structures of parameters for agent operation. For example, an agent may act as a dedicated metric associator to build a mesh of strands containing related color base metrics, texture base metrics, or other metric combination associations. The memory model provides for an associative multidimensional base metric strand and bundle space. Once an associative strand structure is in place, agents may evaluate and modify the strands as well. Associative memory is an integral part of the synthetic model supporting dedicated learning centers.
The synthetic model allows for serial and parallel operation of agents, so multiple hypotheses can be evaluated together. The human visual pathway hypothesis evaluation process appears to be serial, and the round trip time from retina through the PFC has been measured to be ~800ms (see Figure 1.10). Agent architecture details are discussed later in this chapter. Next, we discuss exemplary styles of learning that are enabled by volume learning and the VGM model. There are no strict styles of learning or reasoning enforced by VGM, so we envision that a host of novel learning styles may be invented to leverage the VGM.
Agent Learning and Reasoning Styles The VGM enables a range of learning and reasoning styles, and we enumerate several examples in this section.
Interactive Agent Learning We envision a common use case for learning and training to be interactive training sessions, similar to the way a teacher may guide a student, to specify objects for agents to learn. The visual genome API contains a trainer agent where an operator-teacher selects parts of an image to associate together into a strand and provides a label. Then, an agent performs final analysis on the strand to set up key metrics, which can be done interactively during training. The operator-teacher trains and tests the agent by object searches across training images. Figure 4.2 illustrates how an operator-teacher can interactively associate image genome regions of a black box together into a labeled object called “blackBox.strand.”
Figure 4.2: How interactive learning can teach an agent what a “black box” looks like. It is represented as blue genome regions collected into a strand during interactive training, using the vgv tool discussed in Chapter 5.
Autonomous Learning We apprehend that large parts of learning take place unconsciously, that the bulk of neurons are concerned with recording visual impressions in memory according to the view-based hypothesis, and that neural machinery (i.e. visual processing centers) also operates unconsciously as a reaction to visual impressions, perhaps unconsciously recording associations or flagging objects for further review. Therefore, the synthetic model can emulate unconscious learning via a canonical set of autonomous agents for default learning. The set of autonomous agents may grow and morph over time. The idea is to improve the quality of the visual genome feature metrics by starting from a maturing default set of learning agents.
Artificial Labeling Imagine an agent given the task of finding all genomes matching specific criteria and then creating strands with artificial labels. Later, an expert reviews the artificially labeled strands; for example, by visually inspecting a strand. The expert discards or
edits strands as needed and labels the strands. Artificial labeling is enabled within the VGM model, where genomes and VDNA can be catalogued first without knowing the correct labels; labels and associations can be formed later according to some criteria. The agent may communicate a message: “What is this object I found?” The VGM can learn in reverse, similar to a baby who studies the environment and stores view-based impressions in memory before knowing what it all means, or how to label things. The baby may create an artificial label, or artificial word, to describe the concept; the correct name and concept are learned later. The idea behind visual genomes and VDNA is to store all new unique visual impressions first, and later use agent learnings to reinforce existing impressions. The meaning can be learned according to how a specific agent employs heuristics, such as similarity in color, shape, or texture, allowing for multiple meanings in context. After autonomous learning agents have run, a teacher can inspect what the learning agents have discovered, provide human labels, and bias the labeling via heuristics, similar to the way humans learn. Thus, the VGM provides for unsupervised, minimally supervised, and heuristic-driven learning. The proxy agent can be devised to answer a specific question to perform custom classification and data mining. Thus, VGM allows for labels to be developed artificially at first, and a human expert can come along later and provide a human-intelligible label.
Agent Hierarchies and Ensembles An agent may be designed to call other agents to return results, which are then fit into a learning function or classifier. The organization of agents may be hierarchical, or in a parallel ensemble as illustrated in Figure 4.6 and discussed in the section “Structured Classifiers Using MCC Classifiers” later in this chapter.
Agents can implement standard types of classifiers using SVM and softmax functions or more complex classifiers involving heuristics and agent ensembles. Rather than taking millions of training samples to establish a model of a class of objects as is common for DNNs, an agent may classify using only a few VDNA-based models learned from a single training sample and then generalizing to a class label for similar objects. Specific agents working together can generalize their limited training and learning into a class labeling classifier. For example, a set of agents may be devised to learn several types of guitars recorded in visual genomes and VDNA, with variations in color and shape, and then inference to any unknown guitar by extrapolating between the known genomes of guitars (i.e. an ensemble classification approach). Similar approaches have been used to speed up DNN training using a set of small training sets fed to a set of independent DNNs, and then extrapolating correspondence between the set of similarly trained DNNs, with results approaching the accuracy of using massive training sets to train a single DNN.
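The guitar example can be sketched as a small ensemble in which each agent holds one learned reference and the ensemble keeps the best score. Everything here is an assumption made for illustration: the two-field feature struct, the tolerances, the agent names, and the rule that a score below 1.0 counts as a match are hypothetical and far simpler than the VGM feature volume.

```c
#include <stddef.h>
#include <float.h>

/* Hypothetical, heavily reduced feature set for one object. */
typedef struct { double hue; double aspect_ratio; } features_t;

/* Hypothetical agent signature: returns a score where 0.0 is identical
 * and values below 1.0 are treated as a match. */
typedef double (*agent_fn)(const features_t *target);

static double abs_diff(double a, double b) { return a > b ? a - b : b - a; }

/* Score a target against one reference: the worst of the per-feature
 * differences, each normalized by an assumed tolerance. */
static double score_against(const features_t *ref, const features_t *target)
{
    double hue_score    = abs_diff(ref->hue, target->hue) / 0.25;
    double aspect_score = abs_diff(ref->aspect_ratio, target->aspect_ratio) / 0.5;
    return hue_score > aspect_score ? hue_score : aspect_score;
}

/* Two toy agents, each "trained" on a single reference guitar. */
double red_guitar_agent(const features_t *target)
{
    static const features_t ref = { 0.0, 2.5 };
    return score_against(&ref, target);
}

double blue_guitar_agent(const features_t *target)
{
    static const features_t ref = { 0.625, 2.75 };
    return score_against(&ref, target);
}

/* Parallel ensemble: the best (lowest) score over all agents. */
double ensemble_best_score(const agent_fn *agents, size_t count,
                           const features_t *target)
{
    double best = DBL_MAX;
    for (size_t i = 0; i < count; i++) {
        double score = agents[i](target);
        if (score < best) best = score;
    }
    return best;
}
```

An unseen teal guitar with features `{0.5, 2.5}` scores 0.5 against the blue reference and is accepted as a guitar, while an object with a very different aspect ratio is rejected by every agent. A hierarchical arrangement would instead have one agent call `ensemble_best_score` and feed the result into its own classifier.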
Agent Exploratory Cataloging and Mapping Agents can be directed and guided to independently explore an image space, looking for particular sets of genomes with specific CSTG base characteristics, such as all blue objects within a certain color range and certain texture range, and creating associative strand memory structures for similar items. The cataloging and mapping creates a multidimensional associative mesh available to all agents. Maps can be created between images to record similarities across images, allowing for inventory display of all images containing the strand. Guided exploratory learning targets specific CSTG bases and VDNA characteristics, and assembles strands with a proxy label to feed into other agents for further analysis and labeling. The idea is to send out agents into new images and then back across groups of known images, to catalog and map the entire visual genome space. This is a key goal in the visual genome concept. A couple of variants are described here:
– Self-directed mining agents examine all the genomes in the entire image and sort them into lists based on similarity metrics or based on adjacency and position metrics. An expert can later review to assign labels or reject the associations. Mining agents can be parameterized to favor only a small number of metrics (e.g. look for RGB HSV color similarity only or look for texture similarity only in color luminance).
– A genome curator compares each new genome in each image with existing genomes that are already stored in memory and creates global statistics for each image in summary form, revealing whether or not the image contains any genomes which are already learned and known.
Agent Security Groups A critical attribute for machine vision is security and confidence, so a system can be trusted to a degree, and expected to fail classification and inference gracefully.
As discussed in Chapter 1 and illustrated in Figure 1.5, DNN models may fail dramatically, incorrectly classifying an image of random pixel values with high confidence. To overcome such problems, agent security groups can be created using multiple agents together to check a wider range of base metrics instead of relying on a single metric. Classification accuracy can be made even higher by adding exception agents, as a part of the testing protocol, to reclassify difficult objects: a new agent learns a model to correctly classify the false positives or false negatives detected.
Agent Hunting Agent hunting involves locating a set of candidate VDNA and strands to pass to another agent for further, more careful evaluation. Hunting is different from final classification; it is similar to casting a wider net to find candidates. The CSV agents, as
demonstrated in Chapter 11, use predefined HUNT parameters to aggressively collect candidate features for further analysis. In a hunting scenario, one agent hunts, and another agent performs finer-grained sorting and classification. The agent model enables hunting by autonomous agents or group hierarchies. Reinforcement, Genetic, Evolutionary, Trial and Error Learning Agents are able to implement various forms of so-called genetic learning, evolutionary learning, and trial and error learning—terms used in the literature often interchangeably or used to describe fine distinctions of learning methods. We apprehend that such learning models share the same goal of looking for ways to improve or reinforce changes to a model, often with little guidance. Each approach takes different steps to achieve the goal. A common method employed in such methods is guided trial and error to learn and improve a model. Such learning models can be said to be adaptive. Below is some discussion on various approaches. Trial and error learning can be implemented in several ways, such as (1) using a slot machine approach, where the slot values are adjusted and reinforced one at a time, leaving the correct slot values alone and adjusting the incorrect values to get closer to a learning goal (i.e. an ACE flush), or (2) using range-bounded mutation operators to change function values for new model changes and testing, while preserving values which are already achieving the learning goals. Some of the so-called evolutionary learning methods are actually well-designed trial and error methods, intended to mimic the discredited theory of natural selection or evolution, which has been shown to be simply genetic expression of preexisting traits already present in the DNA, which exhibit robustness criteria for DNA expression within the human genome to adapt to an environment, rather than random DNA mutations surviving over epochs of time. 
As demonstrated by genetic science, random DNA mutations are resisted and rejected by the ribosome machines that synthetize protein structures from normal DNA. Mutations are interpreted as DNA damage, violating the error-correcting codes built into DNA and ribosomes. Instead, we see that growth, learning, and adaptation are forms of intelligence. Next, we highlight a few examples of trial and error learning under guiding assumptions and intelligent selection criteria. Cervode et al. [73] create a learning system by using several guided assumptions to control the trial and error and the selection. One variant of reinforcement learning developed by Wirstra [70] applied to DNNs eliminates the slow hyperparameter-throttled gradient descent method and instead employs a reinforcement approach to selecting more aggressive gradient ascent adjustment intervals to more quickly converge at local minima. Wirstra’s method maintains a list of the best candidate gradients following the natural gradient with steepest ascent along the most likely routes to convergence and continually reinforces the candidate gradient list after
each gradient adjustment. This method may make gradient descent obsolete or less needed. (See also [165][167].) In another example of supervised and controlled trial and error learning, Esteban [71] develops a concept referred to as neuro-evolution to develop DNN architectures, producing fully trained models with accuracy similar to hand-made architectures. Fernando [72] creates another method for supervised and controlled trial and error learning, which uses a reward criterion of reducing the number of DNN weights by using semi-random mutations to the DNN architecture. The architecture is modified in trial and error stages, such as adding and removing nodes and edges to two DNNs, and then comparing the two DNNs and eliminating the worst DNN from the trial and error process. In summary, the adaptive learning models discussed under the terms reinforcement or trial and error learning require intelligence to set up and guide the learning in the desired direction. We expect this trend to continue, and look for automatic machine learning via teacher and learner machines, where ubiquitous machines are able to learn on demand.
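The slot-style and range-bounded trial and error ideas discussed in this section can be sketched as a single pass over a set of slot values. This is a deliberately simplified, deterministic sketch: it assumes a per-slot error signal against a known goal is available, and it tries fixed plus and minus steps in place of the guided or semi-random mutations used by the cited methods. The function name and parameters are hypothetical.

```c
#include <stddef.h>

static double abs_diff(double a, double b) { return a > b ? a - b : b - a; }

/* One trial-and-error pass over the slots.  A slot already within tolerance
 * of its goal is left alone (reinforced).  Any other slot is nudged by
 * +step, then -step, clamped to [lo, hi]; the first nudge that reduces the
 * slot's error is kept.  Returns the number of slots changed in this pass. */
size_t trial_and_error_pass(double *slots, const double *goals, size_t count,
                            double step, double lo, double hi, double tolerance)
{
    size_t changed = 0;
    for (size_t i = 0; i < count; i++) {
        double error = abs_diff(slots[i], goals[i]);
        if (error <= tolerance) continue;        /* keep values that already work */

        double candidates[2] = { slots[i] + step, slots[i] - step };
        for (int k = 0; k < 2; k++) {
            double trial = candidates[k];
            if (trial < lo) trial = lo;          /* range-bounded change */
            if (trial > hi) trial = hi;
            if (abs_diff(trial, goals[i]) < error) {
                slots[i] = trial;                /* keep the improvement */
                changed++;
                break;
            }
        }
    }
    return changed;
}
```

Calling the pass repeatedly until it returns zero converges each slot independently. The goal values and the step size still have to be supplied from outside, which is the point made in the summary above: the trial and error needs guidance to move in the desired direction.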
Autolearning Hull Threshold Learning Autolearning hulls, novel to the VGM, are established to represent a biologically plausible range for each feature metric within the supported metric spaces modeled in the LGN (see Chapter 2) consisting of images (raw, sharp, retinex, histeq, blur), color spaces (RGBI, leveled spaces), and texture spaces. Hull learning establishes a family of independent classifiers for each metric. In other words, the autolearning hull establishes a family of basic correspondence thresholds for each metric in each metric space—yielding hundreds of individual autolearning hulls per metric. The autolearning hulls are used for hull learning, a method for classifier learning discussed in this chapter to enable specific invariance attributes for hull families. For each metric provided in the VGM, an autolearning hull is established surrounding each metric to use as a default first order threshold for metric comparison; second order hulls are established and tuned via training and learning. Also, a tuned heuristic function is sometimes used to establish variant autolearning hulls for specific metrics, such as the shape centroid discussed in Chapter 8. The autolearning hull is defined from the biologically plausible variations in the eye/LGN model discussed in Chapter 2. Thus, families of autolearning hulls can be established as discussed in this chapter. As discussed in Chapter 5, each VGM platform metric correspondence classifier function (MCC function) uses the autolearning hull for scoring correspondence between a reference/target pair of genomes metrics. The autolearning hull is a learned threshold similar to a weight factor computed separately for each metric, providing a
starting point for determining correspondence. In some cases, correspondence outside and close to the hull boundary is acceptable, but the goal is correspondence close to the center of the hull, which represents zero difference between the reference and target metric. Each genome has a specific learned autolearning hull for each of the 55,000 metrics, defining a plausible range of variation acceptable to establish correspondence. Self-comparing a genome yields zero difference, which is a perfect match. The default autolearning hull is illustrated in Figure 4.3, which is computed using the raw, sharp, retinex, blur, and histeq spaces. Details on computing the autolearning hull and scoring option parameters are provided later in this chapter. Families of autolearning hulls are computed depending on the metric space, such as for a color space or texture space. Also, specific invariance hulls may be computed to establish correspondence under an invariance criterion such as lighting or sharpness. The default hull defines a range of values within the raw, sharp, retinex, blur, and global contrast enhanced images. See Figure 4.3.
Figure 4.3: The default autolearning hull composed around the raw image baseline using the sharp, blur, retinex, and global histeq images from the eye/LGN model.
As shown in Figure 4.3, each compare Raw → Sharp, Raw → Retinex, Raw → Blur, Raw → Histeq produces a separate hull. The final autolearning hull range is the outside boundary (i.e. largest hull distance) of all the autolearning hulls as compared to the
RAW reference. The hull represents the delta between the raw image baseline and the various pre-processed LGN images (sharp, blur, retinex, histeq). As postulated here, the autolearning hull represents a biologically plausible range of values derived by comparing the RAW image to the retinex, sharp, blur and histeq image variants produced in the LGN. The autolearning hull is a threshold weight for genome comparisons. However, the hull is only a first order threshold guideline, and a few MCC functions make exceptions for deriving thresholds based on other heuristics. The following code sample illustrates how the hull range is stored separately for each RGBI color component of each image and for each metric, such as the color leveling metric, which expresses the difference within a color space modified for contrast and lightness to represent various lighting conditions, as explained in detail in Chapter 6.

//
// Set the autolearning hull for each metric separately
//
// Pseudocode loop: repeated for each image_slot in (SHARP, RETINEX, HISTEQ, BLUR),
// and for each color and level, to set the hull for min, max, ave, and peak:
{
    metrics_comparison_g[image_slot].color_component[color].color_level[level].min_leveled8bit_delta =
        (U32)abs((int)metrics_1_g[RAW_IMAGE].color_component[color].color_level[level].min_leveled8bit
               - (int)metrics_1_g[image_slot].color_component[color].color_level[level].min_leveled8bit);
    metrics_comparison_g[image_slot].color_component[color].color_level[level].max_leveled8bit_delta =
        (U32)abs((int)metrics_1_g[RAW_IMAGE].color_component[color].color_level[level].max_leveled8bit
               - (int)metrics_1_g[image_slot].color_component[color].color_level[level].max_leveled8bit);
    metrics_comparison_g[image_slot].color_component[color].color_level[level].ave_leveled8bit_delta =
        (U32)abs((int)metrics_1_g[RAW_IMAGE].color_component[color].color_level[level].ave_leveled8bit
               - (int)metrics_1_g[image_slot].color_component[color].color_level[level].ave_leveled8bit);
    metrics_comparison_g[image_slot].color_component[color].color_level[level].peak_leveled8bit_delta =
        (U32)abs((int)metrics_1_g[RAW_IMAGE].color_component[color].color_level[level].peak_leveled8bit
               - (int)metrics_1_g[image_slot].color_component[color].color_level[level].peak_leveled8bit);
}
As shown in the code above, metrics are compared straight across reference (RAW_IMAGE) → target (ALL_IMAGES), and the metrics values are stored as the comparison results in the metric_compare_struct discussed in Chapter 5. Then, the comparison result delta is compared with the autolearning hulls to check magnitude, as explained next.
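The "outside boundary" idea can be illustrated in isolation with a minimal sketch. The struct `MetricPerImage` and the functions `hullDelta()` and `outsideHull()` are hypothetical names introduced here for illustration only; they are not part of the VGM API, and the sketch assumes a single 8-bit metric value per LGN image space.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in: one 8-bit metric measured in each LGN image space.
struct MetricPerImage {
    int raw;
    int sharp;
    int retinex;
    int histeq;
    int blur;
};

// Absolute delta between the raw baseline and one pre-processed variant.
inline std::uint32_t hullDelta(int raw_value, int variant_value) {
    return static_cast<std::uint32_t>(std::abs(raw_value - variant_value));
}

// Outside boundary: the largest raw-to-variant delta across all variants.
inline std::uint32_t outsideHull(const MetricPerImage& m) {
    const std::array<std::uint32_t, 4> deltas = {
        hullDelta(m.raw, m.sharp),
        hullDelta(m.raw, m.retinex),
        hullDelta(m.raw, m.histeq),
        hullDelta(m.raw, m.blur)};
    return *std::max_element(deltas.begin(), deltas.end());
}
```

With a raw value of 100 and variant values of 112, 91, 130, and 97, the four deltas are 12, 9, 30, and 3, so the outside hull is 30.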
Correspondence Permutations and Autolearning Hull Families
Autolearning hulls allow correspondence to be made between any LGN spaces, so each individual genome compare can be made at least five ways, as illustrated in Figure 4.4. Comparing image metrics straight across (i.e., raw against raw) is the default method, as specified using the image parameter to each MCC function. However, for applications requiring contrast and lighting invariance, direct correspondence of raw against histeq may be better. There are several different autolearning hull families and functions used in the VGM, derived and tuned specific to the metric. Families of autolearning hulls are computed based on the desired correspondence by varying the learning criteria. Each MCC function allows for correspondence to be computed over a range of image and metric space variations, allowing agents to intelligently apply a variety of autolearning hulls such as:
– Image variation hulls (default): raw, sharp, retinex, histeq, blur, …
– Metric space variation hulls: color, texture, …
– Invariance variation hulls: scale, lighting, sharpness, blur, …
Figure 4.4: Genome LGN image compare permutations.
An agent may perform any permutations desired using custom code, such as reference(RAW) → target(RAW, SHARP, RETINEX). For example, an agent may call each MCC function several times and switch the order of the reference and target genomes, as well as switching the parameters around, to achieve various permutations. In addition, a convenience function is provided for fixing the reference image to control correspondence permutations, overriding the default straight image compare behavior, as shown in the following code:

    //
    // The default reference image for MCC compare__* functions is fixed
    // using setComparePermutationReferenceImage()
    //
    // fix reference image to SHARP
    setComparePermutationReferenceImage(SHARP);
    // permuted compare: target to compare ALL_IMAGES to fixed reference SHARP_IMAGE
    double result1 = compare__centroid_delta8_xyz(1.0, ALL_IMAGES, ALL_GENOMES, ALL_COLORS, AVE_SCORE);
The setComparePermutationReferenceImage() function allows the reference image to be fixed, so that permutations of target images can be compared to the fixed reference image metrics. Again, the default MCC genome metric compare behavior is to compare each image metric straight across for the reference (RAW_IMAGE) → target (IMAGE_SET), with no image permutations, as shown here:

    Default behavior (straight across n:n compare) : reference[RAW_IMAGE] -> target[IMAGE_SET_1]
    Permuted behavior (fixed 1:n compare) : reference[FIXED_IMAGE] -> target[IMAGE_SET_1]
The method of using the autolearning hull is built into the MCC functions, so all MCC genome comparisons and correspondence functions use a uniform threshold method as defined in compute_best_match(), shown in abbreviated form below, illustrating the use of the default autolearning hull for scoring.

    inline double compute_best_match(
        int scoring,
        double compare_value,       // comparison between | raw_metric1 - raw_metric2 |
        double sharp_hull_value,    // | raw - sharp |
        double retinex_hull_value,  // | raw - retinex |
        double match_weight)        // a weight factor to scale the comparison, default to 1.0 for no change
    {
        // look for floating point numbers NAN, INF, and ZERO that cause computation problems
        if ( (isValidNumber(compare_value) == 0)
          || (isValidNumber(sharp_hull_value) == 0)
          || (isValidNumber(retinex_hull_value) == 0)
          || (isValidNumber(match_weight) == 0) )
        {
            printf(" => FLOATING POINT INPUT PARAMETER FORMAT ANOMALIES :: nul final_score 0.0\n");
            return 0.0;
        }

        //
        // Be careful to make sure that no divide by zero results are passed along
        //
        double sharp_hull_ratio = (compare_value / sharp_hull_value);
        if (sharp_hull_value == 0.0) { sharp_hull_ratio = 0.0; }
        double retinex_hull_ratio = (compare_value / retinex_hull_value);
        if (retinex_hull_value == 0.0) { retinex_hull_ratio = 0.0; }

        // this simulates a wider hull - useful for rough comparisons in some cases
        double relaxed_hull_value = (retinex_hull_value + sharp_hull_value);
        double relaxed_hull_ratio = (compare_value / relaxed_hull_value);
        if (relaxed_hull_value == 0.0) { relaxed_hull_ratio = 0.0; }

        double lowest_score = (sharp_hull_ratio < retinex_hull_ratio ? sharp_hull_ratio : retinex_hull_ratio);
        double highest_score = (sharp_hull_ratio > retinex_hull_ratio ? sharp_hull_ratio : retinex_hull_ratio);
        double average_score = (sharp_hull_ratio + retinex_hull_ratio) / 2.0;

        if ((sharp_hull_value == 0.0) && (retinex_hull_value == 0.0))
        {
            lowest_score = compare_value;
            highest_score = compare_value;
            average_score = compare_value;
            relaxed_hull_ratio = compare_value;
        }
        if ((sharp_hull_value != 0.0) && (retinex_hull_value == 0.0))
        {
            lowest_score = sharp_hull_ratio;
            highest_score = sharp_hull_ratio;
            average_score = sharp_hull_ratio;
        }
        if ((sharp_hull_value == 0.0) && (retinex_hull_value != 0.0))
        {
            lowest_score = retinex_hull_ratio;
            highest_score = retinex_hull_ratio;
            average_score = retinex_hull_ratio;
        }

        //
        // This is an explicit, obvious case - keep this one last in this list of overrides
        //
        if (compare_value == 0.0) // perfect match - no need to compare to the hulls
        {
            lowest_score = 0.0;
            highest_score = 0.0;
            average_score = 0.0;
            relaxed_hull_ratio = 0.0;
        }

        if ( (isValidNumber(sharp_hull_ratio) == 0)
          || (isValidNumber(retinex_hull_ratio) == 0)
          || (isValidNumber(relaxed_hull_ratio) == 0) )
        {
            printf(" => FLOATING POINT INTERMEDIATE VALUE FORMAT ANOMALIES");
            return 0.0;
        }

        double final_score = 0.0;
        switch (scoring)
        {
        case AVE_SCORE: final_score = average_score; break;
        case MIN_SCORE: final_score = lowest_score; break;
        case MAX_SCORE: final_score = highest_score; break;
        case SUM_SCORE: final_score = average_score; break;
        case RELAXED_HULL_MIN_SCORE:
        case RELAXED_HULL_MAX_SCORE:
        case RELAXED_HULL_AVE_SCORE:
            // this is a special case - logic is shown here - always use relaxed_hull_ratio.
            final_score = relaxed_hull_ratio;
            printf(" => relaxed final_score %f \n", final_score);
            break;
        default:
            printf(" ! UNEXPECTED scoring best match parameter value %d\n", scoring);
            break;
        }

        //
        // Make sure we return a useful number
        //
        if ( (isValidNumber(final_score) == 0) || (final_score > 9999999.9))
        {
            printf(" ! return final_score as 9999999.9 :: bad number %f \n", final_score);
            final_score = 9999999.9;
        }
        return final_score;
    }
As shown in the code above, the scoring function, compute_best_match(), compares the difference between two metrics within the autolearning hull, which is composed of raw/sharp and raw/retinex differences. A scoring parameter is used to allow for scoring based on the max, min, average, or a relaxed hull value. The match_weight parameter can be used to effectively relax (> 1.0) or tighten up (< 1.0) the matches. The relaxed hull is the combined (retinex + sharp hull) values. The basic idea is to see if the difference between two metrics is within the hull for either the raw/sharp or raw/retinex hulls or within the relaxed, combined (retinex + sharp) hull. The compare_value parameter is taken from the straight comparison of two genomes; we are looking to see if the genomes match. Agents are free to develop any correspondence metrics; however, the default match method is essentially defined as follows (this pseudocode omits many details shown above):

    compare_value = |metric1 - metric2|
    x = compare_value / auto_learning_hull_value
    if (x == 0) PERFECT_MATCH
    if (x >= 0 && x < 1.0) MATCH
    threshold (application specific)
    if (x >= 0 && x < threshold) ACCEPTABLE_MATCH
    *The threshold is 1.0 by default
Agents are free to implement any classification and correspondence functions and especially structured classifiers using a set of multivariate metric combinations, as discussed in the next section. In Chapter 11 we provide code examples for agents using a range of classifiers and metrics.
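The default match rule (hull-normalized difference, then thresholding) can be sketched as a small standalone function. This is an illustration under stated assumptions, not the VGM implementation: `MatchClass` and `classifyMatch()` are hypothetical names, a single hull value is used rather than the sharp/retinex pair, and a zero hull with a nonzero difference is treated as a miss, which is an assumption made here for simplicity.

```cpp
#include <cmath>

// Hypothetical result labels for the default match rule.
enum class MatchClass { PerfectMatch, Match, AcceptableMatch, Miss };

// Classify the correspondence of two metric values against one hull value.
// threshold is application specific and defaults to 1.0.
inline MatchClass classifyMatch(double metric1, double metric2,
                                double hull_value, double threshold = 1.0) {
    const double compare_value = std::fabs(metric1 - metric2);
    if (compare_value == 0.0) return MatchClass::PerfectMatch;
    // Assumption: with no usable hull, a nonzero difference cannot be normalized.
    if (hull_value <= 0.0) return MatchClass::Miss;
    const double x = compare_value / hull_value;
    if (x < 1.0) return MatchClass::Match;
    if (x < threshold) return MatchClass::AcceptableMatch;
    return MatchClass::Miss;
}
```

For example, metrics of 10 and 12 with a hull of 5 give x = 0.4, which is a match; metrics of 10 and 16 with the same hull give x = 1.2, which is acceptable only if the application raises the threshold above 1.2.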
Hull Learning and Classifier Family Learning
Classifier family learning is based upon hull learning and metric learning (in the VGM context) to build up a family of classifiers starting from the autolearning hull families discussed above (see Figure 4.5). Each classifier is further tuned via a variant of reinforcement learning, including the qualifier metrics discussed earlier in this chapter, so that the classifier family provides a classifier for each invariance or robustness criterion desired.
Figure 4.5: Illustrating autolearning hulls, one hull for each metric in each metric space (image spaces, color spaces, texture spaces). Each hull is used by the MCC classifier functions and also for hull learning to learn and build a set of classifiers to achieve selected invariance criteria.
For VGM classifier learning, hull learning is used to collect a first-order set of metrics into a CSV to add into the classifier. The CSV is then further optimized using a variant of reinforcement learning to tune the weights and add/subtract metrics from the CSV. To find the first-order set of metrics and build a CSV, the set of autolearning hulls is evaluated and sorted to find the lowest hull gradients, indicating metrics with a tighter range of correspondence. The smaller hull gradients indicate little variation between the metric spaces, which may indicate good candidate metrics with stronger invariance. For example, smaller hull gradients indicate little difference between the raw, sharp, retinex, histeq, and blur spaces. To further populate the CSV, training set learning involves MCC comparisons between two ground-truth genomes which match very well, and the metrics which correspond best are selected and added into the CSV. Further, the weights for each metric in the CSV may be tuned using qualifier metrics discussed elsewhere in this chapter.
A few methods for selecting the first-order hull gradients for each metric across the metric spaces are shown here, including (1) $\nabla H_m^R$ relative hull gradients in a fairly tight range, and (2) $\nabla H_m^M$ moment hull gradients in a variable range depending on the moment. See below for a simplified example using two metric spaces: (1) $I$ (image spaces) and (2) $C$ (color spaces).

\[ I = \{\mathrm{raw}, \mathrm{sharp}, \mathrm{retinex}, \mathrm{histeq}, \mathrm{blur}\} \qquad (I\ \text{image spaces}) \]
\[ C = \{r, g, b, i, \mathrm{HSV}_H, \mathrm{leveled}[\mathrm{raw}, \ldots]\} \qquad (C\ \text{color spaces}) \]
\[ m_{IC}\{\ldots\} \qquad (\text{all metrics in spaces } I \cup C) \]
\[ T_{IC} \qquad (\text{total count of metrics in spaces } I \cup C) \]
\[ \alpha_{m_{IC}}\{\ldots\} \qquad (\text{autolearning hulls for each } m_{IC}) \]

Relative hull gradients: autolearning hulls relative to metric:

\[ \nabla H_m^R\{\ldots\} \subseteq \min\left(\frac{\alpha_{m_{IC}}}{T_{IC}}\right) \qquad (\text{set of smallest relative hulls}) \]

Moment hull gradients: autolearning hull moments relative to metric:

\[ \nabla H_m^M\{\ldots\} \subseteq f\left(\frac{\alpha_{m_{IC}}}{T_{IC}}\right) \qquad (\text{selected hull from } f()) \]

where $f(x)$ is a selected moment function (min, max, ave).

Training set learning: lowest target/reference hull compares:

\[ \mu_m\{\ldots\} \subseteq \min\left(\left(R_m^h - T_m^h\right)\frac{1}{T_{IC}}\right) \qquad (\text{smallest reference} - \text{target}) \]

Reinforcement learning into CSV: hull learning + training set learning:

\[ \mathrm{CSV}\{\ldots\} = \{\nabla H_m^R \cup \nabla H_m^M \cup \mu_m\} \qquad (\text{selected hull metrics} + \text{training set metrics}) \]
In summary, classifier learning takes place after the first-order set of hull gradient metrics and training set metrics are selected and stored in the CSV; further tuning and optimizations are made via agents using a form of reinforcement learning. Metrics are added or removed from the CSV during the training set learning process, and weights can be tuned by training overrides.
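The first step of hull learning, ranking metrics by how tight their hulls are, can be sketched as follows. The names `MetricHull` and `selectTightestHulls()` are hypothetical and exist only for this illustration; the sketch assumes each metric carries a single hull value, takes the total metric count to be the number of hulls supplied, and keeps the `k` smallest relative hulls.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: one metric and its autolearning hull value.
struct MetricHull {
    std::string name;  // illustrative metric label
    double hull;       // autolearning hull value for this metric
};

// Return the k metrics with the smallest relative hull (hull / total metric count).
// The argument is taken by value so the caller's ordering is left unchanged.
inline std::vector<MetricHull> selectTightestHulls(std::vector<MetricHull> hulls,
                                                   std::size_t k) {
    const double total = static_cast<double>(hulls.size());
    std::sort(hulls.begin(), hulls.end(),
              [total](const MetricHull& a, const MetricHull& b) {
                  return (a.hull / total) < (b.hull / total);
              });
    if (hulls.size() > k) hulls.resize(k);
    return hulls;
}
```

Dividing by the total count does not change the ordering within one set; it is kept here only to mirror the relative-hull expression. The selected metrics would then be candidates for a CSV, to be tuned further by training.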
Autolearning Hull Reference/Target Differences
The hull used for a reference → target compare differs from the hull used for a target → reference compare, because the MCC correspondence functions use the reference autolearning hull $\sigma_{H_R}$. Each autolearning hull is created by self-comparing the raw image genome feature metric to the sharp image and retinex image genome feature metrics respectively, so each genome has a unique set of autolearning hulls for each feature metric. For an illustration of hull use, here is an overview of one of the common VGM autolearning hull function methods.

\[ \sigma_{H_R} = \text{autolearning hull for reference image} \]

where:

\[ \sigma_{H_R} = |\mathrm{raw}_m - \mathrm{sharp}_m| \quad \text{or} \quad \sigma_{H_R} = |\mathrm{raw}_m - \mathrm{retinex}_m|, \qquad m = \text{metric} \]

For example, to compute the correspondence score between two metrics, one metric $m$ is computed for each of the two genomes: a reference metric $R_m$ and a target metric $T_m$. The difference $|R_m - T_m|$ is compared to the autolearning hull value $\sigma_{H_R}$ for the reference genome, so that scores of 0.0 are a perfect match, scores < 1.0 are considered to be matches, and scores > ~2 are considered to be misses, as follows:

\[ m' = \frac{|R_m - T_m|}{\sigma_{H_R}} \]

where $m'$ = metric correspondence score, $R_m$ = reference metric, and $T_m$ = target metric.
As a result, correspondence results may vary by about 10% depending on which hull is used: reference hull or target hull. The default for all MCC classifiers is to use the reference hull; however, an agent is free to call each MCC function twice, switching the order of target and reference genomes and averaging the scores or using another normalization method. The MCC functions use the compute_best_match() function, which contains numeric overflow/underflow processing as well as several parameters and scoring options. When each MCC function is called, the scoring parameter can be set to achieve various results; for example, the CSV agent match__**() convenience functions which call the MCC functions may adjust the scoring parameters as follows:

    AVE_SCORE: average of retinex and sharp HULL compare ratios
    MIN_SCORE: minimum of retinex and sharp HULL compare ratios
    MAX_SCORE: maximum of retinex and sharp HULL compare ratios
    RELAXED_HULL_SCORE: compare against a wider, relaxed_HULL = (sharp_HULL + retinex_HULL)
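The suggestion of calling a compare twice with the genomes swapped and averaging the results can be sketched with plain numbers. `hullScore()` and `symmetricScore()` are hypothetical helpers, not MCC functions; each genome is reduced here to one metric value and one hull value, and a zero hull falls back to the raw difference, loosely mirroring the zero-hull handling in compute_best_match().

```cpp
#include <cmath>

// Hull-normalized score; the hull belongs to whichever genome acts as the reference.
inline double hullScore(double reference_metric, double target_metric,
                        double reference_hull) {
    const double diff = std::fabs(reference_metric - target_metric);
    if (diff == 0.0) return 0.0;             // perfect match
    if (reference_hull == 0.0) return diff;  // assumption: fall back to the raw difference
    return diff / reference_hull;
}

// Score genome A against genome B in both directions and average the two results.
inline double symmetricScore(double metric_a, double hull_a,
                             double metric_b, double hull_b) {
    const double a_as_reference = hullScore(metric_a, metric_b, hull_a);
    const double b_as_reference = hullScore(metric_b, metric_a, hull_b);
    return 0.5 * (a_as_reference + b_as_reference);
}
```

With metric values of 10 and 13 and hulls of 6 and 4, the two directional scores are 0.5 and 0.75, and the averaged score is 0.625. Averaging is only one possible normalization; an agent could equally take the minimum or maximum of the two directions.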
Structured Classifiers Using MCC Classifiers
As shown in Figure 4.6, agents construct a structured classifier using a set of Metric Combination Classifier (MCC) functions. The agent model allows for any type of classifier structure to be created, such as a single classifier, a tree of classifiers, or a sequence of classifiers. A top-level classifier is the last classifier in a stage of classifiers. There are over 100 MCC functions in the VGM platform for each of the CSTG base metrics. Each MCC is flexible and can take hundreds or thousands of classification parameter combinations. Each MCC uses a selected set of distance functions and metric spaces, providing over 56,000 metric comparison options. Thus, MCCs allow each agent to create complex structured classifiers, typically containing several million parameters (see Chapter 5 for details).
Figure 4.6: This illustrates a structured classifier in a tree structure. An agent may compose a structured classifier from MCC classifiers, and each MCC is computed over selected feature comparison metrics across the CSTG bases.
VDNA Sequencing and Unique Genome IDs Each sequenced genome is assigned a unique genome ID as a 64-bit unsigned number, as discussed in Chapter 3, allowing for 18,446,744,073,709,551,615 different genomes. The intent of VDNA sequencing into a persistent visual memory is to grow the VGM to contain as many unique genomes as possible. For example, an application may sequence at 2-bit resolution or 4-bit resolution, instead of 8-bit resolution, to build up a large enough set of unique genomes to begin finding recurring genomes. The lower bit resolution reduces the number of unique genomes possible. In an 8-bit image genome space, it is unclear how often genomes will recur across images without extensive testing using very large-scale computing resources. Since the memory model can grow for as long as new images are presented and new genomes are recorded, eventually duplicate genomes will already be stored in the memory, allowing for genome IDs to be shared. As learning continues, existing genomes begin to be found more frequently in new impressions. Since each genome is pointed to by a genome ID, eventually entire images could be represented by genome ID codes representing the same concepts, resulting in a reduced memory representation but also leading to knowledge about image similarity.
A future area for visual genome and VDNA sequencing research involves sequencing 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 7-bit, and 8-bit pixel genome impressions to find commonality and repeating genomes, as well as commonality among VDNA features and counting the number of VDNA metric impressions across images and across the entire visual memory space. Perhaps, recurring genomes at a lower resolution may prove useful as segmentation guidance metrics or for short-listing correspondence candidates for further analysis at 8-bit resolution. We refer to this as a quantization space approach to finding metrics, and it seems that the magno feature path into the LGN may operate in a reduced dimensional quantization space, due to the smaller size of cones and the transitory nature of the magno features. Also, counting occurrences of matching VDNA feature metrics sequenced in an image using a low resolution such as 4 bits, or even VDNA within range of values, can be a useful metric for image analysis.
Correspondence Signature Vectors (CSV) Agents can create a correspondence signature vector (CSV) containing any number of selected VDNA metrics. The signature vector is a convenience structure to aggregate the scores of a group of metrics together, which is useful to establish the best feature metrics to use for a particular type of genome. Agents create CSVs using the data structures discussed in Chapter 5. In addition, the default CSV agents use signature vectors during interactive reinforcement learning sessions, as discussed in Chapter 5 and Chapter 11. This section illustrates how agents use CSVs during the reinforcement learning process to record and tune the best corresponding metrics in a CSV. As shown in Figure 4.7, two genome regions of the palm tree are identified, highlighted, and sequenced into genomes and VDNA metrics, which are recorded into the CSV shown in Table 4.2. The CSV records the best scoring metrics from ten different agents, defined as different CSV agent parameterizations: NORMAL, STRICT, RELAXED, HUNT, PREJUDICE, BOOSTED, RGB_VOLUME_RAW, RGB_VOLUME_MIN, RGB_VOLUME_LBP, RGB_VOLUME_AVE. The ten agents, shown in the last column of Table 4.2, each collect a correspondence signature vector shown as a line in the table, selected from the default first order autolearning hulls, finding the best scores for nine selected feature metrics taken across all images (raw, sharp, blur, retinex, histeq) across all color components (RGBI).
Figure 4.7: Two genome regions identified for comparison. Selected shape metrics correspondence using autolearning hulls is shown in Table 4.2.
Table 4.2: A correspondence signature vector set of nine selected shape metrics produced by ten different agents on the two genomes in the image of Figure 4.7
Columns (SHAPE metrics): cent__, dens__, dens__, disp__, disp__, full___, full___, spread, spread; final column: AGENT NAME (MATCH CRITERIA)
Rows (agents): NORMAL, STRICT, RELAXED, HUNT, PREJUDICE, BOOSTED, RGB_VOLUME_RAW, RGB_VOLUME_MIN, RGB_VOLUME_LBP, RGB_VOLUME_AVE
[Table 4.2 score values not shown]
Note in this example that the 8-bit centroid values in column 1 (cent_8) of Table 4.2 range between a high of 4.83 and a low of 0, with the lowest correspondence score of 0.0 coming from the RGB_VOLUME_LBP agent (row 9) using an LBP feature. So, the ensemble of ten agents has learned the best first order features and CSV agent parameterization.
Also note that the centroid-8 bit correspondence is measured at 0.1667 for the HUNT agent and the BOOSTED agent, illustrating how sometimes the CSV agents agree. The correspondence of 0.0 is perfect, and correspondence > 1.0 exceeds the autolearning hull range. Details on correspondence computations for each default CSV agent shown in Table 4.2 are provided in Chapter 5. Examples using other CSTG base features in a reinforcement learning agent ensemble are provided in Chapter 11.
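Picking the best-scoring agent for one metric column of a set of signature vectors can be sketched as below. The types `SignatureVector` and `BestScore` and the function `bestAgentForMetric()` are hypothetical and are not the Chapter 5 data structures; each CSV is reduced to an agent name plus a list of hull-normalized scores, where lower is better and 0.0 is a perfect match.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical minimal CSV: one agent and its per-metric correspondence scores.
struct SignatureVector {
    std::string agent_name;      // e.g. "NORMAL", "STRICT"
    std::vector<double> scores;  // one hull-normalized score per metric
};

struct BestScore {
    std::string agent_name;
    double score;
};

// For one metric column, find the agent with the lowest score.
// On ties the first agent encountered is kept.
inline BestScore bestAgentForMetric(const std::vector<SignatureVector>& csvs,
                                    std::size_t metric_index) {
    BestScore best{"", std::numeric_limits<double>::infinity()};
    for (const auto& csv : csvs) {
        if (metric_index >= csv.scores.size()) continue;  // agent has no such metric
        if (csv.scores[metric_index] < best.score) {
            best = {csv.agent_name, csv.scores[metric_index]};
        }
    }
    return best;
}
```

Using the three centroid values quoted in the text (0.1667 for HUNT and BOOSTED, 0.0 for RGB_VOLUME_LBP), the function selects RGB_VOLUME_LBP for that column.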
Alignment Spaces and Invariance
VGM features exist in alignment spaces, providing richer correspondence options. Alignment spaces provide a way to project features into alignment prior to correspondence, using a family of alignment criteria. For example, assume two images of the same object are taken in different lighting conditions, one dark and one bright. An alignment space can be used to first line up and remap the pixels of the target and reference regions at the respective histogram centroids to normalize the lightness offset prior to correspondence, as illustrated in Figure 7.2. Alignment spaces include the image spaces (raw, sharp, retinex, histeq, blur), volume projection spaces, the color spaces (RGBI), the sliding color spaces, and color leveling spaces. For example, Chapter 7 discusses methods to emulate the eye's lighting invariance methods of squinting and pupil size changes to allow for color lighting invariant comparisons by sliding color histograms within a value range for a series of comparisons. Also, volumetric projections of neural clusters, discussed in Chapter 8, can be centroid-aligned prior to comparison to emulate lighting invariance as well. Future versions of the VGM will be expanded to support additional alignment spaces appropriate for each CSTG base. For VGM training, agents can develop alignment spaces by simply creating more image spaces. For example, by permuting the image space and creating a set of images across a lightness scale for the raw image from dark to light, lightness invariance is increased, and CSV agents can be used to train across the expanded metric space, as well as locate genomes in expanded alignment image spaces. Agents can then store the alignment spaces for future reference. Conventional training protocol image permutations (rotations, scales, etc.) are intended to add invariance to the training set and resulting model, but after training the permutations are no longer a part of the model.
However, the alignment spaces in the VGM are different: alignment spaces remain part of the synthetic feature model, adding levels of invariance.
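Centroid alignment for lightness can be sketched for a single channel. This is a simplified illustration and not the VGM alignment code: `centroid()` and `alignToReference()` are hypothetical names, the histogram centroid is taken to be the mean pixel value, and alignment is a uniform offset with clamping to the 8-bit range. The sketch uses `std::clamp`, so it assumes C++17.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <numeric>
#include <vector>

// Histogram centroid of a region, computed as the mean pixel value.
inline double centroid(const std::vector<std::uint8_t>& px) {
    if (px.empty()) return 0.0;
    return std::accumulate(px.begin(), px.end(), 0.0) / static_cast<double>(px.size());
}

// Shift the target pixels so that their centroid lines up with the reference centroid.
inline std::vector<std::uint8_t> alignToReference(
        const std::vector<std::uint8_t>& target,
        const std::vector<std::uint8_t>& reference) {
    const long offset = std::lround(centroid(reference) - centroid(target));
    std::vector<std::uint8_t> out;
    out.reserve(target.size());
    for (std::uint8_t p : target) {
        const long shifted = std::clamp<long>(p + offset, 0, 255);  // stay within 8 bits
        out.push_back(static_cast<std::uint8_t>(shifted));
    }
    return out;
}
```

A dark region {40, 60, 80} has a centroid of 60 and a bright reference {100, 120, 140} has a centroid of 120, so the target is shifted by 60 and lines up exactly with the reference. Real regions would rarely align this cleanly, and clamping distorts values near 0 or 255.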
Agent Architecture and Agent Types
Agents do not have a specific architecture, but all are bound to use the VGM Platform API in order to access the genomes and VDNA stored in visual memory. In this way, an agent is programmable logic, which leverages visual genomes and VDNA. Human learnings are analogous to agent learnings; learnings are like recipes, or processes we hold in memory, follow, and change as we learn and improve. Thus, the value of learning style monikers, such as supervised, reinforcement, or any other moniker, is very primitive and limited, compared to human learning and reasoning capabilities as expressed in agents. A wide range of specific tasks and applications are enabled in the agent architecture; a few examples are listed here:
– Retinal processing controller agent: An agent called by the sequencer controller may control the eye and LGN models by changing the image pre-processing to simulate various camera parameters, which changes the visual genomes and VDNA produced.
– Training protocol controller agent: An agent called by the sequencer controller to provide specific image pre-processing to train for conditions in a specific environment, as well as to produce different autolearning hulls using variations in match criteria.
– Hypothesis controller agent: An agent called by the correspondence controller may implement various methods to examine and compare objects against a range of hypotheses using a specific set of agents trained using reinforcement learning and interactive training.
– Environmental controls agent: An agent called by the correspondence controller can provide environmental processing to the pixel values within each feature, either at sequencing time or correspondence time, to allow for hypothesis testing of environmental conditions affecting color, lighting, and shading. For example, colorimetric and environmentally accurate pixel processing can be performed by the agent to alter color and luminance to simulate daylight and shadows for different times of the day and change pixel values for seasonal lighting, cloud cover, rain, snow, fog, noise, or haze.
Also, environment-specific genomes can be computed, and classification can be repeated. Such environmental hypothesis testing is highly relevant for surveillance and military applications. Custom agents can be added to the VGM as discussed next.
Custom Agents Custom agents are developed as a C++ DLL and bound to the VGM library. Custom agents use the VGM API to read and write the visual memory, compute and compare VDNA metrics using the supplied metrics functions, create strands and bundles, and perform correspondence. Custom agents can perform special processing during the genome sequencing phase and the correspondence phase, as discussed next.
The sequencer controller is provided for convenience to call registered custom agents as each genome is sequenced, allowing an opportunity for special processing and recording of additional feature metrics, strands, or bundles. Sequencing is performed in a loop by the sequencer controller so each genome region is processed separately, and external agents can be called for special processing for each genome. The sequencer controller manages region segmentation as discussed in Chapter 2 in the “Processing Pipeline” discussion. Also, see Chapter 5 for more details on the sequencer controller API. The correspondence controller, discussed in detail in Chapter 5, automatically loops through each genome in the target image to look for a reference genome and calls a list of registered custom agents which may determine correspondence, create strands and bundles, highlight genomes in the image, or record state information in files for final classification. For example, the agents listed in the last column of Table 4.2 were each called by the correspondence controller for each of the two compared genomes from Figure 4.7 to generate independent correspondence metrics.
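The controller pattern described above, a loop over genomes that calls every registered custom agent, can be sketched generically. The class `SequencerController`, the struct `Genome`, and the method names below are hypothetical stand-ins chosen for this illustration; the actual controller API is described in Chapter 5 and is not reproduced here.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal genome record.
struct Genome {
    std::uint64_t id;
};

// Sketch of a controller that calls registered agents once per genome.
class SequencerController {
public:
    using Agent = std::function<void(const Genome&)>;

    // Register a named callback to run for each sequenced genome.
    void registerAgent(std::string name, Agent agent) {
        agents_.emplace_back(std::move(name), std::move(agent));
    }

    // Process each genome region separately, calling every registered agent in order.
    void sequence(const std::vector<Genome>& genomes) const {
        for (const Genome& g : genomes) {
            for (const auto& entry : agents_) entry.second(g);
        }
    }

private:
    std::vector<std::pair<std::string, Agent>> agents_;
};
```

An agent registered as a lambda could, for instance, record extra feature metrics for each genome ID it receives. In the real platform the agents are DLLs bound to the VGM library rather than in-process lambdas.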
Master Learning Controller: Autogenerated C++ Agents
The VGM sample code in the open source distribution provides a default master learning controller which autogenerates C++ code to create specific agents designed via an interactive training process. The MCC functions and the default CSV agents are called in the autogenerated C++ code. The interactive learning process follows a form of reinforcement learning or trial and error learning. The default CSV agents produce correspondence signature vectors selected by the trainer that guides reinforcement learning. To set the background and context for how the master learning controller works, we first review a fully nonautomated method of coding an agent to learn the best feature metrics, which involves the following steps, which can be performed interactively using the command-line tools provided with the VGM API discussed in Chapter 5:
1. Sequence the image
2. Examine the global image metrics
3. Examine the local image metrics
4. Examine the metrics comparison between selected test genomes
5. Create strands defining objects
6. Iterate the following reinforcement learning process:
   o Select the best metrics
   o Select the API calls to get the selected metrics
   o Tune metric parameters, create dependent metrics as needed
   o Retest
7. Finalize and select the best metrics, API calls, and parameter settings
8. Code the agent to make the calls
9. Use the default correspondence controller agent to test the agent in target images
As discussed in the “Reinforcement Learning Survey” section earlier in this chapter, to implement a form of reinforcement learning, the master learning controller generates code to call several default CSV agents with a full range of overrides and parameterizations in an ensemble, where each CSV agent uses different correspondence methods and selected feature metrics. Then each CSV agent creates a CSV as output. The master learning agent collects and compares all the correspondence signature vectors from each CSV agent to find the best scores and then autogenerates C++ code complete with API calls with correct parameter settings to duplicate the best performing parameterized CSV agent function calls for a new custom agent.
Default CSV Agents
Here we provide an introduction to the default CSV agents, which are provided as sample code with the platform API. The CSV agents aggregate metrics collection and correspondence (see Chapter 5). Each CSV agent provides the following attributes:
– Agent Name: The default agent names are summarized here:
   o mr_smith: Computes basic metrics over separate RGB volumes and 2D genome regions.
   o persephone: Computes volumetric metrics over an RGB projection. See Chapter 6 for details on volumetric projections.
   o mr_jones: Like mr_smith, except metrics are computed over LUMA space, with no color metrics.
   o neo: This agent computes a very wide range of metrics.
– Match criteria: Used mainly for scoring bias, the match criteria control feature metric API parameters and also weight adjustments for scaling classification results of intermediate scores and final scores, biasing the results for the desired prejudice or relaxation.
– Overrides: Used mainly for algorithm and API parameter bias, the overrides provide capabilities to change the scoring algorithms by enabling specific API calls and parameter settings, which bias intermediate and final scoring.
– Correspondence Signature Vectors: The CSV agents output a CSV, as discussed previously in this chapter, containing scores for selected features.
Each agent uses a MATCH_CRITERIA parameter and an AGENT_OVERRIDES parameter to set parameters for metrics API functions and to bias scoring. Each default agent is tuned to provide a unique personality. For example, MATCH_CRITERIA_NORMAL uses the AVERAGE value of selected metrics groups with no bias (weight = 1.0), and MATCH_CRITERIA_STRICT looks for the MAX, or worst scoring metrics from the available options (MIN, AVE, MAX).
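The difference between the NORMAL and STRICT personalities can be sketched as two ways of aggregating a group of metric scores. `MatchCriteria` and `aggregateScores()` are hypothetical names for illustration only and do not reproduce the MATCH_CRITERIA handling in the sample code; the sketch assumes hull-normalized scores where lower is better, and applies the bias weight as a simple multiplier.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical subset of match criteria.
enum class MatchCriteria { Normal, Strict };

// Aggregate a group of metric scores according to the match criteria.
// Normal uses the average; Strict uses the maximum (worst) score.
inline double aggregateScores(const std::vector<double>& scores,
                              MatchCriteria criteria, double weight = 1.0) {
    if (scores.empty()) return 0.0;  // assumption: no data yields a neutral 0.0
    double value = 0.0;
    switch (criteria) {
        case MatchCriteria::Normal:
            value = std::accumulate(scores.begin(), scores.end(), 0.0) /
                    static_cast<double>(scores.size());
            break;
        case MatchCriteria::Strict:
            value = *std::max_element(scores.begin(), scores.end());
            break;
    }
    return value * weight;
}
```

For scores of 0.25, 0.5, and 0.75, the NORMAL aggregate is 0.5 while the STRICT aggregate is 0.75, so the strict agent is harder to satisfy against the same hull threshold. A caller should check for an empty score group separately, since the 0.0 returned in that case would otherwise read as a perfect match.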
Agent Ecosystem The visual genome platform allows researchers to collaborate and share agents, as well as using the default agents. An agent registry is included in the API for listing a description of each agent, registering new agents, and unregistering obsolete agents. Each agent is developed as a DLL (dynamic link library) and bound into the VGM. See Chapter 5 for details.
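The three registry operations named above (list, register, unregister) can be sketched with a small in-memory class. `AgentRegistry` and its methods are hypothetical names for this illustration and are not the VGM registry API; a real registry would also track the DLL bound to each agent. The sketch uses structured bindings, so it assumes C++17.

```cpp
#include <map>
#include <string>
#include <vector>

// Sketch of an agent registry keyed by agent name.
class AgentRegistry {
public:
    // Register a new agent; returns false if the name is already taken.
    bool registerAgent(const std::string& name, const std::string& description) {
        return agents_.emplace(name, description).second;
    }

    // Remove an obsolete agent; returns false if the name was not registered.
    bool unregisterAgent(const std::string& name) {
        return agents_.erase(name) > 0;
    }

    // List "name: description" for every registered agent, sorted by name.
    std::vector<std::string> list() const {
        std::vector<std::string> out;
        for (const auto& [name, description] : agents_) {
            out.push_back(name + ": " + description);
        }
        return out;
    }

private:
    std::map<std::string, std::string> agents_;
};
```

Registering the same name twice is rejected rather than overwritten, which is one reasonable policy when several researchers share agents; overwriting with a version field would be another.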
Summary This chapter provides an overview of the synthetic learning and reasoning model, which emulates the higher-level consciousness and learning centers in the prefrontal cortex (PFC) thought controls as a set of agents. Common machine learning models and mathematical tools are surveyed to provide background and set the context to differentiate the synthetic model learning styles, particularly volume learning, which encompasses a multivariate and multidimensional metric space with over 16,000 dimensions in the current model. Several applications of volume learning were enumerated, based on various styles of learning. The autolearning method and the autolearning hull were introduced as a first order method to model feature similarity, to automatically establish plausible baseline thresholds for comparing feature metrics during correspondence. The default agents provided in the platform were introduced, along with the concept of correspondence signature vectors (CSVs) to collect selected feature metrics for further tuning and training. The chapter also outlines how the master learning agent can use the default agents and the CSVs to perform a variant of reinforcement learning.
Chapter 5 VGM Platform Overview In the field of observation, chance favours only the prepared mind.
―Louis Pasteur
Overview This chapter provides an overview of the visual genome model (VGM) platform, including registries, controllers, functions, APIs, and data structures. The VGM platform allows for cooperative development of the visual genome project, as well as other specialized projects, and can be deployed in several configurations:
– Cloud server: IoT (internet of things) devices, phones, tablets, or other remote devices can use a C++ or SOAP/REST API to access a VGM cloud server.
– Devices: The entire VGM can be hosted on any suitable device with adequate compute power.
A major goal for this chapter is to provide an overview of the base infrastructure of the VGM, to be followed by detailed discussions of each CSTG feature metric API and the corresponding algorithms in Chapters 7–10. The following major topics are covered in this chapter:
– Overviews: Feature metrics, distance functions, and invariance
– VGM database and registries: Image registry, agent registry, strand registry, formats and structures
– Controllers: For sequencing VDNA, launching agents, automated learning, and interactive training
– Correspondence signature vector (CSV) agents, signatures, group classifiers: Base metrics, strands, bundles, correspondence signature vectors, grouping metrics, and scoring
– Metric combination classifiers (MCCs): Multiclassifier structures, trees and networks, and final scoring methods
– Custom agents: Developing agents as DLLs to integrate into the VGM
– VGM platform infrastructure: Overview of features and functionality
To illustrate the VGM platform operation, Chapter 11 provides source code examples and test results, showing how to take advantage of CSV agents, CSV signatures, and MCC functions.
DOI 10.1515/9781501505966-005
Feature Metrics, Old and New As shown in Figure 5.1, feature metrics are computed for each genome region across the base features CSTG (see the “Base Genome Metrics” section later in this chapter for details on each metric). The result is a huge multivariate feature volume. Distance functions are provided separately for each type of metric function; some of the distance functions are common to math and statistics, and some are new and specifically developed for the VGM. Distance functions are the key to effective correspondence. As said well by Cha [73]: “The importance of finding suitable distance and similarity measures cannot be overemphasized. There is a continual demand for better ones.”
Figure 5.1: A very high-level view of the multidimensional feature space showing individual CSTG metric spaces and features. Over 16,000 individual feature metrics are computed for each genome.
This chapter provides an overview of the VGM platform feature metrics to serve as background for the more detailed discussions in Chapters 7–10 for each base CSTG metric. For good reading on distance and similarity metrics, Cha [72] provides a survey of 65 different distance functions grouped into families, with practical discussions. In addition, Duda et al. [74] and Deza [75] provide valuable references.
Invariance In this section we provide a general discussion on robustness and invariance provided in the VGM feature metrics. Each individual feature metric provides varying degrees of invariance and robustness. Since different degrees of invariance are built into each metric, the agents may combine metrics together and add new ones rather than relying on a single metric for required invariance attributes (generally a good practice). Invariance or robustness criteria should be evaluated separately for each metric during testing. Local feature descriptors [1] such as SIFT, FREAK, and ORB are often developed with good invariance to scale, rotation, and noise. The human visual system is selectively robust to changes in lighting, contrast, color, and scale, but often less invariant over rotation, symmetry transforms, and geometric transforms. For example, upside-down or mirrored alphabetic characters are not easily recognized. Even faces, when mirrored, may be slightly different and present slight challenges. A working list and summary discussion of general invariance and robustness across the feature metrics functions is provided in Table 5.1.
Table 5.1: General description of invariance and robustness metrics across the visual genome feature metrics functions. Table design after [1]
Invariance Attribute
Discussions
Scale
Scale invariance is not naturally supported in the eye/LGN model, but evidently the visual cortex processing centers extrapolate into some sort of image resolution pyramid, common in computer vision applications []. Agents can emulate additional scale invariance by image downsampling on each genome region, followed by computing and comparing metrics. Upscaling can be performed by the agent as well. For color metrics, scale invariance is less critical.
Rotation
Rotational invariance is addressed for texture and shape partially by the genome edge orientations (/// degree edge detectors) which can be applied to each image easily. For color metrics, rotation invariance is less critical.
Other Affine transforms Mirroring Translation
Some support in the texture metrics and color metrics; mixed support elsewhere. Strands support most all affine transforms.
Brightness Contrast Uneven illumination Vignette
Partially covered by the eye/LGN model, which provides raw, sharp, retinex, global histeq, and blur images. Also, the color-related metrics provide a wide range of color space transformations, including changes to illumination curves and illumination spaces.
Color Accuracy Variations Vignette
Color accuracy is addressed in great detail in the color metrics, which support a wide range of feature searches across color spaces and color enhancements. Also partially covered by the eye/LGN model which provides raw, sharp, retinex, global histeq, and blur.
Clutter, Occlusion, Clipping
Clutter, occlusion, and clipping may obscure parts of objects. Addressed through the use of strands of genomes, since a few genomes from the strand may be missing from an image, yet the rest of the strand can still be found.
Noise Texture Level Of Detail Sharpness, Blur, Motion Blur, Jitter, Judder
Partially covered by the eye/LGN model, which provides raw, sharp, retinex, global histeq, and blur images. Also covered in texture metrics, which provide several methods for examining feature level of detail.
Geometric Transforms, Warp, Radial Distortion
Not directly supported; however, some invariance is provided for color metrics since color metrics are computed cumulatively over the genome area.
Level Of Detail, Bit Depth
Level of detail is supported in the quantization space, since a resolution pyramid is provided over the image region in //// bit versions of the genome data.
Strands
Strands preserve topological relationships between sets of genomes within a strand-relative coordinate space, providing some scale, rotation, and occlusion invariance.
Visual Genomes Database In this section we provide an overview of the VGM database files and the agent file management API, following from the model concepts discussed in Chapter 2 in the “Processing Pipeline Flow” section (see Table 2.2). As shown in Figure 5.2, the VGM database includes various registries and files as follows: – Agent registry, containing all known agents. Each agent is a DLL inheriting from the C++ base agent class for full integration with the VGM. – Strand and bundle registry, containing references to all strands. Note that the strands are created relative to a reference image, and the strands reside in the directory with the reference image.
– Image and genome registry. Each input image is sequenced, and the results are stored in an image-specific directory in the image registry, containing all corresponding metrics. The image registry is therefore potentially large.
Figure 5.2: The VGM database, illustrating how individual metric sections can be selected from each genome for the neuron encoder and QoS profiles.
Global Unique File ID and Genome ID The VGM provides for global unique IDs using unsigned 64-bit numbers for file_IDs and genome_IDs. A unique file_ID is assigned when an image is added to the database, and a unique genome_ID is assigned as each new genome is created. The total number of unique images is thus 2^64, or 18,446,744,073,709,551,616, which for comparison far exceeds the estimated 200–400 billion stars in the Milky Way galaxy. The unique 64-bit genome_ID is concatenated to the 64-bit file_ID to form the globally unique 128-bit genome ID (see Figure 5.3).
Figure 5.3: Unique file IDs and genome IDs in the VGM database.
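As a small illustration of the 128-bit ID, the sketch below pairs a 64-bit file_ID with a 64-bit genome_ID and prints them as two 16-digit hexadecimal fields. The struct name GlobalGenomeID and the helper to_hex() are hypothetical; placing the file_ID first and using an underscore separator are assumptions modeled on the metrics file names that appear in this chapter, not a documented VGM format.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical 128-bit global genome ID: a 64-bit file_ID paired with a
// 64-bit genome_ID. Field order is an assumption, not the VGM layout.
struct GlobalGenomeID {
    std::uint64_t file_id;
    std::uint64_t genome_id;
};

// Format the ID as two zero-padded 16-digit hex fields joined by '_'.
std::string to_hex(const GlobalGenomeID& id) {
    char buf[40];
    std::snprintf(buf, sizeof buf, "%016llx_%016llx",
                  static_cast<unsigned long long>(id.file_id),
                  static_cast<unsigned long long>(id.genome_id));
    return std::string(buf);
}
```

Keeping the two halves as separate fields, rather than packing them into a nonstandard 128-bit integer type, keeps the sketch portable across compilers.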
Neuron Encoder and QoS Profiles The VGM database supports neuron encoder profiles, enabling various levels of metric detail to be filtered out and extracted from the VGM format to eliminate nonessential metrics for a specific application. For example, after agents have been tested, nonessential metrics can be filtered out of the base metrics format into a custom profile for a given application. Neuron encoding is similar to the MPEG video standard, which provides quality of service (QoS) profiles supporting variable spatial resolution (4320p .. 1080i .. 240p), variable frame rate (2.5 to 30 frames per second), and variable pixel bit depth, providing variable bandwidth video streams. In a similar manner, the VGM format can be encoded into reduced-detail formats for specific applications to save memory space and I/O bandwidth. The limitations and details of the method are available via source code license (see Chapter 12).
Agent Registry Each agent is stored in the agent registry, which contains a catalogue of all agents, DLLs for each agent, a description of their purposes, and other details provided in the agent registry management API shown later in this chapter. The agent registry provides capability for agent downloads and intelligence on demand, similar to the movie The Matrix. Using the agent registry, an agent may request groups of agents to perform some activity, communicating via mailboxes kept in files. The default CSV agents provide a range of functionality, as discussed in Chapter 4, and are each listed in the following section.
Built-in CSV Agents The default CSV agents, shown below, are built into the VGM and provide a wide range of convenience functionality, as discussed in this chapter and Chapter 4.
// Default agents in agent registry
433657 Mar 9 13:05 Agent_Mr_Jones
433657 Mar 9 13:05 Agent_Mr_Smith
433657 Mar 9 13:05 Agent_Neo
433657 Mar 9 13:05 Agent_Persephone
Custom Agents in the Agent Registry Custom agents are developed as DLLs and bound to the VGM and are added to the agent registry to make them available to other agents. Once the agent is added to the registry via the function add_agent_into_registry(), the agent can also be added to the sequencer controller agent callback list or to the correspondence controller agent callback list, as discussed later in this chapter. The agent registry API is shown here and illustrated in source code in Chapter 11.
/////////////////////////////////////////////////////////////////////
// Global registry for all agents
//
// Use add_agent_into_registry() first to make the Agent visible
//
/////////////////////////////////////////////////////////////////////
typedef struct agent_tag {
    int agent_type; // SEQUENCER_AGENT | CORRESPONDENCE_AGENT | CUSTOM_AGENT
    U64 agent_ID;
    char agent_name[64];
    U64 timestamp;
    char agent_description[1024];
    char *agent_pathname; // pathname to Agent DLL
} agent_t;
typedef agent_t agent_registry_t[];

STATUS get_agent_registry_list(
    OUT agent_registry_t *agent_registry);
STATUS get_agent_pathname(
    IN U64 agent_ID,
    IN agent_registry_t *agent_registry,
    OUT char *DLLpathname);
STATUS add_agent_into_registry(
    int agent_type, // SEQUENCER_AGENT | CORRESPONDENCE_AGENT | CUSTOM_AGENT
    IN U64 agent_ID,
    IN agent_registry_t *agent_registry,
    IN char description[1024],
    IN char *DLLpathname);
STATUS remove_agent_from_registry(
    IN U64 agent_ID,
    IN agent_registry_t *agent_registry,
    IN char *DLLpathname);
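The registry operations listed above (add, look up, remove) can be pictured with a small in-memory stand-in. This is only a sketch under stated assumptions: AgentRecord and AgentRegistry are hypothetical names, the registry is keyed by agent ID in a std::map, and nothing here loads a DLL or touches the real VGM registry files.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Agent types named in the API comments above.
enum AgentType { SEQUENCER_AGENT, CORRESPONDENCE_AGENT, CUSTOM_AGENT };

// Hypothetical record holding the descriptive fields of one agent.
struct AgentRecord {
    AgentType type;
    std::string name;
    std::string description;
    std::string dll_pathname;
};

// Hypothetical in-memory registry keyed by agent ID.
class AgentRegistry {
public:
    // Returns false if the ID is already registered.
    bool add(std::uint64_t agent_id, const AgentRecord& rec) {
        return agents_.emplace(agent_id, rec).second;
    }
    // Returns false if the ID was not registered.
    bool remove(std::uint64_t agent_id) {
        return agents_.erase(agent_id) == 1;
    }
    // Returns nullptr if the ID is not registered.
    const AgentRecord* find(std::uint64_t agent_id) const {
        auto it = agents_.find(agent_id);
        return it == agents_.end() ? nullptr : &it->second;
    }
    std::size_t size() const { return agents_.size(); }

private:
    std::map<std::uint64_t, AgentRecord> agents_;
};
```

A caller would register an agent once, after which other agents could call find() with the agent ID to obtain the DLL pathname, which is the role get_agent_pathname() plays in the listing above.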
Each agent is built from the base class agent_dll_base_class{} to link to the VGM library. The standard agent_dll_base_class{} template provides separate function entry points for (1) agent-agent communications, (2) sequencer controller to agent callbacks, (3) correspondence controller to agent callbacks, and (4) master learning controller callbacks. Once the agent is registered, other interested agents may query the agent registry to find and use the agent. However, a custom agent may be devised independent of the agent registry and access all sequenced files and metrics APIs directly.
class agent_dll_base_class {
    /////////////////////////////////////////////////////////////////////
    // This is the base class for creating Agents.
    // Agents may be registered in the sequencer controller registry,
    // or the correspondence controller registry,
    // or the custom agent registry
    //
public:
    agent_dll_base_class();
    ~agent_dll_base_class();
    STATUS init();
    STATUS sequencer_controller_callback(
        global_metrics_structure_t * metrics, // contains genome_ID, file_ID, ...
        int sequence_number); // NULL when sequencing finished, 1..n otherwise
    STATUS correspondence_controller_callback(
        global_metrics_structure_t * metricsRef,
        global_metrics_structure_t * metricsTarget,
        metrics_comparison_t * compare_metrics,
        U64 strand_ID, // NULL, or strand_ID of target genome_ID under compare
        int strand_item, // NULL, or the genome item in strand_ID under compare
        int sequence_number); // NULL when all genomes compared, 1..otherwise
    STATUS agent_entry(
        U64 caller_AgentID,
        void *data);
    STATUS master_learning_controller_callback(
        global_metrics_structure_t * metricsRef,
        global_metrics_structure_t * metricsTarget,
        metrics_comparison_t * compare_metrics);
};
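To show how a custom agent might respond to the sequencer callback described above, the following sketch defines a simplified stand-in base class and one derived agent that counts the genomes it is handed. Everything here is an assumption for illustration: agent_base_sketch, counting_agent, the reduced global_metrics_structure_t stand-in, the use of virtual functions, and the convention that STATUS 0 means success are not taken from the VGM headers. The only behavior borrowed from the listing is that a sequence number of 0 (NULL) signals that sequencing has finished.

```cpp
#include <cstdint>

using STATUS = int;  // assumption: 0 means success

// Reduced stand-in for the real metrics structure (hypothetical).
struct global_metrics_structure_t {
    std::uint64_t file_id;
    std::uint64_t genome_id;
};

// Simplified stand-in for the agent base class (hypothetical; virtual
// dispatch is an assumption, not the documented VGM mechanism).
class agent_base_sketch {
public:
    virtual ~agent_base_sketch() = default;
    virtual STATUS sequencer_controller_callback(
        global_metrics_structure_t* metrics, int sequence_number) = 0;
};

// Example agent: counts genomes until the sequencer signals completion.
class counting_agent : public agent_base_sketch {
public:
    STATUS sequencer_controller_callback(
        global_metrics_structure_t* metrics, int sequence_number) override {
        if (sequence_number == 0) {  // 0 (NULL): sequencing finished
            finished_ = true;
            return 0;
        }
        if (metrics == nullptr) return -1;  // nothing to process
        ++genomes_seen_;
        return 0;
    }
    int genomes_seen() const { return genomes_seen_; }
    bool finished() const { return finished_; }

private:
    int genomes_seen_ = 0;
    bool finished_ = false;
};
```

A real agent would do its work where the counter is incremented, for example by reading fields of the metrics structure or creating strands.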
Image Registry A single image file, typically 24-bit RGB, is provided as input to the VGM. Noncompressed files, such as BMP or PNG files, are preferred. For each image file, a directory is created as a container to hold all the metrics and other files generated from the image during genome sequencing. The image name is used to create a top-level access point for a directory containing all other files associated with the image, in the format image_DIR/*. After invoking the sequencer via the vgv command line interface, or via the sequence_image(IN U64 *image_ID) function as discussed later in the “Sequencer Controller” section, each image directory will contain the image as well as several files generated by the sequencer controller for each genome as discussed in Chapter 2: – Image files: The source image and LGN space pre-processed images – Segmentation files: The segmentations of each local genome region in the image, saved into a bounding box rectangle image file and mask file – Metrics files: All the base metrics, autolearning metrics, stored in files
Each new image is added to the database using the image registry management functions shown in the following, and the image is placed inside a new directory named for the input file. For example, input image “Squirrel-0-1.png” is stored in a newly created directory called “Squirrel-0-1_DIR.”
// The input image, other files …
$ cd Squirrel-0-1_DIR; ls -sz
1158674 May 26 12:41 Squirrel-0-1.png
… other files, metrics files, strands, …
/////////////////////////////////////////////////////////////////////
// Global registry for all images
/////////////////////////////////////////////////////////////////////
typedef struct image_tag {
    U64 image_ID;
    char agent_name[64];
    U64 timestamp;
    char image_description[1024];
    void *genome_pathname;
} image_t;
typedef image_t *image_registry_t;
image_registry_t *global_image_registry;

STATUS get_image_registry_list(
    OUT image_registry_t *image_registry);
STATUS get_image_from_registry(
    IN U64 image_ID,
    OUT image_registry_t *image_registry);
STATUS add_image_into_registry(
    IN U64 image_ID,
    OUT image_registry_t *image_registry);
STATUS remove_image_from_registry(
    IN U64 image_ID,
    OUT image_registry_t *image_registry);
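The directory naming convention in the Squirrel example can be sketched as a small string helper. The function image_dir_name() is hypothetical and only reproduces the pattern visible in the example (strip any leading path and the file extension, then append "_DIR"); how the VGM actually derives the name is not specified here.

```cpp
#include <string>

// Derive the container directory name for an input image, following the
// pattern in the example: "Squirrel-0-1.png" -> "Squirrel-0-1_DIR".
// Hypothetical helper; not part of the VGM API.
std::string image_dir_name(const std::string& image_filename) {
    // Drop any leading directory components.
    const std::string::size_type slash = image_filename.find_last_of("/\\");
    const std::string base = (slash == std::string::npos)
                                 ? image_filename
                                 : image_filename.substr(slash + 1);
    // Drop the extension, if there is one.
    const std::string::size_type dot = base.find_last_of('.');
    const std::string stem = (dot == std::string::npos || dot == 0)
                                 ? base
                                 : base.substr(0, dot);
    return stem + "_DIR";
}
```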
A set of magno and parvo images are created from the input image according to the eye/LGN model discussed in Chapter 2. Each of the images is available via the Image Registry API.
$ cd Squirrel-0-1_DIR
$ show_MAGNO_PARVO_files
24068797 May 26 12:42 Parvo_RGB_histeq_of_Squirrel.png-0.png
20193408 May 26 12:42 Parvo_RGB_blur_of_Squirrel.png-0.png
28774580 May 26 12:42 Parvo_RGB_sharpen_of_Squirrel.png-0.png
8174078 May 26 12:42 Parvo_RGB_luminance_of_Squirrel.png-0.png
26498764 May 26 12:42 Parvo_RGB_retinex_of_Squirrel.png-0.png
25017921 May 26 12:42 Parvo_RGB_Squirrel.png-0.png
1134300 May 26 12:42 Magno_blur_of_Squirrel.png-0.png
1117549 May 26 12:42 Magno_histeq_of_Squirrel.png-0.png
1158674 May 26 12:42 Magno_luminance_of_Squirrel.png-0.png
1220248 May 26 12:42 Magno_sharpen_of_Squirrel.png-0.png
1158674 May 26 12:42 Magno_RAW_of_Squirrel.png-0.png
1166830 May 26 12:42 Magno_retinex_of_Squirrel.png-0.png
Strand Registry Strand and bundle structures are created either by agents or by using the VGM platform training and development tools and stored in *.strand files in the image_DIR/* database. Each strand is associated with a reference image and created from genomes in the reference image. The VGM platform tools are discussed later in this chapter. The strand and bundle model details are discussed in Chapter 3 (for example, see Figure 3.7). Strand management functions and strand file structures are shown here:
10408 Jul 17 13:37 /Squirrel_DIR/STRAND__0000000000000000_PalmTrunkTop.strand
/////////////////////////////////////////////////////////////////////
// Local registry for all strands in parent_image
/////////////////////////////////////////////////////////////////////
typedef struct strand_tag {
    int type; // [CENTROID | GLYPH]
    U64 parent_image_ID;
    U64 strand_ID;
    char strand_name[64];
    U64 timestamp;
    char strand_description[1024];
    void *strand_pathname;
} strand_t;
typedef strand_t *strand_registry_t;
strand_registry_t *global_strand_registry;

STATUS get_strand_registry_list(
    OUT strand_registry_t *strand_registry);
STATUS get_strand_pathname(
    IN U64 strand_ID,
    OUT strand_registry_t *strand_registry);
STATUS add_strand_into_registry(
    IN U64 strand_ID,
    OUT strand_registry_t *strand_registry);
STATUS remove_strand_from_registry(
    IN U64 strand_ID,
    OUT strand_registry_t *strand_registry);
Segmenter Intermediate Files Segmentation produces some intermediate files, derived from the input image, used to produce the set of genomes and base VDNA metrics. For each image fed into the segmenter pipeline (discussed in Chapter 2), two different global image segmentations are produced and stored in the database:
1. Binary segmentation: An image, such as *.png, with unique numbers assigned to the pixels in each segment; the segmentation file is named *MASK_100_JSLIC*. The pixels are in a binary format.
2. Text segmentation: A text image format. Text files represent each pixel as an ASCII number, with pixels separated by spaces and each row of pixels on its own line.
So if jSLIC and morpho segmentations are used, there will be two segmentation files stored for jSLIC and two for morpho, for a total of four segmentation files. As discussed in Chapter 2, multiple segmentations are advantageous and increase model expressiveness and robustness. Each global segmentation image is split into a set of local *.png mask files, each of which is the bounding box image containing one genome. Each mask file name encodes, as text, the coordinates of the genome region within the source image, as well as a text tag indicating the type of segmentation method used (jSLIC, morpho). The background pixel value in the mask files is set to binary 0x00000000, marking pixels outside the segmented region (see Chapter 2, Figure 2.10). Example segmentation file names are shown here:
48000000 May 26 12:43 gray_mask_JSLIC_4000_3000__3562.raw
446153 May 26 12:43 Parvo_Mask_100_JSLIC_Parvo_RGB_histeq_of_Squirrel.png-0.png
55912283 May 26 12:43 Parvo__100_JSLIC_segmentation_histeq__Squirrel.png_textdata
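A reader for the text segmentation format might look like the sketch below. It assumes only what the description states: each pixel is an ASCII number, pixels are separated by whitespace, and each text line holds one row of the image. The function name parse_text_segmentation() is hypothetical, and the sketch parses from an in-memory string rather than opening a *_textdata file.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parse whitespace-separated ASCII segment labels, one image row per line.
// Hypothetical helper; blank lines are skipped.
std::vector<std::vector<unsigned long>> parse_text_segmentation(
    const std::string& text) {
    std::vector<std::vector<unsigned long>> rows;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::vector<unsigned long> row;
        unsigned long label = 0;
        while (fields >> label) {
            row.push_back(label);
        }
        if (!row.empty()) {
            rows.push_back(row);
        }
    }
    return rows;
}
```

Pixels that share a label belong to the same segment, so collecting the bounding box of each label would be one way to derive the per-genome mask regions described above.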
Visual Genome Metrics Files Each genome region mask file is sequenced into a set of base metrics files, one file for each LGN space (raw, sharp, retinex, histeq, blur), following the model discussed in Chapter 2 in the section “Feature Metrics Generation.” Metrics files include the base metrics in the METRICS__* files, the corresponding AUTO_LEARNING__* files, and the MASTERGENOME__* files containing volumetric texture information in a quantization space pyramid of 8,5,4,3,2 bit resolution. Note that there is no compression of the images or features; they are stored at full input resolution.
272616 May 26 14:18 METRICS__0000000000000000_060e0b8100a300a3.metrics
131880 May 26 14:18 AUTO_LEARNING__0000000000000000_060e0b8100a300a3.comparisonmetrics
28232 May 26 14:18 MASTERGENOME__0000000000000000_060e0b8100a300a3_RAW_IMAGE.genome.Z
28683 May 26 14:18 MASTERGENOME__0000000000000000_060e0b8100a300a3_RETINEX_IMAGE.genome.Z
46442 May 26 14:18 MASTERGENOME__0000000000000000_060e0b8100a300a3_SHARP_IMAGE.genome.Z
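The quantization space pyramid of 8,5,4,3,2 bit resolution can be pictured with a simple bit-depth reduction. The sketch below assumes that a lower-resolution level is obtained by discarding low-order bits of an 8-bit sample; this is a common way to quantize and is used here only for illustration, since the actual VGM quantization method is not described in this section. The names kPyramidBits, quantize(), and quantize_pyramid() are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Bit resolutions of the pyramid levels named in the text.
constexpr std::array<int, 5> kPyramidBits = {8, 5, 4, 3, 2};

// Reduce an 8-bit sample to `bits` of resolution (1..8) by dropping
// low-order bits. Assumed method, for illustration only.
std::uint8_t quantize(std::uint8_t sample, int bits) {
    return static_cast<std::uint8_t>(sample >> (8 - bits));
}

// Quantize one sample at every pyramid level.
std::array<std::uint8_t, 5> quantize_pyramid(std::uint8_t sample) {
    std::array<std::uint8_t, 5> out{};
    for (std::size_t i = 0; i < kPyramidBits.size(); ++i) {
        out[i] = quantize(sample, kPyramidBits[i]);
    }
    return out;
}
```

For example, the 8-bit value 200 maps to 200, 25, 12, 6, and 3 at the 8, 5, 4, 3, and 2 bit levels, so coarser levels group progressively wider ranges of intensities together.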
Base Genome Metrics The separate CSTG base metrics are stored by the sequencer into a set of global_metrics_structures as shown below. From the base metrics, ~56,000 specific feature metrics are computed into the genome_compare_struct discussed below. The MCC feature metric distance functions can be used to compare base genomes, yielding ~56,000 distance metrics. Individual metrics are stored in the global_metrics_structure that follows and can be directly accessed by agents. However, rather than accessing metrics individually, the metrics combination classifier (MCC) functions, discussed later in this chapter, provide many convenience functions for efficiently accessing groups of selected metrics for evaluation. A pair of base metric structures are used by each MCC, as discussed later in this chapter.
//
// The MASTER ring that binds them all...
//
typedef struct global_metrics_structure {
    U64 file_id;
    char master_genome_filename[256];
    char metrics_filename[256];
    char auto_learning_filename[256];
    char mask_filename[256];
    char maskfile_png[256];
    char metadata_tag[256];
    U64 genome_id;
    long x_bounding_box;
    long y_bounding_box;
    long dx_bounding_box;
    long dy_bounding_box;
    long empty_mask_pixel_count;
    long full_mask_pixel_count;
    double pixel_displacement;
    long pixel_xcentroid, pixel_ycentroid;
    struct tagh {
        U32 histogram8bit[COLOR_HISTOGRAM_BIN_COUNT];
        float histogram8bit_normalized[COLOR_HISTOGRAM_BIN_COUNT];
        U32 min8, max8, mean8;
    } colors[N_COLOR_INDEXES];
    struct color_s {
        struct color_l { // These are textures
            Haralick_t Haralick; // 9 textures @ 4 orientations
            SDMX_t SDMX; // 15 textures @ 4 orientations
            U32 histogram_leveled8bit[COLOR_HISTOGRAM_BIN_COUNT];
            float histogram_leveled8bit_normalized[COLOR_HISTOGRAM_BIN_COUNT];
            U32 min_leveled8bit;
            U32 max_leveled8bit;
            U32 ave_leveled8bit;
            U32 peak_leveled8bit;
        } color_level[N_COLOR_LEVELS];
    } color_component[N_COLOR_INDEXES];
    struct popularity_color_level_tag {
        U32 popularity5[256];
        float popularity5_bin_percent[256];
        U32 popularity4[256];
        float popularity4_bin_percent[256];
    } popularity_color_level[N_COLOR_LEVELS];
    struct item_tag {
        char genome_filename[N_GENOME_BIT_SLOTS][128];
        U64 sample_count;
        U64 vmax[BIT_RESOLUTION]; //8-2 bits
        U64 empty[BIT_RESOLUTION]; //8-2 bits
        U64 full[BIT_RESOLUTION]; //8-2 bits
        U64 largest[BIT_RESOLUTION]; //8-2 bits
        double spread[BIT_RESOLUTION]; //8-2 bits
        double displacement[BIT_RESOLUTION]; //8-2 bits
        double density[BIT_RESOLUTION]; //8-2 bits
        U64 weights[BIT_RESOLUTION]; //8-2 bits
        coordinate3D_t volume_centroids[BIT_RESOLUTION]; //8-2 bits
    } item[NUMBER_OF_GENOMES][N_COLOR_INDEXES];
} global_metrics_structure_t[N_PREPROCESSED_IMAGES];
Genome Compare Scores The metrics_comparison_struct records the genome compare score between two sets of base metrics. Background details on scoring are provided in Chapter 4 in the “Correspondence Permutations and Autolearning Hull Families” section. To populate the metrics_comparison_struct, the difference between base metrics pairs from two genomes is taken, and the result is compared with the autolearning hull threshold and stored in the GENOMECOMPARE__* file. If the compare value exceeds the threshold (> 1.0), the match is not good; if the compare value is below the threshold (< 1.0), the match is within the autolearning hull range and considered a good match; zero difference is a perfect match. About 56,000 individual comparison metrics are available using the MCC functions. Agents may use any combination of comparison metrics to learn and build classifiers, as discussed in the section “Metric Combination Classifiers (MCCs)” later in this chapter.
//
// Difference of two separate global_metrics_structure_t's and autolearning hull
//
typedef struct metrics_comparison_struct {
    long dx_bounding_box_size_delta;
    long dy_bounding_box_size_delta;
    long empty_mask_pixel_count_delta;
    long full_mask_pixel_count_delta;
    double pixel_displacement_delta;
    long pixel_xcentroid_delta, pixel_ycentroid_delta;
    struct tagh {
        double color_histogram8bit_normalized_compare_SAD;
        double color_histogram8bit_normalized_compare_Hellinger;
        U32 min8_delta, max8_delta, mean8_delta;
    } colors[N_COLOR_INDEXES];
    struct color_s {
        struct color_l { // These are textures
            Haralick_compare_t Haralick; // 9 textures @ 4 orientations
            SDMX_compare_t SDMX; // 15 textures @ 4 orientations
            double histogram_leveled8bit_compare_SAD;
            double histogram_leveled8bit_compare_Hellinger;
            U32 min_leveled8bit_delta;
            U32 max_leveled8bit_delta;
            U32 ave_leveled8bit_delta;
            U32 peak_leveled8bit_delta;

            double SAD_lighting_centroid[METRIC_RESOLUTION]; //8,5 bits
            double Hellinger_lighting_centroid[METRIC_RESOLUTION]; //8,5 bits
            double Pearson_lighting_centroid[METRIC_RESOLUTION]; //8,5 bits
            double JensenShannon_lighting_centroid[METRIC_RESOLUTION]; //8,5 bits
            int JensenShannon_lighting_centroid[METRIC_RESOLUTION]; //8,5 bits

            int SAD_lighting_min[METRIC_RESOLUTION]; //8,5 bits
            int Hellinger_lighting_min[METRIC_RESOLUTION]; //8,5 bits
            int Pearson_lighting_min[METRIC_RESOLUTION]; //8,5 bits
            int JensenShannon_lighting_min[METRIC_RESOLUTION]; //8,5 bits

            double SAD_contrast_min[METRIC_RESOLUTION]; //8,5 bits
            double Hellinger_contrast_min[METRIC_RESOLUTION]; //8,5 bits
            double Pearson_contrast_min[METRIC_RESOLUTION]; //8,5 bits
            double JensenShannon_contrast_min[METRIC_RESOLUTION]; //8,5 bits

            int SAD_contrast_centroid[METRIC_RESOLUTION]; //8,5 bits
            int Hellinger_contrast_centroid[METRIC_RESOLUTION]; //8,5 bits
            int Pearson_contrast_centroid[METRIC_RESOLUTION]; //8,5 bits
            int JensenShannon_contrast_centroid[METRIC_RESOLUTION]; //8,5 bits
        } color_level[N_COLOR_LEVELS];
    } color_component[N_COLOR_INDEXES];
    struct popularity_color_level_tag { // METRICS: 8x5=40
        double popularity5_overlap_delta;
        double popularity4_overlap_delta;
        double popularity5_overlap_bestmatch_delta;
        double popularity4_overlap_bestmatch_delta;
        double popularity5_SAD_standard_colors_delta;
        double popularity4_SAD_standard_colors_delta;
        double popularity5_HAMMING_standard_colors_delta;
        double popularity4_HAMMING_standard_colors_delta;
        double popularity5_overlap_proportional_delta;
        double popularity4_overlap_proportional_delta;
        double popularity5_bestmatch_SAD_proportional_delta;
        double popularity4_bestmatch_SAD_proportional_delta;
    } popularity_color_level[N_COLOR_LEVELS];
    struct tag {
        char genome_filename[N_GENOME_BIT_SLOTS][128]; // one slot per filename
        double SAD_genome_correlation[N_GENOME_BIT_SLOTS];
        double IntersectionSAD_genome_correlation[N_GENOME_BIT_SLOTS];
        double SSD_genome_correlation[N_GENOME_BIT_SLOTS];
        double IntersectionSSD_genome_correlation[N_GENOME_BIT_SLOTS];
        double Hellinger_genome_correlation[N_GENOME_BIT_SLOTS];
        double IntersectionHellinger_genome_correlation[N_GENOME_BIT_SLOTS];
        double Hamming_genome_correlation[N_GENOME_BIT_SLOTS];
        double IntersectionHamming_genome_correlation[N_GENOME_BIT_SLOTS];
        double pyramid_Chebychev_genome_correlation[N_GENOME_BIT_SLOTS];
        double IntersectionDivergences_genome_correlation[N_GENOME_BIT_SLOTS];
        double Outliermagnitude_genome_correlation[N_GENOME_BIT_SLOTS];
        double Outlierratio_genome_correlation[N_GENOME_BIT_SLOTS];
        double Cosine_genome_correlation[N_GENOME_BIT_SLOTS];
        double IntersectionCosine_genome_correlation[N_GENOME_BIT_SLOTS];
        double Jaccard_genome_correlation[N_GENOME_BIT_SLOTS];
        double IntersectionJaccard_genome_correlation[N_GENOME_BIT_SLOTS];
        double Fidelity_genome_correlation[N_GENOME_BIT_SLOTS];
        double IntersectionFidelity_genome_correlation[N_GENOME_BIT_SLOTS];
        double Sorensen_genome_correlation[N_GENOME_BIT_SLOTS];
        double IntersectionSorensen_genome_correlation[N_GENOME_BIT_SLOTS];
        double Canberra_genome_correlation[N_GENOME_BIT_SLOTS];
        double IntersectionCanberra_genome_correlation[N_GENOME_BIT_SLOTS];

        U64 sample_count_delta;
        U64 vmax_delta[BIT_RESOLUTION]; //8-2 bits
        U64 empty_delta[BIT_RESOLUTION]; //8-2 bits
        U64 full_delta[BIT_RESOLUTION]; //8-2 bits
        U64 largest_delta[BIT_RESOLUTION]; //8-2 bits
        double spread_delta[BIT_RESOLUTION]; //8-2 bits
        double displacement_delta[BIT_RESOLUTION]; //8-2 bits
        double density_delta[BIT_RESOLUTION]; //8-2 bits
        U64 weight_delta[BIT_RESOLUTION]; //8-2 bits
        coordinate3D_t centroid_delta[BIT_RESOLUTION]; //8-2 bits
    } item[NUMBER_OF_GENOMES][N_COLOR_INDEXES];
} metrics_comparison_t[N_PREPROCESSED_IMAGES];
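Two of the distance functions named in the comparison structure, SAD and Hellinger, together with the threshold test described at the start of this section, can be sketched as follows. The SAD and Hellinger definitions are the standard textbook forms for normalized histograms and may differ in detail from the VGM implementations. The helper hull_score(), which divides the absolute metric difference by the autolearning hull threshold so that values below 1.0 indicate a good match, is an assumed normalization chosen to be consistent with the description, not the documented VGM formula.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of absolute differences over the bins the two histograms share.
double sad_distance(const std::vector<double>& p, const std::vector<double>& q) {
    const std::size_t n = std::min(p.size(), q.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += std::fabs(p[i] - q[i]);
    }
    return sum;
}

// Hellinger distance between two normalized, non-negative histograms:
// sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))^2)), which lies in [0, 1].
double hellinger_distance(const std::vector<double>& p, const std::vector<double>& q) {
    const std::size_t n = std::min(p.size(), q.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double d = std::sqrt(p[i]) - std::sqrt(q[i]);
        sum += d * d;
    }
    return std::sqrt(0.5 * sum);
}

// Assumed normalization: difference relative to a positive hull threshold.
double hull_score(double ref_metric, double target_metric, double hull_threshold) {
    return std::fabs(ref_metric - target_metric) / hull_threshold;
}

// A score below 1.0 falls inside the hull; 0.0 is a perfect match.
bool is_good_match(double score) {
    return score < 1.0;
}
```

For instance, with a reference metric of 10, a hull threshold of 6, and a target metric of 13, hull_score() returns 0.5, which is_good_match() accepts; a target metric of 19 gives 1.5, which is rejected.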
Agent Management Agents may be called at various phases of the synthetic visual pathway including the sequencer phase, correspondence phase, and the learning phase, discussed in the following sections. Agents register themselves into the various registries as callback functions using the VGM APIs discussed in this chapter, or else agents operate cooperatively. See Figure 5.4.
Figure 5.4: The agent registry and the interface to the segmentation controller and the correspondence controller.
Sequencer Controller
To start the sequencer, agents use the function run_sequencer_controller(IN U64 *image_ID). After each region is segmented into genome regions, the sequencer controller optionally calls registered agents to process each genome after recording the base genome metrics in global_metrics_structure_t. Agents may then perform any action, such as reprocessing genomes, recomposing metrics, or creating and modifying strands. The sequencing phase is where the input image is segmented to feed into the visual processing centers V1–Vn of the memory model. As discussed in Chapter 2, the sequencer controls the magno and parvo feature impression recording stage. Once the agent is in the registry via the function add_agent_into_registry(), the agent can be used by the sequencer controller. The sequencer controller API is shown here:

/////////////////////////////////////////////////////////////////////
// Sequencer Controller Agent Registry
//
// Use add_agent_into_registry() first to make the agent visible
//
/////////////////////////////////////////////////////////////////////
STATUS get_sequencer_controller_registry_list(
    OUT U64 *sequencer_controllers); // Agent ID list
STATUS add_agent_to_sequencer_registry(
    int priority,
    IN U64 agent_ID);
STATUS remove_agent_from_sequencer_registry(
    IN U64 agent_ID);
//
// Call the sequencer controller, compute genomes and metrics for image
// Callback: agent_dll_base_class.sequencer_controller_callback()
//
STATUS run_sequencer_controller(
    IN U64 image_ID);
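The registration and callback pattern described above can be pictured with a small, self-contained sketch. This is not the VGM implementation: the names AgentRegistry, RegisteredAgent, and AgentCallback are hypothetical, and the rule that lower priority values are called first is an assumption, since the text does not state the ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using U64 = std::uint64_t;

// Hypothetical callback type: receives the ID of the genome being processed.
using AgentCallback = std::function<void(U64 genome_ID)>;

struct RegisteredAgent {
    int priority;            // assumption: lower values are called first
    U64 agent_ID;
    AgentCallback callback;
};

class AgentRegistry {
public:
    // Register an agent; returns false if agent_ID is already registered.
    bool add(int priority, U64 agent_ID, AgentCallback callback) {
        assert(static_cast<bool>(callback)); // an empty callback cannot be invoked
        for (const RegisteredAgent& a : agents_)
            if (a.agent_ID == agent_ID) return false;
        agents_.push_back({priority, agent_ID, std::move(callback)});
        // Keep the list ordered by priority; ties keep registration order.
        std::stable_sort(agents_.begin(), agents_.end(),
                         [](const RegisteredAgent& x, const RegisteredAgent& y) {
                             return x.priority < y.priority;
                         });
        return true;
    }

    // Remove an agent; returns false if agent_ID was not registered.
    bool remove(U64 agent_ID) {
        auto it = std::remove_if(agents_.begin(), agents_.end(),
                                 [agent_ID](const RegisteredAgent& a) {
                                     return a.agent_ID == agent_ID;
                                 });
        const bool found = (it != agents_.end());
        agents_.erase(it, agents_.end());
        return found;
    }

    // Return the registered agent IDs in calling order.
    std::vector<U64> list() const {
        std::vector<U64> ids;
        for (const RegisteredAgent& a : agents_) ids.push_back(a.agent_ID);
        return ids;
    }

    // Call every registered agent for one genome, in priority order.
    void run(U64 genome_ID) const {
        for (const RegisteredAgent& a : agents_) a.callback(genome_ID);
    }

private:
    std::vector<RegisteredAgent> agents_;
};
```

A controller built this way would call run() once per genome after recording the base metrics, so each agent sees every genome in a predictable order.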
Correspondence Controller
The run_correspondence_controller() function, shown in the upper right of Figure 5.4, searches a target image, genome by genome, for a reference genome or strand. The correspondence controller calls the genome compare controller function vgc_genomecompare(), discussed later in this chapter, to create a metrics_comparison_t structure and a GENOMECOMPARE__* file. Each genome in the target image is compared against the reference genome one at a time. The correspondence controller calls each registered agent to provide any desired genome compare postprocessing. If several agents are registered, they are called in order based on the priority parameter in the add_correspondence_Agent_callback() function. The basic correspondence score is computed within the autolearning hull for the reference genome metric, and then stored in the struct metrics_comparison_t, in a GENOMECOMPARE__* file.
Agents called by the correspondence controller are unrestricted and may reprocess genomes, call other agents, recompute metrics, create and modify strands, as well as modify and restore the struct metrics_comparison_t and the GENOMECOMPARE__* files. Agents are free to develop their own means of interagent communication, such as leaving status information in mailbox files. The agent callback mechanism passes the genome_ID of the current target genome under comparison to the called agent along with the reference genome_ID. The correspondence controller is not required for agents that directly access the sequenced genome files. Also, agents may call the function vgc_genomecompare() directly and implement a custom correspondence controller, as discussed later in this chapter.
The correspondence controller API is shown here:

/////////////////////////////////////////////////////////////////////
// Correspondence Controller Agent Registry
//
// Use add_agent_into_registry() first to make the agent visible
//
/////////////////////////////////////////////////////////////////////
STATUS get_correspondence_controller_registry_list(
    OUT U64 *correspondence_controllers); // Agent ID list
STATUS add_agent_to_correspondence_registry(
    int priority,
    IN U64 agent_ID);
STATUS remove_agent_from_correspondence_registry(
    IN U64 agent_ID);
//
// Call the correspondence controller, look for reference genome|strand in target image
// Callback: agent_dll_base_class.sequencer_controller_callback()
//
STATUS run_correspondence_controller(
    IN int search_flag,         // [ SEARCH_FOR_GENOME | SEARCH_FOR_STRAND ]
    image_registry_t registry,
    IN U64 target_image_ID,     // image to search within
    IN U64 reference_image_ID,  // image where reference genome or strand is based
    IN U64 reference_strand_ID, // reference strand ID, NULL if using a single genome
    IN U64 reference_genome_ID  // reference genome ID, NULL if using a strand
);
As discussed in Chapter 3, VGM model correspondence is performed in the visual cortex model V1–Vn between two genomes using the MCC functions. Proxy agents perform final classification. On a practical note: a good baseline test is an identity test comparing a genome against itself in the same image to confirm the perfect correspondence baseline scores of 0.0 (i.e., no difference between genomes), as demonstrated in Chapter 11. Pseudocode for the correspondence controller logic is shown here:

compare_genome(reference_genome, target_image)
{
    // search each genome in the target image
    for (target_id = 0; target_id < number_of_genomes_in_target_image; target_id++) {
        if (GENOME_SEARCH) {
            compare_genomes(reference_genome_ID, target_id);
            save_compare_score(&metrics_comparison_struct);
            call_registered_agents(
                (PF*)agent_cb(),
                target_image_ID,
                NULL,              // this genome is not in a strand
                target_genome_ID,
                metrics_comparison);
        }
        else { // STRAND_SEARCH
            compare_genomes(reference_genome_ID, EACH GENOME IN STRAND);
            save_compare_score(&metrics_comparison_struct);
            call_registered_agents(
                (PF*)agent_cb(),
                target_image_ID,
                target_strand_ID,  // strand_ID containing the target genome
                target_genome_ID,  // target genome in strand
                metrics_comparison);
        }
    }
}
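The genome search loop and the identity baseline test can be tried out with a minimal sketch. Everything here is a stand-in: the Genome struct, the Match struct, and the sum-of-absolute-differences score are hypothetical and are not the VGM comparison metrics; the only property carried over from the text is that a score of 0.0 means no difference.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a genome: a short vector of metric values.
struct Genome {
    std::vector<double> metrics;
};

// Stand-in difference score: sum of absolute metric differences.
// 0.0 means no difference. This is NOT the VGM comparison function.
double compare_genomes(const Genome& a, const Genome& b) {
    assert(a.metrics.size() == b.metrics.size());
    double score = 0.0;
    for (std::size_t i = 0; i < a.metrics.size(); ++i)
        score += std::fabs(a.metrics[i] - b.metrics[i]);
    return score;
}

struct Match {
    std::size_t target_id;  // index of the best-matching target genome
    double score;           // its difference score
};

// Search every genome in the target image and keep the lowest score.
Match find_best_match(const Genome& reference,
                      const std::vector<Genome>& target_image) {
    assert(!target_image.empty());
    Match best{0, compare_genomes(reference, target_image[0])};
    for (std::size_t id = 1; id < target_image.size(); ++id) {
        const double s = compare_genomes(reference, target_image[id]);
        if (s < best.score) best = {id, s};
    }
    return best;
}
```

Searching an image for one of its own genomes should return that genome with a score of exactly 0.0, which is the identity baseline the text recommends checking first.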
The agents are expected to classify all the comparison scores to find the best matches for single genomes and strands in a top-level classifier. Within the correspondence callback mechanism, an agent may implement several types of classifiers to evaluate the results of genome comparisons:
– A custom top-level classifier to compare selected groups of metrics to develop a final score, using any logic and heuristics.
– A low-level classifier may be implemented using standard MCC functions for selected CSTG metrics. A summary of MCC functions is provided in Table 5.2 in the “Metric Combination Classifier (MCC) Summary” section.
– A set of high-level CSV group metric classifiers, discussed below in this chapter, can be used to aggregate several low-level MCCs in a variety of network structures, such as hierarchical classifier trees, network classifiers, and parallel classifier ensembles.
All of these classifier styles, including CSV and MCC, are discussed in more detail throughout later sections of this chapter.
Master Learning Controller (MLC)
The master learning controller (MLC) is used by a human teacher during interactive training sessions, using the platform tools discussed at the end of this chapter, to implement an interactive form of reinforcement learning. The MLC generates C++ code to implement the learnings, using the default CSV agents and signature vectors. The MLC uses the most trusted metrics to qualify dependent metrics, as discussed in Chapter 4. For example, the volume centroid metric is usually reliable, and if the centroid agrees with the teacher, the centroid becomes one of the qualifier metrics and causes a weight factor to be applied to dependent metrics, which are less reliable features.
The MLC takes input from an image which has already been sequenced, and the operator then interactively creates strands to define objects. The predefined CSV agents are each called to score the correspondence into separate CSVs. The MLC selects the best metrics from the CSVs automatically. The operator can override the chosen signature metrics and adjust weights. The MLC generates C++ code to implement the trained agent, which can be executed and reinforced by subsequent training sessions. In summary, the MLC automates simple agent training and code generation. The API is shown here:

/////////////////////////////////////////////////////////////////////
// Master Learning Controller Agent Registry
/////////////////////////////////////////////////////////////////////
STATUS get_masterLearning_controller_registry_list(
    OUT U64 *masterLearning_controllers); // Agent ID's from add_agent_into_registry()
STATUS add_agent_to_masterLearning_registry(
    IN U64 *agent_ID);
STATUS remove_agent_from_masterLearning_registry(
    IN U64 *agent_ID);
//
// callback agent_DLL_base_class.master_learning_controller_callback()
//
CSV Agents
The correspondence signature vector (CSV) agents are predefined default agents that rely on CSVs to classify genome matches in the CSTG bases color, shape, texture, and glyph. The CSV agents use the high-level group metrics classifiers to produce CSVs containing the strength of each selected VDNA metric match score, as discussed in Chapter 4 in the “Correspondence Signature Vectors (CSVs)” section. The CSV agent code provides a good starting point for developing custom agents. The CSV agents are defined in a table-driven manner from standard parameters for match criteria and overrides, allowing each CSV agent to be customized to reach correspondence goals, as defined in the predefined_agents_g[] data structure that follows.

//
// Default CSV Agents
//
typedef struct agent_configuration_tag {
    string agent_name;
    MATCH_CRITERIA match_criteria;
    AGENT_OVERRIDES overrides;
} agent_configuration_t;

agent_configuration_t predefined_agents_g[] = {
    // Agent name                               MATCH_CRITERIA   AGENT_OVERRIDES

    // mr_smith uses separate RGB volumes
    {"mr_smith_normal",                         MATCH_NORMAL,    AGENT_RETRY },
    {"mr_smith_optimistic_shape_color_texture", MATCH_NORMAL,    AGENT_FAVOR_COLOR_SHAPE_TEXTURE },
    {"mr_smith_optimistic_color",               MATCH_NORMAL,    AGENT_FAVOR_COLOR },
    {"mr_smith_optimistic_shape",               MATCH_NORMAL,    AGENT_FAVOR_SHAPE },
    {"mr_smith_optimistic_texture",             MATCH_NORMAL,    AGENT_FAVOR_TEXTURE },
    {"mr_smith_strict",                         MATCH_STRICT,    NO_OVERRIDES },
    {"mr_smith_relaxed",                        MATCH_RELAXED,   NO_OVERRIDES },
    {"mr_smith_hunt_for_best",                  MATCH_HUNT,      NO_OVERRIDES },
    {"mr_smith_prejudice",                      MATCH_PREJUDICE, NO_OVERRIDES },
    {"mr_smith_rotation_invariant",             MATCH_NORMAL,    AGENT_ROTATION_INVARIANT },
    {"mr_smith_contrast_invariant",             MATCH_NORMAL,    AGENT_CONTRAST_INVARIANT },
    {"mr_smith_lighting_invariant",             MATCH_NORMAL,    AGENT_LIGHTING_INVARIANT },
    {"mr_smith_critical",                       MATCH_NORMAL,    AGENT_CRITICAL_TOLERANCE },
    {"mr_smith_favor_majority",                 MATCH_NORMAL,    AGENT_FAVOR_MAJORITY },

    // persephone uses a single volumetric projection of RGB on each axis
    {"persephone_normal",                    MATCH_RGB_VOLUME_RAW, AGENT_RETRY },
    {"persephone_min",                       MATCH_RGB_VOLUME_MIN, NO_OVERRIDES },
    {"persephone_LBP",                       MATCH_RGB_VOLUME_LBP, NO_OVERRIDES },
    {"persephone_ave_BLUR",                  MATCH_RGB_VOLUME_AVE, NO_OVERRIDES },
    {"persephone_favor_shape",               MATCH_RGB_VOLUME_RAW, AGENT_FAVOR_SHAPE },
    {"persephone_favor_color",               MATCH_RGB_VOLUME_RAW, AGENT_FAVOR_COLOR },
    {"persephone_favor_texture",             MATCH_RGB_VOLUME_RAW, AGENT_FAVOR_TEXTURE },
    {"persephone_favor_shape_color_texture", MATCH_RGB_VOLUME_RAW, AGENT_FAVOR_COLOR_SHAPE_TEXTURE },
    {"persephone_hunt_for_best",             MATCH_RGB_VOLUME_RAW, NO_OVERRIDES },
    {"persephone_critical",                  MATCH_RGB_VOLUME_RAW, AGENT_CRITICAL_TOLERANCE },
    {"persephone_favor_majority",            MATCH_RGB_VOLUME_RAW, AGENT_FAVOR_MAJORITY },

    // mr_jones uses LUMA gray scale volumes, otherwise similar to mr_smith
    {"mr_jones_normal",                         MATCH_NORMAL,    AGENT_LUMA + AGENT_RETRY },
    {"mr_jones_optimistic_shape_color_texture", MATCH_NORMAL,    AGENT_FAVOR_COLOR_SHAPE_TEXTURE },
    {"mr_jones_optimistic_color",               MATCH_NORMAL,    AGENT_LUMA + AGENT_FAVOR_COLOR },
    {"mr_jones_optimistic_shape",               MATCH_NORMAL,    AGENT_LUMA + AGENT_FAVOR_SHAPE },
    {"mr_jones_optimistic_texture",             MATCH_NORMAL,    AGENT_LUMA + AGENT_FAVOR_TEXTURE },
    {"mr_jones_strict",                         MATCH_STRICT,    AGENT_LUMA },
    {"mr_jones_relaxed",                        MATCH_RELAXED,   AGENT_LUMA },
    {"mr_jones_hunt_for_best",                  MATCH_HUNT,      AGENT_LUMA },
    {"mr_jones_prejudice",                      MATCH_PREJUDICE, AGENT_LUMA },
    {"mr_jones_rotation_invariant",             MATCH_NORMAL,    AGENT_LUMA + AGENT_ROTATION_INVARIANT },
    {"mr_jones_contrast_invariant",             MATCH_NORMAL,    AGENT_LUMA + AGENT_CONTRAST_INVARIANT },
    {"mr_jones_lighting_invariant",             MATCH_NORMAL,    AGENT_LUMA + AGENT_LIGHTING_INVARIANT },
    {"mr_jones_critical",                       MATCH_NORMAL,    AGENT_LUMA + AGENT_CRITICAL_TOLERANCE },
    {"mr_jones_favor_majority",                 MATCH_NORMAL,    AGENT_LUMA + AGENT_FAVOR_MAJORITY },

    // neo is the most powerful Agent, with all the options
    {"neo_normal",     MATCH__COLOR_NORMAL___SHAPE_NORMAL___TEXTURE_NORMAL, 0 },
    {"neo_volumetric", MATCH__COLOR_NORMAL___SHAPE_RGB_VOLUME_RAW___TEXTURE_RGB_VOLUME_RAW, 0 },
    {"neo_hyperspace", AGENT_HYPERSPACE, 0 },

    {"", 0, 0 }
};
//
// Scoring and weighting bias criteria, only one criteria is used per Agent
//
enum MATCH_CRITERIA {
    MATCH_NORMAL,
    MATCH_STRICT,
    MATCH_RELAXED,
    MATCH_HUNT,
    MATCH_PREJUDICE,
    MATCH_BOOSTED,
    MATCH_RGB_VOLUME_RAW,
    MATCH_RGB_VOLUME_MIN,
    MATCH_RGB_VOLUME_LBP,
    MATCH_RGB_VOLUME_AVE,

    // these 'convenience values' must be bit-unpacked later
    MATCH__COLOR_NORMAL___SHAPE_NORMAL___TEXTURE_NORMAL,
    MATCH__COLOR_NORMAL___SHAPE_RGB_VOLUME_RAW___TEXTURE_RGB_VOLUME_RAW,

    MATCH_CRITERIA_END
};
//
// Algorithm parameter bias overrides, several overrides may be used together (bit OR'd)
//
enum AGENT_OVERRIDES {
    NO_OVERRIDES                        = 0,
    AGENT_NORMAL                        = 0x00000001,
    AGENT_STRICT                        = 0x00000002,
    AGENT_RELAXED                       = 0x00000004,
    AGENT_HUNT                          = 0x00000008,
    AGENT_PREJUDICE                     = 0x00000010,

    AGENT_LIGHTING_INVARIANT            = 0x00000020,
    AGENT_CONTRAST_INVARIANT            = 0x00000040,
    AGENT_SCALE_INVARIANT               = 0x00000080,
    AGENT_ROTATION_INVARIANT            = 0x00000100,

    AGENT_CASUAL_TOLERANCE              = 0x00000200,
    AGENT_CRITICAL_TOLERANCE            = 0x00000400,

    AGENT_FAVOR_COLOR                   = 0x00000800,
    AGENT_FAVOR_SHAPE                   = 0x00001000,
    AGENT_FAVOR_TEXTURE                 = 0x00002000,
    AGENT_FAVOR_CONSTRAINT_PRESENCE     = 0x00004000,
    AGENT_FAVOR_CONSTRAINT_PRIORITY     = 0x00008000,
    AGENT_FAVOR_CONSTRAINT_VECTOR_SPACE = 0x00010000,
    AGENT_FAVOR_MAJORITY                = 0x00020000,
    AGENT_FAVOR_COLOR_SHAPE_TEXTURE     = 0x00040000,

    // The below options are special cases, and can be boolean 'OR'd with the above overrides
    AGENT_RETRY                         = 0x10000000,
    AGENT_LUMA                          = 0x80000000,

    AGENT_HYPERSPACE                    = 0xffffffff // use all overrides
};

enum GLYPH_TYPE {
    COLOR__HUE_SATURATION_SIFT, // RGB Genome Image
    RGB_SURF,                   // RGB Genome Image
    COMPONENT_FREAK,            // RGBI ABCD
    COMPONENT_ORB,              // RGBI ABCD
    RGB_DNN,                    // RGB Genome Image
    GLYPH_TYPE_END
};
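Because each override is a distinct bit, overrides can be combined and tested with ordinary bit operations. The sketch below copies a few of the values listed above into unsigned constants; the helper has_override() is hypothetical and is not part of the VGM API.

```cpp
#include <cassert>
#include <cstdint>

// A subset of the override values listed above, held as unsigned 32-bit
// constants so that 0x80000000 fits without relying on the enum's
// underlying type.
constexpr std::uint32_t NO_OVERRIDES      = 0x00000000;
constexpr std::uint32_t AGENT_FAVOR_COLOR = 0x00000800;
constexpr std::uint32_t AGENT_FAVOR_SHAPE = 0x00001000;
constexpr std::uint32_t AGENT_RETRY       = 0x10000000;
constexpr std::uint32_t AGENT_LUMA        = 0x80000000;

// Hypothetical helper: true if the given single-bit flag is set.
constexpr bool has_override(std::uint32_t overrides, std::uint32_t flag) {
    return (overrides & flag) != 0;
}

// For distinct single-bit flags, '+' and '|' give the same result, which
// is why table entries written as AGENT_LUMA + AGENT_RETRY behave like
// AGENT_LUMA | AGENT_RETRY.
static_assert((AGENT_LUMA | AGENT_RETRY) == (AGENT_LUMA + AGENT_RETRY),
              "distinct bits: OR equals addition");
```

Using the bitwise OR form is the safer habit, since adding the same flag twice would carry into a neighboring bit while OR-ing it twice would not.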
Correspondence Signature Vectors (CSVs)
As we saw in Chapter 4, a CSV records a group of selected metric scores (see the example in Table 4.2). The default CSV agents use CSVs to record only the best scores from a selected group of metrics, comparing all metrics across the selected feature spaces defined in the CSV function parameters as discussed in the next section “Group Metric Classifiers.” The CSVs can be used to establish or learn the best metrics and parameters for a given feature metric correspondence case. Details on the CSV are provided in the data structures that follow (see also Chapter 11).

typedef struct texture_signature_tag {
    char agent_name[256];
    U32 agent_overrides;
    U32 match_criteria;
    int texture_items;
    int genome_type[N_TEXTURE_ITEMS];         //
    int genome_volume_index[N_TEXTURE_ITEMS]; //
    double genomeA_score[N_TEXTURE_ITEMS];    // metric_1 .. metric_n - for a given T base
    double genomeB_score[N_TEXTURE_ITEMS];    // metric_1 .. metric_n - for a given T base
    double genomeC_score[N_TEXTURE_ITEMS];    // metric_1 .. metric_n - for a given T base
    double genomeD_score[N_TEXTURE_ITEMS];    // metric_1 .. metric_n - for a given T base
} texture_signature_t;
typedef texture_signature_t *texture_signature_list;

typedef struct shape_signature_tag {
    char agent_name[256];
    U32 agent_overrides;
    U32 match_criteria;
    double score[N_SHAPE_ITEMS]; // metric_1 .. metric_n - for a given S base
} shape_signature_t;
typedef shape_signature_t *shape_signature_list;

typedef struct color_signature_tag {
    char agent_name[256];
    U32 agent_overrides;
    U32 match_criteria;
    double score[N_COLOR_ITEMS]; // metric_1 .. metric_n - for a given C base
} color_signature_t;
typedef color_signature_t *color_signature_list;
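The idea of recording only the best score per metric can be sketched in a few lines. This is an illustration under stated assumptions, not the VGM code: Signature, N_ITEMS, make_empty_signature(), and record_scores() are hypothetical names, and "best" is taken to mean lowest, following the convention that 0.0 is a perfect match.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical item count; stands in for N_COLOR_ITEMS or N_SHAPE_ITEMS.
constexpr std::size_t N_ITEMS = 3;

// Hypothetical, reduced signature: one best score per metric.
struct Signature {
    std::array<double, N_ITEMS> score;
};

// Start with every slot at the worst possible value.
Signature make_empty_signature() {
    Signature s;
    s.score.fill(std::numeric_limits<double>::max());
    return s;
}

// Fold one set of candidate scores into the signature, keeping the
// lowest (best) score seen so far for each metric.
void record_scores(Signature& sig, const std::array<double, N_ITEMS>& candidate) {
    for (std::size_t i = 0; i < N_ITEMS; ++i)
        sig.score[i] = std::min(sig.score[i], candidate[i]);
}
```

After all comparisons have been folded in, each slot holds the strongest match found for that metric, which is the information a CSV is described as carrying.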
Group Metric Classifiers (GMCs)
Each CSV agent uses a predefined set of group metric classifiers (GMCs), which are predesigned and tested to aggregate classification using groups of selected VDNA CSTG feature metrics via the low-level MCC functions, discussed later in this chapter. The GMCs record a list of all the scores found and save the best score for each metric in a CSV (see the “MCC Best Metric Search” section later in this chapter for details). The GMCs incorporate heuristic logic, and all the CSV agents are parameterized to take overrides and match criteria parameters to tune and influence the learning behavior. The predefined group metric functions are:

// Match on genome shape factors
AGENT_SCORE_RESULT match__shape(
    MATCH_CRITERIA criteria,
    AGENT_OVERRIDES agent_parameter,
    double weight_override, // < 1.0 for stricter match, > 1.0 for relaxed match
    double *match_strength  // Return value: the strength of the match
);
// Match on color space metrics
AGENT_SCORE_RESULT match__color(
    MATCH_CRITERIA criteria,
    AGENT_OVERRIDES agent_parameter,
    double weight_override, // < 1.0 for stricter match, > 1.0 for relaxed match
    double *match_strength  // Return value: the strength of the match
);
// Match on volume RGBI textures as neuron clusters
AGENT_SCORE_RESULT match__volume_texture(
    MATCH_CRITERIA criteria,
    AGENT_OVERRIDES agent_parameter,
    double weight_override, // < 1.0 for stricter match, > 1.0 for relaxed match
    double *match_strength  // Return value: the strength of the match
);
// Match on 2d Haralick features
AGENT_SCORE_RESULT match__Haralick_texture(
    MATCH_CRITERIA criteria,
    AGENT_OVERRIDES agent_parameter,
    double weight_override, // < 1.0 for stricter match, > 1.0 for relaxed match
    double *match_strength  // Return value: the strength of the match
);
// Match on 2d SDMX features
AGENT_SCORE_RESULT match__SDMX_texture(
    MATCH_CRITERIA criteria,
    AGENT_OVERRIDES agent_parameter,
    double weight_override, // < 1.0 for stricter match, > 1.0 for relaxed match
    double *match_strength  // Return value: the strength of the match
);
// Match on interest point / descriptors (best 3) within genome
AGENT_SCORE_RESULT match__glyph(
    MATCH_CRITERIA criteria,
    GLYPH_TYPE glyphs,      // ID of glyph to match
    double weight_override, // < 1.0 for stricter match, > 1.0 for relaxed match
    double *match_strength  // Return value: the strength of the match
);
// Match on DNN model scores over genome
AGENT_SCORE_RESULT match__DNN(
    DNN_MODEL dnn_model,    // ID of DNN model
    double weight_override, // < 1.0 for stricter match, > 1.0 for relaxed match
    double *match_strength  // Return value: the strength of the match
);
Each of the match__* functions takes the following parameters:
– criteria: See the MATCH_CRITERIA definitions above.
– agent_overrides: See the AGENT_OVERRIDES enum above.
– weight_override: Default is 1.0, used to bias the final score.
– *match_strength: On return, contains the actual score: 0.0 = perfect match, < 1.0 = probable match, 99999.0 = way bad.
– *signature: On return, contains the CSV (see Table 4.2 in Chapter 4 along with the discussion on correspondence signature vectors).
Each function match__shape(), match__color(), and match__texture() returns an AGENT_SCORE_RESULT: if (score

> gis <segtype> <resolution> <segmentation_file> <raw_image> <sharp_image> <histeq_image> <retinex_image> <blur_image>
*** FINISHED. ***

// Function signature
STATUS gis(
    IN Segmentation_t segtype,  // JSLICTEXT | MORPHOTEXT
    IN Resolution_t resolution, // [FLAG_JSLIC_A | FLAG_JSLIC_F | FLAG_MORPHO_A | FLAG_MORPHO_F]
    IN char *segmentation_file, // either text or binary format as per segtype parameter
    IN char *raw_image_file,    //
    IN char *sharp_image_file,  //
    IN char *histeq_image_file, //
    IN char *retinex_image_file, //
    IN char *blur_image_file    //
);
Compute Visual Genomes: vg
The vg controller computes the base metrics as discussed in Chapter 4, and the vg controller is launched automatically by the gis controller after segmentation, with one thread per genome region. The vg controller can be launched from the command line, or called using the function API. Refer to Figure 5.7.
The vg input parameters:
– <filedir>: Location of all files for the image, including metrics and masks
– <xpos>, <ypos>, <xsize>, <ysize>: Coordinates of the bounding box containing the raw_mask_file pixel region within the image
– <filepaths>: The names of all the files to be masked into genome regions for computing separate metrics for each image space (raw, sharp, retinex, histeq, blur)
– <raw_mask_file>: The mask file computed by the gis controller for the raw image, used as the mask file for all other image spaces
The vg output: all the metrics files for each genome region, as discussed in the Visual Genomes Metrics Files section earlier in this chapter.

// Function signature
STATUS vg(
    IN char *filedir,          // location of all files for this image
    IN int xpos,               // x bounding box top left
    IN int ypos,               // y bounding box top left
    IN int xsize,              // x bounding box size
    IN int ysize,              // y bounding box size
    IN All files_t filepaths,  // paths to all raw,sharp,retinex,histeq,blur files
    IN char *raw_mask_file     // raw image genome region mask file
);
Comparing and Viewing Metrics: vgc
The vgc controller is the main metrics generation interface for the VGM and provides several types of functionality (see Tables 5.3 and 5.4). vgc is automatically called by the gis, vg, and vgv controllers, and is not called from the command line in normal use. Refer to Figure 5.7. The major vgc functions are listed here:
– GENOMECOMPARE: Compare two base metrics files and mask files [raw|sharp|retinex|histeq|blur] together, yielding a genome comparison structure
– AUTO_LEARNING: Create an autolearning hull file by comparing the base genomes RAW SHARP RETINEX BLUR to themselves to establish the hull range
– AUTO_CORRESPONDENCE: Compare a genome comparison structure to an autolearning hull to create default correspondence scores
– PRINTING METRICS: Print out base metrics files, autolearning hull files, and genome compare files to the shell window

// Function signatures
STATUS vgc_print(
    int metric_print_t,        // type of metric [ BASE_METRICS | GENOMECOMPARE | AUTO_LEARNING_HULL ]
    IN char *fullpath          // location of all files for this image
); // output: print formatted file to command shell
STATUS vgc_genomecompare(
    IN char *directory1,       //
    IN char *basemetricfile1,  //
    IN char *directory2,       //
    IN char *basemetricfile2   //
); // output: GENOMECOMPARE__* file
STATUS vgc_autolearning(
    IN char *directory1,       //
    IN char *basemetricfile1,  //
    IN char *directory2,       //
    IN char *basemetricfile2   //
); // output: AUTO_LEARNING__* file
STATUS vgc_autocorrespondence(
    IN char *directory1,        //
    IN char *genomecomparefile, //
    IN char *directory2,        //
    IN char *autolearningfile   //
); // output: AUTO_CORRESPONDENCE__* file
Table 5.3: vgc command syntax
vgc command syntax summary
> vgc GENOMECOMPARE directory1 metricsfile1 master_genomefile1 directory2 metricsfile2 master_genomefile2
> vgc AUTO_LEARNING directory metrics_filename_g mastergenome_RAW_filename_g mastergenome_SHARP_filename_g mastergenome_RETINEX_filename_g
> vgc AUTO_CORRESPONDENCE directory1 GENOMECOMPARE__* directory2 AUTO_LEARNING_*
> vgc PRINTMETRICS directory METRICS__* file
> vgc PRINTGENOMECOMPARE fullpath
> vgc PRINT_AUTO_LEARNING fullpath
> vgc AGENT_COMPARE directory1 GENOMECOMPARE__* directory2 AUTO_LEARNING_*
Table 5.4: vgc parameter details
vgc   Parameters            Description
vgc   GENOMECOMPARE         Compare any two genomes using respective base metrics files from any two images in any directory
vgc   AUTO_LEARNING         Perform AUTOLEARNING using SHARP and RETINEX hulls. OUTPUT: GENOMECOMPARE__* file
vgc   AUTO_CORRESPONDENCE   Measure two genomes against the autolearning hull. OUTPUT: AUTOCORRESPONDENCE file
vgc   PRINTMETRICS          Print the metrics file
vgc   PRINTGENOMECOMPARE    Print the genome compare file
vgc   AGENT_COMPARE         Call an agent to evaluate the GENOMECOMPARE file against the AUTOLEARNING HULL file
vgc   PRINT_AUTO_LEARNING   Print the autolearning file
Agent Testing and Strand Management: vgv
The vgv controller is the main command line interface for using the VGM after sequencing genomes, providing a wide range of interactive testing options from the command line. The vgv controller implements several custom test and development functions and also calls into VGM platform functions. To start, vgv displays an image which has previously been sequenced into genome masks and metrics files and then allows for interactive and command line driven testing. The vgv command syntax is summarized in Table 5.5, and the parameters are explained in Table 5.6. Several vgv examples and parameter options are provided in Chapter 11. Refer to Figure 5.7.
The vgv controller is used from the command line only. It is intended for interactive training and testing, genome comparison experiments, strand building, and detailed metrics analysis, so no function interface is provided in the API.
Table 5.5: vgv command syntax
vgv command syntax summary
> vgv HISTORY
> vgv SEARCH
> vgv COMPARE
> vgv CREATE_STRAND
> vgv SEARCH_STRAND
> vgv HIGHLIGHT_STRAND
> vgv EDIT_STRAND
> vgv PRINT_STRAND
> vgv LIST_STRANDS
> vgv DRAW_SEGMENTS_TEST
> vgv COLOR_LEVELING_TEST
> vgv DRAW_POPULARITY
> vgv SHOW_GENOME_DETAILS
> vgv RUN_COMPARE_TEST
> vgv LIST_AGENTS
> vgv RUN_AGENT \
> vgv DISPLAY_SEGMENTATION_MASK
[Figure 6.4 labels: RANK-MIN -> [RMIN,GMIN,BMIN]; RANK-AVE -> [RAVE,GAVE,BAVE]; RANK-MAX -> [RMAX,GMAX,BMAX]; spaces: RGB, LBP, MIN, MAX, AVE; five bit quantizations per space]
Figure 6.4: The 25 different types of CAM neurons, corresponding to the CAM neuron input spaces and magno and parvo feature channels.
Chapter 6: Volume Projection Metrics
CAM Neural Clusters
All the CAM addresses in a genome feed into a set of summary CAM neural clusters, which record all CAM features from each input space as shown in Figure 6.4 into a set of 3D histogram volumes to sum the occurrence of each feature in the genome for each input space (see Figure 6.5).
Figure 6.5: A CAM neural cluster, which records all occurrences of each CAM neuron within each genome into a 3D histogram volume.
As shown in Figure 6.5, the CAM neurons feed into a CAM neural cluster to sum all the CAM features in the genome, with one cluster for each specific metric input space. Since there are 25 CAM input spaces (Figure 6.4), there are 25 corresponding CAM neural clusters per genome for each of the five pre-processed images (raw, sharp, retinex, histeq, blur), for a total of 125 CAM cluster neurons per genome. For well-segmented genomes representing homogeneous bounded regions, the CAM neural clusters are regular shapes centered about the axis, usually with very few outliers. In other words, the feature counts are concentrated in a smaller area revealing similar features, rather than spread out in the volume revealing something more like a noise distribution of unlike features.
The magnitude (corresponding to size) of the CAM cluster neuron emulates biologically plausible neural growth [1] each time a CAM neuron feature impression increments the corresponding (x,y,z) cell in the CAM neural cluster. The CAM cluster neuron is a memory device. Each cluster represents related features from an input space. The size of each neuron follows plausible neuroscience findings and is determined by (1) how often the visual impression is observed and (2) the number of neural connections. Thus, CAM cluster neuron size is a function of the frequency with which a visual impression is observed.
As an alternative to the 3x1 pixel mappings to generate the CAM cluster addresses, the VGM supports various other methods as discussed in Table 6.1; for example, RGB volume clustering uses each RGB pixel component to compose an (x,y,z) address by assigning x = R, y = G, z = B, so for each pixel in the genome we increment the neural cluster:

$$\sum_{x=0}^{len_x} \sum_{y=0}^{len_y} \mathrm{increment\_feature\_count}\left(v_{\,x=\mathrm{red\_value}(x,y),\; y=\mathrm{green\_value}(x,y),\; z=\mathrm{blue\_value}(x,y)}\right)$$

where:
$v_{x,y,z}$ = volume address coordinates
$x, y, z$ are volume address coordinates for each $r, g, b$ pixel value
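The RGB volume clustering step above amounts to a 3D histogram indexed by quantized R, G, and B values. The sketch below is one way to write it, not the VGM code: the RGBVolume class and its methods are hypothetical, and quantizing by right shift (keeping the top bits of each component) is an assumption made so that the volume side length is 2^bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RGB {
    std::uint8_t r, g, b;
};

// Hypothetical 3D histogram volume: one counter per (x,y,z) = (R,G,B) cell.
class RGBVolume {
public:
    // bits = quantization per color component (1..8); side length is 2^bits.
    explicit RGBVolume(int bits)
        : bits_(bits),
          side_(std::size_t{1} << bits),
          cells_(side_ * side_ * side_) {
        assert(bits >= 1 && bits <= 8);
    }

    // Record one pixel impression: x = R, y = G, z = B after quantization.
    void add(const RGB& p) {
        const int shift = 8 - bits_;  // keep only the top 'bits_' bits
        ++cells_[index(p.r >> shift, p.g >> shift, p.b >> shift)];
    }

    // Number of impressions recorded at one volume address.
    std::uint32_t count(std::size_t x, std::size_t y, std::size_t z) const {
        return cells_[index(x, y, z)];
    }

private:
    std::size_t index(std::size_t x, std::size_t y, std::size_t z) const {
        assert(x < side_ && y < side_ && z < side_);
        return (z * side_ + y) * side_ + x;
    }

    int bits_;
    std::size_t side_;
    std::vector<std::uint32_t> cells_;  // zero-initialized counters
};
```

Calling add() once for every pixel in a genome region fills the volume; similar colors land in the same cell at low bit resolutions, so a homogeneous region produces a compact cluster.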
Volume Projection Metrics
CAM neural clusters can be rendered as a simple volume rendering as shown in Figure 6.6. The volume is the feature; another way to say it is the neural memory is the feature. CAM neural clusters are used for correspondence using various distance functions discussed in this chapter. The number of times a CAM feature is discovered over the entire genome region is recorded or summed in the volume, so the volume projection is a feature metric. The metric projection concept is often employed in statistics; for example, the support vector machine (SVM) approach of representing metrics in higher dimensional spaces is commonly used to find a better correspondence (see Vapnik [80][77][78][79], and also [1]). Likewise, we find insights via multivariate volumetric projection metrics. The basic volume projection is based on simple Hubel and Wiesel style edge information over RGBI + LBP color spaces taken within a quantization space, as discussed below, emulating varying levels of detail across the low levels of the magno and parvo pathways.
Figure 6.6: Volumetric projection metrics in a range of genome quantization spaces: left to right, 2-bit, 3-bit, 4-bit, 5-bit, 8-bit.
As shown in Figure 6.6, the volumetric projection metrics contain a range of color, shape, and texture metrics. The false coloring in the renderings represents impression counts (magnitude) across the genome for each CAM feature, so the volume rendering is a 4D representation. The volume renderings in Figure 6.6 are surface renderings in this case, obscuring the volume internals, and use a color map to represent magnitude at each voxel. Other volume rendering styles can be used to view the internals of the volume as shown later in this chapter. Volume metrics are often rendered using the familiar 3D scatter-plot for data visualization (see [81][82][83]). However, we use volumetric projections as a native metric for feature representation and correspondence. Several distance functions have proven to be useful as discussed later in this chapter. Note that the CAM neural clusters are accessed by the visual processing centers V1–Vn of the visual cortex and used for correspondence as texture, shape, and color features.
Quantization Space Pyramids
Quantization space pyramids are used to represent visual data at various levels of detail and can be used to perform first-order comparisons of genome metrics to narrow down the best matches within the genome space by using progressively more bits to increase resolution. We use 8-bit, 5-bit, 4-bit, 3-bit, and 2-bit quantization. The bit-level quantization simulates a form of attentional level of detail, which is biologically plausible [1]. As shown in Figure 6.7, different bit resolutions per color yield different levels of detail, and quantization to 6 bits and 7 bits seems unnecessary, given that 8-bit quantization results are perceptually close to 5-bit quantization. Based on testing, we have found that 8-bit and 5-bit quantization yield similar correspondence results, so 5 bits are used for some of the color and volume metrics, but 8-bit color is better suited for many metrics. For color, using 5 bits instead of 8 bits coalesces similar colors into a common color, which is desirable for some metrics. The quantization input α to the CAM neuron shown earlier in Figure 6.1 can be used to shape the memory address by masking each pixel to coalesce similar memory addresses, which focuses and groups similar features together. Even so, the full 8-bit resolution is still preserved in the genome and used when needed for various metrics.
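The masking step can be made concrete with a short sketch. The helper names quantization_mask() and quantize() are hypothetical; the only value taken from the text is that 5-bit quantization corresponds to the 8-bit mask 0xF8, which keeps the top five bits of each pixel value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: 8-bit mask that keeps the top 'bits' bits (1..8).
// 8 -> 0xFF, 5 -> 0xF8, 4 -> 0xF0, 3 -> 0xE0, 2 -> 0xC0.
constexpr std::uint8_t quantization_mask(int bits) {
    return static_cast<std::uint8_t>(0xFFu << (8 - bits));
}

// Hypothetical helper: mask one 8-bit pixel component into the
// requested quantization space, discarding the low-order bits.
constexpr std::uint8_t quantize(std::uint8_t value, int bits) {
    return static_cast<std::uint8_t>(value & quantization_mask(bits));
}

static_assert(quantization_mask(5) == 0xF8, "5-bit mask is 1111 1000");
```

Two nearby pixel values such as 200 and 203 both quantize to 200 at 5 bits, which is how masking coalesces similar addresses into one feature.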
Figure 6.7: Bit quantization. Top left: 2-bits per RGB color, top right: 3-bits per RGB color, bottom left: 4-bits per RGB color, bottom right: 5-bits per RGB color. Note that 8-bit color is virtually indistinguishable from 5-bit color in almost all cases.
Strand CAM Cluster Pyramids

For each image, a strand containing a summary of 2-bit quantizations of CAM clusters for each genome can be created to assist in optimizing correspondence, similar to the image pyramid used by SIFT at various resolutions [1], where SIFT correspondence is measured across the image pyramid to find features even when the image scale changes. In an analogous manner, 2-bit quantized CAM neural cluster features can be created for each genome in 128 bits, which can be evaluated natively in most Intel processor instruction sets today. Therefore, by searching for 128-bit strands, it is possible to quickly narrow down candidate target genomes to follow up with higher-level correspondence at 8- or 5-bit resolution. Using quantization spaces larger than 2 bits requires more than 128 bits, is more complicated, and is not supported natively in the CPU instruction set.

We illustrate the concept with the example below. Imagine we use a 2-bit (four unique values) resolution CAM volume, with 4x4x4=64 cells in the (x,y,z) volume. We reduce the resolution of each cell counter to 2 bits (four values), scaling the input magnitudes using floats and masking off to the range 0..3. Then the number of cell counters and the total number of unique 2-bit genomes are:

$$2^2 \cdot 2^2 \cdot 2^2 = 64 \text{ cell counters}$$

$$4^{64} = 2^{128} = 340{,}282{,}366{,}920{,}938{,}463{,}463{,}374{,}607{,}431{,}768{,}211{,}456$$

*By coincidence, the Intel XEON processor provides 128-bit arithmetic, and $4^{64} = 2^{128}$.
So, an address composed of all 64 counters in a 2-bit quantized volume, each holding one base-4 digit (0,1,2,3), can be represented in a 128-bit value and compared in a 128-bit Intel register as follows:

    $ bc
    2^128 - 1
    340282366920938463463374607431768211455
    ibase=4
    # now enter a 64-cell address, one base-4 digit (0,1,2,3) per cell
    3333333333333333333333333333333333333333333333333333333333333333
    340282366920938463463374607431768211455
    # We see that 4^64 - 1 = 2^128 - 1, so 4^64 = 2^128
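The packing of 64 base-4 counters into one 128-bit address can be sketched as follows. This is a minimal sketch, not the VGM implementation: the type strand128_t and the functions pack_strand() and strand_equal() are hypothetical names, and the 128-bit value is held as two 64-bit words so that the code stays portable C.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical 128-bit strand address, held as two 64-bit words. */
typedef struct {
    uint64_t hi; /* cells 32..63, two bits per cell */
    uint64_t lo; /* cells 0..31, two bits per cell  */
} strand128_t;

/* Pack 64 cell counters, each already reduced to the range 0..3,
   into a 128-bit address using two bits per cell. */
strand128_t pack_strand(const uint8_t cells[64])
{
    strand128_t s = { 0, 0 };
    for (size_t i = 0; i < 32; i++) {
        s.lo |= (uint64_t)(cells[i] & 0x3u) << (2 * i);
        s.hi |= (uint64_t)(cells[i + 32] & 0x3u) << (2 * i);
    }
    return s;
}

/* Two strands match exactly when both 64-bit halves are equal. */
int strand_equal(strand128_t a, strand128_t b)
{
    return a.hi == b.hi && a.lo == b.lo;
}
```

A volume in which every cell holds the value 3 packs to all ones in both halves, which corresponds to the largest address shown in the bc session.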
Typically, a 20MP image sequences to perhaps 3,000 unique genome regions, so a strand for each 20MP image would contain a set of 3,000 2-bit quantized genomes, each having 128 bits or 16 bytes (about 48 KB per strand), which is supported by current Intel architecture.

    //
    // compute the SummaryCAMStrand for 2-bit quantization
    //
    for (int n = 0; n < number_of_genome_regions_for_this_image; n++) {
        for (int index = 0; index < N_PREPROCESSED_IMAGES; index++) {
            for (int g = 0; g < NUMBER_OF_ORIENTATIONS; g++) {
                for (int c = 0; c < N_COLOR_INDEXES; c++) {
                    for (int q = 0; q < N_QUANTATIONS; q++) {
                        SummaryCAMStrand.address[n] =
                            (U128)genome[n][index][g][c][q].CAM_CLUSTER_2bit_address;
                    }
                }
            }
        }
    }
Volume Metric Details In this section we provide some discussion on the details of volume projection metrics, including the definitions, distance functions used, and memory size requirements.
Volume Impression Recording

Each time a CAM feature address is detected in the image, the count for the address is incremented in the CAM neural cluster volume, corresponding to feature commonality. The method for computing the feature addresses and counts is simple (as illustrated in the following code snippet) and relies on the quantization input value, an 8-bit hexadecimal mask of 0xF8 (binary 1111 1000). Each pixel value in the address is bit-masked into the desired quantization space, ignoring the bottom three bits.

    for (int y = 0; y < ysize-2; y++) {
        for (int x = 0; x < xsize-2; x++) {
            getRegion3x3_u8((U8_PTR)filedata_8u_g, x, y, xsize, ysize, (U8_PTR)w3x3);
            //
            // [x x x] [x B x] [C x x] [x x D]
            // [A A A] [x B x] [x C x] [x D x]
            // [x x x] [x B x] [x x C] [D x x]
            //
            U32 orientation_A_address = ((w3x3[1][1]) & 0xff) | ((w3x3[0][1] [...]

[...] σ ? V : 0.0)
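The impression recording idea described in this section can be sketched as follows. This is a sketch under stated assumptions, not the VGM code: the type cam_volume_t and both functions are hypothetical, the volume uses one axis per pixel of a three-pixel oriented feature, and only the horizontal orientation (A) is shown.

```c
#include <stdint.h>
#include <stddef.h>

#define QBITS 5               /* 5-bit quantization, i.e. mask 0xF8 */
#define QSIZE (1 << QBITS)    /* 32 cells along each axis           */

/* Hypothetical CAM volume: one impression counter per (x,y,z) cell. */
typedef struct {
    uint32_t count[QSIZE][QSIZE][QSIZE];
} cam_volume_t;

/* Record one impression for a three-pixel feature. Each pixel is masked
   with 0xF8, then shifted so that it indexes one axis of the volume. */
void record_impression(cam_volume_t *v, uint8_t p0, uint8_t p1, uint8_t p2)
{
    const uint8_t mask = 0xF8;
    unsigned x = (unsigned)(p0 & mask) >> 3;
    unsigned y = (unsigned)(p1 & mask) >> 3;
    unsigned z = (unsigned)(p2 & mask) >> 3;
    v->count[x][y][z]++;
}

/* Record the horizontal (orientation A) triple of every 3x3 window
   in a row-major 8-bit image. */
void record_orientation_A(cam_volume_t *v, const uint8_t *img,
                          int xsize, int ysize)
{
    for (int y = 0; y < ysize - 2; y++) {
        for (int x = 0; x < xsize - 2; x++) {
            /* middle row of the 3x3 window whose top-left corner is (x, y) */
            const uint8_t *row = img + (size_t)(y + 1) * (size_t)xsize + (size_t)x;
            record_impression(v, row[0], row[1], row[2]);
        }
    }
}
```

Because the low three bits are masked off, the pixel triples (200, 201, 207) and (203, 200, 204) increment the same counter, which is how similar features are grouped.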
Volume Projection Metrics for CAM Clusters

$$\text{intersection} = m_i = \sum_{(\mathbb{V}_1 \neq 0) \Leftrightarrow (\mathbb{V}_2 \neq 0)} f(|\mathbb{V}_1 - \mathbb{V}_2|)$$

$$\text{total} = m_t = \sum^{\alpha} f(|\mathbb{V}_1 - \mathbb{V}_2|)$$

$$m_{\text{SAD}} = \sum_{i}^{\alpha} |R_i - T_i|, \qquad m_{\text{IntersectionSAD}} = \sum_{i}^{\beta} |R_i^{\sigma} - T_i^{\sigma}|$$

$$m_{\text{SSD}} = \sum_{i}^{\alpha} (R_i - T_i)^2, \qquad m_{\text{IntersectionSSD}} = \sum_{i}^{\beta} (R_i^{\sigma} - T_i^{\sigma})^2$$

$$m_{\text{Hellinger}} = 2\sqrt{1 - \sum_{i}^{\alpha} \sqrt{R_i T_i}}, \qquad m_{\text{IntersectionHellinger}} = 2\sqrt{1 - \sum_{i}^{\beta} \sqrt{R_i^{\sigma} T_i^{\sigma}}}$$

$$m_{\text{Hamming}} = \sum_{i}^{\alpha} \begin{cases} 1, & R_i + T_i = 0 \\ 1, & R_i > 0 \;\&\; T_i > 0 \\ 0, & \text{otherwise} \end{cases}$$

$$m_{\text{IntersectionHamming}} = \sum_{i}^{\beta} \begin{cases} 1, & R_i^{\sigma} + T_i^{\sigma} = 0 \\ 1, & R_i^{\sigma} > 0 \;\&\; T_i^{\sigma} > 0 \\ 0, & \text{otherwise} \end{cases}$$

$$m_{\text{Chebychev}} = \max_{i} |R_i - T_i|$$

$$m_{\text{IntersectionDivergence}} = \sum_{i}^{\beta} \frac{(R_i^{\sigma} - T_i^{\sigma})^2}{(R_i^{\sigma} + T_i^{\sigma})^2}$$

$$m_{\text{Outliermagnitude}} = \sum_{i}^{\alpha} \begin{cases} R_i, & R_i \neq 0 \;\&\&\; T_i = 0 \\ T_i, & T_i \neq 0 \;\&\&\; R_i = 0 \\ 0, & \text{otherwise} \end{cases}$$

$$m_{\text{IntersectionOutliermagnitude}} = \sum_{i}^{\beta} \begin{cases} R_i^{\sigma}, & R_i^{\sigma} \neq 0 \;\&\&\; T_i^{\sigma} = 0 \\ T_i^{\sigma}, & T_i^{\sigma} \neq 0 \;\&\&\; R_i^{\sigma} = 0 \\ 0, & \text{otherwise} \end{cases}$$

$$m_{\text{Cosine}} = \frac{\sum_{i}^{\alpha} R_i T_i}{\sqrt{\sum_{i}^{\alpha} R_i^2}\,\sqrt{\sum_{i}^{\alpha} T_i^2}}, \qquad m_{\text{IntersectionCosine}} = \frac{\sum_{i}^{\beta} R_i^{\sigma} T_i^{\sigma}}{\sqrt{\sum_{i}^{\beta} (R_i^{\sigma})^2}\,\sqrt{\sum_{i}^{\beta} (T_i^{\sigma})^2}}$$

$$m_{\text{Jaccard}} = \frac{\sum_{i}^{\alpha} R_i T_i}{\sum_{i}^{\alpha} R_i + \sum_{i}^{\alpha} T_i - \sum_{i}^{\alpha} R_i T_i}, \qquad m_{\text{IntersectionJaccard}} = \frac{\sum_{i}^{\beta} R_i^{\sigma} T_i^{\sigma}}{\sum_{i}^{\beta} R_i^{\sigma} + \sum_{i}^{\beta} T_i^{\sigma} - \sum_{i}^{\beta} R_i^{\sigma} T_i^{\sigma}}$$

$$m_{\text{Fidelity}} = \sum_{i}^{\alpha} \sqrt{R_i T_i}, \qquad m_{\text{IntersectionFidelity}} = \sum_{i}^{\beta} \sqrt{R_i^{\sigma} T_i^{\sigma}}$$

$$m_{\text{Sorensen}} = \frac{\sum_{i}^{\alpha} |R_i - T_i|}{\sum_{i}^{\alpha} |R_i + T_i|}, \qquad m_{\text{IntersectionSorensen}} = \frac{\sum_{i}^{\beta} |R_i^{\sigma} - T_i^{\sigma}|}{\sum_{i}^{\beta} (R_i^{\sigma} + T_i^{\sigma})}$$

$$m_{\text{Canberra}} = \sum_{i}^{\alpha} \frac{|R_i - T_i|}{|R_i + T_i|}, \qquad m_{\text{IntersectionCanberra}} = \sum_{i}^{\beta} \frac{|R_i^{\sigma} - T_i^{\sigma}|}{(R_i^{\sigma} + T_i^{\sigma})}$$
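A few of the distance functions above can be sketched in C as follows. This is an illustration under stated assumptions rather than the VGM source: R and T are taken to be equal-length arrays of cell counts for the reference and target volumes, the thresholded value is assumed to be V > σ ? V : 0.0, and the intersection variant is assumed to sum only over cells where both thresholded values are nonzero. All function names are hypothetical.

```c
#include <stddef.h>

/* Absolute value without requiring the math library. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Assumed thresholding: V^sigma = (V > sigma ? V : 0.0). */
double threshold(double v, double sigma) { return v > sigma ? v : 0.0; }

/* Sum of absolute differences over all cells. */
double m_sad(const double *r, const double *t, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += abs_d(r[i] - t[i]);
    return sum;
}

/* Sum of squared differences over all cells. */
double m_ssd(const double *r, const double *t, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += (r[i] - t[i]) * (r[i] - t[i]);
    return sum;
}

/* Chebychev distance: the largest single-cell difference. */
double m_chebychev(const double *r, const double *t, size_t n)
{
    double best = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = abs_d(r[i] - t[i]);
        if (d > best) best = d;
    }
    return best;
}

/* Sorensen distance: sum of differences over sum of sums. */
double m_sorensen(const double *r, const double *t, size_t n)
{
    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n; i++) {
        num += abs_d(r[i] - t[i]);
        den += abs_d(r[i] + t[i]);
    }
    return den > 0.0 ? num / den : 0.0;
}

/* Intersection SAD: only cells where both thresholded values are nonzero. */
double m_intersection_sad(const double *r, const double *t, size_t n, double sigma)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double rs = threshold(r[i], sigma);
        double ts = threshold(t[i], sigma);
        if (rs != 0.0 && ts != 0.0) sum += abs_d(rs - ts);
    }
    return sum;
}
```

For R = (1, 2, 3) and T = (1, 0, 5), the total SAD is 4, while the intersection SAD with σ = 0 is 2, because the middle cell is empty in T and is therefore excluded.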
Volume Distance Function Correspondence Groups

Based on VGM test results, some distance metrics seem to provide similar correspondence results, so distance function group parameters for similar MCC distance metrics have been created (shown below) for convenience. However, distance functions may be specified independently or by passing the group parameters to the CSV texture functions. See the match__texture() CSV function discussed later in this chapter for details. The final correspondence score can be computed using a variety of parameter options such as: (1) AVE of all distance functions, (2) MIN of chosen distance functions, or (3) MAX of a group of similar distance functions. Note that the metric groups below are based on trial and error combinations, so results will vary according to the images and genome regions.

    INTERSECTION_METRICS
        (ID_pyramid_IntersectionSAD_genome_correlation + \
         ID_pyramid_IntersectionSSD_genome_correlation + \
         ID_pyramid_IntersectionHellinger_genome_correlation + \
         ID_pyramid_IntersectionHammingsimilarity_genome_correlation + \
         ID_pyramid_Outliermagnitude_genome_correlation + \
         ID_pyramid_Outlierratio_genome_correlation + \
         ID_pyramid_IntersectionDivergencesimilarity_genome_correlation + \
         ID_pyramid_IntersectionCosine_genome_correlation + \
         ID_pyramid_IntersectionJaccard_genome_correlation)

    NONINTERSECT_METRICS
        (ID_pyramid_SAD_genome_correlation + \
         ID_pyramid_SSD_genome_correlation + \
         ID_pyramid_Hellinger_genome_correlation + \
         ID_pyramid_Hammingsimilarity_genome_correlation + \
         ID_pyramid_Outliermagnitude_genome_correlation + \
         ID_pyramid_Outlierratio_genome_correlation + \
         ID_pyramid_IntersectionDivergencesimilarity_genome_correlation + \
         ID_pyramid_Cosine_genome_correlation + \
         ID_pyramid_Jaccard_genome_correlation)

    RESOLUTION_INDEPENDENT_METRICS (quantization space independent)
        (ID_pyramid_Chebychev_genome_correlation + \
         ID_pyramid_Cosine_genome_correlation + \
         ID_pyramid_IntersectionCosine_genome_correlation + \
         ID_pyramid_Jaccard_genome_correlation + \
         ID_pyramid_IntersectionJaccard_genome_correlation + \
         ID_pyramid_Fidelity_genome_correlation + \
         ID_pyramid_IntersectionFidelity_genome_correlation)

    HUNTING_METRICS
        (ID_pyramid_Chebychev_genome_correlation + \
         ID_pyramid_Sorensen_genome_correlation + \
         ID_pyramid_IntersectionSorensen_genome_correlation)
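The three scoring options named above (AVE, MIN, MAX) can be sketched as a small reduction over the scores returned by a group of distance functions. This is a minimal sketch: the enum score_mode_t and the function combine_scores() are hypothetical names that only mirror the AVE_SCORE, MIN_SCORE, and MAX_SCORE options; they are not the VGM definitions.

```c
#include <stddef.h>

/* Hypothetical scoring modes, mirroring AVE_SCORE, MIN_SCORE, MAX_SCORE. */
typedef enum { SCORE_AVE, SCORE_MIN, SCORE_MAX } score_mode_t;

/* Reduce the scores from a group of distance functions to a single
   correspondence score. Returns 0.0 for an empty group. */
double combine_scores(const double *scores, size_t n, score_mode_t mode)
{
    if (n == 0) return 0.0;

    double sum = 0.0, lo = scores[0], hi = scores[0];
    for (size_t i = 0; i < n; i++) {
        sum += scores[i];
        if (scores[i] < lo) lo = scores[i];
        if (scores[i] > hi) hi = scores[i];
    }

    switch (mode) {
    case SCORE_MIN: return lo;
    case SCORE_MAX: return hi;
    default:        return sum / (double)n;
    }
}
```

Since a match is indicated by a score below 1.0 in the comparison graphs of this chapter, MIN is the most permissive of the three options and MAX the strictest.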
Haralick Features

Here we provide a brief overview, with background references, of Haralick texture metrics [84] as applied to 2D pixel regions. Haralick metrics are based on the relationships between adjacent pixels in an image using spatial dependency matrices (SDMs), otherwise known as gray-level co-occurrence matrices (GLCMs) and image co-occurrence matrices. Haralick has catalogued a history of related SDM research (see [85–105]). For practical applications, good bibliography references, and comprehensive discussions, see Hall-Beyer [122]. Optimizations for each Haralick metric are covered by Pham [123]. In addition, the author created a list of extended SDM metrics (SDMX) [1], discussed in the next section, complementing the Haralick metrics and particularly inspired by evaluating the visual SDM plots. In related research, Fernandez et al. [125] develop a histogram of equivalent patterns approach to taxonomize a wide range of textural features into a common framework, including SDMs and LBPs. LBP texture analysis is already included in the volume texture metrics discussed in this chapter. Akono et al. [124] developed a method of creating SDMs using 3x1 pixel groups instead of 2x1 pixel groups.
Figure 9.4: A very simple illustration showing how to create four (4) oriented SDMs from a 2x2 image (top left). See also the detailed illustrations in the original papers [84] and [122].
For the Haralick and SDM-based metrics, the first step is to compute each of the oriented SDMs as illustrated in Figure 9.4. For SDMX data, the SDM is built using ints in the range 0..255. For Haralick metrics, each SDM cell is normalized using the SDM normalization constant $R_\theta$, based on the dimensions of the SDM (dimensions would be $(i = 4, j = 4)$ for Figure 9.4). Each SDM cell is normalized by $R_\theta$ following marginal probability theory as shown here:

$$\mathit{sdm}_{\theta}[i,j] = \frac{\mathit{sdm}_{\theta}[i,j]}{R_\theta}$$

where:

$$R_\theta = i(j - 1)$$
SDMs record local 2x1 pixel features in four orientations as shown in Figure 9.4. Note that each oriented matrix is symmetrical and can be created by either (1) double-counting adjacent pixels or (2) adding the matrix to its transpose. Haralick et al. composed over 20 metrics, each computed in four orientations. In practice, about 14 of the metrics are widely used. Also, most practitioners ignore the four orientations (see Figure 9.4 describing the SDM orientations), preferring the average of all four directions. However, the average value obscures the texture orientations and diminishes the value of each metric, since the metrics are truly oriented, as shown in Figure 9.6. Therefore, VGM provides all four oriented metrics, as well as their average.
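Building one oriented, symmetric SDM by double-counting adjacent pixel pairs can be sketched as follows. This is a toy sketch, not the VGM code: it uses only four gray levels so the matrix is easy to inspect, the names LEVELS and build_sdm() are hypothetical, and the orientation is passed as a pixel offset, assuming (1,0) for 0 degrees, (1,-1) for 45 degrees, (0,-1) for 90 degrees, and (-1,-1) for 135 degrees.

```c
#include <stdint.h>
#include <string.h>

#define LEVELS 4   /* toy example; the text uses 256 (8-bit) or 32 (5-bit) */

/* Build a symmetric SDM for the pixel offset (dx, dy) by double-counting
   each adjacent pixel pair. Pixel values are assumed to be < LEVELS.
   Returns the total number of counts recorded (two per pair). */
unsigned build_sdm(const uint8_t *img, int xsize, int ysize, int dx, int dy,
                   unsigned sdm[LEVELS][LEVELS])
{
    unsigned total = 0;
    memset(sdm, 0, sizeof(unsigned) * LEVELS * LEVELS);

    for (int y = 0; y < ysize; y++) {
        for (int x = 0; x < xsize; x++) {
            int x2 = x + dx, y2 = y + dy;
            if (x2 < 0 || x2 >= xsize || y2 < 0 || y2 >= ysize) continue;

            uint8_t a = img[y * xsize + x];
            uint8_t b = img[y2 * xsize + x2];
            if (a >= LEVELS || b >= LEVELS) continue; /* out of range, skip */

            sdm[a][b]++;   /* the pair (a, b)                           */
            sdm[b][a]++;   /* and its mirror, which makes sdm symmetric */
            total += 2;
        }
    }
    return total;
}
```

For the 2x2 image with pixel values 0, 1, 2, 3 and the 0 degree offset, the two horizontal pairs (0,1) and (2,3) produce four counts, one in each of the cells [0][1], [1][0], [2][3], and [3][2].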
Haralick Metrics

Haralick metrics are computed over the Haralick SDM features, providing statistical metrics and ratios. Note that Haralick recommends using groups of individual metrics together rather than any single metric. The Haralick metrics are not simple to understand or to apply and therefore are not commonly used. Haralick metrics may not generalize across different types of images, and effective use is more like an art than a science, requiring patience and trial and error to get meaningful results. For a deeper treatment and background, see Hall-Beyer [122] and Pham [123]. We provide the Haralick metric equations here with no guidance; intuition is developed by trial and error, as well as by modifications to the metrics. For example, Clausi [160] notes that the inverse difference metric can be improved by further normalization using the number of gray levels as a scale factor. Based on benchmark test results, Clausi favors a small set of the metrics such as contrast, entropy, correlation, and dissimilarity (i.e. difference variance) and notes that contrast and dissimilarity seem to provide good results. Clausi also notes that 64 gray levels (6-bit resolution) seem optimal for some of the Haralick metrics, while 128 or 256 levels seem too large for some metrics and in most cases provide no additional benefit. Clausi provides test data showing that accuracy decreases for some metrics with increasing gray levels, and accuracy for other metrics increases with a lower number of gray levels (such as 48). See also Soh and Tsatsoulis [161]. Note that VGM provides 8-bit (256 levels) and 5-bit (32 levels) resolution for all Haralick features, allowing for a range of training and tuning options. Note also that when dealing with genome mask images as shown in Figure 9.5, the mask background pixels (zero-valued) are ignored by the Haralick metrics to avoid biasing results toward the empty mask regions.
Figure 9.5: An 8-bit SDM (left) compared to a 5-bit SDM (center) for an 8-bit gray scale image (right). In this case, SDMs are generated ignoring zero-valued background mask pixels which would otherwise bias the SDM.
The supported Haralick metrics are shown below (metrics not supported include: sum variance, sum average, sum entropy, maximal correlation coefficient, information measures of correlation 1 and 2). Haralick’s equations are shown next, with slight notation modifications to show independent 0,45,90,135 SDM feature angles.

Mean

$$\mu_i^{\theta} = \sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} i \left( \mathit{sdm}_{\theta}[i,j] \right)$$

$$\mu_j^{\theta} = \sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} j \left( \mathit{sdm}_{\theta}[i,j] \right)$$

note: $\mu_i^{\theta}$ and $\mu_j^{\theta}$ are identical for symmetric matrices as used herein

Sum of Squares: Variance

$$\mathit{var}_i^{\theta} = \sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \mathit{sdm}_{\theta}[i,j] \left( i - \mu_i^{\theta} \right)^2$$

$$\mathit{var}_j^{\theta} = \sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \mathit{sdm}_{\theta}[i,j] \left( j - \mu_j^{\theta} \right)^2$$

note: $\mathit{var}_i^{\theta}$ and $\mathit{var}_j^{\theta}$ are identical for symmetric matrices as used herein

Standard Deviation

$$\sigma_i^{\theta} = \sqrt{\mathit{var}_i^{\theta}}, \qquad \sigma_j^{\theta} = \sqrt{\mathit{var}_j^{\theta}}$$

note: $\sigma_i^{\theta}$ and $\sigma_j^{\theta}$ are identical for symmetric matrices as used herein

Angular Second Moment, Energy

$$\mathit{ASM}^{\theta} = \sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \left( \mathit{sdm}_{\theta}[i,j] \right)^2$$

$$\mathit{Energy}^{\theta} = \sqrt{\mathit{ASM}^{\theta}}$$

Contrast

$$\mathit{Contrast}^{\theta} = \sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \mathit{sdm}_{\theta}[i,j] \, (i - j)^2$$

Difference Variance (*Dissimilarity)

$$\mathit{DifferenceVariance}^{\theta} = \sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \mathit{sdm}_{\theta}[i,j] \, |i - j|$$

Inverse Difference Moment (IDM)

$$\mathit{IDM}^{\theta} = \sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \left( \frac{\mathit{sdm}_{\theta}[i,j]}{1 + (i - j)^2} \right)$$

Entropy

$$\mathit{Entropy}^{\theta} = \sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \mathit{sdm}_{\theta}[i,j] \left( -\ln\left( \mathit{sdm}_{\theta}[i,j] \right) \right)$$

Correlation

$$\mathit{Correlation}^{\theta} = \sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \mathit{sdm}_{\theta}[i,j] \left( \frac{\left( i - \mu_i^{\theta} \right) \left( j - \mu_j^{\theta} \right)}{\sqrt{\mathit{var}_i^{\theta} \, \mathit{var}_j^{\theta}}} \right)$$
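Several of the Haralick equations above can be computed in a single pass over one normalized, oriented SDM, as sketched below. This is a hedged illustration: the struct haralick_subset_t, the function haralick_subset(), and the toy dimension SDM_DIM are hypothetical, only the metrics that need no logarithm or square root are included (mean, variance, ASM, contrast, difference variance, IDM), and the caller is assumed to repeat the call for each of the four orientations.

```c
#define SDM_DIM 4   /* toy dimension; the text uses 256 or 32 */

/* Hypothetical container for a subset of the Haralick metrics. */
typedef struct {
    double mean_i;         /* mean along i                              */
    double var_i;          /* sum of squares: variance along i          */
    double asm_;           /* angular second moment                     */
    double contrast;       /* sum of p * (i - j)^2                      */
    double dissimilarity;  /* difference variance, sum of p * |i - j|   */
    double idm;            /* inverse difference moment                 */
} haralick_subset_t;

/* Compute the subset for one normalized SDM p (cells sum to 1.0). */
haralick_subset_t haralick_subset(double p[SDM_DIM][SDM_DIM])
{
    haralick_subset_t h = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 };

    /* First pass: the mean is needed before the variance. */
    for (int i = 0; i < SDM_DIM; i++)
        for (int j = 0; j < SDM_DIM; j++)
            h.mean_i += (double)i * p[i][j];

    /* Second pass: all remaining metrics. */
    for (int i = 0; i < SDM_DIM; i++) {
        for (int j = 0; j < SDM_DIM; j++) {
            double d  = (double)i - (double)j;
            double ad = d < 0.0 ? -d : d;
            double di = (double)i - h.mean_i;

            h.var_i         += p[i][j] * di * di;
            h.asm_          += p[i][j] * p[i][j];
            h.contrast      += p[i][j] * d * d;
            h.dissimilarity += p[i][j] * ad;
            h.idm           += p[i][j] / (1.0 + d * d);
        }
    }
    return h;
}
```

For an SDM whose only nonzero cells are p[0][1] = p[1][0] = 0.5, every pair differs by exactly one gray level, so contrast and dissimilarity are both 1.0 and the IDM is 0.5.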
SDMX Features

The extended SDM feature metrics provide information on adjacent pixel relationships and were developed as metrics to visually interpret SDM plot features (as shown in Figure 9.6), complementing the more abstract and statistical Haralick metrics. To understand the value of the SDMX metrics, SDMs are visualized in Figure 9.6 above a corresponding table of SDMX metrics. Note that the 90 degree SDM (second from right) looks different from the other plots and captures pixel relationships in the y dimension, the vertical up/down direction of the image. Therefore, the 90 degree SDM shows the vertical pixel relationship structures and lines in the image. Referring to the table in Figure 9.6, the 90 degree SDM has the longest locus length metric, indicating a wider range of pixel values. Also, the 90 degree SDM has the highest bin mean density metric and the lowest low-frequency coverage metric in the table, indicating that there are not many outliers and that adjacent pixel values are contained within a narrower range, which also corresponds to the highest linearity metric of all the 0,45,90,135 degree orientations.
    texture for object:  CocaCola.png
    area:                353, 323
    gray value moments:  min:0 max:255 mean:57
    moments:             adev:23.8378 sdev:31.8880 svar:1016.8426 skew:1.922015 curt:9.619062

    ------------------------------------------------------------------
    metric                    0deg    90deg   135deg    45deg      ave
    ------------------------------------------------------------------
    xcentroid                   77       77       77       77       77
    ycentroid                   77       77       77       77       77
    low_frequency_coverage   0.193    0.082    0.176    0.205    0.164
    total_coverage           0.693    0.811    0.699    0.664    0.717
    corrected_coverage       0.500    0.729    0.523    0.459    0.553
    total_power              9.000    2.000    8.000   10.000    7.250
    relative_power          29.000   15.000   27.000   32.000   25.750
    locus_length               164      230      168      150      178
    locus_mean_density          59       55       51       51       54
    bin_mean_density            11       18       11       10       12
    containment              0.816    0.920    0.820    0.791    0.837
    linearity                0.844    0.984    0.824    0.773    0.856
    linearity_strength       0.542    0.414    0.362    0.362    0.420
Figure 9.6: The SDMX metrics using 8-bit data: (top, left to right) source image, 0 degree SDM, 45 degree SDM, 90 degree SDM, 135 degree SDM.
SDMX Metrics

SDMX metrics are computed from 8-bit and 5-bit resolution SDM features, which is useful for finding the best resolution for a given application (see Figure 9.5). Unlike Haralick metrics, the SDMX metrics are recorded on unnormalized SDMs; however, some individual SDMX metrics are normalized in specific ways as shown in the equations below. NOTE: Source code implementing the SDMX metrics can be found in [1] Appendix D, and also in the VGM open source code, allowing for various normalizations and customizations.

Note: $n$ and $m$ below are the dimensions of the square SDM, $\therefore n = m$, which is 256 for 8-bit data or 32 for 5-bit data.

Centroid (Weighted Center of Impressions)

$$\sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \left\{ \begin{aligned} z &\mathrel{+}= \mathit{sdm}_{\theta}[i,j] \\ x &\mathrel{+}= \mathit{sdm}_{\theta}[i,j] \cdot i \\ y &\mathrel{+}= \mathit{sdm}_{\theta}[i,j] \cdot j \end{aligned} \right\}, \qquad \mathit{centroid}_x^{\theta} = \frac{x}{z}, \qquad \mathit{centroid}_y^{\theta} = \frac{y}{z}$$

note: $\mathit{centroid}_x^{\theta}$ and $\mathit{centroid}_y^{\theta}$ are identical for symmetric matrices as used herein

Total Coverage (Image Smoothness)

$$\mathit{coverage} = \sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \left( \mathit{sdm}_{\theta}[i,j] == 0 \;?\; 0 : 1 \right), \qquad \mathit{coverageT}^{\theta} = \frac{\mathit{coverage}}{nm}$$

Low-Frequency Coverage (Uncommon Values, Noise)

$$\mathit{coverageL} = \sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \left( \mathit{sdm}_{\theta}[i,j] == [1..\beta] \;?\; 1 : 0 \right), \qquad \mathit{coverageL}^{\theta} = \frac{\mathit{coverageL}}{nm}$$

where $\beta$ = threshold for low frequency, typ. 3

Corrected Coverage (Total Coverage with Low-frequency Noise Removed)

$$\mathit{coverageC}^{\theta} = \mathit{coverageT}^{\theta} - \mathit{coverageL}^{\theta}$$

Total Power, Relative Power (Swing in Bin Count Magnitude)

$$\sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \left( \mathit{sdm}_{\theta}[i,j] \neq 0 \right) \Rightarrow \left\{ \begin{aligned} z &\mathrel{+}= 1 \\ t &\mathrel{+}= |i - j| \end{aligned} \right\}, \qquad \mathit{powerT}^{\theta} = \frac{t}{n}, \qquad \mathit{powerR}^{\theta} = \frac{\mathit{powerT}^{\theta}}{z}$$

Locus Mean Density (Degree of Clustering around Central Axis)

$$\sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \mathit{if}\; (|i - j| < \beta) \;\&\&\; \left( \mathit{sdm}_{\theta}[i,j] \neq 0 \right) \Rightarrow \left\{ \begin{aligned} &n{+}{+} \\ &d \mathrel{+}= \mathit{sdm}_{\theta}[i,j] \end{aligned} \right\}, \qquad \mathit{locusmeanD}^{\theta} = \frac{d}{n}$$

where $\beta$ = threshold for low frequency, typ. 7 for 8-bit, typ. 2 for 5-bit

Locus Length (Binning Strength about Central Axis)

$$\sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \mathit{if}\; \left( |i - j| \neq 0 \;\&\&\; \mathit{sdm}_{\theta}[i,j] == [1..\beta] \right) \Rightarrow l{+}{+}, \qquad \mathit{locusL}^{\theta} = l$$

where $\beta$ = threshold for locus range, typ. 7 for 8-bit, 2 for 5-bit

Bin Mean Density (Mean Density of Non-empty Bins)

$$\sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \mathit{if}\; \left( \mathit{sdm}_{\theta}[i,j] \neq 0 \right) \Rightarrow \left\{ \begin{aligned} &z{+}{+} \\ &b \mathrel{+}= \mathit{sdm}_{\theta}[i,j] \end{aligned} \right\}, \qquad \mathit{binmeandensity}^{\theta} = \frac{b}{z}$$

Containment (Extrema Range Bleeding, Overflow)

$$\sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \mathit{if}\; \left( \mathit{sdm}_{\theta}[i,0] \neq 0 \right) \Rightarrow c_1^{\theta}{+}{+}$$

$$\sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \mathit{if}\; \left( \mathit{sdm}_{\theta}[i,m] \neq 0 \right) \Rightarrow c_2^{\theta}{+}{+}$$

$$\sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \mathit{if}\; \left( \mathit{sdm}_{\theta}[0,j] \neq 0 \right) \Rightarrow c_3^{\theta}{+}{+}$$

$$\sum_{\theta}^{0,45,90,135} \sum_{i}^{n} \sum_{j}^{m} \mathit{if}\; \left( \mathit{sdm}_{\theta}[n,j] \neq 0 \right) \Rightarrow c_4^{\theta}{+}{+}$$

$$\mathit{containment}_{total}^{\theta} = \frac{c_1^{\theta} + c_2^{\theta} + c_3^{\theta} + c_4^{\theta}}{nm}$$

Linearity, Linearity Strength

$$\sum_{\theta}^{0,45,90,135} \sum_{j}^{m} \mathit{if}\; \left( \mathit{sdm}_{\theta}[j, j \ast m] \neq 0 \right) \Rightarrow \left\{ \begin{aligned} &z{+}{+} \\ &b \mathrel{+}= \mathit{sdm}_{\theta}[j, j \ast m] \end{aligned} \right\}$$

$$\mathit{linearity}_{total}^{\theta} = \frac{b}{m}, \qquad \mathit{linearity}_{strength}^{\theta} = \frac{(b / z + .000001)}{m}$$
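The coverage and bin density metrics can be sketched in C from one unnormalized, oriented SDM. This is a sketch under stated assumptions and not the Appendix D source: the names sdmx_subset_t, sdmx_subset(), SDMX_DIM, and LOW_FREQ_BETA are hypothetical, a toy 4x4 matrix is used, and the low-frequency threshold is set to the typical value of 3 given in the text.

```c
#define SDMX_DIM 4          /* toy dimension; the text uses 256 or 32  */
#define LOW_FREQ_BETA 3u    /* low-frequency threshold, typ. 3         */

/* Hypothetical container for a subset of the SDMX metrics. */
typedef struct {
    double total_coverage;          /* nonzero bins / all bins          */
    double low_frequency_coverage;  /* bins with count 1..beta / all    */
    double corrected_coverage;      /* total minus low-frequency        */
    double bin_mean_density;        /* mean count of non-empty bins     */
} sdmx_subset_t;

/* Compute the subset for one unnormalized SDM. */
sdmx_subset_t sdmx_subset(unsigned sdm[SDMX_DIM][SDMX_DIM])
{
    unsigned nonzero = 0, low = 0;
    unsigned long sum = 0;

    for (int i = 0; i < SDMX_DIM; i++) {
        for (int j = 0; j < SDMX_DIM; j++) {
            unsigned c = sdm[i][j];
            if (c == 0) continue;
            nonzero++;
            sum += c;
            if (c <= LOW_FREQ_BETA) low++;   /* count is in 1..beta */
        }
    }

    const double cells = (double)(SDMX_DIM * SDMX_DIM);
    sdmx_subset_t m;
    m.total_coverage         = (double)nonzero / cells;
    m.low_frequency_coverage = (double)low / cells;
    m.corrected_coverage     = m.total_coverage - m.low_frequency_coverage;
    m.bin_mean_density       = nonzero ? (double)sum / (double)nonzero : 0.0;
    return m;
}
```

The relationship can be checked against the table in Figure 9.6, where, for the 0 degree SDM, total coverage 0.693 minus low-frequency coverage 0.193 gives the corrected coverage 0.500.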
Haralick and SDMX Metric Comparison Graphs
To provide intuition, a selection of Haralick and SDMX comparison metric graphs is shown in the following sections, across all combinations of supported image spaces, color levels, color components, and SDM angles as follows:

– 3 Image spaces (raw, sharp, retinex)
– 4 Color levels (raw, centered, LAB constant I, HSL saturation boost)
– 5 Color components (red, green, blue, luma, HSL_H)
– 5 SDM angles (0, 90, 45, 135, AVE)

TOTAL: 3x4x5x5 = 300 permutations of each metric

The metric comparison charts are based on comparing a reference genome and a target genome, illustrating two examples: (1) texture similarity and (2) texture dissimilarity. The graphs compactly illustrate how some metrics are more applicable than others for correspondence in a given application. As usual, some of the metrics are highly correlated for specific correspondence cases, and others are not. However, some metrics may be interesting only as general SDM statistics. As shown in the summary graphs, there is varying sensitivity to specific leveled color spaces. Learning and training are required to select texture metrics and tune classifiers for each application, as discussed in more detail in Chapter 11.
Texture Similarity Graphs (Match < 1.0)

The following texture metric graphs each compactly show 300 different correspondence scores, as explained in the legend notes below, by comparing the similarity of the limestone block genomes in Figure 9.7. The graphs reveal good textural correspondence for many metrics with no weight tuning. Note that the metric correspondence varies with the input color leveled space.
Figure 9.7: A limestone block wall and two selected similar limestone genome block regions.
Texture Dissimilarity Graphs (Nonmatch > 1.0) The following texture metric graphs show texture dissimilarity by comparing an orange flower pot to the granite pavement in Figure 9.8. Nearly all of the metric comparisons reveal very poor textural similarity, as expected. Note that 300 different correspondence scores are compactly displayed, as explained in the legend notes below.
Figure 9.8: An orange flower pot genome and a dissimilar granite pavement genome region.
MCC Texture Functions

The MCC texture functions include support for volumetric, Haralick, and SDMX features. In addition, the CSV function match__texture() combines all the texture metrics together into a high-level structured classifier using a combination of qualifier metric tuning and heuristics, as discussed in the next section. The MCC texture function signatures are summarized in Table 9.1.

    // Compute Volume Texture metrics on two genomes, compare and return the results as a double
    double compare__VolumeTexture__delta(
        double weight,            // 1.0 default
        int distance_functions,   // allow for specific distance functions to be specified
        MATCH_CRITERIA criteria,  // [ NORMAL | STRICT | RELAXED | HUNT | PREJUDICE ]
        int images,               // [ RAW_IMAGE | SHARP_IMAGE | RETINEX_IMAGE | SUMMARY_VOLUME | ALL_IMAGES ]
        int colors,               // [ RGB_COLORS | RED | GREEN | BLUE | LUMA | HSL | ALL_COLORS ]
        int scoring               // [ AVE_SCORE | MIN_SCORE | MAX_SCORE ]
    );

    // Compute Haralick metrics on two genomes, compare and return the results as a struct
    double compare__Haralick__delta(
        double weight,            // 1.0 default
        TEXTURE_HINTS hints,      // allow for specific texture metrics to be used
        MATCH_CRITERIA criteria,  // [ NORMAL | STRICT | RELAXED | HUNT | PREJUDICE ]
        HARALICK_MASK mask,       // bit mask to select Haralick metrics to use
        int images,               // [ RAW_IMAGE | SHARP_IMAGE | RETINEX_IMAGE | SUMMARY_VOLUME | ALL_IMAGES ]
        int colors,               // [ RGB_COLORS | RED | GREEN | BLUE | LUMA | HSL | ALL_COLORS ]
        int levels,               // [ RAW | CENTERED | LAB_CONSTANT | SATURATION_BOOST | ALL_LEVELS ]
        int scoring,              // [ AVE_SCORE | MIN_SCORE | MAX_SCORE ]
        Haralick_compare_t *metrics  // All the Haralick deltas in a struct
    );

    // Compute SDMX metrics on two genomes, compare and return the results as a struct
    double compare__SDMX__delta(
        double weight,            // 1.0 default
        TEXTURE_HINTS hints,      // allow for specific texture metrics to be used
        MATCH_CRITERIA criteria,  // [ NORMAL | STRICT | RELAXED | HUNT | PREJUDICE ]
        SDMX_MASK mask,           // bit mask to select SDMX metrics to use
        int images,               // [ RAW_IMAGE | SHARP_IMAGE | RETINEX_IMAGE | SUMMARY_VOLUME | ALL_IMAGES ]
        int colors,               // [ RGB_COLORS | RED | GREEN | BLUE | LUMA | HSL | ALL_COLORS ]
        int levels,               // [ RAW | CENTERED | LAB_CONSTANT | SATURATION_BOOST | ALL_LEVELS ]
        int scoring,              // [ AVE_SCORE | MIN_SCORE | MAX_SCORE ]
        SDMX_compare_t *metrics   // All the SDMX deltas in a struct
    );

    // Compute Haralick metrics
    double compute__Haralick_metrics(
        U64 genome_ID,
        int images,               // [ RAW_IMAGE | SHARP_IMAGE | RETINEX_IMAGE | SUMMARY_VOLUME | ALL_IMAGES ]
        int colors,               // [ RGB_COLORS | RED | GREEN | BLUE | LUMA | HSL | ALL_COLORS ]
        int levels,               // [ RAW | CENTERED | LAB_CONSTANT | SATURATION_BOOST | ALL_LEVELS ]
        Haralick_t *metrics       // All the Haralick metrics in a struct
    );

    // Compute SDMX metrics
    double compute__SDMX_metrics(
        U64 genome_ID,
        int images,               // [ RAW_IMAGE | SHARP_IMAGE | RETINEX_IMAGE | SUMMARY_VOLUME | ALL_IMAGES ]
        int colors,               // [ RGB_COLORS | RED | GREEN | BLUE | LUMA | HSL | ALL_COLORS ]
        int levels,               // [ RAW | CENTERED | LAB_CONSTANT | SATURATION_BOOST | ALL_LEVELS ]
        SDMX_t *metrics           // All the SDMX metrics in a struct
    );
Table 9.1: Summarizing MCC texture correspondence functions

TEXTURE BASE: Volumetric Projections

    Component Spaces: RED projections, GREEN projections, BLUE projections, LUMA projections
    RGB Spaces: Raw projections, Blur projections, LBP projections, RANK MIN projections

    double compare__pyramid_SAD_genome_correlation( ... )
    double compare__pyramid_IntersectionSAD_genome_correlation( ... )
    double compare__pyramid_SSD_genome_correlation( ... )
    double compare__pyramid_IntersectionSSD_genome_correlation( ... )
    double compare__pyramid_Hellinger_genome_correlation( ... )
    double compare__pyramid_IntersectionHellinger_genome_correlation( ... )
    double compare__pyramid_Hammingsimilarity_genome_correlation( ... )
    double compare__pyramid_IntersectionHammingsimilarity_genome_correlation( ... )
    double compare__pyramid_Chebychev_genome_correlation( ... )
    double compare__pyramid_IntersectionDivergence_genome_correlation( ... )
    double compare__pyramid_Outliermagnitude_genome_correlation( ... )
    double compare__pyramid_Outlierratio_genome_correlation( ... )
    double compare__pyramid_Cosine_genome_correlation( ... )
    double compare__pyramid_IntersectionCosine_genome_correlation( ... )
    double compare__pyramid_Jaccard_genome_correlation( ... )
    double compare__pyramid_IntersectionJaccard_genome_correlation( ... )
    double compare__pyramid_Fidelity_genome_correlation( ... )
    double compare__pyramid_IntersectionFidelity_genome_correlation( ... )
    double compare__pyramid_Sorensen_genome_correlation( ... )
    double compare__pyramid_IntersectionSorensen_genome_correlation( ... )
    double compare__pyramid_Canberra_genome_correlation( ... )
    double compare__pyramid_IntersectionCanberra_genome_correlation( ... )

TEXTURE BASE: Haralick Features

    double compare__Haralick_delta( ... )
    Parameters:
        double weight,             // 1.0 default
        MATCH_CRITERIA criteria,   // [ NORMAL | STRICT | RELAXED | HUNT | PREJUDICE ]
        HARALICK_MASK mask,        // bit mask to select Haralick metrics to use
        int images,                // [ RAW | SHARP | RETINEX | SUMMARY_VOLUME | ALL_IMAGES ]
        int colors,                // [ RGB_COLORS | RED | GREEN | BLUE | LUMA | HSL | ALL ]
        int levels,                // [ RAW | CENTERED | LAB_CONSTANT | SATURATION_BOOST | ALL ]
        int scoring,               // [ AVE_SCORE | MIN_SCORE | MAX_SCORE ]
        Haralick_delta_t *metrics  // All the Haralick deltas in a struct

TEXTURE BASE: SDMX Features

    double compare__SDMX_delta( ... )
    Parameters:
        double weight,
        MATCH_CRITERIA criteria,   // [ NORMAL | STRICT | RELAXED | HUNT | PREJUDICE ]
        SDM_MASK mask,             // bit mask to select SDMX metrics to use
        int images,                // [ RAW | SHARP | RETINEX | HISTEQ | BLUR | ALL_IMAGES ]
        int colors,                // [ RGB | RED | GREEN | BLUE | LUMA | HSL_INDEX | ALL ]
        int levels,                // [ RAW | CENTERED | LAB_CONSTANT | SATURATION_BOOST | ALL ]
        int scoring,               // [ AVE_SCORE | MIN_SCORE | MAX_SCORE ]
        Haralick_delta_t *metrics  // All the Haralick deltas in a struct

    // Compute Volume Texture metrics on two genomes, compare and return the results as a double
    double compare__VolumeTexture__delta( ... )
    Parameters:
        double weight,
        int images,                // [ RAW | SHARP | RETINEX | HISTEQ | BLUR | ALL_IMAGES ]
        int colors,                // [ RGB | RED | GREEN | BLUE | LUMA | HSL_INDEX | ALL ]
        int scoring,               // [ AVE_SCORE | MIN_SCORE | MAX_SCORE ]
        Haralick_delta_t *metrics  // All the Haralick deltas in a struct

    // Compute Haralick metrics
    double compute__Haralick_metrics( ... )
    Parameters:
        U64 genome_ID,
        int images,                  // [ RAW | SHARP | RETINEX | HISTEQ | BLUR | ALL_IMAGES ]
        int colors,                  // [ RGB | RED | GREEN | BLUE | LUMA | HSL_INDEX | ALL ]
        int levels,                  // [ RAW | CENTERED | LAB_CONSTANT | SATURATION_BOOST | ALL ]
        Haralick_metrics_t *metrics  // All the Haralick deltas in a struct

    // Compute SDMX metrics
    double compute__SDMX_metrics( ... )
    Parameters:
        U64 genome_ID,
        int images,                // [ RAW | SHARP | RETINEX | HISTEQ | BLUR | ALL_IMAGES ]
        int colors,                // [ RGB | RED | GREEN | BLUE | LUMA | HSL_INDEX | ALL ]
        int levels,                // [ RAW | CENTERED | LAB_CONSTANT | SATURATION_BOOST | ALL ]
        SDMX_metrics_t *metrics    // All the SDMX deltas in a struct
CSV Texture Functions

The match__texture() CSV function is a high-level classifier, including heuristics to automatically select groups of MCC functions and set the MCC function parameters, with metrics qualifier functions incorporated to tune weights. The TEXTURE_HINTS parameter is provided to override the default CSV scoring behavior (see the code illustration below). As discussed in Chapter 4, each CSV function is tuned to perform heuristic logic to derive the metrics according to the MATCH_CRITERIA parameter to allow for customization and degrees of confidence.

    enum TEXTURE_HINTS {
        TEXTURE_HINT_xcentroid_delta,
        TEXTURE_HINT_ycentroid_delta,
        TEXTURE_HINT_asm_delta,
        TEXTURE_HINT_low_frequency_coverage_delta,
        TEXTURE_HINT_total_coverage_delta,
        TEXTURE_HINT_corrected_coverage_delta,
        TEXTURE_HINT_total_power_delta,
        TEXTURE_HINT_relative_power_delta,
        TEXTURE_HINT_locus_length_delta,
        TEXTURE_HINT_locus_mean_density_delta,
        TEXTURE_HINT_bin_mean_density_delta,
        TEXTURE_HINT_containment_delta,
        TEXTURE_HINT_linearity_delta,
        TEXTURE_HINT_linearity_strength_delta,
        TEXTURE_HINT_autocorrelation_delta,
        TEXTURE_HINT_covariance_delta,
        TEXTURE_HINT_Mean_delta,
        TEXTURE_HINT_SumOfSquaresVariance_delta,
        TEXTURE_HINT_StandardDeviation_delta,
        TEXTURE_HINT_ASM_delta,
        TEXTURE_HINT_Contrast_delta,
        TEXTURE_HINT_DifferenceVariance_delta,
        TEXTURE_HINT_IDM_delta,
        TEXTURE_HINT_Entropy_delta,
        TEXTURE_HINT_Correlation_delta,
        TEXTURE_HINT_END
    };

    //
    // Match on texture metrics - all metrics (volume, Haralick, SDMX)
    // Description:
    //   Creates SDMX and Haralick texture metrics for two genomes,
    //   Compare SDMX and Haralick metrics
    //   Compare Volumetric features
    //   Compute final score based on weighted combination of all features
    //
    AGENT_SCORE_RESULT match__texture(
        TEXTURE_HINTS hints,             // allow for specific texture metrics to be used
        MATCH_CRITERIA criteria,         // [ NORMAL | STRICT | RELAXED | HUNT | PREJUDICE | VRAW | VMIN | VMAX | VAVE | VLBP ]
        AGENT_OVERRIDES agent_parameter,
        double weight_override,          // default 1.0
        double *match_strength           // Return value: the strength of the match
    );
Summary

This chapter describes the VGM texture metrics: volume projection metrics for CAM clusters, Haralick metrics, and SDMX metrics. The volume projection metrics encompass several metric spaces and quantization spaces. Separate volume projection metrics are provided for each r,g,b,l component taken across NxN pixel kernels. Projections include MIN value projections, AVE value projections, and LBP projections. In addition, 2D metrics are provided based on SDMs, otherwise referred to as GLCMs. The 2D SDM-based metrics include Haralick metrics and SDMX metrics. The MCC texture classifier functions and CSV texture functions are summarized, with details provided regarding tuning and bias parameters.
Chapter 10
Region Glyph Metrics

The more I study nature, the more I stand amazed at the work of the Creator.
―Louis Pasteur
Overview We refer to a glyph as a type of feature descriptor in the VGM. As surveyed in [1], features may be computed globally over an entire image or locally in selected areas. Features may be representented in various metric spaces. According to the taxonomy in [1], the following families of feature descriptors can be named: – Local binary descriptors: features are defined as binary sets or vectors; for example, SIFT, LBP, BRISK, ORB, BRIEF, CENSUS, FREAK – Spectra descriptors: features are defined using a range of metrics such as statistical metrics, histograms, gradients, and colors; for example, SIFT or SURF – Basis space descriptors: features are defined from a basis space such as Fourier, Haar, wavelets, DNN weight templates, or feature vocabularies – Polygon shape descriptors: record geometric metrics or image moments, such as perimeter, area, and circularity The VGM refers to all types of regional feature descriptors as glyphs, like an icon or a logo; by definition, a glyph is a high-level description of a group of pixels. A group of pixels may be a rectangular nxn kernel, a circular kernel, a set of patch kernels, a polygon kernel, or a feature in another basis domain such as a Fourier feature. DNNs produce a set of nxn glyphs as correlation template weights adjusted during the training cycle and perhaps flattened into a 1D fully connected vector of weights for correspondence. As a result of research and testing discussed in [1], we have selected specific glyphs for the first VGM version, described in the following sections. Note that each glyph in the VGM is a learned feature, trained from a genome region using a range of training images from the eye/LGN model discussed in Chapter 2. Each type of feature descriptor carries advantages and limitations. When choosing a glyph feature, note that the computation times vary widely. For the VGM, several glyph features were chosen and adapted to operate with selected color spaces. 
ORB was chosen since it is well engineered and among the fastest to compute, being about one order of magnitude faster than SURF and two orders of magnitude faster than SIFT. Variants of the common SIFT and SURF features are incorporated. DNN feature models are incorporated as well, since the feature model is a glyph or set of glyphs according to the VGM definition.
DOI 10.1515/9781501505966-010
The VGM supported glyphs include:
– Color SIFT, using HSL hue and saturation
– Color component R,G,B,I G-SURF
– Color component R,G,B,I ORB
– RGB DNN

The glyphs are supported over the full range of input images (Raw, Sharp, Retinex, Histeq, Blur) at 8-bit resolution and mostly within r,g,b,l color spaces. However, for the first VGM version, glyphs are not supported within the base metrics structures, so no autolearning hulls are computed. Glyph features can be called and managed from agent code. Next, we provide details on the supported glyphs and corresponding MCC functions.
Color SIFT

VGM provides a variant of SIFT, developed by van de Weijer et al. [110] and referred to as Color SIFT, as illustrated in Figure 10.1. Color SIFT uses hue and saturation to form parts of the descriptor. Since the original SIFT algorithm [128] is patented, we prefer to avoid it, especially for the VGM open source environment.

ADVANTAGES: Since the color space components used are HSL hue and saturation, this descriptor provides a unique light-invariant approach to building descriptors.
Figure 10.1: A hue saturation color SIFT method [110], which provides some measure of color appearance invariance under various lighting and shading conditions. Image Copyright © 2006 Joost van de Weijer and Cordelia Schmid [110]. Used by permission.
Color Component R,G,B,I G-SURF

The G-SURF descriptor (Gauge-SURF) [127] is a variant of the original SURF descriptor [130] with some improvements, using a local gauge coordinate system and multiscale gauge derivatives for improved rotational invariance. SURF is similar in some respects to SIFT in terms of accuracy and performance and is also patented. Since we wish to avoid patents, we choose G-SURF instead of SURF.

ADVANTAGES: It is widely accepted, includes good invariance to scale and rotation, and will perform well on LUMA image components.
Figure 10.2: The G-SURF descriptor: The FAST-Hessian oriented interest points are shown using a yellow line length representing magnitude, and red (positive valued local extrema) or blue (negative valued non-extrema culling candidate) point origin indicating the sign of the Hessian determinant.
Color Component R,G,B,I ORB

The ORB descriptor [129] is a binary descriptor, recording oriented interest point features in a binary feature vector. ORB learns the points using criteria of high variance, with comparable accuracy and invariance to SIFT and SURF. ORB is not patented.

ADVANTAGES: Since ORB is a binary feature vector, it is amenable to fast Hamming feature matching: the Hamming distance reduces to an XOR followed by a population count, which most processors provide as native instructions. The overall performance is at least an order of magnitude faster than SIFT and SURF with comparable accuracy and invariance (see Figure 10.3).
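To make the Hamming matching cost concrete, here is a minimal C++ sketch that compares two 256-bit binary descriptors (the usual ORB descriptor length). The names BinaryDescriptor and hamming_distance are illustrative assumptions and are not part of the VGM API.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Illustrative type (not a VGM type): a 256-bit binary descriptor
// stored as four 64-bit words.
using BinaryDescriptor = std::array<std::uint64_t, 4>;

// Hamming distance = number of bit positions where the descriptors differ.
// XOR leaves a 1 in every differing position; count() tallies those bits.
inline int hamming_distance(const BinaryDescriptor& a, const BinaryDescriptor& b) {
    int distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        distance += static_cast<int>(std::bitset<64>(a[i] ^ b[i]).count());
    }
    return distance;
}
```

A smaller distance means a closer match; identical descriptors give 0 and fully complementary descriptors give 256.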
Figure 10.3: An ORB descriptor match computed between two genomes; one genome is rotated and scaled. Only the top ten feature points are shown here which keeps the illustration simpler.
RGB DNN

DNNs provide good accuracy for many applications at the expense of potentially time-consuming and fragile training protocols (see [1]). VGM provides a DNN model using an abbreviated training protocol emulating the eye and LGN model discussed in Chapter 2.

NOTE: The VGM DNN is a plugin available by special license arrangement with Krig Research, which produces a trained DNN model for each genome. The model is compressed and based on a master model of nxn genome features. Other DNN models may be incorporated into the VGM by agents.
MCC Functions for Glyph Bases

Separate functions are provided for computing and comparing each glyph base. Note that the performance of glyph-related functions may be fairly slow, so instead of building the glyph metrics into the base metric structure and creating autolearning hulls for each genome, each glyph metric is computed on demand by agents, perhaps as a final classifier stage. For ORB, G-SURF, and Color SIFT, the metrics can be computed on any single color component r,g,b,l. For the DNN, each RGB component is used to produce model weights. For the glyph compare functions, the MATCH_CRITERIA is defined in Chapter 5 and includes heuristic rules to alter parameters and weights.

//
// COMPUTE GLYPH FUNCTIONS
//
STATUS compute__COLOR__HUE_SATURATION_SIFT(
    U64 genome_ID,
    int images,     // fixed: HSL HUE, HSL_SATURATION
    int component,  // [ RED | GREEN | BLUE | LUMA | HSL_S ]
    OUT COLOR__HUE_SATURATION_SIFT_t *metrics  // returns a list of the 10 strongest features
);

STATUS compute__RGBI_COMPONENT_GSURF(
    U64 genome_ID,
    int images,     // [ RAW | SHARP | RETINEX | HISTEQ | BLUR ]
    int component,  // [ RED | GREEN | BLUE | LUMA | HSL_S ]
    OUT RGBI_COMPONENT_GSURF_t *metrics  // returns a list of the 10 strongest features
);

STATUS compute__RGBI_COMPONENT_ORB(
    U64 genome_ID,
    int images,     // [ RAW | SHARP | RETINEX | HISTEQ | BLUR ]
    int component,  // [ RED | GREEN | BLUE | LUMA | HSL_S ]
    OUT RGBI_COMPONENT_ORB_t *metrics  // returns a list of the 10 strongest features
);

STATUS compute__RGB_DNN(
    U64 genome_ID,
    int images,     // [ RAW | SHARP | RETINEX | HISTEQ | BLUR ]
    int component,  // fixed: RGB
    OUT RGB_DNN_t *metrics  // returns a list of the 10 strongest features
);
//
// COMPARE GLYPH FUNCTIONS
//
double compare__COLOR__HUE_SATURATION_SIFT__delta(
    U64 reference_genome_ID,
    U64 target_genome_ID,
    double weight,            // 1.0 default
    MATCH_CRITERIA criteria,  // [ NORMAL | STRICT | RELAXED | HUNT | PREJUDICE ]
    int scoring,              // [ AVE_SCORE | MIN_SCORE | MAX_SCORE ]
    OUT COLOR__HUE_SATURATION_SIFT_delta_t *metrics  // returns the feature comparison scores, 10 compares
);

double compare__RGBI_COMPONENT_GSURF__delta(
    U64 reference_genome_ID,
    U64 target_genome_ID,
    double weight,            // 1.0 default
    MATCH_CRITERIA criteria,  // [ NORMAL | STRICT | RELAXED | HUNT | PREJUDICE ]
    int scoring,              // [ AVE_SCORE | MIN_SCORE | MAX_SCORE ]
    OUT RGBI_COMPONENT_GSURF_delta_t *metrics  // returns the feature comparison scores, 10 compares
);

double compare__RGBI_COMPONENT_ORB__delta(
    U64 reference_genome_ID,
    U64 target_genome_ID,
    double weight,            // 1.0 default
    MATCH_CRITERIA criteria,  // [ NORMAL | STRICT | RELAXED | HUNT | PREJUDICE ]
    int scoring,              // [ AVE_SCORE | MIN_SCORE | MAX_SCORE ]
    OUT RGBI_COMPONENT_ORB_delta_t *metrics  // returns the feature comparison scores, 10 compares
);

double compare__RGB_DNN__delta(
    U64 reference_genome_ID,
    U64 target_genome_ID,
    double weight,            // 1.0 default
    MATCH_CRITERIA criteria,  // [ NORMAL | STRICT | RELAXED | HUNT | PREJUDICE ]
    int scoring,              // [ AVE_SCORE | MIN_SCORE | MAX_SCORE ]
    OUT RGB_DNNB_delta_t *metrics  // returns the feature comparison scores, 10 compares
);
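The compare prototypes take a scoring selector and a weight, and return ten per-feature comparison scores. How these are reduced to the single returned double is not spelled out here, so the following C++ is only a sketch under stated assumptions: the Scoring enum is a stand-in for the AVE_SCORE, MIN_SCORE, and MAX_SCORE selectors, reduce_glyph_scores is a hypothetical helper, and the weight is assumed to simply scale the reduced score.

```cpp
#include <algorithm>
#include <array>
#include <numeric>

// Stand-in for the scoring selectors named in the prototype comments.
enum Scoring { AVE_SCORE, MIN_SCORE, MAX_SCORE };

// Hypothetical helper: reduce ten per-feature comparison scores to one value.
// Assumption: the weight (1.0 default) multiplies the reduced score.
inline double reduce_glyph_scores(const std::array<double, 10>& scores,
                                  Scoring scoring, double weight = 1.0) {
    double reduced = 0.0;
    switch (scoring) {
    case MIN_SCORE:
        reduced = *std::min_element(scores.begin(), scores.end());
        break;
    case MAX_SCORE:
        reduced = *std::max_element(scores.begin(), scores.end());
        break;
    case AVE_SCORE:
    default:
        reduced = std::accumulate(scores.begin(), scores.end(), 0.0) /
                  static_cast<double>(scores.size());
        break;
    }
    return weight * reduced;
}
```

An agent could call such a reduction once per glyph base and then combine the results in its own classifier stage.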
Glyph Base CSV Agent Function

The match__glyph() CSV function encapsulates the compute__*() glyph functions and the compare__*__delta() glyph functions together, along with heuristic logic, and is a compute-intensive function.

NOTE: The glyph metrics are not included in the base compare metrics and have no autolearning hull metrics.

enum glyph_bases {
    // bit masks
    COLOR__HUE_SATURATION_SIFT = 0x01,
    RGBI_COMPONENT_GSURF       = 0x02,
    RGBI_COMPONENT_ORB         = 0x04,
    RGB_DNN                    = 0x08,
    GLYPH_END                  = 0xff,
};

AGENT_SCORE_RESULT match__glyph(
    U64 reference_genome_ID,
    U64 target_genome_ID,
    MATCH_CRITERIA criteria,  // [ NORMAL | HUNT | PENALIZE | RELAXED | BOOST | STRICT ]
    glyph_bases glyphs,       // bit mask of glyphs to analyze
    double *match_strength    // Return value: the strength of the match
);
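Because the glyphs argument of match__glyph() is a bit mask, several glyph bases can be requested in one call by OR-ing enum values. The sketch below repeats the enum values from the listing and adds a hypothetical helper, selected_glyphs, that decodes a mask; the helper is an illustration only and is not a VGM function.

```cpp
#include <string>
#include <vector>

// Bit-mask values as listed for the glyph_bases enum.
enum glyph_bases {
    COLOR__HUE_SATURATION_SIFT = 0x01,
    RGBI_COMPONENT_GSURF       = 0x02,
    RGBI_COMPONENT_ORB         = 0x04,
    RGB_DNN                    = 0x08,
    GLYPH_END                  = 0xff
};

// Hypothetical helper: name each glyph base whose bit is set in the mask.
inline std::vector<std::string> selected_glyphs(unsigned mask) {
    std::vector<std::string> names;
    if (mask & COLOR__HUE_SATURATION_SIFT) names.push_back("COLOR__HUE_SATURATION_SIFT");
    if (mask & RGBI_COMPONENT_GSURF)       names.push_back("RGBI_COMPONENT_GSURF");
    if (mask & RGBI_COMPONENT_ORB)         names.push_back("RGBI_COMPONENT_ORB");
    if (mask & RGB_DNN)                    names.push_back("RGB_DNN");
    return names;
}
```

For example, the mask RGBI_COMPONENT_ORB | RGB_DNN selects two bases. Note that in C++ the OR of two unscoped enum values is an integer, so passing it where a glyph_bases value is expected would need an explicit cast.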
Summary

In this chapter we introduce the glyph base metrics for a Color Hue/Saturation SIFT, an RGBI color component G-SURF and ORB, and an RGB DNN. The MCC metric glyph functions are discussed, and the glyph matching CSV function is discussed as well.
Chapter 11 Applications, Training, Results There is nothing so powerful as truth—and often nothing so strange. —Daniel Webster
Overview

This chapter walks through the process of training a test application, using an interactive training scenario to provide insight into genome metric comparisons, and scoring results. The process of genome selection, strand building, genome compare testing, metric selection, metric scoring, and reinforcement learning are outlined at a high level. This chapter does not provide a complete working application as source code, but instead explores low-level details using VGM command line tools to develop intuition about VGM internals.

The key section in this chapter for understanding metrics and scoring for the genome compare tests is “Selected Uniform Baseline Test Metrics” in the “Test Genomes and Correspondence Results” section, where a uniform baseline set of thousands of color, shape, and texture metrics are explained, as used for all genome compare tests. By examining the detailed test results in the scoring tables for each genome compare, insight is developed for metric selection, qualifier metrics tuning, classifier design, and the various learning methods discussed in Chapter 4.

This chapter illustrates how each reference genome is a separate ground truth, requiring specific metrics discovered via reinforcement learning, as discussed in Chapter 4. The volume learning model leverages reinforcement learning, beginning from the autolearning hull thresholds, to establish an independent classifier for each metric in each space for each genome. VGM allows for classifier learning by discovering optimal sets of metrics to combine and tune as a group into a structured classifier for each genome.

The VGM command line tools are used to illustrate an interactive test application development process, including genome selection and training (see Chapter 5 for tool details). Problems and work-arounds are explored. The test application is designed to locate a squirrel in a 12MP test image (see Figure 11.1).
Reinforcement learning considerations are highlighted for a few robustness criteria such as rotation and lighting invariance. Detailed metric correspondence results are provided. In addition to the test application, three unit tests are presented and executed to show how pairs of genomes compare against human expectations. The three unit test categories are (1) genome pairs which look like matches, (2) genome pairs which appear to be close matches and might spoof the metrics comparisons, and (3) genomes which are clearly nonmatches. Summary test tables are provided containing the final unit test scores, which reveal anomalies in how humans compare genomes, why ground truth data selection is critical, and why reinforcement learning and metrics tuning are needed to build a good set of baseline metrics for scoring within a suitable classifier structure.

Major topics covered in this chapter include:
– Test application outline
– Background segmentations, problems, and work-arounds
– Genome selection, strands, and training
– Baseline genome metrics selection from CSV agents
– Structured classification
– Testing and reinforcement learning
– Nine selected test genomes to illustrate genome compare metrics
– Selected metrics (Color, Texture, Shape)
– Scoring strategies
– Unit tests showing first order metrics for MATCH, NOMATCH, CLOSE MATCH
– Sample high level agent code

DOI 10.1515/9781501505966-011
Figure 11.1: (Left) 12MP 4000x3000 test image containing several squirrel instances, (center) 400x300 squirrel region from 12MP image, and (right) squirrel head downsampled 10x to 40x30 (as a DNN might see it after downsampling the entire training image from 4000x3000 to 400x300).
Test Application Outline

The test image in Figure 11.1 (left) includes several views of a squirrel, with different pre-processing applied to each squirrel, so the squirrels are all slightly different. A set of test genomes is extracted from the test image for genome comparisons, identified later in this chapter. VGM operates on full resolution images, and no downsampling of images is required, so all pixel detail is captured in the model.
Here is an outline and discussion of the test application parameters, with some comparison of VGM training to DNN training:
– Image resolution: 4000x3000 (12MP image). Note that the VGM is designed to fully support common image sizes such as 5MP and 12MP, or larger. Note that a DNN pipeline operates typically on a fixed size such as 300x300 pixel images; therefore, this test application could not be performed using a DNN (see the Figure 11.1 caption).
– Robustness criteria: scale, rotation, occlusion, contrast, and lighting variations are illustrated in the compare metrics. The training set contains several identical squirrels, except that the separate squirrel instances are each modified using image pre-processing applied to change rotation, scale, and lighting to test invariance criteria. DNN training protocols add training images expressing the desired invariance criteria.
– Interactive training: interactive training sessions are explained, where the trainer selects the specific squirrel genome features to compose into a strand object, similar to the way a person would learn from an expert. (Note that an agent can also select genomes and build strands, with no intervention.) By contrast, DNN training protocols learn the entire image presented, learning both intended and unintended features, which contributes to DNN model brittleness.
– Rapid training: VGM can be trained quickly and interactively from selected features in a single image, and fine-tuned during training using additional test images or other images. VGM training is unlike DNN training, since DNN training requires epochs of training iterations (perhaps billions) as well as a costly-to-assemble set of perhaps tens or hundreds of thousands of hand-selected (or machine-selected and then hand-selected) training images of varying usefulness.
– CSV agent reinforcement learning: the best metric comparison scores for each genome comparison are learned and stored in correspondence signature vectors (CSVs) by CSV agents, using the autolearning hull thresholds yielding first order metric comparisons. Reinforcement learning and training is then applied to optimize the metrics.

Here are the key topics in the learning process covered in subsequent sections of this chapter:
– Strands and genome segmentations: selecting genomes to build strands, problems, and work-arounds
– Testing and interactive reinforcement learning: CSV agents, finding the best metrics, classifier design, metrics tuning
– Test genomes and correspondence results: compare test genomes; evaluate genome comparisons using baseline metrics
Strands and Genome Segmentations

As shown in Figure 11.2, note that the parvo segmentations (left) are intended to be smaller and more uniform in size, simulating the AR saccadic size, and the magno segmentations (right) are intended to include much larger segmented regions as well as blockier segmentation boundaries. The jSLIC segmentation produced 1,418 total parvo scale genome regions, but only 1,246 were kept in the model, since 172 were culled because they were smaller than the AR size. See Chapter 2 regarding saccadic dithering and segmentation details. The morpho segmentation produced 551 magno scale features; 9 were culled as too large (> 640x480 pixels) and 75 were culled as smaller than the AR, yielding 467 modeled genome features. By using both the parvo and magno scale features, extra invariance is incorporated into the model. Using several additional segmentations combined would be richer and beneficial, but for brevity only two are used.
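The culling counts above are simple bookkeeping: 1,418 − 172 = 1,246 parvo regions, and 551 − 9 − 75 = 467 magno regions. The following C++ sketch shows one way such a size filter might be written. It is not the VGM implementation: the RegionSize and CullCounts types, the cull_regions function, the use of bounding-box area as the size measure, and the min_area parameter standing in for the AR size are all assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical region record: bounding-box dimensions in pixels.
struct RegionSize { int width; int height; };

// Tally of how many regions were kept or culled.
struct CullCounts { std::size_t kept; std::size_t too_small; std::size_t too_large; };

// Assumptions: size is judged by bounding-box area; min_area stands in for
// the AR size; max_area defaults to the 640x480 limit quoted in the text.
inline CullCounts cull_regions(const std::vector<RegionSize>& regions,
                               long min_area, long max_area = 640L * 480L) {
    CullCounts counts{0, 0, 0};
    for (const RegionSize& r : regions) {
        const long area = static_cast<long>(r.width) * r.height;
        if (area < min_area)      ++counts.too_small;
        else if (area > max_area) ++counts.too_large;
        else                      ++counts.kept;
    }
    return counts;
}
```

The three counters always sum to the number of input regions, which matches the arithmetic quoted for both segmentations.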
Figure 11.2: (Left) jSLIC parvo resolution color segmentation into uniform sizes and (right) morpho segmentation into nonuniform sizes.
Resulting genome segmentations of the squirrel are shown in Figure 11.3. Note that the parvo features are generally smaller and more uniform in size, and the magno features contain larger and nonuniform sizes, which is one purpose for keeping both types of features. No segmentation is optimal. Comparing nonuniform sized regions (i.e. a small region compared to a large region) is performed in the MCC classifiers by normalizing each metric to the region size prior to comparison. Based on current test results, a few metric comparisons seem to work a little better when the region sizes are about the same, but region size sensitivity is generally not an issue.
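As an illustration of normalizing a metric to region size before comparison, the C++ sketch below divides a raw, size-dependent metric by the region pixel count and then takes a relative difference. The function names, the choice of pixel count as the size measure, and the relative-difference formula are assumptions made for this example; the MCC classifiers may normalize differently.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical: scale a raw, size-dependent metric by the region pixel count.
inline double normalize_to_region(double raw_metric, double pixel_count) {
    return pixel_count > 0.0 ? raw_metric / pixel_count : 0.0;
}

// Relative difference in [0, 1] between two size-normalized metrics,
// assuming non-negative metrics; 0 means the normalized values are identical.
inline double normalized_delta(double ref_metric, double ref_pixels,
                               double tgt_metric, double tgt_pixels) {
    const double a = normalize_to_region(ref_metric, ref_pixels);
    const double b = normalize_to_region(tgt_metric, tgt_pixels);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return scale > 0.0 ? std::fabs(a - b) / scale : 0.0;
}
```

With this normalization, a small region and a region four times larger with four times the raw metric value compare as equal.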
Figure 11.3: Two pairs of segmentations shown top and bottom. jSLIC hi-res segmentation (left of the pairs) from 4000x3000 RGB image vs. morpho lo-res segmentation (right of the pairs) from 800x600 5:1 downsampled image. See Chapter 2 for details.
Building Strands

An interactive training process is performed using the vgv command line tool discussed in Chapter 5, which displays the test image and allows for various interactive training options. Using vgv, a parvo strand and a magno strand are built to define the squirrel features, as well as define additional test features. To learn the best features, a default CSV agent is used to create and record comparison metrics in a CSV signature vector to allow the trainer to interactively learn preferred metrics. Not shown is the process of using the master learning controller, discussed in Chapter 5, to locate the best metrics and auto-generate C++ code to implement the CSV agent learnings in a custom agent.

For this example, we build a parvo strand as shown in Figure 11.4 and a magno strand as shown in Figure 11.5 to illustrate segmentation differences. By using both a parvo strand and a magno strand of the same object, robustness can be increased. Note that the magno strand includes coarser level features since the morpho segmentation is based on the 5:1 reduced resolution magno images, compared to the full resolution parvo images. Recall from Chapter 2 that the VGM models the magno pathway as the fast 5:1 subsampled low-resolution luma motion tracking channel, and the parvo channel is the slower responding high-resolution channel for RGB color using saccadic dithering for fine detail.

The following sections build strands from finer parvo features and coarser magno features, with some discussion about each strand. Note that during training, there can be problems building strands which manifest as shown in our examples. We discuss problems and work-arounds after both strands are built.
Parvo Strand Example
Figure 11.4: Parvo strand, smaller genome regions, smoother boundaries. This is the reference strand for this chapter exercise.
The commands used to build and show the parvo strand in Figure 11.4 are shown below. Details on the command vgv CREATE_STRAND are provided in Chapter 5.

$ ./vgv CREATE_STRAND /ALPHA_TEST/ JSLIC_ENHANCED_PARVO_squirrels.png JSLIC_ENHANCED_PARVO_squirrels_DIR
. . . build the strand interactively . . .

$ ./vgv HIGHLIGHT_STRAND /ALPHA_TEST/ JSLIC_ENHANCED_PARVO_squirrels.tif JSLIC_ENHANCED_PARVO_squirrels_DIR/ STRAND__0000000000000000_squirrel_PARVO.strand
. . . the image is loaded and the strand is overlaid using blue false coloring for visualization . . .
:: strand file loaded - genome count: 9
Magno Strand Example
Figure 11.5: Magno strand from morpho coarse segmentations; blocky region perimeter boundaries are due to the 5:1 subsampling of the magno images. Also note that the back segmentation is wrong and includes part of the wall.
The commands used to build and show the magno strand in Figure 11.5 are shown below. Details on the command vgv CREATE_STRAND are provided in Chapter 5.

$ ./vgv CREATE_STRAND /ALPHA_TEST/ MORPHO_ENHANCED_MAGNO_squirrels.png MORPHO_ENHANCED_MAGNO_squirrels_DIR
. . . build the strand interactively . . .

$ ./vgv HIGHLIGHT_STRAND /ALPHA_TEST/ MORPHO_ENHANCED_MAGNO_squirrels.tif MORPHO_ENHANCED_MAGNO_squirrels_DIR/ STRAND__0000000000000000_squirrel_MAGNO.strand
. . . the image is loaded and the strand is overlaid using blue false coloring for visualization . . .
:: strand file loaded - genome count: 3
The magno strand in Figure 11.5 contains only three genomes, each coarser in size and shape compared to the nine finer grained genomes for the parvo strand in Figure 11.4. Also note that the squirrel back genome in Figure 11.5 contains part of the stucco wall, which is not good. A better strand will need to be built from a different segmentation. See the next section for discussion.

Discussion on Segmentation Problems and Work-arounds

Figure 11.5 illustrates the types of problems that can occur with image segmentation, which manifest as ill-defined genome region boundaries including wrong colors and textures, resulting in anomalous metrics and suboptimal genome region correspondence. As shown in Figure 11.5, the magno squirrel back+front_leg region is segmented incorrectly to include part of the stucco wall. What to do? Some work-arounds for dealing with segmentation issues are as follows:
1. Use a different LGN input image segmentation (raw, sharp, retinex, blur, histeq) or a different color space component (Leveled, HLS_S, . . .)
2. Use a different segmentation method or parameter (jSLIC, morpho, . . .)
3. Use multiple segmentations and perhaps create multiple strands; do not rely on a single segmentation
4. Combinations of all of the above; learn and infer during training.

Future VGM releases will enhance the LGN model to provide as many of the above work-around options as possible to improve segmentation learning. The current VGM eye/LGN segmentation pipeline is still primitive but emulates how humans scan a scene by iterating around the scene, changing LGN image pre-processing parameters and resegmenting the scene in real-time. So for purposes of LGN emulation, it is advantageous to use more than one LGN image pre-processing method and more than one segmentation method, based on the analysis of the image metrics, to emulate the human visual system. Then strands can be built from the optimal segmentations.

Strand Alternatives: Single-image vs. Multi-image

There are two basic types of strands: single-image or multi-image. A default single-image strand collects genomes from a single-image segmentation, and a multi-image strand includes genomes from multiple-image segmentations, which may be desirable for some applications. Since the interactive training vgv app works on a single image at a time, we use single-image strands here. However, it is also possible to collect a set of independent genomes, and the agent can perform genome-by-genome correspondence and manage the scoring results.
For the current test exercise, we emulate agent-managed strands for genome-by-genome comparison, rather than using the strand set structure and shape correspondence functions provided in the VGM, since the aim is to provide more insight into the correspondence process. The strand correspondence functions are discussed in Chapter 8.
Testing and Interactive Reinforcement Learning

In this section we provide metrics comparison results between selected genome reference/target pairs in Figure 11.7 below. For each reference genome we illustrate the process of collecting and sorting genome-by-genome correspondence results to emulate the strand convenience function match__strand_set_metrics ( ... ) discussed in Chapter 8. By showing the metrics for selected genome compares, low-level details and intuition about genome correspondence are presented. For brevity, we ignore the strand metrics for angle, ellipse shape, and Fourier descriptor, which are provided in the MCC shape analysis function match__strand_shape_metrics ( ... ), and instead focus on just the set metrics and correspondence scores.

Hierarchical Parallel Ensemble Classifier

For these tests, a structured classifier is constructed, as shown in Figure 11.6, using a hierarchical ensemble of parallel MCCs to compute correspondence using a group of CSV agents operating in an ensemble, where each CSV agent calls several MCC classifiers and performs heuristics and weighting based on the MATCH_CRITERIA and the OVERRIDES parameters (as discussed in Chapter 4). Each CSV agent uses an internal hierarchical classifier to select and weight a targeted set of metrics. The CSV agent MR_SMITH is called six times in succession using different match criteria parameters for each of the six agent invocations: MATCH_NORMAL, MATCH_STRICT, MATCH_RELAXED, MATCH_HUNT, MATCH_PREJUDICE, MATCH_BOOSTED. The end result is a larger set of metrics which are further reduced by a learning agent to a set of final classification metrics, shown in the following sections.
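The six successive invocations of one CSV agent can be pictured with a short C++ sketch. Everything here is a stand-in: the MatchCriteria enum mirrors the six criteria named in the text, CsvAgent is a hypothetical callable type rather than the real MR_SMITH signature, and run_agent_ensemble simply concatenates the scores each invocation returns so that a later learning stage can reduce them.

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for the six match criteria named in the text.
enum class MatchCriteria {
    MATCH_NORMAL, MATCH_STRICT, MATCH_RELAXED,
    MATCH_HUNT, MATCH_PREJUDICE, MATCH_BOOSTED
};

// Hypothetical agent signature: returns the metric scores for one compare.
using CsvAgent = std::function<std::vector<double>(
    std::uint64_t reference_genome_ID, std::uint64_t target_genome_ID, MatchCriteria)>;

// Call the agent once per criterion and pool all returned scores.
inline std::vector<double> run_agent_ensemble(const CsvAgent& agent,
                                              std::uint64_t reference_genome_ID,
                                              std::uint64_t target_genome_ID) {
    static const std::array<MatchCriteria, 6> kCriteria = {{
        MatchCriteria::MATCH_NORMAL, MatchCriteria::MATCH_STRICT,
        MatchCriteria::MATCH_RELAXED, MatchCriteria::MATCH_HUNT,
        MatchCriteria::MATCH_PREJUDICE, MatchCriteria::MATCH_BOOSTED}};
    std::vector<double> all_scores;
    for (MatchCriteria criteria : kCriteria) {
        const std::vector<double> scores =
            agent(reference_genome_ID, target_genome_ID, criteria);
        all_scores.insert(all_scores.end(), scores.begin(), scores.end());
    }
    return all_scores;
}
```

Passing the agent as a callable keeps the sketch independent of any particular agent, so a stub can be substituted during experiments.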
Figure 11.6: The hierarchical ensemble classifier used for the tests.
Reinforcement Learning Process

In this chapter, we go through a manual step-by-step process illustrating how reinforcement learning works inside the CSV agent functions. This approach builds intuition and reveals the challenges of learning the best metrics in the possible metric spaces. The CSV agents are used to collect metrics and sort the results, which we present in tables. Each CSV agent calls several MCC functions, performs some qualifier metric learning, and sorts the metric comparisons into MIN, AVE, and MAX bins for further data analysis. Please refer to Chapter 4 for a complete discussion of the VGM learning concepts including CSVs, qualifier learning, autolearning hulls, hull learning, and classifier family learning.

The approach for VGM reinforcement learning is to qualify and tune the selected metrics to correspond to the ground truth, so if metrics agree with the ground truth, they are selected and then further tuned. Obviously, bad ground truth data will lead to bad classifier development. The learning process followed by the master learning controller reinforces the metric selection and sorting criteria to match ground truth, and involves interactive genome selection to build the reference and target strands, followed by an automated test loop to reinforce the metrics. We manually illustrate the MLC steps in this chapter as an exercise, instead of using the MLC, to provide intuition and low-level details.

In the examples for this chapter, the reference genome metrics are the ground truth. For VGM training as well as general training protocols, the reference genomes chosen are critical to establish ground truth assumptions. A human visual evaluation may not match a machine metric comparison, indicating that selected ground truth may not be a good sample to begin with, or that the metric is unreliable.
The VGM learning process evaluates each reference genome metric against target test genome metrics, selecting the best scoring metrics, adjusting metric comparison weights, and reevaluating genome comparisons to learn and reinforce the best metrics to match ground truth. This is referred to as VGM classifier learning. Ground truth, as usual, is critical. The master learning controller (MLC) can automatically find the best genomes to match the reference genome ground truth assumption by comparing against various target genomes and evaluating the first order metric comparison scores. However, in this chapter the MLC process is manually emulated to provide low-level details. The MLC process works as follows:

//
// Learn the optimum weights to reinforce the best metrics
//
For each (target_genome[n]): // (target genomes may be all genomes in an image, or another strand)
    For each reference_genome[m] in strand
        Compare reference_genome[m] with target_genome[n]
        Record metrics correspondence in CSV[n][m]
        Locate the best scoring metrics in CSV[n][m]
        Record and mark best scoring metrics in CSV[n][m] in weight_mask[n][m]
Repeat process over selected random target_genomes
Create learned master_weight_mask[m] from all weight_masks[n][m]
Auto-generate code to weight each CSV using the learned master_weight_mask[m]
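The weight-mask portion of the MLC outline can be sketched in C++ for a single reference genome m. This is not the MLC itself: the Csv type, the rule that the best_k lowest scores count as "best" (assuming lower scores mean closer matches), and the choice to form the master mask as the fraction of targets in which each metric was marked are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// One CSV for a (target n, reference m) pair: one comparison score per metric.
// Assumption: a lower score means a closer match.
using Csv = std::vector<double>;

// weight_mask[n][m]: mark the best_k lowest-scoring metrics with 1, others 0.
inline std::vector<int> best_metric_mask(const Csv& csv, std::size_t best_k) {
    std::vector<std::size_t> order(csv.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    best_k = std::min(best_k, csv.size());
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(best_k),
                      order.end(),
                      [&csv](std::size_t a, std::size_t b) { return csv[a] < csv[b]; });
    std::vector<int> mask(csv.size(), 0);
    for (std::size_t i = 0; i < best_k; ++i) mask[order[i]] = 1;
    return mask;
}

// master_weight_mask[m]: for each metric, the fraction of target compares
// in which that metric was marked as one of the best.
inline std::vector<double> learn_master_weight_mask(const std::vector<Csv>& csv_per_target,
                                                    std::size_t best_k) {
    if (csv_per_target.empty()) return {};
    std::vector<double> master(csv_per_target.front().size(), 0.0);
    for (const Csv& csv : csv_per_target) {
        const std::vector<int> mask = best_metric_mask(csv, best_k);
        for (std::size_t i = 0; i < master.size() && i < mask.size(); ++i) {
            master[i] += mask[i];
        }
    }
    for (double& w : master) w /= static_cast<double>(csv_per_target.size());
    return master;
}
```

A metric that ranks among the best for every target receives weight 1.0, while a metric that never does receives 0.0 and would be a candidate for culling.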
Note that various VGM reinforcement learning models can be defined in separate agents, and later agents can be retrained to produce derivative agents, allowing for a family of related agents to exist and operate in parallel. The default CSV agents can be used as starting points. In the next sections we illustrate only the initial stages of the reinforcement learning process, step by step, by examining low-level details for a few select genome comparison metrics to illustrate how the agent may call MCC functions and CSV agents and then sort through correspondence scores to collect and reinforce the learnings.
Test Genomes and Correspondence Results

As shown in Figure 11.7, nine test genomes have been selected from Figure 11.1. Pairs of test genomes are assigned as target and reference genomes for correspondence tests. For each of the nine test genomes, a uniform set of base genome metrics are defined for texture, shape, and color. Most of the test metrics are computed in RGB, HLS, and color leveled spaces, while some metrics such as Haralick and SDMX texture are computed over luma only. Genome comparison details are covered in subsequent sections.
Figure 11.7: Parvo genome masks, shown as RGB masks and luma masks, used for metrics comparison illustrations.
*Note on object vs. category or class learning: This chapter is concerned with specific object recognition examples as shown in Figure 11.7. VGM is primarily designed to learn a specific object, a genome, or strand of genomes, rather than learn a category of similar objects as measured by tests such as ImageNet [1] which contain tens of thousands of training samples for each class, some of which are misleading and suboptimal as training samples. VGM learning relies on optimal ground truth. However, VGM can learn categories by first learning a smaller selected set of category reference objects and then extrapolating targets to category reference objects via measuring group distance in custom agents. See Chapter 4 on object learning vs. category learning.

Selected Uniform Baseline Test Metrics

A uniform set of baseline test metrics are collected into CSVs for each reference/target genome comparison. The idea is to present a uniform set of metrics for developing intuition. But for a real application, perhaps independent sets of metrics should be used as needed to provide learned and tuned ground truth metrics for each reference genome. In the following sections, the uniform baseline metrics are presented spanning the selected (T) texture, (S) shape and (C) color metrics. No glyph metrics are used for these tests.

Thousands of metric scores are computed for each genome compare by ten CSV agents used in an ensemble, spanning (C) color spaces, (T) texture spaces, (S) shape spaces, and quantization spaces. However, only a uniform baseline subset of the scores are used as a group to compute the final AVE and MEDIAN score for the genome compare, highlighted in yellow in the scoring tables below. Further tuning by agents would include culling the baseline metric set to use only the best metrics for the particular reference genome ground truth.
Of course, other scoring strategies can be used such as deriving the average from only the first or second quartiles, rather than using all quartiles as shown herein. After each test genome pair is compared, the results are summarized in tables. Finally, the summary results for all tests are discussed and summarized in Tables 11.10, 11.11, and 11.12.

Uniform Texture (T) Base Metrics

The SDMX and Haralick base texture metrics for each test genome (top row of each table) are shown in Tables 11.1 and 11.2. Base metrics of each reference/target genome are compared together for scoring. In addition to the Haralick and SDMX metrics, selected volume distance texture metrics can also be used, as discussed in Chapter 6. Each base metric in the SDMX and Haralick tables below is computed as the average value of the four 0-, 45-, 90-, and 135-degree orientations of SDMs in luma images.
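The AVE, MEDIAN, and quartile-limited scoring strategies mentioned for the baseline metric set can be sketched in C++ as follows. The function names are illustrative, and the quartile variant assumes that lower scores are better, so the "first quartile" is taken to be the lowest quarter of the sorted scores; the VGM scoring code may define these differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// AVE: arithmetic mean of all scores (0.0 for an empty set).
inline double ave_score(const std::vector<double>& scores) {
    if (scores.empty()) return 0.0;
    return std::accumulate(scores.begin(), scores.end(), 0.0) /
           static_cast<double>(scores.size());
}

// MEDIAN: middle value of the sorted scores (mean of the two middle values
// when the count is even).
inline double median_score(std::vector<double> scores) {
    if (scores.empty()) return 0.0;
    std::sort(scores.begin(), scores.end());
    const std::size_t mid = scores.size() / 2;
    return scores.size() % 2 ? scores[mid] : 0.5 * (scores[mid - 1] + scores[mid]);
}

// Average over only the lowest `quartiles` quarters (1..4) of the sorted scores.
// Assumption: lower scores are better, so quartile 1 holds the best scores.
inline double quartile_limited_ave(std::vector<double> scores, int quartiles) {
    if (scores.empty()) return 0.0;
    std::sort(scores.begin(), scores.end());
    quartiles = std::max(1, std::min(quartiles, 4));
    std::size_t count = scores.size() * static_cast<std::size_t>(quartiles) / 4;
    count = std::max<std::size_t>(count, 1);
    scores.resize(count);
    return ave_score(scores);
}
```

Calling quartile_limited_ave with quartiles set to 4 reproduces the plain average over all scores.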
Strands and Genome Segmentations
Table 11.1: SDMX base metrics for test genomes (luma raw), SDMX metric (ave)
[Table values not recoverable. Columns (test genomes): brush, leaves, stucco, right head, left head, head (lo-res), front squirrel, saddle enhanced, saddle rotated. Rows (metrics): low_frequency_coverage, total_coverage, corrected_coverage, relative_power, locus_length, locus_mean_density, bin_mean_density, containment, linearity, linearity_strength.]
Table 11.2: Haralick base metrics for test genomes (luma raw), Haralick (ave)
[Table values not recoverable. Columns (metrics): ASM, CON, CORR, IDM, ENT. Rows (test genomes): brush, leaves, stucco, right head, left head, head (lo-res), front squirrel, saddle rotated, saddle enhanced.]
Uniform Shape (S) Base Metrics

In the base metrics displayed in Table 11.3, centroid shape metrics are shown, taken from the raw, sharp, and retinex images as volume projections (CAM neural clusters) as discussed in Chapter 8. Also see Chapter 6 for details on volume projections. The centroid is usually a reliable metric for a quick correspondence check. Each [0,45,90,135] base centroid metric in Table 11.3 is computed as the cumulative average value of all volume projection metrics in [RAW, SHARP, RETINEX] images and each color space component [R,G,B,L,HLS_S].
Table 11.3: Centroid base metrics for the nine test genomes
[Table values not recoverable. For each test genome (brush, leaves, stucco, head (lo-res), right head, left head, front (squirrel), saddle rotated, saddle enhanced), the table lists RAW, SHARP, and RETINEX centroid triples for the A, B, C, and D orientation rows and for the RGB_VOLUME_RAW, RGB_VOLUME_MIN, and RGB_VOLUME_AVE rows.]
Uniform Color (C) Base Metrics

The selected color base metrics for each test genome are listed and named in the rows of Table 11.4. The 15 uniform color metrics are computed across all available color spaces and color-leveled spaces by the CSV agents, and the average value is used for metric comparisons. Color metric details are provided as color visualizations below for each test genome: Figure 11.8 shows 5-bit and 4-bit color popularity methods, and Figure 11.9 shows the popularity colors converted into standard color histograms.

Table 11.4: 8-bit Color space metrics for test genomes
[Table values not recoverable. Metrics: Sl_lighting_SAD, Sl_lighting_Hellinger, Sl_lighting_JensenShannon, Sl_contrast_SAD, Sl_contrast_Hellinger, Sl_contrast_JensenShannon, Popularity_CLOSEST, Popularity_ABS_MATCH, Stdcolors_SAD, Proportional_SAD variants, SAD_leveled, Hellinger_leveled.]
Each of the color metrics is computed a total of 24 times by each CSV agent, represented as three groups [RAW,SHARP,RETINEX] in each of eight color leveled spaces. Each CSV agent is invoked six times using different match criteria to build up the classification metrics set. The color visualizations in Figures 11.8 and 11.9 contain a lot of information and require some study to understand (see Chapter 7, Figures 7.10a and 7.11, for details on interpretation).
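The color metric names above refer to three distance families: SAD, Hellinger, and Jensen-Shannon. As a reminder of what each one measures, here is a minimal Python sketch of the textbook definitions applied to two normalized color histograms. The function names are assumptions of this sketch, and the raw distances it returns are not scaled against the autolearning hull thresholds, so they are not directly comparable to the 1.0 match threshold used in the scoring tables.

```python
import numpy as np


def normalize(hist):
    """Convert raw bin counts to a probability distribution."""
    h = np.asarray(hist, dtype=np.float64)
    total = h.sum()
    if total <= 0:
        raise ValueError("histogram must contain at least one count")
    return h / total


def sad(p, q):
    """Sum of absolute differences; ranges from 0 (identical) to 2 (disjoint)."""
    return float(np.abs(p - q).sum())


def hellinger(p, q):
    """Hellinger distance; ranges from 0 (identical) to 1 (disjoint)."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits; ranges from 0 to 1."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # bins with zero mass contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))
```

All three return 0 for identical histograms and reach their maximum for histograms with no shared bins. They differ in how strongly they penalize small differences in sparsely populated bins, which is one reason several of them are collected side by side.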
Figure 11.8: 5-bit MEDIAN_CUT and 5-bit POPULARITY_COLORS for each test genome. Each image contains 24 lines, each representing separate color spaces as per the details provided in Figure 7.10.
Figure 11.9: Standard color histograms for each test genome. The histogram contains 24 lines representing separate color spaces as per the details provided in Figure 7.11.
Note that traces of unexpected greenish color are visible in some of the standard histograms in Figure 11.9 (see also Figure 11.8). This is due to a combination of (1) the LGN color space representation, (2) a poor segmentation, and (3) standard color distance conversions. The popularity colors distance functions can filter out the uncommon greenish color by (1) ignoring the least popular colors (i.e., using the 64 most popular colors) to maximize the use of the most popular colors, or (2) using all 256 popularity colors, thereby diminishing the effect of the uncommon colors. In addition, the color region metric algorithms, as described in Chapter 7, contain several options to filter and focus on specific popularity or standard colors metric attributes.

Test Genome Pairs

In the next section we provide low-level details for five genome comparisons between selected test genomes to illustrate the metrics and develop intuition. The metrics are collected across each of the ten different CSV agents discussed in Chapter 4. Note that each agent is designed to favor a different set of metrics, as specified in MCC parameters, to tune heuristics and weighting. Therefore, each agent can compute hundreds or thousands of different metrics, and each MCC function is optimized for different metric space criteria. The five genome comparisons are:
– Compare leaf : head (lo-res) genomes
– Compare front squirrel : stucco genomes
– Compare rotated back : brush genomes
– Compare enhanced back : rotated back genomes
– Compare left head : right head genomes

For each genome comparison, the uniform metrics set is computed, and summary tables are provided for each score. Metrics are computed across the CST spaces for selected color, shape, and texture. The uniform set of metrics is highlighted in yellow in the tables, since not all the metrics in the tables are selected for use in the final scoring.
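The popularity-color filtering described at the start of this section (keeping only the most popular colors so that rare colors, such as the stray greenish pixels, drop out) can be sketched as follows. This is a hypothetical illustration, not the VGM implementation: the function names, the uniform bit-shift quantization, and the SAD comparison over the retained colors are all assumptions of the sketch.

```python
from collections import Counter


def popularity_colors(pixels, bits=5, top_n=64):
    """Quantize 8-bit RGB pixels to `bits` per channel and keep the
    `top_n` most popular colors as a normalized distribution."""
    shift = 8 - bits
    counts = Counter((r >> shift, g >> shift, b >> shift) for r, g, b in pixels)
    top = counts.most_common(top_n)
    total = sum(count for _, count in top)
    return {color: count / total for color, count in top}


def popularity_sad(ref, tgt):
    """Sum of absolute differences between two popularity distributions."""
    colors = set(ref) | set(tgt)
    return sum(abs(ref.get(c, 0.0) - tgt.get(c, 0.0)) for c in colors)


# Hypothetical data: a gray region contaminated by a few greenish pixels.
reference = [(120, 120, 120)] * 98 + [(10, 200, 10)] * 2
target = [(120, 120, 120)] * 100

# Keeping only the single most popular color removes the greenish outlier.
strict = popularity_sad(popularity_colors(reference, top_n=1),
                        popularity_colors(target, top_n=1))
# Keeping many colors retains the outlier, but its influence stays small.
relaxed = popularity_sad(popularity_colors(reference, top_n=64),
                         popularity_colors(target, top_n=64))
```

With the hypothetical data above, `strict` is zero because the rare color never enters the comparison, while `relaxed` is small but nonzero, which mirrors the two filtering options described in the text.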
The testing and scoring results are discussed after the test sections.

Compare Leaf : Head (Lo-res) Genomes

This section provides genome compare results for the test metrics on the leaf and head (lo-res) genomes in Figure 11.10. A total of 5,427 selected metrics were computed and compared by the CSV agents and summarized in Table 11.5.

NOTE: Some metrics need qualification and tuning, such as dens_8, locus_mean_density.

TOTAL SCORE: 1.438 NOMATCH (MEDIAN)

Figure 11.10: Leaf:head (lo-res) genome comparison.

Table 11.5: Uniform metrics comparison scores for leaf:head genomes in Figure 11.10
[Table values not recoverable. Sections: color metrics (Sl_lighting and Sl_contrast SAD, Hellinger, and JensenShannon; Popularity_CLOSEST; Popularity_ABS_MATCH; Stdcolors_SAD; Proportional_SAD variants; SAD_leveled; Hellinger_leveled) per agent (MATCH_NORMAL, MATCH_STRICT, MATCH_RELAXED, MATCH_HUNT, MATCH_PREJUDICE, MATCH_BOOSTED); SHAPE metrics (cent, dens, disp, full, spread) per agent and per RGB_VOLUME_RAW, RGB_VOLUME_MIN, RGB_VOLUME_LBP, RGB_VOLUME_AVE; Volume Texture (A, B, C, D AVE and RGBVOLRAW MIN, MAX, AVE); Haralick Texture (CON, CORR, IDM, ENT) for the five genome comparisons; SDMX Texture (four orientations and AVE) for low_frequency_coverage, total_coverage, corrected_coverage, relative_power, locus_length, locus_mean_density, bin_mean_density, containment, linearity, linearity_strength.]
Compare Front Squirrel : Stucco Genomes

This section provides genome compare results for the test metrics on the squirrel and stucco genomes in Figure 11.11. A total of 5,427 selected metrics were computed and compared by the CSV agents and summarized in Table 11.6.

NOTE: Some metrics need qualification and tuning, such as dens_8.

CORRESPONDENCE SCORE: 15.57 NOMATCH (MEDIAN)

Figure 11.11: Front squirrel:stucco genome comparison.

Table 11.6: Uniform metrics comparison scores for squirrel:stucco genomes in Figure 11.11
[Table values not recoverable. Sections: color metrics (Sl_lighting and Sl_contrast SAD, Hellinger, and JensenShannon; Popularity_CLOSEST; Popularity_ABS_MATCH; Stdcolors_SAD; Proportional_SAD variants; SAD_leveled; Hellinger_leveled) per agent (MATCH_NORMAL, MATCH_STRICT, MATCH_RELAXED, MATCH_HUNT, MATCH_PREJUDICE, MATCH_BOOSTED); SHAPE metrics (cent, dens, disp, full, spread) per agent and per RGB_VOLUME_RAW, RGB_VOLUME_MIN, RGB_VOLUME_LBP, RGB_VOLUME_AVE; Volume Texture (A, B, C, D AVE and RGBVOLRAW MIN, MAX, AVE); Haralick Texture (CON, CORR, IDM, ENT) for the five genome comparisons; SDMX Texture (four orientations and AVE) for low_frequency_coverage, total_coverage, corrected_coverage, relative_power, locus_length, locus_mean_density, bin_mean_density, containment, linearity, linearity_strength.]
Compare Rotated Back : Brush Genomes

This section provides genome compare results for the test metrics on the rotated back and brush genomes in Figure 11.12. A total of 5,427 selected metrics were computed and compared by the CSV agents and summarized in Table 11.7.

NOTE: Some metrics need qualification and tuning, such as Proportional_SAD8, dens_8, locus_mean_density.

NOTE: Haralick and SDMX are computed in LUMA only for this example. Computing also in R,G,B,HSL_S and taking the AVE would add resilience to the metrics.

CORRESPONDENCE SCORE: 1.774 NOMATCH (MEDIAN)

Figure 11.12: Rotated back:brush genome comparison.

Table 11.7: Uniform metrics comparison scores for back:brush genomes in Figure 11.12
[Table values not recoverable. Sections: color metrics (Sl_lighting and Sl_contrast SAD, Hellinger, and JensenShannon; Popularity_CLOSEST; Popularity_ABS_MATCH; Stdcolors_SAD; Proportional_SAD variants; SAD_leveled; Hellinger_leveled) per agent (MATCH_NORMAL, MATCH_STRICT, MATCH_RELAXED, MATCH_HUNT, MATCH_PREJUDICE, MATCH_BOOSTED); SHAPE metrics (cent, dens, disp, full, spread) per agent and per RGB_VOLUME_RAW, RGB_VOLUME_MIN, RGB_VOLUME_LBP, RGB_VOLUME_AVE; Volume Texture (A, B, C, D AVE and RGBVOLRAW MIN, MAX, AVE); Haralick Texture (CON, CORR, IDM, ENT) for the five genome comparisons; SDMX Texture (four orientations and AVE) for low_frequency_coverage, total_coverage, corrected_coverage, relative_power, locus_length, locus_mean_density, bin_mean_density, containment, linearity, linearity_strength.]
Compare Enhanced Back : Rotated Back Genomes

This section provides genome compare results for the test metrics on the enhanced back and rotated back genomes in Figure 11.13. A total of 5,427 selected metrics were computed and compared by the CSV agents, and summarized in Table 11.8.

NOTE: Several metrics need qualification and tuning, and are slightly above 1.0, which is the match threshold, such as linearity_strength.* This is a case where tuning and second order metric training is needed, since the total score is so close to the 1.0 threshold.

*Haralick and SDMX are computed in LUMA only for this example. Computing also in R,G,B,HSL_S and taking the AVE would add resilience to the metrics.

CORRESPONDENCE SCORE: 0.9457 MATCH (MEDIAN)

Figure 11.13: Enhanced back:rotated back genome comparison.

Table 11.8: Uniform metrics comparison scores for enhanced back:rotated back genomes in Figure 11.13
[Table values not recoverable. Sections: color metrics (Sl_lighting and Sl_contrast SAD, Hellinger, and JensenShannon; Popularity_CLOSEST; Popularity_ABS_MATCH; Stdcolors_SAD; Proportional_SAD variants; SAD_leveled; Hellinger_leveled) per agent (MATCH_NORMAL, MATCH_STRICT, MATCH_RELAXED, MATCH_HUNT, MATCH_PREJUDICE, MATCH_BOOSTED); SHAPE metrics (cent, dens, disp, full, spread) per agent and per RGB_VOLUME_RAW, RGB_VOLUME_MIN, RGB_VOLUME_LBP, RGB_VOLUME_AVE; Volume Texture (A, B, C, D AVE and RGBVOLRAW MIN, MAX, AVE); Haralick Texture (CON, CORR, IDM, ENT) for the five genome comparisons; SDMX Texture (four orientations and AVE) for low_frequency_coverage, total_coverage, corrected_coverage, relative_power, locus_length, locus_mean_density, bin_mean_density, containment, linearity, linearity_strength.]
Compare Left Head : Right Head Genomes

This section provides genome compare results for the test metrics on the left head and right head genomes in Figure 11.14.

NOTE: The genomes are mirrored images of each other and should therefore be identical. A total of 5,427 selected metrics were computed and compared by the CSV agents, and summarized in Table 11.9.

NOTE: Several metrics are perfect matches, such as centroid8 and Haralick CORR. As usual, the centroid metric works very well.

CORRESPONDENCE SCORE: 0.1933 MATCH (MEDIAN)

Figure 11.14: Left head:right head genome comparison.

Table 11.9: Uniform metrics comparison scores for left head:right head genomes in Figure 11.14
[Table values not recoverable. Sections: color metrics (Sl_lighting and Sl_contrast SAD, Hellinger, and JensenShannon; Popularity_CLOSEST; Popularity_ABS_MATCH; Stdcolors_SAD; Proportional_SAD variants; SAD_leveled; Hellinger_leveled) per agent (MATCH_NORMAL, MATCH_STRICT, MATCH_RELAXED, MATCH_HUNT, MATCH_PREJUDICE, MATCH_BOOSTED); SHAPE metrics (cent, dens, disp, full, spread) per agent and per RGB_VOLUME_RAW, RGB_VOLUME_MIN, RGB_VOLUME_LBP, RGB_VOLUME_AVE; Volume Texture (A, B, C, D AVE and RGBVOLRAW MIN, MAX, AVE); Haralick Texture (CON, CORR, IDM, ENT) for the five genome comparisons; SDMX Texture (four orientations and AVE) for low_frequency_coverage, total_coverage, corrected_coverage, relative_power, locus_length, locus_mean_density, bin_mean_density, containment, linearity, linearity_strength.]
Test Genome Correspondence Scoring Results

All compare metrics are weighted at 1.0 unless otherwise noted; these are raw first order metric scores without qualification and weight tuning applied. The metric correspondence scores all come out at reasonable levels, and some scores may be suitable as is, with no additional tuning or learning beyond the built-in autolearning hull learning applied to all first order metric comparisons by the MCC functions. As expected, the left and right mirrored squirrel heads show the best correspondence, and the stucco and front squirrel show the worst correspondence.

Note that not all the uniform compare metrics are useful in each case. Some are more reliable than others for specific test genomes, and as expected the optimal metrics must be selected and tuned using reinforcement learning for best results. The uniform base metrics used for scoring are summarized in Tables 11.10 and 11.11 below, including agent override criteria details.

Table 11.10: Uniform set of metrics for the test genome correspondence scoring
Metric Category | Metric Spaces Computed Per Category | Agent Overrides
Color metrics | Proportional_SAD; Sl_contrast_JensenShannon; Popularity_CLOSEST | MR_SMITH BOOSTED; MR_SMITH NORMAL; MR_SMITH NORMAL
Volume shape metrics | Centroid_; Density_ | AGENT_RGBVOL_RAW; AGENT_RGBVOL_MIN
Volume texture metrics | RGB VOL AVE of (,,, GENOMES); R,G,B,L HSL_S AVE of (,,, GENOMES) | AGENT_RGBVOL_RAW; AGENT_RGBVOL_AVE
SDMX metrics | SDMX AVE locus_mean_density; SDMX AVE linearity_strength | MR_SMITH NORMAL; MR_SMITH NORMAL
Haralick metrics | HARALICK CONTRAST (scaled uniformly using a weight; value not recoverable) | MR_SMITH NORMAL

Table 11.11: Total base metrics computed for each test genome
[Per-category counts not recoverable.]
Metric Category | Metric Spaces Computed Per Category
Color metrics | metrics, color space components [R,G,B,L,HLS_S], color levels, images [RAW,SHARP,RETINEX,HISTEQ,BLUR], agents
Volume shape metrics | metrics, color space components [R,G,B,L,HLS_S], images [RAW,SHARP,RETINEX,HISTEQ,BLUR], agents
Volume texture metrics | metrics, color space components [R,G,B,L,HLS_S], images [RAW,SHARP,RETINEX,HISTEQ,BLUR], agents
SDMX metrics | metrics, angles, images [RAW,SHARP,RETINEX], color LUMA
Haralick metrics | metrics, images [RAW,SHARP,RETINEX], color LUMA
ALL TOTAL | 5,427: total of CSV metric comparisons made for each genome compare
A total of 5,427 base metrics are computed for each genome comparison by the CSV agents, as shown in Table 11.11, and then compared using various distance functions as discussed in Chapters 6–10. The final correspondence scores are summarized in Table 11.12, showing that the uniform set of metrics used in this case does in fact produce useful correspondence. The CSV agents used to collect the test results include built-in combination classification hierarchies as well as metrics qualifier pairs and weight adjustments (see Chapter 4 and the open source code for the CSV agents for details).

In an application, using a uniform set of metrics for all genome comparisons is usually not a good idea, since each genome corresponds best to specific learned metrics and tuning. In other words, each reference genome is a separate ground truth, and the best metrics to describe each reference genome must be learned, along with qualifier metrics tuning using trusted and dependent metrics as discussed in Chapter 4. Therefore, each genome is optimized for correspondence by learning its own classifier.

Scoring Results Discussion

Scoring results for all tests are about as expected. Table 11.12 summarizes the final scores. Note that even using only first order metric scores for the tests, the results are good, and further training and reinforcement learning would improve the scoring. The left/right head mirrored genome pair is expected to have the best cumulative scores, as observed in Table 11.12, while the enhanced back and rotated back are very similar and do match, revealing robustness to rotation and color similarity. The other genome scores show nonmatches as expected. For scoring, the MEDIAN may be preferred instead of the AVE if the scores are not in a uniform distribution or range, as is the case for the front squirrel/stucco compare scores (see Table 11.12).

Table 11.12: Test correspondence scores (< 1.0 = MATCH, 0 = PERFECT MATCH)
[Table values not recoverable. Columns (genome compares): Left/Right head, Enhanced back/Rotated, Rotated Back/Brush, Front Squirrel/Stucco, Leaf/Head (lo-res). Rows: AVE, MEDIAN, ADJUSTED SCORE*. The MEDIAN correspondence scores reported in the individual test sections above are 0.1933, 0.9457, 1.774, 15.57, and 1.438, respectively.]
*NOTE: Adjusted score is the average of the AVE+MEDIAN. Other adjusted score methods are performed by each CSV agent.
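The relationship between the AVE, MEDIAN, and adjusted scores can be illustrated with a short Python sketch. The `summarize` helper and the sample metric scores are hypothetical (they are not taken from the test tables); the only things drawn from the text are the 1.0 match threshold and the definition of the adjusted score as the average of AVE and MEDIAN.

```python
from statistics import mean, median

MATCH_THRESHOLD = 1.0  # scores below 1.0 count as a match; 0 is a perfect match


def summarize(scores):
    """Return AVE, MEDIAN, and the adjusted score for one genome compare."""
    ave = mean(scores)
    med = median(scores)
    adjusted = (ave + med) / 2.0
    return {
        "AVE": ave,
        "MEDIAN": med,
        "ADJUSTED": adjusted,
        "AVE_MATCH": ave < MATCH_THRESHOLD,
        "MEDIAN_MATCH": med < MATCH_THRESHOLD,
    }


# Hypothetical first order metric scores with one large outlier.
example = summarize([0.2, 0.4, 0.9, 1.1, 12.0])
```

In this hypothetical case a single outlier pulls the AVE well above the threshold while the MEDIAN stays below it, which is why the MEDIAN may be the better summary when the scores are not uniformly distributed.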
To summarize, the test metric correspondence scores in Table 11.12 are first order metrics produced by the CSV agents and MCC functions using default agent learning criteria and default overrides. No second order tuned weights are applied to the scores. Also note that the test scores in the preceding individual genome tests show that not all the metrics are useful in a raw first order state; they need qualification pairing and tuning before they can be incorporated into an optimal learned classifier.

Scoring Strategies and Scoring Criteria

The master learning controller (MLC) would select a set of metrics similar to those in Table 11.4 and then create qualifier metrics and dependent metric tuning weights for a series of tests. As discussed in the “VGM Classifier Learning” section of Chapter 4, we can learn the best scoring feature metrics to reserve as trusted metrics and then qualify and weight dependent metrics based on the strength of the trusted metrics scores. Using qualifier metrics to tune dependent metrics is similar to the boosting strategy used in HAAR feature hierarchies (see [168]).

However, scoring criteria are developed using many strategies, none being a perfect strategy. If the trainer decides that the sky is red, and the metrics indicate that the sky is blue, the metrics will be adjusted to match ground truth (“No, the sky should be red, it just looks blue; tune the metrics to make the sky look red, minimize this metric contribution in the final classifier, and adjust the training set and test set to make it appear that the sky is actually red.”). This is the dilemma of choosing ground truth, training sets, test sets, metrics, distance functions, weight tunings, and classifier algorithms.
Unit Test for First Order Metric Evaluations

This section contains the results for three unit tests and is especially useful for understanding genome ground truth selection and scoring issues. The unit tests compute correspondence between visually selected genome pairs and are intended to explore cases where the human trainer has uncertainty (close match or maybe a spoof), as well as cases where the human trainer expects a match or nonmatch. The tests provide only first order comparisons against a uniform set of metrics that are not learned or tuned. The unit tests include a few hundred genome compare examples in total, which are valuable for building intuition about MCC functions.

Unit Test Groups

The test groups are organized by creating named test strands containing a pair of genomes to compare, as follows:
– GROUP 1: MATCH_TEST: The two genomes look like a match, based on a quick visual check. The metric compare scores in this test set do not all agree with the visual expectation, and this is intended to illustrate how human and machine comparisons differ.
– GROUP 2: NOMATCH_TEST: The two genomes do not match, based on a quick visual check. This is expected to be the easiest test group to pass with no false positives.
– GROUP 3: CLOSE_TEST: The genomes are chosen to look similar and are deliberately selected to be hard to differentiate and cause spoofing. This test group is the largest, allowing the metrics to be analyzed to understand details.

For the unit tests, the test pairs reflect human judgment without the benefit of any training and learning. Therefore, the test pairs are sometimes incorrectly scored, which is intended to demonstrate the need for refinement learning and metrics tuning. Scoring problems may be due to (1) arbitrary and unlearned metrics being applied or (2) simply human error in selecting test pairs.

Unit Test Scoring Methodology

The scoring method takes the average values from a small, uniform set of metrics stored in CSVs for shape, color, and texture, as collected by the ./vgv RUN_COMPARE_TEST command output; a sample is shown below. The metrics in the CSV are not tuned or reinforced as optimal metrics, and as a result the average value scoring method used is not good and will be skewed away from a more optimal tuned metric score. The unit tests deliberately use nonoptimal metrics so that the reader must examine correspondence problems and analyze each metric manually, since this exercise is intended for building intuition about metric selection, tuning, and learning.
The complete unit test image set and batch unit test commands are available in the developer resources in the open source code library, including an Excel spreadsheet with all metric compare scores as shown in the ./vgv RUN_COMPARE_TEST output below. Each unit test is run using the vgv tool (the syntax is shown below), where the file name parameter at the end of each vgv command line (TEST_MATCH.txt, TEST_NOMATCH.txt, TEST_CLOSE.txt) contains the list of strand files, with the chosen genome reference and target pairs encoded in individual strand files. The strand reference and target genome pairs can be seen as line-connected genomes in the test images shown in the unit tests in Figures 11.15, 11.16, and 11.17.

$ ./vgv RUN_COMPARE_TEST /ALPHA_TEST/ TEST_IMAGE.png TEST_IMAGE_DIR/ TEST_MATCH.txt
$ ./vgv RUN_COMPARE_TEST /ALPHA_TEST/ TEST_IMAGE.png TEST_IMAGE_DIR/ TEST_NOMATCH.txt
$ ./vgv RUN_COMPARE_TEST /ALPHA_TEST/ TEST_IMAGE.png TEST_IMAGE_DIR/ TEST_CLOSE.txt
The final score for each genome pair compare is computed by the CSV agent agent_mr_smith() using a generic combination of about fifty CTS metrics, heavily weighted to use CAM clusters from volume projection metrics. Without a learned classifier, metric qualification, or tuning, the scoring results are not expected to be optimal; they only show how well (or badly) the autolearning hull thresholds actually work for each metric compare. For example, compare metrics produced by agent mr_smith NORMAL are shown below for a genome comparison between the F18 side (gray) and the blue sky by the giant Sequoia trees (see Figure 11.17). The final average scores are shown at the end of the test output, as a summary AVE score of the color, shape, and texture scores. The independent scores for color, shape, and texture are also listed, showing that the color score of 6.173650 (well above the 1.0 match threshold) is an obvious indication that the two genomes do not match.

=================================================================
!!! run_compare_test() :: test 0 :: STRAND__0000000000000000_CLOSE_F18_side_and_sky.strand !!!
=================================================================
************ COMPARE A PAIR OF METRICS FILES & GENOME FILES ************
************ AGENT_COMPARE :: Calling agent_hpp(MR_SMITH, 0 0) ************
VOLUME PROJECTION METRICS FOR ABCD ORIENTATIONS – CAM NEURAL CLUSTERS - (BEST SCORES)
A_0 deg. B 90 deg C 135 deg. D 45 deg. DISTANCE FUNCTION
---------- ---------- ---------- --------------------------------------------------------------------------
A 002.1744 B 001.8444 C 001.8959 D 001.7419 : METRIC_texture_SAD_genome_correlation
A 001.3315 B 001.2270 C 000.8975 D 001.9645 : METRIC_texture_IntersectionSAD_genome_correlation
A 004.5912 B 003.4654 C 003.2624 D 003.3306 : METRIC_texture_SSD_genome_correlation
A 003.7322 B 002.9356 C 000.6163 D 003.7083 : METRIC_texture_IntersectionSSD_genome_correlation
A 005.4542 B 003.9923 C 003.9741 D 002.6546 : METRIC_texture_Hellinger_genome_correlation
A 002.6464 B 002.1619 C 001.3709 D 004.1064 : METRIC_texture_IntersectionHellinger_genome_correlation
A 001.8120 B 001.7394 C 001.8052 D 001.7152 : METRIC_texture_Hammingsimilarity_genome_correlation
A 001.8120 B 001.7394 C 001.8052 D 001.7152 : METRIC_texture_IntersectionHammingsimilarity_genome_correlation
A 001.8656 B 001.7249 C 001.6195 D 001.6133 : METRIC_texture_Chebychev_genome_correlation
A 001.4125 B 001.5173 C 002.4158 D 001.3723 : METRIC_texture_IntersectionDivergencesimilarity_genome_correlation
A 013.3407 B 008.3001 C 009.5862 D 001.1830 : METRIC_texture_Outliermagnitude_genome_correlation
A 053.8464 B 032.1818 C 016.9527 D 001.0618 : METRIC_texture_Outlierratio_genome_correlation
A 001.1133 B 001.4927 C 010.0647 D 001.4369 : METRIC_texture_Cosine_genome_correlation
A 000.8341 B 001.0287 C 004.6751 D 000.7755 : METRIC_texture_IntersectionCosine_genome_correlation
A 001.0046 B 001.4117 C 013.2461 D 001.3929 : METRIC_texture_Jaccard_genome_correlation
A 000.3954 B 000.5405 C 007.6901 D 000.7412 : METRIC_texture_IntersectionJaccard_genome_correlation
A 000.0000 B 000.0000 C 000.0000 D 000.0000 : METRIC_texture_Fidelity_genome_correlation
A 000.0000 B 000.0000 C 000.0000 D 000.0000 : METRIC_texture_IntersectionFidelity_genome_correlation
A 002.1743 B 001.8442 C 001.5167 D 001.7094 : METRIC_texture_Sorensen_genome_correlation
A 002.0506 B 001.7363 C 000.9721 D 001.9104 : METRIC_texture_IntersectionSorensen_genome_correlation
A 000.0000 B 000.0000 C 000.0000 D 000.0000 : METRIC_texture_Canberra_genome_correlation
A 000.0000 B 000.0000 C 000.0000 D 000.0000 : METRIC_texture_IntersectionCanberra_genome_correlation

COLOR METRICS color_signature->criteria 0 color_signature->agent_overrides 1
027.7231 METRIC_popularity5_overlap_bestmatch
003.9174 METRIC_popularity5_overlap
002.3360 METRIC_popularity5_proportional_bestmatch_SAD
006.7802 METRIC_popularity4_overlap_bestmatch
003.9792 METRIC_popularity4_overlap
000.6787 METRIC_popularity4_proportional_bestmatch_SAD
000.8796 METRIC_popularity5_standardcolors_SAD
001.6601 METRIC_popularity5_standardcolors_Hamming
000.8766 METRIC_popularity4_standardcolors_SAD
001.6573 METRIC_popularity4_standardcolors_Hamming
004.5645 METRIC_SAD_match_strength_leveled8
010.0390 METRIC_Hellinger_match_strength_leveled8

SHAPE METRICS shape_signature->criteria 6 shape_signature->agent_overrides 1
004.2778 METRIC_shape_centroid8
001.6073 METRIC_shape_density8
000.4140 METRIC_shape_density5
003.3914 METRIC_shape_displacement8
000.4325 METRIC_shape_displacement5
016.8291 METRIC_shape_full8
001.4148 METRIC_shape_full5
005.4746 METRIC_shape_spread8
000.9254 METRIC_shape_spread5

>>> match_strength_color 6.173650 match_strength_shape 2.942523 match_strength_texture 2.050557
>>> AGENT_NORMAL :: AGENT_SCORE_NO_MATCH :: 3.722243 (AVERAGE OF COLOR, SHAPE, TEXTURE)
=== >>> DONE scoring.
+++ Finished AGENT_COMPARE.
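To make the numbers in the output above concrete, the following minimal sketch picks the best orientation of a volume projection row and recomputes the summary AVE from the three printed component scores. The helper names (best_orientation, average_score, is_match) are hypothetical and are not part of the VGM API; the sketch also assumes that a lower score is better, which is what the 1.0 match threshold implies, and that AVE is an unweighted mean.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>

// Scores for the four volume projection orientations, in the order the
// test output prints them: A (0 deg), B (90 deg), C (135 deg), D (45 deg).
using OrientationScores = std::array<double, 4>;

// Match threshold quoted in the text: scores below 1.0 count as a match.
constexpr double kMatchThreshold = 1.0;

// Hypothetical helper (not a VGM function): index of the lowest score,
// where 0 = A, 1 = B, 2 = C, 3 = D. Assumes lower scores are better.
std::size_t best_orientation(const OrientationScores& s) {
    return static_cast<std::size_t>(
        std::distance(s.begin(), std::min_element(s.begin(), s.end())));
}

// Hypothetical helper: unweighted mean of the three component scores.
double average_score(double color, double shape, double texture) {
    return (color + shape + texture) / 3.0;
}

// True when a score falls below the match threshold.
bool is_match(double score) {
    return score < kMatchThreshold;
}
```

Applied to the printed values, the SAD row {2.1744, 1.8444, 1.8959, 1.7419} selects orientation D, and the IntersectionSSD row {3.7322, 2.9356, 0.6163, 3.7083} selects orientation C, whose score of 0.6163 falls below the threshold even though the pair does not match overall. Averaging 6.173650, 2.942523, and 2.050557 gives 3.722243, the AVE reported on the AGENT_SCORE_NO_MATCH line.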
As shown in the test output above, the volume projection metrics (CAM neural clusters) vary considerably with the orientation of the genome comparison (A, B, C, or D). Using the average orientation score does not seem to be very helpful. Rather, selecting the best scoring orientation from each CAM comparison and culling the rest makes sense as a metric selection criterion, followed by a reinforcement learning phase. The color and texture metrics show similar scoring results, illustrating how metrics selection must be learned for optimal correspondence.

MATCH Unit Test Group Results

The MATCH test genome pairs are connected by lines, as shown in Figure 11.15. Each MATCH pair is visually selected to be a reasonable genome match. The final AVE scores for each genome compare are collected as shown in Table 11.13 and reflect no metric tuning or reinforcement learning. The results show mainly good match scoring < 1.0, which is to be expected in this quick visual pairing. However, the scoring results generally validate the basic assumptions of the VGM volume learning method.
Chapter 11: Applications, Training, Results
Figure 11.15: MATCH test genome pairs connected by lines.

Table 11.13: MATCH scores from genome pairs in Figure 11.15

STRAND
SCORE
MATCH_F_boxes.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_F_shade.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_Guadalupe_flows.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_birch_branch.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_birch_branch_shadows.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_birch_branches.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_blue_sky.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_duck_back_scaled.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_eves_shade.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_foliage_shaded.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_foliage_shaded.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_mountain_high.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_mountain_top.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_mulch_shadow.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_red_leaves.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_rose_leaves_variegated.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_sequoia_dark_shade.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_squirrel_back_shade.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_squirrel_body_brightness.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_squirrel_body_lightness.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_squirrel_head_scaled.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_squirrel_head_scaled.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_squirrel_tail_scaled.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_stucco_blurred.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_stucco_brown.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_stucco_shadows.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_white_blurry_stucco.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_white_stucco_far.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_white_stucco_shadows.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_window_shades_dark.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_window_shades_scaled.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_window_shades_scaled_dark.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_F_side.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_bush.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_dark_squrrel.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_dark_stream.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_dark_window_shades.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_green_stucco.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_light_brown_stucco.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_light_squirrel.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_red_leaves.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_rose_foliage.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_scaled_squirrel_tails.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_sequoia_foliage.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_white_stucco.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_mulch_patch.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_F_black_box.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_F_dashboard.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_F_side.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_F_topside.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_Guadalupe_river.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_black_rail.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_blue_sky.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_blurry_white_stucco.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_branch_pieces.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_bricks.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_brown_branch.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_brown_stucco.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_dark_bird_parts.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_dark_branch_pieces.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_dark_squirrel_sides.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_dark_window_shade.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_distant_forest.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_dove_bird.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_drainpipe.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_drainpipe_sections.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_duck_back_scaled.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_forest.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_green_foliage_scaled.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_green_stucco.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_gutter_sections.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_gutters.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_red_dark_leaves.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_red_leaf.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_red_leave_blurry.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_reddish_leaves.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_rose_leaves.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_rose_leaves.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_rose_stem_and_leaves.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_scaled_sparrow_back.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_scaled_stucco.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_sequoia_foliage.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_sequoia_trunk.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_sequoia_trunk_parts.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_shrub_dead_leaves.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_shrub_leaves.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_shrub_pieces.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_sky_pieces.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_squirrel_front_leg.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_squirrel_head_scaled.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
MATCH_squirrel_heads_scaled.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_squirrel_side_patch.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_squirrel_tail.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
MATCH_squirrel_thigh.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_squirrel_thigh.strand
AGENT_SCORE_NO_MATCH :: .
MATCH_textured_stucco.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
MATCH_thin_plant.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
Plot of Table 11.13 MATCH scores: AVE 1.67 MEDIAN 1.260213
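The AVE and MEDIAN figures quoted for each unit test group are ordinary summary statistics over the per-pair AVE scores. As a minimal sketch, the function below computes the mean, the median, and the number of scores under the 1.0 match threshold for one group. The names GroupSummary and summarize are hypothetical and are not part of the VGM API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical summary of one unit test group (MATCH, NOMATCH, or CLOSE).
struct GroupSummary {
    double ave;                   // mean of the per-pair scores
    double median;                // middle score after sorting
    std::size_t below_threshold;  // number of scores under the threshold
};

// Takes the scores by value so the caller's ordering is left untouched.
GroupSummary summarize(std::vector<double> scores, double threshold = 1.0) {
    assert(!scores.empty());
    std::sort(scores.begin(), scores.end());
    const std::size_t n = scores.size();
    const double sum = std::accumulate(scores.begin(), scores.end(), 0.0);
    // For an even count, the median is the mean of the two middle scores.
    const double median = (n % 2 == 1)
        ? scores[n / 2]
        : 0.5 * (scores[n / 2 - 1] + scores[n / 2]);
    const std::size_t below = static_cast<std::size_t>(
        std::count_if(scores.begin(), scores.end(),
                      [threshold](double s) { return s < threshold; }));
    return {sum / static_cast<double>(n), median, below};
}
```

For example, the made-up scores {0.5, 0.8, 1.2, 1.5, 4.0} (illustrative values only, not taken from Table 11.13) give an AVE of 1.6, a MEDIAN of 1.2, and two scores under the threshold. A single large outlier pulls the AVE above the MEDIAN, which is the same pattern seen in the group summaries.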
NOMATCH Unit Test Group Results

The NOMATCH test genome pairs shown in Figure 11.16 are selected to be visually obvious nonmatches for false positive testing. The final AVE scores are collected as shown in Table 11.14, reflecting no metric tuning or reinforcement learning. The results show zero matches scoring < 1.0, which is to be expected, and generally validate the basic assumptions of the VGM volume learning method.
Figure 11.16: NOMATCH test genome pairs connected by lines.

Table 11.14: NOMATCH scores from genome pairs in Figure 11.16

STRAND
SCORE
NOMATCH_Guadalupe_and_gray_sweater.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_brick_and_stucco.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_ceiling_and_gray_sweater.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_clouds_and_leave.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_dark_bird_and_dark_foliage.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
NOMATCH_dark_bird_and_dark_windfow_shades.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
NOMATCH_dark_rail_and_F_side.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_dove_and_sky.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_duck_back_and_gray_sweater.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_face_and_F_side.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_face_and_dark_window_shades.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_face_and_dead_leaves.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_face_and_reddish_blurry_leaf.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_face_and_stucco.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_gray_sweater_and_F_blackbox.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_gray_sweater_and_F_side.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_red_blurred_leaf_and_stucco.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_red_leaf_and_foliage.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_red_leaf_and_squirrel_back.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_reddish_leave_and_mulch.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_sky_and_branch.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_squirrel_and_bird.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
NOMATCH_squirrel_and_brown_wood.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_stucco_and_bricks.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_yellow_and_reddish_leaves.strand
AGENT_SCORE_NO_MATCH :: .
NOMTACH_stucco_and_squirrel.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_F_metals_dark_light.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
NOMATCH_F_metals_highlights.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_F_and_sequoias.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_bird_and_rail.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_bird_and_red_foliage.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_blue_sky_and_white_sky.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_dark_stucco_and_squirrel.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_dark_window_shade_and_white_stucco.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_green_stucco_and_dark_bird.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_green_stucco_and_foliage.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_red_foliage_and_squirrel.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_roof_eves_and_bird.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_roof_eves_and_sky.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_rose_leave_and_squirrel.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_squirrel_and_F.strand
AGENT_SCORE_NO_MATCH :: .
Plot of Table 11.14 NOMATCH scores : AVE 4.82 MEDIAN 4.042901
CLOSE Unit Test Group Results

The CLOSE test genome pairs shown in Figure 11.17 are selected to be visually close to matches, and are intended to fool the scoring and expose metric anomalies for further study. The final AVE scores are collected as shown in Table 11.15, reflecting no metric tuning or reinforcement learning. The results show only three matches scoring < 1.0, with a few other scores close to 1.0, validating the intent of the test.
Figure 11.17: CLOSE test genome pairs connected by lines.
Table 11.15: CLOSE scores from genome pairs in Figure 11.17

STRAND
SCORE
CLOSE_F_side_and_sky.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_F_white_gear.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
CLOSE_Guadalupe_and_F_side.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
CLOSE_bark_mulch.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_bird_back.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
CLOSE_branch_and_dark_dove.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_brown_bird.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
CLOSE_brown_bird_and_mulch.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_cream_branch.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
CLOSE_dark_bush_and_dark_sequoia.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_dark_rail_and_F_black_box.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
CLOSE_dark_window_shade_and_F_black_box.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_ducks_scaled_in_Guadalupe.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
CLOSE_face_and_mulch.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_face_and_sequoia_trunk.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_foresst_and_shrub.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_green_stucco_and_forest.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_gutter_and_drainpipe.strand
AGENT_SCORE_POSSIBLE_MATCH :: .
CLOSE_mulch_and_sequoia_trunk.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_rose_old_new.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_scaled_shaded_squirrels.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_sequoia_and_rose_leaf.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_sequoia_and_shrub.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_shaded_scaled_blurred_squirrel_side.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_shrubbery.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_squirrel_and_brown_bird.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
CLOSE_squirrel_and_brown_bird_back.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
CLOSE_squirrels_shaded_blurred.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_stucco_white_brown.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_yellow_leaves.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_yellowish_leaves.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_F_metal_regions.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
CLOSE_F_metals.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_F_shaded_regions.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_F_white_metal_red_trim_bad_segmentation.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_Guadalupe_river.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
CLOSE_birch_branch_brightness.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_bird_and_bush_stem.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_bush_trunk_and_birch_trunk.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_ceiling_and_stucco.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_ceiling_bright_and_stucco.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_dark_squirrel.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_dead_leaves_and_mulch.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_different_green_leaves.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_duck_back.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_foliage_and_foliage.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_foliage_and_rose_leaf.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_foliage_and_rose_leaves.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_foliage_shaded.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
CLOSE_light_squirrel.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
CLOSE_pole_and_Fbox.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_red_leaves_different.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_sequoia_shade_scale.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_sequoia_shaded.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_squirrel_head_and_thigh.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_squirrel_saddle_and_tail.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_squirrel_tail_and_breast.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_stucco_brown_white.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_F_and_sky.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_bird_and_suirrel.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_branch_and_bird.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
CLOSE_brick_and_dark_stucco.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
CLOSE_bush_and_rose_foliage.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_dead_leaves_and_mulch.strand
AGENT_SCORE_EXCELLENT_MATCH :: .
CLOSE_fall_leaves.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
CLOSE_metal_rail_and_box.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_mulch_and_squirrel.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_dark_eves_and_stucco.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_rose_and_sequoia_foliage.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_rose_and_sequoia_foliage.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_sequoia_branches_and_sky.strand
AGENT_SCORE_UNLIKELY_MATCH :: .
CLOSE_sequoia_trunk_and_mulch.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_sequoia_trunk_and_squirrel_tail.strand
AGENT_SCORE_NO_MATCH :: .
CLOSE_yellowish_and_reddish_leaves.strand
AGENT_SCORE_NO_MATCH :: .
NOMATCH_Guadalupe_and_gray_sweater.strand
AGENT_SCORE_NO_MATCH :: .
Plot of Table 11.15 CLOSE scores : AVE 3.57 MEDIAN 3.367237
Agent Coding

Basic boilerplate agent code is provided below, illustrating one way to write a basic agent that registers a custom agent, receives sequencer pipeline callbacks, receives correspondence callbacks, and starts and stops the pipeline. This section is intended to be brief, since the VGM open source code library contains all the information and examples needed to write agent code. See Chapter 12 for details on open source code.

////////////////////////////////////////////////////////////////////////////////////////
//
// AGENT OUTLINE
//

#define AGENT_REGISTRY_LIMIT 100
#define STRAND_REGISTRY_LIMIT 100

image_t global_image_registry[AGENT_REGISTRY_LIMIT];
agent_t global_agent_registry[AGENT_REGISTRY_LIMIT];
agent_t global_strand_registry[STRAND_REGISTRY_LIMIT];

class squirrel_agent : public agent_dll_base_class
{
public: // callbacks and entry points must be callable from outside the class

    STATUS init()
    {
        return STATUS_OK;
    }

    /////////////////////////////////////////////////////////////////////
    // SQUIRREL Custom entry point
    //
    STATUS agent_custom_entry(
        void *data)
    {
        // Add pre-processing here to set up anything
        return STATUS_OK;
    }

    /////////////////////////////////////////////////////////////////////
    // SQUIRREL Sequencer Controller Callback
    //
    STATUS sequencer_controller_callback(
        global_metrics_structure_t * metrics, // contains genome_ID, file_ID, ...
        int sequence_number)                  // NULL when sequencing finished, 1..n otherwise
    {
        // Called after each genome is segmented, after base metrics computed
        return STATUS_OK;
    }

    /////////////////////////////////////////////////////////////////////
    // SQUIRREL Correspondence Controller Callback
    //
    STATUS correspondence_controller_callback(
        metrics_comparison_t * compare_metrics, // target/reference genome IDs, image IDs
        U64 strand_ID,                          // NULL, or strand_ID of target genome_ID under compare
        int sequence_number)                    // NULL when all genomes compared, 1..n otherwise
    {
        // Called after each genome is compared, after compare metrics computed
        return STATUS_OK;
    }

    // Entry point for Agents
    STATUS vgm_app(
        U64 reference_image_ID,
        U64 reference_strand_ID,
        U64 reference_genome_ID,
        U64 target_image_ID)
    {
        STATUS status;
        agent_registry_t agent_registry = NULL; // default
        U64 agent_ID;

        status = add_agent_into_registry(
            SEQUENCER_AGENT,
            &agent_ID,
            &agent_registry,
            "Squirrel agent",
            "/squirrel_agent.dll"); // default path uses global agent directory

        status = add_agent_to_sequencer_registry(
            HIGH_PRIORITY,
            agent_ID);

        status = add_agent_to_correspondence_registry(
            HIGH_PRIORITY,
            agent_ID);

        status = run_sequencer_controller(reference_image_ID);

        status = run_correspondence_controller(
            (int)SEARCH_FOR_GENOME,
            global_image_registry,
            target_image_ID,
            reference_image_ID,
            reference_strand_ID,
            reference_genome_ID);

        return STATUS_OK;
    }
}; // class

//
// This function calls a preset group of combinations of match_criteria and agent_overrides
//
STATUS agent_t CSV_based_agent ()
{
    double match_strength_color;
    double match_strength_shape;
    double match_strength_texture;
    double match_strength;
    double weight_override = 1.0;
    double total;
    AGENT_SCORE_RESULT result;
    U32 agent_match_criteria;
    AGENT_OVERRIDES agent_overrides;
    shape_signature_t shape_signatures[MATCH_CRITERIA_END];
    color_signature_t color_signatures[MATCH_CRITERIA_END];
    texture_signature_t texture_signatures[MATCH_CRITERIA_END];
    int color_count = 0;
    int shape_count = 0;
    int texture_count = 0;

    // Run the color, shape, and texture compares once per match criteria
    for (MATCH_CRITERIA criteria = MATCH_NORMAL; criteria != MATCH_CRITERIA_END; criteria++)
    {
        printf ("((( matrix - %s )))", match_criteria_string(criteria));

        agent_overrides = (AGENT_OVERRIDES) (AGENT_CONTRAST_INVARIANT + AGENT_LIGHTING_INVARIANT);
        result = match__color(criteria, agent_overrides, weight_override,
            &match_strength_color, &color_signatures[color_count]);
        color_count++;
        strcpy(color_signatures[color_count-1].agent_name, "matrix");

        agent_overrides = (AGENT_OVERRIDES)AGENT_FAVOR_SHAPE;
        result = match__shape(criteria, agent_overrides, weight_override,
            &match_strength_shape, &shape_signatures[shape_count]);
        shape_count++;
        strcpy(shape_signatures[shape_count-1].agent_name, "matrix");

        agent_overrides = (AGENT_OVERRIDES) (AGENT_ROTATION_INVARIANT);
        result = match__texture(criteria, agent_overrides, weight_override,
            &match_strength_texture, &texture_signatures[texture_count]);
        texture_count++;
        strcpy(texture_signatures[texture_count-1].agent_name, "matrix");
    }

    return STATUS_OK;
}
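The boilerplate above depends on the VGM headers and cannot be compiled on its own. To illustrate the callback pattern in isolation, the following self-contained sketch mocks a sequencer that notifies registered agents once per genome and once more when sequencing is finished. Every type and function here (Status, AgentBase, CountingAgent, Sequencer, on_sequence) is a hypothetical stand-in and not part of the VGM library; the sketch only assumes the convention noted in the boilerplate comments, where the sequence number runs 1..n and is zero when sequencing is finished.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// All names below are illustrative stand-ins, not VGM types.
enum class Status { Ok, Error };

class AgentBase {
public:
    virtual ~AgentBase() = default;
    // Called with sequence numbers 1..n, one per genome, then once with 0
    // to signal that sequencing has finished.
    virtual Status on_sequence(std::uint64_t genome_id, int sequence_number) = 0;
};

// A trivial agent that counts callbacks and records the finish signal.
class CountingAgent : public AgentBase {
public:
    Status on_sequence(std::uint64_t, int sequence_number) override {
        if (sequence_number == 0) {
            finished = true;
        } else {
            ++genomes_seen;
        }
        return Status::Ok;
    }
    int genomes_seen = 0;
    bool finished = false;
};

class Sequencer {
public:
    // Registers an agent and returns its ID (its index in the registry).
    std::size_t add_agent(std::shared_ptr<AgentBase> agent) {
        agents_.push_back(std::move(agent));
        return agents_.size() - 1;
    }

    // Walks the genome list and notifies every registered agent in order.
    Status run(const std::vector<std::uint64_t>& genome_ids) {
        int sequence_number = 1;
        for (std::uint64_t id : genome_ids) {
            for (auto& agent : agents_) {
                if (agent->on_sequence(id, sequence_number) != Status::Ok) {
                    return Status::Error;
                }
            }
            ++sequence_number;
        }
        // Final callback with sequence number 0: sequencing is finished.
        for (auto& agent : agents_) {
            if (agent->on_sequence(0, 0) != Status::Ok) {
                return Status::Error;
            }
        }
        return Status::Ok;
    }

private:
    std::vector<std::shared_ptr<AgentBase>> agents_;
};
```

Registering a CountingAgent and running the mock sequencer over three genome IDs leaves genomes_seen at 3 and finished set to true. A correspondence controller can be mocked the same way, with a second callback that receives compare results instead of base metrics.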
Summary

This chapter lays the foundation for VGM application development, using interactive training to learn strands of visual genome features of a squirrel and other test set genomes. The training process illustrates how each reference genome is a separate ground truth item with its own learned classifier, built from the autolearning hull and the best learned metrics, which should be tuned via reinforcement learning using qualifier metrics to learn a structured classifier. To illustrate the process and build intuition, a small set of test genomes is selected using the interactive vgv tool, and then CSV agents are used to collect and compare metrics between test genomes. Reinforcement learning details are illustrated by walking through the process of evaluating genome comparison scores, with discussion along the way regarding scoring criteria and scoring strategies. A set of three unit tests is executed to evaluate cases where the genomes (1) visually match, (2) might match, and (3) do not match. Finally, brief sample agent code is provided as an example of how to run through the VGM pipeline.
Chapter 12 Visual Genome Project What is seen was made from things that are not visible. —Paul of Tarsus
Overview

This book describes the first release of the synthetic vision model and the VGM, pointing the way toward a visual genome project. During the early software development and preparation for this book, several areas for future work were identified; they are listed later in this chapter. Some of these areas are expected to become trends in synthetic vision and machine learning research, as reflected in the topics of various published papers; other items are currently under development for the next version of the VGM regardless. It is expected that improvements in future synthetic vision systems will be made by incorporating foundational improvements in several key areas including:
– AI and learning model improvements
– image segmentation accuracy and parameter learning
– complementary segmentation methods including real-time micro-segmentations
– re-segmentations of interesting regions in multiple image and color spaces
– developing better distance functions and learning distance function parameters
– learning complementary distance functions
– learning feature matching thresholds for scoring
– learning the best features for a specific classifier or set of classifiers
– automating the code generation and parameterization of classifiers
– learning multi-level classifier group structures instead of relying on a single classifier

To complete this chapter, we identify specific areas where the VGM will be enhanced, as well as how engineers and scientists may get involved in the project via software development.
DOI 10.1515/9781501505966-012
VGM Model and API Futures

Current enhancements are being made to the VGM API, with several more planned for a future version. The items under development are shown in the list below.
– Multiple segmentation API: most critical features include segmentation learning to identify the best parameters and region sizes, including targeted segmentations to support a range of image resolutions.
– Dynamic segmentation API: an interactive tool interface providing several choices of genome segmentations around a feature of interest, like a segmentation display and picker.
– Image space permutation API: allow for the selection of multiple genome cross-compare permutations in various image spaces such as reference:[raw, sharp, retinex, histeq, blur] target:[raw, sharp, retinex, histeq, blur] to find the optimal correspondence. This will be a high-level API which calls the MCC functions several times with various permutation ordering and parameterization and collects the results into a CSV.
– Color space permutation API: provide additional API support to agents for the selection of multiple color space compare permutations: reference[color component, color_space, leveled-space]: target[color component, color_space, leveled-space]. This will be a high-level API which calls the MCC functions several times with a set of permutation orderings and parameterizations, and collects the results into a CSV.
– Expanded volume processing spaces: provide parameter options for additional sharp, retinex, blur, and histeq volumes over a sliding range of strength, such as raw → soft_sharpen → medium_sharpen → heavy_sharpen.
– Chain metric qualification API: start from selected reliable metrics as qualifier metrics, creating chains of candidate dependent metrics, similar to the boosting methods in ADABOOST [1].
– Expanded autolearning hull families: provide additional autolearning hull defaults to cover image permutations, metric spaces, and invariance criteria. Hull selection will be parameterized into the MCC functions, with parameterized hull family code generation.
– Expanded scoring API: provide API support in MCC functions for a family of autolearning hull computations tuned for each type of metric, taking better account of the metric range, using separate algorithms for volume hulls, color hulls, Haralick hulls, and SDMX hulls. Will include a default Softmax classifier.
– Color autolearning hull filtering: filter out unwanted colors due to poor segmentations by reducing the hull to the most popular colors.
– Qualifier metric selection overrides API: a learning method to pair and weight dependent metrics against trusted metrics.
– Low-bit resolution VDNA histogram API tool: for grouping and plotting metrics groups to visualize gross metric distance for selected metrics in an image or a group of images.
– Alignment spaces specification API: to provide additional MCC parameters for 3D volume comparison alignment of target to reference volume, using 3D centroids as the alignment point. Besides centroid alignment, other alignments will be added such as sliding volume alignment (similar to the sliding color metrics). Additional 2D alignment spaces will be added to align target to reference values within color leveling alignment spaces and other alignment markers, such as min, max, ave, and centroid. (NOTE: centroid marker alignment is currently supported for MCC volume related functions via a global convenience function.) Add a new alignment space model for squinting using a common contrast remapping method.
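The image space permutation item in the list above can be sketched as a nested loop over the five image spaces, with each reference:target pair scored and the results collected as CSV rows. This is a minimal sketch only: the compare callback stands in for whichever MCC function would be called, and the names kImageSpaces, PermutationResult, CompareFn, cross_compare, best_result, and to_csv are hypothetical and do not describe the planned VGM API. Lower scores are assumed to be better, consistent with the 1.0 match threshold used in Chapter 11.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Image spaces named in the text; the container itself is illustrative.
const std::array<const char*, 5> kImageSpaces = {
    "raw", "sharp", "retinex", "histeq", "blur"};

struct PermutationResult {
    std::string reference_space;
    std::string target_space;
    double score;  // assumed: lower is better
};

// Stand-in for an MCC compare call: (reference space, target space) -> score.
using CompareFn = std::function<double(const std::string&, const std::string&)>;

// Scores all 5 x 5 reference:target image space permutations.
std::vector<PermutationResult> cross_compare(const CompareFn& compare) {
    std::vector<PermutationResult> results;
    for (const char* ref : kImageSpaces) {
        for (const char* tgt : kImageSpaces) {
            results.push_back({ref, tgt, compare(ref, tgt)});
        }
    }
    return results;
}

// Returns the permutation with the lowest score.
const PermutationResult& best_result(const std::vector<PermutationResult>& r) {
    assert(!r.empty());
    return *std::min_element(
        r.begin(), r.end(),
        [](const PermutationResult& a, const PermutationResult& b) {
            return a.score < b.score;
        });
}

// Serializes the results as CSV text with a header row.
std::string to_csv(const std::vector<PermutationResult>& results) {
    std::ostringstream out;
    out << "reference,target,score\n";
    for (const auto& r : results) {
        out << r.reference_space << ',' << r.target_space << ','
            << r.score << '\n';
    }
    return out.str();
}
```

With a stub compare function that returns 0.5 for sharp:histeq and 2.0 for every other pair, cross_compare yields 25 rows and best_result selects the sharp:histeq permutation. A color space permutation sweep could follow the same shape, with a wider key in place of the two image space names.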
VGM Cloud Server, API, and iOS App

The VGM open source can be ported and hosted in a cloud server for interfacing to IoT devices, phones, tablets, or other remote devices. The entire VGM can be hosted on any suitable Linux device with adequate compute power. Porting considerations include the size of the VGM feature model used (i.e., the VGM model profile level of detail) and the compute and memory resources necessary to meet performance targets. Multicore hosts are recommended for acceleration; the more cores the better.

A VGM cloud server is available from krigresearch.com, including a remote cloud API. The cloud is designed for cooperative work and contains all the registries for images and model files, agents, and controllers discussed in Chapter 5. A remote device can rely on the cloud server for storage and computations for supplied images.

A simple cloud-based app is available for Apple machines, allowing a nonprogrammer to interactively register images, build and train strands, learn classifiers, and search for strands in other registered images. A basic cloud API using SOAP/REST HTML protocol is also available with the app in order to build simple web-based applications. Also, the basic command line tools described in Chapter 5 are available as an iOS app, which runs on an iMac with at least four cores (the more cores the better).
Licensing, Sponsors, and Partners

The open-source code contains additional documentation and source code examples. Contact http://krigresearch.com to obtain open source code license information, as well as VGM Cloud Server, API, and App licensing information. Potential sponsors and partners for the visual genome project are encouraged to contact Krig Research.
Bibliography

[1] Krig, Scott. 2016. Computer Vision Metrics: Survey, Taxonomy and Analysis of Computer Vision, Visual Neuroscience, and Deep Learning. Textbook Edition. Switzerland: Springer-Verlag. Some illustrations, figures and text from Appendix E are reused herein “With permission of Springer Nature.”
[2] Cann, R.L., M. Stoneking, and A.C. Wilson. 1987. Mitochondrial DNA and human evolution. Nature 325.
[3] Baluja, S., M. Covell, and R. Sukthankar. 2015. The Virtues of Peer Pressure: A Simple Method for Discovering High-Value Mistakes. In Proceedings of 16th International Conference of Computer Analysis of Images and Patterns, Part II, ed. G. Azzopardi and N. Petkov. Berlin: Springer.
[4] Li Wan. 2015. Joint Training of a Neural Network and a Structured Model for Computer Vision. PhD diss., Department of Computer Science, New York University.
[5] Nguyen, A., J. Yosinski, and J. Clune. 2015. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. In 2015 IEEE Conference on Computer Vision and Pattern Recognition.
[6] Papernot, N. et al. 2017. Practical Black-Box Attacks against Machine Learning. In Proceedings of the ACM Conference on Computer and Communications Security.
[7] Papernot, N. et al. 2015. Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks. Ithaca, NY: Cornell University Library. https://arxiv.org/abs/1511.04508.
[8] Ba, J., and R. Caruana. 2014. Do deep nets really need to be deep? In Proceedings of the 27th International Conference on Neural Information Processing Systems, vol. 2: 2654–62. Cambridge, MA: MIT Press.
[9] Song Han, H. Mao, and W.J. Dally. 2016. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. Presented at the International Conference on Learning Representations. Ithaca, NY: Cornell University Library. https://arxiv.org/abs/1510.00149.
[10] Goodfellow, I.J. et al. 2014. Generative Adversarial Nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems, vol. 2: 2672–80. Cambridge, MA: MIT Press.
[11] Zeiler, M.D., and R. Fergus. 2014. Visualizing and Understanding Convolutional Networks. In Proceedings of the 13th European Conference on Computer Vision (ECCV 2014), ed. David Fleet et al. Switzerland: Springer.
[12] Kandel, E.R., J.H. Schwartz, and T.M. Jessell (eds.). 2000. Principles of Neural Science. 5th ed. New York: McGraw-Hill.
[13] Behnke, S. 2003. Hierarchical neural networks for image interpretation. Lecture Notes in Computer Science 2766. Berlin: Springer-Verlag.
[14] Brodal, P. 2010. The Central Nervous System: Structure and Function. 4th ed. New York: Oxford University Press.
[15] Baluja, S., and I. Fischer. 2017. Adversarial Transformation Networks: Learning to Generate Adversarial Examples. Ithaca, NY: Cornell University Library. https://arxiv.org/abs/1703.09387.
[16] Williamson, S.J., and H.Z. Cummins. 1983. Light and Color in Nature and Art. New York: Wiley.
[17] Brody, C.D. 1992. A Model of Feedback to the Lateral Geniculate Nucleus. In Advances in Neural Information Processing Systems, vol. 5: 409–16. San Francisco, CA: Morgan Kaufmann.
[18] Rodríguez Sánchez, A.J. Attention, Visual Search and Object Recognition. PhD diss., Department of Computer Science, York University, Toronto.
[19] Kurakin, A., I.J. Goodfellow, and S. Bengio. 2017. Adversarial Examples in the Physical World. Ithaca, NY: Cornell University Library. https://arxiv.org/abs/1607.02533.
DOI 10.1515/9781501505966-013
[20] Legland, D., I. Arganda-Carreras, and P. Andrey. 2016. MorphoLibJ: integrated library and plugins for mathematical morphology with ImageJ. Bioinformatics 32, no. 22: 3532–34. [21] Juan, O., and R. Keriven. 2005. Trimap Segmentation for Fast and User-Friendly Matting. In Variational, Geometric, and Level Set Methods in Computer Vision, ed. N. Paragios et al. Lecture Notes in Computer Science, 3752. Berlin: Springer. [22] Rhemann, C. et al. 2008. High Resolution Matting via Interactive Trimap Segmentation. Technical Report corresponding to the CVPR ’08 paper. TR-188-0-2008-04. https://www.ims.tuwien.ac.at/projects/alphamat/downloads/technical-report-highresolution-matting.pdf. [23] Ning Xu et al. 2017. Deep Image Matting. Computer Vision and Pattern Recognition 2017. [24] International Council of Ophthalmology. 1988. Visual Acuity Measurement Standard. Italian Journal of Ophthalmology II/I, 15. [25] Vantaram, S.R., and E. Saber. 2012. Survey of contemporary trends in color image segmentation. Journal of Electronic Imaging 21 (4). [26] Smart K8. 2012. A Simple – Yet Quite Powerful – Palette Quantizer in C#. Code Project: For those who code (website). July 28. https://www.codeproject.com/Articles/66341/A-Simple-Yet-QuitePowerful-Palette-Quantizer-in-C. [27] Arganda-Carreras, I., and D. Legland. 2017. Morphological Segmentation (IJPB-plugins). https://imagej.net/Morphological_Segmentation. [28] Borovec, J. 2013. CMP-BIA Tools: jSLIC superpixels plugins. ImageJ. https://imagej.net/CMPBIA_tools. [29] van de Weijer, J., C. Schmid, and J. Verbeek. 2007. Learning Color Names from Real-World Images. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, 2007, 1–8. doi: 10.1109/CVPR.2007.383218. [30] van de Weijer, J., and C. Schmid. 2007. Applying Color Names to Image Description. In Proceedings of the International Conference on Image Processing 3 (3): III-493–III-496. [31] Zhang, J. et al. 2007. 
Local features and kernels for classification of texture and object categories: An in-depth study. International Journal of Computer Vision 73, no. 2: 213–38. [32] Sivic, J. et al. 2005. Discovering objects and their location in images. In Proceedings of the Tenth IEEE International Conference on Computer Vision Workshops, Washington, DC. http://repository.cmu.edu/cgi/viewcontent.cgi?article=1285&context=robotics. [33] Jiajun Wu et al. 2016. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling. In Proceedings of the 30th Conference on Neural Information Processing Systems, Barcelona, Spain. [34] Radford, A., L. Metz, and S. Chintala. 2016. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Ithaca, NY: Cornell University Library. https://arxiv.org/abs/1511.06434. [35] Uijlings, J.R.R. et al. 2013. Selective Search for Object Recognition. International Journal of Computer Vision 104, no. 2: 154–71. [36] Arbelaez, P. et al. 2010. Contour Detection and Hierarchical Image Segmentation. University of California, Berkeley Technical Report No. UCB/EECS-2010-17. [37] Smith, K. 2013. Brain decoding: Reading minds. Nature 502, no. 7472. [38] Smith, K. 2008. Mind-reading with a brain scan. Nature (online), March 5. doi:10.1038/news.2008.650. [39] Cox, D.D., and R.L. Savoy. 2003. Functional magnetic resonance imaging (fMRI) “brain reading”: detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage 19, no. 2 (Pt. 1): 261–70. [40] Wandell, B.A. 1999. Computational Neuroimaging of Human Visual Cortex. Annual Review of Neuroscience 22: 145–73.
[41] Caelles, S. et al. 2017. One-Shot Video Object Segmentation. Computer Vision and Pattern Recognition (CVPR), 2017. [42] Journals for neuroscience, machine learning, computer vision, psychology, and ophthalmology: Computer Vision: CVPR, CCVC, SIGGRAPH, BMVC, ACCV, ICPR. Machine Learning and AI: ICML, KDD, NIPS. Neuroscience: Journal of Neuroscience, Nature Neuroscience, Visual Neuroscience. Psychology: Biological Psychology, NeuroImage, Neurology. Ophthalmology: Progress in Retinal and Eye Research, Ophthalmology, American Journal of Ophthalmology, Investigative Ophthalmology and Visual Science, Vision Research. [43] Weinberger, K.Q., J. Blitzer, and L.K. Saul. 2005. Distance Metric Learning for Large Margin Nearest Neighbor Classification. In Proceedings of the 18th International Conference on Neural Information Processing Systems, 1473–80. Cambridge, MA: MIT Press. [44] Fischler, M., and R. Bolles. 1981. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM 24, no. 6 (June). [45] Raguram, R., J.-M. Frahm, and M. Pollefeys. 2008. A Comparative Analysis of RANSAC Techniques Leading to Adaptive Real-Time Random Sample Consensus. In Computer Vision – ECCV 2008, ed. D. Forsyth et al. Lecture Notes in Computer Science, 5303. Berlin: Springer. [46] Moré, J.J. 1978. The Levenberg-Marquardt Algorithm Implementation and Theory. Numerical Analysis Lecture Notes in Mathematics 630: 105–16. [47] Hartigan, J.A., and M.A. Wong. 1979. Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society 28, no. 1. [48] Nister, D., and H. Stewenius. 2006. Scalable Recognition with a Vocabulary Tree. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2: 2161–68. Washington, DC: IEEE Computer Society. [49] Hastie, T., R. Tibshirani, and J. Friedman. 2009. Hierarchical Clustering: The Elements of Statistical Learning. 
2nd ed. New York: Springer. [50] Dempster, A.P., N.M. Laird, and D.B. Rubin. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society Series B 39, no. 1: 1–38. [51] Goodfellow, I., Y. Bengio, and A. Courville. 2016. Deep Learning. Cambridge, MA: MIT Press. [52] Heckerman, D. 1996. A Tutorial on Learning with Bayesian Networks. Technical Report MSR-TR-95-06. Redmond, WA: Microsoft Research. [53] Amit, Y., and D. Geman. 1997. Shape Quantization and Recognition with Randomized Trees. Neural Computation 9, no. 7. [54] Ozuysal, M. et al. 2010. Fast Keypoint Recognition Using Random Ferns. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, no. 3. [55] Niebles, J.C., H. Wang, and L. Fei-Fei. 2008. Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words. International Journal of Computer Vision 79, no. 3: 299–318. [56] Rabiner, L.R., and B.H. Juang. 1986. An Introduction to Hidden Markov Models. IEEE Acoustics, Speech, and Signal Processing (February). [57] Krogh, A. et al. 2001. Predicting Transmembrane Protein Topology with a Hidden Markov Model: Application to Complete Genomes. Journal of Molecular Biology 305, no. 3: 567–80. [58] Vapnik, V. 1998. Statistical Learning Theory. Hoboken, NJ: John Wiley. [59] Hofmann, T., B. Schölkopf, and A.J. Smola. 2008. Kernel Methods in Machine Learning. The Annals of Statistics 36, no. 3: 1171–1220. [60] Pearson, K. 1901. On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine 2: 559–72. [61] Hotelling, H. 1936. Relations between Two Sets of Variates. Biometrika 28 nos. 3–4: 321–77.
[62] Schölkopf, B., and A.J. Smola. 2001. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press. [63] Cortes, C., and V.N. Vapnik. 1995. Support-Vector Networks. Machine Learning 20, no. 3 (September): 273–97. [64] Hinton, G.E., S. Osindero, and Y.W. Teh. 2006. A fast learning algorithm for deep belief nets. Neural Computation 18, no. 7: 1527–54. [65] Bucila, C., R. Caruana, and A. Niculescu-Mizil. 2006. Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM. [66] Hinton, G.E., O. Vinyals, and J. Dean. 2015. Distilling the knowledge in a neural network. Ithaca, NY: Cornell University Library. https://arxiv.org/abs/1503.02531. [67] Romero, A. et al. 2014. FitNets: Hints for Thin Deep Nets. Ithaca, NY: Cornell University Library. https://arxiv.org/abs/1412.6550. [68] Bengio, Y. 2009. Learning deep architectures for AI. Foundations and Trends in Machine Learning 2, no. 1 (January): 1–127. [69] Bengio, Y., A. Courville, and P. Vincent. 2013. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, no. 8. [70] Wierstra, D. et al. 2014. Natural Evolution Strategies. Journal of Machine Learning Research 15: 949–980. [71] Real, E. et al. 2017. Large-Scale Evolution of Image Classifiers. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia. [72] Fernando, C. et al. 2016. Convolution by Evolution: Differentiable Pattern Producing Networks. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, Denver, Colorado, 109–16. New York: ACM. [73] Cervone, G. et al. 2000. Combining Machine Learning with Evolutionary Computation: Recent Results on LEM. In Proceedings of the Fifth International Workshop on Multistrategy Learning, 41–58. [72] Jameson, K.A., S.M. Highnote, and L.M. Wasserman. 2001. 
Richer color experience in observers with multiple opsin genes. Psychonomic Bulletin and Review 8, no. 2: 244–61. [73] Cha, S.-H. 2007. Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences 1, no. 4. [74] Duda, R.O., P.E. Hart, and D.G. Stork. 2001. Pattern Classification. 2nd ed. New York: Wiley. [75] Deza, M.M., and E. Deza. 2006. Dictionary of Distances. Oxford, UK: Elsevier. [76] Eykholt, K. et al. 2018. Robust Physical-World Attacks on Deep Learning Models. Ithaca, NY: Cornell University Library. https://arxiv.org/abs/1707.08945. [77] Vapnik, V.N., E. Levin, and Y. LeCun. 1994. Measuring the dimension of a learning machine. Neural Computation 6, no. 5: 851–76. [78] Boser, B.E., I.M. Guyon, and V.N. Vapnik. 1992. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory (COLT ’92), Pittsburgh, Pennsylvania. New York: ACM. [79] Cortes, C., and V.N. Vapnik. 1995. Support-vector networks. Machine Learning 20, no. 3: 273–97. [80] Vapnik, V.N. 1995. The Nature of Statistical Learning Theory. New York: Springer. [81] Chan, W.W.-Y. 2006. A Survey on Multivariate Data Visualization. Department of Computer Science and Engineering, Hong Kong University of Science and Technology Clear Water Bay, Kowloon, Hong Kong, June. http://www.saedsayad.com/docs/multivariate_visualization.pdf. [82] Young, F.W., R.A. Faldowski, and M.M. McFarlane. 1993. Multivariate Statistical Visualization. Psychometrics Laboratory, University of North Carolina at Chapel Hill. http://forrest.psych.unc.edu/research/vista-frames/pdf/MSV.pdf.
[83] Stasko, J. 2015. Multivariate Visual Representations 1. CS 7450 – Information Visualization, August 31. [84] Haralick, R.M., K. Shanmugam, and I. Dinstein. 1973. Textural Features for Image Classification. IEEE Transactions on Systems, Man, and Cybernetics SMC-3, no. 6: 610–21. [85] Yokoyama, R., and R.M. Haralick. 1978. Texture Synthesis Using a Growth Model. Computer Graphics and Image Processing 8, no. 3: 369–81. [86] Haralick, R.M., and K. Shanmugam. 1973. Computer Classification of Reservoir Sandstones. IEEE Transactions on Geoscience Electronics GE 11, no. 4: 171–77. [87] Pressman, N.J. et al. 1979. Texture Analysis for Biomedical Imagery. Life Sciences Research Report 15. In Dahlem Workshop on Biomedical Pattern Recognition and Image Processing, ed. K.S. Fu and T. Pavlidis, 153–78. Berlin, Germany. [88] Yokoyama, R., and R.M. Haralick. 1979. Texture Pattern Image Generation by Regular Markov Chain. Pattern Recognition 11, no. 4: 225–33. Also: Haralick, R.M. 1979. Statistical and Structural Approaches to Texture. Proceedings of the IEEE 67, no. 5: 786–804. [89] Pong, T.-C. et al. 1983. The Application of Image Analysis Techniques to Mineral Processing. Pattern Recognition Letters 2, no. 2: 117–23. [90] Lumia, R. et al. 1983. Texture Analysis of Aerial Photographs. Pattern Recognition 16, no. 1: 39–46. [91] Haralick, R.M. 1971. On a Texture-Context Feature Extraction Algorithm for Remotely Sensed Imagery. In Proceedings of the IEEE Computer Society Conference on Decision and Control, 650–57. Gainesville, Florida. [92] Haralick, R.M., K. Shanmugam, and I. Dinstein. 1972. On Some Quickly Computable Features for Texture. In Proceedings of the 1972 Symposium on Computer Image Processing and Recognition, vol. 2: 12-2-1–12-2-10. University of Missouri. [93] Haralick, R.M., and R.J. Bosley. 1973. Spectral and Textural Processing of ERTS Imagery. Presented at Third ERTS Symposium, NASA SP-351, Goddard Space Flight Center, December 10–15. [94] Haralick, R.M. 
1975. A Resolution Preserving Textural Transform for Images. In Proceedings of the IEEE Computer Society Conference on Computer Graphics, Pattern Recognition, and Data Structure, 51–61. San Diego, California. [95] Haralick, R.M. 1978. Statistical and Structural Approaches to Texture. In Proceedings of the International Symposium on Remote Sensing for Observation and Inventory of Earth Resources and the Endangered Environment, 379–431. Freiburg, Federal Republic of Germany. [96] Haralick, R.M. 1978. Statistical and Structural Approaches to Texture (Revised). In Proceedings of the Fourth International Joint Conference on Pattern Recognition, 45–69. Kyoto, Japan. Also: Pressman, N.J. et al. 1979. Texture Analysis for Biomedical Imagery. Life Sciences Research Report 15. In Dahlem Workshop on Biomedical Pattern Recognition and Image Processing, ed. K.S. Fu and T. Pavlidis, 153–78. Berlin, Germany. [97] Pressman, N.J. et al. 1979. Texture Analysis for Biomedical Imagery. Life Sciences Research Report 15. In Dahlem Workshop on Biomedical Pattern Recognition and Image Processing, ed. K.S. Fu and T. Pavlidis, 153–78. Berlin, Germany. [98] Haralick, R.M. et al. 1981. Texture Discrimination Using Region Based Primitives. In Proceedings of IEEE Conference on Pattern Recognition and Image Processing, 369–72. Dallas, Texas. [99] Gens, R., and P. Domingos. 2014. Deep Symmetry Networks. In Advances in Neural Information Processing Systems 27 (NIPS 2014). [100] Craig, J.R. et al. 1984. Image Analysis in the Study of Pentlandite Exsolution Rates. In International Congress of Applied Mineralogy, 1984 Special Volume. Los Angeles, California. [101] Craig, J.R. et al. 1984. Mineralogical Variations During Comminution of Complex Sulfide Ores. In Process Mineralogy III, ed. W. Petruk, 51–63. New York: Society of Mining Engineers of the American Institute of Mining, Metallurgical and Petroleum Engineers, Inc.
[102] Chetverikov, D. et al. 1996. Zone Classification Using Texture Features. In Proceedings of the 13th International Conference on Pattern Recognition 3: 676–80. Vienna, Austria. [103] Aksoy, S., and R.M. Haralick. 1998. Textural Features for Image Database Retrieval. In Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries, 45–49. Santa Barbara, California. [104] Aksoy, S., and R.M. Haralick. 1998. Content-Based Image Database Retrieval using Variances of Gray Level Spatial Dependencies. In Proceedings of the IAPR International Workshop on Multimedia Information Analysis and Retrieval, 3–19. Hong Kong, China. [105] Aksoy, S., and R.M. Haralick. 1999. Using Texture in Image Similarity and Retrieval. In Proceedings of Texture Analysis in Machine Vision Workshop, 111–17. Oulu, Finland. [106] Evtimov, I. et al. 2017. Robust Physical-World Attacks on Machine Learning Models. August 7. [107] Gu, T., B. Dolan-Gavitt, and S. Garg. 2017. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. Ithaca, NY: Cornell University Library. http://arxiv-exportlb.library.cornell.edu/abs/1708.06733. [108] Kelly, K., and D. Judd. 1955. The ISCC-NBS color names dictionary and the universal color language. NBS Circular 553, November 1. [109] Hu, J., and A. Mojsilovic. 2000. Optimal Color Composition Matching of Images. In Proceedings of the 15th International Conference on Pattern Recognition (ICPR-2000). [110] van de Weijer, J., and C. Schmid. 2006. Coloring Local Feature Extraction. In Proceedings of the 9th European Conference on Computer Vision (ECCV ’06), 334–48. Graz, Austria. [111] van de Sande, K., T. Gevers, and C. Snoek. 2010. Evaluating Color Descriptors for Object and Scene Recognition. In IEEE Transactions on Pattern Analysis and Machine Intelligence 32, no. 9. [112] Finlayson, G.D., B. Schiele, and J.L. Crowley. 1998. Comprehensive colour image normalization. In: Computer Vision (ECCV 1998), ed. H. 
Burkhardt and B. Neumann, 475–90. Lecture Notes in Computer Science 1406. Berlin: Springer. [113] Mojsilovic, A. 2002. A method for color naming and description of color composition in images. In Proceedings of the 2002 International Conference on Image Processing. [114] Khan, R. et al. 2013. Discriminative Color Descriptors. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. [115] Smart K8. 2012. A Simple – Yet Quite Powerful – Palette Quantizer in C#. Code Project: For those who code (website). July 28. https://www.codeproject.com/Articles/66341/A-Simple-Yet-QuitePowerful-Palette-Quantizer-in-C. [116] Lex, K. 2013. Comparison of Color Difference Methods for multi-angle Application. Presented at the BYK-Gardner GmbH 10th BYK-Gardner User Meeting, Innsbruck, Austria, April. http://www.byk.com/fileadmin/byk/support/instruments/technical_information/datasheets/ All%20Languages/Color/Metallic/Comparison_of_Color_Difference_Methodes_for_MultiAngle_Application__Konrad_Lex__BYK-Gardner.pdf [117] Haralick, R.M., K. Shanmugam, and I. Dinstein. 1973. Textural Features for Image Classification. IEEE Transactions on Systems, Man, and Cybernetics SMC-3, no. 6: 610–21. [118] Werman, M., and O. Pele. 2010. Distance Functions and Metric Learning: Part 1. Presented at the 11th European Conference on Computer Vision (ECCV 2010), Heraklion, Crete, Greece. [119] Pele, O., and M. Werman. 2010. The Quadratic-Chi Histogram Distance Family. In the Proceedings of the 11th European Conference on Computer Vision, Heraklion, Crete, Greece. [120] Pearson, K. 1895. Contributions to the Mathematical Theory of Evolution. II. Skew Variation in Homogeneous Material. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. [121] Bohi, A. et al. 2017. Fourier Descriptors Based on the Structure of the Human Primary Visual Cortex with Applications to Object Recognition. Journal of Mathematical Imaging and Vision 57, no. 1: 117–33.
[122] Hall-Beyer, M. 2017. GLCM Texture: A Tutorial v. 3.0. Department of Geography, University of Calgary, March. [123] Pham, T.A. 2010. Optimization of Texture Feature Extraction Algorithm. MSc thesis, Computer Engineering, TU Delft, CE-MS-2010-21. [124] Akono, A. et al. 2003. Nouvelle méthodologie d'évaluation des paramètres de texture d'ordre trois. International Journal of Remote Sensing 24, no. 9. [125] Fernández, A., M.X. Álvarez, and F. Bianconi. 2013. Texture Description Through Histograms of Equivalent Patterns. Journal of Mathematical Imaging and Vision 45, no. 1: 76–102. [126] He, D.-C., and L. Wang. 1991. Texture features based on texture spectrum. Pattern Recognition 24, no. 12. [127] Alcantarilla, P.F., L.M. Bergasa, and A.J. Davison. 2012. Gauge-SURF Descriptors. Preprint submitted to Elsevier, December 10. http://www.robesafe.com/personal/pablo.alcantarilla/papers/Alcantarilla13imavis.pdf. [128] Lowe, D.G. 1999. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision, Corfu, September. [129] Rublee, E. et al. 2011. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2011), Barcelona, Spain. doi: 10.1109/ICCV.2011.6126544. [130] Bay, H., T. Tuytelaars, and L. Van Gool. 2006. SURF: Speeded Up Robust Features. In Proceedings of the Ninth European Conference on Computer Vision (ECCV 2006), Part I, Graz, Austria. [131] Smith, K. 2013. Brain decoding: Reading minds. Nature 502, no. 7472. [132] Smith, K. 2008. Mind-reading with a brain scan. Nature (online), March 5. doi:10.1038/news.2008.650. [133] Tarr, M.J. 1999. News on Views: Pandemonium Revisited. Nature Neuroscience 2, no. 11. [134] Langleben, D.D., and F.M. Dattilio. 2008. Commentary: the future of forensic functional brain imaging. The Journal of the American Academy of Psychiatry and the Law 36, no. 4: 502–4. [135] Finn, E.S. et al. 2015. 
Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity. Nature Neuroscience 18: 1664–71. [136] Gjoneska, E. et al. 2015. Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease. Nature 518: 365–69. doi: 10.1038/nature14252. [137] Tanaka, K. 1996. Inferotemporal cortex and object vision. Annual Review of Neuroscience 19, 109–39. [138] Perrett, D. et al. 1991. Viewer-centred and object-centred coding of heads in the macaque temporal cortex. Experimental Brain Research 86, 159–73. [139] Perrett, D.I., E.T. Rolls, and W. Caan. 1982. Visual neurons responsive to faces in the monkey temporal cortex. Experimental Brain Research 47: 329−42. [140] Tanaka, K. et al. 1991. Coding visual images of objects in the inferotemporal cortex of the macaque monkey. Journal of Neurophysiology 66: 170–89. [141] Voytek, B. 2013. Brain Metrics: How measuring brain biology can explain the phenomena of mind. Nature (online), May 20. https://www.nature.com/scitable/blog/brainmetrics/are_there_really_as_many. [142] Dias, B.G., and K.J. Ressler. 2014. Parental olfactory experience influences behavior and neural structure in subsequent generations. Nature Neuroscience 17: 89–96. [143] Bergami, M. et al. 2015. A Critical Period for Experience-Dependent Remodeling of Adult-Born Neuron Connectivity. Neuron 85, no. 4: 710–17. [144] Wei-Chung Allen Lee et al. 2006. Dynamic Remodeling of Dendritic Arbors in GABAergic Interneurons of Adult Visual Cortex. PLOS Biology 4, no. 2.
[145] The Human Connectome Project is a consortium of leading neurological research labs who are mapping out the pathways in the brain. See Human Connectome Project. About. http://www.humanconnectomeproject.org/about/. [146] Hubel, D.H., and T.N. Wiesel. 1959. Receptive fields of single neurones in the cat’s striate cortex. The Journal of Physiology 148, no. 3: 574–91. [147] Hubel, D.H., and T.N. Wiesel. 1962. Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. The Journal of Physiology 160, no. 1: 106–54. [148] Gu, Chunhui et al. 2009. Recognition using Regions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009). [149] Baddeley, A., M. Eysenck, and M. Anderson. 2009. Memory. 1st ed. New York: Psychology Press. [150] Logothetis, N. et al. 1994. View dependent object recognition by monkeys. Current Biology 4: 401–14. [151] Logothetis, N.K., and D.I. Sheinberg. 1996. Visual object recognition. Annual Review of Neuroscience 19: 577–621. [152] Goldman-Rakic, P.S. 1995. Cellular basis of working memory. Neuron 14, no. 3: 477–85. [153] Hunt, R.W.G., and M.R. Pointer. 2011. Measuring Colour. 4th ed. Chichester, UK: Wiley. [154] Hunt, R.W.G. 2004. The Reproduction of Color. 6th ed. Chichester, UK: Wiley. [155] Berns, R.S. 2000. Billmeyer and Saltzman’s Principles of Color Technology. 3rd ed. New York: Wiley. [156] Morovic, J. 2008. Color Gamut Mapping. Chichester, UK: Wiley. [157] Fairchild, M. 1998. Color Appearance Models. Reading, MA: Addison Wesley Longman. [158] Szegedy, C. et al. 2016. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. CVPR Open Access paper. https://www.cvfoundation.org/openaccess/content_cvpr_2016/papers/Szegedy_Rethinking_the_Inception_ CVPR_2016_paper.pdf. [159] Vinyals, O. et al. 2016. Matching Networks for One Shot Learning. Ithaca, NY: Cornell University Library. https://arxiv.org/abs/1606.04080. 
[160] Clausi, D.A. 2002. An analysis of co-occurrence texture statistics as a function of grey-level quantization. Canadian Journal of Remote Sensing 28, no. 1: 45–62. [161] Soh, L.K., and C. Tsatsoulis. 1999. Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices. IEEE Transactions on Geoscience and Remote Sensing 37, no. 2: 780–95. [162] Heckbert, P. 1982. Color image quantization for frame buffer display. Computer Graphics 16, no. 3: 297–307. [163] Krishna, R. et al. 2016. Visual Genome Connecting Language and Vision Using Crowdsourced Dense Image Annotations. Ithaca, NY: Cornell University Library. https://arxiv.org/abs/1602.07332. [164] Krause, J. et al. 2017. A Hierarchical Approach for Generating Descriptive Image Paragraphs. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). [165] Li, P. 2010. Robust Logitboost and adaptive base class (ABC) Logitboost. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, ed. Peter Grünwald and Peter Spirtes, 302–11. N.p.: AUAI Press. [166] Krig, Scott. 2016. Appendix E. In Computer Vision Metrics: Survey, Taxonomy and Analysis of Computer Vision, Visual Neuroscience, and Deep Learning. Textbook Edition. Switzerland: Springer-Verlag. [167] Chen, T., and C. Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–94. New York: ACM.
[168] Viola, P., and M. Jones. 2001. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii. [169] Friedman, J.H. 2002. Stochastic gradient boosting. Computational Statistics & Data Analysis 38, no. 4: 367–78. [170] Sabour, S., N. Frosst, and G.E. Hinton. 2017. Dynamic Routing Between Capsules. In Advances in Neural Information Processing Systems 30, ed. I. Guyon et al. [171] Riesenhuber, M. 2004. Object recognition in cortex: Neural mechanisms, and possible roles for attention. Preprint submitted to Elsevier Science. May 26. https://pdfs.semanticscholar.org/8c0a/21b845d9af8218b7b28812c7e23edadce1a4.pdf. [172] Schmidhuber, J. 2015. Deep learning in neural networks: an overview. Neural Networks 61: 85–117. https://arxiv.org/abs/1404.7828. [173] Garcia-Garcia, A. et al. 2017. A Review on Deep Learning Techniques Applied to Semantic Segmentation. Ithaca, NY: Cornell University Library. https://arxiv.org/abs/1704.06857. [174] Long, J., E. Shelhamer, and T. Darrell. 2015. Fully Convolutional Networks for Semantic Segmentation. Ithaca, NY: Cornell University Library. https://arxiv.org/abs/1411.4038.
Index 2-bit quantization 176–78, 183, 186 2-bits 176–78, 229 3-bits 119, 176–77, 229 5-bit color input 209, 211 5-bits 119, 171, 176–77, 188–89, 194, 198, 208, 215, 229 6-bits 119, 176 7-bits 119, 176 8-bit 170, 176, 187–89, 194, 198, 208, 214, 250, 254 8-bit Color space metrics 289 A ABCD genome orientation 241, 243 Accuracy 7, 16, 61, 98, 105, 108, 197, 250, 271–72 ACM 329–30, 334 Address 5–6, 12, 71, 74, 169, 172, 175, 178– 79, 230 Adjacent pixels 170, 186, 239, 248 Adversarial images 13, 15, 34 Agent architecture 63, 98, 103, 122 Agent architecture and agent types 121, 123 Agent callbacks 133 Agent hunting 106 Agent ID list 141–42 Agent learning 102–3, 105, 122 Agent management 140–41, 143 Agent match criteria 163–64 Agent name 120, 124, 145, 163–64, 166 Agent registry 125, 127, 130, 132–34, 141, 166 Agent types 121, 123 Agents 95–107, 116–25, 132–35, 140–50, 161–64, 292–98, 300–305, 308–15, 317–22 – autonomous 104, 107 – back.strand 312, 315, 317 – box.strand 317 – char 133, 135, 146–47 – foliage.strand 314–15 – leaves.strand 312, 315, 319 – registered 141–42, 166 – scaled.strand 311–13 – separate 7–8, 285 – shades.strand 311, 314
DOI 10.1515/9781501505966-014
– shade.strand 311–12 – shadows.strand 310–11 – side.strand 315, 317 – stucco.strand 315 – sweater.strand 314, 319 – vgc 161–62 Agents model 70, 88, 101, 107, 117 Algorithms 6, 52, 55, 57, 124, 208–9, 211, 329 API (Application Programming Interface) 9– 11, 28, 39, 73, 85, 97, 125, 127, 324–25 API calls 123–24 Application domains 9, 186 Applications 1–2, 11–12, 18, 132, 275–76, 282, 304–6, 329, 332 – test 275, 277 AR (Attentional Region) 41–43, 51, 53, 56, 65, 158, 278 Associative memory 23, 30, 32, 71–74, 97, 101–2 Autolearning hull families 100, 110, 114, 138 Autolearning hull files 160, 162 Autolearning hull range 121, 138 Autolearning hull thresholds 138, 275, 277, 308 Autolearning hulls 7–8, 29, 108–16, 120, 122, 125, 139, 270, 272 – default 109, 112 Average score 233, 236 B Background 11, 22, 28, 33, 123, 125, 128, 136, 138 Background Survey 88–89, 91, 93, 95 Back.strand 317–18 Base attributes MCC metric functions 153– 56, 216–18 Base feature metrics 69, 86, 237 Base genome metrics 128, 137, 141, 285 Base metric structures 137, 270, 272 Base metrics 106, 127, 134, 137–38, 157, 159, 286–87, 289, 305 Base metrics files 137, 160, 166 Bases 2, 37, 85, 133–34, 142–44, 147, 178, 230–31, 274
Bengio 89, 91, 94, 327, 329–30 Best scores 119, 124, 146–47, 152, 167, 308 – returning 216–17 Bestmatch 139, 154, 211, 217–18, 309 Biomedical Imagery 331 Biomedical pattern recognition and image processing 331 Bird.strand 312, 315, 317, 319 Bit mask 264, 266, 274 Bit quantization 177, 184 Bits 15, 130, 138–40, 170, 172–73, 176, 178–79, 182, 229–30 Bitslots 151 Blue 151, 153, 155–56, 216–18, 234–35, 264–66, 273 Blue cones 40–41, 44–45, 48 Blur 108–11, 130, 157–60, 216–18, 230–32, 234–36, 266, 273, 305 Blur projections 155–56, 234, 265 Blur RGB 59 Blurred.strand 311, 317 Bounding box 58, 62–63, 160, 167, 226 Box.strand 311, 318–19 Brain 6, 21, 24, 26, 87–88, 91, 97, 334 Brain mapping 22, 26 Branch 150, 310, 312, 317–19 Branch.strand 310, 312, 315, 317 Bricks.strand 312, 315 Bundle models 80–81, 83 Bundles 4–6, 17–18, 51, 73, 76, 78–80, 84, 101–2, 122–23 C C, shape S, texture T, and glyphs G (CSTG) 18, 85 Callback 134, 142–44, 320 CAM (content-addressable memory) 4, 68, 71–74, 169, 172, 174–76, 178–79, 237, 239 CAM address 5, 73, 169–72 CAM clusters, volume projection metrics for 240–41, 243, 245, 267 CAM feature spaces 170, 172–73, 175, 177, 179, 181, 185, 187, 189 CAM features 5, 71, 169, 174–76, 180 CAM memory 73–74 CAM neurons 71, 169–70, 172–74, 176, 179, 194 Candidate genomes 225–27, 235–36
Candidate genomes Cn 224 Candidates 106, 156, 224–25, 232, 235, 324 – target strand ST 224 Category 34, 88, 102, 286, 304–5 Category reference objects 102, 286 CCIN (comprehensive color image normalization) 196 Cells 47–48, 53, 69, 174, 178, 244–45 Centroid 81–83, 138–40, 144, 151–52, 156, 202–3, 208–9, 226–29, 234–35 Char 133, 158–61 CLAHE 60 Classification 6–7, 15, 81, 83–84, 90, 96– 98, 106–7, 113, 122 Classifier family learning 114, 284 Classifiers 7–8, 36, 94, 96–97, 99, 105, 114, 117, 323 – metrics correspondence 100, 219 Clausi 250, 334 CLOSE 307–8, 317–19 CLOSE scores 317, 319 CLOSE test genome pairs 316 Code 59, 110–11, 113, 123–24, 144, 279, 284, 328, 332 Color 110–11, 138–40, 145–49, 151–54, 195–200, 202–11, 216–18, 264–66, 321 – 8-bit 176–77 – best 151, 153, 216–17 – common 176, 204–5, 209–10 – double 139 – opponent 197, 206 – popular 207–9, 211, 292, 324 – primal 75 – standard 166, 197, 211, 214–16, 218 Color base 74, 153–54 Color channels 181–82, 227, 230 Color component 119, 172, 205, 241–42, 257, 270–71, 324 Color component values 240, 242 Color correspondence 195 Color image segmentation 197, 328 Color information 40, 43, 50, 181, 205 Color labeling 51, 195, 207 Color labels 75, 196–97, 207–8, 214 – standard 196, 207, 214 Color leveling 204–5, 210–11, 215, 218 Color leveling alignment spaces 205, 325 Color leveling methods 204 Color leveling works 165, 204
Color levels 152, 257, 305 Color list, standard 214–16 Color maps 176, 210–11 – standard 211 Color metrics 77, 124, 129–30, 195, 199, 286, 289, 304–5, 309 Color metrics function signatures 216 Color metrics functions 216–17 Color palettes 207, 218 Color perception 45, 196–97 Color popularity 199, 207–8 Color popularity algorithms 55, 207 Color range 106, 204 Color score 308 Color segmentation 51, 196 Color SIFT 270, 273 Color space components 270, 282, 287, 305 Color spaces 108–10, 114, 121, 130, 195–99, 207–9, 218, 286, 289 – leveled 211, 257 – multiple 3, 61, 78, 324 – separate 290–91 Color visualizations 289 COLORS 111, 151, 153, 164, 198, 211, 216– 18, 234–35, 264–66 Command shell 160, 164 Commands 61, 157, 163, 280–81 Comparison 110, 112, 120–21, 138–40, 142– 43, 277–78, 294, 296, 298 – brush 294, 296, 298, 301, 303 Comparison metrics 64, 86, 137–39 Component 110, 138–39, 146, 156, 204–5, 241, 243–44, 267, 273–74 Component values 243–44 Comprehensive color image normalization (CCIN) 196 Compute SDMX metrics 264–66 Computer vision 2, 4, 19, 22, 89, 91–92, 195, 327–29, 332–34 Computer Vision and Pattern Recognition. See CVPR Computer Vision Metrics 4, 327, 334 Computer vision models 8, 30, 51 Concepts 2, 22–24, 27, 65, 78, 94–95, 98, 105, 108 – higher-level 4–5, 24, 69, 76, 79, 98 Cones 28, 39–48, 69, 75, 119, 172, 197, 207 – green 40–41, 44–45
Connectivity 25–26, 48, 61 Constant 151, 154, 198, 204, 215–17, 234, 264–66 Containment 226, 254, 256, 287, 294, 296, 299, 301, 303 Content-addressable memory. See CAM Context 88–89, 93–94, 105, 123, 125 Contrast 59–60, 139, 145–46, 154, 203–4, 217–18, 293, 295, 297 Contrast variations 199–200 Control flags 225–27, 235–36 Controller 127, 134, 141–44, 157, 159, 320– 21, 325 Coordinate system – local 221–22, 224, 231 – strand-relative local 81–82 Coordinates, normalized 226 CORR 294, 296, 298, 301, 303 Correlation 139–40, 151–52, 155, 180–81, 247–48, 250–51, 253, 265, 308–9 Correspondence controller 122–23, 133, 141–43 Correspondence metrics 113, 233 Correspondence scores 116, 257, 261, 283, 285, 295, 297, 299, 302 – final 247, 305 Count 81, 138–40, 179, 215, 225, 229, 232– 33, 236, 321–22 Coverage 254, 267, 287, 294, 296, 299, 301, 303 Criteria 147–48, 225–27, 233, 235–36, 264–67, 273–74, 321 CSTG (C, shape S, texture T, and glyphs G) 18, 85 CSV agent calls 283–84 CSV agent parameterizations 119–20 CSV agents 124, 144–45, 147–48, 166, 276–77, 283–86, 289, 292, 305–6 – predefined 144, 166 CSV Metric Functions 234–35 CSV Texture Functions 247, 267–68 CSVs (correspondence signature vectors) 97, 116, 119, 123–25, 127, 144–48, 277, 284, 324 Custom agents 122, 127, 133–34, 166, 279, 286, 320 – registered 123 CVPR (Computer Vision and Pattern Recognition) 13, 328–29, 332, 334–35
D Dahlem Workshop on Biomedical Pattern Recognition and Image Processing 331 Dark 121, 311–12, 314–15, 317–19 Data structures 81, 119, 127, 145–46, 331 Database 9, 86, 92, 131, 135–36, 158 Default agents 124–25 Default CSV agents 97, 121, 123–24, 133, 144–45, 148, 166, 279, 285 Deg 172, 294, 296, 299, 301, 303, 308 Degrees 40, 151, 153, 179, 182, 191–93, 227–28, 240, 288 Delta 110, 139–40, 151–56, 216–18, 226, 229, 234, 264–67, 273–74 – volume centroids 228 Delta5, 156, 234–35 Delta8, 111, 151–52, 156, 234–35 Density 230, 254, 267, 287, 294, 296–97, 299, 301, 303–4 Dependent metrics 99–100, 123, 144, 230, 305–6, 324 Depth processing, stereo 28, 46 Descriptors 80–83, 148, 195, 197, 225, 231–32, 235, 270 Digital cameras 39–40, 43–44, 46 DIR 134–35, 158, 280–81, 308 Directory 130–31, 134–35, 158, 160–67 Disp 120, 293, 295, 298, 300, 302 Displacement 138–39, 156, 181, 230, 234, 309 Distance 28, 41–42, 82, 91–92, 128–29, 149, 264, 330 Distance functions 29, 127–29, 175–76, 179–80, 211, 244–45, 247, 305–6, 308 Distance metrics 91, 137, 180, 182, 209, 247 Distillation 16, 327 Dithering, saccadic 28, 44, 52, 56, 65, 278–79 DLLs 122, 125, 127, 132–34, 142–44, 320 DNA 20, 25, 75, 78, 85, 91, 100, 107 – human 2, 19–20, 37, 78–79, 85 DNN model scores 14, 148 DNN model weights 12–13, 69 DNN models 11, 13–16, 27, 30, 32–33, 69, 94, 106, 272 – trained 12, 15–16, 272 DNN spoofing 11, 13, 15, 94 DNN training 54, 95, 105, 277
DNN/CNN One-Shot Learning 34–37 DNNs 2, 11–16, 30–35, 69, 90–95, 107–8, 148, 272–74, 276–77 DNNs learning 94 Double popularity 139, 211 Double weight 147–48, 151, 153, 216–18, 234–35, 264–67, 273–74, 321 E ECCV 327, 329, 332–33 Emulate 52, 54, 56, 65, 68, 94–95, 121, 125, 282 Enhance Local Contrast 60 Ensemble 7, 12, 14–15, 94, 96, 120, 124, 283, 286 Error learning 107–8, 123 European conference on Computer Vision 327, 332 Experience 25–26, 55, 57, 68–69, 91, 199 Eye 28, 39–41, 43–44, 46, 48–53, 65, 70, 199–200, 218 Eye model 28, 39, 88, 199 Eye Model Color Ranging 199, 201, 203, 205, 207, 209, 211, 215 Eye/LGN model 39–40, 42, 44, 46, 48, 58, 95, 108–9, 129–30 Eye/LGN Visual Genome Sequencing Phases 52–53, 55, 57, 59, 61, 63, 159 F F-18 ceiling region 191–94 False coloring, blue 280–81 Feature comparison metrics 7, 98 Feature comparison scores 273–74 Feature descriptor value 231, 233 Feature descriptors 5, 33, 42, 70, 79–80, 92, 231, 233, 269 Feature learning 25, 69, 90, 92–93, 99 Feature memory 72–73, 76 – unlimited 5, 24 Feature metrics 4, 6, 8, 63–64, 73, 77–78, 91, 97–99, 127–29 – best 119, 123 – combination of 76–77 – independent 8, 85 – selected 119, 124–25 Feature metrics functions 86, 129 Feature metrics generation 63, 137
Feature weights 12, 34–36, 69, 95, 99 Field 20, 42, 46, 48, 51, 55, 127, 195, 239 Filedir 61, 160 Filename 137–39, 161–67 Files 58–59, 61–64, 83–84, 86, 130–32, 134–38, 142, 159–64, 166 Flag 64, 143, 159, 203, 225, 235 Floats 30, 178, 203, 211 Focal planes 39–40, 46 Focus 23, 28, 39–40, 42, 44–45, 50–51, 53–54, 283, 292 Foliage 310–12, 315, 318–19 Foliage.strand 311–12, 315, 318–19 Forest.strand 312, 317 Function signature 152–53, 158–60, 232 Functions 110–11, 133–34, 141–42, 148, 150, 152, 198, 225–26, 272–74 G GANs (Generative adversarial networks) 13, 31, 33, 94, 328 Genome centroids 226 Genome comparison structure 160 Genome comparisons 110, 143, 276–77, 285, 292–93, 299, 305, 308–9 – reference/target 7, 286 Genome count 280–81 Genome Gn 222, 224 Genome IDs 10–11, 63, 118, 131–32, 232, 236 Genome image splitter 157, 159 Genome images 230–31, 240, 245 Genome matches 144, 309 Genome metrics 108, 159, 176 Genome orientations 182 Genome pairs 166, 275, 308, 310, 314, 317 Genome pixel region 84, 86 Genome regions 52, 63–64, 70–71, 119–20, 123–24, 128–29, 158–60, 163–64, 208–9 Genome segmentations 277–79, 281, 283, 285, 287, 289, 291, 293, 295 Genome selection 275–76 Genome sequencing 85, 134, 162 – initial 20, 96 – visual 52, 84–85 Genome space 10, 176 GENOMECOMPARE 138, 142, 160–62
Genomes 80–86, 134–43, 147–53, 155, 180–81, 247–48, 264–67, 272–77, 303–9 – brush 292, 297 – char 138–39 – common 11 – first 80, 82 – intermediate 221 – last 80, 82 – new 6–7, 85, 106, 118, 131 – primary 82, 224 – recurring 118–19 – selected 163–64, 166, 282 – single 143, 231 – stucco 292, 294–95 – terminal 221–22 – terminating 82–83 – unique 85, 118, 131 Genomes matching 11, 104 Genomes spanning 219–20 Global histeq 58, 130, 159 Glyph bases 156, 231, 272–73 Glyph features 149, 219, 231, 269–70 Glyphs 13–14, 31, 33, 74–76, 148, 231–32, 236, 269–70, 272–74 GMCs (Group Metric Classifiers) 100, 144, 146–47, 152, 167, 219 Gradients 12, 14, 33–34, 269 Graphs, texture metric 257, 261 Green 151, 153, 155–56, 216–18, 234–35, 264–66, 273 Group metric classifiers 146–47, 219 Groups 3, 5, 91–92, 146–48, 247, 250, 267, 269, 307 – agent security 106 – test 307 GSURF 156, 231–32, 237, 273–74 GSURF COLOR 232, 236 H Hamming 139, 154, 211, 218, 309 Haralick 138–39, 148, 155, 248–50, 257, 264–67, 297, 299, 331–32 Haralick deltas 264–66 Haralick Features 148, 158, 248–51, 265 Haralick metrics 59, 155, 239, 248–50, 254, 264, 266–67, 305 Haralick texture 294, 296, 298, 301, 303 Haralick texture metrics 239, 248, 267
HDR (high dynamic range) 40, 44, 46 Head 287–88, 292–94, 296, 298, 301, 303, 306, 311, 313 Head genomes, right 292, 301–2 Hellinger 139, 153–55, 202–3, 216–17, 293, 295, 297, 300, 302 Hellinger metrics 203, 208 Heuristics 6, 8, 37, 92, 98, 105, 110, 264, 267 Hidden Markov Models 92, 329 Hierarchy 4–5, 24, 33, 77, 90, 96, 239 High dynamic range. See HDR Highlight 107, 163–65, 195 High-texture genome region 187–88 Histeq 108–11, 137, 157–60, 216–18, 230–31, 234–36, 266, 273, 305 Histogram metrics 216–17 Histogram8bit 138–39, 153–54, 216 Histograms 60, 138, 197, 199–203, 216, 248, 269, 291 – cumulative 201 – normal 201 – standard color 215–16, 289, 291 HLS 198–99, 282, 285, 287, 305 HMAX 24, 26, 69–70 HPS (hypothesis parameter set) 55 HS 199 HSL 151, 153, 198–99, 204, 216–18, 234–35, 264–66, 273, 297 HSL saturation leveling 206–7 HSV 197–99 Hue 146, 153, 156, 196–97, 199, 231–32, 236–37, 270, 273–74 Hull 8, 97, 109–10, 112–14, 117, 160, 324 – sharp 113, 117 Hull learning 97, 108, 114, 116, 284 Human Connectome Project 9, 21, 334 Human genome 19–20, 84, 107 Human Genome Project 1, 6, 9, 11, 19, 37, 52, 84 Human visual system 1–2, 28, 41–42, 46, 48, 55–56, 129, 197, 282 Hypothesis 23, 27, 43–44, 53, 55, 73, 77, 97, 122 Hypothesis parameter set (HPS) 55
I ID 81, 131, 133–37, 141–44, 180–81, 232, 247–48, 273–74, 320–21 IDM (Inverse Difference Moment) 56–57, 59, 158, 252, 294, 296, 298, 301, 303 Illumination 34, 36, 197, 204–5 Image courtesy 20–21 Image files 134, 225, 235 Image matting 54–55 Image metrics 110–11, 282 Image pre-processing 28, 42, 60, 65, 88, 122, 277 Image pre-processing parameters 65, 158, 220 Image processing 44, 195, 207, 218, 239, 328, 331–32 Image pyramid 172, 177 Image regions 130, 169 Image registry 127, 131, 134 Image resolution 56–57, 159, 277, 324 Image segmentation 55, 281 Image size 34–35, 42, 54 Image spaces 106, 114, 121, 160, 230, 257, 324 – five input 185 IMAGE SUMMARY 264–65 Image texture 56, 158 Imagefilesdir 163–67 IMAGE.genome 64, 137 ImageJ 328 ImageJ Fiji 55, 59 ImageJ script code 59–60 ImageJ/Fiji 207 Imagepath 158 IMAGE.png TEST 308 Images – all-in-focus 44, 46 – best 151, 153, 216–17 – blur 130, 159 – blurred 48, 58 – current 73, 163–65 – natural 12, 15, 186 – normalized 196 – original 34–35, 201, 205–6 – retinex 152, 287 – segmented 54, 62 – sequenced 163–65 – single 32, 232, 277, 282 – small 35
– source 134, 136, 254 – target 111, 123, 142–43, 224 – top 79, 190, 194 INDEXES 138–40, 178, 198 Inference 12, 14, 34, 36, 88, 93, 96, 101, 105–6 Input images 89, 93, 131, 135–36, 158, 181, 185, 209–11, 215 Input spaces 170, 172, 174, 194 Inputs 23–24, 91, 94, 169–70, 172, 174, 178, 235, 240 Intelligence, artificial 1, 8–9, 19, 88–90, 334 Invariance 12, 22, 81, 114, 121, 127, 129–30, 245, 271–72 Invariance attributes 12, 36, 108, 129–30 Invariant 22, 24, 26, 129, 145–46, 196–97, 231, 321–22 – image colors 196, 204 IsValidNumber 112–13 Iteration 35, 95–96, 203 J Jensen Shannon 139, 152–54, 203, 217, 293, 304 Jslic 59–61, 64, 136–37, 158–59, 280, 282 JSLIC superpixels 61 K Kernels 2, 92, 172, 242, 328, 330 K-MEANS 208–9 L LAB 151, 154, 165, 198–99, 204, 215–17, 234, 264–66 Labels 6, 94–95, 98, 101, 103, 105–6, 196 Latent Dirichlet allocation (LDA) 92 Lateral geniculate nucleus. See LGN LBP 119–20, 145–46, 173, 182, 293, 295, 298, 300, 302 LBP volumes 240, 242–43 LDA (Latent Dirichlet allocation) 92 Leaf.strand 312, 317–18 Learning 6–9, 25–26, 88–89, 91, 97–105, 107–8, 137, 160–62, 327–30 – associative 31–32, 101–2 – classifier 7, 99, 108, 116, 275 – continuous 6, 31, 37, 96–97, 101
– deep 1–2, 4, 8, 22, 32–33, 327, 329, 334– 35 – human 23, 87–88, 91, 101, 122 – refinement 90–91, 307 – unsupervised 13, 90, 93 Learning agents 1, 9, 33, 37, 84, 97, 104–5, 283 – intelligent 95–96 – master 124–25 – visual 3, 8 Learning and reasoning agents 87–88, 90, 92, 94, 96, 98, 100, 102, 104 Learning color names 197, 328 Learning models 32, 35, 88–91, 107 – machine 91–93, 332 Leaves.strand 311–12, 314, 318–19 Left head 287–88, 292, 294, 296, 298, 301–3 Length 81, 225–26, 232, 235, 294, 296, 299, 301, 303 – unit 82, 221–22 Leveled8bit 110, 138–39 Leveling 163, 165, 195, 204–5 Levels 51, 110, 138–39, 151–52, 175–76, 198, 216–17, 250, 264–66 – higher 24, 70, 88, 90 LGN (lateral geniculate nucleus) 24–25, 27– 28, 39, 43–49, 52–54, 69–70, 157–58, 195, 218 LGN Image Enhancements 24, 46, 52–53 LGN images 170 LGN Model 4, 43, 45, 47, 49, 51, 53, 272, 282 LGN Model Dominant Colors 197, 207 Lighting 139, 145–46, 152–54, 200, 217, 293, 295, 297, 300 Lightness 39–40, 110, 199, 203, 205–6 Linearity 254, 257, 267, 287, 294, 296, 299, 301, 303–4 Lines, command 158–60, 162 List 80–83, 135–36, 141–42, 147, 163–64, 166, 207–10, 232, 273 Local contrast 48, 60, 185 Local memory 3, 67, 73–74 Locus 254, 267, 287, 292, 294, 296–97, 299, 301, 303–4 Lo-res 287–88, 292–94, 296, 298, 301, 306
Luma 145–46, 151, 155–56, 199, 216–18, 230–32, 234–37, 264–66, 273 Luminance 47–48, 122, 135, 179 M Machine learning 88–93, 95, 327, 329–30 Magnitude 110, 174, 176, 224, 231–32, 271 Magno 43–45, 47–48, 56, 76, 135, 158–59, 173, 179, 281 Magno cells 42, 44–47, 52–53 – larger 44, 47–48 Magno features 5, 47–48, 53, 65, 119, 278–79 Magno image preparation 53, 59 Magno segmentations 42–43, 51, 57, 278 Magno strand 279, 281 Maps 21, 72, 106, 172, 215, 226 Mask 48, 54, 60–61, 136–39, 160, 163, 167, 178–79, 284 Mask files 58, 63, 134, 137, 160 Master, metricsfile 161 Master learning controller 123–24, 144, 167, 279, 284, 306 Master volume 186 Mastergenome 64, 137, 161 MasterLearning 144 Match 112–13, 145–49, 225–27, 235–36, 273–76, 283–84, 293–303, 309–19, 321–22 – best 143, 176, 202, 208, 224, 233 – close 275, 307 – double 112, 321 – relaxed 147–48 – stricter 147–48 MATCH agent 293, 295, 297, 302 Match criteria 120, 122, 124, 145, 289 Match criteria parameters 147, 283 Match ground truth 284, 306 MATCH scores 233, 310, 313 MATCH test genome pairs 309–10 MATCH.txt 308 Matrix 10, 31, 132, 170, 173, 249, 321–22 MAX 110, 113, 124, 138–39, 151, 216–18, 234–35, 264–66, 273–74 Max8, 138–39, 153–54, 216 MC (motor cortex) 27, 31, 88, 97 MCC classifiers 105, 117–18, 151, 278, 283 MCC Function Names 151
MCC functions 108, 110–12, 117, 143, 149– 50, 152–53, 224–25, 306–7, 324 MCC functions for glyph bases 272–73 MCC texture functions 264 MCCs (Metric combination classifiers) 111, 117–18, 127, 137, 139, 143–44, 150–53, 155, 167 Mean 254, 267, 287, 292, 294, 296–97, 299, 301, 303–4 MEDIAN 208–11, 292, 295, 297, 299, 302, 306, 313, 316 Memory 6–7, 23–25, 30–32, 68–70, 74, 76, 96, 100–102, 104–6 – global 72–73, 78 – short-term 23 Memory impressions 23, 25, 29, 70, 101 Memory model 3, 30, 67, 101–2, 118, 141 Memory model and visual cortex 67–68, 70, 72, 74, 76, 78, 80, 82, 84 Methods – morphological 55, 61–62 – superpixel 55, 61–62 Metric Combination Classifiers 127, 139, 143, 150–51, 153, 155, 167 Metric comparisons 97, 108, 123, 245, 261, 275, 278, 284, 289 – first order 277, 304 Metric function 29, 128, 180, 198 Metric learning 8, 114, 332 – qualifier 99, 284 Metric scores 203, 286 – first order 304–5 Metric selection 275, 284, 307, 309 Metric spaces 3, 5, 9, 86, 108–9, 114, 117, 267, 269 – supported 97, 108 Metric1, 112–13 Metric2, 112–13 &metrics 143 Metrics – best 123, 144, 146, 279, 284, 286, 305 – best scoring 119, 284 – glyph 272, 274, 286 – largest 230 – qualifier 99, 114, 144, 306, 322, 324 – sliding 202–3, 208 – statistical 227, 250 – test 285, 292, 294, 297, 299, 301 – total 180, 305
– trusted 99, 144, 306, 324 – trusted metrics and dependent 99 – uniform set of 286, 292, 304–5, 307 – volumetric projection 26, 169, 176 Metrics files 58, 63–64, 134–35, 137, 158, 160, 162 Metrics functions 68, 78, 96, 219 Metrics tuning 276–77, 307, 309, 313, 316 – qualifier 99, 264, 275, 305 MetricsRef 134 MetricsTarget 134 Mimics 55–56, 94, 107 MLC (Master Learning Controller) 99–100, 123–24, 144, 167, 279, 284, 306 Model 1–3, 9–10, 12–13, 31–34, 42–44, 53–54, 67–70, 89–95, 105–7 – cat 93 – view-based 5, 22 Morpho 59–61, 136, 158–59, 281–82 Morpho segmentation 136, 278–79 Mulch.strand 317, 319 Multivariate 2, 33, 75, 77, 90, 125, 197 N Nature Neuroscience 329, 333 Neural clusters 3–4, 71, 174–76, 186, 194, 227, 229–30, 237, 239 Neural information processing systems 327–29, 335 Neural networks 89, 327, 330, 335 Neurogenesis 25–26 NeuroImage 328–29 Neurons 24–26, 28, 33, 169, 175, 229, 239–40, 245, 333–34 Neuroscience 21, 25–26, 68, 72–73, 79, 89, 93, 328–29, 333–34 NOMATCH 292, 295, 297, 307, 314–15, 319 NOMATCH scores 314, 316 NOMATCH test genome pairs 313–14 Nonmatches 261, 276, 306–7, 313 NORMAL 119–20, 145–46, 233, 236, 264–67, 273–74, 293–98, 300–304, 308–9 Normalize 59, 82, 196, 221, 245 NULL 134, 143, 320–21 Number 6–7, 75, 118–19, 134, 158, 175, 178, 209, 229
O Object recognition 89, 327–28, 332–33, 335 Objects, higher-level 1–2, 17, 84, 92–93, 236 Occlusion invariance 81–82, 84, 130, 150 Optimizations, color palette 197 ORB 90, 129, 146, 149, 156, 172, 231–32, 236–37, 269–74 Orientations 138–39, 170, 172–73, 178–79, 191–93, 231–32, 239, 241–42, 249–50 – genome ABCD 240 Outliers 174, 233, 245, 253 OVERRIDES 124, 145–46, 321–22 OVERRIDES agent 147–48, 267 P Parameter options 151–52, 162, 247, 324 Parameter settings 55, 123–24, 166 Parameterizations 124, 323–24 Parameters 117, 146–49, 151–53, 158, 164, 166–67, 216–18, 234–35, 265–67 Parvo 44–45, 47, 51–52, 61, 76–77, 135, 137, 158–59, 280 Parvo cells 28, 44–45, 47–48, 52–53 Parvo features 4, 47–48, 53, 65, 170, 278– 79 Parvo images 42, 57, 135 Parvo resolution 51–52, 158 Parvo scale images 42–43 Parvo segmentations 43, 51, 56, 278 Parvo strand 279–81 Parvo strands in Figure 280–81 Patch.strand 311, 313 Pathname 133, 135–36, 159 Pattern recognition 327–29, 331–34 Permutations 20, 111, 121, 257, 324 Persephone 124, 133, 145 PFC 23, 27, 31, 87–88, 96–98, 101–3, 125 Phases 11, 52, 140 Photographic memory 70, 72 Pieces.strand 312–13 PIT 27, 68, 70, 73 Pixel component values 243–44 Pixel kernel krgb 243–44 Pixel region 42, 78, 84–85, 171, 248 Pixel values, adjacent 186, 253 Pixels 32, 41–44, 56–58, 70, 85–86, 136, 138–39, 171–72, 269
Plausible range 108–10 Png file 59, 134 Pointer 85, 163–64, 167, 334 Popularity 138–39, 151–52, 154, 208–11, 217–18, 289, 292–93, 302, 309 Popularity algorithms 30, 197, 207–10 Popularity color map 211, 216 Popularity color spaces 198, 208 Popularity colors 166, 208–9, 215, 289, 292 Popularity maps 166, 209, 211, 216 Popularity method 208–9 PREJUDICE 100, 119–20, 124, 145–46, 264–67, 273–74, 293–98, 300, 302–3 Proceedings 327–35 Processing centers 30, 56, 67–68, 72–73, 100 Proxy agents 6, 23, 51, 93, 98, 101–2, 105, 143 – conscious 97–98 Pyramid, bit scale 64 Q Quantization 173, 176, 179, 181–82, 230, 334 Quantization level 6, 186, 241–42, 244 Quantization space pyramids 137, 176 Quantization spaces 24–25, 170, 172, 175, 178–79, 181–82, 227, 229, 242–43 Quantized volume projection metric renderings 186–87 R Radius 59–61 Range 61–63, 89, 96, 119–20, 122, 191–93, 195–200, 248–50, 269–70 Ratio 112–13, 117, 230, 250 Ratio metrics 227, 230 Raw 108–12, 145–46, 157–61, 215–18, 230–32, 234–36, 264–66, 287–89, 304–6 RAW AGENT 304 Raw image 55, 58–59, 110, 121, 160 Raw image baseline 109–10 Reasoning 4, 6, 9, 22, 88–89, 96–97, 101–3 Reasoning Agents 87–88, 90, 92, 94, 96–98, 100, 102, 104, 106 Reasoning model 3, 89, 91, 97, 125 Reasoning Model Overview 96–97, 99, 101, 103, 105, 107, 109, 111, 113
Red 151, 153, 155–56, 198, 216–18, 234–35, 264–66, 273 Reddish 312, 314–15, 319 Reference 109–12, 143, 203–4, 224–29, 235–36, 245, 273–74, 284, 320–21 Reference genome 116–17, 123, 142–43, 211, 228, 275, 282, 284–86, 305 Reference genome metrics 142, 284 Reference histogram 200, 202 Reference hull 117 Reference image 111, 130, 135, 245 Reference strand 149, 224–25, 233, 235– 36, 280 Reference strand SR 224 Region glyph metrics 269–70, 272, 274 Region metrics 195–96, 198, 200, 202, 204, 206, 208, 210, 214 Regions, ceiling 187, 194 Regions.strand 318 Registry 127, 130, 133, 135–36, 140–44, 320–21, 325 Reinforcement learning 90, 93–94, 116, 122–24, 275–76, 304–5, 313, 316, 322 Reinforcement learning process 119, 123, 284–85 RELAXED 113, 117, 119–20, 145–46, 264–67, 273–74, 293–98, 300, 302–3 Renderings 13, 82, 166, 175–76, 187, 190, 194, 206, 228 Research 4, 7–8, 12, 20–25, 54–55, 70, 91, 195, 197 – collaborative 1–3, 19 Resolution 48, 65, 70, 72, 138–40, 159, 176–78, 248, 250 – 8-bit 118–19, 177, 198, 228–29, 270 – lower 47–48, 53, 119 Resolution images 52, 276 RESULT match 147–48, 156, 225–27, 233, 235–36, 267, 274 Retina 23, 39–43, 45, 47–48, 51, 75, 103 Retinex 108–13, 117, 151–53, 157–61, 215– 18, 230–32, 234–36, 264–66, 305 Retraining 16, 34, 37, 91 RETRY 145–46 Return 74, 112–13, 148, 151–53, 233, 264, 266 Return STATUS 320–22 Return value 147–48, 225–27, 235–36, 267, 274
RGB (red, green, blue) 119–20, 135, 145–46, 196–99, 216–18, 230–31, 234–35, 264– 66, 288 RGB color 33, 48, 172, 177, 185, 198, 209, 279 RGB color component 172, 209, 242 RGB genome image 146 RGB Textures 240, 242–43 RGBI 48, 77, 108, 119, 121, 156, 199, 204, 273–74 RGBI color spaces 153, 156, 243 RGBI colors 39, 110 RGBVOL 304 RGBVOLRAW 294, 296, 298, 301, 303 Rods 28, 40–42, 44–45, 47–48, 53, 69, 75 Rods and cones 39, 41–43, 172, 207 Root path 163–67 Rootpath 163–67 Rotation 26, 28, 129, 145–46, 149, 231, 271, 275, 277 S SAD 139, 153–55, 202–3, 216–18, 293, 295, 297, 300, 302 SAD and Hellinger metrics 203, 208 Saturation 156, 198–99, 206–7, 209–11, 215–17, 231–32, 264–66, 270, 273–74 Scale 24, 26, 35–36, 53, 77, 111–12, 129– 30, 231, 277 Scale invariance 83, 129, 172 Scaled.strand 310–12 Scanning 51–54, 70 SCORE 111–13, 116–17, 147–53, 156, 225– 27, 232–36, 264–67, 273–74, 306–19 – double 147, 152 – final 124, 143, 148, 150–51, 153, 225, 267, 305, 308 – lowest 151–53, 216–17, 234 Scoring 112–13, 151, 153, 216–18, 234–35, 264–66, 273–76, 304–6, 309 Scoring criteria 148, 306, 322 Scoring parameters 113, 117 Scoring results 275, 282, 292, 305, 308–9 SDMs (spatial dependency matrices) 59, 239, 248–49, 251, 253, 266, 268, 286 SDMs, degree 253–54 SDMX 138–39, 148, 155, 158, 239, 264–67, 297, 299, 304 SDMX deltas 265–66
SDMX Metric Comparison Graphs 257, 259, 261, 263, 265, 267 SDMX metrics 155, 239, 253–54, 264, 266– 68, 286, 305 SDMX Texture 285, 294, 296, 299, 301, 303 Search 1, 84–85, 142–43, 163–64, 321, 325 Sections.strand 312 Segmentation files 134, 136, 159, 167 Segmentation masks 58, 167 Segmentation parameters 55–56, 59, 65, 220 Segmentations 42–44, 51–52, 54–61, 63, 85, 136–37, 158–59, 218–20, 278–79 – global 157–59 – multiple 54, 56, 61, 136, 220, 282 – multiple-image 219, 282 – single-image 219, 282 – tried 55 Segmented regions 42–43, 54–55, 58, 62– 63, 74, 79, 85, 181, 184 Segmenter 56, 61–62, 158 Segmenting 42, 52, 54–55 Segments 43, 52, 55–57, 61–62, 85, 158– 59, 163, 165, 204 Selected metrics 86, 123, 276, 284, 292, 294, 297, 299, 301 Selected uniform baseline test metrics 275, 286 Sequence 4, 6, 9–11, 83, 85, 117–18, 123, 134, 320 – human genome 20 Sequencer 133–34, 137, 141–42, 321 Sequencer controller 122–23, 133–34, 141– 42, 167 Sequoia 311–12, 317–19 Sequoias genome 188 Shaded.strand 310, 318 Shades 204, 311, 315, 317–18 Shade.strand 310 Shape metrics 219–20, 222, 224, 226, 228, 230, 232, 234, 236 – genome structure 219, 231 SHARP 110–11, 151, 153, 161, 215–18, 234– 36, 264–66, 273, 305 Sharpen 56, 135, 158, 324 SHELL DISPLAY 163, 165–66 Shrub.strand 317 Side.strand 311, 314, 317
SIFT (scale-invariant feature transform) 2, 90, 93, 146, 149, 156, 172, 231, 269–74 Signatures 127, 146–49, 309, 321–22 Sky.strand 308, 310, 312, 314–15, 317–19 Sl 289, 293, 295, 297, 300, 302, 304 SLOTS 110, 138–40, 151, 203 Space – alignment 121, 325 – color-leveled 197, 289 – local feature tensor 231–32 – strand-relative coordinate 82, 130 Spatial relationships 12, 14, 17, 33–34, 36, 46, 48, 77–80 Split blur RGB channels 59 Spoofing 12–13, 15–16, 94, 307 Squirrel 275–76, 278, 280–81, 288, 294–95, 311–13, 315, 317–19, 321–22 – front 287, 292, 294–96, 298, 301, 303, 306 Squirrel.png-0, 135, 137 Squirrel.strand 311, 315, 319 Standardcolors 309 STATUS 133, 135–36, 141–44, 156, 158, 203, 273, 321 STATUS agent 134, 320–21 Stdcolors 163, 289, 293, 295, 297, 300, 302 Strand AGENT 311 Strand correspondence 149, 225 Strand Editing 149 Strand file name 164–66 Strand files 135, 166, 280–81, 308 Strand genome regions 165 Strand genomes 149, 225 Strand metrics 82, 149, 225, 282 Strand model 80–81, 225 Strand objects 219, 277 Strand registry 127, 135 Strand shape metrics 226 Strand structures 81, 225 Strand Topological Shape Metrics 219, 221, 225, 227, 229, 231, 233, 235, 237 Strandfilename 163–65 Strands 80–84, 134–36, 163–66, 219–22, 224–27, 231–33, 235–37, 279–84, 317–22 – candidate 225 – char 81, 136, 232 – genome structure 231, 233 – multi-image 220, 225–26, 232, 282
– multiple-image 219–20 – new 101–2 – reference strand 225–27, 235–36 – single-image 219–20, 225–26, 282 – target 222, 225, 233, 236, 284 Strands and bundles 17–18, 69, 78–80, 97, 102, 122–23 Strands and genome segmentations 277– 79, 281, 283, 285, 287, 289, 291, 293, 295 Strength 147–49, 225–27, 235–36, 267, 274, 299, 303–4, 309, 321–22 STRICT 119–20, 145–46, 233, 236, 264–67, 273–74, 293–98, 300, 302–3 Struct 110, 137–39, 143, 264–66 Struct color 138–39 Struct metrics 142 Struct strand 225–26 – typedef 81, 136 Structure 86, 90, 134, 137–38, 141–42, 156–57, 230, 232–33, 236 Structure strand 232, 236 Structured classifiers 7, 36–37, 99, 105, 113, 117–18, 275, 283, 322 Stucco 287–88, 304, 306, 311, 315, 317–18 Stucco.strand 311–14, 318–19 Support vector machine. See SVM SURF 81, 146, 149, 269, 271–72, 333 SVM (support vector machine) 7, 88, 90–92, 96, 99, 175, 330 Synthetic learning 87, 89, 91, 125 Synthetic learning and reasoning model overview 96–97, 99, 101, 103, 105, 107, 109, 111, 113 Synthetic model 1–3, 46–47, 54, 68, 70–71, 73–74, 88–89, 96–98, 101–4 Synthetic vision 1–2, 4, 6, 8–10, 12, 14, 16, 18, 20 Synthetic vision model 1–3, 11, 27, 32, 97, 167, 323 Synthetic vision pathway architecture 27, 29, 31, 33, 35 T Table 32–37, 74–75, 119–21, 153–56, 160– 67, 180–82, 286–89, 292–306, 309–19 Target DNN model 94 Target genome centroids 228
Target genomes 111, 134, 143, 203, 208, 245, 257, 284, 320 Target histogram 200, 202 Target strand of candidate genomes 225– 27, 235–36 Target strand ST 224 Target volumes 245 Terminating genome centroids 81–82 TEST 13, 102–3, 163, 165–66, 275, 280–81, 283, 286, 305–8 Test application outline 276–77 Test genome pairs 285–86, 292 Test genomes 276–77, 285–91, 304–5, 322 – selected 123, 276, 292 Test genomes and correspondence results 275, 277, 285 Test images 12, 15, 36, 69, 88, 94, 275–77, 279, 308 Test sets 88, 197, 306–7 Test strands 166, 307 Textdata 61, 137 Texture 74–78, 138–39, 144–48, 239, 242– 44, 267, 307–9, 321–22, 331 – component 240 – primal 75 Texture analysis 59, 331–32 Texture base 26, 74, 155, 265–66 Texture dissimilarity 257, 261 Texture features 48, 332–33 Texture file 64 Texture metrics 129–30, 239–40, 242, 244, 248, 250, 256–58, 264, 266–68 Texture spaces 97, 108–9, 114, 286 Texture threshold bands 57, 59 Threshold 12, 113, 138, 159, 225, 233, 236, 299 Tolerance 59–61, 145–46, 225, 235 – influence match 225–27, 235–36 Top row surface renderings 189, 191–93 Training data 12, 31–33, 36, 92–93, 99 Training image pre-processing 35, 95 Training images 14–15, 33–35, 58, 88, 93, 103, 269, 276–77 Training iterations 95, 277 Training parameters 35, 95 Training protocols 7–8, 88–90, 95–96, 284 Training set 2, 8, 12–13, 25, 30, 32, 69, 94– 95, 306
Transfer learning 16, 25, 32, 37, 69–70, 85, 90–91 Trial and error learning 107–8, 123 Trunk.strand 312, 317–18 Tune 94, 99–100, 119, 147, 275, 284, 306 Tuning 95, 116, 125, 286, 292, 294, 297, 299, 304–8 U Uniform metrics comparison scores 297, 300 Unique genome IDs 10, 118 Unit tests 275–76, 307–8, 322 V Value 109–10, 112–13, 119, 122, 172, 203–4, 208, 240–41, 250 VDNA (Visual DNA) 1–4, 6, 8–14, 16–18, 28–34, 36–37, 73–79, 84–85, 121–22 VDNA and visual genomes 95, 101 VDNA application stories 11, 13, 15, 17, 34 VDNA bases 17–18, 78, 85 VDNA catalog 9–11, 14, 18 VDNA feature metrics 6, 33, 75–76, 78 VDNA metrics 8, 11, 33, 58, 84, 86, 119, 122 VDNA model 11, 13–14, 31, 68, 78–79 VDNA sequencing 84–86, 118, 127 VDNA strands 13, 17, 69, 73, 79–80 Vector length 83, 221, 226 Vector unit angles 83 Vectors 8, 36, 81–82, 85, 146, 149, 221, 225, 235 VGM (visual genome model) 3–8, 23–26, 37, 67–70, 93–99, 101–3, 127–28, 269–70, 275–77 VGM API 11, 44, 78, 122–23, 140, 324 VGM classifier learning 29, 99, 284, 306 VGM cloud server 127, 325 VGM database 130–32, 167 VGM model 12–13, 23, 25, 89, 91, 93, 95– 96, 102–3, 105 VGM Platform Controllers 149, 157, 159, 161, 163, 165 VGM platform overview 127–28, 130, 132, 134, 138, 140, 142, 144, 146 VGM texture metrics 239, 267 VGM volume learning method 309, 313 Vgv 157, 162–64, 166, 204, 279–81, 307–8 Vgv HIGHLIGHT 163–64, 280–81
Vision 22, 24, 26, 75, 79, 87–88, 332–34 Visual cortex 23–24, 43–44, 46, 67–70, 72–76, 78–80, 82, 84, 101 Visual cortex memory 51–52, 70 Visual cortex models 24, 30, 68, 72–73 Visual cortex processing centers 68–69, 72–74, 129 Visual genome format 9, 85–86 Visual genome project 1, 3, 6, 8–11, 18–19, 22, 37, 84, 323–25 Visual genomes 4, 9–10, 13–14, 33–37, 85, 95–96, 98, 105, 122 Visual genomes database 130–31, 133, 135, 137, 139 Visual genomes metrics files 137, 160 Visual impressions 5–6, 26, 28, 36, 70, 72, 101, 104, 239–40 Visual information 10, 19, 22, 30, 70–71, 170 Visual learning 2, 37, 89, 101 Visual memory 30, 69–70, 73, 75, 84, 86, 96, 98, 121–22 Visual pathway 2, 4, 22–24, 26–27, 44–48, 52, 56, 68, 97–98 Visual pathway models, synthetic 3, 5, 7–8, 11, 22 Vn processing centers 23, 73, 101 Vn regions 68, 73–74, 169 Volume 119–20, 145–47, 174–76, 186–87, 190–94, 227–30, 244–45, 264–67, 288 – 5-bit 228 – histogram 174 – large 6–7, 33, 98 Volume cells 229–30, 244–45 Volume centroids 198, 227, 240 Volume density 188–89, 191–94
Volume feature space 1, 77 Volume learning 1–2, 6–7, 32–34, 37, 77, 93–94, 98, 103, 125 Volume learning and visual DNA 2, 4, 6, 8, 10, 12, 14, 16, 18 Volume Learning and Visual Genomes 34–37 Volume learning model 8, 34 Volume metrics 176, 180 Volume projection metrics 71, 169–70, 172, 174–76, 178–82, 184, 194, 267, 308–9 Volume projection shape metrics 219, 227 Volume projection spaces 71, 121, 183 Volume projections 175, 183, 186, 194, 227, 229–30, 236, 239–44, 287 Volume region 180–81 Volume renderings 170, 176, 186–87, 189, 194 Volume texture 155, 264, 294, 296, 298, 300, 303 Volume texture metrics 240, 242, 248, 305 Volumetric 33, 77, 145, 264 Volumetric projections 121, 124, 155–56, 176, 181, 234, 265 Volumetric Spaces 156, 234 W Weights 8, 16, 32–33, 35–36, 99–100, 112, 234, 283–84, 321–22 – amount of 151, 153 Window 166, 203, 228, 311–12, 314–15, 317 – shell 160, 163–66 Work-arounds 275–77, 279, 281–82 X, Y, Z Xcentroid 138–39, 226, 254, 267