MACHINE VISION AND IMAGE RECOGNITION
Edited by:
Jovan Pehcevski
ARCLER Press
www.arclerpress.com
Machine Vision and Image Recognition Jovan Pehcevski
Arcler Press
2010 Winston Park Drive, 2nd Floor
Oakville, ON L6H 5R7, Canada
www.arclerpress.com
Tel: 001-289-291-7705, 001-905-616-2116
Fax: 001-289-291-7601
Email: [email protected]
e-book Edition 2020
ISBN: 978-1-77407-421-3 (e-book)
This book contains information obtained from highly regarded resources. Reprinted material sources are indicated. Copyright for individual articles remains with the authors as indicated and published under Creative Commons License. A Wide variety of references are listed. Reasonable efforts have been made to publish reliable data and views articulated in the chapters are those of the individual contributors, and not necessarily those of the editors or publishers. Editors or publishers are not responsible for the accuracy of the information in the published chapters or consequences of their use. The publisher assumes no responsibility for any damage or grievance to the persons or property arising out of the use of any materials, instructions, methods or thoughts in the book. The editors and the publisher have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission has not been obtained. If any copyright holder has not been acknowledged, please write to us so we may rectify. Notice: Registered trademark of products or corporate names are used only for explanation and identification without intent of infringement. © 2020 Arcler Press ISBN: 978-1-77407-355-1 (Hardcover)
Arcler Press publishes a wide variety of books and eBooks. For more information about Arcler Press and its products, visit our website at www.arclerpress.com
DECLARATION
Some content or chapters in this book are open access, copyright-free published research works, published under a Creative Commons License and indicated with their citations. We are thankful to the publishers and authors of this content and these chapters, as without them this book would not have been possible.
ABOUT THE EDITOR
Jovan obtained his PhD in Computer Science from RMIT University in Melbourne, Australia in 2007. His research interests include big data, business intelligence and predictive analytics, data and information science, information retrieval, XML, web services and service-oriented architectures, and relational and NoSQL database systems. He has published over 30 journal and conference papers and he also serves as a journal and conference reviewer. He is currently working as a Dean and Associate Professor at European University in Skopje, Macedonia.
TABLE OF CONTENTS
List of Contributors .......................................................................................xv
List of Abbreviations .................................................................................... xxi
Preface....................................................................................................... xxiii
SECTION I: METHODS AND APPROACHES IN MACHINE VISION
Chapter 1
Behavior Fusion for Visually-Guided Service Robots ................................. 3 Introduction ............................................................................................... 3 Measurement Model .................................................................................. 5 Design of Controller .................................................................................. 7 Experiments ............................................................................................. 12 Results ..................................................................................................... 13 Conclusion .............................................................................................. 17 References ............................................................................................... 18
Chapter 2
Dynamic Omnidirectional Vision Localization Using a Beacon Tracker Based on Particle Filter ................. 21 Introduction ............................................................................................. 21 Calibration For Fisheye Lens Camera ....................................................... 24 Rectification For Fisheye Lens Distortion .................................................. 26 Omni-Vision Tracking And Localization Based on Particle Filter ............... 29 Navigation System ................................................................................... 33 Experimental Result ................................................................................. 35 Conclusion .............................................................................................. 38 Acknowledgments ................................................................................... 39 References ............................................................................................... 40
Chapter 3
QoE Assessment of Will Transmission Using Vision and Haptics in Networked Virtual Environment ......................................................... 43 Abstract ................................................................................................... 43 Introduction ............................................................................................. 44 Will Transmission..................................................................................... 45 Assessment System .................................................................................. 48 Force Calculation..................................................................................... 50 Assessment Methods ................................................................................ 52 Assessment Results .................................................................................. 53 Conclusions ............................................................................................. 61 Acknowledgements ................................................................................. 62 Notes ...................................................................................................... 62 References ............................................................................................... 64
Chapter 4
Concept Learning in Neuromorphic Vision Systems: What Can We Learn from Insects? ................................. 67 Abstract ................................................................................................... 68 Introduction ............................................................................................. 69 Higher-Order Learning In Insects ............................................................. 70 Neuromorphic Sensory-Motor Systems .................................................... 72 Representation of Concepts and Conceptual Relationships ...................... 74 Decentralized Computer Vision Systems .................................................. 76 Concluding Remarks................................................................................ 77 Acknowledgements ................................................................................. 78 References ............................................................................................... 79 SECTION II: MACHINE VISION TECHNIQUES IN PRODUCTION / MANUFACTURING PROCESSES
Chapter 5
An Automatic Assembling System for Sealing Rings Based on Machine Vision ....................................................................................... 89 Abstract ................................................................................................... 89 Introduction ............................................................................................. 90 System Composition and Design.............................................................. 92 Robot Dynamic Target Tracking ................................................................ 95 Experimental Results .............................................................................. 105 Summary ............................................................................................... 108
Acknowledgments ................................................................................. 109 References ............................................................................................. 110 Chapter 6
Precise and Robust Large-Shape Formation using Uncalibrated Vision for a Virtual Mold ....................... 113 Introduction ........................................................................................... 113 Laser-Assisted Robot Operation Using Camera-Space Manipulation ...... 115 Experimental Verification ....................................................................... 121 Experiment Result And Discussion ......................................................... 124 Summary And Conclusion ..................................................................... 128 References ............................................................................................. 130
Chapter 7
Design of Omni-Directional Tilt Sensor Based on Machine Vision ........ 131 Abstract ................................................................................................. 131 Introduction ........................................................................................... 132 Design of the ODTs ............................................................................... 133 Realization of The ODTs ........................................................................ 142 Experimental Results and Discussion ..................................................... 143 Conclusions and Future Work ................................................................ 146 References ............................................................................................. 148
Chapter 8
Application of Computer Vision Technology on Raising Sow and Procreating of Processing ...................................................................... 149 Abstract ................................................................................................. 149 Introduction ........................................................................................... 150 The Principle of Computer Vision........................................................... 150 The Infrared Remote Sensing Detection System...................................... 152 Image Processing and Image Analysis .................................................... 153 Conclusion ............................................................................................ 157 References ............................................................................................. 158
SECTION III: MEDICAL AND COGNITIVE APPLICATIONS OF MACHINE VISION Chapter 9
Sliding Window Based Machine Learning System for the Left Ventricle Localization in MR Cardiac Images........................................ 163 Abstract ................................................................................................. 163 Introduction ........................................................................................... 164 Sliding Window Machine Learning Approach ........................................ 165 xi
The Proposed Automatic Left Ventricle Detection System ....................... 166 Performance Evaluation ......................................................................... 173 Results Discussion ................................................................................. 176 Conclusion ............................................................................................ 177 Acknowledgments ................................................................................. 177 References ............................................................................................. 178 Chapter 10 Multilevel Cognitive Machine-Learning-Based Concept for Artificial Awareness: Application to Humanoid Robot Awareness Using Visual Saliency .......................................................... 181 Abstract ................................................................................................. 181 Introduction and Problem Stating........................................................... 182 Brief Overview of Multilevel Cognitive Concept .................................... 184 From Cognitive Function to Motion-Perception Architecture .................. 185 From Salient Objects Detectionto Visual Awareness ............................... 189 Conclusion ............................................................................................ 200 References ............................................................................................. 201 SECTION IV: ROBOTICS AND MOVEMENTS RECOGNITION Chapter 11 Catadioptric Omni-directional Stereo Vision and Its Applications in Moving Objects Detection............................................ 207 Introduction ........................................................................................... 207 Omni-Directional Stereo Vision Imaging System .................................... 209 Unwarping of Omni-Directional Stereo Vision Images ........................... 212 Omni-Directional Stereo Image Rectification ......................................... 218 Stereo Matching and Depth Estimation .................................................. 224 Applications on Moving Object Detection and Tracking ........................ 236 Acknowledgements ............................................................................... 237 References ............................................................................................. 238 Chapter 12 Person Following Robot with Vision-based and Sensor Fusion Tracking Algorithm ............................................................................... 241 Introduction ........................................................................................... 241 Person Following Robot ~ Robotics Application of Vision Systems ~ ...... 244 Vision-Based Tracking Algorithm ............................................................ 248 Problems of Tracking.............................................................................. 251 Vision – LRF Sensor Fusion Tracking ...................................................... 255 xii
Conclusion ............................................................................................ 265 References ............................................................................................. 267 Chapter 13 3D Autonomous Navigation Line Extraction for Field Roads Based on Binocular Vision ........................... 269 Abstract ................................................................................................. 269 Introduction ........................................................................................... 270 Image Processing ................................................................................... 273 Navigation Line Extracting ..................................................................... 278 Experimental Results and Discussion ..................................................... 290 Conclusions ........................................................................................... 295 Acknowledgments ................................................................................. 296 References ............................................................................................. 297 Chapter 14 Visual Feedback Balance Control of a Robot Manipulator and Ball-Beam System ........................................................................... 301 Abstract ................................................................................................. 301 Introduction ........................................................................................... 302 Robotic Ball-Beam Control System ........................................................ 304 Ball-Beam System Image Processing ...................................................... 305 Experimental Results .............................................................................. 308 Conclusion ............................................................................................ 311 Acknowledgements ............................................................................... 314 References ............................................................................................. 315 Index ..................................................................................................... 317
LIST OF CONTRIBUTORS Mohamed Abdellatif Ain Shams University, Faculty of Engineering Egypt Zuoliang Cao Tianjin University of Technology P.R. China Xianqiu Meng Tianjin University of Technology P.R. China Shiyu Liu Tianjin University of Technology P.R. China Pingguo Huang Department of Management Science, Tokyo University of Science, Tokyo, Japan Yutaka Ishibashi Department of Scientific and Engineering Simulation, Nagoya Institute of Technology, Nagoya, Japan Fredrik Sandin EISLAB, Luleå University of Technology, Luleå, Sweden Asad I. Khan Clayton School of Information Technology, Monash University, Clayton, Australia Adrian G. Dyer Department of Physiology, Monash University, Clayton, Australia School of Media and Communication, Royal Melbourne Institute of Technology, Melbourne, Australia
Anang Hudaya M. Amin Faculty of Information Science & Technology (FIST), Multimedia University, Melaka, Malaysia Giacomo Indiveri Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland Elisabetta Chicca Cognitive Interaction Technology, Center of Excellence, Bielefeld University, Bielefeld, Germany Evgeny Osipov Division of Computer Science, Luleå University of Technology, Luleå, Sweden Mingyu Gao Department of Electronics and Information, Hangzhou Dianzi University, Hangzhou, China Xiao Li Department of Electronics and Information, Hangzhou Dianzi University, Hangzhou, China Zhiwei He Department of Electronics and Information, Hangzhou Dianzi University, Hangzhou, China Yuxiang Yang Department of Electronics and Information, Hangzhou Dianzi University, Hangzhou, China Biao Zhang Aerospace and Mechanical Engineering Department. University of Notre Dame. Notre Dame, IN 46556, México Emilio J. Gonzalez-Galvan Centro de Investigación y Estudios de Posgrado. Facultad de Ingeniería. Universidad Autónoma de San Luis Potosí. San Luis Potosí, S.L.P. 78290, México
Jesse Batsche Aerospace and Mechanical Engineering Department. University of Notre Dame. Notre Dame, IN 46556, México Steven B. Skaar Aerospace and Mechanical Engineering Department. University of Notre Dame. Notre Dame, IN 46556, México Luis A. Raygoza Centro de Investigación y Estudios de Posgrado. Facultad de Ingeniería. Universidad Autónoma de San Luis Potosí. San Luis Potosí, S.L.P. 78290, México Ambrocio Loredo Centro de Investigación y Estudios de Posgrado. Facultad de Ingeniería. Universidad Autónoma de San Luis Potosí. San Luis Potosí, S.L.P. 78290, México Yiping Tang College of Information Engineering, Zhejiang University of Technology, Hangzhou, China Caiguo Chen College of Computer Science, Zhejiang University of Technology, Hangzhou, China Yun Yang School of Election and Information Engineering, Ningbo University of Technology, Ningbo City, China Abdulkader Helwan Department of Biomedical Engineering, Near East University, Near East Boulevard, 99138 Nicosia, Northern Cyprus, Mersin 10, Turkey Dilber Uzun Ozsahin Department of Biomedical Engineering, Near East University, Near East Boulevard, 99138 Nicosia, Northern Cyprus, Mersin 10, Turkey Kurosh Madani Images, Signals and Intelligence Systems Laboratory (LISSI/EA 3956) and xvii
Senart-FB Institute of Technology, University Paris-EST Créteil (UPEC), Bât.A, avenue Pierre Point, 77127 Lieusaint, France Dominik M. Ramik Images, Signals and Intelligence Systems Laboratory (LISSI/EA 3956) and Senart-FB Institute of Technology, University Paris-EST Créteil (UPEC), Bât.A, avenue Pierre Point, 77127 Lieusaint, France Cristophe Sabourin Images, Signals and Intelligence Systems Laboratory (LISSI/EA 3956) and Senart-FB Institute of Technology, University Paris-EST Créteil (UPEC), Bât.A, avenue Pierre Point, 77127 Lieusaint, France Xiong Zhihui College of Information System and Management, National University of Defense Technology Changsha, P.R. China Chen Wang College of Information System and Management, National University of Defense Technology Changsha, P.R. China Zhang Maojun College of Information System and Management, National University of Defense Technology Changsha, P.R. China Takafumi Sonoura Corporate R&D Center, Toshiba Corporation Japan Takashi Yoshimi Corporate R&D Center, Toshiba Corporation Japan Manabu Nishiyama Corporate R&D Center, Toshiba Corporation Japan Hideichi Nakamoto Corporate R&D Center, Toshiba Corporation Japan Seiji Tokura Corporate R&D Center, Toshiba Corporation Japan
Nobuto Matsuhira Corporate R&D Center, Toshiba Corporation Japan Yunwu Li School of Technology and Engineering, Southwest University, Chongqing 400716, China Xiaojuan Wang School of Technology and Engineering, Southwest University, Chongqing 400716, China Dexiong Liu National and Local Joint Engineering Laboratory of Intelligent Transmission and Control Technology (Chongqing), Chongqing 400716, China Ching-Long Shih Department of Electrical Engineering, National Taiwan University of Science and Technology, Taiwan Jung-Hsien Hsu Department of Electrical Engineering, National Taiwan University of Science and Technology, Taiwan Chi-Jen Chang Department of Electrical Engineering, National Taiwan University of Science and Technology, Taiwan
LIST OF ABBREVIATIONS
AEC       Action elementary component
AER       Address-Event Representation
AI        Artificial intelligence
ANN       Artificial neural network
AGVs      Autonomous Guided Vehicles
BPNN      Backpropagation neural network
CHT       Circular Hough Transform
CAD       Computer-aided diagnosis
CCFs      Conscious cognitive functions
CCVFs     Conscious cognitive visual functions
CNN       Convolutional neural network
DEC       Decision elementary component
DTs       Decision trees
DNN       Deep neural network
DSP       Digital Signal Processor
ECs       Elementary components
EFs       Elementary functions
FBG       Fiber Bragg Grating
FPGA      Field-Programmable Gate Array
FOV       Field of View
GNSS      Global Navigation Satellite System
GC        Graph cuts
HGN       Hierarchical graph neuron
HSI       Hue, Saturation and Intensity
IDE       Integrated development environment
LRF       Laser Range Finder
LSM       Least-Squares Method
LIDAR     Light Detection and Ranging
ML        Machine learning
MRF       Markov Random Field
MOS       Mean opinion score
MONARCH   Morphable Networked Micro-Architecture
MP        Motion-perception
NSFC      Natural Science Foundation of China
OD        Object detector
ODSV      Omni-Directional Stereo Vision
ODTS      Omni-Directional Tilt Sensor
PF        Particle filter
PC        Personal computer
PmC       Polymorphic computing
PROIS     Paraboloid Reflective Omni-directional Imaging System
QoE       Quality of Experience
RANSAC    Random sample consensus
RTK-GPS   Real-time kinematic-global positioning system
ROI       Region of interest
SFC       Salient features construction
SOC       Salient objects classifier
SOE       Salient objects extraction
SSVM      Short-term salient objects visual memory
SURF      Speeded-Up Robust Features
SCD       Sunnybrook Cardiac Data
UCFs      Unconscious cognitive functions
UGVs      Unmanned Ground Vehicles
ULM       Unsupervised learning module
VSAs      Vector symbolic architectures
VAPE      Visual attention parameter estimation
WA        Weighted Averaging
WSNs      Wireless sensor networks
PREFACE
Computer vision is one of the areas studied by the field of artificial intelligence (AI). Computer vision combines computer intelligence with digitized visual information, and pattern recognition is considered the heart of computer vision. A pattern, in the context of computer vision, is defined in different ways:
• An arrangement that represents different types of objects and structures;
• A quantitative description of an object; or
• A vector representation of an object that contains its forms, shapes and attributes.
Identification (i.e., recognition) refers to a set of classification techniques by which the computer seeks to gain the power to recognize forms, such as images, shapes, or sound samples, by converting them into a binary representation (a set of binary numbers). Computer vision includes several phases: perception, recognition, interpretation and actuation. Modern computer vision can distinguish different objects in an image and classify them according to some input parameters. Nowadays it can even find people in a picture and analyze their emotions, determining from these results whether a person is happy, sad, or angry. Although neural networks bring impressive results, combining them with other artificial intelligence techniques (SVM, decision trees) can do much more. Despite the name, computer vision still has little in common with human vision. AI developers use parameters based on geometric principles, applying this technology to distinguish one image from another. However, we need some perceptual means that will transfer the external properties of an object to the “mind” of the neural network. Such a unit is called a perceptron, and it corresponds to the human brain cell, the neuron. Like the neuron, the perceptron has input and output data elements. This allows the construction of a network and the transmission of information from one neuron to another, thereby analyzing relevant data. The same pattern exists in our brains, continually generating new thoughts, emotions, and impressions.
This book edition covers different aspects of computer vision and image recognition. Section 1 focuses on methods and approaches in machine vision, describing behavior fusion for visually-guided service robots, the approaches and limitations of machine vision, dynamic omnidirectional vision localization using a beacon tracker, and QoE assessment of will transmission using vision. Section 2 focuses on machine vision applications in production/manufacturing processes, describing an automatic assembling system for sealing rings based on machine vision, precise and robust large-shape formation using uncalibrated vision for a virtual mold, a workpiece sorting system based on a machine vision mechanism, an omni-directional tilt sensor based on machine vision, a machine vision based modeling and positioning system, and computer vision technology for raising sows and processing. Section 3 focuses on medical applications of machine vision, describing a sliding window based machine learning system for left ventricle localization in MR cardiac images, machine-vision-based analysis of wireless capsule endoscopy video, and a cognitive machine-learning-based concept for artificial awareness. Section 4 focuses on robotics and movement recognition applications, describing catadioptric omni-directional stereo vision and its applications in moving object detection, a person following robot with vision-based and sensor fusion tracking, 3D autonomous navigation line extraction for field roads, visual feedback balance control of a robot manipulator, and enhancement of process capability for the vision-guided robot.
SECTION I: METHODS AND APPROACHES IN MACHINE VISION
Chapter 1
Behavior Fusion for Visually-Guided Service Robots
Mohamed Abdellatif Ain Shams University, Faculty of Engineering Egypt
INTRODUCTION
Mobile service robots are the class of robots equipped with the tools to understand environments such as homes and offices. The development of mobile robots is increasing worldwide due to the availability of moderately priced sensing and computing devices. Moreover, there is a strong belief that the market for service robots is about to undergo a radical increase in the next few years.
Citation: Mohamed Abdellatif (November 1st 2008). Behavior Fusion for Visually-Guided Service Robots, Computer Vision, Xiong Zhihui, IntechOpen, DOI: 10.5772/6165. Copyright: © 2008 by authors and Intech. This paper is an open access article distributed under a Creative Commons Attribution 3.0 License
Despite the huge literature on mobile robot navigation, the development of intelligent robots able to navigate in unknown and dynamic environments is still a challenging task (Walther et al., 2003). Therefore, developing techniques for robust navigation of mobile robots is both important and needed. The classical approach to mobile robot control used the "Model, Sense, Plan, Act" (MSPA) serial strategy, which proved to be inherently slow and fails totally if one module is out of order. We may call this a planner-based control approach. The appearance of the behavior-based navigation approach (Arkin, 1998; Brooks, 1986) was a remarkable evolution, in which reactive behaviors were designed to run simultaneously in parallel, giving tight interaction between sensors and actuators. Reactive behaviors allow for incremental improvements and the addition of more application-specific behaviors.

Building several behaviors, each concerned with a sole objective, will produce different decisions for the robot control parameters, and these have to be combined in some way to reach the final motion decision. The fusion of independent behaviors is not an easy task, and several approaches have been proposed in the literature to solve this problem (Arkin, 1998; Borenstein & Koren, 1991; Carreras et al., 2001; Saffiotti, 1997). Coordination of behaviors can be classified into two further approaches: competitive, as originally proposed by Brooks (1986), and cooperative strategies (Carreras et al., 2001). Depending on the environment, the competitive approach may fail and become unstable in critical situations demanding higher switching frequencies between behaviors. In the subsumption architecture (Brooks, 1986), behaviors are activated one at a time, but this may be inadequate for situations requiring several behaviors to be active at the same time. In the cooperative approach, all behaviors contribute to the output, rather than a single behavior dominating after passing an objective criterion. An example of the cooperative approach was proposed by Khatib (1985), using artificial potential fields to fuse control decisions from several behaviors. The potential field method suffers from being susceptible to local minima, which causes the control system to get stuck and become indecisive. Hybrid techniques combining competitive and cooperative approaches were proposed in (Carreras et al., 2001); however, they used learning to build up the rule set, which consumes a lot of time and effort. The use of fuzzy logic for behavior fusion was reported in (Saffiotti, 1997), where a hierarchy of behaviors was used for mobile robot guidance. The fuzzy logic approach, since
its inception (Zadeh, 1965), has long been applied to robotics with many successful applications (Luo et al., 2001; Saffiotti, 1997; Zimmermann, 1996) and is regarded as an intelligent computational technique that enables the proper handling of sensor uncertainties. Fuzzy rules can be used to design the individual behaviors as well as the way they are integrated to reach a final decision (Arkin, 1998; Luo et al., 2001). In this work, we propose a new method to integrate the behavior decisions by using potential field theory (Khatib, 1985) with fuzzy logic variables. Potential field theory has proved to be very efficient, especially for fast robots (Borenstein & Koren, 1991). The theory relies on the physical concept of force vector summation: the forces are virtual and describe the attractions and repulsions in the robot's field. Potential field theory has been criticized for being susceptible to local minima and, consequently, unstable motion. We show that when the vector field is applied to the output of each single behavior, which is smooth due to the use of fuzzy logic, it can significantly enhance the performance of the robot navigation system. The control system is implemented and used to navigate a small indoor service robot so that it can track and follow a target object on flat indoor terrain. The chapter is arranged as follows: the next section presents the imaging model and the measurement of the target location from the color image. In Section 3, we describe the design of the fuzzy logic controllers responsible for the target tracking behavior and the obstacle avoidance behavior, and for combining both behaviors. The results of the robot control experiments are presented in Section 4. Conclusions are finally given in Section 5.
MEASUREMENT MODEL
The RGB color space is the most popular color system since it is directly related to the acquisition hardware, but the RGB space is not perceptually uniform. The Hue, Saturation and Intensity (HSI) color space is preferred when humans are involved, since it is perceptually uniform. The cylindrical representation of the HSI system is shown in Fig. 1. Perceptually Uniform (PU) color spaces are more suitable for color recognition than the RGB space, since the quality of recognition will always be judged by a human observer (Cheng & Sun, 2000; Kim & Park, 1996; Littmann & Ritter, 1997; Tseng & Chang, 1992).
Figure 1: The Hue-Saturation-Intensity (HSI) color space in cylindrical representation.
The PU color space of HSI has the advantage that the object color is encoded mainly in the hue angle. This angle representation of color makes target color definition easier and is less sensitive to changes in illumination intensity, although it certainly changes when the illumination color changes. Therefore, we can compute the Hue, H, and Saturation, S, using the following formulae (Kim & Park, 1996):

(1) (2) (3)

The target object color is defined in terms of limiting hue angles and limiting saturation values describing the boundaries of a color zone in the H-S diagram, which can be expressed by the following constraints:

H_min ≤ H ≤ H_max,  S_min ≤ S ≤ S_max  (4)

where the subscript min refers to the minimum limit and max refers to the maximum limit. The target is detected in the image by this selection criterion, based on whether the pixel color lies within the boundaries of the H-S zone, known a priori for the target. The segmented image is written into a monochromatic image, in which the target area is written as white pixels and the background is written as dark pixels. The resulting binary image is then used to compute the area in pixels of the target region by counting the white pixels. This inherently uses the assumption that the target pixels are clustered in one group and that scattered pixels form only a small portion of the image. The average horizontal coordinate of
the target region is also computed and forwarded as input to the controller, as shown schematically in Fig. 2.
Figure 2: Schematic representation of target measurement in the gray image showing extracted target region.
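To make the measurement model concrete, here is a small C++ sketch of the pipeline described above: convert RGB to hue and saturation, test each pixel against the H-S zone of Eq. (4), and accumulate the target area and its average horizontal coordinate. Since Eqs. (1)-(3) are not reproduced in this excerpt, the widely used arccos form of the RGB-to-HSI conversion is assumed; the zone limits, image layout and function names are illustrative and not the chapter's actual implementation (which uses C++ with the Matrox Imaging Library and OpenCV).

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct HS { double h; double s; };   // hue in degrees [0, 360), saturation in [0, 1]

// Assumed standard RGB -> hue/saturation conversion (arccos form).
HS rgbToHS(double r, double g, double b) {
    const double kPi = 3.14159265358979323846;
    double sum = r + g + b + 1e-9;
    double s = 1.0 - 3.0 * std::min({r, g, b}) / sum;
    double num = 0.5 * ((r - g) + (r - b));
    double den = std::sqrt((r - g) * (r - g) + (r - b) * (g - b)) + 1e-9;
    double theta = std::acos(std::max(-1.0, std::min(1.0, num / den))) * 180.0 / kPi;
    double h = (b <= g) ? theta : 360.0 - theta;
    return {h, s};
}

// H-S color zone of Eq. (4); the limits are known a priori for the target.
struct TargetZone { double hMin, hMax, sMin, sMax; };

struct TargetMeasurement { long areaPixels; double meanX; };

// Segment the target by the H-S zone test, then measure the white-pixel area and the
// average horizontal coordinate that is forwarded to the steering controller.
TargetMeasurement measureTarget(const std::vector<unsigned char>& rgb,  // interleaved R, G, B
                                int width, int height, const TargetZone& z) {
    long area = 0, sumX = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int i = 3 * (y * width + x);
            HS p = rgbToHS(rgb[i], rgb[i + 1], rgb[i + 2]);
            bool inZone = p.h >= z.hMin && p.h <= z.hMax &&   // assumes hMin <= hMax
                          p.s >= z.sMin && p.s <= z.sMax;     // (the hue axis is cyclic)
            if (inZone) { ++area; sumX += x; }                // "white" pixel of the binary image
        }
    }
    double meanX = (area > 0) ? static_cast<double>(sumX) / area : width / 2.0;
    return {area, meanX};
}
```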
DESIGN OF CONTROLLER
The goal of the controller is to enable the mobile robot to satisfy two objectives simultaneously, namely target following and obstacle avoidance. The objectives are implemented in separate behaviors which run independently in parallel, and their outputs have to be combined into a single command, as shown in Fig. 3. In this section, we describe the design of each behavior and then show how their decisions are combined.
Figure 3: Schematic of Robot Behaviors.
Design of Target Following Behavior
The output of this behavior decides the steering angle of the robot needed to make the target appear continually in the middle of the image. The sensory information available for the steering command is the average horizontal target position in the image, shown in Fig. 2. The horizontal
component of motion is selected because the robot and the target are both assumed to move on flat indoor terrain and the camera orientation relative to the floor is fixed. Since steering changes the target image position, the motion attributes chosen as the input fuzzy linguistic inference layers for the FLC are:
• the target image horizontal displacement, and
• the target image horizontal velocity.
The membership functions for these two layers are shown in Fig. 4. The fuzzy logic controller used to control the mobile robot employs triangular membership functions to fuzzify the data measured by the vision system. Each input fuzzy variable is divided into three overlapping fuzzy sets. In our implementation, the linguistic descriptors for the image horizontal displacement are defined as: 1) Left (L), 2) Middle (M), and 3) Right (R), as shown in Fig. 4a. The target image horizontal velocity is described by three fuzzy variables defined as: 1) Getting Left (GL), 2) Getting Middle (GM), and 3) Getting Right (GR), as shown in Fig. 4b. The shape and relative overlap of the fuzzy variables (that is, their tuning) are determined from the experience gained in experiments with the robot. The shape of each membership function was decided after studying the sensitivity of the robot performance to the mean of the membership function; the mean was varied by 10% of its shown value and the performance was found to be stable over this range. The two fuzzy variables are then used to derive the output steering state. Three output states are used for steering, namely: 1) Steer Right (SR), 2) go STRaight (STR), and 3) Steer Left (SL), as shown in Fig. 4c. For each fuzzy linguistic inference process we define a 3×3 fuzzy rule matrix, as shown in Table 1.
Figure 4: Membership functions for the input variables of the steering FLC.
Table 1: The Fuzzy Rule Matrix for the Target Following FLC. (The columns show states of the target horizontal velocity, while the rows show states of the target horizontal displacement.)
The motion decision for the tracking behavior is calculated through the fusion of the image displacement and image velocity in the fuzzy logic inference matrix. The value of each matrix entry is calculated as the minimum of the two input variables. The three output variables are then computed using the root of the sum of squares of the contributing entries. Finally, the normalized control command is defuzzified according to the center of gravity method.
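A minimal sketch of this inference scheme is given below: triangular membership functions fuzzify the normalized target displacement and velocity, each rule's firing strength is the minimum of its two antecedents, rules sharing an output state are aggregated by the root of the sum of squares, and the steering command is obtained by center-of-gravity defuzzification. The rule table, membership breakpoints and output centroids are hypothetical placeholders, since Table 1 and the tuned membership functions are not reproduced here.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Triangular membership function with feet at a and c and peak at b.
double tri(double x, double a, double b, double c) {
    if (x <= a || x >= c) return 0.0;
    return (x < b) ? (x - a) / (b - a) : (c - x) / (c - b);
}

enum Steer { SL = 0, STR = 1, SR = 2 };   // output states: steer left, straight, steer right

// Hypothetical 3x3 rule table (Table 1 is not reproduced in this excerpt):
// rows = displacement {Left, Middle, Right}, columns = velocity {GL, GM, GR}.
const std::array<std::array<Steer, 3>, 3> kRules = {{
    {{SL, SL, STR}},    // target on the left of the image
    {{SL, STR, SR}},    // target near the middle
    {{STR, SR, SR}}     // target on the right of the image
}};

// x: normalized horizontal displacement in [-1, 1] (negative = left of image center);
// v: normalized horizontal velocity in [-1, 1]. Returns a steering command in [-1, 1].
double steeringCommand(double x, double v) {
    // Fuzzification with illustrative breakpoints.
    std::array<double, 3> disp = { tri(x, -2.0, -1.0, 0.0),    // Left
                                   tri(x, -1.0,  0.0, 1.0),    // Middle
                                   tri(x,  0.0,  1.0, 2.0) };  // Right
    std::array<double, 3> vel  = { tri(v, -2.0, -1.0, 0.0),    // Getting Left
                                   tri(v, -1.0,  0.0, 1.0),    // Getting Middle
                                   tri(v,  0.0,  1.0, 2.0) };  // Getting Right
    // Inference: each rule fires with the minimum of its antecedents; rules sharing an
    // output state are aggregated by the root of the sum of squares.
    std::array<double, 3> out = {0.0, 0.0, 0.0};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            double w = std::min(disp[i], vel[j]);
            out[kRules[i][j]] += w * w;
        }
    for (double& o : out) o = std::sqrt(o);
    // Center-of-gravity defuzzification over the output-state centroids.
    const double centroid[3] = {-1.0, 0.0, 1.0};   // SL, STR, SR (normalized steering angle)
    double num = 0.0, den = 1e-9;
    for (int k = 0; k < 3; ++k) { num += out[k] * centroid[k]; den += out[k]; }
    return num / den;   // negative = steer left, positive = steer right
}
```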
Design of Obstacle Avoidance Behavior
In the obstacle avoidance behavior, the readings of two ultrasonic range sensors are used as input variables for the FLC, while the output variable is the steering angle. The flow chart of the obstacle avoidance algorithm is shown in Fig. 5, where the sensor readings are first read. The notations S1 and S2 denote the obstacle distances measured by the left and right sensors, respectively. The sensor readings are then fuzzified (transformed into fuzzy linguistic variables) into three variables, namely Near (N), Medium (M) and Far (F). The steering angle has three membership functions: Steer Left (SL), Steer Right (SR) and STRaight (STR). Table 2 lists the possible sensor states and the corresponding motion decision for avoiding the obstacle.
Table 2: The Fuzzy Rule Matrix for the Obstacle Avoidance FLC. (The columns show states of the right sensor, while the rows show states of the left sensor.)
It should be noted that, depending on the obstacle distance limit corresponding to the "N" linguistic variable, the robot may escape the obstacle by steering or may have to stop if the distance is too small to allow maneuvering without a crash. The minimum distance in our case is 40 cm and the robot speed is moderate, so the robot rarely has to stop unless it is completely surrounded. Then, using the center of gravity method, the steering angle is computed from the decisions given by Table 2 and produced as a stand-alone output signal that will be handled by the fusion algorithm.
Figure 5: Flow Chart of the Obstacle Avoidance Behavior.
Fusion of Behaviors
We have two decisions for the steering angle, computed by the two behavior implementations as shown in Fig. 3. The decisions are fused using the potential field method, by vector summation of the two vectors resulting from the behaviors. The velocity vector from the goal-seeking behavior has its maximum amplitude when the steering is straight and decreases according to a linear model as the steering moves off-axis. Then, using vector mechanics, the combined Euclidean magnitude and direction are computed and used to steer the vehicle. This method differs from the original potential field theory in the way the input vectors are generated: in our case they are generated through the fuzzy logic of each separate behavior, in contrast to their direct construction by linear scaling functions in the original theory (Khatib, 1985; Saffiotti, 1997).
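The fusion step can be sketched as follows: each behavior's steering decision is converted into a virtual planar velocity vector whose amplitude is maximal for straight motion and falls off linearly as the steering moves off-axis, the two vectors are summed, and the resultant gives the fused heading and speed. The steering range, the linear fall-off and the weighting of the obstacle-avoidance vector are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>

struct Command { double steer; double speed; };   // steer in radians (0 = straight ahead)

// Convert one behavior's steering decision into a virtual velocity vector: the amplitude is
// maximal for straight motion and decreases linearly as the steering moves off-axis.
static void toVector(double steer, double weight, double& vx, double& vy) {
    const double kMaxSteer = 1.5707963267948966;   // assumed +/- 90 degree steering range
    double amp = weight * std::max(0.0, 1.0 - std::fabs(steer) / kMaxSteer);
    vx = amp * std::cos(steer);   // forward component
    vy = amp * std::sin(steer);   // lateral component
}

// Fuse the target-following and obstacle-avoidance decisions by vector summation.
Command fuseBehaviors(double steerFollow, double steerAvoid, double avoidWeight) {
    double fx, fy, ax, ay;
    toVector(steerFollow, 1.0, fx, fy);
    toVector(steerAvoid, avoidWeight, ax, ay);
    double rx = fx + ax;
    double ry = fy + ay;
    Command c;
    c.steer = std::atan2(ry, rx);              // fused heading executed by the wheels
    c.speed = std::sqrt(rx * rx + ry * ry);    // fused speed (Euclidean magnitude)
    return c;
}
```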
EXPERIMENTS
System Configuration
The robot was constructed with four wheels to move easily on flat terrain, as shown in Fig. 6. The two side wheels are driven by two independent servo motors, while the front and rear wheels are castor wheels provided only to improve the mechanical stability of the robot. The robot consists of three layers of strong acrylic sheets supported by four long pillars. The lower level contains a microcontroller circuit for controlling low-level motor motion and for reading the ultrasonic sensors and encoders. The second level carries the Foursight vision processor and a screen for displaying the camera image, while the third carries the two cameras and the main microprocessor. The robot is equipped with 16 ultrasonic sensors to enable perception of its environment; the resolution of the sensor measurement is around 1 cm. The robot has two color video cameras installed onboard, which provide the images used by the target following behavior. The main microprocessor receives data from the motion control system and the vision module. Inputs from both cameras are fed into the Matrox Foursight module, a dedicated vision processor. The images received from the cameras are digitized via a Meteor II frame grabber and stored in the memory of the Foursight computer for online processing by specially designed software. We implemented algorithms that grab and calibrate the color image to eliminate the camera offset. The target color is identified to the system through measurement of its Hue-Saturation zone, and the color attributes of the target are stored in the program for later comparison. The motion attributes of the extracted target area are then computed and passed to the main microprocessor, where the data are needed by the FLC module. The movement of the vehicle is determined by the main microprocessor with inputs from the different components. All programs are implemented in C++, and several video and data processing libraries are used, including the Matrox Imaging Library (MIL) and OpenCV.
Figure 6: Photograph of the mobile service robot.
The robot sensors and actuators communicate with the host computer via wired connections. The DC motor is controlled through a motor interface card utilizing the popular H-bridge circuit, with a high-torque DC motor of 8 kg·cm nominal torque at a rated voltage of 12 V. Test programs were devised to ensure the correct operation of the measurement and control system and to identify the resolution of the measurements and control signals. The main specifications of the robot are summarized in Table 3.
Table 3: Specification of the mobile robot
RESULTS
An experimental program was conducted to explore the effectiveness of the control system in guiding the robot through the indoor environment according to the desired behaviors.
Experiments were done for the separate behaviors and then for the combined behavior. A sample result showing the extraction of the target area is given in Fig. 7: the left image shows the original captured image and the right image shows the extracted target area.
Figure 7: The segmented image showing the detected target area.
A computer program was devised to construct the Hue-Saturation (H-S) histogram shown in Fig. 8. The advantage of this representation is that it enables better extraction of the target once the target has been well identified a priori. We show the histogram for a sample target, which is a red object in this particular case. The hue ranges from 0 to 360 degrees and the saturation ranges from 0 to 255.
Figure 8: The Hue-Saturation diagram showing regions of darker intensity as those corresponding to higher voting of the target object pixels.
It is worth noting that the hue angle is cyclic, so the 0-degree vertical line coincides with the 360-degree vertical line; the region shown can therefore be described within limited bounds. The dark regions in the graph correspond to a high number of pixels in the target area having the same H-S point. This defines the color zone mentioned earlier in this chapter, and the bounds are extracted from this figure. It is also worth noting that the input image contains the target at several views and distances, so that it encapsulates almost all the possible color reflections of the object. For the target following experiments, the robot is first adjusted to view the target inside the color image. The robot then starts to move, as shown in the robot track in Fig. 9, and keeps moving forward.
Figure 9: The real track of the robot while following the colored target.
Figure 10: The real track of the robot with obstacle avoidance only (dotted line) and with the combined behaviors (solid line).
During the robot motion, the target continuously approaches the image center and consequently the target area increases in the extracted target image. The robot followed the target even when it moved along curved routes, as long as the target remained visible in the robot camera and the target speed was comparable to the robot speed. An experiment for the obstacle avoidance behavior is shown in Fig. 10. The dotted line shows the robot path when working with obstacle avoidance only: the robot evades the obstacles and moves towards free areas based on the sequence of obstacles faced. The robot stops when the target area in the image exceeds a certain empirical threshold, so that it halts about 25 cm in front of the target, or when the sensors detect an obstacle closer than 30 cm. The target following behavior is then integrated with the output from the obstacle avoidance behavior using the vector summation principle, and the resulting heading angle is executed by the differential wheels. An experiment showing the combined effect of both behaviors is also shown in Fig. 10. The solid line shows the robot track when both behaviors are combined: the robot initially veers away from the target to avoid the obstacle, but soon recovers and steers right toward the target.
CONCLUSION
We have implemented a control system that enables a mobile service robot to track and follow a moving target while avoiding obstacles. The system was experimentally validated using a real robot equipped with CCD cameras and ultrasonic range sensors. Algorithms were developed for color image processing, extraction of the target region and measurement of the target features. Fuzzy logic controllers were designed to produce the two concurrent behaviors of target following and obstacle avoidance and to combine the results of the two behaviors into one set of commands for robot control. The control system succeeded in guiding the robot reliably, both in tracking the target and in following it while keeping a reasonable distance that ensures the visibility of the target in the camera view. Fuzzy control provided smooth and reliable navigation that circumvents the inherent uncertainties and noise in the sensing process, as well as smooth blending of the behaviors. Future directions of research include the use of more input information, such as that from a human interface or an external planner. The goal is to create an autonomous service robot that can navigate based on combined information from visual inputs, sonars and outdoor GPS data, guiding the vehicle to remote target points through a user-friendly interface.
REFERENCES
1. Arkin, R.C. (1998). Behavior Based Robotics, MIT Press, Cambridge, Massachusetts.
2. Borenstein, J. & Koren, Y. (1991). The Vector Field Histogram: Fast Obstacle Avoidance for Mobile Robots. IEEE Journal of Robotics and Automation, Vol. 7, No. 3, pp. 278-288.
3. Brooks, R.A. (1986). A Robust Layered Control System for a Mobile Robot. IEEE Journal of Robotics and Automation, Vol. 2, No. 1, pp. 14-23.
4. Carreras, M.; Batlle, J. & Ridao, P. (2001). Hybrid Coordination of Reinforcement Learning-based Behaviors for AUV Control, Proceedings of IEEE/RSJ IROS, Vol. 3, pp. 1410-1415, Maui, HI, USA.
5. Cheng, H.D. & Sun, Y. (2000). A Hierarchical Approach to Color Image Segmentation Using Homogeneity. IEEE Transactions on Image Processing, Vol. 9, No. 12, pp. 2071-2082.
6. Khatib, O. (1985). Real-Time Obstacle Avoidance for Manipulators and Mobile Robots, Proceedings of the IEEE International Conference on Robotics and Automation, pp. 500-505.
7. Kim, W. & Park, R. (1996). Color Image Palette Construction Based on the HSI Color System for Minimizing the Reconstruction Error, Proceedings of the IEEE International Conference on Image Processing, pp. 1041-1044.
8. Littmann, E. & Ritter, H. (1997). Adaptive Color Segmentation: A Comparison of Neural and Statistical Methods. IEEE Transactions on Neural Networks, Vol. 8, No. 1, pp. 175-185.
9. Luo, R.C.; Chen, T.M. & Su, K.L. (2001). Target Tracking Using Hierarchical Grey Fuzzy Decision Making Method, IEEE Transactions on Systems, Man and Cybernetics, Part A, Vol. 31, No. 3, pp. 179-186.
10. Saffiotti, A. (1997). The Uses of Fuzzy Logic in Autonomous Robot Navigation: A Catalogue Raisonné, Technical Report TR/IRIDIA/97-6, available from http://iridia.ulb.ac.be, Accessed: 2006-10-10.
11. Sei, I.; Tomokazu, S.; Koichiro, Y. & Naokazu, Y. (2007). Construction of Feature Landmark Database Using Omnidirectional Videos and GPS Positions, Proceedings of the Sixth International Conference on 3-D Digital Imaging and Modeling, pp. 249-256.
12. Veera Ragavan, S. & Ganapathy, V. (2007). A Unified Framework for a Robust Conflict-Free Robot Navigation, International Journal of Intelligent Technology, Vol. 2, No. 1, pp. 88-94.
13. Walther, M.; Steinhaus, P. & Dillmann, R. (2003). A Robot Navigation Approach Based on 3D Data Fusion and Real-Time Path Planning, Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp. 45-50, Karlsruhe, Germany, July 2003.
14. Zadeh, L.A. (1965). Fuzzy Sets. Information and Control, Vol. 8, pp. 338-353.
15. Zimmermann, H.J. (1996). Fuzzy Set Theory and Its Applications (Third edition), Kluwer Academic Publishers, Norwell, MA.
Chapter 2
Dynamic Omnidirectional Vision Localization Using a Beacon Tracker Based on Particle Filter
Zuoliang Cao, Xianqiu Meng and Shiyu Liu Tianjin University of Technology P.R. China
INTRODUCTION
Autonomous navigation is of primary importance in applications involving the use of Autonomous Guided Vehicles (AGVs). Vision-based navigation systems provide an interesting option for both indoor and outdoor navigation, as they can also be used in environments without an external supporting infrastructure for navigation, unlike GPS,
Citation: Zuoliang Cao, Xianqiu Meng and Shiyu Liu (November 1st 2008). Dynamic Omnidirectional Vision Localization Using a Beacon Tracker Based on Particle Filter, Computer Vision, Xiong Zhihui, IntechOpen, DOI: 10.5772/6166. Copyright: © 2008 by authors and Intech. This paper is an open access article distributed under a Creative Commons Attribution 3.0 License
for example. However, the environment has to contain some natural or artificial features that can be observed with the vision system, and these features have to have some relationship to spatial locations in the navigation environment (Cao, 2001). The omnidirectional camera system produces a spherical field of view of an environment. This is particularly useful in vision-based navigation systems, as all the images provided by the camera system contain the same information independent of the rotation of the robot about the optical axis of the camera. This makes the computed image features more suitable for localization and navigation purposes (Hrabar & Sukhatme, 2003; Hampton et al., 2004). The methods proposed here have been developed for vision-based navigation of Autonomous Ground Vehicles which utilize an omni-directional camera system as the vision sensor. The complete vision-based navigation system has also been implemented, including the omni-directional color camera system, the image processing algorithms and the navigation algorithms. The aim is to provide a robust platform that can be utilized in both indoor and outdoor AGV applications (Cauchois et al., 2005; Sun et al., 2004). The fisheye lens is one of the most efficient ways to establish an omnidirectional vision system. The structure of the fisheye lens is relatively dense and well-knit, unlike the structure of reflector lenses, which consist of two parts and are fragile (Li et al., 2006; Ying et al., 2006). Omnidirectional vision (omni-vision) holds promise for various applications. We use an upward-facing fisheye lens with a view angle of 185° to build the omni-directional vision system. Although the fisheye lens has the advantage of an extremely wide angle of view, there is an inherent distortion in the fisheye image which must be rectified to recover the original image; an approach for the geometric restoration of omni-vision images therefore has to be considered. The mapping between image coordinates and the physical space parameters of the targets can be obtained by means of the imaging principle of the fisheye lens. First, a method for calibrating the omni-vision system is proposed. The method relies on a cylinder whose inner wall contains several straight lines, which are used to calibrate the center, radius and gradient of the fisheye lens. These calibration parameters can then be used for the correction of distortions. Several imaging rules are conceived for fisheye lenses; the rules are discussed and the corresponding distortion correction models are generated, and an integral distortion correction approach based on these models is developed. A support vector machine (SVM) is introduced to regress the intersection points in order to
get the mapping between the fisheye image coordinates and the real-world coordinates. The advantage of using the SVM is that the projection model of the fisheye lens, which would otherwise need to be acquired from the manufacturer, can be ignored. Omni-directional vision navigation for autonomous guided vehicles (AGVs) is clearly significant because of the advantage of a panoramic sight within a single compact visual scene. This unique guidance technique involves target recognition, vision tracking, object positioning and path programming. An algorithm for omni-vision based global localization which utilizes two overhead features as a beacon pattern is proposed. The localization of the robot can be achieved by geometric computation with real-time processing. Dynamic localization employs a beacon tracker to follow the landmarks in real time during the arbitrary movement of the vehicle. The coordinate transformation is devised for path programming based on time-sequence image analysis. Beacon recognition and tracking are key procedures for an omni-vision guided mobile unit. Conventional image processing techniques, such as shape decomposition, description and matching, are not directly applicable in omni-vision. Vision tracking based on various advanced algorithms has been developed. Particle filter-based methods provide a promising approach to vision-based navigation, as they are computationally efficient and can be used to combine information from various sensors and sensor features. A beacon tracking-based method for robot localization has already been investigated at the Tianjin University of Technology, China. The method utilizes the color histogram, provided by a standard color camera system, in finding the spatial location of the robot with the highest probability (Musso & Oudjane, 2000; Menegatti et al., 2006). The particle filter (PF) has been shown to be successful for several nonlinear estimation problems. A beacon tracker based on the particle filter, which offers a probabilistic framework for dynamic state estimation in visual tracking, has been developed. We use two particle filters independently to track the two landmarks, while a composite multiple-object tracking algorithm is conducted for vehicle localization. To deal with the heavy computation of vision tracking, a processor with effective computation capability and low energy cost is required. The Digital Signal Processor (DSP) fits our demands, being well known for its powerful computation capability and parallel execution of instructions (Qi et al., 2005). It has been widely used in complicated algorithm computations such as video/image processing, audio signal analysis and intelligent control. However, there are few cases in which a DSP has been applied to image tracking as the central processing unit. In our AGV platform, a DSP has been implemented as a
compatible on-board imaging tracker that executes the Particle Filter algorithm. An integrated autonomous vehicle navigator based on a configuration with a Digital Signal Processor (DSP) and a Field-Programmable Gate Array (FPGA) has been implemented, and the tracking and localization functions have been demonstrated on an experimental platform.
CALIBRATION FOR FISHEYE LENS CAMERA
According to the fisheye imaging characteristics (Wang, 2006), the rectification of the fisheye image consists of two main phases: first, the center of the fisheye lens needs to be calibrated; second, the mapping between the physical space coordinates and the fisheye image coordinates must be established. Approaches for the geometric restoration of omni-vision images have been considered in several papers since the fisheye lens came into use (Cao et al., 2007). Some parameters, such as the center and focal length of the fisheye lens, are of primary importance in the geometric restoration. Calibration using distortion models has been discussed in recent papers (Wang et al., 2006; Li et al., 2006; Brauer-Burchardt & Voss, 2001), where the calibration parameters are retrieved by the method of least squares and mathematical models. The previous approach utilizes grids drawn on a planar surface; the grids become distorted when captured by the fisheye lens camera (Hartley & Kang, 2007). Here, another method for calibrating the center of omni-vision images is proposed. If a straight line in physical space is parallel to the optical axis of the fisheye lens, the line is not distorted in the fisheye image. Therefore, a cylinder model is proposed in this article. To construct the cylinder model, straight lines are drawn on the inner side of a cylinder whose axis is parallel to the optical axis of the fisheye camera, and the camera lens is then enclosed by this cylinder. The image captured with the fisheye camera under the cylinder model is shown in Fig. 1; the intersection of all the lines is the fisheye lens center. To get the conversion relationship between the physical space coordinates and the fisheye image coordinates, the following method is utilized. The lower vertex of the vertical strip that lies in the middle of the image is at the center of the fisheye optical projection, which is the origin of the fisheye coordinate system, as shown in Fig. 2.
Figure 1: Radial straight lines in fisheye lens image under cylinder model.
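The center-calibration step can be prototyped with standard image-processing tools. The sketch below is only an illustration of the idea, not the chapter's implementation: it detects the undistorted radial lines in the cylinder-model image with a Hough transform and estimates the fisheye center as the least-squares intersection of those lines. The file name and Hough thresholds are placeholders.

```python
# Sketch: estimate the fisheye image center as the common intersection of the
# radial straight lines obtained with the cylinder calibration model.
import cv2
import numpy as np

def estimate_fisheye_center(image_path):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 50, 150)                      # edge map of the radial strips
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 120)    # lines in (rho, theta) normal form
    if lines is None:
        raise RuntimeError("no lines detected; adjust the thresholds")

    # Each detected line satisfies x*cos(theta) + y*sin(theta) = rho.
    # The center is the point minimizing the squared distance to all lines.
    A = np.array([[np.cos(t), np.sin(t)] for r, t in lines[:, 0]])
    b = np.array([r for r, t in lines[:, 0]])
    center, *_ = np.linalg.lstsq(A, b, rcond=None)        # least-squares intersection
    return center                                          # (cx, cy) in pixels

if __name__ == "__main__":
    print(estimate_fisheye_center("cylinder_calibration.jpg"))  # hypothetical file name
```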
The horizontal strips are placed at equal intervals, so the intersection points of the vertical and horizontal strips are separated by equal radial distances in physical space. As a result of the fisheye distortion, the distances between consecutive intersection points are not equal in the image, but the corresponding coordinates of the intersection points in the fisheye image can still be obtained.
Figure 2: Calibration for omnidirectional vision system.
Then we use a support vector machine (SVM) to regress the intersection points in order to get the mapping between the fisheye image coordinate and the undistorted image coordinate. The advantage of using the SVM is that the projection model of fisheye lens which needs to be acquired from the manufacturer can be ignored.
RECTIFICATION FOR FISHEYE LENS DISTORTION

Fisheye Lens Rectification Principle
The imaging principle of fisheye lens is different from that of a conventional camera. The inherent distortion of the fisheye lens is induced when a 2π steradian hemisphere is projected onto a plane circle. Lens distortion can be expressed as (Wang et al., 2006):
$u_d = u + \delta_u(u, v), \quad v_d = v + \delta_v(u, v)$  (1)

where $u$ and $v$ refer to the unobservable distortion-free image coordinates, $u_d$ and $v_d$ are the corresponding image coordinates with distortion, and $\delta_u$ and $\delta_v$ are the distortions in the u and v directions. Fisheye lens distortion can be classified into three types: radial distortion, decentering distortion, and thin prism distortion. The first produces only radial deviation, while the other two produce both radial and decentering deviations. Generally, radial distortion is considered to be predominant; it is mainly caused by the nonlinear change in radial curvature. The further a projected image point lies from the center of the lens, the larger its deformation is. Owing to the different structures of lenses, there are two types of deformation: in one, the magnification becomes greater as the distance between a point and the center of radial distortion increases; in the other, it becomes smaller. The mathematical model is as follows (Wang et al., 2006):

$\delta_{ur} = u\,(k_1 r^2 + k_2 r^4 + k_3 r^6), \quad \delta_{vr} = v\,(k_1 r^2 + k_2 r^4 + k_3 r^6)$  (2)
where $k_1$, $k_2$, $k_3$ are radial distortion coefficients and $r$ is the distance from the point $(u, v)$ to the center of radial distortion. The first term is predominant, and the second and third terms are usually negligible, so the radial distortion formula can usually be reduced to (Wang et al., 2006):

$\delta_{ur} = u\,k_1 r^2, \quad \delta_{vr} = v\,k_1 r^2$  (3)

Here, we consider only radial distortion and neglect the other types. Let $(u, v)$ be the measurable coordinates of the distorted image points, $(x, y)$ be the coordinates of the undistorted image points, and the function $f$ be the conversion relationship, which can be expressed as:

$(x, y) = f(u, v)$  (4)

Thus, the relationship between the fisheye image coordinates and the physical world image coordinates is obtained.
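As a rough illustration of the reduced radial model of Eq. (3), the sketch below maps a distorted point toward its ideal position using a first-order inversion of $u_d = u(1 + k_1 r^2)$. The distortion center, the coefficient k1, and the example point are placeholders, and the inversion uses the distorted radius, so it is only approximate.

```python
# Sketch: first-order undistortion of one point under the reduced radial model (Eq. (3)).
def undistort_point(ud, vd, center, k1):
    """Map a distorted image point (ud, vd) toward its ideal position.

    center -- calibrated fisheye/distortion center (cx, cy)
    k1     -- first radial distortion coefficient (placeholder value below)
    """
    cx, cy = center
    u, v = ud - cx, vd - cy          # coordinates relative to the distortion center
    r2 = u * u + v * v               # squared radial distance (of the distorted point)
    scale = 1.0 / (1.0 + k1 * r2)    # first-order inverse of u_d = u * (1 + k1 * r^2)
    return cx + u * scale, cy + v * scale

if __name__ == "__main__":
    print(undistort_point(412.0, 301.5, center=(320.0, 240.0), k1=1.2e-6))
```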
Fisheye Lens Image Rectification Algorithm
In the conventional method, the procedure for obtaining the distortion parameters is complicated and the calculation is intensive. The Support Vector Machine (SVM) is a statistical machine learning method that performs well at density estimation, regression, and classification (Zhang et al., 2005), and it is well suited to small example sets. It finds a global minimum of the upper bound on the actual risk using structural risk minimization, and it avoids complex calculations in high-dimensional space by means of kernel functions. An SVM maps the input data into a high-dimensional feature space and finds an optimal separating hyperplane that maximizes the margin between two classes in this space. Maximizing the margin is a quadratic programming problem and can be solved with standard optimization algorithms (Wang et al., 2005). The goal of the SVM is to produce a model that predicts the relationship between data in the testing set. To reduce the computational complexity, we employ an SVM to train a mapping from the fisheye image coordinates to the undistorted image coordinates. The SVM trains an optimal mapping between input and output data, based on which the fisheye lens image can be accurately corrected. In order to rectify the fisheye image, we have to obtain the radial distortion for all distorted image
points. Based on the conversion model and the strong regression ability of the SVM, we select a large number of distorted image points (u, v) and input them to the SVM. The SVM calculates the radial distortion distance and regresses (u, v) to (x, y), the undistorted image point, so that the mapping between the distorted image points and the undistorted image points can be obtained. The whole process of fisheye image restoration is shown in Fig. 3.
Figure 3: Flow chart of fisheye image restoration algorithm.
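The SVM-based rectification can be sketched with an off-the-shelf regressor. The code below is a hedged illustration, not the authors' implementation: two support vector regressors learn the mapping from distorted coordinates (u, v) to undistorted coordinates (x, y). The training pairs here are synthetic placeholders standing in for the intersection-point correspondences collected with the cylinder/strip pattern.

```python
# Sketch: learn the fisheye-to-undistorted coordinate mapping with SVR, so that
# no analytic projection model from the lens manufacturer is needed.
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

# Synthetic calibration correspondences: distorted points (u, v) and their
# known undistorted targets (x, y).  Real data would come from the measured
# intersection points of the calibration strips.
rng = np.random.default_rng(0)
uv = rng.uniform(-1.0, 1.0, size=(500, 2))          # distorted coords (normalized)
r2 = np.sum(uv ** 2, axis=1, keepdims=True)
xy = uv * (1.0 + 0.3 * r2)                          # toy "true" undistorted coords

model = MultiOutputRegressor(SVR(kernel="rbf", C=10.0, epsilon=1e-3))
model.fit(uv, xy)                                   # regress (u, v) -> (x, y)

query = np.array([[0.5, -0.2]])
print("undistorted estimate:", model.predict(query))
```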
A number of experiments for fisheye lens image rectification have been implemented. By comparison, the results verify the feasibility and validity of the algorithm. The results are shown in Fig. 4.
Figure 4: A fisheye image (above) and the corrected result of a fisheye image (below).
OMNI-VISION TRACKING AND LOCALIZATION BASED ON PARTICLE FILTER

Beacon Recognition
Selecting landmarks is vital to mobile robot localization and navigation. However, natural signs are usually not stable and are subject to many external influences, so we use indoor signs as landmarks. According to the localization algorithm, at least two color landmarks are required, which are placed on the edge of the AGV moving area. We can easily change the size, color, and position of the landmarks. The heights of the two landmarks and the distance between them are measured as known parameters. At the beginning of tracking, the tracker first has to detect the landmarks. In our experiment, we use the Hough algorithm to recognize the landmark in the first frame and use it as the prior probability value. The Hough transform has been widely used to detect patterns, especially well-parameterized patterns such as lines, circles, and ellipses (Guo et al., 2006). Here we utilize a DSP, which is faster than a PC for this task, to perform the Circular Hough Transform. The pattern recognition using the CHT (Circular Hough Transform) is shown in Fig. 5.
Figure 5: A circle object (above) and the result of Circular Hough Transform (below).
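The first-frame beacon detection described above can be illustrated with the Circular Hough Transform available in OpenCV. This is only a sketch; the detection parameters and the input file name are placeholders, and the actual system runs the transform on a DSP rather than on a PC.

```python
# Sketch: initialize the beacon position with the Circular Hough Transform,
# as used for the first frame before the particle filters take over.
import cv2
import numpy as np

def detect_beacon(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 5)                      # suppress noise before the CHT
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1.2, minDist=40,
                               param1=120, param2=40, minRadius=5, maxRadius=60)
    if circles is None:
        return None
    x, y, r = np.around(circles[0, 0]).astype(int)      # strongest circle
    return (x, y), r                                    # beacon center and radius in pixels

if __name__ == "__main__":
    frame = cv2.imread("first_frame.jpg")               # hypothetical input image
    print(detect_beacon(frame))
```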
Tracking Based on Particle Filter
After obtaining the initialization value, the two particle filters track the landmarks continuously. Particle filtering is a Monte Carlo sampling approach to Bayesian filtering. The main idea of the particle filter is that the posterior density is approximated by a set of discrete samples with associated weights. These discrete samples, called particles, describe possible instantiations of the state of the system. As a consequence, the distribution over the location of the tracked object is represented by the multiple discrete particles (Cho et al., 2006).
In Bayes filtering, the posterior distribution over the current state $X_t$, given all observations $Z_{1:t}$ up to time t, is iteratively updated as follows:

$p(X_t \mid Z_{1:t}) \propto p(Z_t \mid X_t) \int p(X_t \mid X_{t-1})\, p(X_{t-1} \mid Z_{1:t-1})\, dX_{t-1}$  (5)

where $p(Z_t \mid X_t)$ is the observation model, which specifies the likelihood of an object being in a specific state, and $p(X_t \mid X_{t-1})$ is the transition model, which specifies how objects move between frames. In a particle filter, the posterior distribution is approximated recursively as a set of N weighted samples $\{X_t^{(i)}, w_t^{(i)}\}_{i=1}^{N}$, where $w_t^{(i)}$ is the weight of particle i. Based on the Monte Carlo approximation of the integral, we can get:

$p(X_t \mid Z_{1:t}) \approx k\, p(Z_t \mid X_t) \sum_{i=1}^{N} w_{t-1}^{(i)}\, p(X_t \mid X_{t-1}^{(i)})$  (6)

The principal steps in the particle filter algorithm are:

STEP 1 (Initialization). Generate the particle set $\{X_0^{(i)}\}_{i=1}^{N}$ from the initial distribution $p(X_0)$, and set k = 1.

STEP 2 (Propagation). For i = 1, ..., N, sample $X_k^{(i)}$ according to the transition model $p(X_k \mid X_{k-1}^{(i)})$.

STEP 3 (Weighting). Evaluate the importance likelihood

$w_k^{(i)} = p(Z_k \mid X_k^{(i)})$  (7)

STEP 4 (Normalization). Normalize the weights

$\tilde{w}_k^{(i)} = w_k^{(i)} \Big/ \sum_{j=1}^{N} w_k^{(j)}$  (8)

and output a set of particles that can be used to approximate the posterior distribution as

$p(X_k \mid Z_{1:k}) \approx \sum_{i=1}^{N} \tilde{w}_k^{(i)}\, \delta\big(X_k - X_k^{(i)}\big)$  (9)

where $\delta(\cdot)$ is the Dirac delta function.

STEP 5 (Resampling). Resample the particles $X_k^{(i)}$ with probability $\tilde{w}_k^{(i)}$ to obtain N independent and identically distributed random particles approximately distributed according to $p(X_k \mid Z_{1:k})$.

STEP 6 Set k = k + 1 and return to STEP 2.
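A compact, generic implementation of the steps above is sketched below. It is an illustration only: it tracks a single beacon in image coordinates with a random-walk transition model and an arbitrary user-supplied likelihood, whereas the chapter's tracker runs two such filters on a DSP with color-histogram likelihoods.

```python
# Sketch: bootstrap (SIR) particle filter for tracking one beacon position.
import numpy as np

class BeaconParticleFilter:
    def __init__(self, init_xy, n=300, motion_std=8.0):
        self.n = n
        self.motion_std = motion_std
        # STEP 1: initialize particles around the first-frame CHT detection.
        self.particles = np.tile(np.asarray(init_xy, float), (n, 1))
        self.particles += np.random.normal(0.0, motion_std, self.particles.shape)
        self.weights = np.full(n, 1.0 / n)

    def step(self, likelihood_fn):
        # STEP 2: propagate with a random-walk transition model.
        self.particles += np.random.normal(0.0, self.motion_std, self.particles.shape)
        # STEP 3: weight each particle by the observation likelihood.
        self.weights = np.array([likelihood_fn(p) for p in self.particles]) + 1e-12
        # STEP 4: normalize the weights and form the state estimate.
        self.weights /= self.weights.sum()
        estimate = np.average(self.particles, axis=0, weights=self.weights)
        # STEP 5: resample with probability proportional to the weights.
        idx = np.random.choice(self.n, self.n, p=self.weights)
        self.particles = self.particles[idx]
        self.weights = np.full(self.n, 1.0 / self.n)
        return estimate  # estimated beacon position (x, y) in pixels

if __name__ == "__main__":
    true_pos = np.array([200.0, 150.0])
    # Toy likelihood: Gaussian in the distance to a fixed beacon position.
    lik = lambda p: np.exp(-np.sum((p - true_pos) ** 2) / (2 * 15.0 ** 2))
    pf = BeaconParticleFilter(init_xy=(190, 160))
    for _ in range(5):
        print(pf.step(lik))
```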
Omni-vision Based AGV Localization
In this section, we discuss how to localize the AGV utilizing the spatial and image information of the landmarks. As shown in Fig. 6, two color beacons fixed on the edge of the AGV moving area serve as landmarks to facilitate navigation. The AGV can localize itself with the fisheye lens camera mounted on top of it. The heights of the two landmarks and the distance between them are measured as known parameters. While the AGV is being navigated, the two landmarks are tracked by two particle filters to obtain the landmark positions in the image.
Figure 6: Physical space coordinates system for landmarks localization
Figure 7: Left figure shows the relationship between incident angles and radial distances of the fisheye lens. Right figure illustrates the values of corresponding incident angles with different grey levels in the whole area of the fisheye sphere image.
According to the Equal Distance Projection Regulation, the angle of view ω corresponds to the radial distance r between the projection point and the projection center. As shown in Fig. 7, the mapping between ω and r can be established. Based on this mapping, the image coordinates and the space angle of a landmark are connected. Using the depression angles obtained from the images and the calibrated parameters of the landmarks, the physical space position of the AGV can be determined. We tag the landmarks as A and B. To set up the physical coordinate system, A is chosen as the origin, AB is set as axis X with the direction from A to B as its positive orientation, and axis Y is perpendicular to axis X. According to the space geometry relations, we can get:

$\sqrt{x^2 + y^2} = \dfrac{h_1 - v}{\tan\theta_1}, \qquad \sqrt{(x - d)^2 + y^2} = \dfrac{h_2 - v}{\tan\theta_2}$  (10)

where (x, y) is the physical space coordinate of the lens, $h_1$ and $h_2$ are the heights of the two landmarks, d is the horizontal distance between the two landmarks, v is the height of the lens above the ground, and $\theta_1$ and $\theta_2$ are the depression angles from landmarks A and B to the lens. Solving these relations gives x and y. Here, y is taken to be nonnegative, so the moving path of the AGV should stay on one side of the landmarks, which is half of the space.
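One possible implementation of this localization step is sketched below. It assumes the geometry stated above (landmark A at the origin, B on the positive X axis) and solves the two range relations of Eq. (10) in closed form; the chapter's own formulation may differ in detail. The depression angles in the example are placeholders (in practice they would be derived from the tracked beacon pixels via the ω-r mapping of Fig. 7).

```python
# Sketch: locate the lens (AGV) from the depression angles of two landmarks
# of known height, following the two-landmark geometry described above.
import math

def localize(theta1, theta2, h1, h2, d, v):
    """theta1, theta2: depression angles (rad) from landmarks A and B to the lens;
    h1, h2: landmark heights (m); d: horizontal distance A-B (m); v: lens height (m)."""
    r1 = (h1 - v) / math.tan(theta1)                 # horizontal range to landmark A
    r2 = (h2 - v) / math.tan(theta2)                 # horizontal range to landmark B
    x = (r1 ** 2 - r2 ** 2 + d ** 2) / (2.0 * d)     # from subtracting the two range equations
    y = math.sqrt(max(r1 ** 2 - x ** 2, 0.0))        # y is taken non-negative, as in the text
    return x, y

if __name__ == "__main__":
    # Placeholder angles; heights and distance follow the experimental setup
    # (h1 = 2.43 m, h2 = 2.46 m, d = 1.67 m, lens height v = 0.88 m).
    print(localize(math.radians(40.0), math.radians(35.0), 2.43, 2.46, 1.67, 0.88))
```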
NAVIGATION SYSTEM

Algorithm Architecture of Navigator
A dynamic omni-vision navigation technique for mobile robots is being developed. Navigation functions involve positional estimation and surrounding perception. Landmark guidance is a general method for vision navigation in structural environments. An improved beacon tracking and positioning approach based on a Particle Filter algorithm has been utilized. Some typical navigation algorithms have already been implemented, such as the classic PID compensator, the neural-fuzzy algorithm, and so on. The multisensory information fusion technique has been integrated into the program. The hybrid software and hardware platform has been developed.
The algorithm architecture of the on-board navigator, as shown in Fig. 8, consists of the following phases: image collection, image pre-processing, landmark recognition, beacon tracking, vehicle localization and path guidance. The image distortion correction and recovery for omni-vision is a critical module in the procedures, which provides coordinate mapping for position and orientation.
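The phase sequence above can be summarized as a per-frame loop. The sketch below only illustrates the data flow; every component is a stub standing in for the corresponding module of this chapter, not an implementation of it.

```python
# Sketch: per-frame data flow of the on-board navigator (all components are stubs).
class StubCamera:
    def grab(self):
        return "frame"

def preprocess(frame):               # image pre-processing (e.g., thresholding/binarization)
    return frame

def detect_beacons(frame):           # landmark recognition (Circular Hough Transform)
    return [(200, 150), (260, 150)]

def track_beacons(frame, beacons):   # beacon tracking (two particle filters)
    return beacons

def localize(beacons_px):            # vehicle localization from the beacon geometry
    return (0.5, 1.2, 90.0)          # x (m), y (m), heading (deg)

def guide(pose):                     # path guidance / steering command
    return "forward"

def navigator_step(camera, beacons):
    frame = preprocess(camera.grab())                       # image collection + pre-processing
    beacons = track_beacons(frame, beacons or detect_beacons(frame))
    pose = localize(beacons)
    return guide(pose), beacons

if __name__ == "__main__":
    cam, beacons = StubCamera(), None
    for _ in range(3):
        cmd, beacons = navigator_step(cam, beacons)
        print(cmd)
```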
Hardware Configuration of Navigator
The design of a navigator for mobile robots requires integrating the algorithms with the hardware, and real-time performance is critical for localization and navigation. Most image processing platforms use a PC with x86 CPUs, which presents limitations for an on-board vehicle navigator because of redundant resources, energy consumption, and space requirements. This article presents a compatible embedded real-time image processor for AGVs that uses a Digital Signal Processor (DSP) and a Field-Programmable Gate Array (FPGA) for the image processing component. The hardware configuration of the navigator is shown in Fig. 9.
Figure 8: The algorithm architecture of the navigator.
Figure 9: Hardware configuration of the unique navigator.
The DSP uses Enhanced DMA (EDMA) to transfer data efficiently between the DSP and the external navigation module. Pipelining and code optimization are also applied to sharply increase the processing speed. An FPGA preprocessor binarizes the images with a given threshold before processing starts and also provides some necessary trigger signal functions. With this coprocessor, it is possible to accelerate all navigator processes; the DSP and FPGA cooperate with each other to solve the real-time performance problems, and the flexible framework is reasonable and practical. The navigation module consists of an embedded platform, multiple sensors, and an internet port. The embedded system serves as the navigation platform and provides the following functions: vehicle localization, line-following path error correction, and obstacle avoidance through the multi-sensory capability. There are three operation modes: remote control, Teach/Playback, and autonomous. The internet port provides wireless communication and human-computer interaction, and the motor servo system is used for motion control. With the prototype we have obtained satisfying experimental results.
EXPERIMENTAL RESULTS
The final system has been implemented on a real omni-directional vision AGV in an indoor environment, which verified both the practicability and the feasibility of the design. The prototype experimental platform is shown in Fig. 10.
Figure 10: Experimental autonomous guided vehicle platform.
We performed the experiment twice to show the results. Two beacons with different colors were placed on the roof as landmarks, and a color histogram was used as the feature vector in the particle filters. The experimental area is about 30 square meters. The heights of landmarks A and B are 2.43 m and 2.46 m, respectively, the distance between them is 1.67 m, and the height of the lens is 0.88 m. At initialization, the original positions of the landmarks in the image are set for the tracker. The AGV guided by the two color landmarks is shown in Fig. 11, and the driving path and orientation are shown in Fig. 12. We can see that the localization results are dispersed on both sides of the moving path. Fig. 12 shows the AGV orientation corresponding to the positions in the left figures for each localization cycle. The 16 fisheye images that were captured are shown in Fig. 13 and Fig. 14, and the numerical localization results are listed in Table 1 and Table 2.
Figure 11: Localization of the experimental AGV platform.
Figure 12: Localization and orientation of autonomous vehicle in experiment 1 (above) and 2 (below) (the units are meter and degree (angle)).
Figure 13: Results of dynamic beacon tracking based on particle filters in experiment 1.
Table 1: Localization results of experiment 1 (units are meter and degree (angle))
Figure 14: Results of dynamic beacon tracking based on particle filters in experiment 2.
Table 2: Localization results of experiment 2 (units are meter and degree (angle))
CONCLUSION
We established an omni-directional vision system with a fisheye lens and solved the problem of fisheye image distortion. A method for calibrating the omni-vision system is proposed to determine the center of the fisheye lens image.
A novel fisheye image rectification algorithm based on the SVM, which differs from the conventional method, is introduced. Beacon recognition and tracking are key procedures for an omni-vision guided mobile unit. The Particle Filter (PF) has been shown to be successful for several nonlinear estimation problems, and a beacon tracker based on a Particle Filter, which offers a probabilistic framework for dynamic state estimation in visual tracking, has been developed. Dynamic localization employs the beacon tracker to follow the landmarks in real time during arbitrary movement of the vehicle, and the coordinate transformation is devised for path programming based on time-sequence image analysis. Conventional image processing techniques such as shape decomposition, description, and matching are not directly applicable to omni-vision. We have implemented the tracking and localization system and demonstrated the relevance of the algorithm. The significance of the proposed research is the evaluation of a new calibration method, a global navigation device, and a dynamic omnidirectional vision navigation control module using a beacon tracker based on a particle filter, through a probabilistic algorithm from statistical robotics. An on-board omni-vision navigator based on a compatible DSP configuration is powerful for autonomous vehicle guidance applications.
ACKNOWLEDGMENTS
This article contains results of the international science and technology collaboration projects (2006DFA12410, 2007AA04Z229) supported by the Ministry of Science and Technology of the People's Republic of China.
REFERENCES
1. Brauer-Burchardt, C. & Voss, K. (2001). A new Algorithm to correct fisheye and strong wide angle lens distortion from single images, Proceedings of 2001 International Conference on Image Processing, pp. 225-228, ISBN: 0-7803-6725-1, Thessaloniki, Greece, October 2001.
2. Cao, Z. L. (2001). Omni-vision based Autonomous Mobile Robotic Platform, Proceedings of SPIE Intelligent Robots and Computer Vision XX: Algorithms, Techniques, and Active Vision, Vol. 4572, (2001), pp. 51-60, Newton, USA.
3. Cao, Z. L.; Liu, S. Y. & Röning, J. (2007). Omni-directional Vision Localization Based on Particle Filter, Image and Graphics 2007, Fourth International Conference, pp. 478-483, Chengdu, China, August 2007.
4. Cauchois, C.; Chaumont, F.; Marhic, B.; Delahoche, L. & Delafosse, M. (2005). Robotic Assistance: an Automatic Wheelchair Tracking and Following Functionality by Omnidirectional Vision, Proceedings of the 2005 IEEE International Conference on Intelligent Robots and Systems, pp. 2560-2565, ISBN: 0-7803-8912-3, Las Vegas, USA.
5. Cho, J. U.; Jin, S. H.; Pham, X. D.; Jeon, J. W.; Byun, J. E. & Kang, H. (2006). A Real-Time Object Tracking System Using a Particle Filter, Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2822-2827, ISBN: 1-4244-0259-X, Beijing, China, October 2006.
6. Guo, S. Y.; Zhang, X. F. & Zhang, F. (2006). Adaptive Randomized Hough Transform for Circle Detection using Moving Window, Proceedings of 2006 International Conference on Machine Learning and Cybernetics, pp. 3880-3885, ISBN: 1-4244-0061-9, Dalian.
7. Hampton, R. D.; Nash, D.; Barlow, D.; Powell, R.; Albert, B. & Young, S. (2004). An Autonomous Tracked Vehicle with Omnidirectional Sensing. Journal of Robotic Systems, Vol. 21, No. 8, (August 2004) (429-437), ISSN: 0741-2223.
8. Hartley, R. & Kang, S. B. (2007). Parameter-Free Radial Distortion Correction with Center of Distortion Estimation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 8, (August 2007) (1309-1321), ISSN: 0162-8828.
9. Hrabar, S. & Sukhatme, G. S. (2003). Omnidirectional Vision for an Autonomous Helicopter, Proceedings of the 2003 IEEE International Conference: Robotics and Automation, Vol. 1, pp. 558-563, Los Angeles, USA, September 2003.
10. Ishii, C.; Sudo, Y. & Hashimoto, H. (2003). An image conversion algorithm from fish eye image to perspective image for human eyes, Proceedings of the 2003 IEEE/ASME International Conference on Advanced Intelligent Mechatronics, pp. 1009-1014, ISBN: 0-7803-7759-1, Tokyo, Japan.
11. Li, S. G. (2006). Full-View Spherical Image Camera, Pattern Recognition, 2006, ICPR 2006, 18th International Conference on Pattern Recognition, pp. 386-390.
12. Menegatti, E.; Pretto, A. & Pagello, E. (2006). Omnidirectional Vision Scan Matching for Robot Localization in Dynamic Environments, Robotics and Automation, IEEE Transactions, Vol. 22, No. 3, (June 2006) (523-535).
13. Musso, C. & Oudjane, N. (2000). Recent Particle Filter Applied to Terrain Navigation, Proceedings of the Third International Conference on Information Fusion, Vol. 2, pp. 26-33, ISBN: 2-7257-0000-0, Paris, France.
14. Qi, C.; Sun, F. X. & Huang, T. S. (2005). The real-time image processing based on DSP, Cellular Neural Networks and Their Applications, 2005 9th International Workshop, pp. 40-43, ISBN: 0-7803-9185-3.
15. Sun, Y. J.; Cao, Q. X. & Chen, W. D. (2004). An Object Tracking and Global Localization Method Using Omnidirectional Vision System, Intelligent Control and Automation on 2004 Fifth World Congress, Vol. 6, pp. 4730-4735, ISBN: 0-7803-8273-0, Harbin, China.
16. Wang, L. P.; Liu, B. & Wan, C. R. (2005). Classification Using Support Vector Machines with Graded Resolution, Proceedings of 2005 IEEE International Conference on Granular Computing, Vol. 2, pp. 666-670, ISBN: 0-7803-9017-2, July 2005.
17. Wang, J. H.; Shi, H. F.; Zhang, J. & Liu, Y. C. (2006). A New Calibration Model and Method of Camera Lens Distortion, Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5317-5718, ISBN: 1-4244-0259-X, Beijing, China, October 2006.
18. Wang, Y. Z. (2006). In: Fisheye Lens Optics, China Science Press, 2661, ISBN: 7-03-017143-8, Beijing, China.
19. Ying, X. H. & Zha, H. B. (2006). Using Sphere Images for Calibrating Fisheye Cameras under the Unified Imaging Model of the Central Catadioptric and Fisheye Cameras, ICPR 2006, 18th International Conference on Pattern Recognition, Vol. 1, pp. 539-542, Hong Kong.
20. Zhang, J. P.; Li, Z. W. & Yang, J. (2005). A Parallel SVM Training Algorithm on Large-scale Classification Problems, Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Vol. 3, pp. 1637-1641, Guangzhou, China, August 2005.
Chapter 3
QoE Assessment of Will Transmission Using Vision and Haptics in Networked Virtual Environment
Pingguo Huang¹, Yutaka Ishibashi²
¹ Department of Management Science, Tokyo University of Science, Tokyo, Japan
² Department of Scientific and Engineering Simulation, Nagoya Institute of Technology, Nagoya, Japan
Citation: Huang, P. and Ishibashi, Y. (2014) QoE Assessment of Will Transmission Using Vision and Haptics in Networked Virtual Environment. Int. J. Communications, Network and System Sciences, 7, 265-278. http://dx.doi.org/10.4236/ijcns.2014.78029
Copyright: © 2018 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0

ABSTRACT
In this paper, we handle collaborative work in which two users move an object together to eliminate a target in a 3-D virtual space. In the work, the users transmit their wills about the movement direction of the object to each other by only haptics and by haptics and vision (including with/without drawing an arrow to indicate the direction of force applied to the object by the other user). We carry out QoE (Quality of Experience) assessment subjectively and objectively to investigate the influence of network delay on will transmission. As a result, we clarify the effects of vision on the transmissibility of haptic will transmission.

Keywords: Networked Virtual Environment, Vision, Haptics, Will Transmission, Network Delay, QoE
INTRODUCTION
In recent years, a number of researchers have been directing their attention to networked haptic environments. Since we can largely improve the efficiency of collaborative work and obtain highly realistic sensations by using haptic sensation together with other sensations such as visual and auditory sensations [1]-[3], haptic sensation is utilized in various fields such as the medical, artistic, and education fields [4]. In haptic collaborative work, users can feel the sense of force from each other interactively while doing the work. In previous studies, the effects of visual sensation on the sensitivity to elasticity [5] and virtual manipulation assistance by haptics [6] have been investigated. Also, the influences of network delay on the operability of haptic interface devices have been investigated [7]-[12]. In particular, the experimental results in [5] show that the accuracy of the sensitivity threshold for elasticity is improved by visual sensation. However, there is very little research which focuses on will (for example, the object's movement direction) transmission among users. In [7] through [12], since the role of each user, the master-slave relationship, and/or the direction in which each user wants to move an object (i.e., the user's will) in collaborative work are determined in advance, it is not so important to transmit their wills to each other. In [13], two users lift and move an object cooperatively while one of the two users instructs the direction of the object's movement by voice, and the efficiency of the work with voice is compared to that without voice, in which one of the two users is asked to follow the other's movement. However, the role of each user, the master-slave relationship, and/or the movement direction of the object are not always determined beforehand, and there exist many cases in which the users stand on an equal footing. In these cases, it is necessary to transmit the users' wills to each other. Traditionally, wills can be transmitted by audio and video. However, it needs much more time
to transmit the wills, and it may be difficult to transmit the wills in delicate manipulation such as remote surgery training. In contrast, will transmission using haptic may reduce the transmission time, and it is possible to transmit wills in delicate manipulation work in which it is difficult to transmit wills only by audio and video. Therefore, it is very important to establish an efficient method to transmit wills by haptics for collaborative work, especially for delicate manipulation work. This paper focuses on will transmission using haptics. We investigate how to transmit wills by haptics, what type of forces are more suitable for will transmission, and the influences of vision on will transmission using haptics. Specifically, we deal with collaborative work in which users stand on an equal footing and object’s movement directions are not determined in advance. In the work, two users collaboratively lift and move an object, and each user tries to transmit his/her will about the object’s movement direction (i.e., user’s will) by haptics (for example, when the user wants to move the object to the right, he/she impresses a rightward force upon the object) to the other user. We deal with two types of will transmission methods in which the users transmit their wills to each other by only haptics and by haptics and vision (including with/without drawing an arrow to indicate the direction of force applied to the object by the other user). We carry out QoE (Quality of Experience) [14] assessment subjectively and objectively [15] to investigate the influences of network delay on will transmission. The remainder of this paper is organized as follows. Section 2 describes the collaborative work and the method of will transmission. Section 3 introduces our assessment system. Section 4 explains force calculation of the assessment system and assessment methods are explained in Section 5. Assessment results are presented in Section 6, and Section 7 concludes the paper.
WILL TRANSMISSION

Collaborative Work
In the collaborative work, we employ a client-server model [13] which consists of a server and two clients (clients 1 and 2). In a 3-D virtual space (height: 120 mm, width: 160 mm, depth: 70 mm) shown in Figure 1, two users manipulate haptic interface devices PHANToM Omnis [16] (called PHANToMs in this paper) which are connected to the two clients, and cooperatively lift and move an object (a rigid cube with a side of 20 mm
and a mass of 500 g; the cube does not tilt) by putting the object between the two cursors (which denote the positions of the PHANToM styli's tips in the virtual space) [13]. The static friction coefficient and dynamic friction coefficient between the cursors and the object are set to 0.6 and 0.4, respectively. Since the gravitational acceleration in the virtual space is set to 2.0 m/s²†1, the object drops onto the floor if it is not pushed from both sides strongly enough. Furthermore, at a location 35 mm from the back toward the front of the virtual space, there exists a lattice which consists of 20 squares (25 mm on a side). The two users cooperatively lift and move the object along the lines of the lattice to contain a target (a sphere with a radius of 20 mm) [13]. An intersection of the lattice in the i-th row (0 ≤ i ≤ 4) and the j-th column (0 ≤ j ≤ 5) is referred to as intersection (i, j) here (in Figure 1, the object is at intersection (3, 1)). When the target is contained by the object, it disappears and then appears at a randomly-selected intersection. When the users move the object to contain the target, they are asked to move the object via one of the shortest routes. For example, when the positions of the target and object are as shown in Figure 1, there exist multiple routes (called routes a through d here) from the object to the target (see Figure 2). Therefore, at the intersections where the users need to determine the object's movement direction (for example, at intersections (3, 1) and (2, 1) in route b, and at intersections (3, 1), (2, 1), and (1, 1) in routes c and d), they try to transmit their wills to each other by haptics.
Figure 1: Displayed image of virtual space.
Figure 2: Example of shortest routes.
If one user can know the other’s will, he/she tries to follow the will. Otherwise, he/she moves the object based on his/her own will.
Will Transmission Methods As described in Subsection 2.1, at the intersections where the users need to determine the object’s movement direction, they try to transmit their wills to each other. We deal with three cases (cases 1, 2, and 3) of will transmission. In case 1, the users transmit their wills to each other by only haptics. In case 2, they transmit their wills to each other by haptics and vision without drawing an arrow to indicate the direction of force applied to the object by the other user [17] , and in case 3, they do by haptics and vision with drawing the arrow. In case 1, for example, when the target and object are located at the positions shown in Figure 2, if the user at client 2 wants to move the object along route a, he/she needs to push the object to the left strongly. At this time, if the user at client 1 feels a strong leftward-force through the object, he/she makes a judgement that the other user wants to move the object to the left. If the user at client 2 wants to move the object along the other routes (i.e., routes b, c and d), that is, the user wants to move the object upward, he/she tries to push the object to the upper left. At this time, since the user at client 1 feels a little weaker leftward-force than that of the previous situation through the object, he/she judges that the other wants to move the object upward. If
the user at client 1 wants to move the object to the left, he/she applies a weak rightward-force to the object so that the object does not fall down. At this time, the user at client 2 feels a weak rightward force, and judges that the other wants to move the object to the left. If the user at client 1 wants to move the object upward, he/she applies a stronger rightward force than that when he/she wants to move the object to the upper left. At this time, the user at client 2 feels a strong rightward force, and knows that the other wants to move the object upward. In this case, their wills are transmitted by only haptics, and the other’s movement (i.e., the movement of the other’s cursor) cannot be seen at the local terminal (see Figure 3). In case 2, the user at each client can see the other’s movement (see Figure 1), and it may be possible for the user to know the other’s will by the movement of the other’s cursor. In case 3, at each client, we display the movement of the other’s cursor and draw an arrow to indicate the direction of force applied to the object by the other user (see Figure 4). The arrow is drawn with a starting point at the front side of the other’s cursor’s center point. The direction of the arrow is the same as the direction of the force applied by the other user, and its length is proportional to the magnitude of the force.
ASSESSMENT SYSTEM
As shown in Figure 5, the assessment system consists of a server and two clients which are connected via a switching hub and a network emulator (NIST Net [18]) by using Ethernet cables (100 Mbps). The packet size of each position information transmitted from the server to each client is 72 bytes, and that transmitted from each client to the server is 32 bytes. By using the network emulator, we generate a constant delay for each packet transmitted from each client to the server. We employ UDP as the transport protocol to transmit the packets. Functions of the server and clients are shown in Figure 6. We will explain the functions in what follows.
Figure 3: Displayed image of virtual space (will transmission by only haptics).
Figure 4: Displayed image of virtual space (will transmission by haptics and vision (with drawing cursor and arrow)).
Figure 5: Configuration of assessment system.
Figure 6: Functions at server and clients.
Each client performs haptic simulation by repeating the servo loop [16] at a rate of 1 kHz, and the client inputs/outputs position information at that rate. The client adds a timestamp and a sequence number to the position information and transmits it to the server. When the server receives the position information from the two clients, it calculates the position of the object every millisecond by using the spring-damper model [16] and transmits the position information to the two clients every millisecond. When each client receives the position information, the client calculates the reaction force applied to the user after updating the position of the object. The image is updated at a rate of about 60 Hz at each client.
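As a rough illustration of the client-to-server traffic, the sketch below packs one position sample with a sequence number and timestamp and sends it over UDP at roughly the servo-loop rate. The field layout, server address, and resulting packet size are assumptions for illustration only; the chapter specifies only the total packet sizes, not the format.

```python
# Sketch: sending one position sample with a timestamp and sequence number over UDP.
# The field layout here is hypothetical, not the assessment system's actual format.
import socket
import struct
import time

SERVER = ("192.0.2.10", 50000)      # placeholder server address
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_position(seq, pos_xyz):
    # hypothetical layout: uint32 sequence, double timestamp, 3 x double position
    packet = struct.pack("!Id3d", seq, time.time(), *pos_xyz)
    sock.sendto(packet, SERVER)

if __name__ == "__main__":
    for seq in range(3):
        send_position(seq, (10.0, 20.0, 5.0))   # stylus tip position in mm
        time.sleep(0.001)                        # ~1 kHz servo loop
```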
FORCE CALCULATION
When the two users (the user at client i (i = 1, 2) is called user i) collaboratively lift and move the object, the forces exerted by the two users, gravitation, and frictional forces are applied to the object. The resultant force $F_{obj}$ applied to the object is calculated as follows:

$\mathbf{F}_{obj} = \sum_{i=1}^{2} \left( \mathbf{F}_i + \mathbf{f}_i \right) + m\mathbf{g}$  (1)

where $\mathbf{F}_i$ is the force that user i exerts on the object, $\mathbf{f}_i$ is the frictional force between the object and the cursor of user i, m is the object's mass, and $\mathbf{g}$ is the gravitational acceleration in the virtual space. When the two users lift the object by putting it between the two cursors, we set the gravity of the object and the vertical component of the force exerted by user i ($F_{iv}$) to 0 so that the users can lift and move the object easily
(see Figure 7). This means that only the horizontal components of the forces ($F_{1h}$, $F_{2h}$) exerted by the two users and the frictional force are applied to the object at this time. The object is moved in the same direction as the resultant force. Both the kinetic friction force and the static friction force act on the object. The static friction force is calculated by using the mass of the cursor (45 g) and the acceleration of the cursor. When the static friction force is larger than the maximum static friction force, which is calculated by multiplying the horizontal component of the force exerted by user i by the coefficient of static friction (0.6), the kinetic friction force works. The kinetic friction force is calculated by multiplying the horizontal component of the force exerted by user i by the coefficient of kinetic friction (0.4). The reaction force applied to user i through the PHANToM is calculated by using the spring-damper model as follows:

$\mathbf{F}_i = K_S \mathbf{x}_i + K_D \mathbf{v}_i$  (2)

where $K_S$ (0.8 N/mm) and $K_D$ (0.2 N·ms/mm) are the spring coefficient and damper coefficient, respectively, $\mathbf{x}_i$ is the vector of the depth that the cursor of user i penetrates into the object, and $\mathbf{v}_i$ is the relative velocity of the cursor with respect to the object (see Figure 8).
Figure 7: Force applied to object when users cooperatively lift and move object.
Figure 8: Force applied to user.
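The force model of Eqs. (1) and (2) can be written compactly as below. This is a simplified sketch using the constants given in the text (K_S, K_D, the 500 g mass, and the 2.0 m/s² virtual-space gravity); friction handling and the zeroing of vertical components while lifting are omitted, and the example forces are placeholders.

```python
# Sketch: force computation following Eqs. (1) and (2) with the chapter's constants.
import numpy as np

K_S, K_D = 0.8, 0.2                                 # spring (N/mm) and damper (N*ms/mm) coefficients
MASS = 0.5                                          # object mass in kg (500 g)
GRAVITY = np.array([0.0, -2.0, 0.0])                # virtual-space gravity, m/s^2

def reaction_force(penetration_mm, relative_velocity_mm_per_ms):
    """Spring-damper reaction force applied to a user through the PHANToM (Eq. (2))."""
    return K_S * np.asarray(penetration_mm) + K_D * np.asarray(relative_velocity_mm_per_ms)

def object_force(user_forces, friction_forces):
    """Resultant force on the object (Eq. (1)): user forces + friction + gravity."""
    return sum(user_forces) + sum(friction_forces) + MASS * GRAVITY

if __name__ == "__main__":
    # Reaction force felt by user 1 for a 0.5 mm penetration moving at 0.1 mm/ms.
    print(reaction_force([0.5, 0.0, 0.0], [0.1, 0.0, 0.0]))
    # Resultant force on the object for two example horizontal user forces, no friction.
    f_users = [np.array([0.4, 0.0, 0.0]), np.array([-0.3, 0.0, 0.0])]
    print(object_force(f_users, [np.zeros(3), np.zeros(3)]))
```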
ASSESSMENT METHODS In the assessments, each pair of subjects is in two separate rooms. The pair of subjects is asked to do the collaborative work described in Subsection 2.1 while transmitting their wills to each other. If one of the pair can know the other’s will, he/she tries to follow the other’s will. Otherwise, he/she moves the object based on his/her own will. In order to clarify whether the wills are transmitted accurately or not, after passing each intersection at which the subjects need to transmit their wills, each subject is asked to stop moving†2 and tell whether he/she can know the other’s will, the will if he/she can know it, and his/her own will. Each subject is asked to determine the movement direction as randomly as possible at each intersection at which the subject needs to transmit his/her will in order to avoid moving the object in the same type of routes (for example, the routes which have the least intersections at which subjects need to transmit their wills ) every time. We carry out QoE assessments subjectively and objectively to assess the influences of will transmission by changing the constant delay. For subjective assessment, we enhance the single-stimulus method of ITU-R BT. 500-12 [19] . Before the assessment, each pair of subjects is asked to practice on the condition that there is no constant delay. In the practice, in order to avoid establishing the way of will transmission before the assessment, after the target is eliminated, it appears at a randomlyselected adjacent intersection. In this case, the object’s movement direction is already determined, and the subjects do not need to transmit their wills. After the practice, they are asked to do the collaborative work in case 3 once on the condition that there is no additional delay. The quality at this time is the standard in the assessment. Then, the subjects are asked to give a score from 1 through 5 (see Table 1) about the operability of PHANToM (whether it is easy to operate the PHANToM stylus), transmissibility of wills (whether their own wills can be transmitted to each other), and comprehensive quality (a synthesis of the operability and transmissibility) according to the degree of deterioration on the condition that there exist constant delays†3 to obtain the mean opinion score (MOS) [15] in cases 1 through 3. Objective assessment is carried out at the same time as the subjective assessment. As performance measures, we employ the average number of eliminated targets [13] , the average number of dropping times, the average number of passed intersections at which the subjects needed to transmit their wills, the average number of intersections at which one thought that he/ she knew the other’s wills, the average number of intersections at which
one could know the other’s will accurately, and the percentage of questions answered correctly (the percentage of the average number of intersections at which one could know the other’s will accurately to the average number of passed intersections at which the subjects needed to transmit their wills). Table 1: Five-grade impairment scale Score 5 4 3 2 1
Description Imperceptible Perceptible, but not annoying Slightly annoying Annoying Very annoying
In order to investigate the influence of constant delay, the delay is changed from 0 ms to 25 ms at intervals of 5 ms. We select the constant delay and the cases in random order for each pair of subjects. In the assessments, each test is done for 80 seconds, and it takes about 50 minutes for each pair of subjects to finish all the tests. The number of subjects whose ages are between 21 and 30 is twenty.
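For reference, MOS values and the 95% confidence intervals plotted in the next section can be computed as in the following sketch; the twenty scores are synthetic placeholders, not the assessment data.

```python
# Sketch: mean opinion score (MOS) and 95% confidence interval for one test condition.
import numpy as np
from scipy import stats

scores = np.array([4, 5, 3, 4, 4, 5, 3, 4, 2, 4,
                   5, 4, 3, 4, 4, 5, 4, 3, 4, 4])        # placeholder ratings (1-5)

mos = scores.mean()
sem = stats.sem(scores)                                   # standard error of the mean
ci_half = sem * stats.t.ppf(0.975, df=len(scores) - 1)    # 95% CI half-width (t distribution)
print(f"MOS = {mos:.2f} +/- {ci_half:.2f}")
```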
ASSESSMENT RESULTS
We show MOS values of the operability of PHANToM, transmissibility of wills, and comprehensive quality versus the constant delay in Figures 9-11, respectively. The average number of eliminated targets is shown in Figure 12, and Figure 13 shows the average number of dropping times. The average number of passed intersections at which the subjects needed to transmit their wills, the average number of intersections at which one thought that he/she knew the other's will, and the average number of intersections at which one could know the other's will accurately are shown in Figures 14-16, respectively. The percentage of questions answered correctly is shown in Figure 17. In the figures, we also plot the 95% confidence intervals. From Figures 9-11, we see that the MOS values of the operability, transmissibility of wills, and comprehensive quality deteriorate as the constant delay increases. The tendency is almost the same as those of the results in [8], [9], and [13], where the operability of haptic interface devices becomes worse as the network delay increases. Furthermore, we find in the figures that when the constant delay is smaller than about 10 ms, the MOS values of the operability of PHANToM are larger than 3.5 in the three
cases; this means that although subjects can perceive the deterioration, they do not think that it is annoying, and the deterioration in QoE is allowable [22] . We also assume that it is allowable when the MOS value is larger than 3.5 in this paper. From the figures, we notice that the MOS values of the transmissibility of wills are larger than 3.5 when the constant delay is smaller than about 5 ms, 10 ms, and 15 ms in cases 1, 2, and 3, respectively. For the comprehensive quality, the MOS values are larger than 3.5 when the constant delay is smaller than about 5 ms in case 1 and smaller than about 10 ms in cases 2 and 3. From Figure 9, we also find that the MOS values of the operability of PHANToM in all the cases are almost the same, and they do not depend on the cases. From Figure 10, we see that the MOS values in case 2 are larger than those in case 1, and the MOS values in case 3 are the largest. In order to confirm whether there are statistically significant differences among the results of the three cases, we carried out the two way ANOVA for the results shown in Figure 10 (factor A: network delay, factor B: cases). As a result, we found that the P-value of factor A is 1.50 × 10−108, and that of factor B is 8.35 × 10−44. That is, there are statistically significant differences among the network delay and among the three cases. It means that vision (the movement of the other’s cursor) affects haptic will transmission, and the transmissibility of wills is improved by denoting the direction and magnitude of the other’s force. Therefore, we can say that vision affects haptic will transmission and improves the transmissibility of wills. From Figure 12, we notice that as the constant delay increases, the average numbers of eliminated targets decrease. We also see that the average number in case 2 is larger than that in case 1, and the average number in case 3 is the largest. The tendency of the average number of eliminated targets is almost the same as that of the transmissibility of wills. From these results, we can also say that vision affects haptic will transmission, and the efficiency of the collaborative work is improved. In Figure 13, we observe that the average number of dropping times in case 2 is less than that in case 1, and that in case 3 is the least.
Figure 9: MOS of operability of PHANToM.
Figure 10: MOS of transmissibility of wills.
Figure 11: MOS of comprehensive quality.
Figure 12: Average number of eliminated targets.
Figure 13: Average number of dropping times.
Figure 14: Average numbers of intersections at which subjects needed to transmit their wills.
Figure 15: Average numbers of intersections at which one thought that he/she knew other’s will.
From this, we can also say that vision affects haptic will transmission and efficiency of the collaborative work is improved, as in Figure 12. From Figures 14-16, we find that the average number of passed intersections at which the subjects needed to transmit their wills, the average number of intersections at which one thought that he/she knew the other’s wills, and the average number of intersections at which one could know the other’s will accurately of case 2 are larger than those in case 1, and those in case 3 are the largest.
Figure 16: Average numbers of intersections at which one could know other’s will accurately.
Furthermore, from Figure 17, we see that the percentage of questions answered correctly in case 2 is larger than that in case 1, and that in case 3 is
the largest. From these results, we can also say that vision affects haptic will transmission and accuracy of the collaborative work is improved.
Figure 17: Percentage of questions answered correctly.
We also carried out two way ANOVA for the results shown in Figures 10-17 to confirm whether there are statistically significant differences among the results in the three cases. As a result, we found that the P-values are smaller than 0.05. It means that there are statistically significant differences among the three cases. From the above results, we can say that vision affects haptic will transmission and improves the accuracy of the transmissibility of wills. However, there are four directions (right, left, up or down) and subjects need to decide a direction from “up or right,” “up or left,” “down or right,” or “down or left” at each intersection where they need to transmit their wills in the assessment. Since the transmissibility of wills may be different in directions (right, left, up or down), we divide the average number of passed intersections at which the subjects needed to transmit their wills, the average number of intersections at which one thought that he/she knew the other’s wills, and the average number of intersections at which one could know the other’s will accurately on the basis of the four directions and show the results in Figures 18-21. We also show the average number of intersections at which the two subjects’ wills were different from each other and that at which the subjects moved the object to the right or left directions in Figure 22. We further show the percentage of intersections at which the subjects moved the object to the right or left direction when the two subjects’ wills were different from each other in Figure 23. From Figure 18, we see that the average number of inter-
sections at which the subjects move the object upward or downward is larger than that of intersections at which the subjects move the object to the right or left. This is because it is more difficult to perceive the perpendicular force than the horizontal force in the three cases. In Figure 21, we find that when the movement direction is right or left, the percentage of questions answered correctly is larger than the percentage of questions answered correctly when the movement direction is upward or downward.
Figure 18: Average numbers of intersections at which subjects needed to transmit their wills (right or left and up or down).
Figure 19: Average numbers of intersections at which one thought that he/she knew other’s will (right or left and up or down).
From Figures 19-21, we notice that the average number of intersections at which one thought that he/she knew the other’s wills, the average number of intersections at which one could know the other’s will accurately, and the percentage of questions answered correctly in case 3 are the largest, and those in case 1 are the smallest.
Figure 20: Average numbers of intersections at which one could know other’s will accurately (right or left and up or down).
Figure 21: Percentage of questions answered correctly (right or left and up or down).
Figure 22: Average number of intersections at which two users’ wills were different from each other and that at which users moved object to right or left directions.
Figure 23: Percentage of intersections at which users moved object to right or left directions when two users’ wills were different from each other.
Furthermore, we find in Figure 22 that when the two subjects’ wills were different from each other, the average numbers of intersections at which the movement direction is right or left is larger than that of intersections at which the movement direction is upward or downward. From Figure 23, we notice that the percentage of intersections at which the subjects moved object to the right or left direction when the two subjects’ wills were different from each other in case 3 is about 50% to 60%, which is the smallest in the three cases.
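A two-way ANOVA of the kind used above (factor A: network delay, factor B: case) can be reproduced with standard statistical tooling. The sketch below uses synthetic placeholder scores, so its P-values are only illustrative of the procedure, not of the assessment results.

```python
# Sketch: two-way ANOVA on MOS scores with factors "delay" and "case" (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
rows = []
for delay in [0, 5, 10, 15, 20, 25]:               # constant delay in ms
    for case in ["case1", "case2", "case3"]:
        for _ in range(20):                         # one score per subject (placeholder data)
            base = 4.5 - 0.1 * delay + {"case1": 0.0, "case2": 0.4, "case3": 0.8}[case]
            rows.append({"delay": delay, "case": case,
                         "mos": float(np.clip(base + rng.normal(0, 0.5), 1, 5))})
df = pd.DataFrame(rows)

model = ols("mos ~ C(delay) + C(case)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))              # P-values for both factors
```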
CONCLUSIONS
In this paper, we handled collaborative work in which two users stand on
an equal footing and the role of each user, master-slave relationship, and/ or movement direction of object (i.e., user’s will) are not determined in advance, and we investigated the effects of wills transmission by using vision and haptics. In the experiment, we dealt with three cases of wills transmission. In case 1, wills were transmitted by only haptics. In case 2, wills were transmitted by haptics and vision (without drawing an arrow to indicate the direction of force applied to the object by the other user), and in case 3, wills were transmitted by haptics and vision (with drawing an arrow to indicate the direction of force applied to the object by the other user). As a result, we find that as the network delay increases, it becomes more difficult to transmit the wills accurately, and when the network delay is smaller than about 5 ms in case 1, about 10 ms in case 2, and about 15 ms in case 3, the MOS values are larger than 3.5 and the deterioration in QoE is allowable. We also notice that vision affects haptic will transmission and accuracy of the collaborative work is improved. Furthermore, we see that when the will (the direction in which each user wants to move the object) is right or left, the efficiency of will transmission is higher than that when the will is up or down. As the next step of our research, we plan to clarify the process of consensus formation using haptics and study how to improve the transmissibility of wills. Also, we need to improve the transmissibility of the perpendicular force and investigate the influences of network delay jitter and packet loss on will transmission using haptics. We will further investigate the effects of will transmission using haptics in other directions.
ACKNOWLEDGEMENTS
The authors thank Prof. Hitoshi Watanabe and Prof. Norishige Fukushima for their valuable comments. They also thank Ms. Qi Zeng, who was a student of Nagoya Institute of Technology, for her help in the QoE assessment.
NOTES
†1 We select these parameter values by carrying out a preliminary experiment in which the network delay is negligibly small and it is easy to do the collaborative work by using the parameter values. The other parameter values in this section are selected in the same way.
†2 The authors also dealt with the case in which each pair of subjects does not stop after passing each intersection at which the subjects need to transmit their wills. As a result, we found that the results are almost the same as those in this study.
†3 The authors also investigated the influence of packet loss on the will transmission. As a result, we saw that it becomes more difficult to transmit the wills accurately as the packet loss rate increases.
REFERENCES
1. Srinivasan, M.A. and Basdogan, C. (1997) Haptics in Virtual Environments: Taxonomy, Research Status, and Challenges. Computers and Graphics, 21, 393-404. http://dx.doi.org/10.1016/S0097-8493(97)00030-7
2. Kerwin, T., Shen, H. and Stredney, D. (2009) Enhancing Realism of Wet Surfaces in Temporal Bone Surgical Simulation. IEEE Transactions on Visualization and Computer Graphics, 15, 747-758. http://dx.doi.org/10.1109/TVCG.2009.31
3. Steinbach, E., Hirche, S., Ernst, M., Brandi, F., Chaudhari, R., Kammerl, J. and Vittorias, I. (2012) Haptic Communications. IEEE Journals & Magazines, 100, 937-956.
4. Huang, P. and Ishibashi, Y. (2013) QoS Control and QoE Assessment in Multi-Sensory Communications with Haptics. The IEICE Transactions on Communications, E96-B, 392-403.
5. Hayashi, D., Ohnishi, H. and Nakamura, N. (2006) Understand the Effect of Visual Information and Delay on a Haptic Display. IEICE Technical Report, 7-10.
6. Kamata, K., Inaba, G. and Fujita, K. (2010) Virtual Manipulation Assistance by Collision Enhancement Using Multi-Finger Force-Feedback Device. Transactions on VRSJ, 15, 653-661.
7. Yap, K. and Marshal, A. (2010) Investigating Quality of Service Issues for Distributed Haptic Virtual Environments in IP Networks. Proceedings of the IEEE ICIE, 237-242.
8. Watanabe, T., Ishibashi, Y. and Sugawara, S. (2010) A Comparison of Haptic Transmission Methods and Influences of Network Latency in a Remote Haptic Control System. Transactions on VRSJ, 15, 221-229.
9. Huang, P., Ishibashi, Y., Fukushima, N. and Sugawara, S. (2012) QoE Assessment of Group Synchronization Control with Prediction in Work Using Haptic Media. IJCNS, 5, 321-331. http://dx.doi.org/10.4236/ijcns.2012.56042
10. Hashimoto, T. and Ishibashi, Y. (2008) Effects of Inter-Destination Synchronization Control for Haptic Media in a Networked Real-Time Game with Collaborative Work. Transactions on VRSJ, 13, 3-13.
11. Hikichi, K., Yasuda, Y., Fukuda, A. and Sezaki, K. (2006) The Effect of Network Delay on Remote Calligraphic Teaching with Haptic Interfaces. Proceedings of 5th ACM SIGCOMM Workshop on Network and System Support for Games, Singapore City, 30-31 October 2006.
12. Nishino, H., Yamabiraki, S., Kwon, Y., Okada, Y. and Utsumiya, K. (2007) A Remote Instruction System Empowered by Tightly Shared Haptic Sensation. Proceedings of SPIE Optics East, Multimedia Systems and Applications X, 6777.
13. Kameyama, S. and Ishibashi, Y. (2006) Influence of Network Latency of Voice and Haptic Media on Efficiency of Collaborative Work. Proceedings of International Workshop on Future Mobile and Ubiquitous Information Technologies, Nara, 9-12 May 2006, 173-176.
14. ITU-T Rec. P.10/G.100 Amendment 1 (2007) New Appendix I—Definition of Quality of Experience (QoE).
15. Brooks, P. and Hestnes, B. (2010) User Measures of Quality of Experience: Why Being Objective and Quantitative Is Important. IEEE Network, 24, 8-13. http://dx.doi.org/10.1109/MNET.2010.5430138
16. SensAble Technologies, Inc. (2004) 3D Touch SDK OpenHaptics Toolkit Programmers Guide. Version 1.0.
17. Huang, P., Zeng, Q. and Ishibashi, Y. (2013) QoE Assessment of Will Transmission Using Haptics: Influence of Network Delay. Proceedings of IEEE 2nd Global Conference on Consumer Electronics, Tokyo, 1-4 October 2013, 456-460.
18. Carson, M. and Santay, D. (2003) NIST Net—A Linux-Based Network Emulation Tool. ACM SIGCOMM Computer Communication Review, 33, 111-126. http://dx.doi.org/10.1145/956993.957007
19. ITU-R Rec. BT.500-12 (2009) Methodology for the Subjective Assessment of the Quality of Television Pictures.
20. Huang, P., Zeng, Q., Ishibashi, Y. and Fukushima, N. (2013) Efficiency Improvement of Will Transmission Using Haptics in Networked Virtual Environment. 2013 Tokai-Section Joint Conference on Electrical and Related Engineering, 4-7.
21. Yu, H., Huang, P., Ishibashi, Y. and Fukushima, N. (2013) Influence of Packet Loss on Will Transmission Using Haptics. 2013 Tokai-Section Joint Conference on Electrical and Related Engineering, 4-8.
22. ITU-R BT.1359-1 (1998) Relative Timing of Sound and Vision for Broadcasting.
Chapter 4
Concept Learning in Neuromorphic Vision Systems: What Can We Learn from Insects?
Fredrik Sandin1, Asad I. Khan2, Adrian G. Dyer3,4, Anang Hudaya M. Amin5, Giacomo Indiveri6, Elisabetta Chicca7, Evgeny Osipov8

1 EISLAB, Luleå University of Technology, Luleå, Sweden
2 Clayton School of Information Technology, Monash University, Clayton, Australia
3 Department of Physiology, Monash University, Clayton, Australia
4 School of Media and Communication, Royal Melbourne Institute of Technology, Melbourne, Australia
5 Faculty of Information Science & Technology (FIST), Multimedia University, Melaka, Malaysia
6 Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland
7 Cognitive Interaction Technology, Center of Excellence, Bielefeld University, Bielefeld, Germany
8 Division of Computer Science, Luleå University of Technology, Luleå, Sweden

Citation: Sandin, F., et al. (2014) Concept Learning in Neuromorphic Vision Systems: What Can We Learn from Insects? Journal of Software Engineering and Applications, 7, 387-395. http://dx.doi.org/10.4236/jsea.2014.75035.

Copyright: © 2014 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/
ABSTRACT Vision systems that enable collision avoidance, localization and navigation in complex and uncertain environments are common in biology, but are extremely challenging to mimic in artificial electronic systems, in particular when size and power limitations apply. The development of neuromorphic electronic systems implementing models of biological sensory-motor systems in silicon is one promising approach to addressing these challenges. Concept learning is a central part of animal cognition that enables appropriate motor response in novel situations by generalization of former experience, possibly from a few examples. These aspects make concept learning a challenging and important problem. Learning methods in computer vision are typically inspired by mammals, but recent studies of insects motivate an interesting complementary research direction. There are several remarkable results showing that honeybees can learn to master abstract concepts, providing a road map for future work to allow direct comparisons between bio-inspired computing architectures and information processing in miniaturized “real” brains. Considering that the brain of a bee has less than 0.01% as many neurons as a human brain, the task to infer a minimal architecture and mechanism of concept learning from studies of bees appears well motivated. The relatively low complexity of insect sensory-motor systems makes them an interesting model for the further development of bio-inspired computing architectures, in particular for resource-constrained applications such as miniature robots, wireless sensors and handheld or wearable devices. Work in that direction is a natural step towards understanding and making use of prototype circuits for concept learning, which eventually may also help us to understand the more complex learning circuits of the human brain. By adapting concept learning mechanisms to a polymorphic computing framework we could possibly create large-scale decentralized computer vision systems, for example in the form of wireless sensor networks. Keywords: Concept Learning, Computer Vision, Computer Architecture, Neuromorphic Engineering, Insect
INTRODUCTION Efficient and lightweight computer vision devices that operate on a low energy budget can open up novel application domains. For instance, reliable collision avoidance, localization, and navigation have a range of practical applications. Making systems that can perform such tasks in environments with greater complexity and uncertainty than current systems can tolerate is an important challenge [1] . Neuromorphic [2] vision sensors are hybrid analog/digital VLSI devices that implement models of biological visual systems in hardware [3] -[7] . Similarly, neuromorphic neural processing systems implement real-time biologically plausible models of neural circuits and architectures that can be configured to carry out complex signal processing and computational tasks. Systems composed of neuromorphic vision sensors and neuromorphic multi-neuron chips are becoming elaborate and powerful enough for use in real-world applications [8] [9] . For example, a system of this type has been recently synthesized to perform context-dependent classification of motion patterns observed by a silicon retina [10] . Neuromorphic systems are designed to deal with uncertainties and show potential for brain-like computability and remarkable capabilities in vision, for example in terms of high temporal resolution and dynamic range at low data rate and power use. The speed at which such a system can operate in resource-constrained environments surpasses that of conventional computer vision systems. A major challenge, however, is to design a flexible sensory-motor architecture that can be adapted to real-world applications in a cost-efficient manner. Seeking inspiration in animal cognition, it is evident that concept learning [11] [12] is a key function that enables motor response in complex environments and novel situations by generalization of former experience, thereby making it unnecessary to learn each particular situation that is encountered. There is evidence for three types of concept learning in animals [12] : similarity-based in which items are categorized based on similarity; relational in which one item is categorized relative to another; associative in which arbitrary stimuli become interchangeable due to association with another stimulus or response. Concept learning is also a central part of analogy making [11] [13] , which is used to infer novel information about objects or situations through mapping of memories with similar compositional structure, see [14] [15] for examples. Learning approaches to computer vision are typically inspired by empirical studies of vision in mammals. A complementary approach is to seek inspiration and guidance in
studies of insects that demonstrate conceptual learning and problem solving abilities [16]-[18]. Work in that direction is motivated by the relatively low complexity of insect brains, e.g., about 10⁵ neurons in the fruit fly brain [19] or 10⁶ in the bee brain [20] versus 10¹¹ in the human brain [21]. Despite the low number of neurons, the mushroom bodies in the brain of honeybees process visual, olfactory, gustatory, and mechanosensory information, and show a remarkable ability to learn how to solve problems [22]. Insect brains provide an excellent template for an empirical study, and for electronic implementation using neuromorphic circuits. For example, some aspects of the silicon retina vision sensors correlate to insect vision—prioritizing speed of processing over finer details such as colour vision. Being inspired by neural systems, the neuromorphic engineering approach is naturally motivated for the development of compact low-power sensory-motor and neural processing systems. In distributed and heterogeneous large-scale systems the polymorphic computing (PmC) approach provides a scalable and reliable alternative to conventional computing architectures for pattern recognition, and vector symbolic architectures (VSAs) provide a mathematical framework for encoding concepts and related functions with distributed representations, which in principle can be implemented in neuromorphic devices and systems. In the next section we introduce the remarkable concept learning results obtained in studies of bees that are mentioned above. Thereafter we briefly introduce the neuromorphic approach to building artificial sensory-motor systems, followed by an introduction to vector symbolic representations of concepts and polymorphic computing in decentralized systems.
HIGHER-ORDER LEARNING IN INSECTS Insects operate in complex environments and potentially provide a valuable model for understanding how information is efficiently managed in a miniaturized brain [23] [24] . In particular the honeybee has become a supermodel [18] [22] [25] -[27] for understanding learning, due to their altruistic lifestyle that provides convenient access to investigating how individuals learn through experience to solve complex problems. Such learning was first demonstrated by the Nobel laureate Karl von Frisch exactly 100 years ago [28] who showed that bees could be trained to solve a visual task through associative conditioning with a rewarding sweet solution, and tested in extinction trials when rewards were removed. Since bees collect nutrition to contribute to the entire colony, it is possible to train and test one
individual bee for 8 - 10 hours a day to evaluate learning in psychophysics-type experiments, where behavioral outcomes to known stimuli allow for black-box evaluations of information processing by the visual system [29]. The brain of the bee has been intensively studied and is known to have very distinct hierarchical structures [30] [31]. Considering visual stimuli, the honeybee as a model of bee vision has three classes of receptors (UV-, Blue- and Green-sensitive). The Green receptor signals are initially processed in the lamina region of the brain, which has neurons with fast response times that potentially drive rapid responses to achromatic signals like salient edges [32]. Beyond the level of the lamina, visual processing incorporates all three classes of receptor inputs that occur in specialized regions of the honeybee brain which have been imaged at high resolution [30]. After the lamina, visual signals are next processed in the medulla where information starts to become segregated by specialized neurons including broad-band neurons that respond equally to multiple wavelengths of light; narrow-band neurons that respond to input from a single photoreceptor type; or colour-opponent neurons that antagonistically process multiple spectrally different signals [31]. Interestingly, it appears that from this medulla level of processing, signals may follow different pathways such that neurons from the outer layers of the medulla project to the posterior protocerebrum region of the brain, whilst signals processed in the inner layers of the medulla project to the lobula, the lateral protocerebrum and mushroom bodies. These different pathways may enable either fast hard-wired responses, or a capacity to learn through experience [31]. Thus whilst the bee brain contains less than one million neurons, the hierarchical structure and alternative neural pathways appear to facilitate either a hard-wired or plastic learning capacity for different scenarios that might occur for a flying insect. Indeed individual honeybees show a remarkable capacity to learn how to solve problems. One of the clearest demonstrations of this is the capacity of honeybees to solve delayed matching-to-sample type tasks where a viewed initial stimulus must be loaded into working memory, and then subsequently compared to alternatives in order to make a correct decision and collect a sucrose solution reward [33]. Whilst this type of delayed matching-to-sample task actually takes an individual bee a very long time to learn with variable stimuli during different trials, once the “matching” rule is learnt (involving long-term memory), a bee can quickly apply the acquired rule to a novel task like matching scents within a trial [33]. Indeed, rule learning appears to be a major way that insects learn to solve problems, but only if provided with the correct conditions to reinforce flexible learning. For example, if free-flying
honeybees have to “only” learn a fixed target at a constant visual angle, then there is a poor ability to make correct decisions if conditions change, but bees trained to a set of variable stimuli learn how to use this acquired information to solve novel problems [34]. Bees can learn to solve complex visual tasks like face recognition using configural processing mechanisms [35], and even, with the right conditioning experience, deal with complex transformations imposed by viewpoint variation [36]. Some of these tasks, like solving above/below relational problems [16] or simultaneously applying multiple rules like spatial relationships and differences [17], are at a level of complexity that challenges our current understanding of what mammalian brains can achieve [18] [37]. This shows that miniaturized information-processing systems have a capacity to efficiently deal with very complex information provided that they are appropriately organized through adaptation. However, current challenges exist in completely bridging between the higher-order learning (i.e. non-elemental learning) behavior demonstrated in free-flying bees and our complete understanding of the neural regions responsible, because when a bee is harnessed, as would typically be required to enable brain recordings [38], learning performance for visual stimuli is significantly impaired [39]. Solutions to this problem are starting to emerge with new work on displaying complex moving stimuli to tethered bees [40], or using closed-loop paradigms allowing tethered but walking bees to actively control visual objects within a virtual reality arena [41]. Indeed such experiments reveal that attention-like behavioral responses result in modulation of neural activity in the medulla region of the brain, again pointing towards the importance of this structure for filtering information for decision making in insects. These remarkable studies provide a road map for future work to more fully allow direct comparisons between bio-inspired computing architectures and information processing in miniaturized “real” brains [24] [31]. Currently there appear to be similarities in hierarchical organization and segregation of information parsed into fast hard-wired solutions, or more plastic learning modules, depending upon task difficulty.
NEUROMORPHIC SENSORY-MOTOR SYSTEMS Neuromorphic vision sensors are hybrid analog/digital VLSI devices that implement hardware models of biological visual systems, which typically are used in machine vision [3] -[7] . It is only recently that these hardware models have become elaborate enough for use in a variety of engineering applications [8] [9] . These types of devices and systems offer a low cost
alternative to special-purpose Digital Signal Processors (DSPs) for machine vision tasks. They can be used for reducing the computational load on the digital system in which they are embedded or, ideally, for carrying out all of the necessary computation without the need of any additional hardware. They process images directly at the focal plane level: each pixel contains local circuitry that performs different types of spatio-temporal computations on the continuous analog brightness signal in real-time. In contrast, CCD cameras or conventional CMOS imagers merely measure the brightness at the pixel level, eventually adjusting their gain to the average brightness level of the whole scene. In neuromorphic vision chips, photo-receptors, memory elements and computational nodes share the same physical space on the silicon surface. The specific computational function of a neuromorphic sensor is determined by the structure of its architecture and by the way its pixels are interconnected. Since each pixel processes information based on locally sensed signals and on data arriving from its neighbors, the type of computation being performed is fully parallel and distributed. Another important feature is the asynchronous operation of neuromorphic sensors, which is preferable to clocked operation for sensory processing, given the continuous nature of sensory signals. Clocked systems introduce temporal aliasing artifacts that can significantly compromise the time-dependent computations performed in real-time sensory processing systems. Recent neuromorphic vision sensors are clock-less and use a frame-less communication protocol [42]-[44]. In these sensors each pixel is assigned an address, and when a pixel generates an event (e.g., when it measures a contrast difference greater than a set threshold) its address is instantaneously put on a digital bus, using asynchronous logic. In this asynchronous “Address-Event Representation” (AER) time represents itself, and analog signals are encoded by the inter-spike intervals between the addresses of their sending nodes. Address-events are the digital pulses written on the bus. In this way neuromorphic multi-chip systems can be assembled, consisting for example of neuromorphic sensory devices such as silicon retinas interfaced to one or more chips containing networks of spiking neuron circuits. Spiking neural network chips can receive address-events produced by neuromorphic sensors and process them (e.g., to implement concept learning), and eventually transmit the processed signals to actuators, thus implementing complete neuromorphic sensory-motor systems.
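To make the AER idea concrete, the following minimal Python sketch shows one common way an address-event stream is represented and consumed in software: each event carries only the address of the pixel that fired, a timestamp and a polarity, and downstream code either processes events as they arrive or bins them into short time windows. The `AddressEvent` class and the `accumulate_events` helper are illustrative names invented for this sketch; they are not part of any particular sensor SDK or of the systems cited above.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class AddressEvent:
    x: int          # pixel column that fired
    y: int          # pixel row that fired
    t: float        # timestamp in microseconds (time "represents itself")
    polarity: int   # +1 for a brightness increase, -1 for a decrease

def accumulate_events(events: List[AddressEvent], width: int, height: int,
                      t_start: float, t_end: float) -> np.ndarray:
    """Collapse an asynchronous event stream into a signed activity map for one
    time window, a common first step before further (e.g., spiking) processing."""
    frame = np.zeros((height, width), dtype=np.int32)
    for ev in events:
        if t_start <= ev.t < t_end:
            frame[ev.y, ev.x] += ev.polarity
    return frame

# Example: three synthetic events from a 4 x 4 sensor.
stream = [AddressEvent(1, 2, 10.0, +1),
          AddressEvent(1, 2, 15.0, +1),
          AddressEvent(3, 0, 20.0, -1)]
print(accumulate_events(stream, width=4, height=4, t_start=0.0, t_end=25.0))
```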
REPRESENTATION OF CONCEPTS AND CONCEPTUAL RELATIONSHIPS
In general, the question of how to realize higher-order learning and cognition in neuromorphic sensory-motor systems is open, because we do not fully understand the principles and architectures of the neural circuits in brains that make this possible. In particular, extracting and making use of conceptual relationships such as same/different or above/below is a challenge. Computational approaches to analyze the spatial structure of images typically result in NP-hard graph matching problems [45] that are difficult to approximate. Research over the last few decades has advanced our understanding of concept learning [11]-[13] [46], but we are still lacking a plausible description of the mechanisms and neural architecture involved. Open questions include the nature of neural object representations and how representations change across different processing stages, for example how object features are computationally integrated into coherent object representations, and how these are read out by higher-order circuits. This problem includes learning of elementary sparse representations of objects and events, and representations of invariant features of object and event categories, which are grounded in sensory projections [47] [48]. Generative representations and motor programs play an important role in concept learning, see for example [49] and references therein. This appears natural in the case of animals since concept learning is an integrated part of the sensory-motor system, but covert actions may be more important than overt actions and the underlying computational mechanisms are unknown. In principle, the possibilities to generalize beyond familiar examples suggest that concepts are representationally rich. Therefore, it is remarkable that relatively few examples can be sufficient to learn a new concept. Efficient sensory and motor-program coding strategies are likely involved, for example sparse coding [50] and complexity minimization [51]. Vector symbolic architectures (VSAs) [52]-[54] offer an interesting mathematical framework for modeling of concepts, relationships between concepts and analogies [14] [15]. VSAs are based on high-dimensional vector representations of objects (e.g., image parts), relations (e.g., composition of objects), and sequences (e.g., sentence structures), and operators for dynamic binding and formation of chunks of multiple concepts in working memory. Such representations can in principle be integrated in spiking neural networks and neuromorphic hardware, see [55] for an example where a large-scale brain model is designed and simulated. VSAs can be integrated with
a model of associative memory [14] known as sparse distributed memory [56] (SDM), forming a Monte Carlo importance sampler that approximates Bayesian inference [57] . The integration of an associative memory enables learning of multiple concepts and relationships. VSAs offer possibilities to prove learnability and enable rapid learning and generalization with high systematicity, meaning that generalization to compositional structures more complex than those in the training set is possible [58] . However, it is not known to what extent a VSA can approximate the complex dynamics of neural representations read out by higher-order circuits of a brain. One fundamental aspect of VSAs and the SDM is that similar or related concepts are represented by similar vectors in a vector space, which appears to be coherent with electrophysiology of hippocampal place cells in rodents showing that the topology of the stimulus space can be inferred from cofiring neurons [59] . Research integrating computational and empirical approaches to study the neural mechanisms underlying object perception and concept learning as it is observed in the behaviour of bees can help us to address the challenging questions outlined above, and to develop a more realistic architecture and computational model of concept learning. Insights gained from such studies can also stimulate the development of artificial concept learning mechanisms for decentralized applications such as wireless sensor networks (WSNs) and the Internet, for example in the form of polymorphic computing which is briefly introduced in the next section. Extending VSA-based concept learning to the case of distributed networks opens a possibility to develop drastically new communication modes. For example, the superposition of broadcasted messages in a wireless transmission medium could be exploited as a physical implementation of the VSA chunking mechanism rather than as a collision, thereby dynamically creating new concepts in the form of parallel transmissions. Recently a VSA was adopted for implementing novel communication protocols and architectures for collective communications in machine-to-machine communication scenarios including wireless sensor networks [60] [61] . The first work demonstrates unique reliability and timing properties that are essential in the context of industrial machine-to-machine communications. The latter work presents an example of collective communications using current radio technology.
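As a concrete illustration of the VSA operations mentioned above (binding, chunking by superposition, and similarity-based readout), the short sketch below implements binary spatter codes with NumPy: binding is element-wise XOR, bundling is an element-wise majority vote, and similarity is measured by normalized Hamming distance. The encoding of the toy "above/below" scene and all variable names are illustrative choices for this sketch, not a reconstruction of the cited implementations.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # dimensionality; VSAs exploit the geometry of very high-dimensional spaces

def random_hv():
    """Random dense binary hypervector representing an atomic symbol."""
    return rng.integers(0, 2, D, dtype=np.uint8)

def bind(a, b):
    """Binding (XOR for binary spatter codes): the result is dissimilar to both inputs."""
    return np.bitwise_xor(a, b)

def bundle(vectors):
    """Bundling (element-wise majority vote, ties broken at random):
    the result remains similar to every bundled input."""
    sums = np.sum(vectors, axis=0)
    out = (2 * sums > len(vectors)).astype(np.uint8)
    ties = (2 * sums == len(vectors))
    out[ties] = rng.integers(0, 2, int(ties.sum()), dtype=np.uint8)
    return out

def similarity(a, b):
    """1 minus the normalized Hamming distance; about 0.5 means 'unrelated'."""
    return 1.0 - float(np.mean(a != b))

# Encode the relational chunk "circle is above square" as a single hypervector.
role_above, role_below = random_hv(), random_hv()
circle, square = random_hv(), random_hv()
scene = bundle([bind(role_above, circle), bind(role_below, square)])

# Query the chunk: "what fills the 'above' role?" -> unbind and compare to known symbols.
probe = bind(role_above, scene)
print(similarity(probe, circle))   # clearly above 0.5: circle is recovered
print(similarity(probe, square))   # close to 0.5: square looks like noise here
```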
DECENTRALIZED COMPUTER VISION SYSTEMS
In a decentralized computer vision system the transportation of data to a centralized location for information processing is inefficient in terms of energy use and communication, and it entails unnecessary time penalties. Ideally, pattern recognition should commence as soon as sensory data enters the processing network. The ability to perform such computations, however, depends on the availability of a network-centric computational model that is fully distributable and thus able to dynamically reconfigure its internal resources. Changing the hardware architecture to suit computational goals is common in, e.g., field programmable gate arrays (FPGAs), but is less explored in the context of networks and pattern learning in complex and changing environments. Polymorphic computing (PmC) is a scalable and reliable alternative to conventional computing architectures for pattern recognition in such scenarios. Polymorphic computing can in this context be defined as a computing architecture that can vigorously adapt towards changes in the computational model requirements of specific applications. A key characteristic of a polymorphic computer is the ability to dynamically rearrange the hardware configuration during runtime [62]. Ideally it should also be possible to dynamically change the software during runtime. A benefit of PmC is the ability to dynamically divide and distribute tasks according to time-varying computational capacities and requirements. The concept of making such a machine can be traced back to the late 1950s [63], but only a few practical design concepts and methods have emerged so far. One instance of a reconfigurable computer design is the MONARCH (Morphable Networked Micro-Architecture) system developed at the University of Southern California and Raytheon Inc. [64]. WSNs and the emerging Internet of Things offer new interesting opportunities to develop and study such systems, possibly taking a bio-inspired approach. For example, the dynamic reconfigurability of PmC through optimal dataflow manipulation in a network mimics learning-induced plasticity in brains. A distributed pattern recognition approach developed for WSNs is the hierarchical graph neuron (HGN) [65], which implements one-shot learning of patterns and provisions the structure necessary for deep learning (a form of learning that models data at multiple levels of abstraction or composition). Deep learning, which apparently is nature's way of coping with complexity in visual processing, is based on hierarchically connected layers, with local feature learning at the lowest
layer and upper layers combining features into higher-order representations. A similar hierarchical organization of information takes place in the visual system of honeybees. In principle, a suitably designed polymorphic computer with stimuli-driven data flows could facilitate deep learning, and by using VSA principles for the design of the higher-order architecture, such a system could be capable of processing concepts. The HGN has been further developed at the conceptual level to a reconfigurable PmC design [66]. In our approach, pattern recognition is undertaken progressively in multiple layers as sensory data flows from the input layer(s) upwards in the hierarchy. This design envisages an efficient in-network pattern recognition approach. The concept can readily be demonstrated by implementing a connection-based computing mechanism within a WSN, which facilitates computation through data flows between simple processing elements, as sketched below.
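The following toy Python sketch illustrates the general idea of progressive, in-network pattern recognition with one-shot memorization: leaf nodes each see only a slice of the input and memorize the sub-patterns they have observed, while a parent node memorizes combinations of leaf responses. It is a deliberately simplified illustration of the principle, not an implementation of the published HGN or of the PmC design cited above; all class and function names are invented for this example.

```python
class MemoryNode:
    """Stores every distinct input it has seen and returns a stable index for it
    (one-shot memorisation: a single exposure is enough to be recognised later)."""
    def __init__(self):
        self.memory = {}

    def observe(self, item):
        key = tuple(item) if isinstance(item, (list, tuple)) else item
        if key not in self.memory:
            self.memory[key] = len(self.memory)   # new sub-pattern -> new index
        return self.memory[key]

class TwoLevelHierarchy:
    """Leaf nodes each see one slice of the pattern; the parent sees only the
    leaf indices, so recognition is performed progressively, layer by layer."""
    def __init__(self, num_leaves):
        self.leaves = [MemoryNode() for _ in range(num_leaves)]
        self.parent = MemoryNode()

    def present(self, pattern):
        slices = [pattern[i::len(self.leaves)] for i in range(len(self.leaves))]
        leaf_ids = [leaf.observe(s) for leaf, s in zip(self.leaves, slices)]
        known_before = len(self.parent.memory)
        parent_id = self.parent.observe(leaf_ids)
        return "recalled" if parent_id < known_before else "stored"

net = TwoLevelHierarchy(num_leaves=4)
print(net.present("10110100"))  # stored (first exposure)
print(net.present("10110100"))  # recalled (one-shot memorisation)
print(net.present("00010111"))  # stored (novel pattern)
```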
CONCLUDING REMARKS Systems composed of neuromorphic vision sensors and neuromorphic multi-neuron chips are becoming elaborate enough for use in real-world applications [8] -[10] . If higher-order learning and cognitive mechanisms can be implemented in such systems, it will enable efficient and lightweight computer vision systems that can perform tasks in environments with greater complexity and uncertainty than current systems can tolerate. Concept learning [11] [12] is a central part of animal cognition that enables appropriate motor response in novel situations by generalization of former experience, possibly from a few training examples. These aspects make concept learning a challenging and important problem, which can lead to discoveries that radically change how we design and use computer vision systems and computing architectures. Learning methods in computer vision are typically inspired by mammals, but recent studies of insects motivate an interesting complementary research direction that is outlined in this paper. In particular, individual honeybees show a remarkable capacity to learn how to solve problems involving abstract concepts. Some of these tasks like solving above/below relational problems [16] , or simultaneously applying multiple rules like spatial relationships and differences [17] , are at a level of complexity that challenges our current understanding of what mammalian brains can achieve [18] [37] . These results provide a road map for future work to allow direct comparisons between bio-inspired computing architectures and information processing in miniaturized “real” brains. Considering that
the brain of a bee has less than 0.01% as many neurons as a human brain, the task to infer a minimal neural architecture and mechanism mediating concept learning and other forms of higher-order learning from studies of bees appears well motivated. Vector symbolic architectures [52]-[54] offer a mathematical framework for modeling of concepts and higher-order learning with distributed representations, which can also be implemented in digital processing systems and neuromorphic systems. The integration of concept learning mechanisms in a polymorphic computing architecture could enable the development of decentralized computer vision systems, for example in the form of wireless sensor networks, which are scalable and can perform advanced pattern recognition in complex and changing environments.
ACKNOWLEDGEMENTS
This work was supported by the Swedish Foundation for International Cooperation in Research and Higher Education (STINT), grant number IG2011-2025. AK thanks Ahmet Sekercioglu and Alexander Senior for their assistance in the preparation of this paper. AGD thanks ARC DP0878968/DP0987989 for funding support.
REFERENCES
1. (2013) A Roadmap for U.S. Robotics: From Internet to Robotics. Tech. Rep. www.robotics-vo.us
2. Indiveri, G. and Horiuchi, T.K. (2011) Frontiers in Neuromorphic Engineering. Frontiers in Neuroscience, 5, 118.
3. Mahowald, M. and Mead, C. (1989) Analog VLSI and Neural Systems. Silicon Retina, Addison-Wesley, Reading, 257-278.
4. Boahen, K. and Andreou, A. (1992) A Contrast Sensitive Silicon Retina with Reciprocal Synapses. In: Moody, J., Hanson, S. and Lippman, R., Eds., Advances in Neural Information Processing Systems (Vol. 4), Morgan Kaufmann, San Mateo, 764-772.
5. Kramer, J. (2002) An Integrated Optical Transient Sensor. IEEE Transactions on Circuits and Systems II, 49, 612-628. http://dx.doi.org/10.1109/TCSII.2002.807270
6. Lichtsteiner, P., Posch, C. and Delbruck, T. (2008) A 128 × 128 120 dB 15 μs Latency Temporal Contrast Vision Sensor. IEEE Journal of Solid-State Circuits, 43, 566-576. http://dx.doi.org/10.1109/JSSC.2007.914337
7. Posch, C., Matolin, D. and Wohlgenannt, R. (2011) A QVGA 143 dB Dynamic Range Frame-Free PWM Image Sensor with Lossless Pixel-Level Video Compression and Time-Domain CDS. IEEE Journal of Solid-State Circuits, 46, 259-275. http://dx.doi.org/10.1109/JSSC.2010.2085952
8. Koch, C. and Mathur, B. (1996) Neuromorphic Vision Chips. IEEE Spectrum, 33, 38-46. http://dx.doi.org/10.1109/6.490055
9. Liu, S.-C. and Delbruck, T. (2010) Neuromorphic Sensory Systems. Current Opinion in Neurobiology, 20, 288-295. http://dx.doi.org/10.1016/j.conb.2010.03.007
10. Neftci, E., Binas, J., Rutishauser, U., Chicca, E., Indiveri, G. and Douglas, R.J. (2013) Synthesizing Cognition in Neuromorphic Electronic Systems. Proceedings of the National Academy of Sciences, 110, E3468-E3476. http://dx.doi.org/10.1073/pnas.1212083110
11. Zentall, T.R., Wasserman, E.A., Lazareva, O.F., Thompson, R.R.K. and Ratterman, M.J. (2008) Concept Learning in Animals. Comparative Cognition & Behavior Reviews, 3, 13-45.
12. Zentall, T.R., Wasserman, E.A. and Urcuioli, P.J. (2014) Associative Concept Learning in Animals. Journal of the Experimental Analysis of Behavior, 101, 130-151. http://dx.doi.org/10.1002/jeab.55
13. Gentner, D. and Smith, L.A. (2013) Analogical Learning and Reasoning. In: The Oxford Handbook of Cognitive Psychology, Oxford University Press, Oxford, 668-681.
14. Emruli, B. and Sandin, F. (2014) Analogical Mapping with Sparse Distributed Memory: A Simple Model That Learns to Generalize from Examples. Cognitive Computation, 6, 74-88.
15. Emruli, B., Gayler, R. and Sandin, F. (2013) Analogical Mapping and Inference with Binary Spatter Codes and Sparse Distributed Memory. The 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, 4-9 August 2013, 1-8.
16. Avarguès-Weber, A., Dyer, A.G. and Giurfa, M. (2011) Conceptualization of Above and Below Relationships by an Insect. Proceedings of the Royal Society B: Biological Sciences, 278, 898-905.
17. Avarguès-Weber, A., Dyer, A.G., Combe, M. and Giurfa, M. (2012) Simultaneous Mastering of Two Abstract Concepts by the Miniature Brain of Bees. Proceedings of the National Academy of Sciences, 109, 7481-7486. http://dx.doi.org/10.1073/pnas.1202576109
18. Avarguès-Weber, A. and Giurfa, M. (2013) Conceptual Learning by Miniature Brains. Proceedings of the Royal Society B: Biological Sciences, 280. http://dx.doi.org/10.1098/rspb.2013.1907
19. Chiang, A.S., Lin, C.Y., Chuang, C.C., Chang, H.M., Hsieh, C.H., Yeh, C.W., Shih, C.T., Wu, J.J., Wang, G.T., Chen, Y.C., Wu, C.C., Chen, G.Y., Ching, Y.T., Lee, P.C., Lin, C.Y., Lin, H.H., Wu, C.C., Hsu, H.W., Huang, Y.A., Chen, J.Y., Chiang, H.J., Lu, C.F., Ni, R.F., Yeh, C.Y. and Hwang, J.K. (2011) Three-Dimensional Reconstruction of Brain-Wide Wiring Networks in Drosophila at Single-Cell Resolution. Current Biology, 21, 1-11. http://dx.doi.org/10.1016/j.cub.2010.11.056
20. Witthöft, W. (1967) Absolute Anzahl und Verteilung der Zellen im Hirn der Honigbiene. Zeitschrift für Morphologie der Tiere, 61, 160-184.
21. Herculano-Houzel, S. (2009) The Human Brain in Numbers: A Linearly Scaled-Up Primate Brain. Frontiers in Human Neuroscience, 3, 31.
22. Giurfa, M. (2013) Cognition with Few Neurons: Higher-Order Learning in Insects. Trends in Neurosciences, 36, 285-294. http://dx.doi.org/10.1016/j.tins.2012.12.011
23. Ibbotson, M. (2001) Evidence for Velocity-Tuned Motion-Sensitive Descending Neurons in the Honeybee. Proceedings of the Royal Society B: Biological Sciences, 268, 2195-2201. http://dx.doi.org/10.1098/rspb.2001.1770
24. Srinivasan, M.V. (2010) Honey Bees as a Model for Vision, Perception, and Cognition. Annual Review of Entomology, 55, 267-284. http://dx.doi.org/10.1146/annurev.ento.010908.164537
25. Reinhard, J., Srinivasan, M.V. and Zhang, S.W. (2004) Olfaction: Scent-Triggered Navigation in Honeybees. Nature, 427, 411. http://dx.doi.org/10.1038/427411a
26. Srinivasan, M.V., Zhang, S., Altwein, M. and Tautz, J. (2000) Honeybee Navigation: Nature and Calibration of the “Odometer”. Science, 287, 851-853. http://dx.doi.org/10.1126/science.287.5454.851
27. Esch, H.E., Zhang, S.W., Srinivasan, M.V. and Tautz, J. (2001) Honeybee Dances Communicate Distances Measured by Optic Flow. Nature, 411, 581-583. http://dx.doi.org/10.1038/35079072
28. von Frisch, K. (1914) Der Farbensinn und Formensinn der Biene. Fischer, Jena.
29. Dyer, A.G., Spaethe, J. and Prack, S. (2008) Comparative Psychophysics of Bumblebee and Honeybee Colour Discrimination and Object Detection. Journal of Comparative Physiology A, 194, 617-627. http://dx.doi.org/10.1007/s00359-008-0335-1
30. Brandt, R., Rohlfing, T., Rybak, J., Krofczik, S., Maye, A., Westerhoff, M., Hege, H.C. and Menzel, R. (2005) Three-Dimensional Average-Shape Atlas of the Honeybee Brain and Its Applications. The Journal of Comparative Neurology, 492, 1-19. http://dx.doi.org/10.1002/cne.20644
31. Dyer, A.G., Paulk, A.C. and Reser, D.H. (2011) Colour Processing in Complex Environments: Insights from the Visual System of Bees. Proceedings of the Royal Society B: Biological Sciences, 278, 952-959.
32. Skorupski, P. and Chittka, L. (2010) Differences in Photoreceptor Processing Speed for Chromatic and Achromatic Vision in the Bumblebee, Bombus terrestris. The Journal of Neuroscience, 30, 3896-3903. http://dx.doi.org/10.1523/JNEUROSCI.5700-09.2010
33. Giurfa, M., Zhang, S., Jenett, A., Menzel, R. and Srinivasan, M.V. (2001) The Concepts of “Sameness” and “Difference” in an Insect. Nature, 410, 930-933. http://dx.doi.org/10.1038/35073582
34. Dyer, A.G. and Griffiths, D.W. (2012) Seeing Near and Seeing Far; Behavioural Evidence for Dual Mechanisms of Pattern Vision in the Honeybee (Apis mellifera). The Journal of Experimental Biology, 215, 397-404. http://dx.doi.org/10.1242/jeb.060954
35. Avarguès-Weber, A., Portelli, G., Benard, J., Dyer, A. and Giurfa, M. (2010) Configural Processing Enables Discrimination and Categorization of Face-Like Stimuli in Honeybees. The Journal of Experimental Biology, 213, 593-601. http://dx.doi.org/10.1242/jeb.039263
36. Dyer, A.G. and Vuong, Q.C. (2008) Insect Brains Use Image Interpolation Mechanisms to Recognize Rotated Objects. PLoS ONE, 3, e4086. http://dx.doi.org/10.1371/journal.pone.0004086
37. Chittka, L. and Niven, J. (2009) Are Bigger Brains Better? Current Biology, 19, R995-R1008. http://dx.doi.org/10.1016/j.cub.2009.08.023
38. Paulk, A.C., Dacks, A.M., Phillips-Portillo, J., Fellous, J.M. and Gronenberg, W. (2009) Visual Processing in the Central Bee Brain. The Journal of Neuroscience, 29, 9987-9999. http://dx.doi.org/10.1523/JNEUROSCI.1325-09.2009
39. Niggebrügge, C., Leboulle, G., Menzel, R., Komischke, B. and de Ibarra, N.H. (2009) Fast Learning but Coarse Discrimination of Colors in Restrained Honeybees. Journal of Experimental Biology, 212, 1344-1350. http://dx.doi.org/10.1242/jeb.021881
40. Luu, T., Cheung, A., Ball, D. and Srinivasan, M.V. (2011) Honeybee Flight: A Novel “Streamlining” Response. The Journal of Experimental Biology, 214, 2215-2225. http://dx.doi.org/10.1242/jeb.050310
41. Paulk, A.C., Stacey, J.A., Pearson, T.W.J., Taylor, G.J., Moore, R.J.D., Srinivasan, M.V. and van Swinderen, B. (2014) Selective Attention in the Honeybee Optic Lobes Precedes Behavioral Choices. Proceedings of the National Academy of Sciences of the United States of America, 111, 5006-5011. http://dx.doi.org/10.1073/pnas.1323297111
42. Delbruck, T. (2008) Frame-Free Dynamic Digital Vision. Proceedings of International Symposium on Secure-Life Electronics, Advanced Electronics for Quality Life and Society, Tokyo, 6-7 March 2008, 21-26.
43. Posch, C., Matolin, D., Wohlgenannt, R., Hofstätter, M., Schön, P., Litzenberger, M., Bauer, D. and Garn, H. (2010) Biomimetic Frame-Free HDR Camera with Event-Driven PWM Image/Video Sensor and Full-Custom Address-Event Processor. 2010 IEEE Biomedical Circuits and Systems Conference (BioCAS), Paphos, 3-5 November 2010, 254-257.
44. Benosman, R., Ieng, S.H., Clercq, C., Bartolozzi, C. and Srinivasan, M. (2012) Asynchronous Frameless Event-Based Optical Flow. Neural Networks, 27, 32-37. http://dx.doi.org/10.1016/j.neunet.2011.11.001
45. Conte, D., Foggia, P., Sansone, C. and Vento, M. (2004) Thirty Years of Graph Matching in Pattern Recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18, 265-298. http://dx.doi.org/10.1142/S0218001404003228
46. Jia, Y., Abbott, J.T., Austerweil, J., Griffiths, T. and Darrell, T. (2013) Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies. In: Burges, C., Bottou, L., Welling, M., Ghahramani, Z. and Weinberger, K., Eds., Advances in Neural Information Processing Systems, 26, 1842-1850.
47. Harnad, S. (1990) The Symbol Grounding Problem. Physica D: Nonlinear Phenomena, 42, 335-346. http://dx.doi.org/10.1016/0167-2789(90)90087-6
48. Barsalou, L.W. (2008) Grounded Cognition. Annual Review of Psychology, 59, 617-645. http://dx.doi.org/10.1146/annurev.psych.59.103006.093639
49. Lake, B.M., Salakhutdinov, R. and Tenenbaum, J.B. (2012) Concept Learning as Motor Program Induction: A Large-Scale Empirical Study. Proceedings of the 34th Annual Conference of the Cognitive Science Society, Sapporo, 1-4 August 2012, 659-664.
50. Olshausen, B. and Field, D. (1996) Emergence of Simple-Cell Receptive Field Properties by Learning a Sparse Code for Natural Images. Nature, 381, 607-609. http://dx.doi.org/10.1038/381607a0
51. Feldman, J. (2003) The Simplicity Principle in Human Concept Learning. Current Directions in Psychological Science, 12, 227-232. http://dx.doi.org/10.1046/j.0963-7214.2003.01267.x
52. Gallant, S.I. and Okaywe, T.W. (2013) Representing Objects, Relations, and Sequences. Neural Computation, 25, 2038-2078. http://dx.doi.org/10.1162/NECO_a_00467
53. Kanerva, P. (2009) Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors. Cognitive Computation, 1, 139-159.
54. Gayler, R.W. (2003) Vector Symbolic Architectures Answer Jackendoff’s Challenges for Cognitive Neuroscience. Proceedings of the ICCS/ASCS International Conference on Cognitive Science, Sydney, 13-17 July 2003, 133-138.
55. Eliasmith, C., Stewart, T.C., Choo, X., Bekolay, T., DeWolf, T., Tang, Y. and Rasmussen, D. (2012) A Large-Scale Model of the Functioning Brain. Science, 338, 1202-1205. http://dx.doi.org/10.1126/science.1225266
56. Kanerva, P. (1988) Sparse Distributed Memory. The MIT Press, Cambridge.
57. Abbott, J.T., Hamrick, J.B. and Griffiths, T.L. (2013) Approximating Bayesian Inference with a Sparse Distributed Memory System. Proceedings of the 35th Annual Conference of the Cognitive Science Society, Berlin, 31 July-3 August 2013, 6.
58. Neumann, J. (2002) Learning the Systematic Transformation of Holographic Reduced Representations. Cognitive Systems Research, 3, 227-235. http://dx.doi.org/10.1016/S1389-0417(01)00059-6
59. Curto, C. and Itskov, V. (2008) Cell Groups Reveal Structure of Stimulus Space. PLoS Computational Biology, 4, Article ID: e1000205.
60. Kleyko, D., Lyamin, N., Osipov, E. and Riliskis, L. (2012) Dependable MAC Layer Architecture Based on Holographic Data Representation Using Hyper-Dimensional Binary Spatter Codes. Multiple Access Communications: 5th International Workshop, MACOM 2012, Springer, Berlin, 134-145.
61. Jakimovski, P., Schmidtke, H.R., Sigg, S., Chaves, L.W.F. and Beigl, M. (2012) Collective Communication for Dense Sensing Environments. Journal of Ambient Intelligence and Smart Environments (JAISE), 4, 123-134.
62. Hentrich, D., Oruklu, E. and Saniie, J. (2011) Polymorphic Computing: Definition, Trends, and a New Agent-Based Architecture. Circuits and Systems, 2, 358-364.
63. Ramo, S. (1959) All about Polymorphics. https://archive.org/details/AllAboutPolymorphics1959
64. Granacki, J.J. and Vahey, M.D. (2002) Monarch: A High Performance Embedded Processor Architecture with Two Native Computing Modes. Proceedings of High Performance Embedded Computing Workshop 2002, Lexington, 24-26 September 2002.
65. Nasution, B.B. and Khan, A.I. (2008) A Hierarchical Graph Neuron Scheme for Real-Time Pattern Recognition. IEEE Transactions on Neural Networks, 19, 212-229. http://dx.doi.org/10.1109/TNN.2007.905857
66. Osipov, E., Khan, A. and Amin, A.M. (2014) Holographic Graph Neuron. Proceedings of the 2nd International Conference on Computer and Information Sciences (ICCOINS 2014), Kuala Lumpur, 3-5 June 2014.
SECTION II: MACHINE VISION TECHNIQUES IN PRODUCTION / MANUFACTURING PROCESSES
Chapter 5
An Automatic Assembling System for Sealing Rings Based on Machine Vision
Mingyu Gao, Xiao Li, Zhiwei He, and Yuxiang Yang
Department of Electronics and Information, Hangzhou Dianzi University, Hangzhou, China
Citation: Mingyu Gao, Xiao Li, Zhiwei He, and Yuxiang Yang, “An Automatic Assembling System for Sealing Rings Based on Machine Vision,” Journal of Sensors, vol. 2017, Article ID 4207432, 12 pages, 2017. https://doi.org/10.1155/2017/4207432.

Copyright © 2017 Mingyu Gao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT
In order to grab and place the sealing rings of battery lids quickly and accurately, an automatic assembling system for sealing rings based on machine vision is developed in this paper. The whole system is composed of the light sources, cameras, industrial control units, and a 4-degree-of-freedom industrial robot. Specifically, the sealing rings are recognized and located automatically with the machine vision module. Then the industrial robot is controlled to grab the sealing rings dynamically under the joint work of multiple control units and visual feedback. Furthermore, the coordinates of the fast-moving battery lid are tracked by the machine vision module. Finally, the sealing rings are placed on the sealing ports of the battery lid accurately and automatically. Experimental results demonstrate that the proposed system can grab the sealing rings and place them on the sealing ports of the fast-moving battery lid successfully. More importantly, the proposed system improves the efficiency of the battery production line considerably.
INTRODUCTION
Faced with an aging population, high labor costs, and a shrinking demographic dividend, China's industrial reform has become urgent, and industrial robots have been designated a key development area for promoting industrial upgrading. Hence, more and more companies are beginning to replace their traditional manual assembly model with industrial robots. Robot-based production is an essential part of industrial manufacturing automation reform [1]. Traditional applications of industrial robots in production are generally open-loop mechanisms with fixed trajectories and motions based on teaching and off-line programming [2, 3]. However, such robots can only repeat the programmed motions; their trajectories and motions cannot be adjusted adaptively according to the processing objects and the working environment, so some specific industrial processing procedures cannot be handled by these fixed-track robots. Machine vision is a technology for recognizing objects and for extracting and analyzing object information from digital images [4–6]. Machine vision technology [7] has therefore been widely applied in the field of industrial robots to improve their flexibility and adaptability [8, 9]. With visual feedback, adaptive adjustments can be achieved according to the processing objects and the working environments [10]. In recent years, robot systems combined with machine vision modules have been used in more and more fields [11, 12]. For example, by using an optical camera to measure the weld bead geometry (width and height), a real-time computer vision algorithm [13] was proposed to extract training patterns and enable an industrial robot to acquire the welding skill autonomously. With the development of robot technology, domestic and foreign scholars have done a lot of research on machine-vision-based robots. A pneumatic manipulator based on vision positioning is developed in [14],
whose core is the visual calibration of static objects. A machine-vision-based sorting technology for industrial robots is proposed in [15], in which the recognition of the target contour is the key step. An eye-in-hand robot system [16] is built by mounting the camera on the end effector of the robot. In China's battery production industry, labor intensity is high and the production processes are very specific. As shown in Figure 1(a), traditional battery manufacturers in China rely entirely on manual labor to grab rubber rings and place them on the sealing ports of the battery lids. Such manual operations require considerable labor and time, and the production efficiency is very low. Hence, industrial robots based on machine vision are urgently needed to improve production efficiency. In [17] a noncalibration scanning method is proposed to locate and grab the sealing rings using a camera fixed at the end effector of the robot. However, with such a noncalibration scanning method the scanning range is small and the scanning speed is slow, so the sealing rings cannot be placed on the fast-moving battery lid. In this paper, a novel industrial robot assembling system for sealing rings based on machine vision is developed for the lead battery production line. Specifically, the sealing rings are recognized and located automatically by the machine vision module. Then the industrial robot is controlled to grab the sealing rings dynamically under the joint work of multiple control units and visual feedback. Furthermore, the coordinates of the fast-moving battery lid are tracked by the machine vision module, and the sealing ports on the battery lid are identified and located by the contour recognition and fitting algorithms. Finally, the sealing rings are placed on the sealing ports of the battery lid automatically by the industrial robot. With the proposed system, the sealing rings can be grabbed and placed on the sealing ports accurately and automatically. More importantly, the proposed system improves the efficiency of the battery production line considerably.
Figure 1: (a) The traditional assembling process of battery lids. (b) The sealing rings and the battery lid.
The rest of the paper is organized as follows: The overview of the system is given in Section 2. The proposed target recognition and tracking algorithms are described in Section 3. In Section 4, experimental results are given to demonstrate the superiority of our system. Finally, conclusions are made in Section 5.
SYSTEM COMPOSITION AND DESIGN
As shown in Figure 2, the system mainly consists of a 4-degree-of-freedom industrial robot, a grab-side visual processing module, a place-side visual processing module, and an air pump. The grab-side visual processing module includes a light source, a grab-side camera, and a grab-side processing unit, and the place-side visual processing module also includes a light source, a camera, and a processing unit. Specifically, the grab-side visual processing module recognizes and locates the coordinates of the sealing rings, and the place-side visual processing module calibrates the coordinates of the fast-moving battery lid on the conveyor belt. The coordinates of the sealing ring and the battery lid are transferred to the 4-DOF robot controller through the serial ports. Then, the 4-DOF robot controller implements the motion and trajectory control to achieve the grabbing and placing processes with an air pump. The concrete structure of the proposed system is described in Figure 3, and an image of the real system is shown in Figure 11.
Figure 2: System block diagram.
Figure 3: The compositions of the system. ①, ②: crosslink Ethernet cable; ③: grab-side control unit; ④: serial line; ⑤: robot controller; ⑥: place-side control unit; ⑦: robot control line; ⑧, ⑨: light source of camera; ⑩: place-side camera; ⑪: grab-side camera; ⑫: battery lid; ⑬: conveyor belt; ⑭: 4-degree robot; ⑮: air pump; ⑯: circuit board of air pump driver; ⑰: tray of sealing ring; ⑱: back view of sealing ring; ⑲: front view of sealing ring.
As shown in Figure 3, the transmission device is constructed to simulate the mode of the industrial production line and realize the high speed movement of the battery lid. Then place-side and grab-side cameras are fixed above the robot working area to collect the images of the different target regions (sealing ring region and battery lid region), respectively.
Furthermore, a novel parallel light source [18] is provided for each target region. The specific design of the light sources is shown in Figure 4(a); the designed light sources are hung on both sides of the camera. The size of each light board is 250 × 250 mm, and 200 LED lamps are evenly distributed on each light board. The place-side and grab-side vision ranges are shown in Figure 4(b); the regional ranges are 650 × 400 mm and 100 × 100 mm, respectively.
Figure 4: Mechanical design of the light sources: (a) camera side view; (b) camera vision range.
Task scheduling [19] of the proposed system is shown in Figure 5. The robot controller will ask the grab-side control unit for the grabbing coordinates. After receiving the inquiry, the grab-side control unit will immediately process the real-time pictures transmitted by the corresponding camera, and the coordinates will then be fed back to the robot controller. When receiving the feedback coordinates, the controller will control the robot to move to the corresponding coordinates and open the pump drive via an internal GPIO. The sealing ring under the end of the robot will then be drawn up to complete the action of grabbing. After the action of grabbing, the robot controller will ask the place-side control unit for the placing coordinates. Then a novel dynamic target tracking algorithm is applied in the place-side control unit to obtain the coordinates of the fast-moving battery lid, and the tracking coordinates will be fed back to the robot controller. When
receiving the coordinates, the controller will control the robot to move to the corresponding coordinates and turn off the pump drive. Then the sealing ring will be put down to complete the action of placing.
Figure 5: The diagram of the task scheduling.
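The request/response cycle in Figure 5 can be summarized by the following schematic Python sketch. It only mirrors the order of operations described above; the function names, the coordinate values, and the way the serial queries are issued are placeholders for illustration rather than the actual controller firmware or protocol of the system.

```python
def query_grab_coordinates():
    """Placeholder for the serial query to the grab-side control unit, which processes
    the latest camera frame and replies with the robot-frame (x, y) of a sealing ring."""
    return (182.4, 96.7)   # hypothetical values for illustration

def query_place_coordinates():
    """Placeholder for the serial query to the place-side control unit, which tracks
    the moving battery lid and replies with the (x, y) of the target sealing port."""
    return (410.2, 233.5)  # hypothetical values for illustration

def move_robot_to(x, y):
    print(f"moving robot to ({x:.1f}, {y:.1f})")

def set_air_pump(on):
    print("air pump", "ON (suction grabs the ring)" if on else "OFF (ring is released)")

def assemble_one_ring():
    # 1. Ask the grab-side unit where a sealing ring lies, move there, switch suction on.
    gx, gy = query_grab_coordinates()
    move_robot_to(gx, gy)
    set_air_pump(True)
    # 2. Ask the place-side unit where the moving sealing port will be, move there,
    #    then switch suction off so the ring drops onto the port.
    px, py = query_place_coordinates()
    move_robot_to(px, py)
    set_air_pump(False)

assemble_one_ring()   # in the real cell this cycle repeats for every ring
```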
ROBOT DYNAMIC TARGET TRACKING
It can be seen that the key of the developed system is how to place the sealing rings on the sealing port of the fast-moving battery lid quickly and effectively. Firstly, a visual calibration technique should be applied to achieve precise positioning of the sealing ring and the sealing port on the battery lid. Then, through accurate position feedback, we can adjust the robot trajectory and track the coordinates of the moving objects on the conveyor belt. Zhang's calibration method [20] has the advantages of strong robustness and high accuracy and has been widely used in various applications. Reference [21] proved that Zhang's calibration method gives the most accurate model in their experiments. Thus, we adopt Zhang's calibration method to complete the camera calibration in this paper.
Calibration Zhang’s calibration method is a flexible calibration method, based on the plane of the calibration method template. Images of the calibration plate are captured from different directions. And then, through the corresponding relation between each point feature of the calibrating board, the camera calibration can be completed. After the operation above, the matrix 𝑀 related to the camera’s internal structure can be calculated. The 𝑀 is called camera intrinsic parameters matrix defined by 𝑓𝑥, 𝑓𝑦, 𝑢0, V0. 𝑓𝑥, 𝑓𝑦 are expressed as the effective focal length of the camera in the 𝑥- and 𝑦-axes. The origin of the image pixel
coordinate is (u0, v0). The relationship between the image pixel coordinates and the camera coordinates is given by the intrinsic matrix M in the following formula:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}, \qquad M = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \tag{1}$$
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = R \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T \tag{2}$$

As shown in formula (2), the relationship between the camera coordinates and the world coordinates is represented by a rotation matrix R and a translation vector T, where R is a 3 × 3 orthonormal matrix. Since we only consider two-dimensional calibration, Zw = 0. Furthermore, we set a = (u − u0)/fx and b = (v − v0)/fy. Hence, combining formulas (1) and (2), Xw and Yw can be expressed in terms of a, b, R, and T:

$$\begin{bmatrix} r_{11} - a r_{31} & r_{12} - a r_{32} \\ r_{21} - b r_{31} & r_{22} - b r_{32} \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \end{bmatrix} = \begin{bmatrix} a T_z - T_x \\ b T_z - T_y \end{bmatrix} \tag{3}$$
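As an illustration of how M, R, and T in formulas (1)–(3) can be estimated in practice, the following sketch uses OpenCV's implementation of Zhang's method; the chessboard geometry (9 × 6 inner corners, 25 mm squares) and the image folder are assumptions, not details from the original system:

```python
import glob
import cv2
import numpy as np

PATTERN = (9, 6)      # assumed inner-corner count of the chessboard target
SQUARE_MM = 25.0      # assumed square size of the target, in millimetres

# World coordinates of the board corners; the target is planar, so Z_w = 0.
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_MM

obj_points, img_points, image_size = [], [], None
for path in glob.glob("calib_images/*.png"):   # views of the board from different directions
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
        obj_points.append(objp)
        img_points.append(corners)
        image_size = gray.shape[::-1]

# M collects f_x, f_y, u_0, v_0 of formula (1); each rvec/tvec pair gives the
# R and T of formula (2) for one view of the target plane.
ret, M, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
R, _ = cv2.Rodrigues(rvecs[0])
T = tvecs[0].ravel()
```

For a fixed camera looking at a fixed work plane, a single view of the target lying on that plane is enough to fix the R and T used in formula (3).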
Then, we also complete the transformation between the fixed coordinate system of the camera and the fixed coordinate system of the robot. To facilitate the calculation, the reference coordinate system is made to coincide with the world coordinate system when the external parameters are obtained, and the Zw-axis of the reference coordinate system is parallel to the Zb-axis of the robot coordinate system. Since the fixed Z value of the target in the robot coordinate system can be determined directly by the teaching method, only the rotation and translation in the X and Y directions need to be considered. A point A is chosen arbitrarily on the console, and its coordinates (Axw, Ayw) are measured directly in the world coordinate system (the reference coordinate system). At the same time, the coordinates of the origin Ow of the reference coordinate system and of the
point A are obtained by the teaching method in the robot coordinate system and are (Oxb, Oyb) and (Axb, Ayb), respectively. Figure 6(a) illustrates the calculation of the transformation between the two coordinate systems.
Figure 6: The calculation of the transformation between the coordinate systems.
As shown in Figure 6(b), the Xw-axis and the Yw-axis of the reference coordinate system are translated so that the origins Ow and Ob of the two coordinate systems coincide. At this point, the coordinates of point A in the reference coordinate system remain unchanged, but its coordinates relative to the robot coordinate system become (Axb − Oxb, Ayb − Oyb), which gives the relationship in formula (4). The angle θ between the Xw-axis of the reference coordinate system and the Xb-axis of the robot coordinate system is the rotation angle between the two coordinate systems, so θ can be obtained from the trigonometric relations. The derivation is as follows:
$$\begin{bmatrix} A'_{xb} \\ A'_{yb} \end{bmatrix} = \begin{bmatrix} A_{xb} - O_{xb} \\ A_{yb} - O_{yb} \end{bmatrix} \tag{4}$$
$$\theta = \arctan\frac{A'_{yb}}{A'_{xb}} - \arctan\frac{A_{yw}}{A_{xw}} \tag{5}$$

Therefore, the corresponding coordinates (xb, yb) in the robot coordinate system of a point (xw, yw) in the reference coordinate system can be obtained:

$$\begin{bmatrix} x_b \\ y_b \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_w \\ y_w \end{bmatrix} + \begin{bmatrix} O_{xb} \\ O_{yb} \end{bmatrix} \tag{6}$$

After the calculations above, for the place-side camera we obtain the intrinsic matrix Mp, the rotation matrix Rp, the translation vector Tp, and the rotation angle θp. Similarly, for the grab-side camera, the intrinsic matrix Mg, rotation matrix Rg, translation vector Tg, and rotation angle θg are as follows:
(7)
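The calibration results can then be chained together to map a detected pixel to a robot coordinate. The sketch below (illustrative only; the function and variable names are not from the original paper) back-projects a pixel onto the Zw = 0 plane via formula (3), computes the rotation angle of formula (5) from the taught points, and applies the rigid transform of formula (6):

```python
import numpy as np

def pixel_to_world(u, v, M, R, T):
    """Back-project a pixel onto the Z_w = 0 plane (formulas (1)-(3))."""
    T = np.asarray(T).ravel()
    fx, fy = M[0, 0], M[1, 1]
    u0, v0 = M[0, 2], M[1, 2]
    a, b = (u - u0) / fx, (v - v0) / fy
    # 2 x 2 linear system of formula (3) in the unknowns X_w, Y_w.
    A = np.array([[R[0, 0] - a * R[2, 0], R[0, 1] - a * R[2, 1]],
                  [R[1, 0] - b * R[2, 0], R[1, 1] - b * R[2, 1]]])
    rhs = np.array([a * T[2] - T[0], b * T[2] - T[1]])
    xw, yw = np.linalg.solve(A, rhs)
    return xw, yw

def rotation_angle(A_world, A_robot, O_robot):
    """Formulas (4)-(5): rotation angle between reference and robot frames."""
    dx, dy = A_robot[0] - O_robot[0], A_robot[1] - O_robot[1]
    return np.arctan2(dy, dx) - np.arctan2(A_world[1], A_world[0])

def world_to_robot(xw, yw, theta, O_robot):
    """Formula (6): rotate by theta and translate by the taught origin."""
    xb = xw * np.cos(theta) - yw * np.sin(theta) + O_robot[0]
    yb = xw * np.sin(theta) + yw * np.cos(theta) + O_robot[1]
    return xb, yb
```

np.arctan2 is used instead of a plain arctangent so that the quadrant of the angle is handled correctly regardless of where point A lies.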
Target Recognition Algorithms
The recognition of the sealing ring and the sealing port is another key problem. The front and back recognition of the sealing ring is achieved using the algorithm proposed in [17]. Firstly, the parameters of the Hough transform, such as dp, min_dist, param1, param2, min_radius, and max_radius, are set through a series of experiments; their specific meanings and values are given in Table 1. The gray image is then processed by the Hough transform with these parameters, and the circular appearance of the sealing rings is found by the voting procedure within the grab-side vision range.
Table 1: The parameters of the related algorithm
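For illustration, this detection step might look like the sketch below using OpenCV's cv2.HoughCircles; the parameter values shown are placeholders and do not reproduce the values of Table 1:

```python
import cv2
import numpy as np

def detect_rings(gray):
    """Detect the circular appearance of sealing rings in a grab-side image."""
    blurred = cv2.medianBlur(gray, 5)            # suppress noise before the voting stage
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT,
                               dp=1,             # accumulator resolution ratio
                               minDist=50,       # minimum distance between circle centres
                               param1=100,       # upper Canny threshold
                               param2=30,        # accumulator (voting) threshold
                               minRadius=20,     # expected minimum ring radius, pixels
                               maxRadius=60)     # expected maximum ring radius, pixels
    if circles is None:
        return []
    return [(int(x), int(y), int(r)) for x, y, r in np.round(circles[0]).astype(int)]
```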
Secondly, because only a front-view sealing ring can be placed on the sealing port, as shown in Figure 1(b), the front and back of the sealing ring must be distinguished. As shown in Figure 7(a), there are visible differences between the front and back of the sealing ring: the front-view rings are brighter and relatively smooth, while the back-view rings are darker, with numbers and letters printed on them. Image binarization has a large effect on the subsequent document image analysis processes in character recognition [22]. Accordingly, the difference between the front and back of the sealing ring becomes obvious after binarization with a threshold η1 set from experimental observation. As shown in Figure
7(b), there are almost no black pixels in the front binarized image, while there are many black pixels in the back binarized image. Hence, the front and back can be recognized from the number of black pixels in the image:

$$N(x, y) = \sum_{(i, j) \in R_{xy}} \big[\, g(i, j) = 0 \,\big] \tag{8}$$

Here, Rxy is the identified circular area, (x, y) are the coordinates of the center of the sealing ring, and N(x, y) is the number of black pixels within Rxy. The results of recognition are shown in Figure 7(c). The circular feature of the sealing ring is then extracted to obtain the center coordinates. After the front and back recognition, the first identified front-view sealing ring is grabbed by the robot.
Figure 7: The front and back recognition: (a) the grab-side image, (b) the binary image, and (c) the results of recognition.
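A minimal sketch of the front/back check based on formula (8) is given below; the threshold values ETA1 and N_MAX are assumed placeholders rather than the experimentally determined values of the paper:

```python
import cv2
import numpy as np

ETA1 = 120    # assumed binarization threshold (set experimentally in the paper)
N_MAX = 200   # assumed upper bound on black pixels for a front-view ring

def is_front_view(gray, x, y, r):
    """Count black pixels inside the detected ring region R_xy (formula (8))."""
    _, binary = cv2.threshold(gray, ETA1, 255, cv2.THRESH_BINARY)
    mask = np.zeros_like(binary)
    cv2.circle(mask, (x, y), r, 255, thickness=-1)   # circular region R_xy
    black = np.count_nonzero((binary == 0) & (mask == 255))
    return black < N_MAX   # few black pixels -> front view, many -> back view
```

The first detected ring for which is_front_view returns True would then be the one grabbed by the robot, matching the behaviour described above.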
To handle overlapped and overturned sealing rings, the tray of the sealing rings will be improved in the future: the tray can be shaken to flip the rings, and a rod on the tray can be rotated to limit the height of the stacked rings and avoid overlap. Unlike the sealing rings, the battery lid is a dynamic target, so the place-side camera captures images of the battery lid at different positions on the conveyor belt, which results in different illumination conditions. However, the Hough transform algorithm [23, 24] is sensitive to its parameters and to the illumination conditions. Therefore, a novel circle fitting algorithm is proposed in this paper to recognize the sealing port on the battery lid. The proposed circle fitting algorithm is as follows. First, the image of the battery lid on the conveyor belt is collected in real time under a certain intensity of vertical illumination. Then
the binarization is carried out with a threshold η2 set from experimental observation:

$$g(i, j) = \begin{cases} 0, & f(i, j) \le \eta_2 \\ 255, & f(i, j) > \eta_2 \end{cases} \tag{9}$$

The parameter i represents the ith row of the image plane and the parameter j represents the jth column; f(i, j) denotes the gray value of the pixel at the ith row and jth column, g(i, j) the binarized value, 0 the black gray value, and 255 the white gray value. Figure 8(a) shows the binarized images of the battery lid viewed obliquely and laterally.
Figure 8: The contour extraction of the sealing port: (a) binary image; (b) contour searching; (c) contour selection.
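The binarization, edge detection, and contour filtering steps described in this and the following paragraph could be sketched as follows; the threshold and area bounds are assumed placeholders, and cv2.minEnclosingCircle merely stands in for the paper's circle-fitting step:

```python
import cv2

ETA2 = 100                  # assumed binarization threshold for the place-side image
S_MIN, S_MAX = 500, 5000    # assumed area range (pixels) for a valid sealing-port contour

def find_sealing_port(gray):
    """Locate the sealing port on the battery lid from its inner contour."""
    _, binary = cv2.threshold(gray, ETA2, 255, cv2.THRESH_BINARY)   # formula (9)
    edges = cv2.Canny(binary, 50, 150)                              # Canny edge detection
    # OpenCV 4 return signature: (contours, hierarchy).
    contours, _ = cv2.findContours(edges, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        if S_MIN <= cv2.contourArea(contour) <= S_MAX:              # area constraint
            (cx, cy), radius = cv2.minEnclosingCircle(contour)      # stand-in circle fit
            return (cx, cy), radius
    return None
```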
Then, the edges of the binarized images are detected using the Canny operator [25], and the pixels on the contour boundaries are detected from the differences among neighbouring pixels. All the inner contours, that is, the white regions surrounded by black pixels as shown in Figure 8(a), can then be extracted from the binarized images. Of course, the results of edge detection and contour searching include not only the approximate contour of the sealing port but also other interference profiles from the object or the background. Figure 8(b) shows the pictures with these interference profiles, where the areas enclosed by the red lines are the contours extracted from the binarized image. To remove these interference profiles, a set of constraints is imposed by setting the area range of the contour Smin