Biometric ID Management and Multimodal Communication: Joint COST 2101 and 2102 International Conference, BioID_MultiComm 2009, Madrid, Spain, September 16-18, 2009. Proceedings [1 ed.] 3642043909, 9783642043901

This book constitutes the research papers presented at the Joint 2101 & 2102 International Conference on Biometric I


English Pages 358 [370] Year 2009

Table of contents :
Front Matter....Pages -
Illumination Invariant Face Recognition by Non-Local Smoothing....Pages 1-8
Manifold Learning for Video-to-Video Face Recognition....Pages 9-16
MORPH: Development and Optimization of a Longitudinal Age Progression Database....Pages 17-24
Verification of Aging Faces Using Local Ternary Patterns and Q-Stack Classifier....Pages 25-32
Recognition of Emotional State in Polish Speech - Comparison between Human and Automatic Efficiency....Pages 33-40
Harmonic Model for Female Voice Emotional Synthesis....Pages 41-48
Anchor Model Fusion for Emotion Recognition in Speech....Pages 49-56
Audiovisual Alignment in a Face-to-Face Conversation Translation Framework....Pages 57-64
Maximising Audiovisual Correlation with Automatic Lip Tracking and Vowel Based Segmentation....Pages 65-72
Visual Context Effects on the Perception of Musical Emotional Expressions....Pages 73-80
Eigenfeatures and Supervectors in Feature and Score Fusion for SVM Face and Speaker Verification....Pages 81-88
Facial Expression Recognition Using Two-Class Discriminant Features....Pages 89-96
A Study for the Self Similarity Smile Detection....Pages 97-104
Analysis of Head and Facial Gestures Using Facial Landmark Trajectories....Pages 105-113
Combining Audio and Video for Detection of Spontaneous Emotions....Pages 114-121
Face Recognition Using Wireframe Model Across Facial Expressions....Pages 122-129
Modeling Gait Using CPG (Central Pattern Generator) and Neural Network....Pages 130-137
Fusion of Movement Specific Human Identification Experts....Pages 138-145
CBIR over Multiple Projections of 3D Objects....Pages 146-153
Biometrics beyond the Visible Spectrum: Imaging Technologies and Applications....Pages 154-161
Confidence Partition and Hybrid Fusion in Multimodal Biometric Verification System....Pages 212-219
Multi-biometric Fusion for Driver Authentication on the Example of Speech and Face....Pages 220-227
Multi-modal Authentication Using Continuous Dynamic Programming....Pages 228-235
Biometric System Verification Close to “Real World” Conditions....Pages 236-243
Developing HEO Human Emotions Ontology....Pages 244-251
Common Sense Computing: From the Society of Mind to Digital Intuition and beyond....Pages 252-259
On Development of Inspection System for Biometric Passports Using Java....Pages 260-267
Handwritten Signature On-Card Matching Performance Testing....Pages 268-275
Classification Based Revocable Biometric Identity Code Generation....Pages 276-284
Vulnerability Assessment of Fingerprint Matching Based on Time Analysis....Pages 285-292
A Matching Algorithm Secure against the Wolf Attack in Biometric Authentication Systems....Pages 293-300
Formant Based Analysis of Spoken Arabic Vowels....Pages 162-169
Key Generation in a Voice Based Template Free Biometric Security System....Pages 170-177
Extending Match-On-Card to Local Biometric Identification....Pages 178-186
A New Fingerprint Matching Algorithm Based on Minimum Cost Function....Pages 187-191
A Region-Based Iris Feature Extraction Method Based on 2D-Wavelet Transform....Pages 301-307
A Novel Contourlet Based Online Fingerprint Identification....Pages 308-317
Invariant Fourier Descriptors Representation of Medieval Byzantine Neume Notation....Pages 192-199
Bio-Inspired Reference Level Assigned DTW for Person Identification Using Handwritten Signatures....Pages 200-206
Pressure Evaluation in On-Line and Off-Line Signatures....Pages 207-211
Fake Finger Detection Using the Fractional Fourier Transform....Pages 318-324
Comparison of Distance-Based Features for Hand Geometry Authentication....Pages 325-332
A Comparison of Three Kinds of DP Matching Schemes in Verifying Segmental Signatures....Pages 333-339
Ergodic HMM-UBM System for On-Line Signature Verification....Pages 340-347
Improving Identity Prediction in Signature-based Unimodal Systems Using Soft Biometrics....Pages 348-356
Back Matter....Pages -



Author / Uploaded
Vitomir Štruc
Nikola Pavešić (auth.)
Julian Fierrez
Javier Ortega-Garcia
Anna Esposito
Andrzej Drygajlo
Marcos Faundez-Zanuy (eds.)


Lecture Notes in Computer Science 5707
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

Julian Fierrez Javier Ortega-Garcia Anna Esposito Andrzej Drygajlo Marcos Faundez-Zanuy (Eds.)

Biometric ID Management and Multimodal Communication Joint COST 2101 and 2102 International Conference BioID_MultiComm 2009 Madrid, Spain, September 16-18, 2009 Proceedings


Volume Editors

Julian Fierrez
Javier Ortega-Garcia
Universidad Autonoma de Madrid, Escuela Politecnica Superior
C/Francisco Tomas y Valiente 11, 28049 Madrid, Spain
E-mail: {julian.fierrez;javier.ortega}@uam.es

Anna Esposito
Second University of Naples, and IIASS
Caserta, Italy
E-mail: [email protected]

Andrzej Drygajlo
EPFL, Speech Processing and Biometrics Group
1015 Lausanne, Switzerland
E-mail: [email protected]

Marcos Faundez-Zanuy
Escola Universitària Politècnica de Mataró
08303 Mataro (Barcelona), Spain
E-mail: [email protected]

Library of Congress Control Number: 2009934011
CR Subject Classification (1998): I.5, J.3, K.6.5, D.4.6, I.4.8, I.7.5, I.2.7
LNCS Sublibrary: SL 6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics
ISSN 0302-9743
ISBN-10 3-642-04390-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-04390-1 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2009 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12752645 06/3180 543210

Preface

This volume contains the research papers presented at the Joint COST 2101 & 2102 International Conference on Biometric ID Management and Multimodal Communication, BioID_MultiComm 2009, hosted by the Biometric Recognition Group, ATVS, at the Escuela Politécnica Superior, Universidad Autónoma de Madrid, Spain, during September 16–18, 2009. BioID_MultiComm 2009 was a joint international conference organized cooperatively by COST Actions 2101 & 2102. COST 2101 Action focuses on “Biometrics for Identity Documents and Smart Cards (BIDS),” while COST 2102 Action is entitled “Cross-Modal Analysis of Verbal and Non-verbal Communication.” The aim of COST 2101 is to investigate novel technologies for unsupervised multimodal biometric authentication systems using a new generation of biometrics-enabled identity documents and smart cards. COST 2102 is devoted to developing an advanced acoustical, perceptual and psychological analysis of verbal and non-verbal communication signals originating in spontaneous face-to-face interaction, in order to identify algorithms and automatic procedures capable of recognizing human emotional states. While each Action supports its own individual topics, there are also strong links and shared interests between them. BioID_MultiComm 2009 therefore focused on both Action-specific and joint topics. 
These included, but were not restricted to: physiological biometric traits (face, iris, fingerprint, hand); behavioral biometric modalities (speech, handwriting, gait); transparent biometrics and smart remote sensing; biometric vulnerabilities and liveness detection; data encryption for identity documents and smart cards; quality and reliability measures in biometrics; multibiometric templates for next generation ID documents; operational scenarios and large-scale biometric ID management; standards and privacy issues for biometrics; multibiometric databases; human factors and behavioral patterns; interactive and unsupervised multimodal systems; analysis of verbal and non-verbal communication signals; cross modal analysis of audio and video; spontaneous face-to-face interaction; advanced acoustical and perceptual signal processing; audiovisual data encoding; fusion of visual and audio signals for recognition and synthesis; identification of human emotional states; gesture, speech and facial expression analysis and recognition; implementation of intelligent avatars; annotation of extended MPEG7 standard; human behavior and unsupervised interactive interfaces; and cultural and socio-cultural variability.

We sincerely thank all the authors who submitted their work for consideration. We also thank the Scientific Committee members for their great effort and high-quality work in the review process. In addition to the papers included in the present volume, the conference program also included three keynote speeches from outstanding researchers: Prof. Anil K. Jain (Michigan State University, USA), Prof. Simon Haykin (McMaster University, Canada) and Dr. Janet Slifka (Harvard – MIT, USA). We sincerely thank them for accepting the invitation to give their talks.


The conference organization was the result of a team effort. We are grateful to the Advisory Board for their support at every stage of the conference organization. We also thank all the members of the Local Organizing Committee, in particular Pedro Tome-Gonzalez for the website management, Miriam Moreno-Moreno for supervising the registration process, and Almudena Gilperez and Maria Puertas-Calvo for taking care of the social program. Finally, we gratefully acknowledge the material and financial support provided by the Escuela Politécnica Superior and the Universidad Autónoma de Madrid.

August 2009

Javier Ortega-Garcia Julian Fierrez

Organization

General Chair
Javier Ortega-Garcia, Universidad Autonoma de Madrid, Spain

Conference Co-chair
Joaquin Gonzalez-Rodriguez, Universidad Autonoma de Madrid, Spain

Advisory Board
Anna Esposito, Second University of Naples, Italy
Andrzej Drygajlo, EPFL, Switzerland
Marcos Faundez, Escuela Universitaria Politécnica de Mataró, Spain
Mike Fairhurst, University of Kent, UK
Amir Hussain, University of Stirling, UK
Niels-Christian Juul, University of Roskilde, Denmark

Program Chair
Julian Fierrez, Universidad Autonoma de Madrid, Spain

Scientific Committee

Akarun, L., Turkey Alba-Castro, J.-L., Spain Almeida Pavia, A., Portugal Alonso-Fernandez, F., Spain Ariyaeeinia, A., UK Bailly, G., France Bernsen, N.-O., Denmark Bourbakis, N., USA Bowyer, K. W., USA Campbell, N., Japan Campisi, P., Italy Cerekovic, A., Croatia Chetouani, M., France Chollet, G., France Cizmar, A., Slovak Rep. Delic, V., Serbia Delvaux, N., France Dittman, J., Germany Dorizzi, B., France Dutoit, T., Belgium Dybkjar, L., Denmark

El-Bahrawy, A., Egypt Erzin, E., Turkey Fagel, S., Germany Furui, S., Japan Garcia-Mateo, C., Spain Gluhchev, G., Bulgaria Govindaraju, V., USA Granstrom, B., Sweden Grinberg, M., Bulgaria Harte, N., Ireland Kendon, A., USA Hernaez, I., Spain Hernando, J., Spain Hess, W., Germany Hoffmann, R., Germany Keus, K., Germany Kim, H., Korea Kittler, J., UK Koreman, J., Norway Kotropoulos, C., Greece Kounoudes, A., Cyprus


Krauss, R., USA Kryszczuk, K., Switzerland Laminen, H., Finland Laouris, Y., Cyprus Lindberg, B., Denmark Lopez-Cozar, R., Spain Majewski, W., Poland Makris, P., Cyprus Matsumoto, D., USA Mihaylova, K., Bulgaria Moeslund, T.-B., Denmark Murphy, P., Ireland Neubarth, F., Austria Nijholt, A., The Netherlands Pandzic, I., Croatia Papageorgiou, H., Greece Pavesic, N., Slovenia Pelachaud, C., France Pfitzinger, H., Germany Piazza, F., Italy Pitas, I., Greece Pribilova, A., Slovak Rep. Pucher, M., Austria Puniene, J., Lithuania Raiha, K.-J., Finland Ramos, D., Spain Ramseyer, F., Switzerland Ratha, N., USA Ribaric, S., Croatia Richiardi, J., Switzerland Rojc, M., Slovenia

Rudzionis, A., Lithuania Rusko, M., Slovak Rep. Ruttkay, Z., Hungary Sankur, B., Turkey Schoentgen, J., Belgium Schouten, B., Netherlands Sigüenza, J.-A., Spain Smekal, Z., Czech Rep. Staroniewicz, P., Poland Tao, J., China Tekalp, A.-M., Turkey Thorisson, K.-R., Iceland Tistarelli, M., Italy Toh, K.-A., Korea Toledano, D. T., Spain Tome-Gonzalez, P., Spain Trancoso, I., Portugal Tsapatsoulis, N., Cyprus Tschacher, W., Switzerland v. d. Heuvel, H., The Netherlands Veldhuis, The Netherlands Vich, R., Czech Republic Vicsi, K., Hungary Vielhauer, C., Germany Vilhjalmsson, H., Iceland Vogel, C., Ireland Wilks, Y., UK Yegnanarayana, B., India Zganec Gros, J., Slovenia Zhang, D., Hong Kong Zoric, G., Croatia

Local Organizing Committee (from the Universidad Autonoma de Madrid, Spain)

Javier Galbally, Pedro Tome-Gonzalez, Manuel R. Freire, Marcos Martinez-Diaz, Miriam Moreno-Moreno, Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Javier Franco, Alicia Beisner, Javier Burgues, Ruben F. Sevilla-Garcia, Almudena Gilperez, Maria Puertas-Calvo

Table of Contents

Face Processing and Recognition
Illumination Invariant Face Recognition by Non-local Smoothing
    Vitomir Štruc and Nikola Pavešić
Manifold Learning for Video-to-Video Face Recognition
    Abdenour Hadid and Matti Pietikäinen
MORPH: Development and Optimization of a Longitudinal Age Progression Database
    Allen W. Rawls and Karl Ricanek Jr.
Verification of Aging Faces Using Local Ternary Patterns and Q-Stack Classifier
    Andrzej Drygajlo, Weifeng Li, and Kewei Zhu

Voice Analysis and Modeling
Recognition of Emotional State in Polish Speech - Comparison between Human and Automatic Efficiency
    Piotr Staroniewicz
Harmonic Model for Female Voice Emotional Synthesis
    Anna Přibilová and Jiří Přibil
Anchor Model Fusion for Emotion Recognition in Speech
    Carlos Ortego-Resa, Ignacio Lopez-Moreno, Daniel Ramos, and Joaquin Gonzalez-Rodriguez

Multimodal Interaction
Audiovisual Alignment in a Face-to-Face Conversation Translation Framework
    Jerneja Žganec Gros and Aleš Mihelič
Maximising Audiovisual Correlation with Automatic Lip Tracking and Vowel Based Segmentation
    Andrew Abel, Amir Hussain, Quoc-Dinh Nguyen, Fabien Ringeval, Mohamed Chetouani, and Maurice Milgram
Visual Context Effects on the Perception of Musical Emotional Expressions
    Anna Esposito, Domenico Carbone, and Maria Teresa Riviello
Eigenfeatures and Supervectors in Feature and Score Fusion for SVM Face and Speaker Verification
    Pascual Ejarque, Javier Hernado, David Hernando, and David Gómez

Face and Expression Recognition
Facial Expression Recognition Using Two-Class Discriminant Features
    Marios Kyperountas and Ioannis Pitas
A Study for the Self Similarity Smile Detection
    David Freire, Luis Antón, and Modesto Castrillón
Analysis of Head and Facial Gestures Using Facial Landmark Trajectories
    Hatice Cinar Akakin and Bulent Sankur
Combining Audio and Video for Detection of Spontaneous Emotions
    Rok Gajšek, Vitomir Štruc, Simon Dobrišek, Janez Žibert, France Mihelič, and Nikola Pavešić
Face Recognition Using Wireframe Model Across Facial Expressions
    Zahid Riaz, Christoph Mayer, Michael Beetz, and Bernd Radig

Body and Gait Recognition
Modeling Gait Using CPG (Central Pattern Generator) and Neural Network
    Arabneydi Jalal, Moshiri Behzad, and Bahrami Fariba
Fusion of Movement Specific Human Identification Experts
    Nikolaos Gkalelis, Anastasios Tefas, and Ioannis Pitas
CBIR over Multiple Projections of 3D Objects
    Dimo Dimov, Nadezhda Zlateva, and Alexander Marinov
Biometrics beyond the Visible Spectrum: Imaging Technologies and Applications
    Miriam Moreno-Moreno, Julian Fierrez, and Javier Ortega-Garcia

Poster Session

Voice Analysis and Speaker Verification
Formant Based Analysis of Spoken Arabic Vowels
    Yousef Ajami Alotaibi and Amir Husain
Key Generation in a Voice Based Template Free Biometric Security System
    Joshua A. Atah and Gareth Howells

Fingerprint Biometrics
Extending Match-On-Card to Local Biometric Identification
    Julien Bringer, Hervé Chabanne, Tom A.M. Kevenaar, and Bruno Kindarji
A New Fingerprint Matching Algorithm Based on Minimum Cost Function
    Andrés I. Ávila and Adrialy Muci

Handwriting Analysis and Signature Verification
Invariant Fourier Descriptors Representation of Medieval Byzantine Neume Notation
    Dimo Dimov and Lasko Laskov
Bio-Inspired Reference Level Assigned DTW for Person Identification Using Handwritten Signatures
    Muzaffar Bashir and Jürgen Kempf
Pressure Evaluation in On-Line and Off-Line Signatures
    Desislava Dimitrova and Georgi Gluhchev

Multimodal Biometrics
Confidence Partition and Hybrid Fusion in Multimodal Biometric Verification System
    Chaw Chia, Nasser Sherkat, and Lars Nolle
Multi-biometric Fusion for Driver Authentication on the Example of Speech and Face
    Tobias Scheidat, Michael Biermann, Jana Dittmann, Claus Vielhauer, and Karl Kümmel
Multi-modal Authentication Using Continuous Dynamic Programming
    K.R. Radhika, S.V. Sheela, M.K. Venkatesha, and G.N. Sekhar

Biometric Systems and Knowledge Discovery
Biometric System Verification Close to “Real World” Conditions
    Aythami Morales, Miguel Ángel Ferrer, Marcos Faundez, Joan Fàbregas, Guillermo Gonzalez, Javier Garrido, Ricardo Ribalda, Javier Ortega, and Manuel Freire
Developing HEO Human Emotions Ontology
    Marco Grassi
Common Sense Computing: From the Society of Mind to Digital Intuition and beyond
    Erik Cambria, Amir Hussain, Catherine Havasi, and Chris Eckl

Biometric Systems and Security
On Development of Inspection System for Biometric Passports Using Java
    Luis Terán and Andrzej Drygajlo
Handwritten Signature On-Card Matching Performance Testing
    Olaf Henniger and Sascha Müller
Classification Based Revocable Biometric Identity Code Generation
    Alper Kanak and Ibrahim Soğukpinar
Vulnerability Assessment of Fingerprint Matching Based on Time Analysis
    Javier Galbally, Sara Carballo, Julian Fierrez, and Javier Ortega-Garcia
A Matching Algorithm Secure against the Wolf Attack in Biometric Authentication Systems
    Yoshihiro Kojima, Rie Shigetomi, Manabu Inuma, Akira Otsuka, and Hideki Imai

Iris, Fingerprint and Hand Recognition
A Region-Based Iris Feature Extraction Method Based on 2D-Wavelet Transform
    Nima Tajbakhsh, Khashayar Misaghian, and Naghmeh Mohammadi Bandari
A Novel Contourlet Based Online Fingerprint Identification
    Omer Saeed, Atif Bin Mansoor, and M Asif Afzal Butt
Fake Finger Detection Using the Fractional Fourier Transform
    Hyun-suk Lee, Hyun-ju Maeng, and You-suk Bae
Comparison of Distance-Based Features for Hand Geometry Authentication
    Javier Burgues, Julian Fierrez, Daniel Ramos, and Javier Ortega-Garcia

Signature Verification
A Comparison of Three Kinds of DP Matching Schemes in Verifying Segmental Signatures
    Seiichiro Hangai, Tomoaki Sano, and Takahiro Yoshida
Ergodic HMM-UBM System for On-Line Signature Verification
    Enrique Argones Rúa, David Pérez-Piñar López, and José Luis Alba Castro
Improving Identity Prediction in Signature-based Unimodal Systems Using Soft Biometrics
    Márjory Abreu and Michael Fairhurst

Author Index

Illumination Invariant Face Recognition by Non-Local Smoothing

Vitomir Štruc and Nikola Pavešić

Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000 Ljubljana, Slovenia
{vitomir.struc,nikola.pavesic}@fe.uni-lj.si.com
http://luks.fe.uni-lj.si/

Abstract. Existing face recognition techniques struggle with their performance when identities have to be determined (recognized) based on image data captured under challenging illumination conditions. To overcome the susceptibility of the existing techniques to illumination variations, numerous normalization techniques have been proposed in the literature. These normalization techniques, however, still exhibit some shortcomings and, thus, offer room for improvement. In this paper we identify the most important weaknesses of the commonly adopted illumination normalization techniques and present two novel approaches which make use of the recently proposed non-local means algorithm. We assess the performance of the proposed techniques on the YaleB face database and report preliminary results.

Keywords: Face recognition, retinex theory, non-local means, illumination invariance.

1 Introduction

The performance of current face recognition technology with image data captured in controlled conditions has reached a level which allows for its deployment in a wide variety of applications. These applications typically ensure controlled conditions for the image acquisition procedure and, hence, minimize the variability in the appearance of different (facial) images of a given individual. However, when employed on facial images captured in uncontrolled and unconstrained environments, the majority of existing face recognition techniques still exhibit a significant drop in their recognition performance. The reason for the deterioration in the recognition (or verification) rates can be found in the appearance variations induced by various environmental factors, among which illumination is undoubtedly one of the most important. The importance of illumination was highlighted in several empirical studies where it was shown that the illumination induced variability in facial images is often larger than the variability induced by the individual's identity [1].

Due to this susceptibility, numerous techniques have been proposed in the literature to cope with the problem of illumination. These techniques try to tackle the illumination induced appearance variations at one of the following three levels: (i) at the pre-processing level, (ii) at the feature extraction level, and (iii) at the modeling and/or classification level. While techniques from the latter two levels represent valid efforts in solving the problem of illumination invariant face recognition, techniques operating at the pre-processing level exhibit some important advantages which make them a preferred choice when devising robust face recognition systems. One of their most essential advantages lies in the fact that they make no assumptions regarding the size and characteristics of the training set while offering a computationally simple and simultaneously effective way of achieving illumination invariant face recognition.

Examples of normalization techniques operating at the pre-processing level¹ include the single and multi scale retinex algorithms [2],[3], the self quotient image [4], anisotropic smoothing [5], etc. All of these techniques share a common theoretical foundation and exhibit some strengths as well as some weaknesses. In this paper we identify (in our opinion) the most important weaknesses of the existing normalization techniques and propose two novel techniques which try to overcome them. We assess the proposed techniques on the YaleB database and present encouraging preliminary results.

The rest of the paper is organized as follows. In Section 2 the theory underlying the majority of photometric normalization techniques is briefly reviewed and some weaknesses of existing techniques are pointed out. The novel normalization techniques are presented in Section 3 and experimentally evaluated in Section 4. The paper concludes with some final comments in Section 5.

J. Fierrez et al. (Eds.): BioID_MultiComm 2009, LNCS 5707, pp. 1–8, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Background and Related Work

The theoretical foundation of the majority of existing photometric normalization techniques can be linked to the Retinex theory developed and presented by Land and McCann in [6]. The theory tries to explain the basic principles governing the process of image formation and/or scene perception and states that an image I(x, y) can be modeled as the product of the reflectance R(x, y) and luminance L(x, y) functions:

\[ I(x, y) = R(x, y)\,L(x, y). \tag{1} \]

Here, the reflectance R(x, y) relates to the characteristics of the objects comprising the scene of an image and is dependent on the reflectivity (or albedo) of the scene's surfaces [7], while the luminance L(x, y) is determined by the illumination source and relates to the amount of illumination falling on the observed scene. Since the reflectance R(x, y) relates solely to the objects in an image, it acts (when successfully estimated) as an illumination invariant representation of the input image. Unfortunately, estimating the reflectance from

¹ We will refer to these techniques as photometric normalization techniques in the remainder of this paper.

Illumination Invariant Face Recognition by Non-Local Smoothing


the expression defined by (1) represents an ill-posed problem, i.e., it is impossible to compute the reflectance unless some assumptions regarding the nature of the illumination induced appearance variations are made. To this end, researchers have introduced various assumptions regarding the luminance and reflectance functions. The most common are that the luminance part of the model in (1) varies slowly with the spatial position and, hence, represents a low-frequency phenomenon, while the reflectance part represents a high-frequency phenomenon. To determine the reflectance of an image, and thus to obtain an illumination invariant image representation, the luminance L(x, y) of an image is commonly estimated first. This estimate L(x, y) is then exploited to compute the reflectance via the manipulation of the image model given by the expression (1), i.e.:

\[ \ln R(x, y) = \ln I(x, y) - \ln L(x, y) \quad \text{or} \quad R(x, y) = I(x, y)/L(x, y), \tag{2} \]

where the right hand side equation of (2) denotes an element-wise division of the input image I(x, y) by the estimated luminance L(x, y). We will refer to the reflectance computed with the left hand side equation of (2) as the logarithmic reflectance and to the reflectance computed with the right hand side equation of (2) as the quotient reflectance in the rest of this paper. As already emphasized, the luminance is considered to vary slowly with the spatial position [8] and can, therefore, be estimated as a smoothed version of the original image I(x, y). Various smoothing filters and smoothing techniques have been proposed in the literature, resulting in different photometric normalization procedures that were successfully applied to the problem of face recognition under severe illumination changes.

The single scale retinex algorithm [2], for example, computes the estimate of the luminance function L(x, y) by simply smoothing the input image I(x, y) with a Gaussian smoothing filter. The illumination invariant image representation is then computed using the expression for the logarithmic reflectance. While such an approach generally produces good results with a properly selected Gaussian, its broader use in robust face recognition systems is still limited by an important weakness: at large illumination discontinuities caused by strong shadows cast over the face, halo effects are often visible in the computed reflectance [8]. To avoid this problem the authors of the algorithm extended their normalization technique to a multi scale form [3], where Gaussians with different widths are used and the outputs of different implementations of the single scale retinex algorithm are combined to compute the final illumination invariant face representation.

Another solution to the problem of halo effects was presented by Wang et al. [4] in the form of the self quotient image technique. Here, the authors approach the problem of luminance estimation by introducing an anisotropic smoothing filter. Once the anisotropic smoothing operation produces an estimate of the luminance L(x, y), the quotient reflectance R(x, y) is computed in accordance with the right hand side equation of (2). However, due to the anisotropic nature of the employed smoothing filter, flat zones in the images are not smoothed properly.
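The luminance-then-reflectance pipeline described above can be illustrated with a minimal sketch in the spirit of the single scale retinex: the luminance is estimated with a Gaussian blur, and both forms of expression (2) are evaluated. This is not the implementation used in the paper. The function names, the border replication, the kernel radius of three standard deviations, and the small offset eps (added to avoid log(0) and division by zero) are all assumptions made for illustration.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2*radius + 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel; border pixels are replicated."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(image):
    return [list(col) for col in zip(*image)]

def estimate_luminance(image, sigma=2.0):
    """Separable Gaussian smoothing: blur the rows, then the columns."""
    kernel = gaussian_kernel(sigma, max(1, int(3 * sigma)))
    rows_blurred = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(rows_blurred), kernel))

def reflectances(image, sigma=2.0, eps=1.0):
    """Return (logarithmic, quotient) reflectance as in expression (2).

    eps is an illustrative offset, not part of the original model.
    """
    lum = estimate_luminance(image, sigma)
    log_r = [[math.log(i + eps) - math.log(l + eps) for i, l in zip(ri, rl)]
             for ri, rl in zip(image, lum)]
    quo_r = [[(i + eps) / (l + eps) for i, l in zip(ri, rl)]
             for ri, rl in zip(image, lum)]
    return log_r, quo_r
```

For a uniformly lit, featureless image the smoothed luminance equals the input, so the logarithmic reflectance is zero everywhere and the quotient reflectance is one; the two forms are always related by the exponential function.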

V. Štruc and N. Pavešić


Gross and Brajovic [5] presented a solution to the problem of reliable luminance estimation by adopting an anisotropic diffusion based smoothing technique. In their method the amount of smoothing at each pixel location is controlled by the image's local contrast. Adopting the local contrast as a means to control the smoothing process results in flat image regions being smoothed properly while still preserving image edges and, thus, avoiding halo effects. Despite the success of the normalization technique in effectively determining the quotient reflectance, one could still voice some misgivings. A known issue with anisotropic diffusion based smoothing is that it smoothes the image only in the direction orthogonal to the image's gradient [9]. Thus, it effectively preserves only straight edges, but struggles at edge points with high curvature (e.g., at corners). In these situations an approach that better preserves edges would be preferable. To this end, we present in the next section two novel algorithms which make use of the recently proposed non-local means algorithm.

3 Non-Local Means for Luminance Estimation

3.1 The Non-Local Means Algorithm

The non-local means (NL means) algorithm [9] is a recently proposed image denoising technique, which, unlike existing denoising methods, considers pixel values from the entire image for the task of noise reduction. The algorithm is based on the fact that for every small window of the image several similar windows can be found in the image as well, and, moreover, that all of these windows can be exploited to denoise the image. Let us denote an image contaminated with noise as I_n(x) ∈ R^(a×b), where a and b are image dimensions in pixels, and let x stand for an arbitrary pixel location x = (x, y) within the noisy image. The NL means algorithm constructs the denoised image I_d by computing each of its pixel values I_d(z) as a weighted average of the pixels comprising I_n(x), i.e. [9]:

\[ I_d(\mathbf{z}) = \sum_{\mathbf{x} \in I_n(\mathbf{x})} w(\mathbf{z}, \mathbf{x})\, I_n(\mathbf{x}), \tag{3} \]

where w(z, x) represents the weighting function that measures the similarity between the local neighborhoods of the pixels at the spatial locations z and x. Here, the weighting function is defined as follows:

\[ w(\mathbf{z}, \mathbf{x}) = \frac{1}{Z(\mathbf{z})}\, e^{-\frac{G_\sigma \lVert I_n(\Omega_{\mathbf{x}}) - I_n(\Omega_{\mathbf{z}}) \rVert_2^2}{h^2}} \quad \text{and} \quad Z(\mathbf{z}) = \sum_{\mathbf{x} \in I_n(\mathbf{x})} e^{-\frac{G_\sigma \lVert I_n(\Omega_{\mathbf{x}}) - I_n(\Omega_{\mathbf{z}}) \rVert_2^2}{h^2}}. \tag{4} \]

In the above expressions G_σ denotes a Gaussian kernel with the standard deviation σ, Ω_x and Ω_z denote the local neighborhoods of the pixels at the locations x and z, respectively, h stands for the parameter that controls the decay of the exponential function, and Z(z) represents a normalizing factor.


Fig. 1. The principle of the NL means algorithm: an input image (left), similar and dissimilar image windows (right)

From the presented equations it is clear that if the local neighborhoods of a given pair of pixel locations z and x display a high degree of similarity, the pixels at z and x will be assigned relatively large weights when computing their denoised estimates. Some examples of image windows used by the algorithm are presented in Fig. 1. Here, similar image windows are marked white, while dissimilar image windows are marked black. When computing the denoised value of the center pixel of each of the white windowed image regions, the center pixels of the similar windows will be assigned relatively large weights, while the center pixels of the dissimilar windows will be assigned relatively low weights. With a proper selection of the decay parameter h, the presented algorithm results in a smoothed image with preserved edges. Hence, it can be used to estimate the luminance of an input image and, consequently, to compute the (logarithmic) reflectance. An example of the deployment of the NL means algorithm (for a 5×5 local neighborhood and h = 10) for estimation of the logarithmic reflectance is shown in Fig. 2 (left triplet).
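The weighted average of expressions (3) and (4) can be written down directly for small images. The sketch below is a brute-force illustration only, not the authors' implementation: it compares every pair of neighborhoods, so its cost grows quadratically with the number of pixels. The function names, the border replication and the normalization of the Gaussian window are assumptions introduced for the example.

```python
import math

def gaussian_window(size, sigma):
    """Gaussian weights G_sigma over a size x size neighborhood (sum to 1)."""
    r = size // 2
    g = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
          for dx in range(-r, r + 1)] for dy in range(-r, r + 1)]
    total = sum(map(sum, g))
    return [[v / total for v in row] for row in g]

def patch(image, cy, cx, r):
    """Local neighborhood around (cy, cx); border pixels are replicated."""
    rows, cols = len(image), len(image[0])
    return [[image[min(max(cy + dy, 0), rows - 1)][min(max(cx + dx, 0), cols - 1)]
             for dx in range(-r, r + 1)] for dy in range(-r, r + 1)]

def nl_means(image, size=5, h=10.0, sigma=1.0):
    """Brute-force NL means: every pixel is averaged over the whole image."""
    r = size // 2
    g = gaussian_window(size, sigma)
    rows, cols = len(image), len(image[0])
    patches = [[patch(image, y, x, r) for x in range(cols)] for y in range(rows)]
    out = [[0.0] * cols for _ in range(rows)]
    for zy in range(rows):
        for zx in range(cols):
            pz = patches[zy][zx]
            weighted_sum, norm = 0.0, 0.0   # numerator of (3) and Z(z) of (4)
            for y in range(rows):
                for x in range(cols):
                    px = patches[y][x]
                    # Gaussian weighted squared Euclidean distance of the patches
                    d2 = sum(g[i][j] * (px[i][j] - pz[i][j]) ** 2
                             for i in range(size) for j in range(size))
                    w = math.exp(-d2 / (h * h))
                    weighted_sum += w * image[y][x]
                    norm += w
            out[zy][zx] = weighted_sum / norm
    return out
```

With a small h only nearly identical neighborhoods contribute, so a sharp step edge survives the smoothing; with a very large h all weights approach one and every output pixel tends towards the global image mean.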

Fig. 2. Two sample images processed with the NL means (left triplet) and adaptive NL means (right triplet) algorithms. Order of images in each triplet (from left to right): the input image, the estimated luminance, the logarithmic reflectance.

3.2 The Adaptive Non-Local Means Algorithm

The NL means algorithm assigns different weights to each of the pixel values in the noisy image I_n(x) when estimating the denoised image I_d(x). As we have shown in the previous section, this weight assignment is based on the similarity of the local neighborhoods of arbitrary pixel pairs and is controlled by the decay parameter h. Large values of h result in a slow decay of the Gaussian weighted Euclidean distance² and, hence, more neighborhoods are considered similar and are assigned

² Recall that the Euclidean distance serves as the similarity measure between two local neighborhoods.


relatively large weights. Small values of h, on the other hand, result in a fast decay of the Euclidean similarity measure and consequently only a small number of pixels is assigned a large weight for the estimation of the denoised pixel values. Rather than using the original NL means algorithm for estimation of the luminance of an image, we propose in this paper to exploit an adaptive version of the algorithm, where the decay parameter h is a function of local contrast and not a fixed and preselected value. In regions of low contrast, which represent homogeneous areas, the image should be smoothed more (i.e., more pixels should be considered for the estimation of the denoised pixel value), while in regions of high contrast the image should be smoothed less (i.e., fewer pixels should be considered for the estimation of the denoised pixel value).

Following the work of Gross and Brajovic [5], we define the local contrast between neighboring pixel locations a and b as: ρ_{a,b} = |I_n(a) − I_n(b)| / |I_n(a) + I_n(b)|. Assuming that a is an arbitrary pixel location within I_n(x) and b stands for a neighboring pixel location above, below, left or right of a, we can construct four contrast images, each encoding the local contrast in one of the four possible directions. The final contrast image I_c(x) is ultimately computed as the average of the four (directional) contrast images. To link the decay parameter h to the contrast image we first compute the logarithm of the inverse of the (8-bit grey-scale) contrast image, I_ic(x) = log[1/I_c(x)], where 1 denotes a matrix of all ones and the operator "/" stands for the element-wise division. Next, we linearly map the values of our inverted contrast image I_ic(x) to values of the decay parameter h, which now becomes a function of the spatial location: h(x) = [(I_ic(x) − I_icmin)/(I_icmax − I_icmin)] · h_max + h_min, where I_icmax and I_icmin denote the maximum and minimum value of the inverted contrast image I_ic(x), respectively, and h_max and h_min stand for the target maximum and minimum values of the decay parameter h. An example of the deployment of the presented algorithm is shown in Fig. 2 (right triplet).
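The mapping from local contrast to a spatially varying decay parameter can be sketched as follows. The code follows the three steps given in the text (averaged four-direction contrast, logarithm of the inverse, linear mapping to h), but several details are assumptions made only for this illustration: the function names, the replication of border pixels, the small constant eps that keeps the divisions and the logarithm finite in perfectly flat regions, and the handling of a completely uniform image, for which the linear mapping is undefined.

```python
import math

def contrast_image(image, eps=1e-6):
    """Average of the four directional local contrasts rho_{a,b}."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            a = image[y][x]
            total = 0.0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                # neighbors outside the image are replaced by the border pixel
                yy = min(max(y + dy, 0), rows - 1)
                xx = min(max(x + dx, 0), cols - 1)
                b = image[yy][xx]
                total += abs(a - b) / (abs(a + b) + eps)
            out[y][x] = total / 4.0
    return out

def decay_map(image, h_min=0.01, h_max=80.0, eps=1e-6):
    """Per-pixel decay parameter h(x) from the inverted log contrast."""
    ic = contrast_image(image, eps)
    iic = [[math.log(1.0 / (c + eps)) for c in row] for row in ic]
    lo = min(min(row) for row in iic)
    hi = max(max(row) for row in iic)
    span = hi - lo
    if span == 0.0:
        # uniform contrast everywhere: assign the largest decay value (assumption)
        return [[h_max + h_min for _ in row] for row in iic]
    return [[(v - lo) / span * h_max + h_min for v in row] for row in iic]
```

On an image containing a single vertical step edge, the pixels adjacent to the edge receive h_min (little smoothing), while pixels in the flat regions receive h_max + h_min (strong smoothing), which is the behavior the adaptive scheme aims for.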

4 Experiments

To assess the two presented photometric normalization techniques we made use of the YaleB face database [10]. The database contains images of ten distinct subjects, each photographed under 576 different viewing conditions (9 poses × 64 illumination conditions). Thus, a total of 5760 images is featured in the database. However, as we are interested only in testing our photometric normalization techniques, we make use of a subset of 640 images with frontal pose in our experiments. We partition the 640 images into five image sets according to the extremity in illumination under which they were taken and employ the first image set for training and the remaining ones for testing. In the experiments we use principal component analysis as the feature extraction technique and the nearest neighbor (to the mean) classifier in conjunction with the cosine similarity measure as the classifier. The number of features is set to its maximal value in all experiments. In our first series of recognition experiments we assess the performance of the NL means (NLM) and adaptive NL means (ANL) algorithms for varying values of their


Table 1. The rank one recognition rates (in %) for the NLM and ANL algorithms

Algorithm  Parameter value  Image set no. 2  Image set no. 3  Image set no. 4  Image set no. 5  Average
ANL        h_max = 40       100.0            98.3             90.7             87.9             94.2
ANL        h_max = 80       100.0            100.0            94.3             97.4             97.9
ANL        h_max = 120      100.0            100.0            94.3             92.6             96.7
ANL        h_max = 160      100.0            100.0            92.1             84.7             94.2
ANL        h_max = 200      100.0            100.0            92.9             82.1             93.8
NLM        h = 10           100.0            100.0            91.4             96.3             96.9
NLM        h = 30           100.0            100.0            95.0             99.5             98.6
NLM        h = 60           100.0            100.0            97.1             92.6             97.4
NLM        h = 120          100.0            100.0            95.7             85.3             95.3

parameters, i.e., the decay parameter h for the NLM algorithm and h_max for the ANL algorithm. It has to be noted that the parameter h_min of the ANL algorithm was fixed at the value of h_min = 0.01 and a local neighborhood of 5 × 5 pixels was chosen for the NLM and ANL algorithms in all experiments. The results of the experiments in terms of the rank one recognition rates for the individual image sets, as well as their average value over the entire database, are presented in Table 1. We can see that the best performing implementations of the NLM and ANL algorithms feature parameter values of h = 30 and h_max = 80, respectively.

In our second series of recognition experiments we compare the performance of the two proposed algorithms (for h = 30 and h_max = 80) and several popular photometric normalization techniques. Specifically, the following techniques were implemented for comparison: the logarithm transform (LN), histogram equalization (HQ), the single scale retinex (SR) technique and the adaptive retinex normalization approach (AR) presented in [8]. For baseline comparisons, experiments on unprocessed grey scale images (GR) are conducted as well. It should be noted that the presented recognition rates are only indicative of the general performance of the tested techniques, as the YaleB database represents a rather small database, where it is possible to easily devise a normalization technique that effectively discriminates among different images of the small number of subjects. Several techniques were presented in the literature that normalize the facial images by extremely compressing the dynamic range of the images, resulting in the suppression of most of the image's variability, whether induced by illumination or by the subject's identity. The question of how to scale up these techniques for use with larger numbers of subjects, however, still remains unanswered.

To get an impression of the scalability of the tested techniques we also present recognition rates obtained with the estimated logarithmic luminance functions (where applicable). These results provide an estimate of how much of the useful information was removed from the facial image during the normalization. For the experiments with the logarithmic luminance functions, logarithm transformed images from the first image set were employed for training. The presented results show the competitiveness of the proposed techniques. Similar to the best performing AR technique, they achieve an average recognition rate of approximately 98%, but remove less of the useful information, as shown by the results obtained on the luminance estimates. The results suggest that the proposed normalization techniques will perform well on larger databases as well.
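The classification rule used throughout this section, a nearest neighbor (to the class mean) classifier with the cosine similarity measure, together with the rank one recognition rate, can be sketched as below. The feature vectors are assumed to be already available (for example after a PCA projection, which is not shown), and all function names are hypothetical; this is an illustration of the rule, not the experimental code behind Tables 1 and 2.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equally long feature vectors."""
    n = float(len(vectors))
    return [sum(component) / n for component in zip(*vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def class_means(training):
    """training maps a subject label to a list of feature vectors."""
    return {label: mean_vector(vectors) for label, vectors in training.items()}

def classify(sample, means):
    """Return the label whose class mean is most similar to the sample."""
    return max(means, key=lambda label: cosine_similarity(sample, means[label]))

def rank_one_rate(samples, labels, means):
    """Percentage of test samples whose best match is the correct subject."""
    correct = sum(1 for s, l in zip(samples, labels) if classify(s, means) == l)
    return 100.0 * correct / len(samples)
```

In this toy setting a test sample counts as correctly recognized when the most similar class mean belongs to its own subject; the rank one rate is the percentage of such samples in an image set.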


Table 2. Comparison of the rank one recognition rates (in %) for various algorithms

Representation  Normalized image                        Log. luminance ln L(x, y)
Image sets      No. 2   No. 3   No. 4   No. 5   Avg.    No. 2   No. 3   No. 4   No. 5   Avg.
GR              100.0   100.0   57.9    16.3    68.6    n/a     n/a     n/a     n/a     n/a
HQ              100.0   100.0   58.6    60.0    79.7    n/a     n/a     n/a     n/a     n/a
LN              100.0   98.3    58.6    52.6    77.4    n/a     n/a     n/a     n/a     n/a
SR              100.0   100.0   92.1    84.2    94.1    100.0   90.8    46.4    41.1    69.6
AR              100.0   100.0   97.1    98.4    98.9    100.0   95.0    49.3    44.3    72.1
NLM             100.0   100.0   95.0    99.5    98.6    100.0   86.7    39.3    26.3    63.1
ANL             100.0   100.0   94.3    97.4    97.9    100.0   65.8    36.4    26.8    57.3

5 Conclusion and Future Work

In this paper we have presented two novel image normalization techniques, which try to compensate for the illumination induced appearance variations of facial images at the pre-processing level. The feasibility of the presented techniques was successfully demonstrated on the YaleB database, where encouraging results were achieved. Our future work with respect to the normalization techniques will be focused on their evaluation on larger and more challenging databases.

References

1. Heusch, G., Cardinaux, F., Marcel, S.: Lighting Normalization Algorithms for Face Verification. IDIAP-com 05-03 (March 2005)
2. Jobson, D.J., Rahman, Z., Woodell, G.A.: Properties and Performance of a Center/Surround Retinex. IEEE Transactions on Image Processing 6(3), 451–462 (1997)
3. Jobson, D.J., Rahman, Z., Woodell, G.A.: A Multiscale Retinex for Bridging the Gap Between Color Images and the Human Observations of Scenes. IEEE Transactions on Image Processing 6(7), 897–1056 (1997)
4. Wang, H., Li, S.Z., Wang, Y., Zhang, J.: Self Quotient Image for Face Recognition. In: Proc. of the Int. Conference on Pattern Recognition, pp. 1397–1400 (2004)
5. Gross, R., Brajovic, V.: An Image Preprocessing Algorithm for Illumination Invariant Face Recognition. In: Proc. of AVBPA 2003, pp. 10–18 (2003)
6. Land, E.H., McCann, J.J.: Lightness and Retinex Theory. Journal of the Optical Society of America 61(1), 1–11 (1971)
7. Short, J., Kittler, J., Messer, K.: Photometric Normalisation for Face Verification. In: Kanade, T., Jain, A., Ratha, N.K. (eds.) AVBPA 2005. LNCS, vol. 3546, pp. 617–626. Springer, Heidelberg (2005)
8. Park, Y.K., Park, S.L., Kim, J.K.: Retinex Method Based on Adaptive Smoothing for Illumination Invariant Face Recognition. Signal Processing 88(8), 1929–1945 (2008)
9. Buades, A., Coll, B., Morel, J.M.: On Image Denoising Methods. Prepublication, http://www.cmla.ens-cachan.fr
10. Georghiades, A.G., Belhumeur, P.N., Kriegman, D.J.: From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE TPAMI 23(6), 643–660 (2001)

Manifold Learning for Video-to-Video Face Recognition

Abdenour Hadid and Matti Pietikäinen

Machine Vision Group, P.O. Box 4500, FI-90014, University of Oulu, Finland

Abstract. We look in this work at the problem of video-based face recognition in which both training and test sets are video sequences, and propose a novel approach based on manifold learning. The idea consists of first learning the intrinsic personal characteristics of each subject from the training video sequences by discovering the hidden low-dimensional nonlinear manifold of each individual. Then, a target face video sequence is projected and compared to the manifold of each subject. The closest manifold, in terms of a recently introduced manifold distance measure, determines the identity of the person in the sequence. Experiments on a large set of talking faces under diﬀerent image resolutions show very promising results (recognition rate of 99.8%), outperforming many traditional approaches.

1 Introduction

Recently, there has been an increasing interest in video-based face recognition (e.g. [1,2,3]). This is partially due to the limitations of still image-based methods in handling illumination changes, pose variations and other factors. The most studied scenario in video-based face recognition is having a set of still images as the gallery (enrollment) and video sequences as the probe (test set). However, in some real-world applications such as human-computer interaction and content based video retrieval, both training and test sets can be video sequences. In such settings, performing video-to-video matching may be crucial for robust face recognition, but this task is far from trivial.

There are several ways of approaching the problem of face recognition in which both training and test sets are video sequences. Basically, one could build an appearance-based system by selecting a few exemplars from the training sequences as gallery models and then performing still image-based recognition and fusing the results over the target video sequence [4]. Obviously, such an approach is not optimal as some important information in the video sequences may be left out. Another direction consists of using spatiotemporal representations for encoding the information both in the training and test video sequences [1,2,3]. Perhaps the most popular approach in this category is based on hidden Markov models (HMMs), which have been successfully applied to face recognition from videos [2]. The idea is quite simple: in the training phase, an HMM is created to learn both the statistics and temporal dynamics of each individual. During the recognition process, the temporal characteristics of the face sequence are

J. Fierrez et al. (Eds.): BioID MultiComm2009, LNCS 5707, pp. 9–16, 2009. © Springer-Verlag Berlin Heidelberg 2009


analyzed over time by the HMM corresponding to each subject. The likelihood scores provided by the HMMs are compared. The highest score provides the identity of a face in the video sequence. Unfortunately, most methods which use spatiotemporal representations for face recognition have not yet shown their full potential, as they suffer from different drawbacks such as the use of only global features, while local information is shown to also be important to facial image analysis [5], and the failure to discriminate between the facial dynamics which are useful for recognition and those which can hinder the recognition process [6].

Very recently, inspired by studies in neuroscience emphasizing manifold ways of visual perception, we introduced in [7] a novel method for gender classification from videos using manifold learning. The idea consists of clustering the face sequences in the low-dimensional space based on the intrinsic characteristics of men and women. Then, a target face sequence is projected into both men and women manifolds for classification. The proposed approach reached excellent results not only in the gender recognition problem but also in age and ethnic classification from face video sequences.

In this work, we extend the approach proposed in [7] to the problem of video-to-video face recognition. Thus, we propose to first learn and discover the hidden low-dimensional nonlinear manifold of each individual. Then, a target face sequence can be projected into each manifold for classification. The "closest" manifold will then determine the identity of the person in the target face video sequence. The experiments which are presented in Section 4 show that such a manifold-based approach yields excellent results, outperforming many traditional methods for video-based face recognition.

The rest of this paper is organized as follows. Section 2 explains the notion of face manifold and discusses some learning methods. Then, we describe our proposed approach to the problem of video-to-video face recognition and the experimental analysis in Sections 3 and 4, respectively. Finally, we draw a conclusion in Section 5.

2 Face Manifold

Let I(P, s) denote a face image of a person P at configuration s. The variable s describes a combination of factors such as facial expression, pose, illumination, etc. Let ξ_p,

\[ \xi_p = \{\, I(P, s) \mid s \in S \,\} \tag{1} \]

be the collection of face images of the person P under all possible configurations S. The ξ_p thus defined is called the face manifold of person P. Additionally, if we consider all the face images of different individuals, then we obtain the face manifold ξ:

\[ \xi = \bigcup_{p} \xi_p \tag{2} \]

Such a manifold ξ resides only in a small subspace of the high-dimensional image space. Consider the example of Fig. 1 showing face images of a person when moving his face from left to right. The only obvious degree of freedom in this case is the rotation angle of the face. Therefore, the intrinsic dimensionality of


Fig. 1. An example showing a face manifold of a given subject embedded in the high dimensional image space

the faces is very small (close to 1). However, these faces are embedded in a 1600-dimensional image space (since the face images have 40×40 = 1600 pixels) which is highly redundant. If one could discover the hidden low-dimensional structure of these faces (the rotation angle of the face) from the input observations, this would greatly facilitate the further analysis of the face images such as visualization, classification, retrieval, etc. Our proposed approach to the problem of video-to-video face recognition, which is described in Section 3, exploits the properties of face manifolds.

Neuroscience studies also pointed out the manifold ways of visual perception [8]. Indeed, facial images are not "isolated" patterns in the image space but lie on a nonlinear low-dimensional manifold. The key issue in manifold learning is to discover the low-dimensional manifold embedded in the high dimensional space. This can be done by projecting the face images into low-dimensional coordinates. For that purpose, there exist several methods. The traditional ones are Principal Component Analysis (PCA) and Multidimensional Scaling (MDS). These methods are simple to implement and efficient in discovering the structure of data lying on or near linear subspaces of the high-dimensional input space. However, face images do not satisfy this constraint as they lie on a complex nonlinear and nonconvex manifold in the high-dimensional space. Therefore, such linear methods generally fail to discover the real structure of the face images in the low-dimensional space. As an alternative to PCA and MDS, one can consider some nonlinear dimensionality reduction methods such as Self-Organizing Maps (SOM) [9], Generative Topographic Mapping (GTM) [10], Sammon's Mappings (SM) [11], etc. Though these methods can also handle nonlinear manifolds, most of them tend to involve several free parameters such as learning rates and convergence criteria. In addition, most of these methods do not have an obvious guarantee of convergence to the global optimum. Fortunately, in recent years, a set of new manifold learning algorithms has emerged. These methods are based on an eigendecomposition and combine the major algorithmic features of PCA and MDS (computational efficiency, global optimality, and flexible


asymptotic convergence guarantees) with flexibility to learn a broad class of nonlinear manifolds. Among these algorithms are Locally Linear Embedding (LLE) [12], ISOmetric feature MAPping (ISOMAP) [13] and Laplacian Eigenmaps [14].

3 Proposed Approach to Video-Video Face Recognition

We approach the problem of video-to-video face recognition from a manifold learning perspective. We adopt the LLE algorithm for manifold learning due to its demonstrated simplicity and efficiency in recovering meaningful low-dimensional structures hidden in complex and high-dimensional data such as face images. LLE is an unsupervised learning algorithm which maps high-dimensional data onto a low-dimensional, neighbor-preserving embedding space. In brief, considering a set of N face images and organizing them into a matrix X (where each column vector represents a face), the LLE algorithm then involves the following three steps:

1. Find the k nearest neighbors of each point X_i.
2. Compute the weights W_ij that best reconstruct each data point from its neighbors, minimizing the cost in Equation (3):

\[ \epsilon(W) = \sum_{i=1}^{N} \Bigl\lVert X_i - \sum_{j \in \mathrm{neighbors}(i)} W_{ij} X_j \Bigr\rVert^2 \tag{3} \]

while enforcing the constraints W_ij = 0 if X_j is not a neighbor of X_i, and \sum_{j=1}^{N} W_{ij} = 1 for every i (to ensure that W is translation-invariant).
3. Compute the embedding Y (of lower dimensionality d 1 THz


M. Moreno-Moreno, J. Fierrez, and J. Ortega-Garcia

[Fig. 6: image panels (a)-(i). Labels in the figure: Indoors, Outdoors, 2 mm, PMMW, AMMW, PSMW, ASMW.]

Fig. 6. Images acquired with MMW and SMW imaging systems. (a) Outdoors PMMW image (94 GHz) of a man carrying a gun in a bag. (b-c) Indoors and outdoors PMMW image of a face. (d) AMMW image of a man carrying two handguns acquired at 27-33 GHz. (e) PSMW image (0.1-1 THz) of a man with concealed objects beneath his jacket. (f) PSMW image (1.5 THz) of a man with a spanner under his T-shirt. (h) ASMW image (0.6 THz) of a man hiding a gun beneath his shirt. (g) Full 3-D reconstruction of the previous image after smoothing of the back surface and background removal. (i) Terahertz reflection mode image of a thumb. These figure insets are extracted from: www.vision4thefuture.org (a-c), [16] (d), [17] (e), [18] (f), [19] (g-h) and [20] (i).

MMW (PMMW), Active MMW (AMMW), Passive SMW (PSMW) and Active SMW (ASMW) images are shown in Fig. 6. As MMW and SMW images measure the different radiometric temperatures in the scene, see Eq. (1), images acquired indoors and outdoors have very different contrast when working in passive mode, especially with MMW (see Fig. 6b and c). In spite of the significant advantages of MMW and SMW radiation for biometric purposes (clothing penetration, low intrusiveness, health safety), no biometric applications have been developed yet.

6 Conclusions

We have provided a taxonomy of the existing imaging technologies operating at frequencies beyond the visible spectrum that can be used for biometric purposes. The advantages and challenges of each imaging technology, as well as their image properties, have been presented. Although only X-ray and Infrared spectral bands have been used for biometric applications, there is another kind of radiation with promising applications in the biometric field: millimeter and submillimeter waves. However, MMW and SMW technology is not completely mature yet.

Biometrics Beyond the Visible Spectrum: Imaging Technologies and Applications


Acknowledgments. This work has been supported by Terasense (CSD2008-00068) Consolider project of the Spanish Ministry of Science and Technology. M. M.-M. is supported by a CPI Fellowship from CAM, and J. F. is supported by a Marie Curie Fellowship from the European Commission.

References

1. Jain, A.K., et al.: An Introduction to Biometric Recognition. IEEE Trans. on CSVT 14(1), 4–20 (2004)
2. National Research Council: Airline Passenger Security Screening: New Technologies and Implementation Issues. National Academy Press, Washington, D.C. (1996)
3. Galbally, J., et al.: Fake Fingertip Generation from a Minutiae Template. In: Proc. Intl. Conf. on Pattern Recognition, ICPR. IEEE Press, Los Alamitos (2008)
4. Appleby, R., et al.: Millimeter-Wave and Submillimeter-Wave Imaging for Security and Surveillance. Proc. of the IEEE 95(8), 1683–1690 (2007)
5. Shamir, L., et al.: Biometric identification using knee X-rays. Int. J. Biometrics 1(3), 365–370 (2009)
6. Chen, H., Jain, A.K.: Dental Biometrics: Alignment and Matching of Dental Radiographs. IEEE Transactions on PAMI 27(8), 1319–1326 (2005)
7. Bossi, R.H., et al.: Backscatter X-ray imaging. Materials Evaluation 46(11), 1462–1467 (1988)
8. Morris, E.J.L., et al.: A backscattered x-ray imager for medical applications. In: Proc. of the SPIE, vol. 5745, pp. 113–120 (2005)
9. Chalmers, A.: Three applications of backscatter X-ray imaging technology to homeland defense. In: Proc. of the SPIE, vol. 5778(1), pp. 989–993 (2005)
10. Li, S.Z., et al.: Illumination Invariant Face Recognition Using Near-Infrared Images. IEEE Trans. on PAMI 29(4), 627–639 (2007)
11. Lingyu, W., et al.: Near- and Far- Infrared Imaging for Vein Pattern Biometrics. In: Proc. of the AVSS, pp. 52–57 (2006)
12. Buddharaju, P., et al.: Physiology-Based Face Recognition in the Thermal Infrared Spectrum. IEEE Trans. on PAMI 29(4), 613–626 (2007)
13. Fann, C.K., et al.: Biometric Verification Using Thermal Images of Palm-dorsa Vein-patterns. IEEE Trans. on CSVT 14(2), 199–213 (2004)
14. Chen, X., et al.: IR and visible light face recognition. Computer Vision and Image Understanding 99(3), 332–358 (2005)
15. Kapilevich, B., et al.: Passive mm-wave Sensor for In-Door and Out-Door Homeland Security Applications. In: SensorComm 2007, pp. 20–23 (2007)
16. Sheen, D.M., et al.: Three-dimensional millimeter-wave imaging for concealed weapon detection. IEEE Trans. on Microwave Theory and Techniques 49(9), 1581–1592 (2001)
17. Shen, X., et al.: Detection and Segmentation of Concealed Objects in Terahertz Images. IEEE Trans. on IP 17(12) (2008)
18. Luukanen, A., et al.: Stand-off Contraband Identification using Passive THz Imaging. In: EDA IEEMT Workshop (2008)
19. Cooper, K.B., et al.: Penetrating 3-D Imaging at 4- and 25-m Range Using a Submillimeter-Wave Radar. IEEE Trans. on Microwave Theory and Techniques 56(12) (2008)
20. Lee, A.W., et al.: Real-time imaging using a 4.3-THz Quantum Cascade Laser and a 320 x 240 Microbolometer Focal-Plane Array. IEEE Photonics Technology Letters 18(13), 1415–1417 (2006)

Formant Based Analysis of Spoken Arabic Vowels

Yousef Ajami Alotaibi¹ and Amir Husain²

¹ Computer Eng. Dept., College of Computer & Information Sciences, King Saud University
² Department of Computing Science, Stirling University

Abstract. In general, speech sounds are classified into two categories: vowels, which are produced with no major restriction of airflow through the vocal tract, and consonants, which involve a significant restriction and are therefore weaker in amplitude and often "noisier" than vowels. This study is specifically concerned with Modern Standard Arabic. Whilst linguists and researchers disagree on the exact number of Arabic vowels, here we consider eight Arabic vowels: the six basic ones plus two diphthongs. The first and second formant values of these vowels are investigated, and the differences and similarities between the vowels are studied using consonant-vowel-consonant (CVC) utterances. The Arabic vowels are analyzed in both the time and frequency domains, and the results of the analysis will facilitate future Arabic speech processing tasks such as vowel and speech recognition and classification.

Keywords: Arabic, Vowels, Analysis, Speech, Recognition, Formants.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 162–169, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

Arabic is a Semitic language, and is one of the oldest languages in the world. Currently it is the second most spoken language in terms of number of speakers. Modern Standard Arabic (MSA) has 36 basic phonemes: six vowels, two diphthongs, and 28 consonants. A phoneme is the smallest unit of speech that indicates a difference in the meaning of a word or a sentence. In addition to the two diphthongs, the six vowels are /a, i, u, a:, i:, u:/, where /a, i, u/ are short vowels and /a:, i:, u:/ are their corresponding long versions [1], [2], [3]. As a result, vowel duration is phonemic in the Arabic language. Some researchers count the two diphthongs as vowels, giving eight Arabic vowels in total, and this is normally considered to be the case for MSA [4]. MSA has fewer vowels than English.

Arabic phonemes include two distinctive classes, termed pharyngeal and emphatic phonemes. These two classes can be found only in Semitic languages such as Hebrew [1], [5], [4]. Arabic dialects may have different vowels: for instance, the Levantine dialect has at least two extra diphthongs, /aj/ and /aw/. Similarly, the Egyptian dialect has other extra vowels [3].

Arabic is much less researched than other languages such as English and Japanese. Most of the reported studies have been conducted on


Arabic language and speech digital processing in general, with only a few on Arabic vowels specifically. Some research has been carried out on MSA and on the classical and Quranic versions of Arabic. More recently, Iqbal et al. [6] reported a preliminary study on vowel segmentation and identification using formant transitions occurring in continuous recitation of Quranic Arabic. The paper provided an analysis of cues to identify Arabic vowels. Their algorithm extracted the formants of already segmented recitation audio files and recognized the vowels on the basis of these extracted formants. The investigation was applied in the context of the recitation principles of the Holy Quran. The vowel identification system showed up to 90% average accuracy on continuous speech files comprising around 1000 vowels.

In other related work, Razak et al. [7] investigated feature extraction from Quranic verse recitation using the Mel-Frequency Cepstral Coefficient (MFCC) approach. Their paper explored the viability of the MFCC technique for extracting features from Quranic verse recitation. Feature extraction is crucial for preparing data for the classification process. The authors were able to recognize and differentiate Quranic Arabic utterances and pronunciation based on the extracted feature vectors. Tolba et al. [8] reported a method for Arabic consonant/vowel segmentation using the wavelet transform. Their algorithm segments Arabic speech into consonants and vowels without linguistic information. The method is based on the wavelet transform and spectral analysis, and focuses on locating the transition between the consonant and vowel parts at certain levels of the wavelet packet decomposition. The accuracy rate was about 88.3% for consonant/vowel segmentation, and the rate remained fixed at both low and high signal-to-noise ratios (SNR).

Previously, Newman et al. [9] worked on a frequency analysis of Arabic vowels in connected speech. Their findings do not confirm the existence of a high classical style as an acoustically ‘purer’ variety of Modern Standard Arabic. Alghamdi [10] carried out a spectrographic analysis of Arabic vowels based on a cross-dialect study. He investigated whether Arabic vowels are the same at the phonetic level when spoken by speakers of different Arabic dialects, including Saudi, Sudanese, and Egyptian dialects. The author found that the phonetic implementation of the standard Arabic vowel system differs according to dialect. Al-Otaibi [11] also developed an automatic Arabic vowel recognition system. Isolated Arabic vowel and isolated Arabic word recognition systems were implemented. The work investigated the syllabic nature of the Arabic language in terms of syllable types, syllable structures, and primary stress rules.

In this study, we carry out a formant based analysis of the six Arabic vowels as used in MSA. Changing the shape of the vocal tract produces different variations of an ideal tube, which in turn changes its preferred frequencies of vibration. Each of the preferred resonating frequencies of the vocal tract (corresponding to a bump in the frequency response curve) is known as a formant. These are usually referred to as F1 for the first formant, F2 for the second formant, F3 for the third formant, and so on. That is, by moving the tongue body and the lips, the positions of the formants can be changed [2]. In vowels, F1 can vary from 300 Hz to 1000 Hz. The lower it is, the closer the tongue is to the roof of the mouth. F2 can vary from 850 Hz to 2500 Hz. The value of F2 is proportional to the frontness or backness of the highest part of the tongue during the production of the vowel. In addition, lip rounding causes a lower F2 than with


unrounded lips. F3 is also important in determining the phonemic quality of a given speech sound, and the higher formants such as F4 and F5 are thought to be significant in determining voice quality.

The rest of this paper is organized as follows: Section 2 introduces the experimental framework employed in this study. The results are described and discussed in Section 3, and some concluding remarks and suggestions for future work are given in Section 4.

2 Experimental Framework

The allowed syllables in the Arabic language are: consonant-vowel (CV), consonant-vowel-consonant (CVC), and consonant-vowel-consonant-consonant (CVCC), where V indicates a (long or short) vowel and C indicates a consonant. Arabic utterances can only start with a consonant [1]. Table 1 shows the eight Arabic vowels along with their names, examples, and IPA symbols. In this paper the formants of Arabic vowels are analyzed to determine their values. These are expected to prove helpful in subsequent speech processing tasks such as vowel and speech recognition and classification. In carrying out the analysis, Arabic vowels have been viewed as if they were patterns on paper. Specifically, the vowels were plotted on paper or on a computer screen in the form of their time waveforms, spectrograms, formants, and LPC spectra. Comparisons and investigations of these plots are the means used to accomplish the goals of this research. At a later stage, these Arabic phonemes will be employed as input to a speech recognition system for classification purposes.

An in-house database was built to support the investigation of Arabic vowels, based on carefully selected and fixed phonemes. The utterances of ten male Arabic speakers, all aged between 23 and 25 years with the exception of one child, were recorded. Nine of the speakers are from different regions of Saudi Arabia and the remaining one is from Egypt. Each of the ten speakers participated in five different trials for every carrier word in the data set, covering all eight intended Arabic phonemes. Some of the speakers recorded the words in one session and others in two or three sessions. The carrier words were chosen to represent different consonants before and after the intended vowel. These carrier words are displayed in Table 2 using the second vowel /a:/. The words were recorded in mono at a sampling rate of 16 kHz with 16-bit resolution. The total number of recorded audio tokens is 4000 (i.e., eight phonemes times ten speakers times ten carrier words times five trials for each speaker). These audio tokens are used for analyzing the intended phonemes in the frequency domain, for the training phase of the recognition system, and for its testing phase.

Table 1. Arabic vowels


Table 2. Carrier words used in the study using second vowel

3 Results

The first part of the experiments evaluates the values of the first and second formants, F1 and F2, in all the Arabic vowels considered. This study used frames from the middle of each vowel to minimize co-articulation effects. The distribution of all vowels for all speakers with respect to the values of F1 and F2 is shown in Figure 1, from which we can see that the /ay/ vowel distribution lies far from both vowel /a/ and vowel /i/. This implies that there is a big difference between this vowel, /ay/, and the two components that form it (/a/ and /i/). In Figure 1 we can also see an overlap between the vowel /aw/ and its constituent vowels, /a/ and /u/. Based on the figure, we can estimate the location of the Arabic vowel triangle as (400, 800), (700, 1100), and (400, 2100), where the first value is F1 and the second value is F2, both in Hz.

Figure 2 shows a plot of all short vowels for one of the speakers over three different trials. It can be seen from Figure 2 that F1 is relatively high for /a/, medium for /u/, and lowest for /i/. F2, in contrast, is medium for /a/, lowest for /u/, and high for /i/. The same pattern holds for the long vowels as for their short counterparts. The long vowels are peripheral while their short counterparts are close to the center when the frequencies of the first two formants are plotted on the formant chart. The position of /aw/ can be seen to be between /a:/ and /u:/, and the position of /ay/ is between /a:/ and /i:/. F1 can be used to distinguish between /a:/ and /u:/ and between /a/ and /u/. F2 can be used to distinguish between /i:/ and /u:/ and between /i/ and /u/, as can be inferred from Figure 2.


Fig. 1. Vowels distribution depending on F1 and F2 for all speakers

From the obtained results, it can be seen that F1 of /a/ is smaller than F1 of /a:/, while F2 of /a/ is smaller than F2 of /a:/, and the values of F3 are close for both of the durational counterparts. Also, F1 of /u/ is larger than F1 of /u:/ except for the child, whereas F2 of /u/ is smaller than F2 of /u:/ for all speakers, and the values of F3 are close for both of them. Similarly, it has been found that F1 of /i/ is larger than F1 of /i:/ and F2 of /i/ is smaller than F2 of /i:/. To summarise these specific outcomes, it can be said that F1 in a given short vowel is larger than F1 in its long counterpart except for /a/ and /a:/; and F2 in a given short vowel is larger than F1 in its long counterpart except for /a/ and /a:/. These findings confirm those reported in previous studies [10].

Further, it can be seen from Figure 2 that, on the basis of F3, Arabic vowels can be classified into three groups: group 1 contains vowels /u/, /u:/ and /aw/, where F3 is less than 2700 Hz; group 2 contains vowels /a/, /a:/ and /i/, where F3 is more than 2700 Hz and less than 2760 Hz; and group 3 contains vowels /i:/ and /ay/, where F3 is more than 2760 Hz. For the short vowels, F1 can be used to distinguish between /a/ and /u/, and F2 to distinguish between /i/ and /u/. The vowel /a/ has the largest value of F1 and /i/ has the largest value of F2. The vowel /i/ has the smallest value of F1 and /u/ has the smallest value of F2. The same holds for the long vowels: F1 can be used to distinguish between /a:/ and /u:/, and F2 to distinguish between /i:/ and /u:/. The vowel /a:/ has the largest value of F1 and /i:/ has the largest value of F2. The vowel /i:/ has the smallest value of F1 and /u:/ has the smallest value of F2.
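The F3-based grouping described above can be written as a simple threshold rule. The following Python sketch is only an illustration: the function name `f3_group` is hypothetical, the text does not say how values exactly equal to 2700 Hz or 2760 Hz should be handled (here they fall into the higher group, which is an assumption), and the sample values are the speaker-averaged F3 figures listed in Table 3.

```python
def f3_group(f3_hz: float) -> int:
    """Return the F3-based vowel group (1, 2 or 3) using the 2700/2760 Hz thresholds.

    Boundary handling (a value exactly on a threshold goes to the higher
    group) is an assumption; the text only gives strict inequalities.
    """
    if f3_hz < 2700.0:
        return 1
    if f3_hz < 2760.0:
        return 2
    return 3


# Speaker-averaged F3 values in Hz, as listed in Table 3.
F3_TABLE3 = {
    "a": 2755.0, "a:": 2750.5,
    "u": 2660.2, "u:": 2594.5,
    "i": 2732.5, "i:": 2788.3,
}

groups = {vowel: f3_group(f3) for vowel, f3 in F3_TABLE3.items()}
for vowel, group in groups.items():
    print(f"/{vowel}/ -> group {group}")
```

With these averaged values the rule reproduces the grouping given in the text for the six basic vowels; the diphthongs /aw/ and /ay/ are omitted because Table 3 lists no averages for them.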


Fig. 2. Values of F1 and F2 for short vowels for Speaker 6 over three trials

In Arabic vowels, as mentioned earlier, F1 can vary from 300 Hz to 1000 Hz, and F2 can vary from 850 Hz to 2500 Hz. F3 is also important in determining the phonemic quality of a given speech sound, and the higher formants such as F4 and F5 are thought to be significant in determining voice quality. In this case, as can be seen from Table 3, which shows the Arabic vowel formants averaged over all speakers, for short vowels /a/ has the largest value of F1 and /i/ has the largest value of F2. In /a/ and /a:/ the whole tongue goes down, so the vocal tract becomes wider than when producing other Arabic vowels. In /u/ and /u:/ the end of the tongue comes near to the palate while the other parts of the tongue are in the regular position. In /i/ and /i:/ the front of the tongue comes near to the palate whereas the other parts remain in their regular position. Lips are more rounded for /u/ and /u:/ than for /i/ and /i:/.

Figure 3 shows the time-domain signal and spectrograms of the specific carrier word used with all eight vowels. The time-domain representations in Figure 3 confirm that all utterances are CVC, as the signal energy in the middle is mostly at its maximum. Also, the usual formant patterns can be noticed in the spectrograms. In addition, the similarities of the first (and final) consonant in all words can be clearly noticed, since the same first and final consonants are used in all the plots (with just the vowel being varied).

Table 3. Arabic vowel formants (Hz) averaged over all speakers

Formant   a        a:       u        u:       i        i:
F1        590.8    684.4    488.8    428.7    479.3    412.2
F2        1101.9   1193.3   975.2    858.6    1545     2131.9
F3        2755     2750.5   2660.2   2594.5   2732.5   2788.3
F4        3581.5   3665.7   3534.7   3426.9   3573.2   3599.8
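To illustrate how the averaged F1 and F2 values could separate the six basic vowels, the sketch below assigns a measured (F1, F2) pair to the vowel whose Table 3 average is closest. This is not the recognizer planned by the authors: the nearest-centroid rule, the unweighted Euclidean distance in Hz, and the name `nearest_vowel` are all assumptions made for illustration, and the rule ignores speaker variability and the two diphthongs.

```python
import math

# Speaker-averaged (F1, F2) values in Hz, taken from Table 3.
CENTROIDS = {
    "a": (590.8, 1101.9), "a:": (684.4, 1193.3),
    "u": (488.8, 975.2), "u:": (428.7, 858.6),
    "i": (479.3, 1545.0), "i:": (412.2, 2131.9),
}


def nearest_vowel(f1_hz: float, f2_hz: float) -> str:
    """Return the vowel whose averaged (F1, F2) point is closest to the input.

    Plain Euclidean distance in Hz is an assumed, illustrative choice.
    """
    return min(CENTROIDS, key=lambda v: math.dist((f1_hz, f2_hz), CENTROIDS[v]))


# Hypothetical measurement near the front, high corner of the vowel triangle.
example = nearest_vowel(420.0, 2100.0)
print(example)
```

Because /a/ and /a:/, and likewise /u/ and /u:/, lie close together in the F1-F2 plane, a practical system would probably also need duration or further formants to separate short vowels from their long counterparts.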


Fig. 3. Time waveforms and spectrogram for Speaker 9, Word 2, and Vowels 1,5,7 and 8

In summary, these formants can be seen to be very effective in classifying vowels correctly and can be used in a future speech recognition system. Formants of the vowels can be included explicitly in the feature extraction module of the recognizer. If such a system is able to recognize the different vowels, this will greatly assist the Arabic speech recognition process. The reason is that every word and syllable in the Arabic language must contain at least one vowel; hence vowel recognition will play a key role in identifying the spoken utterance.

4 Conclusions

This paper has presented a new formant based analysis of Arabic vowels using a spectrogram technique. The Arabic vowels were studied as if they were patterns shown on screen or paper. Modern Standard Arabic has six basic vowels and two diphthongs, which some linguists consider to be vowels rather than diphthongs. Thus the number of vowels in this study was taken to be eight (including the two diphthongs). All eight Arabic phonemes were included in constructing the database used in the investigation, which has shown that the formants are very effective in classifying vowels correctly. In the near future, a recognition system will be built for classifying these eight phonemes, and an error performance analysis of the recognition system will be carried out to acquire further knowledge and draw related conclusions about Arabic vowels and diphthongs. Other planned future work will extend the present study to include the vowels of classical Arabic (used in the Quran).


Acknowledgment

The authors would like to acknowledge the British Council (in Riyadh, Saudi Arabia) for funding this collaborative research between King Saud University and the University of Stirling.

References

1. Alkhouli, M.: Alaswaat Alaghawaiyah. Daar Alfalah, Jordan (1990) (in Arabic)
2. Deller, J., Proakis, J., Hansen, J.H.: Discrete-Time Processing of Speech Signals. Macmillan, Basingstoke (1993)
3. Alghamdi, M.: Arabic Phonetics. Al-Toubah Bookshop, Riyadh (2001) (in Arabic)
4. Omar, A.: Study of Linguistic phonetics. Aalam Alkutob, Egypt (1991) (in Arabic)
5. Elshafei, M.: Toward an Arabic Text-to-Speech System. The Arabian Journal for Science and Engineering 16(4B), 565–583 (1991)
6. Iqbal, H.R., Awais, M.M., Masud, S., Shamail, S.: On Vowels Segmentation and Identification Using Formant Transitions in Continuous Recitation of Quranic Arabic. In: New Challenges in Applied Intelligence Technologies, pp. 155–162. Springer, Berlin (2008)
7. Razak, Z., Ibrahim, N.J., Tamil, E.M., Idris, M.Y.I., Yakub, M., Yusoff, Z.B.M.: Quranic Verse Recitation Feature Extraction Using Mel-Frequency Cepstral Coefficient (MFCC). In: Proceedings of the 4th IEEE International Colloquium on Signal Processing and its Application (CSPA), Kuala Lumpur, Malaysia, March 7-9 (2008)
8. Tolba, M.F., Nazmy, T., Abdelhamid, A.A., GadallahA, M.E.: A Novel Method for Arabic Consonant/Vowel Segmentation using Wavelet Transform. International Journal on Intelligent Cooperative Information Systems, IJICIS 5(1), 353–364 (2005)
9. Newman, D.L., Verhoeven, J.: Frequency Analysis of Arabic Vowels in Connected Speech, pp. 77–87
10. Alghamdi, M.M.: A spectrographic analysis of Arabic vowels: A cross-dialect study. Journal of King Saud University 10(1), 3–24 (1998)
11. Al-Otaibi, A.: Speech Processing. The British Library in Association with UMI (1988)

Key Generation in a Voice Based Template Free Biometric Security System

Joshua A. Atah and Gareth Howells

Department of Electronics, University of Kent, Canterbury, Kent, CT2 7NT United Kingdom
{JAA29,W.G.J.Howells}@kent.ac.uk

Abstract. Biometric systems have a major drawback: unlike passwords, biometrics cannot be revoked and re-issued if lost or stolen. The implication is that once a biometric source has been compromised, the owner of the biometric as well as the data protected by the biometric is compromised for life. These concerns have motivated research in template-free biometrics, which exploits the possibility of directly encrypting the biometric data provided by the individual and therefore eliminates the need for storing templates used for data validation, thus increasing the security of the system. Template-free systems function in two stages: (a) Calibration, during which feature distribution maps of typical users are generated from known biometric samples without storing any personal information, and (b) Operation, which uses the feature distribution maps as a reference to generate encryption keys from samples of previously unseen users and also rebuilds the key when needed from new sets of previously unseen samples. In this report, we used a combination of stable features from the human voice to directly generate biometric keys, using a novel method of feature concatenation to combine the binary information at the operation phase. The stability of the keys is superior to current key generation methods, and the elimination of biometric templates improves the safety of personal information and in turn increases users’ confidence in the system.

Keywords: Biometrics, Security, Template-free, Voice features.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 170–177, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

All current biometric systems are based on templates: they measure an individual's physical features in an authentication inquiry and compare this data with stored biometric reference data [6]. This raises concerns, particularly about the safety of the personal biometric information that is stored in the template [1, 7]. One of these concerns is that once a biometric source has been compromised, it cannot be re-issued, unlike passwords that can be cancelled and reissued if lost or stolen. Therefore, the owner of the biometric as well as the data protected by the biometric is compromised for life, because users can never change their features. Although cancellable biometrics seeks to address the re-issuability of compromised biometric keys, it is still based on the storage of users’ information in templates and therefore, it is challenged by the integrity of the owner of the biometric sensor who may pre-record


biometric samples, as well as by those with access to the algorithm, who may hold and use privileged rights at the pre- and post-template stages of the system. Our current research in template-free biometrics addresses the concerns associated with storing templates of individuals’ physical features as reference data for authentication/verification [6, 10]. The idea is novel and exploits the possibility of directly encrypting the biometric data provided by the individual, and therefore eliminates the need for storing templates used for data validation.

Template-free systems have two stages, called the ‘Calibration’ and ‘Operation’ phases. Abstractly, in the Calibration phase, users provide samples of the given biometric to generate typical user normalisation maps, which then allow encryption keys to be generated directly from certain features in the samples. As a result, the proposed system requires no storage of the biometric templates and no storage of any private encryption keys. In the Operation phase, a new set of samples is provided by individual users, from which new keys may be generated directly. These are previously unseen samples (that have not been stored anywhere). Message data is encrypted first with the receiver's asymmetric public key, and a digest of the message is then encrypted with the sender's asymmetric private key, regenerated via new biometric samples, to form a digital signature. The encrypted message is then sent to the receiver. Figure 1 below shows a conceptual key generation system.

Fig. 1. A conceptual biometric key generation system

In the operation phase, the encrypted message is decrypted first with the sender's asymmetric public key to verify the sender. Thereafter, the decrypted message is further decrypted with the receiver’s asymmetric private key, again regenerated via further biometric samples.

Fig. 2.

The novelty of the current proposal lies in the development of techniques for the direct encryption of data extracted from biometric samples which characterise the identity of the individual. Such a system offers the following significant advantages:

• The removal of the need to store any form of template for validating the user, hence directly addressing the disadvantages of template compromise.
• The security of the system will be as strong as the biometric and encryption algorithm employed (there is no back door). The only mechanisms to gain subsequent access are to provide another sample of the biometric or to break the cipher employed by the encryption technology.
• The (unlikely) compromise of a system does not release sensitive biometric template data which would allow unauthorised access to other systems protected by the same biometric or indeed any system protected by any other biometric templates present.

A further significant advantage relates to the asymmetric encryption system associated with the proposed technique. Traditional systems require that the private key for decrypting data be stored in some way (memorising a private key is not feasible). With the proposed system, the key will be uniquely associated with the given biometric sample, and a further biometric sample will be required to generate the required private key. As there is no physical record of the key, it is not possible to compromise the security of sensitive data via unauthorised access to the key.

Our previous publication [6] identified the following voice features as suitable for template-free biometrics: maximum, average and minimum Power Spectral Density (PSD); minimum FFT; mean and minimum amplitude; minimum and mean cepstrum; maximum, minimum and mean IFFT; and maximum, minimum and mean Hilbert function.

2 Feature Distribution Maps

The initial phase in template-free biometrics is the Calibration phase, which yields feature distribution maps for typical users of the system. Calibration begins with the presentation of known biometric samples by all the users.

Fig. 3. Schematic representation of the Calibration phase. Stages shown: Signal input → Pre-processing → Feature extraction (features are represented as integers, and original samples are deleted) → Feature score normalisation and quantisation → Generation of feature distribution/normalisation maps → Data/file is encrypted from the normalised and combined biometric features for operating the system

All voice samples are pre-processed to determine the sampling frequency and the frame, and to ensure relative stability/standardisation given the variances in the capture device. Useful features [6] are extracted in the form of measurable integers and then normalised for all users within a given feature space in order to reduce the effect of score variability. For a defined quantisation interval, users’ probability density functions are calculated within the feature space. These are used to generate the normalisation maps, which are bell-shaped curves for useful features. The probability values


within a pre-defined percentage of the interval from the mean are considered for generating the biometric key. This is because of overlaps in users’ information within the curve. For the purpose of determining the keys, the probability values are further converted into binary form to provide more specific information about the stable bits, which are then combined to build the encryption keys for each user.

Normalisation and Quantisation

The min-max normalisation technique is employed, bearing in mind that the voice modality to a reasonable extent does not generate unusual distributions (multiple modes). Suppose that $x_i \in \{x_1, x_2, \ldots, x_n\}$ is a score to be fused; then the min-max method is given by [5], [7]

$$x_{\mathrm{norm}} = \frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)}$$

The min and max sample score values are those of users within each feature space, where $x_i$ is the individual sample value, $\min(x_i)$ is the smallest value of $x$ over all the users for that feature space, and $\max(x_i)$ is the largest value of $x$ over all the users for that feature space.

A fixed quantisation interval between 0 and 1 is used per feature space. For each value $x$ in the quantisation interval, the mean $\mu$ and standard deviation $\sigma$ per user are used to calculate the normal probability density function, given by:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

$p(x)$ is then plotted against the quantisation space.

Fig. 4. User distribution maps

It should however be noted that there are overlaps in users’ characteristics, increasing the likelihood that one user’s key is similar to that of the others. The p(x) values are further converted into Gray code to


provide more specific information about the samples. Values within 10% of the mean on both sides of the curve are considered most useful and are then combined to build the encryption keys for each user. The stored user distribution maps form the basis for the operation phase of the system.
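The normalisation and distribution-map steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the eleven-point grid with step 0.1, the use of the sample standard deviation, and the raw feature scores are all hypothetical.

```python
import math
from statistics import mean, stdev


def min_max_normalise(scores):
    """Min-max normalise the scores of all users within one feature space."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("all scores are identical; min-max is undefined")
    return [(x - lo) / (hi - lo) for x in scores]


def normal_pdf(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def user_distribution(user_scores, steps=10):
    """Evaluate a user's normal PDF on a fixed grid 0, 1/steps, ..., 1."""
    mu, sigma = mean(user_scores), stdev(user_scores)  # sample std is an assumption
    return [(k / steps, normal_pdf(k / steps, mu, sigma)) for k in range(steps + 1)]


# Hypothetical raw scores for one feature across several samples.
raw_scores = [12, 15, 20, 18, 30]
normalised = min_max_normalise(raw_scores)

# Treat the first three normalised scores as one user's samples (illustrative only).
distribution = user_distribution(normalised[:3])
peak_point = max(distribution, key=lambda pair: pair[1])[0]
print(normalised, peak_point)
```

The grid point with the highest density is the one closest to the user's mean, which is where the text expects the most stable values to lie.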

3 Key Generation

Key generation for encryption and decryption takes place in the operation phase of template-free biometric systems. In all cases, the signal pre-processing and feature extraction processes take place as described earlier. The system then references the feature distribution maps to generate the bits used as biometric keys per feature space that represent individual users. The keys are then combined using a novel method of concatenation to produce a single key, which is used to encrypt the message/data. For decryption, the same process is followed and a new key is generated from previously unseen samples provided by the user, which have not been stored anywhere.

Fig. 5. Schematic representation of the Operation phase. Stages shown: Signal input → Pre-processing → Feature extraction (features are represented as integers, and original samples are deleted) → Feature score normalisation and quantisation → System references feature distribution maps to generate binarised → Combination of bits from various features → Encryption and decryption keys are generated as required

Binarisation

The binarisation process converts the probability distribution scores within the quantised intervals into Gray code. This is intended to give the algorithm precision and score stability at every instance of encryption and decryption. The acceptable quantisation intervals used in this case correspond to the probability distribution values within 10% deviation from the mean. Gray code is used because of its single-distance property, i.e. adjacent code words differ in exactly one bit position.

Feature bits concatenation

For each user, the keys generated are mostly stable within a given region of percentage deviation on either side of the mean, since the keys begin to degenerate when bits beyond a certain range are considered. As a result, the biometric key generated per feature space is very small, and therefore the bits from all the features must be used together to produce a single long key. A novel key combination method using concatenation is used to produce a long biometric key.
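The single-distance property of Gray code and the idea of joining per-feature bits into one long key can be sketched as follows. The paper does not specify how a probability value is mapped to an integer before Gray coding, nor the details of its concatenation method, so the 8-bit scaling, the plain string joining, and the names `gray_bits` and `concatenate_key` are assumptions made purely for illustration.

```python
def to_gray(n: int) -> int:
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)


def gray_bits(value: float, width: int = 8) -> str:
    """Quantise a value in [0, 1] to `width` bits and return its Gray code.

    The scaling to 2**width - 1 levels is an assumed, illustrative choice.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    level = round(value * (2 ** width - 1))
    return format(to_gray(level), f"0{width}b")


def concatenate_key(feature_values, width: int = 8) -> str:
    """Join the Gray-coded bits of every feature into one long bit string."""
    return "".join(gray_bits(v, width) for v in feature_values)


# Hypothetical per-feature values already scaled to [0, 1].
key = concatenate_key([0.0, 1.0])
print(key)
```

Because adjacent quantisation levels differ in only one bit after Gray coding, a small change in a feature value flips at most a few bits of the key rather than many, which is the stability argument made in the text.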


The concept of the research is that, rather than referencing a stored template of users’ information, user candidates must present their samples every time a file needs to be encrypted or decrypted, but none of the candidate’s samples will be recorded. Thus the keys used by the system are reproduced at every instance of operation, but neither the keys nor the samples from which they are derived are stored in any form of template.

4 Experiments and Results

Our datasets are: (i) the VALID database [3] and (ii) the Biosecure database [9]. The VALID database contains 530 samples (five recordings for each of 106 individuals over a period of one month), each one uttering the sentence "Joe Took Father's Green Shoebench Out". The recordings were acquired in different environments/backgrounds: noisy, real-world, and others in an office scenario with no control over acoustic noise. The Biosecure database is a multimodal biometric dataset, a product of the Biosecure foundation. Each user candidate provided two sessions, collected on different occasions, and the sessions are used for calibration and operation respectively. The calibration and operation tests are carried out using features that have been established as suitable. For all probability values within the quantisation interval in the distribution, values beyond 10% on either side of the mean generate bits equal to or close to zero and are therefore not considered useful in generating the keys. A typical user distribution graph and table are shown below.

Quantisation Interval    Normal PDF
0                        4.32459E08
0.1                      7.32018E-08
0.2                      2.63221E-79
0.3                      2.0107E-222
0.4                      0
0.5                      0
0.6                      0
0.7                      0
0.8                      0
0.9                      0
1                        0

For a quantisation interval on a scale of 10, the four highest probability values within the quantisation space are considered in the key generation bits. Although this is the first attempt at using voice signals to generate template-free biometrics, the experiments on the 106 users in the VALID database generate unique biometric keys representing the users. This is a better performance than the previous use of the same database in a template-based system.

5 Conclusion This research introduces a novel method of biometric key generation in a template-free biometric system. The two-stage process of calibration and operation enables new biometric keys to be built at every instance of operation, thereby transferring the safe custody of the personal information to the individual users. A template-free biometric encryption system addresses the concerns associated with the compromise of stored template data. It is a technique that directly encrypts the biometric data provided by the individual and therefore eliminates the need for storing templates used for data validation, thus increasing the security of, and in turn the confidence in, the system. Its applications range from secure document exchange over electronic media to instant encryption of mobile telephone conversations based on the voice samples provided by the speaker.

References
[1] Maltoni, Anil, Wayman, Dario (eds.): Biometric Systems: Technology, Design and Performance Evaluation. Springer, Heidelberg (2002)
[2] Wayman, J.: Fundamentals of biometric authentication technologies. Int. J. Imaging and Graphics 1(1) (2001)
[3] http://ee.ucd.ie/validdb/datasets.html
[4] Poh, N., Bengio, S.: A study of the effect of score normalization prior to fusion in Biometric Authentication Tasks (December 2004)
[5] Rumsey, D.: Statistics for Dummies. Wiley Publishing Inc., Indiana (2003)
[6] Atah, J.A., Howells, G.: Score Normalisation of Voice Features for Template Free Biometric Encryption. In: The 2008 multi-conference in computer science, information technology, computer engineering, control and automation technology, Orlando, FL, USA (July 2008)
[7] Nandakumar, K.: Integration of multiple cues in biometric systems. PhD thesis, Michigan State University (2005)
[8] Koutsoyiannis, A.: Theory of Econometrics, 2nd edn. Palgrave, New York
[9] http://biosecure.it-sudparis.eu/AB/
[10] Sheng, W., Howells, G., Fairhurst, M.C., Deravi, F.: Template-free Biometric Key Generation by means of Fuzzy Genetic Clustering. Information Forensics and Security 3(2), 183–191 (2008)
[11] Deravi, F., Lockie, M.: Biometric Industry Report - Market and Technology Forecasts to 2003. Elsevier Advanced Technology (December 2000)
[12] Bolle, R.M., Connell, J.H., Ratha, N.K.: Biometric perils and patches. Pattern Recognition 35(12) (December 2002)
[13] Howells, W.G.J., Selim, H., Hoque, S., Fairhurst, M.C., Deravi, F.: An Autonomous Document Object (ADO) Model. In: Proceedings of the 6th International Conference on Document Analysis and Recognition (ICDAR 2001), Seattle, Washington, USA, September 2001, pp. 977–981 (2001)
[14] Hoque, S., Selim, H., Howells, G., Fairhurst, M.C., Deravi, F.: SAGENT: A Novel Technique for Document Modeling for Secure Access and Distribution. In: Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR 2003), Edinburgh, Scotland (2003)
[15] Howells, G., Selim, H., Fairhurst, M.C., Deravi, F., Hoque, S.: SAGENT: A Model for Security of Distributed Multimedia. Submitted to IEEE Transactions on System, Man and Cybernetics
[16] Rahman, A.F.R., Fairhurst, M.C.: Enhancing multiple expert decision combination strategies through exploitation of a priori information sources. IEE Proc. on Vision, Image and Signal Processing 146, 1–10 (1999)
[17] Sirlantzis, K., Hoque, S., Fairhurst, M.C.: Trainable multiple classifier schemes for handwritten character recognition. In: Proc. 3rd Int. Workshop on Multiple Classifier Systems, Cagliari, Italy, pp. 169–178
[18] Chibelushi, C.C., Mason, J.S.D., Deravi, F.: Audio-Visual Person Recognition: An Evaluation of Data Fusion Strategies. In: Proc. European Conference on Security, London, April 28-30, 1997, pp. 26–30. IEE (1997)
[19] Jain, A.K., Prabakar, S., Ross, A.: Biometrics Based Web Access. Technical Report TR98-33, Michigan State University (1998)

Extending Match-On-Card to Local Biometric Identification

Julien Bringer¹, Hervé Chabanne¹,², Tom A.M. Kevenaar³, and Bruno Kindarji¹,²

¹ Sagem Sécurité, Osny, France
² Télécom ParisTech, Paris, France
³ priv-ID, Eindhoven, The Netherlands

Abstract. We describe the architecture of a biometric terminal designed to respect privacy. We show how to rely on a secure module equipped with Match-On-Card technology to ensure the confidentiality of biometric data of registered users. Our proposal relies on a new way of using the quantization functionality of Secure Sketches that enables identification. Keywords: Biometric terminal, Privacy, Identification.

1 Introduction

This paper proposes a solution to the problem of securing a biometric access control terminal. For privacy reasons, we want to protect the confidentiality of the biometric data stored inside the terminal. We follow the example of payment terminals, which rely on a Secure Access Module (SAM, think of a dedicated smartcard) to provide secure storage for secret elements. The same goes for GSM phones and their SIM card. Our work follows this approach and adapts it to the specificities of biometrics in order to handle local identification, i.e. access control through a biometric terminal. The sensitive operation of matching fresh biometric data against the biometric references of the registered users is performed inside a SAM equipped with Match-On-Card (MOC) technology. To optimize the performance of our process, we reduce the number of these MOC comparisons by using a faster pre-comparison of biometric data based on their quantization. This way of proceeding partially comes from [10], where the computations needed for identifying an iris among a large database are sped up with a similar trick. To quantize biometric data, we mostly rely on work already done in the context of Secure Sketches [2, 6, 8, 9, 11, 13, 17, 18, 19]. We focus on fingerprint-based identification, though other biometrics are possible, e.g. iris or face recognition. However, Secure Sketches are not fit for biometric identification; moreover, their security is defined only in an information-theoretic setting. A direct application of such a scheme is thus leaky [3, 16], and our proposal takes these weaknesses into account.

J. Fierrez et al. (Eds.): BioID MultiComm2009, LNCS 5707, pp. 178–186, 2009. © Springer-Verlag Berlin Heidelberg 2009


The novelty introduced in this paper concerns both identification and privacy. We fully describe the architecture of a biometric access-control terminal that is based on Match-On-Card, and thus extend the security properties of MOC.

2 Preliminaries

We focus on biometric identification. In such a setting, one of the main concerns is the privacy of the database containing all the biometric records. Indeed, we do not want such a collection of personal characteristics to be used against their owners. This requires deploying an adequate solution so that no one is able to obtain information about a biometric feature stored within the database. It leads to an architectural issue: how is it possible to have a system in which the biometric data remains secure from capture, through storage, until it is used by a trusted matching device? Another concern is the efficiency of such a system: we wish to identify a user with a number of comparisons sublinear in the database size.

2.1 Biometric Templates

In the following, we use two different kinds of biometric template. The first one is used by classical biometric matching algorithms. For example, for fingerprints, one may think of the coordinates of the minutiae points. There is a large literature on this subject and we refer to [7] for more details. We also introduce what we call quantized templates. Let k be a natural integer, and let $B = \{0, 1\}^k$ be the binary Hamming space, equipped with the Hamming distance d, which returns the number of coordinates on which two vectors differ. In the following, we consider that our biometric terminal captures a biometric feature from an individual u and translates it into a binary vector, the quantized template $v \in B$. Quantized templates must satisfy the following properties. Two different quantized templates $v, v'$ of the same user are with high probability “close”, i.e. at a small Hamming distance $d(v, v') \le m$; quantized templates $v_1, v_2$ of different users $u_1, u_2$ are with high probability “far away”, i.e. at a Hamming distance $d(v_1, v_2) > m'$, with $m < m'$. A summary of the notations used throughout the paper is given in Table 1. Comparisons between quantized templates are fast, as they simply consist in computing a Hamming distance. Storage of a quantized template is also less demanding, as we envisage a few hundred bits for representing it (see Sec. 4).

Table 1. Summary of Variables and Parameters

Parameters
– $B = \{0, 1\}^k$: the Hamming space
– $d$: the Hamming distance
– $n$: number of registered users
– $c$: maximal number of MOC comparisons

Variables
– $u_i$: the user number i
– $b_i$: enrolled template for $u_i$
– $v_i$: quantized version of $b_i$
– $b'_j, v'_j$: candidate templates


Remark 1. Quantized templates have been introduced in the context of Secure Sketches [11, 13, 18, 19]. Given a quantized template x, we can indeed compute a secure sketch as $c \oplus x$, where c is a random codeword from a given code. [17] explicitly reports the construction of quantized templates for fingerprints. The theory behind the quantization of biometrics, and related issues such as pre-alignment, are not within the scope of this paper; see Section 4 for our implementation or [5, 4, 12] for more recent background on this subject.

2.2 Match-On-Card Technology

A classical way to use biometrics with enhanced security is to store the biometric template on a smartcard. Match-On-Card (MOC) is usually used for biometric authentication. In such a setting, a person is authenticated by first inserting a smartcard into a terminal, and then by presenting his biometrics. The biometric terminal sends the resulting template to the smartcard, which computes a matching score between the fresh template and a previously stored one, and decides whether the two templates come from the same user. Typically, a MOC fingerprint template is stored on about 512 bytes. As the computing power is limited, the matching algorithms for Match-On-Card suffer from more restrictions than usual matching functions. However, the performance is still good enough for real-life applications. As an example, the NIST MINEX test [15] reports a False Reject Rate of $4.7 \times 10^{-3}$ for a False Accept Rate of $10^{-2}$, and a False Reject Rate of $8.6 \times 10^{-3}$ for a False Accept Rate of $10^{-3}$. More detailed results can be found on the project website. The next section describes how to use MOC for biometric identification.

3 A Step by Step Implementation

3.1 Entities Involved

The system architecture described in this work aims to combine the efficiency of biometric recognition with the physical security of a hardware-protected component. In practice, we build a biometric terminal (cf. Figure 1) that includes distinct entities:
– a main processing unit,
– a sensor,
– some non-volatile memory. This memory contains what we call the encrypted database, which holds the encryption of all the templates of registered users,
– a SAM dedicated to the terminal. It can be physically attached to the terminal as a chip with connections directly welded to the printed circuit board of the terminal. Another possibility is to have a SIM card and a SIM card reader inside the terminal.

Afterwards, when we mention the computations of the biometric terminal, we designate those made by its main processor.


Fig. 1. Our Biometric Terminal

Remark 2. Coming back to the Introduction of this paper and the analogy with payment terminals, we consider in the following that our terminal is tamper-evident [20]. Therefore, attempts at physical intrusion will be detected afterwards.

3.2 Setup

We choose a symmetric encryption scheme, such as, for instance, the AES. It requires a cryptographic key κ, which is kept inside the SAM. The SAM thus performs the encryption and decryption. The encryption of x under the key κ is denoted by Enc(x) (we omit κ in order to lighten the notation). Note that no user owns the key κ: there is only one key, independent of the user. We ensure the confidentiality of the templates by encrypting the content of the database under the key κ. For n registered users, the database of the terminal stores their n encrypted templates $\{Enc(b_1), \dots, Enc(b_n)\}$. Identification through our proposal is made in two steps. To identify the owner of a biometric template $b'$, we first roughly select a list of the most likely templates $(b_{i_1}, \dots, b_{i_c})$ from the database, with $c < n$. This is done by comparing quantized templates, as the comparison of these binary vectors is much faster than a MOC comparison. In a second step, the identification is confirmed by performing the c matching operations on the MOC.

3.3 Enrolment Procedure

The enrolment of a user $u_i$ associated with a (classical) template $b_i$ takes two steps:
1. Compute and store the encryption of the template $Enc(b_i)$ into the database.
2. Then, compute and store a quantized template $v_i$ into the SAM memory.

Although not encrypted, the quantized templates are stored in the SAM memory, and are thus protected from eavesdroppers.

3.4 Access-Control Procedure

When a user $u_j$ presents his biometrics to the sensor, he is identified as follows:
1. The processor encodes the biometric feature into the associated template $b'_j$.
2. The processor computes the quantized template $v'_j$ and sends it to the SAM.
3. The SAM compares $v'_j$ with the stored $v_1, \dots, v_i, \dots, v_n$. It obtains a list of c candidates $v_{i_1}, \dots, v_{i_c}$ for the identification of $b'_j$.
4. The SAM sequentially requests each $Enc(b_i)$ for $i \in \{i_1, \dots, i_c\}$ and decrypts it into $b_i$.
5. The SAM completes its task by performing the c MOC comparisons, and finally validates the identity of the owner of $b'_j$ if one of the MOC comparisons leads to a match.

Proposition 1. As the biometric information of the enrolled users remains either in the SAM, or encrypted outside the SAM and decrypted only in the SAM, our access-control biometric terminal architecture ensures the privacy of the registered users.
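The SAM-side part of this procedure (steps 3 to 5) can be sketched as follows. This is only an illustration under stated assumptions: quantized templates are packed into Python integers, and `decrypt` and `moc_match` are hypothetical callables standing in for the AES decryption and the Match-On-Card comparison, neither of which is specified at this level in the paper.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit vectors packed into integers."""
    return bin(a ^ b).count("1")


def identify(v_fresh, b_fresh, quantized, encrypted_db, decrypt, moc_match, c):
    """Return the matching user id, or None if no shortlisted candidate matches."""
    # Step 3: rank the enrolled users by Hamming distance and keep the c closest.
    ranked = sorted(quantized, key=lambda uid: hamming(v_fresh, quantized[uid]))
    # Steps 4-5: decrypt and run the costly comparison only for the shortlist.
    for uid in ranked[:c]:
        reference = decrypt(encrypted_db[uid])
        if moc_match(b_fresh, reference):
            return uid
    return None


# Toy data: placeholders stand in for real encryption and real matching.
quantized = {"alice": 0b11110000, "bob": 0b00001111, "carol": 0b11111111}
encrypted_db = {uid: ("enc", "template-" + uid) for uid in quantized}
toy_decrypt = lambda blob: blob[1]
toy_match = lambda fresh, ref: fresh == ref

found = identify(0b11110001, "template-alice", quantized, encrypted_db,
                 toy_decrypt, toy_match, c=2)
```

In this toy run the fresh vector is at distance 1 from the first user, so the correct template is reached with a single call to the expensive matcher.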

4 Performances of This Scheme

The main observation is that the MOC comparison is the most costly operation here, as a naive identification executes n of them. We therefore reduce the number of MOC comparisons and focus on selecting the best candidates. For an identification, we replace the time needed for n MOC comparisons within the SAM by the time for n (Hamming) comparisons with the quantized templates, followed by a sort to select the c best candidates, and at most c MOC comparisons. Let $\mu_{MOC}$ (resp. $\mu_{HD}(k)$; $Sort(k, n)$; $\mu_{Dec}$) be the computation time for a MOC comparison (resp. Hamming distance computation on k-bit vectors; sorting n integers of size k; template decryption). Additionally, the feature extraction and quantization of the fresh biometric image are handled outside the SAM by the main processor of the terminal. Neglecting this latter part, the pre-screening of candidates through quantized biometrics improves the identification time as soon as $(n - c) \times (\mu_{MOC} + \mu_{Dec}) > n \times \mu_{HD}(k) + Sort(k, n)$. Assuming that $\mu_{HD}(k)$ is 2 ms for $k \le 1000$ and that the comparison of two integers of size k takes 2 ms as well, this yields $(n - c) \times (\mu_{MOC} + \mu_{Dec}) > 2(n + n \log_2 n)$ ms. $\mu_{MOC}$ is generally within 100 ms to 500 ms; assume that $\mu_{MOC} + \mu_{Dec}$ takes 200 ms here. Then, for instance, with n = 100 and c = 10, this leads to an improvement by a factor of about 5.6.
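The timing comparison above can be checked numerically with a short sketch. It assumes, as the inequality does, that sorting costs about $n \log_2 n$ integer comparisons; the function and parameter names are illustrative.

```python
import math


def time_without_prescreening(n, mu_moc_dec):
    """n decryptions plus MOC comparisons, in milliseconds."""
    return n * mu_moc_dec


def time_with_prescreening(n, c, mu_moc_dec, mu_hd, mu_cmp):
    """n Hamming distances, a sort of n integers, then c MOC comparisons."""
    hamming_time = n * mu_hd
    sort_time = mu_cmp * n * math.log2(n)  # assumed n*log2(n) comparisons
    return hamming_time + sort_time + c * mu_moc_dec


# Values quoted in the text: 200 ms per MOC comparison with decryption,
# 2 ms per Hamming distance and per integer comparison, n = 100, c = 10.
baseline = time_without_prescreening(100, 200.0)
screened = time_with_prescreening(100, 10, 200.0, 2.0, 2.0)
speedup = baseline / screened
```

Under these assumptions the ratio lands between 5.6 and 5.7, in line with the factor quoted above; it degrades as c grows, since the c MOC comparisons dominate the pre-screened time.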

4.1 A Practical Example

To confirm our solution for enhancing the security of an access control terminal, we ran experiments on different fingerprint datasets, based on a modification of the quantization algorithm studied in [17]. Some of our results are highlighted here on the second FVC2000 fingerprint database [14].

Adaptation of the algorithm of [17] towards identification. Tuyls et al. apply an algorithm based on reliable component quantization to extract stable binary vectors from fingerprints and to apply secure sketches to them. From a fingerprint image, a real vector is extracted via the computation of a squared directional field and the responses of 4 complex Gabor filters. Before this encoding, fingerprints are considered as pre-aligned, based mainly on core detection and registration [17]. We adapt their algorithm for identification purposes and to avoid any loss of information due to re-alignment. On the second FVC2000 database, we obtain real vectors with 1984 components of information embedded in a vector of length L = 17952. All the 15968 null components are marked as erasures (i.e. positions where no value is known) for the sequel.

Table 2. Notations for the algorithm
– n users $\{u_1, \dots, u_n\}$, M captures per user
– $L = 17952$: the number of extracted values per capture
– $X_{i,j} \in (\mathbb{R} \cup \{\perp\})^L$: capture no. j for user $u_i$; $\perp$ is a null component
– $(M_i)_t$: for $t \in \{1 \dots L\}$, number of real values among $\{(X_{i,1})_t, \dots, (X_{i,M})_t\}$
– $\mu_i$, $\mu$: mean (vector) per user, and overall
– $\Sigma_w$, $\Sigma_b$: within-class covariance and between-class covariance of the $X_{i,j}$
– $Q_i \in \{0, 1, \epsilon\}^L$: L-long quantized vector for $u_i$; $\epsilon$ denotes an erasure
– $V_i$, $VM_i$: k-long binary quantized vector for $u_i$, and its mask

To increase the stability of the vectors, an enrolment database containing n users and M images per user, obtained as above, is considered. From the nM real vectors $(X_{i,j})_{i=1..n}^{j=1..M}$, fixed-length binary strings are generated following some statistics and reliable bit selection. This step is similar to the one described in [17], but adapted to take null positions into account and to select the same coordinates for all users, which removes the user-specific aspect of the original approach. For a given user i, the number $(M_i)_t$ of non-erased components at an index t is not constant: for $1 \le i \le n$, $1 \le t \le L$, $0 \le (M_i)_t \le M$. When $(M_i)_t = 0$, the position t is considered as an erasure for the user i. Erased components $(X_{i,j})_t$ are not counted in the mean computations. For each coordinate, the mean $\mu_i$ per user and the mean $\mu$ of all the enrolment vectors are computed. The within-class covariance $\Sigma_w$ and the between-class covariance $\Sigma_b$ are also estimated. Then we construct a string $Q_i$ as follows: for $1 \le t \le L$, if $(M_i)_t \ge 1$, then $(Q_i)_t = 0$ if $(\mu_i)_t \le (\mu)_t$ and $(Q_i)_t = 1$ if $(\mu_i)_t > (\mu)_t$; if $(M_i)_t = 0$, then $(Q_i)_t$ is marked as an erasure. Let $k < L$ be the number of reliable components to select. The Signal-to-Noise Ratio (SNR) of the coordinate t is defined by $(\xi)_t = (\Sigma_b)_{t,t} / (\Sigma_w)_{t,t}$. Here, we pick the k coordinates with the highest SNR. These indexes are saved in a vector P, and a new vector $V_i$ of length k is constructed with the corresponding reliable bit values for each user $u_i$, together with a mask $VM_i$, a second k-bit vector that distinguishes the known coordinates from the positions where no value is known.

With respect to Section 3, this procedure enables us to manage the enrolment of a set of users $u_1, \dots, u_n$ and outputs for each user a quantized template $v_i = (V_i, VM_i)$. As for the access-control procedure, when a new fingerprint image is captured for a user $u_j$, a minutiae-based fingerprint template $b'_j$ is extracted together with another, pattern-based template $Y_j$ computed as above. The quantized vector $Q(Y_j)$ is then computed by comparing $Y_j$ with the enrolment mean $\mu$, and the vector $v'_j = (V'_j, VM'_j)$ is constructed by keeping only the indexes contained in P. We stress again that all these computations are performed by the main processor unit of the terminal.
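The enrolment and quantization steps described above can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: erasures are represented by `None`, only the diagonal entries of the covariances are estimated, and the particular variance estimators (plain averages of squared deviations) are assumptions, since the text does not specify them.

```python
from statistics import mean


def enrol(X, k):
    """X[i][j] is capture j of user i: a list of L floats, None marking an erasure.

    Returns the selected indexes P, the overall mean vector and, for each user,
    the pair (V, VM) of reliable bits and mask.
    """
    n, L = len(X), len(X[0][0])
    mu_user = [[None] * L for _ in range(n)]
    mu, snr = [None] * L, [0.0] * L
    for t in range(L):
        per_user = [[x[t] for x in X[i] if x[t] is not None] for i in range(n)]
        known = [vals for vals in per_user if vals]
        if not known:
            continue  # coordinate erased for every user
        for i in range(n):
            if per_user[i]:
                mu_user[i][t] = mean(per_user[i])
        mu[t] = mean([x for vals in known for x in vals])
        within = mean([(x - mean(vals)) ** 2 for vals in known for x in vals])
        between = mean([(mean(vals) - mu[t]) ** 2 for vals in known])
        if within > 0:
            snr[t] = between / within
        else:
            snr[t] = float("inf") if between > 0 else 0.0
    # Keep the k coordinates with the highest SNR, identical for all users.
    P = sorted(range(L), key=lambda t: snr[t], reverse=True)[:k]
    templates = []
    for i in range(n):
        V = [int(mu_user[i][t] is not None and mu_user[i][t] > mu[t]) for t in P]
        VM = [int(mu_user[i][t] is not None) for t in P]
        templates.append((V, VM))
    return P, mu, templates


def quantize_fresh(Y, P, mu):
    """Quantize a fresh vector Y against the enrolment mean on the indexes in P."""
    VM = [int(Y[t] is not None and mu[t] is not None) for t in P]
    V = [int(bool(ok) and Y[t] > mu[t]) for t, ok in zip(P, VM)]
    return V, VM


# Toy enrolment set: 2 users, 2 captures each, L = 3 (third coordinate mostly erased).
X = [
    [[1.0, 5.0, None], [1.2, 1.0, None]],
    [[3.0, 1.0, None], [3.2, 5.0, 2.0]],
]
P, mu, templates = enrol(X, k=1)
```

In the toy data the first coordinate separates the two users with little within-user spread, so it is the one selected; the second coordinate varies as much within a user as between users and is ranked lower.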


Performances. On the second FVC2000 database, for the 100 users, we randomly choose M = 6 images per user for enrolment and the 2 remaining ones for the identification tests. We construct binary vectors $v_i$ of length 128 (i = 1, . . . , 100) at enrolment and, for each $v'_j$ obtained at the access-control step, we observe the rank of the correct candidate after sorting the $v_i$ with respect to an adapted Hamming distance between $v'_j$ and $v_i$. This distance is computed as the number of differences plus half the number of positions where no value is known. In that case, 90% of the correct candidates are among the 8 closest results and almost all are reached before rank 20. To further reduce the number of MOC comparisons needed, we can increase the length of the quantized templates. The experiments confirm this: for 256-bit templates, 81% of the correct candidates are reached by rank 2 and 90% by rank 5. The list of candidates is then almost always consolidated by very few MOC comparisons. Figure 2 illustrates the results with a quantization on 256 bits and 128 bits.
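The adapted Hamming distance and the rank measurement can be sketched as below. One assumption is made explicit: a position counts as "no value known" when it is masked out in either of the two templates, which the text does not state precisely. The function names are illustrative.

```python
def adapted_hamming(v, vm, w, wm):
    """Number of differing known bits plus half the number of unknown positions."""
    known = [a and b for a, b in zip(vm, wm)]  # known in both templates (assumption)
    differences = sum(1 for x, y, ok in zip(v, w, known) if ok and x != y)
    unknown = sum(1 for ok in known if not ok)
    return differences + 0.5 * unknown


def rank_of(true_user, fresh, enrolled):
    """1-based rank of the true user when sorting by adapted Hamming distance."""
    v, vm = fresh
    order = sorted(enrolled, key=lambda uid: adapted_hamming(v, vm, *enrolled[uid]))
    return order.index(true_user) + 1


# Toy 4-bit templates (bits, mask); real templates use 128 or 256 bits.
enrolled = {
    "u1": ([1, 0, 1, 1], [1, 1, 1, 1]),
    "u2": ([0, 1, 0, 0], [1, 1, 1, 1]),
}
fresh = ([1, 0, 1, 0], [1, 1, 1, 1])
rank = rank_of("u1", fresh, enrolled)
```

Counting an unknown position as half a difference corresponds to its expected contribution if the missing bit were uniformly random, so templates with many erasures are neither favoured nor penalised excessively.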

Fig. 2. Accuracy with 128 bits and 256 bits

5 Conclusion

This paper describes how to locally improve privacy inside a biometric terminal for the purpose of identification. We can go further and change the scale of the setting. Indeed, the same idea can be applied at a system level. We only need to replace our Match-On-Card SAM by a more powerful hardware component, such as, for instance, a Hardware Security Module (HSM) [1]. This leads us to study the applicability of our quantized-template speed-up to many users: this change of scale in the size of the database needs further investigation.

Acknowledgment. This work is supported by funding under the Seventh Research Framework Programme of the European Union, Project TURBINE (ICT-2007-216339). This document has been created in the context of the TURBINE project. It describes one of the protocols which are envisaged to be developed in the TURBINE demonstrators. All information is provided as is and no guarantee or warranty is given that the information is fit for any particular purpose. The user thereof uses the information at its sole risk and liability. The European Commission has no liability in respect of this document, which merely represents the authors’ view.


References
1. Anderson, R., Bond, M., Clulow, J., Skorobogatov, S.P.: Cryptographic processors - a survey. Proceedings of the IEEE 94(2), 357–369 (2006)
2. Boyen, X., Dodis, Y., Katz, J., Ostrovsky, R., Smith, A.: Secure remote authentication using biometric data. In: Cramer, R. (ed.) EUROCRYPT 2005. LNCS, vol. 3494, pp. 147–163. Springer, Heidelberg (2005)
3. Bringer, J., Chabanne, H., Cohen, G., Kindarji, B., Zemor, G.: Theoretical and practical boundaries of binary secure sketches. IEEE Transactions on Information Forensics and Security 3(4), 673–683 (2008)
4. Chen, C., Veldhuis, R.N.J., Kevenaar, T.A.M., Akkermans, A.H.M.: Biometric binary string generation with detection rate optimized bit allocation. In: IEEE CVPR 2008, Workshop on Biometrics, June 2008, pp. 1–7 (2008)
5. Chen, C., Veldhuis, R.N.J.: Performances of the likelihood-ratio classifier based on different data modelings. In: ICARCV, pp. 1347–1351. IEEE, Los Alamitos (2008)
6. Crescenzo, G.D., Graveman, R., Ge, R., Arce, G.: Approximate message authentication and biometric entity authentication. In: Patrick, A.S., Yung, M. (eds.) FC 2005. LNCS, vol. 3570, pp. 240–254. Springer, Heidelberg (2005)
7. Maltoni, D., Maio, D., Jain, A.K., Prabhakar, S.: Handbook of Fingerprint Recognition. Springer, Heidelberg (2003)
8. Dodis, Y., Katz, J., Reyzin, L., Smith, A.: Robust fuzzy extractors and authenticated key agreement from close secrets. In: Dwork, C. (ed.) CRYPTO 2006. LNCS, vol. 4117, pp. 232–250. Springer, Heidelberg (2006)
9. Dodis, Y., Reyzin, L., Smith, A.: Fuzzy extractors: How to generate strong keys from biometrics and other noisy data. In: Cachin, C., Camenisch, J. (eds.) EUROCRYPT 2004. LNCS, vol. 3027, pp. 523–540. Springer, Heidelberg (2004)
10. Hao, F., Daugman, J., Zielinski, P.: A fast search algorithm for a large fuzzy database. IEEE Transactions on Information Forensics and Security 3(2), 203–212 (2008)
11. Juels, A., Wattenberg, M.: A fuzzy commitment scheme. In: ACM Conference on Computer and Communications Security, pp. 28–36 (1999)
12. Kelkboom, E.J.C., Molina, G.G., Kevenaar, T.A.M., Veldhuis, R.N.J., Jonker, W.: Binary biometrics: An analytic framework to estimate the bit error probability under Gaussian assumption. In: IEEE BTAS 2008, pp. 1–6 (2008)
13. Linnartz, J.-P.M.G., Tuyls, P.: New shielding functions to enhance privacy and prevent misuse of biometric templates. In: Kittler, J., Nixon, M.S. (eds.) AVBPA 2003. LNCS, vol. 2688, pp. 393–402. Springer, Heidelberg (2003)
14. Maio, D., Maltoni, D., Cappelli, R., Wayman, J.L., Jain, A.K.: FVC2000: fingerprint verification competition. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(3), 402–412 (2002)
15. National Institute of Standards and Technology (NIST): MINEX II - an assessment of match-on-card technology, http://fingerprint.nist.gov/minex/
16. Simoens, K., Tuyls, P., Preneel, B.: Privacy weaknesses in biometric sketches. In: IEEE Symposium on Security and Privacy (to appear, 2009)
17. Tuyls, P., Akkermans, A.H.M., Kevenaar, T.A.M., Schrijen, G.-J., Bazen, A.M., Veldhuis, R.N.J.: Practical biometric authentication with template protection. In: Kanade, T., Jain, A., Ratha, N.K. (eds.) AVBPA 2005. LNCS, vol. 3546, pp. 436–446. Springer, Heidelberg (2005)
18. Tuyls, P., Goseling, J.: Capacity and examples of template-protecting biometric authentication systems. In: Maltoni, D., Jain, A.K. (eds.) BioAW 2004. LNCS, vol. 3087, pp. 158–170. Springer, Heidelberg (2004)
19. Tuyls, P., Verbitskiy, E., Goseling, J., Denteneer, D.: Privacy protecting biometric authentication systems: an overview. In: EUSIPCO 2004 (2004)
20. Weingart, S.H.: Physical security devices for computer subsystems: A survey of attacks and defenses. In: Paar, C., Koç, Ç.K. (eds.) CHES 2000. LNCS, vol. 1965, pp. 302–317. Springer, Heidelberg (2000)

A New Fingerprint Matching Algorithm Based on Minimum Cost Function

Andrés I. Ávila and Adrialy Muci

Departamento de Ingeniería Matemática, Universidad de La Frontera, Chile

Abstract. We develop new minutia-based fingerprint algorithms that minimize a cost function of distances between matching pairs. First, using the minutia type or minutia quality, we choose a reference set of points from each set. Next, we create the set of combinations of pairs to perform the best alignment, and finally the matching by distances is computed. We tested our algorithm on the DB2A FVC2004 database, extracting the minutia information with the mindtct program provided by NBIS, and we compare it with the performance of the bozorth3 algorithm.

1 Introduction

Among all the fingerprint matching methods, minutia-based ones have proved to be the most widely implemented in real-life applications, due to their low computational requirements and cheap sensor technologies. One of the problems of this family is its dependence on the quality of the image captured by different sensor technologies. Figure 5 of [13] shows the effect of solid-state and digital sensor technologies on the number of extracted minutiae, noting that the solid-state sensor tends to capture fewer minutiae. Also, [3] shows the effect of capacitance, optical and thermal sensor technologies on image quality. Finally, in [1] the authors studied the effect of sensor technology on image quality, defining five quality indexes and considering different pressures and dryness. In addition to these two effects, temperature was also considered in [9] for four types of sensors. All these works highlight the need for new algorithms that use robust information related to the quality of images. In this work, we first use information about minutia type to select a set of minutiae for alignment, and then use a cost-minimizing algorithm for matching. Next, we consider the minutia quality given by the mindtct files .xyt and .min for selecting the best minutiae for alignment.

2 Fingerprint Matching

2.1 Main Ideas

Minutia-based matching is performed by comparing a number of characteristics of each minutia. Most algorithms consider the coordinates $(x_i, y_i)$, the rotation angle $\theta_i$ with respect to the horizontal, and the type $t_i$. In [6], the authors considered eight characteristics, including type information of neighboring minutiae. In [5], more characteristics were included, such as distances, lines between minutiae, and relative orientation with respect to the central minutia. In [8] and [12], the authors first generate a local characteristic vector to find the best pair by a weighted distance of vectors, and then align the rest of the minutiae. In [14], a set of angles is also generated for the characteristic vector to compute a score matrix with the best minutiae, and a greedy algorithm is used to find the maximum number of pairs. In our approach, we consider only three characteristics: the first two are $(x_i, y_i)$, and the third is either the type $t_i$ or the quality $q_i$ given by the mindtct algorithm. The main idea is to minimize the number of characteristics used in matching. In the first stage of the algorithm, we perform an alignment for each pair of minutiae from a reference set, as shown in [7], and solve a least-squares problem to find the translation, orientation and scaling. The reference set is selected either by type or by quality. In the second case, a new parameter is needed, the quality threshold. In the second stage, for each alignment we compute the distance matrix between the minutiae of the template and input images, and solve an assignment problem with Munkres’ Assignment Algorithm [10], also called the Hungarian Algorithm, which was extended to rectangular matrices in [2]. We give more details in the next sections.

J. Fierrez et al. (Eds.): BioID MultiComm2009, LNCS 5707, pp. 187–191, 2009. © Springer-Verlag Berlin Heidelberg 2009

2.2 Reference Set and Parameters

In this stage, we want to solve the problem of alignment among minutiae. Because testing all rotations, translations and scalings is too time-consuming, most minutia-based methods use two reference points from each image, such as cores or deltas, to perform one alignment. In our approach, we do not consider information about these special points, so we use pairs of special minutiae to perform the alignment, the so-called principal minutiae in [4]. Because not all combinations are useful, we first select a reference set. Denote the template minutiae by $T = \{m^t_i\}_{i=1 \dots n}$ and the input minutiae by $I = \{m^i_j\}_{j=1 \dots m}$, where each minutia is given by $m_k = (x_k, y_k, t_k)$ for the coordinates and type, and by $m_k = (x_k, y_k, q_k)$ when the quality $q_k$ is considered. For the reference set, the authors of [7] mention that false minutiae are located closer together than real minutiae. We therefore extend our minutiae with a fourth characteristic $w_k$, defined as the minimum distance to all other minutiae of the same image. Then, in the case of type, we sort the minutiae from highest to lowest distance and select a fixed number L of minutiae from each image to build the reference set $A = \{(m^t_i, m^i_j) : 1 \le i, j \le L \text{ and } t_i = t_j\}$, where the paired minutiae have the same type. This step avoids useless pairings for alignment. In the case of quality, we use the quality threshold U to select the best-quality minutiae for the alignment, $q_k > U$. In both cases, we choose L such that the set A is nonempty and not too large. Notice that there are at most $L^2$ pairs. Now, we build a set of pairs of pairs for the transformation, $A_T = \{(m^t_{i_1}, m^i_{j_1}), (m^t_{i_2}, m^i_{j_2}) \in A \text{ and } m_{i_1} = m_{j_1}, m_{i_2} = m_{j_2}\}$.

A New Fingerprint Matching Algorithm


Thus, we obtain $O(L^4)$ possible alignments. A small $L$ gives few possibilities for alignment, while a large $L$ gives too many options, so $L$ must be tuned.

2.3 Alignment and Rotation

The main idea is to transform the template minutiae into the coordinate system of the input minutiae. To obtain this transformation, we search for the optimal translation, angle and scaling $T = (t_x, t_y, \theta, \rho)$. Consider the reference pair $(p_{i_1}, q_{j_1}), (p_{i_2}, q_{j_2}) \in A$, where $p_{i_1} = (x_{i_1}, y_{i_1})$, $p_{i_2} = (x_{i_2}, y_{i_2}) \in P$ and $q_{j_1} = (x_{j_1}, y_{j_1})$, $q_{j_2} = (x_{j_2}, y_{j_2}) \in Q$. A point $(x, y)$ is mapped to $(u, v)$ by

$$\begin{pmatrix} u \\ v \end{pmatrix} = \rho \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}. \tag{1}$$

Using standard least squares, it is possible to obtain a set of formulas for the parameters. Because these computations are fast, we can perform several transformations and avoid the use of cores or deltas.

2.4 Minimum Cost Function

After alignment is done, we have two sets of minutiae in the same coordinate system,

$$P = \{p_1, p_2, \dots, p_n\}, \quad p_i = (x^p_i, y^p_i, z^p_i), \quad i = 1, \dots, n,$$

$$Q = \{q_1, q_2, \dots, q_m\}, \quad q_j = (x^q_j, y^q_j, z^q_j), \quad j = 1, \dots, m,$$

where $z^p_i$ is either the type or the quality of the minutia. Next, we compute the cost matrix $C = (c_{ij})$ by the distance $c_{ij} = |x^p_i - x^q_j| + |y^p_i - y^q_j|$ for $i = 1, \dots, n$ and $j = 1, \dots, m$. This distance is faster to compute than the standard Euclidean distance. Since the two sets of minutiae have different sizes, we assume that $n < m$. For each template minutia $i$ we search for the input minutia $j$ such that the minimum distance $c^*_{ij}$ is attained. This is represented by the following minimum cost problem, where $z_{ij}$ is a binary variable indicating whether the pair $(i, j)$ is chosen:

$$\begin{aligned} \min \quad & \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} z_{ij} \\ \text{s.t.} \quad & \sum_{j=1}^{m} z_{ij} = 1, \quad i = 1, \dots, n, \\ & \sum_{i=1}^{n} z_{ij} \le 1, \quad j = 1, \dots, m, \\ & z_{ij} \in \{0, 1\}, \quad i = 1, \dots, n, \ j = 1, \dots, m. \end{aligned}$$

The first constraint states that each template minutia has a match, and the second that each input minutia cannot be associated with more than one template minutia. Because the size of the matrix is not large, we can use exact (non-heuristic) methods to solve this problem. Munkres' algorithm is an efficient exact method for this problem.
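The minimum cost problem above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names `manhattan_cost` and `assign_min_cost` are hypothetical, the toy coordinates are made up, and instead of the star/prime bookkeeping of Munkres' original formulation the sketch uses a potential-based variant of the Hungarian method, which solves the same rectangular problem for n ≤ m.

```python
def manhattan_cost(P, Q):
    """Cost matrix c[i][j] = |x_i - x_j| + |y_i - y_j| for aligned minutiae."""
    return [[abs(p[0] - q[0]) + abs(p[1] - q[1]) for q in Q] for p in P]


def assign_min_cost(cost):
    """Solve the rectangular assignment problem (rows n <= columns m).

    Returns (total_cost, match), where match[i] is the column chosen for row i.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "every template minutia needs a distinct input minutia"
    inf = float("inf")
    u = [0.0] * (n + 1)          # row potentials (1-based)
    v = [0.0] * (m + 1)          # column potentials (1-based)
    col_owner = [0] * (m + 1)    # row currently matched to column j (0 = free)
    way = [0] * (m + 1)          # back-pointers of the augmenting path
    for i in range(1, n + 1):
        col_owner[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = col_owner[j0]
            delta, j1 = inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[col_owner[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if col_owner[j0] == 0:   # reached a free column: path is complete
                break
        while j0:                    # flip the assignments along the path
            j1 = way[j0]
            col_owner[j0] = col_owner[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if col_owner[j]:
            match[col_owner[j] - 1] = j - 1
    return sum(cost[i][match[i]] for i in range(n)), match


# Toy example (made-up coordinates): two template points, three input points.
P = [(0, 0), (10, 0)]
Q = [(9, 1), (1, 0), (50, 50)]
total, match = assign_min_cost(manhattan_cost(P, Q))
```

For these toy sets the first template point is paired with the second input point and the second template point with the first, for a total cost of 3; the third input point stays unmatched, as the inequality constraint allows.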


A.I. Ávila and A. Muci

2.5 Munkres' Algorithm

We assume the matrix $C$ has dimensions $n \times m$, where $n \le m$. We sketch the algorithm:

1. For each row, find the minimum element and subtract it from the whole row.
2. Find a zero element and mark it with ∗.
3. Cover the columns containing a ∗ and count the covered columns. If there are $n$ of them, the matching is found. Otherwise, go to step 4.
4. Find an uncovered zero and mark it with ′. If there is no ∗ in its row, go to step 5. If there is a ∗, uncover its column and cover the row. Store the lowest uncovered value and go to step 6.
5. Build a sequence of alternating ∗ and ′ to find a new assignment and go to step 3.
6. Add the stored value to each covered row and subtract it from each uncovered column. Go to step 4.

The whole matching algorithm is described in Algorithm 1.

Algorithm 1. Matching with alignment

Data: set of minutiae from template and input images
Result: matching score after alignment
begin
    Build characteristic vectors P, Q from the input data;
    Compute the weight w_k for each minutia in P and Q;
    Sort the minutiae of P and Q from highest to lowest weight;
    Find the reference set A;
    foreach reference pair in A do
        Compute the transformation by least squares;
        Align the two characteristic sets using the transformation;
        Perform matching by Munkres' algorithm and compute the matching score;
    end
    The matching score will be the lowest score obtained among transformations;
end
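The transformation step of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: points are treated as complex numbers, the names `fit_similarity` and `apply_similarity` are hypothetical, and the closed-form least-squares solution is the standard one for a similarity transform, written here to match the matrix convention of Eq. (1).

```python
import cmath


def fit_similarity(template_pts, input_pts):
    """Least-squares (rho, theta, (tx, ty)) of Eq. (1) mapping template to input.

    With w = x + j*y, Eq. (1) reads w' = a*w + t with a = rho * exp(-j*theta).
    """
    p = [complex(x, y) for x, y in template_pts]
    q = [complex(x, y) for x, y in input_pts]
    k = len(p)
    p_mean = sum(p) / k
    q_mean = sum(q) / k
    num = sum((qk - q_mean) * (pk - p_mean).conjugate() for pk, qk in zip(p, q))
    den = sum(abs(pk - p_mean) ** 2 for pk in p)
    a = num / den                 # optimal complex gain
    t = q_mean - a * p_mean       # optimal translation
    return abs(a), -cmath.phase(a), (t.real, t.imag)


def apply_similarity(rho, theta, t, pts):
    """Map points with Eq. (1) using the fitted parameters."""
    a = cmath.rect(rho, -theta)
    mapped = []
    for x, y in pts:
        w = a * complex(x, y) + complex(t[0], t[1])
        mapped.append((w.real, w.imag))
    return mapped


# Made-up reference pair: two template points and their input counterparts.
rho, theta, t = fit_similarity([(0, 0), (1, 0)], [(2, 3), (2, 5)])
aligned = apply_similarity(rho, theta, t, [(0, 0), (1, 0)])
```

With exactly two correspondences, as in a reference pair, the fit is exact; the same formula also accepts more than two pairs, in which case it returns the least-squares compromise.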

The final score is computed as the percentage of template minutiae found by each run.

3 Experiments for Minutia Quality

We perform an experiment with the DB2A fingerprints of FVC2004, eight images for each of one hundred fingers. We selected a quality threshold $U = 50$ and $L = 10$ for the reference sets. With these parameters, we perform on average 134.4 alignments (standard deviation 201.7), which shows how sensitive the algorithm is to the selection of pairs. The selected parameter for minimum EER was 78. We obtain an FNMR of 89.3% and an FMR of 11.5%, which shows.

4 Conclusions

It is important to study the sensitivity of the matching with respect to the parameters. A combination of parameters is more complicated to tune, but it allows more possibilities for adjusting the algorithm. The parameter $L$ should depend on the image quality: if the quality is high, fewer pairs are needed to perform the matching. Because of the poor quality of some fingerprints, in some cases no alignment was found, so the comparison is rejected. Also, the score depends on the number of minutiae matched in the comparison.

References

1. Alonso-Fernandez, F., Roli, F., Marcialis, G.L., Fierrez, J., Ortega-Garcia, J.: Comparison of fingerprint quality measures using an optical and a capacitive sensor. In: IEEE Conference on Biometrics: Theory, Applications and Systems, 6 p. (2007)
2. Bourgeois, F., Lassalle, J.-C.: An extension of the Munkres algorithm for the assignment problem to rectangular matrices. Communications of the ACM 14(12), 802–804 (1971)
3. Blomeke, C., Modi, S., Elliott, S.: Investigating The Relationship Between Fingerprint Image Quality and Skin Characteristics. In: IEEE International Carnahan Conference on Security Technology ICCST 2008, 4 p. (2008)
4. Chang, S.H., Cheng, F.H., Hsu, W.H., Wu, G.Z.: Fast algorithm for point pattern matching: invariant to translations, rotations and scale changes. Pattern Recognition 30(2), 311–320 (1997)
5. Chen, Z., Kuo, C.H.: A Topology-Based Matching Algorithm for Fingerprint Authentication. In: Proc. Int. Carnahan Conf. on Security Technology (25th), pp. 84–87 (1991)
6. Hrechak, A., McHugh, J.: Automated Fingerprint Recognition Using Structural Matching. Pattern Recognition 23(8), 893–904 (1990)
7. Jia, J., Cai, L., Lu, P., Liu, X.: Fingerprint matching based on weighting method and the SVM. Neurocomputing 70, 849–858 (2007)
8. Jiang, X., Yau, W.Y.: Fingerprint Minutiae Matching Based on the Local and Global Structures. Proc. Int. Conf. on Pattern Recognition (15th) 2, 1042–1045 (2000)
9. Kang, H., Lee, B., Kim, H., Shin, D., Kim, J.: A Study on Performance Evaluation of Fingerprint Sensor. In: Kittler, J., Nixon, M.S. (eds.) AVBPA 2003. LNCS, vol. 2688, pp. 574–583. Springer, Heidelberg (2003)
10. Munkres, J.: Algorithms for Assignment and Transportation Problems. Journal of the SIAM 5(1) (March 1957)
11. Watson, C., Garris, M., Tabassi, W., Wilson, C., McCabe, R., Janet, S., Ko, K.: User's Guide to NIST Biometric Image Software (NBIS), NIST, 217 p. (2009)
12. Ratha, N.K., Pandit, V.P., Bolle, R.M., Vaish, V.: Robust Fingerprint Authentication Using Local Structural Similarity. In: Proc. Workshop on Applications of Computer Vision, pp. 29–34 (2000)
13. Ross, A., Jain, A.: Biometric Sensor Interoperability: A Case Study in Fingerprints. In: Maltoni, D., Jain, A.K. (eds.) BioAW 2004. LNCS, vol. 3087, pp. 134–145. Springer, Heidelberg (2004)
14. Tico, M., Kuosmanen, P.: Fingerprint Matching using an Orientation-based Minutia Descriptor. IEEE Trans. on Pattern Analysis and Machine Intelligence 25(8), 1009–1014 (2003)

Invariant Fourier Descriptors Representation of Medieval Byzantine Neume Notation

Dimo Dimov¹ and Lasko Laskov¹,²

¹ Institute of Information Technologies (IIT) at Bulgarian Academy of Science (BAS)
² Institute of Mathematics and Informatics (IMI) at BAS
Acad. G. Bonchev Str., Block 29-A, 1113 Sofia, Bulgaria
{dtdim,llaskov}@iinf.bas.bg

Abstract. During the last decade a lot of effort has been put into the study of Fourier descriptors (FDs) and their application to 2D shape representation and matching. FDs have often been preferred to other approaches (moments, wavelet descriptors) because of properties that allow invariance to translation, scale, rotation and change of the contour start-point. However, the literature lacks extensive theoretical proofs of these properties, which can result in inaccuracy in the implementation of the methods. In this paper we propose a detailed theoretical exposition of the FDs' invariance, with special attention paid to the corresponding proofs. A software demonstration has been developed with an application to the medieval Byzantine neume notation as part of our OCR system.

Keywords: Fourier descriptors, historical document image processing, OCR.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 192–199, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction

Byzantine neume notation is a form of musical notation used by the Orthodox Christian Church to denote music and musical forms in sacred documents from ancient times until the present day. The variety and number of historical documents containing neume notation is vast; they are not only a precious historical record, but also an important source of information and an object of intense scientific research [8]. Naturally, most of the research on neume notation in historical documents concerns the content of the documents themselves, including searching for fragments or patterns of neumes, comparing them, searching for similarities, etc. These and other technical activities are a good argument for creating a software tool to support research on medieval neume notation. Such a tool can be an OCR (Optical Character Recognition) based system.

The literature describes quite few attempts to create a software system for processing and recognition of documents containing neume notation [4], [1]. Both works were designed for contemporary neume notation in printed documents. Our goal is to develop methods and algorithms for processing and recognition of medieval manuscripts containing Byzantine neume notation, with no binding to a particular notation. The main stages of the processing are: (i) preliminary processing and segmentation; (ii) symbol agglomeration into classes based on unsupervised learning of the classifier; (iii) symbol recognition.

For unsupervised learning and recognition we need a suitable representation of the neumatic symbols, which defines a feature space in which the neume representatives can be compared. Since the neumes have relatively simple shapes and rarely contain cavities, in the proposed approach each neume is represented by its outer contour. The feature space is defined through the Fourier transform (FT) of the contour with a number of high frequencies removed, which results in a reduced-frequency contour representation. Such a representation of 2D shapes is often called Fourier descriptors (FDs) [5], [6].

During the last decade FDs have been investigated in detail and applied with success to different problems, such as OCR system design [2] and Content Based Image Retrieval (CBIR) [3], [5]. One of the main reasons FDs are preferred to other approaches for 2D shape representation, such as moments and wavelet descriptors, is the comparatively simple methods for translation, scale, rotation and starting-point normalization of FDs. This is why a lot of effort in the literature is put into investigating these properties. Nevertheless, the corresponding analytical proofs are rarely given, which can cause inaccuracy and even errors in the implementation of the corresponding methods for linear frequency normalization of contours.

The goal of this paper is to investigate in detail the properties of FDs that achieve their translation, scale, rotation and start-point invariance. Special attention is paid to the analytical proofs of these properties and to a method for constructing linearly normalized reduced FDs (LNRFDs) for 2D shape representation, in particular for the representation of Byzantine neume notation. The LNRFD representation of the neume notation can be used effectively for unsupervised learning.

2 FD Representation of Byzantine Neume Notation

For each segmented neume symbol, the contour-finding algorithm for bi-level images [7] is applied. The resulting contour is a closed, non-self-crossing curve. For our purposes we represent the contour $z$ as a sequence of Cartesian coordinates, ordered in the counterclockwise direction:

$$z \equiv \big(z(i) \mid i = 0, 1, \dots, N-1\big) \equiv \big((x(i), y(i)) \mid i = 0, 1, \dots, N-1\big). \tag{1}$$

Besides, the contour is a closed curve, i.e.:

$$z(i) = z(i + N), \quad i = 0, 1, \dots, N-1. \tag{2}$$

We also assume that $z$ is approximated with line segments between its neighboring points $z(i) = (x(i), y(i))$, which are equally spaced, i.e.:

$$|z(i+1) - z(i)| = |z(N-1) - z(0)| = \Delta, \quad i = 0, 1, \dots, N-1, \tag{3}$$

where $\Delta$ is a constant for which we can assume $\Delta = 1$.

Fig. 1. (a) Fragment of a neume contour, represented in the complex plane; (b) The contour represented as a sum of pairs of radius-vectors. The sum of the first pair gives the base ellipse of the neume symbol.

2.1 Fourier Transform of a Contour

For the sake of the FT and in correspondence with (1) we consider the contour $z$ as a complex function:

$$z(i) = x(i) + j\,y(i) = |z(i)|\,e^{j\varphi(i)} = |z(i)| \exp(j\varphi(i)), \quad i = 0, 1, \dots, N-1, \tag{4}$$

where $x$ and $y$ are its real and imaginary components in Cartesian representation, $|z|$ and $\varphi = \arg(z)$ are the respective modulus and phase in polar representation, and $j = \sqrt{-1}$ is the imaginary unit (see Fig. 1a). Thus, according to (2) and (3), the conditions for the DFT are fulfilled:

$$\hat z(k) = F(z)(k) = \frac{1}{N} \sum_{i=0}^{N-1} z(i) \exp(-j\Omega k i), \quad k = 0, 1, \dots, N-1, \quad \Omega = \frac{2\pi}{N}, \tag{5}$$

where $\hat z$ is the spectrum of $z$, the values $\hat z(k)$, $k = 0, 1, \dots, N-1$, are the respective harmonics, also called FDs, and the values $\Omega|k|$ have the sense of angular velocity. The inverse DFT relates the spectrum $\hat z$ to the contour $z$:

$$z(i) = F^{-1}(\hat z)(i) = \sum_{k=0}^{N-1} \hat z(k) \exp(j\Omega k i), \quad i = 0, 1, \dots, N-1, \tag{6}$$

which after equivalent transformations can be written in the form:

$$z(i) = \mathrm{rest}(i) + \sum_{k=-(N_2-1)}^{N_2-1} \hat z(k) \exp(j\Omega k i), \quad i = 0, 1, \dots, N-1, \quad N_2 = \lceil N/2 \rceil, \qquad \mathrm{rest}(i) = \begin{cases} \hat z(N_2) \exp(j\Omega N_2 i), & \text{if } N \text{ is even}, \\ 0, & \text{if } N \text{ is odd}, \end{cases} \tag{7}$$

where $\hat z(-k) \equiv \hat z(N-k)$ by the periodicity of the spectrum.
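Equations (5) and (6) can be checked numerically with a few lines of Python. The sketch below is only an illustration: the function names `contour_dft` and `contour_idft` are hypothetical, the direct O(N²) sums are used for readability rather than an FFT, and the toy contour is a made-up unit square.

```python
import cmath


def contour_dft(z):
    """Forward DFT of Eq. (5): z_hat(k) = (1/N) * sum_i z(i) * exp(-j*Omega*k*i)."""
    n = len(z)
    omega = 2 * cmath.pi / n
    return [sum(z[i] * cmath.exp(-1j * omega * k * i) for i in range(n)) / n
            for k in range(n)]


def contour_idft(z_hat):
    """Inverse DFT of Eq. (6): z(i) = sum_k z_hat(k) * exp(j*Omega*k*i)."""
    n = len(z_hat)
    omega = 2 * cmath.pi / n
    return [sum(z_hat[k] * cmath.exp(1j * omega * k * i) for k in range(n))
            for i in range(n)]


# Toy contour: unit square traced counterclockwise, starting at (2, 1).
square = [2 + 1j, 3 + 1j, 3 + 2j, 2 + 2j]
spectrum = contour_dft(square)
restored = contour_idft(spectrum)
```

Here `spectrum[0]` is the centre of gravity of the square, and `restored` reproduces the original points up to rounding, which is the round trip stated by (5) and (6).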

Considered in polar coordinates, (6) and (7) lead to a useful interpretation:

Interpretation 1. The contour $z$ is represented as a sum of pairs of radius-vectors $(\vec r_k, \vec r_{-k})$, $k = 1, 2, \dots, N_2 - 1$, $N_2 = \lceil N/2 \rceil$, rotating with the same angular velocity $\Omega k$ but in different directions: $\vec r_k$ in the positive and the symmetric $\vec r_{-k}$ in the negative direction, where $\vec r_k \Leftrightarrow \hat z(k)$ and $\vec r_{-k} \Leftrightarrow \hat z(-k) \Leftrightarrow \hat z(N-k)$. To this vector sum we add the static CoG (Center of Gravity) vector $\vec r_0 \equiv \hat z(0)$ as well as the residual vector $\vec r_{N_2} \equiv \hat z(N_2)$, which is different from zero only if $N$ is even (see Fig. 1b).

The terms harmonics $\hat z(k)$, descriptors $\hat z(k)$ and radius-vectors $\vec r_k$, $k = 0, 1, \dots, N-1$, are almost identical, but express different interpretations of the contour spectrum $\hat z$. According to Interpretation 1, each separate pair traces an ellipse with variable speed, in a direction that depends on which of the two radius-vectors has the larger modulus. The following practical rules can be derived:

Rule 1. The base harmonics $\hat z(1)$ and $\hat z(-1)$ cannot be zero at the same time, i.e. $|\vec r_1| + |\vec r_{-1}| \neq 0$. The opposite would mean that the contour is traced more than once, which is impossible with the contour tracing algorithm used.

Rule 2. If the direction of the contour trace is positive (counterclockwise), then $|\vec r_1| \ge |\vec r_{-1}|$; otherwise (clockwise) $|\vec r_1| \le |\vec r_{-1}|$. For concreteness we assume that the direction of the contour trace is positive, i.e. $|\hat z(1)| \ge |\hat z(-1)|$, which is our case.

An important property of FDs is that the harmonics corresponding to low frequencies contain the information about the more general features of the contour, while the high frequencies correspond to the details. In this sense we give:

Definition 1. We call reduced FD of length $L$ the following spectral representation of the contour $z$:

$$\tilde{\hat z}(k) = \begin{cases} \hat z(k), & 0 \le |k| \le L, \\ 0, & L < |k| < N_2, \end{cases} \qquad N_2 = \lceil N/2 \rceil, \tag{8}$$

for a boundary value $L$, $0 \le L \le \lceil N/2 \rceil$. $L$, and respectively the frequency $\Omega L$, can be evaluated using the least-squares criterion:

$$\varepsilon^2 = \frac{1}{N} \sum_{i=0}^{N-1} |z(i) - \tilde z(i)|^2 < \varepsilon_0^2, \tag{9}$$

where $\tilde z$ is the approximation of the contour $z$ that corresponds to the reduced frequency representation $\tilde{\hat z}$, and $\varepsilon_0^2$ is some permissible value of the criterion $\varepsilon^2$.

2.2 Linear Normalization of Contour in the Frequency Domain

To create a self-learning classifier for the neume symbols, a measure of similarity between the normalized individual representatives is needed. The required normalizations can be performed relatively easily in the frequency domain, using the FDs.


Translational normalization. Given (6), for the contour $z$ translated by the vector $\hat z(0)$ we have

$$z(i) - \hat z(0) = \sum_{k=1}^{N-1} \hat z(k) \exp(j\Omega k i), \quad i = 0, 1, \dots, N-1, \tag{10}$$

where $\hat z(0) = \frac{1}{N} \sum_{i=0}^{N-1} z(i)$ according to (5). The new contour $\nu(i) \equiv z(i) - \hat z(0)$ coincides with the original $z$, but the origin of the coordinate system is translated to its CoG, i.e. the static harmonic of $\nu$ is equal to zero, $F(\nu)(0) = 0$, while all others remain unchanged: $F(\nu)(k) = F(z)(k)$, $k = 1, 2, \dots, N-1$. Hence, the translational normalization can be achieved by $z(i) := z(i) - \hat z(0)$, $i = 0, 1, \dots, N-1$, where ":=" denotes assignment.

Scale normalization. Assume that we have the contour $v$, which is a version of $z$ scaled by an unknown coefficient $s$, i.e.:

$$v(i) = s\,z(i), \quad i = 0, 1, \dots, N-1, \quad s \neq 0. \tag{11}$$

Thus, the spectral representation of $v$ is scaled by the same coefficient. Indeed, for the forward DFT of (11), it follows from (5):

$$\hat v(k) = \frac{1}{N} \sum_{i=0}^{N-1} s\,z(i) \exp(-j\Omega k i) = \frac{s}{N} \sum_{i=0}^{N-1} z(i) \exp(-j\Omega k i) = s\,\hat z(k), \quad k = 0, 1, \dots, N-1. \tag{12}$$

Therefore, scale invariance of the contour can be achieved by dividing the moduli of its harmonics by some non-zero linear combination of them. For the algorithm of Pavlidis [7], which we use for tracing the neume contour, the first positive or the first negative harmonic is different from zero, depending on the direction of the contour trace. Thus, without loss of generality we may assume that the modulus of the first harmonic is non-zero, so scale invariance can be achieved by dividing all the harmonics by it. For the spectrum $\hat v_s$ of the scale-normalized contour $v_s$:

$$\hat v_s(k) = \frac{|\hat v(k)|}{|\hat v(1)|} = \frac{|s|\,|\hat z(k)|}{|s|\,|\hat z(1)|} = \frac{|\hat z(k)|}{|\hat z(1)|}, \quad k = 0, 1, \dots, N-1. \tag{13}$$

Hence, scale normalization can be done by $\hat z(k) := |\hat z(k)| / |\hat z(1)|$, $k = 0, 1, \dots, N-1$.
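The translational and scale normalizations can be illustrated with a small Python sketch. It is an assumption-laden example, not the authors' software: the helper names `dft` and `normalize_translation_scale` are hypothetical, the contour is a made-up square, and the sketch divides the complex harmonics by $|\hat z(1)|$ so that the phases needed by the later normalizations are kept; taking moduli of the result gives the quantities in (13).

```python
import cmath


def dft(z):
    """Forward DFT of Eq. (5), written as a direct sum for clarity."""
    n = len(z)
    return [sum(z[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n)) / n
            for k in range(n)]


def normalize_translation_scale(z_hat):
    """Zero the static harmonic (Eq. 10) and divide by |z_hat(1)| (cf. Eq. 13)."""
    scale = abs(z_hat[1])
    if scale == 0:
        raise ValueError("first harmonic is zero; use another non-zero harmonic")
    out = [h / scale for h in z_hat]   # phases are kept on purpose (assumption)
    out[0] = 0j                        # the CoG term is removed
    return out


square = [0j, 1 + 0j, 1 + 1j, 0 + 1j]
moved = [3 * p + (5 - 2j) for p in square]   # scaled by s = 3, shifted by (5, -2)

a = normalize_translation_scale(dft(square))
b = normalize_translation_scale(dft(moved))
```

After normalization the two spectra `a` and `b` agree up to rounding, even though `moved` is a scaled and shifted copy of `square`.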

The case of irregular scaling, $s_x \neq s_y$, is not of interest for our application. For completeness, we mention that in this case one can first calculate the 2D ellipse of inertia of the given neume, reshape the neume along the main axes of the ellipse so that the ellipse becomes a circle, and then continue with (13).

Rotational normalization. Suppose we have the contour $v$, which is a version of the contour $z$ rotated by an unknown angle $\alpha$. If the contours are already normalized with respect to translation, i.e. their common CoG coincides with the origin of the coordinate system, the rotation by $\alpha$ corresponds to multiplication of the complex representation of $z$ by $e^{j\alpha}$:

$$v(i) = e^{j\alpha} z(i), \quad i = 0, 1, \dots, N-1. \tag{14}$$

The spectrum of the contour is rotated by the same angle $\alpha$. Indeed, because of the linearity of the DFT, from (5) and similarly to (12), for (14) we have:

$$\hat v(k) = \frac{1}{N} \sum_{i=0}^{N-1} e^{j\alpha} z(i) \exp(-j\Omega k i) = e^{j\alpha} \hat z(k), \quad k = 0, 1, \dots, N-1. \tag{15}$$

So the rotation by an angle $\alpha$ in the object domain corresponds to a rotation by the same angle $\alpha$ of the phases of the contour spectrum. There are therefore two approaches to make the final contour representation rotationally invariant. The first is to ignore the phases of the spectrum, which gives a rotationally invariant representation but also a big loss of information. The second is to normalize the spectrum phases by the phase of one of the harmonics, for example the first one, $\hat v(1)$, for which we again assume $\hat v(1) \neq 0$. Thus, for the spectrum $\hat v_\alpha$ of the rotationally normalized contour $v_\alpha$, we have:

$$\hat v_\alpha(k) = \frac{\hat v(k)}{\exp(j \arg(\hat v(1)))} = \frac{e^{j\alpha} \hat z(k)}{e^{j\alpha} \exp(j \arg(\hat z(1)))} = \frac{\hat z(k)}{\exp(j \arg(\hat z(1)))}, \quad k = 0, 1, \dots, N-1. \tag{16}$$

Hence, rotational normalization is $\hat z(k) := \hat z(k) / \exp(j \arg(\hat z(1)))$, $k = 0, 1, \dots, N-1$.
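A short Python sketch of the second approach follows. The function name `normalize_rotation` is hypothetical and the spectrum values are made up for illustration; the sketch only checks that (16) removes a rotation applied as in (15).

```python
import cmath


def normalize_rotation(z_hat):
    """Eq. (16): divide every harmonic by exp(j * arg(z_hat(1)))."""
    if z_hat[1] == 0:
        raise ValueError("first harmonic is zero; its phase is undefined")
    unit = cmath.exp(1j * cmath.phase(z_hat[1]))
    return [h / unit for h in z_hat]


# Made-up spectrum of a translation-normalized contour.
z_hat = [0j, 2 + 1j, 0.3 - 0.4j, -0.1 + 0.2j]
alpha = 0.7
v_hat = [cmath.exp(1j * alpha) * h for h in z_hat]   # rotated contour, Eq. (15)

a = normalize_rotation(z_hat)
b = normalize_rotation(v_hat)
```

Both normalized spectra coincide up to rounding, and the first harmonic becomes real and positive, since its phase is the reference that is subtracted from all others.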

Starting point normalization. The algorithm of Pavlidis does not guarantee that the contour traces of two identical symbols start from the same point. The effect of a change of the contour start-point can be examined simply in the frequency domain. Suppose we have the contour $v$, which is a version of the contour $z$ with the start-point shifted by $\Delta$ positions:

$$v(i) = z(i + \Delta), \quad i = 0, 1, \dots, N-1. \tag{17}$$

Statement 1. Let two contours $z$ and $v$, given in the complex plane, correspond to each other as in (17). Then their correspondence in the frequency domain is given by:

$$\hat v(k) = e^{j\Omega k \Delta} \hat z(k), \quad k = 0, 1, \dots, N-1. \tag{18}$$

Proof: see the Appendix. ♦
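Statement 1 can also be checked numerically. The following Python sketch uses a made-up six-point contour and a hypothetical helper `dft`; it compares the spectrum of the shifted contour with the prediction of (18).

```python
import cmath


def dft(z):
    """Forward DFT of Eq. (5), written as a direct sum for clarity."""
    n = len(z)
    return [sum(z[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n)) / n
            for k in range(n)]


z = [0j, 1 + 0j, 2 + 0j, 2 + 1j, 1 + 1j, 0 + 1j]   # made-up closed contour
n = len(z)
shift = 2
v = [z[(i + shift) % n] for i in range(n)]           # Eq. (17), using periodicity (2)

omega = 2 * cmath.pi / n
predicted = [cmath.exp(1j * omega * k * shift) * h for k, h in enumerate(dft(z))]
actual = dft(v)
max_error = max(abs(p - a) for p, a in zip(predicted, actual))
```

The value of `max_error` stays at the level of floating-point rounding, in agreement with (18).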

According to this statement, the integer shift $\Delta$ of the start-point of the contour in the object domain corresponds to multiplication of its spectrum by the factor $\exp(j\Omega k \Delta)$, or equivalently to rotations of the phases as follows: the $k$-th phase is rotated by an angle $\delta(k) = \Omega \Delta k$, $k = 0, 1, \dots, N-1$. This normalization can be treated analogously to the rotational one, again in two approaches. The invariance in the first approach is trivial. To achieve invariance in the second approach, we propose the following procedure: normalize each harmonic of the spectrum $\hat v$ with the phase of the first non-zero harmonic $\hat v(m) \neq 0$, where $m > 1$:

$$\hat v_\Delta(k) = \frac{\hat v(k)}{\exp(j \arg(\hat v(m))\,k/m)} = \frac{e^{j\Omega k \Delta} \hat z(k)}{e^{j(\Omega m \Delta)k/m} \exp(j \arg(\hat z(m))\,k/m)} = \frac{\hat z(k)}{\exp(j \arg(\hat z(m))\,k/m)}, \tag{19}$$

for $k = 0, 1, \dots, N-1$. Then the modified spectrum $\hat v_\Delta$ corresponds uniquely to all contours that are isomorphic to the original $z$ but have an arbitrarily selected start-point. Hence, we normalize: $\hat z(k) := \hat z(k) / \exp(j \arg(\hat z(m))\,k/m)$, $k = 0, 1, \dots, N-1$.

Definition 2. We call linearly normalized reduced FD (LNRFD) of the original contour $z$ the reduced FD of $z$ after its processing by (10), (13), (16) and (19). Here (10) has to be applied first and, in the case $s_x \neq s_y$, (13) second (otherwise its position is arbitrary), while (16) and (19) have to be applied one after another at least $q$ times, $q \ge \big(\ln|\arg(\hat z(m)) - \arg(\hat z(1))| - \ln \varepsilon\big) / \ln m$, to obtain finally $|\arg(\hat z(1))| \le \varepsilon$ and $|\arg(\hat z(m))| \le \varepsilon$, where $\varepsilon$ is chosen arbitrarily close to zero.
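The alternating use of (16) and (19) required by Definition 2 can be sketched in Python as below. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, the spectrum values are made up, and $m = 2$ is simply assumed to be the first non-zero harmonic with $m > 1$. The sketch only shows that both reference phases shrink below $\varepsilon$; each round reduces the remaining phase of $\hat z(1)$ by a factor $m$, which is consistent with the logarithmic bound on $q$.

```python
import cmath


def rotate_phase(z_hat):
    """Eq. (16): make arg(z_hat(1)) zero."""
    unit = cmath.exp(1j * cmath.phase(z_hat[1]))
    return [h / unit for h in z_hat]


def shift_start_point(z_hat, m):
    """Eq. (19): make arg(z_hat(m)) zero by a phase shift proportional to k."""
    phi = cmath.phase(z_hat[m])
    return [h / cmath.exp(1j * phi * k / m) for k, h in enumerate(z_hat)]


def normalize_phases(z_hat, m, eps=1e-9, max_rounds=100):
    """Alternate (16) and (19) until both reference phases are below eps."""
    rounds = 0
    for _ in range(max_rounds):
        z_hat = shift_start_point(rotate_phase(z_hat), m)
        rounds += 1
        if abs(cmath.phase(z_hat[1])) <= eps and abs(cmath.phase(z_hat[m])) <= eps:
            break
    return z_hat, rounds


# Made-up spectrum of a translation-normalized contour.
z_hat = [0j, 1 + 1j, -0.5 + 0.2j, 0.1 - 0.3j, 0.05 + 0.02j]
result, rounds = normalize_phases(z_hat, m=2)
```

The moduli of the harmonics are untouched by this procedure, because every step only divides by complex numbers of unit modulus.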

3 Conclusion

In this paper we propose an approach for constructing LNRFDs for medieval Byzantine neume notation, which are invariant with respect to translation, scaling, rotation and change of the contour start-point. The theoretical grounds of the considered normalizations are described in detail. For the experiments, original software has been developed to extract the LNRFDs of each neume segmented in a document. These LNRFDs play the role of an index into a database of neume objects. The next stage of the proposed methodology for processing and recognition of medieval neume notation will be the organization of unsupervised learning on the basis of the LNRFDs described above. After the database is sorted by the LNRFD index, the problem will be reduced to a 1D clustering problem.

Acknowledgments. This work was partially supported by the following grants of IIT-BAS: Grant # DO-02-275/2008 of the National Science Fund at the Bulgarian Ministry of Education & Science, and Grant # 010088/2007 of BAS.

References

1. Dalitz, C., Michalakis, G.K., Pranzas, C.: Optical recognition of psaltic byzantine chant notation. IJDAR 11(3), 143–158 (2008)
2. Dimauro, G.: Digital Transforms in Handwriting Recognition. In: Impedovo, S. (ed.) FHWR. NATO ASI Series "F", vol. 124, pp. 113–146. Springer, Heidelberg (1994)
3. Dimov, D.: Fast Image Retrieval by the Tree of Contours' Content, Cybernetics and Information Technologies, BAS, Sofia, 4(2), pp. 15–29 (2004)
4. Gezerlis, V., Theodoridis, S.: Optical character recognition of the orthodox hellenic byzantine music notation. Pattern Recognition 35(4), 895–914 (2002)
5. Folkers, A., Samet, H.: Content-based image retrieval using Fourier descriptors on a logo database. In: 16th Int. Conf. on Pattern Recogn., vol. 3, pp. 521–524 (2002)
6. Zhang, D., Lu, G.: A comparative study on shape retrieval using Fourier descriptors with different shape signatures. In: Intelligent Multimedia, Computing and Communications Technologies and Applications of the Future, Fargo, ND, USA, June 2001, pp. 1–9 (2001)
7. Pavlidis, T.: Algorithms for Graphics and Image Processing. Springer, Heidelberg (1982)
8. DDL of Chant Manuscript Images, http://www.scribeserver.com/NEUMES

Appendix: Proof of Statement 1

If $\Delta = 0$, then the statement is obviously true. Let $\Delta \neq 0$. Then, for each harmonic $\hat v(k)$, $k = 0, 1, \dots, N-1$, of the spectrum of the contour $v$, according to (17):

$$\hat v(k) = \frac{1}{N} \sum_{i=0}^{N-1} v(i) \exp(-j\Omega k i) = \frac{\exp(j\Omega k \Delta)}{N} \sum_{i=0}^{N-1} z(i + \Delta) \exp(-j\Omega k (i + \Delta)).$$

Using the substitution $l = i + \Delta$, we get:

$$\hat v(k) = \frac{\exp(j\Omega k \Delta)}{N} \sum_{l=\Delta}^{N+\Delta-1} z(l) \exp(-j\Omega k l) = \frac{\exp(j\Omega k \Delta)}{N} \left( \sum_{l=\Delta}^{N-1} z(l) \exp(-j\Omega k l) + \sum_{l=N}^{N+\Delta-1} z(l) \exp(-j\Omega k l) \right).$$

Because of the periodicity (2) of the contours, $z(l) = z(l \pm N)$, we have:

$$\begin{aligned} \hat v(k) &= \frac{\exp(j\Omega k \Delta)}{N} \left( \sum_{l=\Delta}^{N-1} z(l) \exp(-j\Omega k l) + \sum_{l-N=0}^{\Delta-1} z(l-N) \exp(-j\Omega k (l - N + N)) \right) \\ &= \frac{\exp(j\Omega k \Delta)}{N} \left( \sum_{l=\Delta}^{N-1} z(l) \exp(-j\Omega k l) + \sum_{m=0}^{\Delta-1} z(m) \exp(-j\Omega k m) \exp(-j\Omega k N) \right). \end{aligned}$$

But, according to (5), $\Omega N = 2\pi$, and hence $\exp(-j\Omega k N) = 1$. Thus, finally:

$$\begin{aligned} \hat v(k) &= \frac{\exp(j\Omega k \Delta)}{N} \left( \sum_{l=\Delta}^{N-1} z(l) \exp(-j\Omega k l) + \sum_{m=0}^{\Delta-1} z(m) \exp(-j\Omega k m) \right) \\ &= \frac{\exp(j\Omega k \Delta)}{N} \sum_{l=0}^{N-1} z(l) \exp(-j\Omega k l) = \exp(j\Omega k \Delta)\,\hat z(k), \quad k = 0, 1, \dots, N-1, \end{aligned}$$

which we had to prove.

Bio-Inspired Reference Level Assigned DTW for Person Identification Using Handwritten Signatures

Muzaffar Bashir and Jürgen Kempf

Faculty of Electronics and Information Technology
University of Applied Sciences Regensburg, Germany
[email protected], [email protected]
www.bisp-regensburg.de

Abstract. Person identification or verification is becoming an important issue in our lives with the growth of information technology. Handwriting features of individuals during signing are used as behavioral biometrics. This paper presents a new method for recognizing a person from online signatures, based on a reference level assigned Dynamic Time Warping (DTW) algorithm. The online data are acquired by a digital pen equipped with pressure and inclination sensors. The time series obtained from the pen during handwriting provide valuable insight into the unique characteristics of the writers. The standard deviation values of the time series are found to be person specific and are used as so-called reference levels. In the proposed method, reference levels are added to the time series of the query and sample datasets before the dynamic time warping distance is calculated. Experimental results show that the accuracy of person authentication is improved and the computational time is reduced.

Keywords: online signature authentication, dynamic time warping, biometric person authentication, signature normalization.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 200–206, 2009. © Springer-Verlag Berlin Heidelberg 2009

1 Introduction and Related Work

Person identification or verification is an important issue in our lives with the growth of the digital age. Because verification by handwritten signature is widely accepted and has a long history of use in authorizing transactions and documents, online signature verification is a topic of current research. The generation of a handwritten signature is a sequential process predetermined by brain activity and reflects the neuro-motor characteristics of the signer [1]. Natural variations in a person's genuine handwritten signatures can be minimized by consistency and practice, so a recognizable signature pattern can be made available for biometric identification. Unlike static signature verification methods, in which the image of the signature is digitized and used for comparison, dynamic signature verification methods consider how the handwritten signature is generated. The signing process is recorded in terms of x-y coordinates or pressures, speeds, timing, inclinations, etc., and used for signature comparison. It is therefore difficult for a forger to recreate this signing information [4],[5],[6]. Person verification or identification by a handwritten PIN can also be considered in this regard [6].

Signature identification or verification can be categorized into two main types: parametric and functional approaches. In the parametric approach, only a set of parameters or features abstracted from the complete signals is used for signature matching. Because of the higher level of data abstraction these approaches are generally very fast, but it is difficult to select the right parameters. Functional approaches, on the other hand, use the complete signals as the feature set in terms of time series, which contain more signing information and hence provide more accurate results [5]. Dynamic Time Warping based classifiers have been applied successfully in this regard for a couple of decades. DTW is computationally expensive. Several speed-up techniques reduce the number of data point comparisons: bands such as the Sakoe-Chiba band or the Itakura parallelogram [7],[8], piecewise aggregate representation of time series for DTW (PDTW) [2], data down-sampling [6], segment-to-segment matching [4], extreme points warping [5], etc.

In this paper we present an essentially functional approach based on classic dynamic time warping, which also includes one parametric feature. In our study the online signature data are acquired by a digital pen in terms of five time series. The time series obtained from handwriting provide valuable insight into the unique characteristics of the writers. We observed that the reference level obtained from the standard deviation values of the time series is person specific; we call it the bio-inspired reference level. In the proposed method, reference levels are added to the time series of the reference and sample datasets before the dynamic time warping distance is calculated, so the amplitude values of the sequences are shifted to different base levels. Bashir and Kempf [3] introduced a simple method for converting multi-dimensional channel data to one time series by the direct sum of all channels with no loss of accuracy. We also take advantage of this dimension reduction by direct sum.

This paper deals with person identification by applying Dynamic Time Warping and its variant, Bio-inspired Reference Level Assigned Dynamic Time Warping, a technique which provides fast and accurate classification results. Section 2 describes the database and the classifier used in our experiments, and discusses the concept of the proposed method and the speed-up by dimension reduction of the time series. Section 3 presents experimental results. Finally, Section 4 summarizes the major findings and highlights future prospects of the application.

2 Database and DTW Based Classifier

The signature database consists of 420 signatures from 42 writers (10 signatures from each writer). Signatures are captured by a digital pen. The pen is equipped with a variety of sensors measuring two refill pressures, one finger grip pressure from holding the pen, and the acceleration and tilt angle of the pen during handwriting on a commonly used paper pad. A captured signature can be represented by the time series of five sensor channels: x(t) horizontal pressure, y(t) finger grip pressure, z(t) vertical pressure, α(t) longitudinal angle and β(t) vertical angle. For the evaluation task, the database is divided into query (test) and reference (prototype) samples. The Dynamic Time Warping (DTW) based classifier measures a distance for matching two signature time series; the smaller the distance, the more similar the signatures. Natural variations of a person's genuine handwritten signatures, which appear as non-linear distortions in the time domain, are minimized before the DTW distance is calculated. Generally, the Euclidean distance is determined for the optimally aligned time series. A review of DTW is omitted here; we refer to [2] and [7] for details.
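For readers who want a concrete picture of the classifier's distance, the following is a minimal sketch of classic DTW on two one-dimensional sequences. It is not the authors' MATLAB implementation: the function name `dtw_distance`, the squared local cost and the final square root are assumptions chosen to match the "Euclidean distance for optimally aligned time series" described above.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two 1-D sequences, O(len(a) * len(b))."""
    m, n = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            local = (a[i - 1] - b[j - 1]) ** 2  # squared local distance (assumed)
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return math.sqrt(cost[m][n])
```

The double loop fills an m-by-n table, which is where the O(mn) cost discussed in Section 3 comes from.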

M. Bashir and J. Kempf

2.1 Preprocessing of Time Series

After acquisition, the data is pre-processed to eliminate potential sensor noise. The essential pre-processing steps are segmentation, smoothing, normalization and down-sampling of the data without discarding valuable information. The signatures are captured separately, so no separate segmentation of the signals is required in our study. The data is smoothed by local regression to minimize sensor noise. To partly compensate for large variations in duration, the two signatures being compared are normalized in time to the length of the shorter signature signal. To reduce the complexity of the DTW based classifier, the five-dimensional time series of a signature is converted to one dimension by direct sum, after the amplitude of each of the five channels has been normalized to [-1, 1]. The data is then down-sampled to a lower sampling rate. We use the MATLAB functions smooth and decimate for smoothing and down-sampling. Data processing and the DTW algorithms were implemented in MATLAB and run on a Pentium 4 processor (2.4 GHz, 3 GB RAM).

2.2 Reference Level Assignment to Time Series

In DTW based classifiers, the signature time series are generally normalized in the time and amplitude domains, but we propose a special treatment for amplitude shifting. In our study we found that the Reference Level (RL) is unique for a writer. We add this person-specific reference level to the corresponding time series, so that the amplitude values are shifted to new base levels. The distribution of

Fig. 1. Person specific reference level values are shown against number of writers. This shows the distribution of values for 42 writers.

Bio-inspired Reference Level Assigned DTW

reference level values is shown in Fig. 1. Standard deviation (STD) values are calculated for each channel signal, and different combinations of these values were tested for best performance. The RL value that gave the best person identification accuracy is given by equation (1):

$$RL = \operatorname{mean}\{\operatorname{STD}(x(t)),\ \operatorname{STD}(\beta(t))\} \qquad (1)$$

where x(t) is the refill pressure and β(t) is the vertical angle of the pen during handwriting.

2.3 Architecture of Proposed System

Fig. 2 shows the proposed system. It consists of segmentation, noise removal, normalization of each individual channel amplitude to [-1, 1], dimension reduction by sum, and down-sampling to a lower sampling rate. Two schemes are used for classification. In the first scheme, the two sequences are normalized in the time domain, with time normalized to the shorter signature signal. In the second scheme, in addition to time normalization, the amplitude values of the time series are shifted to their bio-inspired reference levels.
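Equation (1) and the amplitude shift of the second scheme can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the sample standard deviation is used (the text does not say which estimator was applied), and the text does not specify whether the STD values are taken before or after channel normalization.

```python
import statistics


def reference_level(x, beta):
    """RL = mean{STD(x), STD(beta)} as in equation (1).

    Assumption: sample standard deviation (N - 1 in the denominator).
    """
    return statistics.mean([statistics.stdev(x), statistics.stdev(beta)])


def shift_to_reference_level(series, rl):
    """Add the reference level to every amplitude value of a time series."""
    return [value + rl for value in series]
```

A signature would then be compared with DTW after its one-dimensional series has been shifted, for example `shift_to_reference_level(series, reference_level(x, beta))`.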

Fig. 2. Proposed person identification system: data acquisition (multichannel time series), pre-processing (normalization, noise removal and dimension reduction), data down-sampling, and classification of signatures based on (1) standard DTW and (2) the proposed reference level assigned DTW technique, followed by the decision
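The pre-processing chain in Fig. 2 (per-channel normalization to [-1, 1], direct sum, down-sampling by a factor D) can be sketched as below. All function names are hypothetical. Two simplifications are assumed and should be read as such: the local-regression smoothing step is omitted, and down-sampling is done by block averaging, which is a stand-in for MATLAB's filter-based decimate and not equivalent to it.

```python
def normalise_channel(channel):
    """Linearly map one channel to the interval [-1, 1]."""
    lo, hi = min(channel), max(channel)
    if hi == lo:
        return [0.0] * len(channel)  # constant channel: no amplitude information
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in channel]


def direct_sum(channels):
    """Collapse several equally long channels into one series by summation."""
    return [sum(values) for values in zip(*channels)]


def downsample(series, d):
    """Reduce the sampling rate by factor d using block averages (assumed)."""
    blocks = [series[i:i + d] for i in range(0, len(series), d)]
    return [sum(block) / len(block) for block in blocks]


def preprocess(channels, d):
    """Normalise each channel, sum the channels, then down-sample by d."""
    return downsample(direct_sum([normalise_channel(c) for c in channels]), d)
```

Normalizing before the sum keeps any single sensor from dominating the combined series purely because of its measurement range.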

3 Experiments and Results

The signature database consists of 420 signature samples from 42 writers (10 signatures from each writer). For the evaluation task, the database is divided into query (test) and reference (prototype) samples. We are interested in the classification of sequences obtained from handwritten signatures, so each of the 420 samples is selected in turn as the query and matched against all remaining samples. The minimum DTW distance determines the match. In DTW based classifiers, the signature time series are generally normalized in the time and amplitude domains. To evaluate the performance of the proposed method we carried out two experiments.

(a) DTW1: The DTW technique is applied to two signature sequences with time normalized to the shorter signature signal and amplitude base levels shifted to the person-specific reference levels, as described in Section 2.

(b) DTW2: The DTW technique is applied to two signature sequences with time normalized to the shorter signature signal and with no shift in amplitude values.

The time complexity of both DTW techniques is O(mn), or O(m^2) with m = n, where m is the length of the signature sequence [2]. The signatures in our study contained about 3000 ± 660 data points on average. Computations are sped up by data down-sampling, as shown in Table 1. Down-sampling by a factor D reduces the cost relative to classic DTW from O(m^2) to O(m^2/D^2). The classification accuracy in terms of Error Rate is also shown in Table 1. At D = 6, the Error Rate of the proposed method DTW1 is about 3 times lower than that of DTW2, and DTW1 likewise has lower error rates for the other values of D in Table 1. Another perspective on our experimental results is the reduction in computation: for DTW2 the Error Rate at D = 30 is 0.2265 with a computational complexity of O(m^2/900), whereas the proposed method has about the same error rate at D = 40 with a computational complexity of O(m^2/1600) and is therefore faster. A similar effect is seen for other D values in Table 1.

Table 1. Average performance over 42 writers in terms of Error Rate: DTW1 is the proposed method with shifted amplitude values, DTW2 is without amplitude shifting

D      Error Rate DTW1   Error Rate DTW2
6      0.0058            0.0174
10     0.0116            0.0232
20     0.0290            0.0581
30     0.1161            0.2265
40     0.2371            0.5923
50     0.4123            1.2602
60     0.8188            1.9744
70     1.1789            2.8513
100    1.7422            4.1115
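The evaluation protocol and the cost argument above can be illustrated with a short sketch. It assumes a leave-one-out nearest-neighbour reading of "each sample is selected in turn as the query"; the function names are hypothetical and the exact definition of the Error Rate in Table 1 is not given in the text, so the numbers produced by this sketch are not meant to reproduce the table.

```python
def leave_one_out_error(samples, labels, distance):
    """Fraction of queries whose nearest other sample has a different label."""
    errors = 0
    for q, query in enumerate(samples):
        best_label, best_dist = None, float("inf")
        for r, reference in enumerate(samples):
            if r == q:
                continue  # the query itself is excluded from the references
            d = distance(query, reference)
            if d < best_dist:
                best_label, best_dist = labels[r], d
        if best_label != labels[q]:
            errors += 1
    return errors / len(samples)


def dtw_cells(m, d=1):
    """DTW table size for two length-m sequences after down-sampling by d."""
    return (m // d) ** 2
```

For m = 3000, `dtw_cells(3000) // dtw_cells(3000, 30)` gives 900 and `dtw_cells(3000) // dtw_cells(3000, 40)` gives 1600, matching the D^2 reduction quoted in the text.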

The receiver operating characteristic (ROC) curves for DTW1 and DTW2 are shown in Fig. 3. The area under the curve (AUC) is higher for DTW1 than for DTW2, which indicates better classification of signature sequences with the proposed method DTW1.

Fig. 3. ROC curves for DTW1 (line with cross as marker) and DTW2 (line with dot as marker) are shown. The figure is zoomed to lower scale in order to increase readability.

4 Conclusion

In this paper we introduced a new reference level assigned dynamic time warping technique for person identification based on handwritten signatures. Online data is acquired with a digital pen during handwriting on a commonly used paper pad. In DTW based classifiers, the signature sequences are generally normalized in the time and amplitude domains. We introduced a special approach to amplitude normalization. A useful handwriting feature, the reference level, was found to be unique for each writer during signing. In the proposed method, reference levels are added to the amplitudes of the signature sequences, so that the base levels of the amplitudes are shifted to different levels. With the proposed method we achieved lower errors in signature classification. The experimental results show a speed-up of computations over classical DTW, because the proposed method tolerates stronger down-sampling of the data at the same error rate. Further speed-up can be achieved by using state-of-the-art fast DTW algorithms. The focus of the present study is signature classification for person identification. The effectiveness of the proposed method remains to be evaluated in future work for person identification when data is sampled from the same writer in different sessions on different days, and for verification tasks in the presence of forgeries.

Acknowledgment

The support given by G. Scharfenberg, G. Schickhuber and the BiSP team from the University of Applied Sciences Regensburg is gratefully acknowledged.

References

1. Impedovo, D., Modugno, R., Pirlo, G., Stasolla, E.: Handwritten Signature Verification by Multiple Reference Set. In: Int. Conf. on Frontiers in Handwriting Recognition ICFHR (2008)
2. Keogh, E.J., Pazzani, M.J.: Scaling up Dynamic Time Warping for Data Mining Applications. In: Proc. 6th Int. Conf. on Knowledge Discovery and Data Mining, KDD (2000)
3. Bashir, M., Kempf, J.: Reduced Dynamic Time Warping for Handwriting Recognition Based on Multi-dimensional Time Series of a Novel Pen Device. In: IJIST, WASET, Paris, vol. 3.4 (2008)
4. Zhang, J., Kamata, S.: Online Signature Verification Using Segment-to-Segment Matching. In: Int. Conf. on Frontiers in Handwriting Recognition ICFHR (2008)
5. Hao, F., Wah, C.C.: Online signature verification using a new extreme points warping technique. In: Pattern recognition, vol. 24. Elsevier Science, NY (2003)
6. Bashir, M., Kempf, J.: Person Authentication with RDTW using Handwritten PIN and signature with a Novel Biometric Smart Pen Device. In: SSCI Computational Intelligence in Biometrics. IEEE, Nashville (2009)
7. Keogh, E., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. In: Knowledge and Information Systems, pp. 358–386. Springer, London (2004)
8. Henniger, O., Muller, S.: Effects of Time Normalization on the Accuracy of Dynamic Time Warping. In: BTAS, pp. 27–29. IEEE, Los Alamitos (2007)

Pressure Evaluation in On-Line and Off-Line Signatures

Desislava Dimitrova and Georgi Gluhchev

Institute of Information Technologies - BAS, 2, Acad. G. Bonchev Str., 1113 Sofia, Bulgaria
{ddimitrova,gluhchev}@iinf.bas.bg

Abstract. This paper presents a comparison of the pressure of signatures acquired from a digitizing tablet using an inking pen. The written signature is also digitized with a scanner, and a pseudo pressure is evaluated. The experiments have shown that the obtained histograms are very similar and that both modalities could be used for pressure evaluation. The data from the scanned image also proved to be more stable, which justifies its use for signature authentication.

Keywords: graphical tablet, scanner, signature, pressure evaluation, pressure distribution, sensor interoperability.

1 Introduction

Signatures are a recognized and accepted modality for authentication. That is why the development of reliable methods for signature authentication is a subject of high interest in biometrics. There are two methods for signature acquisition: off-line and on-line [6]. The off-line method uses an optical scanner or CCD camera to capture written signatures, and specially designed software to measure geometric parameters such as shape, size, angles, distances and the like. However, important dynamic parameters and pressure cannot be measured directly. Nevertheless, one can estimate dynamic information from a scanned image using pseudo-dynamic features [2,3,5] based on pixel intensities in a grayscale image. It is intuitively acceptable to interpret dark zones in a grayscale image as zones of high pressure. The on-line method, where the signature is captured during signing, seems more appropriate because of the straightforward measurement of pressure, speed and other writer-specific parameters such as pen tilt, azimuth, velocity and acceleration. Devices used in the on-line method are touch screens, Tablet PCs, PDAs and graphical tablets. Signature verification systems are often designed to work with a particular input sensor, and changing the sensor decreases the system's performance. Since various input sensors exist, the problem of sensor interoperability arises. Sensor interoperability can be defined as the capability of a biometric system to adapt to data obtained from different sensors. In [1] sensor interoperability is evaluated on a signature verification system using two different Tablet PC brands with similar hardware and 256 levels of pressure. In this paper we investigate the sensor interoperability problem by comparing pressure data taken from a set of signatures. We use an on-line (graphical tablet) and an off-line (image scanner) input sensor with different numbers of pressure levels (1024 and 256 respectively). The paper is structured as follows. Section 2 discusses the proposed approach. Section 3 presents some experimental results. Finally, Section 4 includes some concluding remarks.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 207–211, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 The Approach

In off-line signature investigations, experts try to use differences in pressure as a reliable identification feature because of their immunity to forgery. As a rule they distinguish between three levels of pressure: high, medium and low. However, it is quite difficult to define quantitative thresholds for the automatic detection of zones of a specific pressure. One possibility is to automatically construct three clusters of pressure and classify the strokes according to them. Another way, which seems more appropriate when different devices have to be compared and which is used in this paper, is to represent pressure by a relatively small number of levels, say 16, which could be shown in pseudo colors if required. It is especially interesting to see what the situation is for different modalities, such as on-line and off-line signatures. To investigate the problem, we used a graphical tablet with an inking pen. Thus, the same signature written on a sheet of paper placed on the tablet can be captured directly by the tablet software and scanned afterwards. In both cases triplets (x, y, pressure) are registered and saved into a database. Histograms of 16 bins were used for the comparison between the two data sets.

2.1 Data Acquisition

The digitized signature consists of a set of sample points captured at a device-dependent frequency. For the tablet, the number of points is directly proportional to the signing time. In this investigation we used a WACOM Intuos3 A5 PTZ-630 digitizing tablet with a resolution of 5080 lines per inch, 1024 levels of pressure, an acquisition area of 152.4 x 210.6 mm and a sampling rate of 200 points per second, coupled with an inking pen that allows writing on a white sheet of paper placed on the tablet pad. Signature data is obtained using the Tablet SDK 1.7 in a C# program. After that, the sheet is scanned at a resolution of 200 dpi and 256 grey levels.
For this, an HP ScanJet 3400C scanner and a program written in MATLAB were used.

2.2 Data Preprocessing

Because of the high sampling rate of the digitizing tablet (200 points per second), we have to re-sample the data to get rid of redundant points. In this way we lose the information about writing speed that is implicitly contained in the data, but this is not of vital importance because only the x and y coordinates and the pressure are used in this study. The repeated points come with different levels of pressure, so we set their pressure to the average. The graphical tablet reports pressure values as integers in the interval [0, 1023]. The higher the pressure value, the darker the corresponding pixel appears. Grayscale pixel intensities of the scanned signature image fall in the range [0, 255]. Here the lowest gray level is associated with the pixel of highest pressure, which is the opposite of the tablet. So we have to bring the two ranges to the same scale and invert one of them. For this, the tablet's range was compressed to 0-255 and the values were inverted. But this is not sufficient. The different line widths in the two signatures also have to be taken into account. While the tablet delivers lines one pixel wide, the lines in the scanned image may be a few pixels wide, with each pixel having a different intensity. This is because the ink spreads around the central line and the border pixels are brighter than the central ones. To overcome this, only the gray levels along the skeleton are used in the evaluation (Fig. 1). The second pitfall comes from the repetition of the same pixels captured by the tablet. To avoid this, only one of the repeated pixels was kept, with the average pressure value of all its duplicates.
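The two tablet-side adjustments described above, range compression with inversion and averaging of repeated points, can be sketched as follows. The function names are hypothetical and the rounding scheme is an assumption; the paper only states that the tablet range was compressed to 0-255 and inverted.

```python
from collections import defaultdict


def tablet_to_gray(pressure):
    """Map tablet pressure in [0, 1023] to an inverted 8-bit value in [0, 255].

    High pressure maps to a low (dark) gray level, as in the scanned image.
    """
    return 255 - round(pressure * 255 / 1023)


def merge_repeated_points(samples):
    """Keep one sample per (x, y) pixel, carrying the average pressure."""
    groups = defaultdict(list)
    for x, y, pressure in samples:
        groups[(x, y)].append(pressure)
    return [(x, y, sum(ps) / len(ps)) for (x, y), ps in groups.items()]
```

Points are returned in the order in which each pixel was first seen, so the stroke order of the signature is preserved.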

Fig. 1. From left to right: original signature and its skeleton

Fig. 2. From left to right: histogram of the scanned signature and histogram of the signature captured by the tablet

2.3 Pressure Presentation

Histograms are often used for a general presentation of data. They are appropriate when data has to be presented according to its magnitude, which is the case for pressure. Thus, two 16-bin histograms were built for the signatures obtained by the scanner and by the tablet, dividing the dynamic range into 16 intervals of equal length (Fig. 2).
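A 16-bin histogram over the common 0-255 range could be built as in the sketch below. The function name is hypothetical, and expressing bin heights as percentages of all points is an assumption made so that signatures with different numbers of points stay comparable; the paper does not state how the histograms were scaled.

```python
def pressure_histogram(values, bins=16, lo=0, hi=255):
    """Histogram of integer-range values over equal-width bins, in percent."""
    counts = [0] * bins
    width = (hi - lo + 1) / bins  # 16 gray levels per bin for the defaults
    for v in values:
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    total = len(values)
    if total == 0:
        return [0.0] * bins
    return [100.0 * c / total for c in counts]
```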

3 Experimental Results

Signatures were collected from a group of three individuals. All signatures of a particular signer were captured in the same session, so no changes due to time delay are involved. The histograms can be compared either globally, by evaluating the Euclidean distance between them, or locally, bin by bin, by looking for the largest difference. The values of both distances between the histograms of the same signature are shown in Table 1. Table 2 and Table 3 show the results of the histogram comparison (global distance and max bin distance) within the set of signatures of a given signer, for the scanned signature images and for the signatures captured by the tablet, respectively. All pairs of signatures used in the experiments have similar histogram shapes (Fig. 2) with some variation in bin values. An interesting and unexpected observation is that the distances between signatures of the same writer captured by the tablet are much higher than the distances between the corresponding signatures captured by the scanner. This indicates that pressure evaluation from the scanned image is more stable. Even the distances between the scanned-image and tablet histograms are smaller than the distances between tablet histograms.

Table 1. Global distance and max bin distance between the corresponding histograms

Signature #   Global distance   Max bin distance
1             16.36             9.06
2             8.90              6.91
3             8.24              5.13
4             28.39             17.01
5             23.55             10.30
6             29.50             12.70
7             18.49             11.13
8             21.57             13.62
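The two histogram comparisons reported in the tables, the global Euclidean distance and the largest bin-by-bin difference, can be written compactly. The function names below are hypothetical; the definitions follow the description in the text.

```python
import math


def global_distance(h1, h2):
    """Euclidean distance between two histograms with the same bins."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def max_bin_distance(h1, h2):
    """Largest absolute difference between corresponding bins."""
    return max(abs(a - b) for a, b in zip(h1, h2))
```

By construction the max bin distance can never exceed the global distance, which is consistent with every row of the tables.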

Table 2. Histogram comparisons carried out for each individual for scanned signature images. Each cell (i,j) contains global distance / max bin distance between i-th and j-th signatures of the signer.

Pair (i, j)   Global distance / max bin distance
(1, 2)        4.47 / 2.46
(1, 3)        9.72 / 4.83
(2, 3)        12.56 / 7.29
(4, 5)        8.06 / 4.78
(4, 6)        7.27 / 4.93
(5, 6)        3.33 / 1.90
(7, 8)        5.57 / 3.89

Table 3. Histogram comparisons carried out for each individual for the signatures captured by the tablet. Each cell (i,j) contains global distance / max bin distance between i-th and j-th signatures of the signer.

Pair (i, j)   Global distance / max bin distance
(1, 2)        20.32 / 13.35
(1, 3)        12.82 / 8.93
(2, 3)        15.82 / 10.13
(4, 5)        21.8 / 14.40
(4, 6)        17.8 / 13.01
(5, 6)        20.8 / 12.29
(7, 8)        29.85 / 15.31

4 Conclusions

In this paper we compared the pressure obtained from the same signature captured simultaneously by a graphical tablet and a scanner. The observed similarity justifies the use of pressure from scanned images, which is usual practice in forensic investigations and document authentication. However, while similar in shape, the obtained histograms are not interchangeable: it does not seem possible to perform identification or verification based on the pressure evaluation of scanned signatures if the comparison is carried out against pressure values obtained by a tablet. Further investigation is required to estimate the pressure distribution of signatures of the same individuals in both modalities.

Acknowledgments. This investigation was supported by the Ministry of Education and Sciences in Bulgaria, contract No BY-TH-202/2006.

References

1. Alonso-Fernandez, F., Fierrez-Aguilar, J., Ortega-Garcia, J.: Sensor interoperability and fusion in signature verification: A case study using Tablet PC. In: Li, S.Z., Sun, Z., Tan, T., Pankanti, S., Chollet, G., Zhang, D. (eds.) IWBRS 2005. LNCS, vol. 3781, pp. 180–187. Springer, Heidelberg (2005)
2. Ammar, M., Yoshida, Y., Fukumura, T.: A new Effective Approach for Automatic Off-line Verification of Signatures by using Pressure Features. In: 8th International Conference on Pattern Recognition, pp. 566–569. IEEE Press, Paris (1986)
3. Fierrez-Aguilar, J., Alonso-Hermira, N., Moreno-Marquez, G., Ortega-Garcia, J.: An off-line signature verification system based on fusion of local and global information. In: Maltoni, D., Jain, A.K. (eds.) BioAW 2004. LNCS, vol. 3087, pp. 295–306. Springer, Heidelberg (2004)
4. Hennebert, J., Loeffel, R., Humm, A., Ingold, R.: A new forgery scenario based on regaining dynamics of signature, pp. 366–375. IEEE Press, Seoul Korea (2007)
5. Nestorov, D., Shapiro, V., Veleva, P., Gluhchev, G., Angelov, A., Stoyanov, I.: Towards objectivity of handwriting pressure analysis for static images. In: 6th Int. Conf. on Handwriting and Drawing ICOHD 1993, Paris, July 5-7, pp. 216–218 (1993)
6. Plamondon, R., Lorette, G.: Automatic signature verification and writer identification – the state of the art. Pattern Recognition 22(2), 107–131 (1989)

Confidence Partition and Hybrid Fusion in Multimodal Biometric Verification System

Chaw Chia, Nasser Sherkat, and Lars Nolle

School of Computing and Technology, Nottingham Trent University, Nottingham, UK
{chaw.chia,nasser.sherkat,lars.nolle}@ntu.ac.uk

Abstract. Sum rule fusion is a very promising multimodal biometrics fusion approach. However, we propose not to apply it uniformly across the whole multimodal biometric score space. Examining the score distributions of each biometric matcher shows that there exist confidence regions, which enable the introduction of Confidence Partitions in the multimodal biometric score space. We propose that the Sum rule be replaced by the Min or the Max rule in the Confidence Partitions to further increase the overall verification performance. The proposed idea of applying the fusion rules in a hybrid manner has been tested on two publicly available databases, and the experimental results show a 0.3% ~ 2.3% improvement in genuine accept rate at a relatively low false accept rate.

1 Introduction

Multimodal biometrics has attracted great interest in the biometric research field in recent years. Given its potential to outperform single-biometric verification, many researchers have put effort into exploring different integration techniques. Integration at the score level is the most preferred approach because of its effectiveness and ease of implementation [1]. The Sum rule, one of the best-known score level fusion rules, simply uses the sum of the individual biometric scores as the fusion result. Surprisingly, it appears to outperform many complicated fusion algorithms [2] and is widely employed in biometric research [3, 4, 5, 6, 7, 8]. Through sensitivity analysis, Kittler concluded that the superior performance of the Sum rule is due to its resilience to estimation errors [9]. In this paper, the assignment of Confidence Partitions (CP) in the multimodal biometric score space is introduced. Instead of applying the Sum rule over the complete multimodal biometric score space, we suggest replacing the Sum rule in the different CPs with more appropriate rules (the Min and Max rules in this paper). This scheme enables the fusion of multimodal biometrics in a hybrid manner that includes the Sum rule. Figure 1 illustrates a typical biometric matcher score distribution, which consists of a genuine user and an impostor score distribution. There is a significant overlap region between the curves, which causes the main difficulty in classifying the claimant as a genuine user or an impostor. The shaded regions outside the overlap are confidence regions. They represent the regions where only a single class of users can be found. Although the Sum rule performs well in producing reliable fusion scores, when the biometric scores are located in a confidence region we suggest applying a more appropriate rule instead, for example the Min or Max rule [9] or the decision fusion rule [10]. The rest of the paper is organised as follows: Section 2 provides details of the proposed integration method. Section 3 presents the databases used, the experiments, the results and their analysis. Finally, Section 4 concludes the paper.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 212–219, 2009. © Springer-Verlag Berlin Heidelberg 2009

Fig. 1. Biometric matcher score distribution

2 Confidence Partition and Hybrid Fusion

Even though the proposed idea is feasible in higher dimensional score spaces, it has only been used for bimodal biometric fusion in this paper. First, the score distributions of the two matchers are constructed (in future research the distributions will be modeled by a density estimation algorithm). The regions within each distribution where only one type of user (either genuine user or impostor) is present are marked. Within the genuine user score distribution, the marked region is termed the genuine user confidence region, whereas the marked region within the impostor score distribution is termed the impostor confidence region. A two-dimensional score space is then created. The Genuine User Confidence Partition (GCP) in the score space is assembled from the genuine user confidence regions of both modalities, and the Impostor Confidence Partition (ICP) is formed from the impostor confidence regions of both modalities. Before a fusion rule is applied, the scores from the different biometric matchers need to be normalised into a common domain so that they can be combined effectively [11]. The simplest normalisation technique is Min-max normalisation [11], shown in (1), which maps the biometric scores into the interval between 0 and 1. The minimum value (min) and the maximum value (max) of the score distribution can be estimated from a set of matching scores. The notation is as follows: $S_i$ is the biometric score of user i, $S'_i$ is the normalised score of user i, $S_{fi}$ is the fused score of that user, and $K$ is the total number of matchers.

$$S'_i = \frac{S_i - \min}{\max - \min} \qquad (1)$$
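Equation (1) is straightforward to apply once min and max have been estimated from a set of matching scores. The sketch below is illustrative only; the function names are hypothetical, and estimating the bounds from the same list that is later normalised is an assumption made for brevity.

```python
def fit_minmax(scores):
    """Estimate the min and max of a matcher from a set of matching scores."""
    return min(scores), max(scores)


def minmax_normalise(score, lo, hi):
    """Min-max normalisation of equation (1): maps [lo, hi] onto [0, 1]."""
    return (score - lo) / (hi - lo)
```

Scores that fall outside the estimated range, for example in a separate test set, would map outside [0, 1]; the paper does not say how such cases were handled.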

By introducing the CP, multiple rules can be applied in the multimodal biometric system in a hybrid manner. In this work, rules (2) ~ (4) have been applied. The hybrid fusion scheme is implemented according to the scenario shown in (5).

1. Sum Rule:

$$S_{fi} = \sum_{k=1}^{K} S'_{i,k}, \quad \forall i \qquad (2)$$

2. Min Rule:

$$S_{fi} = \min(S'_{i,1}, S'_{i,2}, \ldots, S'_{i,K}), \quad \forall i \qquad (3)$$

3. Max Rule:

$$S_{fi} = \max(S'_{i,1}, S'_{i,2}, \ldots, S'_{i,K}), \quad \forall i \qquad (4)$$

4. Hybrid Rule:

$$S_{fi} = \begin{cases} \text{Min Rule}, & \text{when } \langle S'_{i,1}, S'_{i,2}, \ldots, S'_{i,K} \rangle \text{ falls in the ICP} \\ \text{Max Rule}, & \text{when } \langle S'_{i,1}, S'_{i,2}, \ldots, S'_{i,K} \rangle \text{ falls in the GCP} \\ \text{Sum Rule}, & \text{elsewhere} \end{cases} \qquad (5)$$

As shown in equation (5), for the partitions where the biometric matchers give high confidence we apply the Min or Max rule, which is considered more appropriate than the Sum rule. The non-confidence partition, which is the complement of the CPs, is the part of the score space that can easily be misclassified. Because of the superior performance of the Sum rule in dealing with estimation error, as mentioned in Section 1, we employ this rule in the non-confidence partition.
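Rules (2) to (5) can be sketched in a few lines. The sketch makes two assumptions that the text does not spell out: a score vector is taken to lie in the ICP (or GCP) only when every modality satisfies its threshold, and the partitions are axis-aligned boxes described by one upper (or lower) bound per modality. The function names are hypothetical; the example thresholds in the usage note are the NIST-BSSR1 best-matcher values from Table 1.

```python
def sum_rule(scores):
    return sum(scores)


def min_rule(scores):
    return min(scores)


def max_rule(scores):
    return max(scores)


def hybrid_rule(scores, icp_upper, gcp_lower):
    """Equation (5): Min in the ICP, Max in the GCP, Sum elsewhere.

    icp_upper[k] / gcp_lower[k] are per-modality thresholds (assumed boxes).
    """
    if all(s < u for s, u in zip(scores, icp_upper)):
        return min_rule(scores)  # confidently impostor
    if all(s > l for s, l in zip(scores, gcp_lower)):
        return max_rule(scores)  # confidently genuine
    return sum_rule(scores)      # non-confidence partition
```

With scores ordered as (face, fingerprint), `hybrid_rule((0.10, 0.05), (0.55, 0.15), (0.34, 0.20))` falls in the ICP and returns the minimum, 0.05. Note that the Sum rule as written in (2) ranges over [0, K] while Min and Max range over [0, 1]; the text does not say how the scales are reconciled, so a practical implementation may need to average rather than sum.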

3 Experimental Results

The proposed method has been tested on two publicly available databases, the NIST-BSSR1 multimodal database [12] and the XM2VTS benchmark database [13]. The NIST-BSSR1 multimodal database contains 517 genuine user scores and 266,772 impostor scores, whereas the XM2VTS database (evaluation set) includes 400 genuine user scores and 111,800 impostor scores. Both databases are truly multimodal (the chimeric assumption is not used [14]). The performance graphs of each matcher in the databases are depicted in Figure 2.

Fig. 2. Performance of baseline matchers: (a) NIST-BSSR1 multimodal matchers and (b) XM2VTS matchers

Only the best and the worst biometric matchers from each modality are chosen for the experiments. In the NIST-BSSR1 multimodal database, the right index fingerprint is paired with facial matcher C, and the left index fingerprint is paired with facial matcher G, to form the best and worst multimodal biometric fusion respectively. For the XM2VTS database, the best facial matcher DCTb-GMM is paired with the best speech matcher LFCC-GMM, whereas the worst facial matcher DCTb-MLP is paired with the worst speech matcher PAC-GMM.

Table 1. Assignment of Confidence Partitions in the experiments

Matchers                     Impostor Confidence Partition    Genuine User Confidence Partition   Non-Confidence Partition
NIST-BSSR1 Best Matchers     Sface < 0.55, Sfinger < 0.15     Sface > 0.34, Sfinger > 0.20        Other than Confidence Partitions
NIST-BSSR1 Worst Matchers    Sface < 0.35, Sfinger < 0.09     Sface > 0.20, Sfinger > 0.20        Other than Confidence Partitions
XM2VTS Best Matchers         Sspeech < 0.48, Sface < 0.44     Sspeech > 0.41, Sface > 0.60        Other than Confidence Partitions
XM2VTS Worst Matchers        Sspeech < 0.43, Sface < 1.00     Sspeech > 0.67, Sface > 0.79        Other than Confidence Partitions

The GCP and ICP are assigned manually according to the values in Table 1. All fusion results for the best and worst multimodal matcher combinations are shown graphically in Figure 3 and Figure 4, and numerically in Table 2 and Table 3. Note that the genuine accept rate (GAR) listed in the tables is reported at a false accept rate (FAR) of 0.001%. From the graphical and numerical results in Figures 3 and 4 and Tables 2 and 3, we conclude that the proposed method outperforms Sum rule fusion, especially at lower FAR, even though there is no significant improvement in the equal error rate (EER), the rate at which FAR equals the false reject rate (FRR). The best matchers hybrid fusion for the NIST-BSSR1 dataset achieved 93% GAR, which is 0.7% more than the Sum rule, whereas in XM2VTS the best matchers hybrid fusion achieved 96.3% GAR, which is 0.3% better than the Sum rule. The GAR

Fig. 3. Performance of the NIST-BSSR1 bimodal biometrics fusion on (a) the best multimodal matchers and (b) the worst multimodal matchers

Table 2. Accept rates and error rates of the NIST-BSSR1 Multimodal database for single biometrics and the combined multimodal biometrics

                  Fingerprint      Face             Sum              Hybrid
                  EER     GAR      EER     GAR      EER     GAR      EER     GAR
Best Matchers     8.6%    70.0%    5.8%    61.1%    1.6%    92.3%    1.3%    93.0%
Worst Matchers    4.5%    82.7%    4.3%    56.9%    0.5%    91.9%    0.5%    94.0%

CP and Hybrid Fusion in Multimodal Biometric Verification System

improvement becomes more obvious for the worst matchers hybrid fusion in both databases. The hybrid fusion gains an additional 2.1% and 2.3% GAR compared to the Sum rule in the NIST-BSSR1 and XM2VTS databases, respectively. The corresponding Sum rule GARs are 91.9% and 62.0%. As can be observed from the scatter plots, the best matchers achieve very good separation between the genuine user and impostor score distributions. Therefore the Sum rule is able to produce a very reliable fusion score, and no significant hybrid fusion improvement can be obtained over it. However, the Sum rule performs worse when fusing multimodal biometrics with lower authentication rates. In this case, the use of a hybrid fusion rule leads to an improvement over the Sum rule fusion. Like the work shown in [4], our work confirms that a higher accuracy biometric system leaves less room for improvement.

Fig. 4. Performance of the XM2VTS bimodal biometrics fusion on (a) the best multimodal matchers and (b) the worst multimodal matchers

In a bimodal biometric system, the Sum fusion score can be considered as the average of the Min fusion score and the Max fusion score. Further, within a confidence partition the difference between the minimum score and the maximum score will not be significant. As a result, the GAR improvements achieved in the experiments lie in the range from 0.3% to 2.3%. It is assumed that the improvement can be increased further when the Min and Max rules are replaced by a fusion rule with a higher degree of confidence, for example the decision fusion rule. The improvement also relies on a more accurate assignment of the CP and depends on the number of claimants whose multimodal biometric scores fall in the confidence partitions. The more scores fall in the CP, the larger the improvement that the hybrid fusion can obtain.
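The hybrid rule discussed above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes that the Min rule is applied inside the impostor confidence partition, the Max rule inside the genuine user confidence partition and the Sum rule (taken here as the mean of the two scores) everywhere else, and that a score pair lies in a partition only when both threshold conditions of table 1 hold. The function name `hybrid_fuse` and the example scores are hypothetical.

```python
def hybrid_fuse(s_a, s_b, icp, gcp):
    """Fuse two normalised scores with a Min/Sum/Max hybrid rule (sketch).

    icp = (t_a, t_b): impostor confidence partition, s_a < t_a and s_b < t_b
    gcp = (t_a, t_b): genuine confidence partition,  s_a > t_a and s_b > t_b
    Assumption: both conditions must hold for a pair to lie in a partition.
    """
    if s_a < icp[0] and s_b < icp[1]:
        return min(s_a, s_b)          # assumed: Min rule in the ICP
    if s_a > gcp[0] and s_b > gcp[1]:
        return max(s_a, s_b)          # assumed: Max rule in the GCP
    return (s_a + s_b) / 2.0          # Sum rule, written as the mean score


# Thresholds of the "NIST-BSSR1 Best Matchers" row of table 1 (face, finger).
ICP = (0.55, 0.15)
GCP = (0.34, 0.20)

# Hypothetical (face, finger) score pairs, one per region of the score space.
for s_face, s_finger in [(0.20, 0.10), (0.90, 0.80), (0.50, 0.18)]:
    print(s_face, s_finger, hybrid_fuse(s_face, s_finger, ICP, GCP))
```

For two scores the mean always equals (min + max) / 2, which is why the gain over the Sum rule stays small whenever the two scores inside a partition are close to each other.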


Table 3. Accept rates and error rates of XM2VTS single biometrics and their combined multimodal biometrics

| Matchers | Face EER | Face GAR | Speech EER | Speech GAR | Sum EER | Sum GAR | Hybrid EER | Hybrid GAR |
|---|---|---|---|---|---|---|---|---|
| Best Matchers | 1.8% | 81.3% | 1.1% | 58.3% | 0.5% | 96.0% | 0.5% | 96.3% |
| Worst Matchers | 6.4% | 0.0% | 6.4% | 19.0% | 2.5% | 62.0% | 2.5% | 64.3% |

4 Conclusions

After introducing the confidence partition, we have proposed to use more appropriate fusion rules (the Min and Max rules in this paper) inside the confidence partitions instead of the Sum rule. This approach enables rule-based fusion to be applied in a hybrid manner that includes the Sum, Min and Max rules. In the preliminary experiments, we showed that the manually configured hybrid rule performed better than the Sum rule. Future work will focus on the automatic assignment of confidence partitions across the biometric score space. The integration of a decision rule into the developed hybrid fusion framework will also be investigated.

References

1. Ross, A., Jain, A.K.: Multimodal Biometrics: An Overview. In: 12th European Signal Processing Conference (EUSIPCO), September 2004, pp. 1221–1224 (2004)
2. Ross, A., Jain, A.K.: Information Fusion in Biometrics. Pattern Recognition Letters 24, 2115–2125 (2003)
3. Jain, A.K., Ross, A.: Learning User Specific Parameters in A MultiBiometric System. In: IEEE ICIP, pp. 57–60 (2002)
4. Indovina, M., Uludag, U., Snelick, R., Mink, A., Jain, A.K.: Multimodal Biometric Authentication Methods: A COTS Approach. In: Proc. MMVA, Workshop Multimodal User Authentication, December 2003, pp. 99–106 (2003)
5. Ailisto, H., Vildjiounaite, E., Lindholm, K., Makela, S., Peltola, J.: Soft Biometrics - Combining Body Weight and Fat Measurements with Fingerprint Biometrics. Pattern Recognition Letters 27, 325–334 (2006)
6. Lu, X., Wang, Y., Jain, A.K.: Combining Classifiers for Face Recognition. In: ICME 2003, vol. 3, pp. 13–16 (2003)
7. Bouchaffra, D., Amira, A.: Structural Hidden Markov models for biometrics: Fusion of face and fingerprint. Pattern Recognition 41(3), 852–867 (2008)
8. Nanni, L., Lumini, A.: A Hybrid Wavelet-based Fingerprint Matcher. Pattern Recognition 40(11), 3146–3151 (2007)
9. Kittler, J.: On Combining Classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(3), 226–239 (1998)
10. Lam, L., Suen, C.Y.: Application of Majority Voting to Pattern Recognition: An Analysis of Its Behaviour and Performance. IEEE Trans. Systems Man Cybernet. Part A: Systems Humans 27(5), 553–568 (1997)
11. Jain, A.K., Nandakumar, K., Ross, A.: Score Normalization in Multimodal Biometric Systems. Pattern Recognition 38(12), 2270–2285 (2005)
12. National Institute of Standards and Technology: NIST Biometric Scores Set, http://www.itl.nist.gov/iad/894.03/biomtricscores
13. Poh, N., Bengio, S.: Database, Protocol and Tools for Evaluating Score-Level Fusion Algorithms in Biometrics Authentication. Pattern Recognition 39(2), 223–233 (2006)
14. Poh, N., Bengio, S.: Can Chimeric Persons Be Used in Multimodal Biometric Authentication Experiments? In: Renals, S., Bengio, S. (eds.) MLMI 2005. LNCS, vol. 3869, pp. 87–100. Springer, Heidelberg (2006)

Multi-biometric Fusion for Driver Authentication on the Example of Speech and Face

Tobias Scheidat1, Michael Biermann1, Jana Dittmann1, Claus Vielhauer1,2, and Karl Kümmel2

1 Otto-von-Guericke University of Magdeburg, Universitätsplatz 2, 39106 Magdeburg, Germany
2 Brandenburg University of Applied Sciences, PSF 2132, 14737 Brandenburg, Germany
{tobias.scheidat,claus.vielhauer,jana.dittmann}@iti.cs.uni-magdeburg.de, {claus.vielhauer,kuemmel}@fh-brandenburg.de

Abstract. Biometrics is becoming an important field in IT security, safety and comfort research in the automotive domain. Its aims include automatic driver authentication and the recognition of spoken commands. This paper presents an experimental evaluation of a system which uses a fusion of three biometric modalities to verify the authorized drivers out of a limited group of potential persons, such as a family or a small company, which is a common use case in the automotive domain. The goal is to show the tendency of biometric verification performance in such a scenario. To this end, a multi-biometric fusion is carried out based on the biometric modalities face and voice in combination with the body weight. The fusion of the three modalities results in a relative improvement of 140% compared to the best individual result with regard to the measure used, the equal error rate.

Keywords: Automotive, multi-biometric fusion, face, voice, body weight, compensational biometrics

1 Introduction

The automatic authentication of persons and information plays an important role in IT security research. There are three main concepts for user authentication: secret knowledge, personal possession and biometrics. Methods based on secret knowledge use information known only to the authorized person, such as a password. In personal possession scenarios, a special physical token is used for authentication, for example a smart card. A main problem of both strategies is that another person may obtain the authentication object (information, token), which can be stolen or handed over. In biometric applications the authentication object is based on a physical (e.g. fingerprint) or behavioral (e.g. voice) characteristic of the person, so it cannot easily be misused by another person.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 220–227, 2009. © Springer-Verlag Berlin Heidelberg 2009


Biometric information has been used in cars for several years in a simple way: to detect whether persons are sitting on the driver's and/or passenger's seats in order to activate the corresponding airbag or to detect whether the seat belt is fastened. In most cases the biometric used is the body weight, which is acquired by a binary sensor in the seat. Figure 1 shows a simplified car and different networks corresponding to different functions of a car. As shown in figure 1, in addition to the conventional networks for power train, instrumentation, chassis and body electrics, two new components are necessary for biometric systems which provide higher levels of safety, security and comfort: the biometrics network and the biometric database. The biometrics network is used to acquire data from drivers and/or passengers and to process them. The next step depends on the application of the system. One aim may be to decide whether a person is authorized to use the vehicle in a given way, while another goal can be to set up the infotainment and entertainment systems as well as the positions of seat, mirrors, heater and/or other comfort settings. The new biometric database component is used to store biometric reference data and fusion parameters such as general weights.

Fig. 1. Simplified car ([1], [2]) with the four conventional networks and the two new biometric components: biometrics network and biometric database

In this paper we focus on automatic driver authentication based on the biometric modalities face, speech and body weight, combined with compensational biometrics and environmental sensor values. As compensational biometrics we use steering wheel pressure, pedal pressure and body volume. In contrast to static biometric systems, an automotive system suffers from varying environmental influences such as noise or changing illumination, as well as from varying availability of sensors (e.g. dirty, broken or vibrating sensors). As shown in [3], existing sensor information could also support the adaptive calculation of the fusion result. In our scenario, such sensor information could be the light level, current speed or window position, used to determine a confidence value for the corresponding biometric data. This paper is structured as follows: The next section describes the fundamentals of biometric fusion in its first subsection, while the second subsection gives a short overview of the biometric algorithms used for speech and face recognition and the third subsection explains the fusion strategy of our approach. The setup, methodology and results of the experimental evaluation are described in section three. The fourth section concludes the paper and gives a short outlook on future work in this area of biometric research.


2 Biometric Fusion

Despite their advantages in comparison to authentication systems based on secret knowledge and personal possession, biometric systems suffer from false recognition probabilities due to the variability of data from the same person (intra-class variability). Another problem exists in the form of similarities between data of different persons (inter-class similarity). In both cases, a combination of at least two authentication factors can be used to improve the security and/or authentication performance. While combinations of biometrics, knowledge and possession are also possible, in recent years the fusion of multiple biometric components has become more important in biometric research. There are a number of possibilities to improve on the authentication performance of the single components involved in the fusion process. For example, it is possible to combine biometric modalities with each other as well as algorithms of one biometric modality. This section provides a short overview of the fundamentals of multi-biometric fusion, briefly introduces the algorithms for face and speech verification and discusses the fusion strategy adapted for the evaluation described in this paper, which shows the recognition tendency in a small use case scenario of a family or small company.

2.1 Fundamentals of Multi-biometric Fusion

In order to decrease the influence of the drawbacks of biometric systems (i.e. intra-class variability, inter-class similarity), some current approaches use more than one biometric component, such as sensor, modality, algorithm of one modality or instance of one trait. In recent work on the combination of biometric components, fusion is carried out mainly on one of the following levels within the authentication process: feature extraction, matching score computation and decision.

In fusion on feature extraction level, each subsystem extracts the features from raw data, and fusion is done by combining the feature vectors of all subsystems into one single combined feature vector. In fusion on matching score level, each system involved determines its own matching score. Then all single scores are fused in order to obtain a joint score as the basis for the authentication decision. Fusion on decision level is carried out late within the authentication process, because the single decisions are made by each system separately, followed by the fusion of all individual results.

In [4] Ross et al. suggest a classification based on the number of biometric traits, sensors, classifiers and units involved: The fusion in multi-sensor systems is based on different sensors which acquire the data for one biometric modality. Multi-algorithmic systems use multiple algorithms for the same biometric trait. In multi-instance systems, multiple representations of the same modality are used for the fusion process. Besides multiple physical units such as fingertips, behavioral modalities also allow the use of multiple units. Multi-sample systems use multiple samples of the same modality, e.g. different positions of the same fingerprint. Multimodal systems combine at least two different biometric modalities to improve on the authentication performance of the single systems involved.

By using such multi-biometric fusion approaches, higher levels of authentication performance, security and user acceptance can be reached. However, for some strategies, the usage complexity increases with each additional fusion component (e.g. sensor, modality or trait). Here a user has to present more than one biometric trait or instance of the same trait to the biometric system. On the other hand, problems caused by disabilities, illness and/or age can be compensated by some of these systems. For example, a trait that cannot be recognized to a sufficient degree can be ignored.

2.2 Biometric Algorithms

In the evaluation described here, two existing algorithms were used to calculate a matching score: Speech Alize for speech recognition and 2DFace for face recognition. Both methods are briefly introduced below.

To perform verification using the speech modality based on Speech Alize ([5], see figure 2, p2), 16 Mel Frequency Cepstral Coefficients (MFCC), their first order deltas, and the energy and delta-energy are calculated for the first audio channel. For this purpose the audio material is divided into Hamming windows of 20 ms with a shift of 10 ms. Frequencies between 300 Hz and 8000 Hz are analyzed. Afterwards, preprocessing steps such as normalization of the energy coefficients, silence removal and feature normalization are applied. For the enrollment, one world model GMM (Gaussian Mixture Model) is generated, which is later adapted to the user whose identity is claimed in order to obtain a user-specific GMM. Finally a score is calculated for each test feature vector.

In general the 2DFace algorithm ([6]) uses the standard Eigenfaces approach for face recognition (see figure 2, p1). First the images are normalized. For this purpose a fixed mask is used to crop the image to 51x55 pixels, so that the located eyes are at the same position for all reference and test data. Afterwards the face space is built, whereby Principal Component Analysis is used to reduce the dimensionality of the space while retaining 99% of the variance. For both target and test images, features are extracted which correspond to the projection of the images onto the face space. Finally the score is generated using the L1-norm as a distance measure.
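The windowing step of the speech front end described above (Hamming windows of 20 ms with a shift of 10 ms) can be illustrated as follows. This is only a sketch of the framing step, not the Speech Alize implementation: the sampling rate of 16 kHz is an assumption (the text only states that frequencies up to 8000 Hz are analyzed), and the function names `hamming` and `frame_signal` are hypothetical.

```python
import math


def hamming(n_samples):
    """Hamming window coefficients w[i] = 0.54 - 0.46 * cos(2*pi*i / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n_samples - 1))
            for i in range(n_samples)]


def frame_signal(signal, sample_rate, win_ms=20.0, shift_ms=10.0):
    """Cut a signal into overlapping frames and apply a Hamming window to each."""
    win = int(round(sample_rate * win_ms / 1000.0))      # samples per window
    shift = int(round(sample_rate * shift_ms / 1000.0))  # samples per shift
    window = hamming(win)
    frames = []
    # Only complete windows are kept; a trailing partial window is dropped.
    for start in range(0, len(signal) - win + 1, shift):
        frames.append([signal[start + i] * window[i] for i in range(win)])
    return frames


SAMPLE_RATE = 16000                 # assumed sampling rate in Hz
one_second = [1.0] * SAMPLE_RATE    # dummy constant signal of one second
frames = frame_signal(one_second, SAMPLE_RATE)
print(len(frames), len(frames[0]))
```

Each frame would then be passed on to the MFCC computation; that part is omitted here.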
Both systems are based on world models for user verification. To generate the Speech Alize world model, 24 randomly chosen audio samples from [7] were used. For 2DFace an existing world model from [8] was used in order to have more than the four registered users for the model.

2.3 Fusion Strategy

The approach used for fusion in this paper is based on the Enhanced Fusion Strategy (EFS) introduced in [2] by Biermann et al. In total three biometric modalities are used to form a fused matching score in our automotive scenario. The matching scores of the body weight and of the compensational biometrics were simulated, because the corresponding sensors were not installed in the generalized car and/or have not been developed yet. The two remaining main modalities are face and voice. In general the final matching score is the weighted sum of the individual matching scores of the single biometric modalities involved. Figure 2 shows the scheme of a possible fusion of the biometrics face, speech and body weight, which are based directly on the driver as suggested in [1]. Additionally, since biometric authentication in the automotive scenario depends on many environmental factors, the figure shows a selection of such factors.


Fig. 2. Biometric authentication of the driver [1]

As suggested in [2], our evaluation considers the biometrics face, speech and body weight, and the compensational biometrics steering wheel pressure, pedal pressure and body volume (not shown in figure 2). The steering wheel pressure is the pressure of the hands on the steering wheel and replaces the speech modality if the latter fails. Pedal pressure is the pressure-based behavior of the pedal usage, which compensates for the face modality. The biometric body volume is measured by the length of the seat belt used and replaces the main biometric body weight. In this initial experimental evaluation additional factors such as environmental conditions are not taken into account. Thus, the matching score calculation of the Enhanced Fusion Strategy (EFS) introduced in [2] is applied as shown in equation (1).

$$MS_{fus,t} = \sum_{j=1}^{3} B_{j,t} \cdot A_{j,t} \cdot \Big[ \big\{ MBU_{j,t} \cdot WN_t \cdot W_j \cdot MS_{j,t} \big\} + \big\{ (1 - MBU_{j,t}) \cdot V\_CB_{j,t} \cdot F\_CB_{j,t} \cdot WN_t \cdot W\_CB_j \cdot MS\_CB_{j,t} \big\} \Big] \tag{1}$$

$$\sum_{j=1}^{3} W_j = 1; \quad W_j \in [0, 1]; \quad V\_CB_{j,t} \in [0, 1]; \quad F\_CB_{j,t} \in [0, 1] \tag{2}$$

Here the variables are defined as follows: MSj,t is the matching score of modality j at time t, while MSfus,t is the fused matching score at time t. Bj,t are binary operands; the related part, consisting of main modality j and compensational modality j, becomes zero if the corresponding operand is set to 0. The operands Aj,t were introduced in order to be able to manipulate the standard weighting. Using this parameter, the individual weighting of a modality can be decreased or increased by the system. MBUj,t is a parameter which describes the influence of the main biometrics (MBj, here face, speech and body weight) on MSfus,t; it reflects the functionality of the sensors of the main biometrics. Wj are constant weights that are based on an estimation of the equal error rate (EER) of the related modality j. Because the body weight fluctuates more strongly than the other two modalities, it is weighted with 0.2. The main biometrics face and speech are each weighted with 0.4. MS_CBj,t denotes the matching score of the compensational biometrics which compensates for a breakdown of the main biometrics. For the compensational biometrics three additional values are used: the functionality of the sensor F_CBj,t, the confidence factor V_CBj,t and the weight W_CBj. Please note that, as shown in equation (2), the sum of the main biometrics' weights amounts to 1.
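Equation (1) can be read as a weighted sum in which each main biometric is either used directly or replaced by its compensational counterpart. The following sketch evaluates it for one point in time. It is an illustration under stated assumptions, not the implementation from [2]: WNt is not defined in the text and is treated here as a scalar factor that defaults to 1.0, the dictionary layout and the name `efs_score` are hypothetical, and the matching scores in the example are made up.

```python
def efs_score(modalities, wn=1.0):
    """Evaluate equation (1) for one time step t (sketch).

    Each entry describes one main modality j and its compensational modality.
    wn stands for WN_t, which the text does not define (assumed scalar).
    """
    total = 0.0
    for m in modalities:
        main = m["MBU"] * wn * m["W"] * m["MS"]
        comp = ((1.0 - m["MBU"]) * m["V_CB"] * m["F_CB"]
                * wn * m["W_CB"] * m["MS_CB"])
        total += m["B"] * m["A"] * (main + comp)
    return total


def modality(ms, w, ms_cb, w_cb, mbu=1.0):
    """Helper with B = A = V_CB = F_CB = 1 (illustrative defaults)."""
    return {"B": 1.0, "A": 1.0, "MBU": mbu, "W": w, "MS": ms,
            "V_CB": 1.0, "F_CB": 1.0, "W_CB": w_cb, "MS_CB": ms_cb}


# Weights 0.4 / 0.4 / 0.2 as in the text; all scores are made-up examples.
face = modality(ms=0.8, w=0.4, ms_cb=0.5, w_cb=0.4)       # comp.: pedal pressure
speech = modality(ms=0.6, w=0.4, ms_cb=0.5, w_cb=0.4)     # comp.: steering wheel
weight_ok = modality(ms=0.5, w=0.2, ms_cb=0.7, w_cb=0.2)  # comp.: body volume
weight_broken = modality(ms=0.5, w=0.2, ms_cb=0.7, w_cb=0.2, mbu=0.0)

all_main = efs_score([face, speech, weight_ok])
compensated = efs_score([face, speech, weight_broken])
print(all_main, compensated)
```

With MBU set to 0 for the body weight, its term drops out and the body volume score enters with the compensational weight instead, which mirrors the broken-sensor case evaluated later in the paper.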

3 Experimental Evaluation

This section introduces the database and methodology which were used for the experimental evaluation of the single systems and their fusion. The evaluation results are presented and discussed in the third part of this section.

3.1 Database

The number of test persons is limited since the scenario is based on a family of four. Thus, video and audio data were acquired from four test persons. This simulated small family consists of one woman and three men. To record speech and face we used a webcam (Logitech Quickcam Pro 5000) mounted at the position of the rear-view mirror in a generalized car in our laboratory. For each user and modality, one reference sample and two verification samples were acquired. The faces were captured frontally. The initial idea was to collect speech data of different content (so-called semantics, see also [9]), but the first evaluations showed that Speech Alize needs audio samples with a duration of more than 10 seconds to deliver sufficient results. Thus, we decided to use longer spoken sequences which consist of several commands to the power train and instrumentation networks. Hence, the spoken semantic was ‘ | Start Engine | Yes | No | Cancel’. Please note that, due to the German data protection law, the user's name is a pseudonym.

3.2 Methodology

Biometric error rates are used to determine the authentication performance of the fusion scenario. They have to be determined empirically, since it is not possible to measure these error rates from the system directly. In order to do so, for each threshold the numbers of acceptances and rejections for authorized and non-authorized persons are determined experimentally. On the one hand, the false rejection rate (FRR) is the ratio between the number of false rejections of authorized persons and the total number of tests.

On the other hand, the false acceptance rate (FAR) is the ratio between the number of false acceptances of non-authentic persons and the total number of authentication attempts. For a comparative analysis of the verification performance of the fusion components, as well as that of their fusion, the equal error rate (EER) is used. The EER is a common measure in biometrics and denotes the point in the error characteristics where FRR and FAR yield identical values. However, the EER is not to be interpreted as the optimal operating point of a biometric system; it is mainly used as a normalized reference point for comparisons between algorithms or systems.
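The empirical procedure described above can be sketched as a threshold sweep over observed matching scores. The sketch assumes that a higher score means a better match and that a claimant is accepted when the score reaches the threshold; it normalizes the FRR by the number of genuine attempts and the FAR by the number of impostor attempts, which is a common convention and may differ from the exact counting used in the paper. The function names and the example scores are hypothetical.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) at one threshold; accept when score >= threshold."""
    frr = sum(1 for s in genuine if s < threshold) / len(genuine)
    far = sum(1 for s in impostor if s >= threshold) / len(impostor)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep all observed scores as thresholds and return (EER, threshold).

    The EER is approximated at the threshold where |FAR - FRR| is smallest,
    as the mean of FAR and FRR at that threshold.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Made-up matching scores of authorized and non-authorized persons.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, eer_threshold = equal_error_rate(genuine_scores, impostor_scores)
print(eer, eer_threshold)
```

With only a handful of attempts, as in the four-person scenario, FAR and FRR change in coarse steps, so the two curves may not cross exactly and the EER is an approximation.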


3.3 Results

As shown in the third row of table 1, the fusion of the three biometrics face, speech and body weight leads to a better result than the individual modalities. The best individual result was obtained by the face modality with an EER of 15.00%, while speech and body weight result in EERs of 29.17% and 49.70%, respectively. The fusion results in an equal error rate of approx. 6.24%. This corresponds to a relative improvement of approx. 140% compared to the best result of the single modalities (face: EER=15.00%), since (15.00 − 6.24) / 6.24 ≈ 1.40.

Table 1. Evaluation results of the single modalities speech, face, weight and compensational biometrics body volume

| Speech | | Face | | Body weight | | Body volume | | Fusion |
| EER | weight | EER | weight | EER | weight | EER | weight | EER |
|---|---|---|---|---|---|---|---|---|
| 29.17% | 0.4 | 15.00% | 0.4 | 49.70% | 0.2 | - | - | 6.24% |
| 29.17% | 0.5 | 15.00% | 0.5 | - | - | - | - | 12.49% |
| 29.17% | 0.4 | 15.00% | 0.4 | - | - | 41.48% | 0.2 | 6.25% |

The case in which the body weight system is out of order is shown in rows four and five of table 1. The fourth row illustrates the behavior of the fusion if no compensational biometric is used to replace the broken component. The fusion result of speech and face is an EER of 12.49%, which is on the one hand a relative improvement of approx. 20% compared with the best single modality's result. On the other hand, the fusion performance declines by approx. 50% compared to the result of the fusion of all three main biometrics. The last row of table 1 shows the fusion results obtained when the broken body weight system is substituted by the compensational biometric body volume. While the individual result of body volume amounts to an EER of 41.48%, a corresponding fusion result of 6.25% can be reached. This observation shows that, if the broken body weight system is compensated by the body volume system, a similar authentication performance is reached.
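The relative figures quoted in this section can be checked with a few lines of arithmetic. The paper does not state which EER serves as the denominator of each ratio; the choices below are an assumption, selected because they reproduce the reported values of approx. 140%, 20% and 50%. The function name `relative_change` is hypothetical.

```python
def relative_change(worse_eer, better_eer, reference_eer):
    """Difference of two EERs relative to a chosen reference EER."""
    return (worse_eer - better_eer) / reference_eer


FACE_EER = 15.00       # best single modality (table 1)
FUSION_3_EER = 6.24    # fusion of face, speech and body weight
FUSION_2_EER = 12.49   # fusion of face and speech only

# Assumed denominators: the lower (fused) EER for both improvements,
# the bimodal EER for the decline.
improvement_3 = relative_change(FACE_EER, FUSION_3_EER, FUSION_3_EER)
improvement_2 = relative_change(FACE_EER, FUSION_2_EER, FUSION_2_EER)
decline = relative_change(FUSION_2_EER, FUSION_3_EER, FUSION_2_EER)

print(round(improvement_3, 2), round(improvement_2, 2), round(decline, 2))
```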

4 Conclusions and Future Work

In this paper we present an experimental evaluation of a biometric fusion based on the modalities speech, face and body weight, and the compensational biometrics steering wheel pressure, pedal pressure and body volume. Firstly, the results show that the fusion of speech, face and body weight leads to a better verification result than the individual results of the single modalities. The relative improvement amounts to approx. 140% with an EER of 6.24%. Secondly, in the simulated case of a broken body weight recognition system, we use the corresponding compensational biometric body volume to provide a fallback. Here an EER of approx. 6.25% was reached with the compensational biometric and an EER of 12.49% without it (bimodal fusion of speech and face). An evaluation of the suggested system with a larger number of users is one of the main parts of our future work. This would correspond to other use cases in the automotive domain, such as car pools of medium-sized and/or big companies or car rental companies. Another of the next steps will be the integration of a speech recognition system in order to recognize the spoken content. This can be helpful to decide whether a spoken text is to be used for driver authentication or as a command. The position of the speaker within the car may indicate whether he or she is authorized to give commands. If a command is not spoken by the driver, it should be ignored. We see possibilities to detect the position of the speaker in the combination of speech and video, by an analysis of the lip movement of the occupants or by using independent component analysis.

Acknowledgements

This work has been supported in part by the European Commission through the EFRE Programme "Competence in Mobility" (COMO) under Contract No. C(2007)5254.

References

1. Makrushin, A., Dittmann, J., Kiltz, S., Hoppe, T.: Exemplarische Mensch-Maschine-Interaktionsszenarien und deren Komfort-, Safety- und Security-Implikationen am Beispiel von Gesicht und Sprache. In: Alkassar, S. (ed.) Sicherheit 2008; Sicherheit - Schutz und Zuverlässigkeit; Beiträge der 4. Jahrestagung des Fachbereichs Sicherheit der Gesellschaft für Informatik e.V. (GI), April 2-4, 2008, pp. 315–327 (2008)
2. Biermann, M., Hoppe, T., Dittmann, J., Vielhauer, C.: Vehicle Systems: Comfort & Security Enhancement of Face/Speech Fusion with Compensational Biometrics. In: MM&Sec 2008 - Proceedings of the Multimedia and Security Workshop 2008, Oxford, UK, pp. 185–194 (2008)
3. Nandakumar, K., Chen, Y., Jain, A.K., Dass, S.C.: Quality-based Score Level Fusion in Multibiometric Systems. In: Proceedings of the 18th International Conference on Pattern Recognition, ICPR, vol. 04. IEEE Computer Society, Washington (2006)
4. Ross, A., Nandakumar, K., Jain, A.K.: Handbook of Multibiometrics. Springer, New York (2006)
5. Reference System based on speech modality (2008), http://share.int-evry.fr/svnview-eph/filedetails.php?repname=ref_syst&path=%2FSpeech_Alize%2Fdoc%2FhowTo.pdf
6. A biometric reference system for 2D face (2008), http://share.int-evry.fr/svnview-eph/filedetails.php?repname=ref_syst&path=%2F2Dface_BU%2Fdoc%2FhowTo.pdf
7. bodalgo. Voice Over Market Place for Voice Overs, Voiceover Talents – Find Voice Overs The Easy Way! (2009), http://www.bodalgo.com (download: February 2009)
8. Subversion server of BioSecure reference systems, 2DFace (2009), http://share.int-evry.fr/svnvieweph/filedetails.php?repname=ref_syst&path=%2F2Dface_BU%2Fresults%2FmodelBANCA.dat (download: February 2009)
9. Vielhauer, C.: Biometric User Authentication for IT Security: From Fundamentals to Handwriting. Springer, New York (2006)

Multi-modal Authentication Using Continuous Dynamic Programming

K.R. Radhika1, S.V. Sheela1, M.K. Venkatesha2, and G.N. Sekhar1

1 B M S College of Engineering, Bangalore, India
2 R N S Institute of Technology, Bangalore, India

Abstract. Storing and retrieving the behavioral and physiological templates of a person for authentication using a common algorithm is indispensable in on-line applications. This paper deals with the authentication of on-line signature data and textural iris information using continuous dynamic programming [CDP]. A kinematically derived feature, acceleration, is considered, and the shape of the acceleration plot is analysed. The experimental study shows that, as the number of training samples considered for the CDP algorithm increases, the false rejection rate decreases.

1 Introduction

Research issues in iris recognition include iris localization, nonlinear normalization, occlusion, segmentation, liveness detection and large scale identification. Signature authentication is strongly affected by user dependencies, as a signature varies from one signing instance to another in a known way. Signature and iris, as biometric security technologies, have the great advantages of variability and scalability. The variability of a signature can be described as constant variation, which aids in the rejection of a duplicate in a modern self-certified security system. Even though a captured iris image is a constant signal, it provides scalability for a variety of applications. In this paper the term sample refers to both an on-line signature sample and an iris sample.

2 Background

CDP aids the recognition process through the concept of grouping items with similar characteristics together. A part of a registered pattern given as an input pattern can be verified using the spotting method called CDP [3]. CDP, developed by R. Oka to classify real world data by a spotting method, makes it possible to ignore the portions of data which lie outside of the task domain [4]. CDP is well suited to tackling problems of time warping and spotting for classification [5]. Two-dimensional [2D] CDP allows characteristic transformations and is a quasi-optimal algorithm for the row axis and column axis; it combines spotting recognition against a reference image for tracking a target with the use of the segmented image as the reference image for the next frame [6,7]. Incremental reference-interval-free CDP is applied to speech recognition, which gives us the idea of detecting similar segments within one discourse sample; this can be extended to detecting segments that are similar in two independent samples [8]. The work done by T. Nishimura gives a clear on-line scenario, which detects a frame sequence in the map that matches an input frame sequence with real-time localization [9]. Shift CDP applies CDP to unit reference patterns of constant length and provides a fast match between arbitrary sections in the reference pattern and the input pattern [10]. The spatio-temporal approach handles direction changes to obtain flexibility of the recognition system [11]. Reference-interval-free CDP is a technique that has been proposed as a way to assign labels, analyze content and search time-series data. It works by detecting similar passages in two different sets of time-series data. H. Kameya et al. developed a text-independent writer authentication system [2].

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 228–235, 2009. © Springer-Verlag Berlin Heidelberg 2009

3 Proposed System

In this experiment we have used a novel method for pupil extraction. The first peak of the histogram provides the threshold ‘t’ for the lower intensity values of the eye image, as shown in Fig.1(e). We label all the connected components in the sample eye image with an intensity value of less than ‘t’. By selecting the component with the maximum area we arrive at the pupil area of the eye, as shown in Fig.1(a)-(d). A normalised bounding rectangle centred on the pupil is used to crop the iris biometric from the eye image. In the extracted 2D pupil area, the pixel intensity values are sorted in ascending order. The first ‘g’ unique gray scale values of this list, which are the lowest intensity values, are considered: Itop = {I1, I2, I3, I4, ..., Ig}. In this experiment the value of ‘g’ is 50. This value specifies the darkest part of the eye image. The set of co-ordinate values whose intensity is in Itop forms the texture template. The larger the ‘g’ value, the larger the size of the texture template, which extends from pupil to iris.
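The texture template formation described above (collecting the co-ordinates of the pixels whose intensity is among the ‘g’ lowest unique gray values) can be sketched as follows. The sketch covers only this last step and assumes that the pupil region has already been extracted and is given as a list of pixel rows; the function name `texture_template` and the toy region are hypothetical, and g = 2 is used only to keep the example small (the paper uses g = 50).

```python
def texture_template(pupil_region, g=50):
    """Return (row, column) co-ordinates whose intensity is in I_top.

    pupil_region: 2D list of gray values of the extracted pupil area.
    I_top: the g lowest unique gray values of the region.
    """
    unique_sorted = sorted({value for row in pupil_region for value in row})
    i_top = set(unique_sorted[:g])
    return [(r, c)
            for r, row in enumerate(pupil_region)
            for c, value in enumerate(row)
            if value in i_top]


# Toy 3x3 region with made-up gray values.
region = [
    [12, 40, 200],
    [12, 35, 180],
    [90, 40, 12],
]
template = texture_template(region, g=2)
print(template)
```

Increasing g admits brighter gray values, so the template grows outward from the pupil, in line with the remark that a larger ‘g’ extends the template towards the iris.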

Fig. 1. (a)-(d) Iris texture template formation (e) Histogram of eye image

3.1 Segmentation

Using each of the co-ordinate values contained in the texture template, the velocity and acceleration values are calculated in two dimensions using (1) and (2).

$$\mathbf{v} = r'\,\hat{r} + r\omega\,\hat{\theta} \tag{1}$$

$$\mathbf{a} = \hat{r}\,(r'' - r\theta'^2) + \hat{\theta}\,\frac{1}{r}\frac{d}{dt}(r^2\theta') \tag{2}$$

where $r = \sqrt{x^2 + y^2}$, $\hat{r} = \hat{x}\cos\theta + \hat{y}\sin\theta$, $\hat{\theta} = -\hat{x}\sin\theta + \hat{y}\cos\theta$, and $\omega = \theta'$ is the angular velocity.

From signature samples, 3D on-line features are extracted. The x, y, pressure (z), pen azimuth and pen inclination co-ordinate sequences are the features considered [1]. Azimuth is the angle between the z axis and the radius vector connecting the origin and any point of interest. Inclination is the angle between the projection of the radius vector onto the x-y plane and the x axis. The z axis is the pressure axis. The velocity and acceleration values of a sample are calculated in three dimensions using (3) and (4).

$$\mathbf{v} = \hat{r}\,r' + \hat{\theta}\,r\theta' + \hat{\phi}\,r\phi'\sin\theta \tag{3}$$

$$\mathbf{a} = \hat{r}\,(r'' - r\theta'^2 - r\phi'^2\sin^2\theta) + \hat{\theta}\left(\frac{1}{r}\frac{d}{dt}(r^2\theta') - r\phi'^2\sin\theta\cos\theta\right) + \hat{\phi}\,\frac{1}{r\sin\theta}\frac{d}{dt}(r^2\phi'\sin^2\theta) \tag{4}$$

where $r = \sqrt{x^2 + y^2 + z^2}$, $\hat{r} = \hat{x}\sin\theta\cos\phi + \hat{y}\sin\theta\sin\phi + \hat{z}\cos\theta$, $\hat{\theta} = \hat{x}\cos\theta\cos\phi + \hat{y}\cos\theta\sin\phi - \hat{z}\sin\theta$, and $\hat{\phi} = -\hat{x}\sin\phi + \hat{y}\cos\phi$. Here $r$ is the radial distance of a point from the origin; $\hat{x}$, $\hat{y}$ and $\hat{z}$ are unit vectors; $\theta$ is the azimuth angle and $\phi$ is the angle of inclination; $\hat{r}$, $\hat{\theta}$ and $\hat{\phi}$ are unit vectors in spherical co-ordinates; $r'$, $\theta'$ and $\phi'$ are the time derivatives of the spherical co-ordinates.

The top 50 velocity values of a signature sample were taken in descending order and plotted according to their positional values. Fig. 2(a)-(b) shows that the velocity scatter plots of two genuine samples look similar. Fig. 2(b)-(c) shows that the velocity scatter plot of a forged sample does not match the velocity plots of the two genuine samples. The numerical velocity values of genuine samples did not match, but the shapes of the scatter plots looked similar; therefore the second-order derivative, acceleration, was considered. Similar results were found with the velocity values of the iris texture template. Let ‘n’ be the total number of pixels of a sample, let a1, a2, a3, a4, ..., an be the acceleration values generated using (4), and let p1, p2, p3, p4, ..., pn denote the positional values of the ‘n’ pixels. The acceleration values are normalized with respect to the positional values using (5). It is observed that, irrespective of the length of the signature and the number of gray values, the scatter of acceleration values forms the same shape pattern for genuine signatures and for genuine iris samples.

$$p_i^{norm} = (p_i \cdot 100)/n, \quad \forall\, p_i^{norm} \le 100 \tag{5}$$
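To make (1), (2) and (5) concrete, the sketch below evaluates the planar velocity and acceleration components with central finite differences and normalises positions to a 0-100 scale. It is an illustration, not the authors' implementation: the function names, the unit time step `dt`, and the use of finite differences are assumptions, and the angular component of (2) is expanded with the identity (1/r) d/dt(r²θ′) = 2r′θ′ + rθ″. Angle wrap-around at ±π is not handled.

```python
import math


def polar_kinematics(xs, ys, dt=1.0):
    """Return (v_r, v_theta, a_r, a_theta) at each interior sample,
    following Eqs. (1) and (2) with central finite differences."""
    r = [math.hypot(x, y) for x, y in zip(xs, ys)]
    th = [math.atan2(y, x) for x, y in zip(xs, ys)]
    out = []
    for i in range(1, len(r) - 1):
        r1 = (r[i + 1] - r[i - 1]) / (2 * dt)               # r'
        r2 = (r[i + 1] - 2 * r[i] + r[i - 1]) / dt ** 2     # r''
        w = (th[i + 1] - th[i - 1]) / (2 * dt)              # theta' (omega)
        w1 = (th[i + 1] - 2 * th[i] + th[i - 1]) / dt ** 2  # theta''
        v_r, v_t = r1, r[i] * w                             # Eq. (1)
        a_r = r2 - r[i] * w ** 2                            # radial part of Eq. (2)
        a_t = 2 * r1 * w + r[i] * w1                        # (1/r) d/dt (r^2 theta')
        out.append((v_r, v_t, a_r, a_t))
    return out


def normalise_positions(n):
    """Eq. (5): map positions 1..n onto a percentage scale."""
    return [p * 100 / n for p in range(1, n + 1)]


# Uniform circular motion on the unit circle, 0.1 rad per step:
# expect v = (0, 0.1) and a = (-0.01, 0) up to rounding.
xs = [math.cos(0.1 * k) for k in range(5)]
ys = [math.sin(0.1 * k) for k in range(5)]
for row in polar_kinematics(xs, ys):
    print(tuple(round(c, 6) for c in row))
print(normalise_positions(4))  # [25.0, 50.0, 75.0, 100.0]
```

The three-dimensional case of (3) and (4) would follow the same pattern with an additional angle and its derivatives.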

By normalizing position, the acceleration values fall into class intervals such as 10-19, 20-29, 30-39 and so on. The sample can now be segmented into ‘m’ equal parts with respect to the acceleration values. In this paper we consider m = 10. In some cases the pressure applied to the writing device by the same person changes at the starting and ending stages because of emotional variation, which in turn changes the shape of the acceleration scatter. Figure 3(a)-(b) shows the acceleration variation at the starting stage. The same is observed in some iris texture templates because of white illumination spots and dark shadow spots. Of the ‘m’ segments, the first and last are not considered, to improve system performance: they contain the settling-down acceleration components. The numerical acceleration values vary within genuine sample segments, while the shape of the acceleration plot varies much less. Figure 4(a) shows the shape similarity of the acceleration plots of two genuine samples of the same person, and Fig. 4(b) shows the large variation in the shape of the acceleration plot between a genuine and a forged sample of the same person.

Fig. 2. Scatter of velocity values pertaining to different percentage of signature length

Fig. 3. Initial part plot for two genuine samples of same person

Fig. 4. (a) Depicts similar local minima direction changes for two genuine samples of same person (b) Depicts dissimilar local minima direction changes with one genuine sample and one forged sample

The proposed system matches the shape of the acceleration plot segment-wise. This matching of a partial reference to a partial input is achieved by CDP. Within one percentile range of acceleration values, the distance measure is the count of directional changes.

$$D_i = \begin{cases} -1, & a_i < a_{i+1} \\ +1, & a_i > a_{i+1} \\ 0, & \text{otherwise} \end{cases} \tag{6}$$
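The direction coding of (6), together with the ‘-1’ to ‘+1’ changeover count that the following paragraph builds on it, might be sketched as below. The function names are invented for illustration, and two points are assumptions rather than statements from the paper: a changeover is counted only between adjacent entries of D, and a sample at position p (1-based) is assigned to the class interval that contains p·100/n.

```python
def direction_sequence(acc):
    """Eq. (6): -1 where the next value is larger, +1 where it is
    smaller, 0 where the two values are equal."""
    d = []
    for cur, nxt in zip(acc, acc[1:]):
        if cur < nxt:
            d.append(-1)
        elif cur > nxt:
            d.append(1)
        else:
            d.append(0)
    return d


def changeover_count(d):
    """Number of adjacent pairs in D that go from -1 to +1."""
    return sum(1 for p, q in zip(d, d[1:]) if p == -1 and q == 1)


def value_array(acc):
    """Eight changeover counts for the intervals 10-19 ... 80-89;
    the first and last of the ten parts are left out."""
    n = len(acc)
    buckets = [[] for _ in range(8)]
    for p, a in enumerate(acc, start=1):
        k = int((p * 100 / n) // 10)   # 1 -> 10-19, ..., 8 -> 80-89
        if 1 <= k <= 8:
            buckets[k - 1].append(a)
    return [changeover_count(direction_sequence(b)) for b in buckets]


d = direction_sequence([3, 1, 2, 2, 5, 4, 6, 1])
print(d)                    # [1, -1, 0, -1, 1, -1, 1]
print(changeover_count(d))  # 2
```

Because only the pattern of rises and falls is kept, two samples with different numerical accelerations but the same plot shape produce the same counts.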

The direction sequence set D is generated using equation (6), for example D = {1,-1,0,1,1,1,-1,...}. The changeovers from ‘-1’ to ‘+1’ within a percentile range are counted; for the 40-49 range the count is denoted c40−49. These are the local minima of the acceleration values in the considered percentile. Leaving out the counts of the first and last parts, the set of 8 counts [c10−19, c20−29, c30−39, c40−49, c50−59, c60−69, c70−79, c80−89] forms the value array va = {va(1), va(2), va(3), va(4), va(5), va(6), va(7), va(8)}, where va(1) = c10−19, va(2) = c20−29, va(3) = c30−39, va(4) = c40−49, va(5) = c50−59, va(6) = c60−69, va(7) = c70−79, va(8) = c80−89. Each training sample generates a value array. A reference genuine sample must be selected from the set of P training samples before a given genuine or forged sample can be tested. This is explained in Sect. 3.3.

3.2 Algorithm

CDP accumulates minimum local distances [2]. For a sample r(t) (1 ≤ t ≤ S) and another sample i(τ) (1 ≤ τ ≤ T), the value arrays va and vb are generated, which are bounded with τ. S and T both equal 8, corresponding to the 8 percentile components: va = {va(1), va(2), va(3), va(4), va(5), va(6), va(7), va(8)} and vb = {vb(1), vb(2), vb(3), vb(4), vb(5), vb(6), vb(7), vb(8)}. The dynamic programming method scans the (t,τ) plane line by line, from the line t = 1 to the line t = S. R(t,τ) contains the minimum accumulated distance. The weight normalises the value of R(t,τ) to the locus of ‘3T’. This uses the theorem that, for two fixed points A and B, the circle of Apollonius is the locus of the point P such that |PA| = 3·|PB|, where |PA| denotes the distance from point P to point A. For τ = -1 and τ = 0, the accumulation is defined by R(t,-1) = R(t,0) = ∞. For t = 1, R(1,τ) = 3·d(1,τ), where ‘d’ is the local distance measure between r(t) and i(τ): d = |va(1) − vb(τ)|. For t = 2,

R(2,τ) = min{R(1,τ-2) + 2·d(2,τ-1) + d(2,τ), R(1,τ-1) + 3·d(2,τ), R(1,τ) + 3·d(2,τ)}.

For t = 3 to S,

R(t,τ) = min{R(t-1,τ-2) + 2·d(t,τ-1) + d(t,τ), R(t-1,τ-1) + 3·d(t,τ), R(t-2,τ-1) + 3·d(t-1,τ) + 3·d(t,τ)}.

Given the two value arrays va and vb, the cdp-value is found as

cdp-value = min{R(1,8), R(2,8), R(3,8), R(4,8), R(5,8), R(6,8), R(7,8), R(8,8)}·(1/(3·T)).

3.3 Leave One Out Method

The training set [T-SET] consists of ‘P’ samples. The va array is calculated for the first sample, and all the samples in T-SET yield ‘P’ vb arrays. The ‘P’ cdp-values obtained from the first sample’s va array and the ‘P’ vb arrays form the first row. Similarly, the remaining P-1 rows of cdp-values are formed by taking each of the other P-1 samples in turn to provide the va array and comparing it with the vb arrays of the left-out samples. This forms a P × P matrix of cdp-values. The index of the row with the minimum average identifies the reference/template value array va−person for one person. The threshold value is the average of the P row averages multiplied by a constant Z. Z accounts for the constant behavioural variation in a person’s signature and in the angle of capture of the iris sample. CDP works on the differences between the va−person and vb value arrays in the respective eight percentile parts of the sample. The vb array is obtained from any input sample, whether a testing sample or a forged sample. If the cdp-value obtained is less than the threshold, the input is considered a genuine sample; otherwise it is termed a forged sample.
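The recurrences of Sect. 3.2 and the reference selection described above can be put together in a short sketch. It is a reading of the text under explicit assumptions, not the authors' code: the function names are invented, the local distance is taken as d(t,τ) = |va(t) − vb(τ)| for every t, the final minimum runs over R(1,T) ... R(S,T) exactly as the text writes it, and the P × P matrix includes each sample compared with itself.

```python
import math


def cdp_value(va, vb):
    """cdp-value of two value arrays, following the recurrences of Sect. 3.2."""
    S, T = len(va), len(vb)
    INF = math.inf
    R = {}  # R[(t, tau)] for 1 <= t <= S, 1 <= tau <= T

    def d(t, tau):  # local distance, 1-based indices
        return abs(va[t - 1] - vb[tau - 1])

    def r(t, tau):  # R(t, -1) = R(t, 0) = infinity
        return R[(t, tau)] if tau >= 1 else INF

    for tau in range(1, T + 1):
        R[(1, tau)] = 3 * d(1, tau)
    for t in range(2, S + 1):
        for tau in range(1, T + 1):
            # First path needs tau - 2 >= 1; otherwise it is infinite.
            c1 = (r(t - 1, tau - 2) + 2 * d(t, tau - 1) + d(t, tau)
                  if tau >= 3 else INF)
            c2 = r(t - 1, tau - 1) + 3 * d(t, tau)
            if t == 2:
                c3 = r(1, tau) + 3 * d(2, tau)
            else:
                c3 = r(t - 2, tau - 1) + 3 * d(t - 1, tau) + 3 * d(t, tau)
            R[(t, tau)] = min(c1, c2, c3)
    return min(R[(t, T)] for t in range(1, S + 1)) / (3 * T)


def select_reference(arrays, Z=1.0):
    """Return (index of the reference sample, threshold) for one person."""
    P = len(arrays)
    matrix = [[cdp_value(a, b) for b in arrays] for a in arrays]
    row_avg = [sum(row) / P for row in matrix]
    ref = min(range(P), key=row_avg.__getitem__)
    threshold = Z * sum(row_avg) / P
    return ref, threshold


def is_genuine(v_person, vb, threshold):
    """Accept the input when its cdp-value is below the threshold."""
    return cdp_value(v_person, vb) < threshold


# Hypothetical value arrays for three training samples of one person.
train = [[0] * 8, [1] * 8, [2] * 8]
ref, thr = select_reference(train, Z=1.0)
print(ref)                                     # 1
print(is_genuine(train[ref], [1] * 8, thr))    # True
print(is_genuine(train[ref], [9] * 8, thr))    # False
```

Raising Z widens the acceptance region, which is the mechanism behind the FRR and FAR trends reported in the experiments.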

4 Experimental Results

100 x 25 x 2 signature samples were used from the MCYT-100 Signature Baseline Corpus, an online database. This database provides 25 genuine and 25 forged samples per person. For P = 10, 10 genuine samples form T-SET, which is 40% of the genuine sample set. The remaining 15 genuine samples form the testing set, used to find the false rejection rate [FRR]. The 25 forged samples were used to find the false acceptance rate [FAR]. An acceptance rate of 97% and a rejection rate of 92% are obtained. The UBIRIS.V1 database is composed of 1877 iris images collected from 241 people in two distinct sessions, with 5 genuine samples per person. P = 3 genuine samples form T-SET, which is 60% of the genuine sample set. An acceptance rate of 98% and a rejection rate of 97% are obtained. CASIA Iris Image Database version 2.0 includes 1200 iris images from 60 eyes. For each eye, 20 images are captured in one session. Two such sets are provided, with images captured using different devices. For P = 10, 10 genuine samples form T-SET, which is 50% of the genuine sample set. An acceptance rate of 98% and a rejection rate of 97% are obtained. For large databases, storing and retrieving the reference templates will be a major issue. The experiment was further conducted for P = 7, 8 and 9. The results show that FRR decreases and FAR increases with increasing ‘P’ for Z = 1, 2, 3, 4, as shown in Table 1 and Fig. 5. FAR and FRR trade off against one another [12]. Keeping track of the receiver operating characteristic curve, the proposed system suggests P = 10 as a reliable value for authentication applications using CDP.
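The way FRR and FAR follow from cdp-values and a threshold can be sketched as below. The function name and the score lists are invented for illustration and are not data from the paper; the only rule taken from the text is that an input is accepted when its cdp-value is less than the threshold.

```python
def far_frr(genuine_scores, forged_scores, threshold):
    """Return (FAR, FRR) in percent for cdp-values accepted below threshold."""
    rejected_genuine = sum(s >= threshold for s in genuine_scores)
    accepted_forged = sum(s < threshold for s in forged_scores)
    frr = 100 * rejected_genuine / len(genuine_scores)
    far = 100 * accepted_forged / len(forged_scores)
    return far, frr


# Hypothetical cdp-values for one person.
genuine = [0.1, 0.2, 0.5, 0.3]
forged = [0.6, 0.25, 0.9, 0.8]
print(far_frr(genuine, forged, 0.2))  # (0.0, 75.0)
print(far_frr(genuine, forged, 0.4))  # (25.0, 25.0)
```

Moving the threshold up, as a larger Z does, lowers FRR and raises FAR, which is the trade-off the results describe.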

Table 1. FRR and FAR values for Signature and Iris

Signature  FRR Z=1  FRR Z=2  FRR Z=3  FRR Z=4  FAR Z=1  FAR Z=2  FAR Z=3  FAR Z=4
P=7        8.8%     6.7%     4.59%    3.25%    3.03%    4.51%    6.07%    7.55%
P=8        8.72%    6.01%    4.02%    2.55%    3.02%    4.51%    6.07%    7.52%
P=9        8.91%    6.33%    3.75%    2.63%    2.93%    4.58%    6.08%    7.73%
P=10       8.73%    5.71%    3.37%    2.37%    3.07%    5.06%    6.6%     8.02%

Iris       FRR Z=1  FRR Z=2  FRR Z=3  FRR Z=4  FAR Z=1  FAR Z=2  FAR Z=3  FAR Z=4
P=7        7.15%    4.71%    2.56%    1.51%    1.81%    5.98%    7.15%    8.18%
P=8        6.06%    3.46%    1.78%    1.05%    3.2%     6.53%    8.68%    8.48%
P=9        5.2%     3.31%    1.56%    0.91%    2.95%    6.55%    9.46%    9.21%
P=10       5.16%    1.71%    0.58%    0.2%     3.71%    6.83%    9.71%    9.9%

Fig. 5. (a) Depicts FRR decreasing and FAR increasing for P=10 as compared to P=7 for MCYT database (b) Depicts FRR decreasing and FAR increasing for P=10 as compared to P=7 for CASIA database

5 Conclusion

The proposed system is a function-based approach that performs local shape analysis of a kinematic value using CDP, for on-line handwritten signatures and for eye texture templates. Instead of working on primary features or on the image, the system works on a derived feature, which leads to a robust security system. The moments of each segment can be considered as a reference feature for CDP to achieve global optimization.

Acknowledgment

The authors would like to thank J. Ortega-Garcia for the provision of the MCYT Signature database from the Biometric Recognition Group, B-203, Universidad Autonoma de Madrid, Spain [1], and Proenca H. and Alexandre L.A., Portugal [13], for the UBIRIS database. Portions of the research in this paper use the CASIA-V2 database collected by the Chinese Academy of Sciences Institute of Automation (CASIA) [14].


References

1. Ortega-Garcia, J., Fierrez-Aguilar, et al.: MCYT baseline corpus: A bimodal biometric database. IEEE Proc. Vision, Image and Signal Processing 150(6), 395–401 (2003)
2. Kameya, H., Mori, S., Oka, R.: A segmentation-free biometric writer verification method based on continuous dynamic programming. Pattern Recognition Letters 27(6), 567–577 (2006)
3. Kameya, H., Mori, S., Oka, R.: A method of writer verification without keyword registration using feature sequences extracted from on-line handwritten sentences. In: Proc. MVA2002 IAPR Workshop, vol. 1, pp. 479–483 (2002)
4. Oka, R.: Spotting Method for Classification of Real World Data. Computer Journal 41(8), 559–565 (1998)
5. Zhang, H., Guo, Y.: Facial Expression Recognition using Continuous Dynamic Programming. In: Proc. IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, pp. 163–167 (2001)
6. Iwasa, Y., Oka, R.: Spotting Recognition and Tracking of a Deformable Object in a Time-Varying Image Using Two-Dimensional Continuous Dynamic Programming. In: Proc. The Fourth International Conference on Computer and Information Technology, pp. 33–38 (2004)
7. Itoh, Y., Kiyama, J., Oka, R.: A Proposal for a New Algorithm of Reference Interval-free Continuous DP for Real-time Speech or Text Retrieval. In: Proc. International Conference on Spoken Language Processing, vol. 1, pp. 486–489 (1996)
8. Kiyama, J., Itoh, Y., Oka, R.: Automatic Detection of Topic Boundaries and Keywords in Arbitrary Speech Using Incremental Reference Interval-free Continuous DP. In: Proc. International Conference on Spoken Language Processing, vol. 3, pp. 1946–1949 (1996)
9. Nishimura, T., Kojima, H., Itoh, Y., Held, A., Nozaki, S., Nagaya, S., Oka, R.: Effect of Time-spatial Size of Motion Image for Localization by using the Spotting Method. In: Proc. 13th International Conference on Pattern Recognition, pp. 191–195 (1996)
10. Nishimura, T., Sogo, T., Ogi, S., Oka, R., Ishiguro, H.: Recognition of Human Motion Behaviors Using View-Based Aspect Model Based on Motion Change. IEICE Transactions on Information and Systems, J84-D 2(10), 2212–2223 (2001)
11. Itoh, Y.: Shift continuous DP: A fast matching algorithm between arbitrary parts of two time-sequence data sets. Systems and Computers in Japan 36(10), 43–53 (2005)
12. Jain, A.K., Bolle, R., Pankanthi, S.: BIOMETRICS Personal Identification in Networked Society
13. Proenca, H., Alexandre, L.A.: UBIRIS: A Noisy Iris Image Database. In: Roli, F., Vitulano, S. (eds.) ICIAP 2005. LNCS, vol. 3617, pp. 970–977. Springer, Heidelberg (2005)
14. CASIA-IrisV3, http://www.cbsr.ia.ac.cn/IrisDatabase.htm

Biometric System Verification Close to “Real World” Conditions

Aythami Morales1, Miguel Ángel Ferrer1, Marcos Faundez2, Joan Fàbregas2, Guillermo Gonzalez3, Javier Garrido3, Ricardo Ribalda3, Javier Ortega4, and Manuel Freire4

1 GPDS, Universidad de Las Palmas de Gran Canaria
2 Escola Universitària Politècnica de Mataró (Adscrita a la UPC)
3 HCTLab, Universidad Autónoma de Madrid
4 ATVS, Universidad Autónoma de Madrid
[email protected]

Abstract. In this paper we present an autonomous biometric device developed in the framework of a national project. This system is able to capture speech, hand-geometry, online signature and face, and can open a door when the user is positively verified. Nevertheless the main purpose is to acquire a database without supervision (normal databases are collected in the presence of a supervisor that tells you what to do in front of the device, which is an unrealistic situation). This system will permit us to explain the main differences between what we call "real conditions" as opposed to "laboratory conditions".

Keywords: Biometric, hand-geometry verification, contact-less, online signature verification, face verification, speech verification.

1 Introduction

Biometric systems are usually developed by experimenting with existing biometric databases, such as the ones described in [1]. System performance is usually measured using the identification rate (percentage of users whose identity is correctly assigned) and verification errors: False Acceptance Rate (FAR, percentage of impostors permitted to enter the system), False Rejection Rate (FRR, percentage of genuine users whose access is denied) and combinations of these two basic ratios, such as the Equal Error Rate (EER, the operating point where FAR = FRR) and the Detection Cost Function (DCF) [2]. A serious problem in system comparison is that, most of the time, the experimental conditions of experiments performed by different teams are not directly comparable. To illustrate this problem, consider a simple example from the motoring sector. Imagine two cars with the fuel consumption given in Table 1. According to this table, looking at the distance (equal in both cases) and the speed (also equal), we could conclude that car number 1 is more efficient. Nevertheless, Figure 1 shows that the experimental conditions are very different and, in fact, nothing can be concluded. This is an unfair comparison. It is well known that car makers cannot do that: slope, wind, etc. must be tightly controlled and are not left to the car maker. The situation is not the same in biometrics, because there is no “standard” database to measure performance. Each manufacturer can use its own database. This can lead to unfair comparisons, as we explain next.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 236–243, 2009. © Springer-Verlag Berlin Heidelberg 2009

We will assume that a given biometric system is trained and tested using different training and testing samples, because this is the situation in real operating systems in normal life. The alternative is known as “testing on the training set”: the test scores are obtained using the training data, which is an optimal and unrealistic situation. It is a trivial problem in which the system only needs to memorize the samples, and the generalization capability is not evaluated. The comparison of different biometric systems looks straightforward: if a given system shows a higher identification rate and a lower verification error than its competitor, it will be considered better. Nevertheless, there is a set of facts that must be considered, because they can lead to a wrong conclusion. We will describe these situations in the next sections.

Table 1. Toy example for car fuel consumption comparison

                  Car 1      Car 2
Distance          100 km     100 km
Speed             100 km/h   100 km/h
Fuel consumption  8 liters   12 liters

[Figure: the two 100 km routes of Table 1, car 1 (8 liters) and car 2 (12 liters), with the terrain labels “Slope” and “Flat”]

Fig. 1. Experimental conditions corresponding to table 1

A. Comparison of results obtained with different databases

When comparing two biometric systems evaluated on different databases, it must be taken into account that one database can be easier than the other. For instance, identifying people within the ORL database [3] (which contains 40 people) is not as difficult as within the FERET database [4] (around 1000 people). For a study of this situation, see [5]. Thus, a given system A that performs better on database DB1 than another system B performs on database DB2 is not necessarily better, because the comparison can be unfair.

B. Comparison of results obtained with the same database

When comparing two biometric systems on the same database and following the same protocol (the same samples for training both competing systems and the remaining samples for testing), the comparison seems fair. In fact it is, but there is a problem: how can you be sure that these results will hold when a different database is used? Certainly you cannot. For this reason, researchers usually test their systems with different databases acquired by different laboratories. In the automobile example, you would probably measure the fuel consumption in several situations (urban, highway, different speeds, etc.), because one car can be more efficient in a particular scenario but worse in a different one. Of course the car must be the same in all the scenarios. It would be unfair to tune the car design before each test (one design for urban routes, one for rural routes, another for highway, etc.). Which comparison is the system seller interested in? Probably the one most favorable to his/her product. Which comparison are we (the buyers) interested in? Obviously the best characterization of a biometric system is the one achieved with a fully operating system, where users interact with the biometric system in a “normal” and “real” way, for instance in a door opening system such as the one described in [6-7]. In this paper we want to emphasize the main differences between databases collected under “real conditions”, as opposed to “laboratory conditions”. This is a milestone towards producing systems able to work in civilian applications. The next sections summarize the main differences between our proposed approach and classical approaches.

1.1 Classic Design (Step 1)

Biometric system design implies the availability of some biometric data to train the classifiers and test the results. Figure 2 on the left summarizes the flow chart of the procedure, which consists of the following steps:

1. A database is acquired in laboratory conditions. There is a human supervisor that tells the user what to do. Alternatively, in some cases, programs exist for creating synthetic databases, such as SFINGE [8] for fingerprints. Another example would be the software Faces 4.0 [10] for synthetic face generation. Nevertheless, synthetic samples have limited validity for training classifiers that are then applied to real data.
2. After database acquisition, a subset of the available samples is used for training a classifier, user model, etc. The algorithm is tested and tuned using some other samples of the database (testing subset).
3. The developed system jumps from the laboratory to real world operation (physical access, web access, etc.).

This procedure is certainly useful for developing a biometric system, for comparing several different algorithms under the same training and testing conditions, etc., but it suffers from a set of drawbacks, such as:

a) In real world conditions the system will be autonomous.
b) Laboratory databases have had low-quality samples removed, because if the human supervisor detects a noisy speech recording, blurred face image, etc., he or she discards the sample and asks the user for a new one.
c) Database acquisition with a human supervisor is a time-consuming task.
d) Real systems must manage a heterogeneous number of samples per user. Laboratory system developments will probably ignore this situation and will thus provide suboptimal performance, due to the mismatch between the conditions present during development and those of normal operation.

[Figure 2 block labels. Classic design: Biometric Database, Algorithm, Biometric Security Application. Proposed approach: Step 1, Biometric Database (laboratory), Algorithms, Biometric Security Applications; Step 2, Biometric Database (operational).]

Fig. 2. Classic design (on the left) versus proposed approach (on the right)

1.2 Proposed Approach (Step 2)

A more sophisticated approach involves two main steps (see figure 2 on the right). The operation can be summarized as follows:

1. Based on algorithms developed under the “classical approach”, a physical access control system is operated.
2. While the system is operating, the acquired biometric samples are stored in a database.

This procedure has the following characteristics:

a) In general, the number of samples per user and the time interval between acquisitions will be different for each user. While this can be seen as a drawback, it is in fact an opportunity to develop algorithms in conditions similar to the “real world”, where users’ accesses are not necessarily regular.
b) While supervised databases contain a limited number of recording sessions, this approach makes it easy to obtain a database that captures long-term evolution.
c) Biometric samples must be checked and labeled a posteriori, while this task is easier in supervised acquisitions.
d) While incorrect (noisy, blurred, etc.) samples are discarded in supervised databases, they are of great interest when programming an application able to manage the Failure to Acquire rate. In addition, these poor-quality samples are obtained in a realistic situation that can hardly be reproduced in laboratory conditions.

Fig. 3. Multimodal interface for biometric database acquisition (hand-geometry, speech, face and on-line signature). Frontal view (top).

Fig. 4. Physical installation (at EUPMt) in a wall for door opening system


2 Multimodal Interface for Biometric Recognition and Database Acquisition

In this section we present a multimodal device specially designed to acquire speech, on-line signature, hand-geometry and face. The system is prepared for four biometric traits; the acquisition protocol asks the user to provide his/her identity and two biometric traits (randomly selected). If both biometric traits are positively identified, the user is declared “genuine”. In case of a tie, a third biometric trait is checked. The core of this system is a Hewlett-Packard notebook with a touch screen (suitable for online signature acquisition). The technological solutions behind each biometric trait are DCT-NN [9] for face recognition, SVM for hand-geometry, HMM for signature and GMM for speaker recognition. Figure 4 shows a physical installation in a wall for a door opening system.
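The acquisition protocol can be read as the small decision rule sketched below. Several points are assumptions made for illustration and are not stated in the paper: the names `TRAITS` and `verify_user`, the representation of each classifier as a callable returning True or False, the rejection of the user when both selected traits fail, and the random choice of the third trait among the remaining ones.

```python
import random

TRAITS = ["speech", "signature", "hand-geometry", "face"]


def verify_user(verifiers, rng=random):
    """verifiers maps each trait name to a zero-argument callable that
    returns True when that trait matches the claimed identity."""
    first, second, third = rng.sample(TRAITS, 3)
    results = [verifiers[first](), verifiers[second]()]
    if all(results):        # both traits positively identified
        return True
    if not any(results):    # both traits rejected (assumed outcome)
        return False
    return verifiers[third]()   # tie: a third trait decides


# Hypothetical stand-ins for the four classifiers.
always_yes = {name: (lambda: True) for name in TRAITS}
always_no = {name: (lambda: False) for name in TRAITS}
print(verify_user(always_yes))  # True
print(verify_user(always_no))   # False
```

Drawing the traits at random means that an impostor cannot know in advance which two traits will be requested.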

3 Real World: One Step Further from Laboratory Conditions

The goal of research should be to develop applications useful for daily usage. However, most research today is performed in laboratory conditions, which are far from “real world” conditions. While laboratory conditions are interesting and necessary in the first steps, it is important to jump from the laboratory to real world conditions. This implies finding a solution for a large number of problems that never appear inside the laboratory. In conclusion, the goal is not fine tuning that provides a very small error in laboratory conditions. The goal is a system able to generalize (to manage new samples not seen in the laboratory). It is important to emphasize that the classical Equal Error Rate (EER) used for biometric system adjustment implies that the verification threshold is set a posteriori (after the whole set of test scores is known). While this is possible in laboratory conditions, it makes no sense in a system operating in the real world. Thus, system performance measured by means of the EER is of limited utility. Table 2 shows the performance of the multimodal biometric system with two different set up methods. Over four months, 102 people (70 genuine and 32 impostors) used the system, and more than 900 unsupervised accesses were recorded. In the Laboratory set up method, we process the database acquired in “real world” conditions using the set up configuration obtained in previous experiments under laboratory conditions. In the “Real world” method we use an a posteriori set up configuration to obtain the lowest EER.

Table 2. “Real World” System performance with different set up methods

Verification Method   FAR    FRR    EER
Laboratory set up     5.1%   15.3%  10.2%
“Real world” set up   2.5%   2.3%   2.4%
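The difference between the two set up methods can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the function names, the convention that a score greater than or equal to the threshold is accepted, the search for the a posteriori threshold among the observed scores, and the score lists, which are invented and are not the data behind Table 2.

```python
def rates(genuine, impostor, thr):
    """Return (FAR, FRR) as fractions when scores >= thr are accepted."""
    frr = sum(s < thr for s in genuine) / len(genuine)
    far = sum(s >= thr for s in impostor) / len(impostor)
    return far, frr


def eer_threshold(genuine, impostor):
    """A posteriori set up: the observed score that minimises |FAR - FRR|."""
    candidates = sorted(set(genuine) | set(impostor))

    def gap(t):
        far, frr = rates(genuine, impostor, t)
        return abs(far - frr)

    return min(candidates, key=gap)


# Hypothetical classifier scores.
lab_genuine = [0.2, 0.4, 0.6, 0.8]
lab_impostor = [-0.6, -0.4, -0.2, 0.3]
real_genuine = [-0.1, 0.1, 0.3, 0.5]
real_impostor = [-0.9, -0.7, -0.5, 0.0]

thr_lab = eer_threshold(lab_genuine, lab_impostor)
print(thr_lab)                                       # 0.3
print(rates(real_genuine, real_impostor, thr_lab))   # (0.0, 0.5)

thr_real = eer_threshold(real_genuine, real_impostor)
print(thr_real)                                      # 0.0
print(rates(real_genuine, real_impostor, thr_real))  # (0.25, 0.25)
```

With these made-up scores the laboratory threshold leaves FAR and FRR badly unbalanced on the operational data, while the a posteriori threshold balances them; the latter, however, requires knowing all the test scores in advance.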


Figure 5 shows an example of the difference between the laboratory set up and the “Real world” set up. In this example we plot the FAR and FRR curves against the Hand Geometry Classifier Threshold. On the left we use a 30-person database obtained in laboratory conditions, where the best EER is obtained for a Hand Geometry Classifier Threshold of -0.06. Working with the “Real World” database used for Table 2, we observe an optimum threshold of -0.33.

Fig. 5. Hand Geometry Classifier Threshold versus FRR and FAR, Laboratory Database (on the left) “Real World” database (on the right)

In Figure 6 we divide the “Real World” database into two databases of the same size. We obtain similar thresholds in both databases. In this case, we can use database 1 to obtain the set-up of the system.

Fig. 6. Hand Geometry Classifier Threshold versus FRR and FAR, “Real World” database 1 (on the left) “Real World” database 2 (on the right)

4 Conclusions

In this paper we have presented a multimodal interface for biometric database acquisition. This system makes feasible the acquisition of four different biometric traits: hand-geometry, voice, on-line signature and still face image. The results obtained using the laboratory set up in a “Real World” system show that we are far from the best set up options. Using set up information obtained from laboratory experiments in “Real World” systems may not be advisable. In this paper we have emphasized the convenience of unsupervised database acquisition.

Acknowledgments. This work has been supported by FEDER and MEC, TEC2006-13141-C03/TCM, and COST-2102.

References

1. Faundez-Zanuy, M., Fierrez-Aguilar, J., Ortega-Garcia, J., Gonzalez-Rodriguez, J.: Multimodal biometric databases: an overview. IEEE Aerospace and Electronic Systems Magazine 21(9), 29–37 (2006)
2. Martin, A., Doddington, G., Kamm, T., Ordowski, M., Przybocki, M.: The DET curve in assessment of detection performance. In: European Speech Processing Conference Eurospeech 1997, vol. 4, pp. 1895–1898 (1997)
3. Samaria, F., Harter, A.: Parameterization of a stochastic model for human face identification. In: 2nd IEEE Workshop on Applications of Computer Vision, Sarasota (Florida) (December 1994)
4. Color FERET. Facial Image Database, Image Group, Information Access Division, ITL, National Institute of Standards and Technology (October 2003)
5. Roure-Alcobé, J., Faundez-Zanuy, M.: Face recognition with small and large size databases. In: IEEE 39th International Carnahan Conference on Security Technology ICCST’2005, Las Palmas de Gran Canaria, October 2005, pp. 153–156 (2005)
6. Faundez-Zanuy, M.: Door-opening system using a low-cost fingerprint scanner and a PC. IEEE Aerospace and Electronic Systems Magazine 19(8), 23–26 (2004)
7. Faundez-Zanuy, M., Fabregas, J.: Testing report of a fingerprint-based door-opening system. IEEE Aerospace and Electronic Systems Magazine 20(6), 18–20 (2005)
8. http://biolab.csr.unibo.it/research.asp?organize=Activities&select=&selObj=12&pathSubj=111%7C%7C12&
9. Faundez-Zanuy, M., Roure-Alcobé, J., Espinosa-Duró, V., Ortega, J.A.: An efficient face verification method in a transformed domain. Pattern Recognition Letters 28(7), 854–858 (2007)

Developing HEO Human Emotions Ontology

Marco Grassi

Department of Biomedical, Electronic and Telecommunication Engineering, Università Politecnica delle Marche, Ancona, Italy
[email protected]

Abstract. A major issue in annotating multimedia data about dialogues and the associated gestures and emotional states is the great variety of intrinsically heterogeneous metadata and the impossibility of standardizing the descriptors used, in particular for the emotional state of the subject. We propose to tackle this problem using the instruments and the vision offered by the Semantic Web, through the development of an ontology for human emotions that could be used in the annotation of emotion in multimedia data. The ontology supplies a structure that could grant flexibility and interoperability at the same time, allowing an effective sharing of the encoded annotations between different users.

1 Introduction

A great research effort has been made in recent years in the field of multimodal communication, showing that human language, gestures, gaze, facial expressions and emotions are not entities amenable to study in isolation. A cross-modal analysis of both verbal and non-verbal channels has to be carried out to capture all the relevant information involved in communication, starting from different perspectives that range from advanced signal processing applications to psychological and linguistic analysis. When annotating dialogues and the associated gestures and emotional states in multimedia data, this means encoding a great variety of metadata (data about data) that are intrinsically heterogeneous, and making such metadata effectively sharable amongst different users. This is a major issue, in particular for the emotional state of the subject, which is a key feature of non-verbal communication, because the descriptors used cannot be standardized. Different kinds of media yield different kinds of extracted features. The approach used for feature extraction and emotion classification also strongly determines the kind of information to deal with. A first distinction is between manual and automatic systems. The difference is twofold: in the kind of features used for the description and in the granularity of the description. Human annotation uses all the nuances of description that are present in the literature while, at least at the moment, automatic recognition systems usually describe just a small number of different states. Within each of these two groups, too, the description of an emotion varies widely. Different automatic recognition methods, even when related to the same medium, are interested in different features. On the other side, in the scientific community the debate over human emotions is still open, and there is no common agreement about which features are the most relevant in the definition of an emotion, or about which emotions are relevant and what they should be called. Because of this great and intrinsic heterogeneity, it is impossible to define a standard and unique set of descriptors for human emotions that could grant at the same time flexibility (the possibility to use a large set of descriptors suited to every kind of description) and interoperability (the possibility to share understandable information between different users) [1]. We propose to tackle this problem using the instruments and the vision offered by the Semantic Web in the encoding of the metadata. This means not only encoding the information in a machine-processable language but also associating semantics with it: a well defined and unambiguous meaning that allows the information to be interpreted univocally even by different users. For this purpose, we are working on the development of HEO (Human Emotion Ontology), a high level ontology for human emotions that could supply the most significant concepts and properties for the description of human emotions, and that could be extended according to the user’s purpose by defining lower level concepts and properties related to more specific descriptions, or by linking it to other more specialized ontologies. The purpose is to create a description framework that could grant flexibility and interoperability at the same time and that can be used, for example, in a software application for video annotation, allowing an effective sharing of the encoded information between different users.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 244–251, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Semantic Web and Ontologies

The Semantic Web [2] is an initiative that aims at improving the current state of the World Wide Web. The main idea is to represent information in a way that also encodes its semantics in a machine-processable format, and to use intelligent techniques that take advantage of these representations. This allows more powerful, cross-linked information retrieval through semantic queries and data connections that are far more advanced than those relying simply on the textual representation of the information.

Implementing the Semantic Web requires adding semantic metadata, i.e. data that describe data, to describe information resources. For this purpose the Semantic Web uses RDF (Resource Description Framework) [3], which provides a foundation for representing and processing metadata. Although often called a language, RDF is essentially a data model whose basic building block is an object-attribute-value triple, called a statement. A resource is an object, a “thing” we want to talk about. Resources may be authors, books, publishers, places, people, hotels, rooms, search queries, and so on. Properties are a special kind of resource that describe relations between resources, for example “written by”, “age”, “title”, and so on. Statements assert the properties of resources. Values can be either resources or literals (strings). To provide machine accessibility and machine processability, the RDF triples (x, P, y) can be represented in an XML syntax.

RDF lets users describe resources using their own vocabularies. However, RDF makes no assumptions about any particular application domain, nor does it define the semantics of any domain. To ensure effective sharing of the encoded information, it is necessary to provide a shared understanding of a domain that overcomes differences in terminology. This requires the introduction of ontologies, which therefore play a fundamental role in the Semantic Web.
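As an illustrative sketch of the triple model described above, statements can be held as plain (subject, property, value) tuples and queried by pattern. This is not the API of any RDF library: the identifiers such as `book:Primer` and the `match` helper are hypothetical names chosen only for this example.

```python
# Minimal sketch of RDF-style statements as (subject, property, value) triples.
# All identifiers are hypothetical; this is not an RDF library API.

Triple = tuple[str, str, str]

statements: list[Triple] = [
    ("book:Primer", "writtenBy", "person:Antoniou"),
    ("book:Primer", "writtenBy", "person:VanHarmelen"),
    ("book:Primer", "title", "A Semantic Web Primer"),  # literal value
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Who wrote book:Primer?" expressed as a pattern over the statements.
authors = [o for (_, _, o) in match(statements, s="book:Primer", p="writtenBy")]
print(authors)
```

Note that nothing in the tuples says what `writtenBy` means or which values it may take; that shared meaning is what an ontology has to add.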


Ontologies deal with knowledge representation and can be defined as formal, explicit descriptions of the concepts in a domain of discourse (called classes or concepts), of the properties of each concept, which describe its various features and attributes (roles or properties), and of the restrictions on properties (role restrictions) [4]. An ontology together with a set of individual instances of its classes constitutes a knowledge base. Ontologies make it possible for people or software agents to share a common understanding of the structure of information. Once aggregated through an ontology, this information can be used to answer user queries or as input data to other applications.

Ontologies also allow knowledge to be reused. It is possible to start from an existing ontology and extend it for one's own purposes, or to build a large ontology by integrating several existing ontologies that describe portions of the larger domain. In addition, by organizing concepts and properties taxonomically, in a hierarchical classification of super- and sub-concepts and properties, ontologies make reasoning possible: starting from the data and the additional information expressed in the ontology, new relationships between data can be inferred.

Different languages have been developed for writing ontologies in the Semantic Web, in particular RDF Schema and OWL. RDF Schema [5] can be seen as an RDF vocabulary and is a primitive ontology language. It offers certain modelling primitives with fixed meaning. The key concepts of RDF Schema are class, subclass relations, property, sub-property relations, and domain and range restrictions. OWL (Web Ontology Language) [6] is a language conceived more specifically for the creation of ontologies. OWL builds upon RDF and RDF Schema and uses the XML-based RDF syntax; instances are defined using RDF descriptions, and most RDFS modelling primitives are used. However, OWL introduces a number of features that are missing in RDF Schema, such as local scope of properties, disjointness of classes, Boolean combinations of classes (union, intersection and complement), cardinality restrictions and special characteristics of properties (such as transitive, unique or inverse).

The OWL language provides three increasingly expressive sublanguages designed for use by specific communities of implementers and users. OWL Lite is intended for easy use and light computability. OWL DL offers maximum expressiveness without losing computational completeness (all entailments are guaranteed to be computed) and decidability (all computations finish in finite time) of reasoning systems. OWL Full is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees. OWL DL in particular is based on description logics [7], a solid logical formalism for knowledge representation and reasoning.
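The kind of inference that a taxonomy enables can be sketched as follows. The example covers only the transitivity of subclass relations, which is a small fraction of what an OWL reasoner does; the class names and the helpers `superclasses` and `is_instance_of` are hypothetical and introduced only for illustration.

```python
# Sketch: inferring new class memberships from explicit subclass statements.
# Assumes an acyclic hierarchy with a single superclass per class.

SUBCLASS_OF = {
    "Novel": "Book",
    "Book": "Publication",
    "Publication": "Resource",
}


def superclasses(cls: str) -> list[str]:
    """Follow subclass links upward and return every inferred superclass."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def is_instance_of(instance_type: str, cls: str) -> bool:
    """An instance of a class is also an instance of all its superclasses."""
    return cls == instance_type or cls in superclasses(instance_type)


# Only "Novel is a Book" was stated; "Novel is a Publication" is inferred.
print(superclasses("Novel"))
print(is_instance_of("Novel", "Publication"))
```

A query for all publications would therefore also return resources that were only ever annotated as novels, even though no statement says so directly.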

3 Ontologies and Languages for the Description of Human Emotion

In recent years the study of emotions has attracted growing attention from different research fields, ranging from advanced signal processing to psychology and linguistics. Researchers in the field commonly agree that a small, fixed set of archetypal emotion categories is too limiting to fully describe the wide variety of human emotions. The great research effort of recent years has led to great advances in the understanding of emotions and to the development of many different models and classification techniques for them. As a result, standardizing the knowledge about emotions is becoming ever more important but at the same time more difficult.

To date, little work has been done on the development of complete and formal emotion ontologies, probably because of the difficulties mentioned above in standardizing what defines an emotion and how to encode it. Some interesting work exists in the field of virtual human animation. In [8], an ontology for emotional facial expression profiles has been developed for virtual humans, starting from the facial animation concepts standardized in MPEG-4 (FAPS) and defining relationships with emotions through expression profiles that use psychological models of emotions.

Although not formalized as ontologies, very interesting work on the definition of languages for emotion description has been done, in particular by the Humaine Project (http://emotion-research.net/) and by the W3C’s Emotion Markup Language Incubator Group (http://www.w3.org/2005/Incubator/emotion/). Within the Humaine Project, EARL (Emotion Annotation and Representation Language) [9] has been developed for the annotation of audio-video databases and used with the ANVIL software [11]. Although it is expressed in XML Schema rather than as an ontology, EARL offers a powerful structure for emotion description. It makes it possible to specify emotion categories, dimensions, intensity and even appraisals, selecting the most appropriate case from a predefined list. Moreover, EARL includes elements to describe mixed emotions as well as regulation mechanisms such as the degree of simulation or suppression.

The Emotion Markup Language Incubator Group of the W3C (World Wide Web Consortium) has recently published a first report on its preliminary work on the definition of EmotionML, a markup language for the description and annotation of emotion [10]. Even though it is not certain that the group’s activity will lead to a W3C standard, the report represents the most complete and authoritative reference for emotion descriptors in a machine-interpretable language. Starting from 39 use cases in emotion annotation, emotion recognition and emotion generation, and investigating the existing markup languages (in particular EARL), the report proposes a collection of high-level descriptors for encoding information about emotions and their related components. The proposed language makes it possible to describe an emotion both by category and by dimension, to specify the modality in which it is expressed (face, voice, body, text), and to deal with appraisals, triggering events and action tendencies.

4 An Overview of HEO

The HEO ontology has been developed in the OWL language with a threefold purpose. First, it provides a standardization of the knowledge about emotions that can be useful in particular for people with little expertise in the emotion field. Second, it allows the definition of a common vocabulary for describing emotions with fully machine-accessible semantics. Third, it can be used to create the menus of a multimedia annotation software application: the taxonomical organization of classes and properties defines a clear hierarchy of descriptors, and the restrictions introduced on properties and allowed datatypes limit the set of possible instances, which makes it possible to structure the annotation menus and submenus.
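How a class hierarchy could drive annotation menus can be sketched as follows. The hierarchy fragment reuses class names that appear in this section, but the dictionary layout and the `build_menu` function are assumptions made for illustration; they are not the annotation application itself.

```python
# Sketch: deriving nested annotation menus from a class taxonomy.
# The dictionary layout and build_menu are hypothetical, for illustration only.

SUBCLASSES = {
    "EmotionDescriptor": ["Category", "Dimension", "Appraisal"],
    "Category": ["ArchetypalCategory", "DouglasCowieCategory"],
    "Dimension": ["ArousalValenceDominance"],
    "Appraisal": ["SchererAppraisal"],
}


def build_menu(cls: str, depth: int = 0) -> list[str]:
    """Return one indented menu line per class, visiting subclasses depth-first."""
    lines = ["  " * depth + cls]
    for child in SUBCLASSES.get(cls, []):
        lines.extend(build_menu(child, depth + 1))
    return lines


print("\n".join(build_menu("EmotionDescriptor")))
```

Because the menu is computed from the hierarchy, adding a subclass to the ontology would add a submenu entry without any change to the application code.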


Fig. 1. Graphical representation of HEO’s main classes and properties

As mentioned above, developing an ontology means defining a set of classes and properties that describe a knowledge domain and expressing their relationships. HEO introduces the basic class Emotion and a set of related classes and properties expressing the high-level descriptors of emotions. They represent the general structure for describing an emotion, which can be refined by introducing subclasses and subproperties representative of the specific model used in the description and can be extended by linking other ontologies. In defining these classes and properties we rely on the main descriptors introduced by the W3C Emotion Incubator Group. Figure 1 shows a graph of the most relevant classes and properties of the ontology.

An emotion can be described both in a discrete way, by using the property hasCategory to classify the category of the emotion, and in a dimensional way, by using the property hasDimension. In describing the emotion’s category it is possible to refer to the 6 archetypal emotions (anger, disgust, fear, joy, sadness, surprise) introduced by Ekman [12] (using the ArchetypalCategory class), which are widely used for automatic emotion classification through facial expression recognition, or to the wider set of 48 categories defined by Douglas-Cowie et al. [13] (using the DouglasCowieCategory class). With a discrete classification it is also necessary to supply a measure of the strength of the emotion; this can be done using the hasIntensity property of the Category class, with values in the range [0,1].

Different sets of dimensions for the representation of an emotion exist in the literature. A commonly used set is arousal, valence and dominance, which is also known in the literature by other names, including "evaluation, activation and power" and "pleasure, arousal, dominance". For this reason we introduced the ArousalValenceDominance subclass of the Dimension class, which has the properties hasArousal, hasValence and hasDominance. To overcome the differences in terminology, these properties have been mapped to the hasEvaluation, hasActivation and hasPower properties by stating in OWL that they are equivalent object properties.

Emotion-related features have also been introduced into the ontology. The appraisal, i.e. the evaluation process leading to an emotional response, can be described using the hasAppraisal property. The appraisal set proposed by Scherer [14] can be used by defining the properties novelty, intrinsicPleasantness, goal-needSignificance, copingPotential and norm-selfSignificance of the subclass SchererAppraisal, which can take values from -1 to 1 to describe negative or positive values of the properties.

Action tendencies also play an important role in the description of an emotion, because emotions have a strong influence on the motivational state of a subject: for example, anger is linked to the tendency to attack, while fear is linked to the tendency to flee or freeze. Action tendencies can be viewed as a link between the outcome of an appraisal process and actual actions. The model by Frijda [15] can be used to describe action tendencies through the ActionTendency subclass FrijdaActionTendency and its properties approach, avoid, being-with, attending, rejecting, non-attending, agonistic, interrupting, dominating and submitting, with values in the range [0,1].

The description of emotions rarely deals with full-blown emotions; usually emotions are regulated during the emotion generation process, and for this HEO uses the hasRegulation property.
A description of the regulation process can be supplied by describing how far the emotion is genuine, masked or simulated, through the class Regulation and its properties genuine, masked and simulated, with values in the range [0,1].

Emotions can also be expressed through different channels, such as face, voice, gesture and text. For this purpose HEO introduces the hasModality property and the subclasses Face, Voice, Gesture and Text of the Modality class. These subclasses can be further refined by introducing specific ontologies.

An important parameter to include in the annotation is the confidence of the annotation itself. This parameter should be associated separately with each descriptor of emotions and related states. This can be done by defining a superclass EmotionDescriptor for every class describing the emotion (Category, Dimension, Appraisal, etc.) with the hasConfidence property, whose values range over [0,1].

Important information should also be supplied about the person affected by the emotion. For this purpose we propose to reuse the FOAF (Friend Of A Friend) ontology (http://www.foaf-project.org/), an ontology commonly used to describe persons, their activities and their relations to other people and objects (figure 2). FOAF offers a wide set of descriptors for information related to persons, and in particular to their Web resources, such as firstName, familyName, title, gender, mbox (email) and weblog. These descriptors can be extended by adding to the main Person class other properties that are relevant for emotion annotation, such as age, culture, language, education and personality traits, and by defining the ObservedPerson subclass of the class Person, which can be connected with HEO’s Emotion class through the property affectPerson.

Fig. 2. Linking HEO to FOAF ontology

For the observed person, information should be supplied about the subject with whom he or she interacts (using the interactWith property of the ObservedPerson subclass), which could be another person or a device. In the former case it may be relevant to supply information about the kind of relationship between the persons, for example the degree of kinship, friendship or working relationship. In the latter case information about the device should be supplied.

Information can also be supplied about who is performing the annotation, using the isAnnotatedBy property of the Emotion class, in particular whether it is made by a human or a machine. In the former case, the descriptors added to the Person class may be relevant for analyzing how emotions are perceived differently by different persons, for example according to their culture. Information about the annotator’s experience should also be given. For this purpose we propose the definition of a HumanAnnotator subclass of the Person class, with the hasAnnotationExperience property taking values in the range [0,1].
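To make the descriptors of this section concrete, the following sketch encodes one annotation as triples and checks the value ranges stated in the text. It is only an illustration under explicit assumptions: the individual names (`emotion1`, `category1`, ...), the `type` and `label` properties, the flat triple encoding, and the pairing of equivalent property names (arousal with activation, valence with evaluation, dominance with power) are choices of this example, not definitions taken from HEO.

```python
# Sketch: an HEO-style annotation as triples, with range checks on values.
# Individual names, "type"/"label" and the helpers are hypothetical.

UNIT_RANGE = (0.0, 1.0)     # intensity, confidence, regulation values
SIGNED_RANGE = (-1.0, 1.0)  # Scherer appraisal properties

RANGES = {
    "hasIntensity": UNIT_RANGE,
    "hasConfidence": UNIT_RANGE,
    "genuine": UNIT_RANGE,
    "masked": UNIT_RANGE,
    "simulated": UNIT_RANGE,
    "novelty": SIGNED_RANGE,
    "copingPotential": SIGNED_RANGE,
}

# Assumed pairing of the equivalent dimension properties (see lead-in).
EQUIVALENT = {
    "hasActivation": "hasArousal",
    "hasEvaluation": "hasValence",
    "hasPower": "hasDominance",
}


def canonical(prop: str) -> str:
    """Map an equivalent property name onto a single canonical name."""
    return EQUIVALENT.get(prop, prop)


def invalid_statements(triples):
    """Return the triples whose numeric value falls outside its allowed range."""
    bad = []
    for subject, prop, value in triples:
        if prop in RANGES:
            low, high = RANGES[prop]
            if not (low <= value <= high):
                bad.append((subject, prop, value))
    return bad


annotation = [
    ("emotion1", "hasCategory", "category1"),
    ("category1", "type", "ArchetypalCategory"),
    ("category1", "label", "anger"),
    ("category1", "hasIntensity", 0.8),
    ("category1", "hasConfidence", 0.6),
    ("emotion1", "affectPerson", "person1"),
    ("emotion1", "hasRegulation", "regulation1"),
    ("regulation1", "simulated", 1.4),  # deliberately out of range
]

print(invalid_statements(annotation))
print(canonical("hasActivation"))
```

In an OWL implementation the range checks would more naturally be expressed as datatype restrictions in the ontology itself, so that a reasoner or validator can reject the out-of-range value.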

5 Conclusions and Future Efforts

In this paper we have presented the HEO ontology that we are currently developing, describing its main classes and properties. The present ontology will undergo a revision process by experts in the field of emotion annotation, and more classes and properties will be added. Further effort will also go into linking HEO with other ontologies, for example with the existing MPEG-7 ontologies, which could be used to encode information about the multimedia content in which the emotion occurs. We are currently surveying the existing software applications for multimedia annotation to identify their main features and requirements, before proceeding to the development of a web application for semantic multimedia annotation. Such an application should allow the annotation of generic knowledge domains by importing descriptors from ontology files, and should encode the annotations as RDF statements in order to allow intelligent management of the acquired information through advanced queries and data connections.


References

1. Scientific description of emotion - W3C Incubator Group Report, July 10 (2007), http://www.w3.org/2005/Incubator/emotion/XGR-emotion/#ScientificDescriptions
2. Antoniou, G., Van Harmelen, F.: A Semantic Web Primer. MIT Press, Cooperative Information Systems (2004)
3. Manola, F., Miller, E. (eds.): RDF Primer. W3C Recommendation, February 10 (2004), http://www.w3.org/TR/REC-rdf-syntax
4. Kompatsiaris, Y., Hobson, P. (eds.): Semantic Multimedia and Ontologies: Theory and Applications. Springer, Heidelberg (2008)
5. Brickley, D., Guha, R.V. (eds.): RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation, February 10 (2004), http://www.w3.org/TR/rdf-schema/
6. McGuinness, L., van Harmelen, F.: OWL Web Ontology Language Overview. W3C Recommendation, February 10 (2004), http://www.w3.org/TR/owl-features/
7. Baader, F., et al.: The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, Cambridge (2003)
8. Garcia-Rojas, A., Raouzaiou, A., Vexo, F., Karpouzis, K., Thalmann, D., Moccozet, L., Kollias, S.: Emotional face expression profiles supported by virtual human ontology. Computer Animation and Virtual Worlds Journal, 259–269 (2006)
9. Schröder, M., Pirker, H., Lamolle, M.: First suggestions for an emotion annotation and representation language. In: Proceedings of LREC 2006 Workshop on Corpora for Research on Emotion and Affect, Genoa, Italy, pp. 88–92 (2006)
10. Elements of an EmotionML 1.0 - W3C Incubator Group Report, November 20 (2008), http://www.w3.org/2005/Incubator/emotion/XGR-emotionml-20081120
11. Kipp, M.: Spatiotemporal Coding in ANVIL. In: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008 (2008)
12. Ekman, P.: The Face Revealed. Weidenfeld & Nicolson, London (2003)
13. Douglas-Cowie, E., et al.: HUMAINE deliverable D5g: Mid Term Report on Database Exemplar Progress (2006), http://emotion-research.net/deliverables/D5g%20final.pdf
14. Scherer, K.R., Shorr, A., Johnstone, T. (eds.): Appraisal processes in emotion: theory, methods, research. Oxford University Press, Canary (2001)
15. Frijda, N.: The Emotions. Cambridge University Press, Cambridge (1986)

Common Sense Computing: From the Society of Mind to Digital Intuition and beyond

Erik Cambria¹, Amir Hussain¹, Catherine Havasi², and Chris Eckl³

¹ Dept. of Computing Science and Maths, University of Stirling, Scotland, UK
² MIT Media Lab, MIT, Massachusetts, USA
³ Sitekit Labs, Sitekit Solutions Ltd, Scotland, UK
{eca,ahu}@cs.stir.ac.uk, [email protected], [email protected]
http://cs.stir.ac.uk/~eca/commonsense

Abstract. What is Common Sense Computing? And why is it so important for the technological evolution of humankind? This paper presents an overview of past, present and future efforts of the AI community to give computers the capacity for Common Sense reasoning, from Minsky’s Society of Mind to Media Laboratory’s Digital Intuition theory, and beyond. Is it actually possible to build a machine with Common Sense or is it just a utopia? This is the question this paper is trying to answer.

Keywords: AI, Semantic networks, NLP, Knowledge base management.

1 Introduction

“What magical trick makes us intelligent?”, Marvin Minsky wondered more than two decades ago. “The trick is that there is no trick. The power of intelligence stems from our vast diversity, not from any single, perfect principle” [1]. The human brain is in fact a very complex system, maybe the most complex in nature. The functions it performs are the product of thousands and thousands of different subsystems working together at the same time. Such a system is very hard to emulate: nowadays there are plenty of expert systems around, but none of them is actually intelligent; they just have the veneer of intelligence. The aim of Common Sense Computing is to teach computers the things we all know about the world and to give them the capacity for reasoning about these things.

2 The Importance of Common Sense

Communication is one of the most important aspects of human life. Communicating always has a cost in terms of energy and time, since information needs to be encoded, transmitted and decoded. This is why people, when communicating with each other, provide just the useful information and take the rest for granted. This ‘taken for granted’ information is what we call Common Sense: obvious things people normally know and usually leave unstated.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 252–259, 2009.
© Springer-Verlag Berlin Heidelberg 2009


We are not talking about the kind of knowledge you can find in Wikipedia but about all the basic relationships among words, concepts, phrases and thoughts that allow people to communicate with each other and face everyday life problems. It’s a kind of knowledge that sounds obvious and natural to us but is actually daedal and multifaceted: the illusion of simplicity comes from the fact that, as each new group of skills matures, we build more layers on top of them and tend to forget about the previous layers. Today computers lack this kind of knowledge. They do only what they are programmed to do: they have only one way to deal with a problem and, if something goes wrong, they get stuck. This is why nowadays we have programs that exceed the capabilities of world experts but none that is able to do what a three-year-old child can do. Machines can only do logical things, but meaning is an intuitive process: it can’t be reduced to zeros and ones. To help us work, computers must get to know what our jobs are. To entertain us, they need to know what we like. To take care of us, they have to know how we feel. To understand us, they must think as we think. We need to transmit our Common Sense knowledge of the world to computers because soon there won’t be enough human workers to perform the necessary tasks for our rapidly aging population. To face this AI emergency, we’ll have to give them physical knowledge of how objects behave, social knowledge of how people interact, sensory knowledge of how things look and taste, psychological knowledge about the way people think, and so on. But having a database of millions of Common Sense facts will not be enough: we’ll also have to teach computers how to handle this knowledge, retrieve it when necessary and learn from experience. In a word, we’ll have to give them the capacity for Common Sense reasoning.

3 The Birth of Common Sense Computing

It’s not easy to state exactly when Common Sense Computing was born. Before Minsky, many AI researchers had started to think about the implementation of a Common Sense reasoning machine: the very first was perhaps Alan Turing who, in 1950, first raised the question “can machines think?”. But he never managed to answer that question; he just provided a method to gauge artificial intelligence: the famous Turing test.

3.1 The Advice Taker

The notion of Common Sense in AI actually dates back to 1958, when John McCarthy proposed the ‘advice taker’ [2], a program meant to automatically deduce for itself a sufficiently wide class of immediate consequences of anything it was told and what it already knew. McCarthy stressed the importance of finding a proper method of representing expressions in the computer and developed the idea of creating, for each object, a property list recording the specific things people usually know about it. It was the first attempt to build a Common Sense knowledge base but, more importantly, it was the epiphany of the need for Common Sense in order to move forward in technological evolution.
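The two ideas in this passage, a property list per object and the derivation of immediate consequences from what the program has been told, can be illustrated with a toy sketch. It is not McCarthy's formalism: the facts, the single transitivity rule and the function names are invented for this example.

```python
# Toy sketch: property lists plus one step of deduction over told facts.
# Facts, rule and function names are hypothetical illustrations.

facts = {
    ("I", "at", "desk"),
    ("desk", "at", "home"),
    ("car", "at", "home"),
}


def property_list(obj, known):
    """Collect everything that is known about one object."""
    return {(prop, value) for (subject, prop, value) in known if subject == obj}


def immediate_consequences(known):
    """One deduction step: at(x, y) and at(y, z) together yield at(x, z)."""
    derived = set()
    for (x, p1, y) in known:
        for (y2, p2, z) in known:
            if p1 == p2 == "at" and y == y2:
                derived.add((x, "at", z))
    return derived - known  # keep only facts that are actually new


print(property_list("I", facts))
print(immediate_consequences(facts))
```

Nobody told the program that "I" is at home; that fact follows from the two statements it was given, which is the sense of "immediate consequence" used above.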

3.2 The Society of Mind Theory of Human Cognition

While McCarthy was more concerned with establishing logical and mathematical foundations for Common Sense reasoning, Minsky was more involved with theories of how we actually reason using pattern recognition and analogy. These theories were organized in 1986 with the publication of The Society of Mind, a masterpiece of AI literature containing an illuminating vision of how the human brain works. Minsky sees the mind as made of many little parts called ‘agents’, each mindless by itself but able to lead to true intelligence when working together. Groups of agents, called ‘agencies’, are responsible for performing some type of function, such as remembering, comparing, generalizing, exemplifying, analogizing, simplifying, predicting, and so on. The most common agents are the so-called ‘K-lines’, whose task is simply to activate other agents: this matters because agents are all highly interconnected, and activating a K-line can cause a significant cascade of effects. To Minsky, in fact, mental activity ultimately consists of turning individual agents on and off: at any time only some agents are active, and their combined activity constitutes the ‘total state’ of the mind. K-lines are a very simple but powerful mechanism, since they allow the mind to re-enter a particular configuration of agents that formed a useful society in a past situation: this is how we build and retrieve problem-solving strategies in our mind, and this is how we should develop problem-solving strategies in our programs.
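The K-line mechanism described above can be illustrated with a toy sketch in which a K-line is simply a stored set of agent names that is switched back on later. This is a deliberately simplified reading of the idea; the `Mind` class, its methods and the agent names are hypothetical.

```python
# Toy sketch of a K-line: a record of which agents were active when a problem
# was solved, used later to restore that partial state. Names are hypothetical.


class Mind:
    def __init__(self, agents):
        self.active = {name: False for name in agents}
        self.k_lines = {}

    def set_active(self, *names):
        for name in names:
            self.active[name] = True

    def reset(self):
        for name in self.active:
            self.active[name] = False

    def total_state(self):
        """The 'total state' is the set of agents currently switched on."""
        return {name for name, on in self.active.items() if on}

    def record_k_line(self, label):
        """Attach a new K-line to every agent that is active right now."""
        self.k_lines[label] = frozenset(self.total_state())

    def activate_k_line(self, label):
        """Switch on every agent attached to the K-line; others are untouched."""
        self.set_active(*self.k_lines[label])


mind = Mind(["compare", "predict", "remember", "simplify"])
mind.set_active("compare", "predict")   # configuration that solved a problem
mind.record_k_line("fix-bike")

mind.reset()
mind.set_active("remember")             # a different, later situation
mind.activate_k_line("fix-bike")        # the old configuration is re-entered
print(sorted(mind.total_state()))
```

Activating the K-line adds the remembered agents to whatever is already active, which is one simple way to model a partial rather than a complete restoration of a past mental state.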

4 Towards Programs with Common Sense

Minsky’s theory was welcomed with great enthusiasm by the AI community and gave birth to many attempts to build Common Sense knowledge bases and exploit them to develop intelligent systems, e.g. Cyc and WordNet.

4.1 The Cyc Project

Cyc [3] is one of the first attempts to assemble a massive knowledge base spanning human Common Sense knowledge. Started by Doug Lenat in 1984, this project utilizes knowledge engineers who handcraft assertions and place them into a logical framework using CycL, Cyc’s proprietary language. Cyc’s knowledge is represented redundantly at two levels: a frame language distinction (epistemological level), adopted for its efficiency, and a predicate calculus representation (heuristic level), needed for its expressive power to represent constraints. While the first level keeps a copy of the facts in the uniform user language, the second level keeps its own copy in different languages and data structures suitable to be manipulated by specialized inference engines. Knowledge in Cyc is also organized into ‘microtheories’, resembling Minsky’s agencies, each one with its own knowledge representation scheme and sets of assumptions.

4.2 WordNet

Begun in 1985 at Princeton University, WordNet [4] is a database of words, primarily nouns, verbs and adjectives. It has been one of the most widely used resources in computational linguistics and text analysis because of the ease of interfacing it with any kind of application and system. The smallest unit in WordNet is the word/sense pair, identified by a ‘sense key’. Word/sense pairs are linked by a small set of semantic relations such as synonyms, antonyms, is-a superclasses, and words connected by other relations such as part-of. Each synonym set, in particular, is called a synset: it represents a concept, often explained through a brief gloss, and is the basic building block for hierarchies and other conceptual structures in WordNet.
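The structures described above can be sketched with a small data model. This is not WordNet's actual file format or API: the sense keys such as `dog#1`, the glosses and the helper functions are made up for illustration.

```python
# Sketch of WordNet-like structures: synsets grouping word/sense pairs,
# linked by an is-a relation. Keys, glosses and helpers are hypothetical.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Synset:
    gloss: str                                          # brief explanation
    sense_keys: set = field(default_factory=set)        # word/sense pairs
    hypernym: Optional["Synset"] = None                 # is-a superclass


animal = Synset("a living organism that can move", {"animal#1"})
dog = Synset("a domesticated canine", {"dog#1", "domestic_dog#1"}, hypernym=animal)


def synonyms(sense_key, synsets):
    """Other word/sense pairs that share a synset with the given sense key."""
    for synset in synsets:
        if sense_key in synset.sense_keys:
            return synset.sense_keys - {sense_key}
    return set()


def is_a_chain(synset):
    """Glosses of all is-a ancestors, nearest first."""
    chain = []
    while synset.hypernym is not None:
        synset = synset.hypernym
        chain.append(synset.gloss)
    return chain


print(synonyms("dog#1", [animal, dog]))
print(is_a_chain(dog))
```

Keeping the sense, rather than the bare word, as the smallest unit is what lets one spelling belong to several synsets without ambiguity.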

5 From Logic Based to Common Sense Reasoning

Logic-based reasoning can solve some problems in computer programming. However, most real-world problems need methods that are better at making decisions based on previous experience with examples, or at generalizing from types of explanations that have worked well on similar problems in the past. In building intelligent systems we have to try to reproduce our way of thinking: we turn ideas around in our mind to examine them from different perspectives until we find one that works for us. Since computers appeared, our approach to solving a problem has always consisted of first looking for the best way to represent the problem, then looking for the best way to represent the knowledge needed to solve it and finally looking for the best procedure for solving it. This problem-solving approach is good when we have to deal with a specific problem, but there’s something basically wrong with it: it leads us to write only specialized programs that can solve only that kind of problem.

5.1 The Open Mind Common Sense Project

Born from an idea of David Stork, the Open Mind Common Sense (OMCS) project [5] is a kind of second-generation Common Sense database: knowledge is represented in natural language, rather than using a formal logical structure, and information is not handcrafted by expert engineers but spontaneously inserted by online volunteers. The reason why Lenat decided to develop an ad-hoc language for Cyc is that vagueness and ambiguity pervade English, while computer reasoning systems generally require knowledge to be expressed accurately and precisely. But, as expressed in The Society of Mind, ambiguity is unavoidable when trying to represent the Common Sense world. No single argument is always completely reliable, but if we combine multiple types of arguments we can improve the robustness of reasoning, just as we can improve a table’s stability by providing it with many small legs in place of just one very big leg. This way information is not only more reliable but also stronger: if a piece of information is lost we can still access the whole meaning, exactly as the table keeps standing if we cut off one of the small legs. Diversity is in fact the key to the success of the OMCS project: the problem is not choosing one representation instead of another but finding a way for them to work together in one system.

5.2 Acquiring Common Sense by Analogy

In 2003 Timothy Chklovski introduced the cumulative analogy method [6]: a class of analogy-based reasoning algorithms that leverage existing knowledge to pose knowledge acquisition questions to the volunteer contributors. Chklovski’s Learner first determines which other topics are similar to the topic the user is currently inserting knowledge for, then it uses cumulative analogy to generate and present new specific questions about this topic. Because each statement consists of an object and a property, the entire knowledge repository can be visualized as a large matrix, with every known object of some statement being a row and every known property being a column. Cumulative analogy is performed by first selecting a set of nearest neighbors of the treated concept, in terms of similarity, and then projecting the properties known for this set onto the still unknown properties of the concept and presenting them as questions. The replies to the knowledge acquisition questions formulated by analogy are immediately added to the knowledge repository, affecting the similarity calculations.

5.3 ConceptNet

In 2004 Hugo Liu and Push Singh refined the ideas of the OMCS project in ConceptNet [7], a semantic resource structurally similar to WordNet, but whose scope of contents is general world knowledge, in the same vein as Cyc. While WordNet is optimised for lexical categorisation and Cyc is optimised for formalised logical reasoning, ConceptNet is optimised for making practical context-based inferences over real-world texts. In ConceptNet, WordNet’s notion of a node in the semantic network is extended from purely lexical items (words and simple phrases with atomic meaning) to include higher-order compound concepts, e.g. ‘satisfy hunger’ or ‘follow recipe’, to represent knowledge around a greater range of concepts found in everyday life. Most of the facts interrelating ConceptNet’s semantic network are in fact dedicated to making rather generic connections between concepts. This type of knowledge can be traced back to Minsky’s K-lines, as it increases the connectivity of the semantic network and makes it more likely that concepts parsed out of a text document can be mapped into ConceptNet. In ConceptNet version 2.0 a new system for weighting knowledge was implemented, which scores each binary assertion based on how many times it was uttered in the OMCS corpus and on how well it can be inferred indirectly from other facts. In ConceptNet version 3.0 [8] users can also participate in the process of refining knowledge by evaluating existing statements.

5.4 Digital Intuition

The best way to solve a problem is to already know a solution for it, but if we have to face a problem we have never met before we need to use our ‘intuition’. Intuition can be explained as the process of making analogies between the current problem and the ones solved in the past to find a suitable solution. Minsky attributes this property to the so-called ‘difference-engines’, a particular kind of agent that recognizes differences between the current state and the desired state and acts to reduce each difference by invoking K-lines that turn on suitable solution methods. To emulate this ‘reasoning by analogy’ we use AnalogySpace [9], a process which generalizes Chklovski’s cumulative analogy. In this process, ConceptNet is first mapped into a sparse matrix and then truncated Singular Value Decomposition (SVD) is applied to it to reduce its dimensionality and capture the most important correlations. The entries in the matrix are numbers representing the reliability of the assertions, and their magnitude increases logarithmically with the confidence score. Applying SVD to this matrix causes it to describe other features that could apply to known concepts by analogy: if a concept in the matrix has no value specified for a feature owned by many similar concepts, then by analogy the concept is likely to have that feature as well. This process is naturally extended by the ‘blending’ technique [10], a new method to perform inference over multiple sources of data simultaneously, taking advantage of the overlap between them. This enables Common Sense to be used as a basis for inference in a wide variety of systems and applications so that they can achieve Digital Intuition about their own data.
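The inference-by-analogy step can be illustrated with a toy example. The sketch below is not the AnalogySpace implementation: it assumes a tiny hand-made concept/feature matrix with 0/1 entries (rather than log-scaled confidence scores), and it computes the truncated SVD with a simple power iteration written only for this illustration. All concept names, feature names and function names are hypothetical.

```python
import math

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def top_singular_triplet(A, iters=500):
    """Leading singular value and vectors of A via power iteration on A^T A."""
    At = transpose(A)
    v = [1.0] * len(A[0])
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av] if sigma > 0 else Av
    return sigma, u, v

def truncated_svd_reconstruct(A, k):
    """Rank-k approximation of A, built one singular triplet at a time (deflation)."""
    residual = [row[:] for row in A]
    approx = [[0.0] * len(A[0]) for _ in A]
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        for i in range(len(A)):
            for j in range(len(A[0])):
                approx[i][j] += sigma * u[i] * v[j]
                residual[i][j] -= sigma * u[i] * v[j]
    return approx

# Hypothetical toy knowledge base: rows are concepts, columns are features.
concepts = ["dog", "cat", "hamster", "car", "truck"]
features = ["has fur", "is a pet", "has wheels", "is a vehicle"]
matrix = [
    [1, 1, 0, 0],  # dog
    [1, 1, 0, 0],  # cat
    [1, 0, 0, 0],  # hamster: "is a pet" was never asserted
    [0, 0, 1, 1],  # car
    [0, 0, 1, 1],  # truck
]
predicted = truncated_svd_reconstruct(matrix, k=2)
```

In the rank-2 reconstruction the unasserted entry (hamster, "is a pet") becomes clearly positive, because hamster shares "has fur" with dog and cat, while (hamster, "has wheels") stays at zero. This is the sense in which dimensionality reduction fills in features "by analogy".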

6 Applications of Common Sense Computing

We are involved in an EPSRC project whose main aim is to further develop and apply the above-mentioned technologies in the field of Common Sense Computing to build a novel intelligent software engine that can auto-categorise documents, and hence enable the development of future semantic web applications whose design and content can dynamically adapt to the user.

6.1 Enhancing the Knowledge Base

The key to performing Common Sense reasoning is to find a good trade-off for representing knowledge: since in life no two situations are ever the same, no representation should be too concrete, or it will not apply to new situations, but, at the same time, no representation should be too abstract, or it will suppress too many details. ConceptNet already supports different representations by maintaining different ways of conveying the same idea with redundant concepts, but we plan to enhance this multiple representation by connecting ConceptNet with dereferenceable URIs and RDF statements to enlarge the Common Sense knowledge base on a different level. We also plan to improve ConceptNet by giving a geospatial reference to all those pieces of knowledge that are likely to have one, and hence make it suitable for exploitation by geographically oriented applications. ‘Knowledge retrieval’ is one of the main strengths of ConceptNet: we plan to improve it by developing games to train the semantic network and by drawing on social networking. We also plan to improve ConceptNet on the level of what Minsky calls ‘self-reflection’, i.e. on the capability of reflecting on its internal structure and cognitive processes. ConceptNet in fact currently focuses on the kinds of knowledge that might populate the A-brain of a Society of Mind: it knows a great deal about the kinds of objects, events and other entities that exist in the external world, but it knows far less about how to learn, reason and reflect. We plan to give the semantic network the ability to self-check its consistency, e.g. by looking for words that appear together in ways that are statistically implausible and asking users for verification, or by keeping track of successful learning strategies. The system should also remember attempts that led to particularly negative conclusions in order to avoid unproductive strategies in the future. To this end we plan to improve the ‘negative expertise’ of the semantic network, which is now just partially implemented by asking users to judge inferences, in order to give the system the capability to learn from its mistakes.

6.2 Understanding the Knowledge Base

Whenever we try to solve a problem, we continuously and almost instantly switch between different points of view in our mind to find a solution. Minsky argues that our brains may use special machinery, which he calls ‘panalogy’ (parallel analogy), that links corresponding aspects of each view to the same ‘slot’ in a larger-scale structure that is shared across several different realms. ConceptNet’s current knowledge representation does not yet allow us to think about how to implement such a strategy, but once we manage to give ConceptNet a multiple representation as planned, we will have to start thinking about it. At the moment SVD-based methods are used on the graph structure of ConceptNet to build the vector space representation of the Common Sense knowledge. Principal Component Analysis (PCA) is an optimal way to project data in the mean-square sense, but the eigenvalue decomposition of the data covariance matrix is very expensive to compute. Therefore we plan to explore new methods such as Sparse PCA, a technique that formulates PCA as a regression-type optimization problem, and Random Projection, a method less reliable than PCA but computationally cheaper.

6.3 Exploiting the Knowledge Base

Our primary goal is to build an intelligent auto-categorization tool which uses Common Sense Computing, together with statistical methods, to make document classification more accurate and reliable. We also plan to apply Common Sense Computing to the development of emotion-sensitive systems. By analysing users’ Facebook personal messages, emails, blogs, etc., the engine will be able to extract users’ emotions and attitudes and use this information to interact with them better. In the sphere of e-games, for example, such an engine could be employed to implement conversational agents capable of reacting to changes in the user’s frame of mind and thus enhance players’ level of immersion. In the field of enterprise 2.0 the engine could be used to develop customer care applications capable of measuring users’ level of satisfaction. In the field of e-health, finally, PDA-microblogging techniques could be applied to capture clinical information from patients by studying their chat excerpts, tweets or SMS, and an e-psychologist could be developed to provide help for light psychological problems.

7 Conclusions

It is hard to measure the total extent of a person’s Common Sense knowledge, but a machine that does humanlike reasoning might only need a few dozen million items of knowledge. Thus we would be tempted to give a positive answer to the question “is it actually possible to build a machine with Common Sense?”. But, as we saw in this paper, Common Sense Computing is not just about collecting Common Sense knowledge: it is about how we represent it and how we use it to make inferences. We have made very good progress in this since McCarthy’s ‘advice taker’, but we are actually still scratching the surface of human intelligence. So we cannot give a concrete answer to that question, not yet. The road to the creation of a machine with the capacity for Common Sense reasoning is still long and tough, but we feel that the path undertaken so far is a good one. And, even if we fail in making machines intelligent, we believe we will be able to at least teach them who we are and thus make them able to better contribute to the technological evolution of humankind.

References

1. Minsky, M.: The Society of Mind. Simon and Schuster (1986)
2. McCarthy, J.: Programs with Common Sense. In: Proceedings of the Teddington Conference on the Mechanization of Thought Processes (1959)
3. Lenat, D.: Cyc: toward programs with common sense. ACM Press, New York (1990)
4. Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
5. Singh, P.: The Open Mind Common Sense Project. KurzweilAI.net (2002)
6. Chklovski, T.: Learner: a system for acquiring commonsense knowledge by analogy. K-CAP (2003)
7. Liu, H., Singh, P.: ConceptNet: a practical commonsense reasoning toolkit. BT Technology Journal (2004)
8. Havasi, C., Speer, R., Alonso, J.: ConceptNet 3: a Flexible, Multilingual Semantic Network for Common Sense Knowledge. RANLP (2007)
9. Speer, R., Havasi, C., Lieberman, H.: AnalogySpace: Reducing the Dimensionality of Common Sense Knowledge. AAAI (2008)
10. Havasi, C., Speer, R., Pustejovsky, J., Lieberman, H.: Digital Intuition: Applying Common Sense Using Dimensionality Reduction. IEEE Intelligent Systems (2009)

On Development of Inspection System for Biometric Passports Using Java

Luis Terán and Andrzej Drygajlo

Speech Processing and Biometrics Group, Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland
{luis.terantamayo,andrzej.drygajlo}@epfl.ch
http://scgwww.epfl.ch/

Abstract. Currently it is possible to implement Biometric Passport applets according to ICAO specifications. In this paper, an ePassport Java Card applet, according to ICAO specifications using the Basic Access Control security, is developed. A system for inspection of the ePassport applet, using Java, in order to test its functionalities and capabilities is also implemented. The simulators, which are developed in this paper, can display the communication between the inspection system and the Java Cards, which could be real or emulated cards.

Keywords: Biometrics, ePassport, Java Card, Inspection System, Basic Access Control.

1 Introduction

Over the last two years, Biometric Passports have been introduced in many countries to improve the security in Inspection Systems and enhance procedures and systems that prevent identity and passport fraud. Along with the deployment of new technologies, countries need to test and evaluate their systems, since the International Civil Aviation Organization (ICAO) provides the guidelines, but the implementation is up to each issuing country. The specific choice of each country as to which security features to include or not include makes a major difference in the level of security and privacy protection available.

Table 1. ePassport Deployments

Country      RFID Type  Deployment  Security                             Biometric
Italy        14443      2006        Passive, Active Authentication, BAC  Photo
U.S.         14443      2005        Passive, Active Authentication, BAC  Photo
Netherlands  14443      2005        Passive, Active Authentication, BAC  Photo
Germany      14443      2005        Passive, Active Authentication, BAC  Photo

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 260–267, 2009. © Springer-Verlag Berlin Heidelberg 2009


In this paper we focus on the development of an Inspection System for Biometric Passports. The standards and specifications of Machine Readable Travel Documents (MRTDs) are described in Section 2. The technologies used for the implementation of the inspection system for Biometric Passports, together with a description of the implementation and its use, are described in Section 3. The conclusions are drawn in Section 4. Table 1 shows some examples of ePassport implementations and the security features selected.

2 ICAO Specifications

The Machine Readable Travel Document (MRTD) is an international travel document which contains machine-readable data of the travel document holder. MRTDs facilitate the identification of travelers and enhance security levels. MRTDs are developed with the assistance of the ICAO’s Technical Advisory Group for Machine Readable Travel Documents (ICAO TAG/MRTD) and the ISO Working Group 3 (JTC1/SC17/WG3). For MRTDs such as passports, visas or other travel documents, the following key issues have to be considered: Global Interoperability, Uniformity, Technical Reliability, Practicality, and Durability. ICAO has elaborated different technical reports. The main technical reports, which are part of the ICAO specifications and are considered in this paper, are the Logical Data Structure Technical Report and the Public Key Infrastructure Technical Report.

2.1 Logical Data Structure

The Logical Data Structure (LDS) technical report specifies a global and interoperable data structure for recording identity details including biometric data. The data stored in the LDS is read electronically, and the structure is designed to be flexible and expandable for future needs. A series of mandatory and optional elements has been defined for the LDS, which are used in MRTDs. The use of biometric data is optional, except for the encoded face. Each data group is stored in a different Elementary File (EF). The structure and coding of data objects are defined in ISO/IEC 7816-4 [1]. Each data object is encoded using Tag - Length - Value (TLV). Any data object is denoted {T - L - V}, with a tag field followed by a length field encoding a number which represents the size of the value field. If the size is equal to zero, the value field is absent. A constructed data object is denoted {T - L - {T1 - L1 - V1}...{Tn - Ln - Vn}}, which represents a concatenation and interweaving of data objects. This type of structure is used in Data Groups containing more than one value field, each of which is preceded by a specific Tag and Length field. Figure 1 shows the complete structure of the LDS, which includes the mandatory and optional data elements defined for LDS (version 1.7).
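To make the {T - L - V} notation concrete, the following is a minimal sketch of a BER-TLV reader in the style of ISO/IEC 7816-4. It is an illustration only, not the parser used by the inspection system in this paper: the class and function names are hypothetical, indefinite lengths are not handled, and the sample bytes are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TLV:
    tag: int                      # tag as an integer, e.g. 0x61 or 0x5F01
    value: bytes                  # raw value field (empty if length is zero)
    children: List["TLV"] = field(default_factory=list)

def parse_tlv(data: bytes) -> List[TLV]:
    """Parse a concatenation of BER-TLV data objects (definite lengths only)."""
    objects = []
    i = 0
    while i < len(data):
        first = data[i]
        tag = first
        i += 1
        if first & 0x1F == 0x1F:          # low five bits all set: multi-byte tag
            while True:
                b = data[i]
                tag = (tag << 8) | b
                i += 1
                if not b & 0x80:          # high bit clear marks the last tag byte
                    break
        length = data[i]
        i += 1
        if length & 0x80:                 # long form: next n bytes hold the length
            n = length & 0x7F
            length = int.from_bytes(data[i:i + n], "big")
            i += n
        value = data[i:i + length]
        if len(value) != length:
            raise ValueError("truncated value field")
        i += length
        # Bit 6 of the first tag byte marks a constructed object {T - L - {T1..}{Tn..}}
        children = parse_tlv(value) if first & 0x20 else []
        objects.append(TLV(tag, value, children))
    return objects

# Made-up sample: constructed object 0x61 holding two primitive objects.
sample = bytes.fromhex("61085F0102AABB5C0161")
parsed = parse_tlv(sample)
```

Here `parsed[0]` is the constructed object with tag 0x61, and its `children` are the nested primitive objects with tags 0x5F01 and 0x5C.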

Fig. 1. Logical Data Structure

2.2 Public Key Infrastructure

The aim of the Public Key Infrastructure (PKI) scheme is to enable MRTD-inspecting authorities (Receiving States) to verify the authenticity and integrity of the data stored in the MRTD. ICAO has specified two types of authentication: Passive and Active Authentication. The main threats to a contactless IC chip, compared with a traditional contact chip, are that the information stored in a contactless chip could be read without opening the document, and that an unencrypted communication between a chip and a reader could be eavesdropped. The use of Access Control is optional. If it is implemented, an inspection system must prove that it has access to the IC chip. The ICAO technical report PKI for Machine Readable Travel Documents offering ICC Read-Only Access [4] provides specifications of Basic Access Control and Secure Messaging. Basic Access Control consists of three steps, as described next.

Derivation of Document Basic Access Keys (KENC and KMAC)

In order to get access to an IC chip and set up a Secure Channel, an inspection system derives the Document Basic Access Keys (KENC and KMAC) from the MRZ as follows:

- The inspection system reads the MRZ.
- A field called MRZ_information consists of a concatenation of the fields document number, date of birth, and date of expiry, as shown next:

LINE 1: P<USAAMOSS<<FRANK<<<<<<<<<<<<<<<<<<<<<<<<<<
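The key derivation can be sketched as follows. The details below are not given in the text above and are taken as assumptions from ICAO Doc 9303: each of the three MRZ fields is followed by its check digit (weights 7, 3, 1), the key seed is the first 16 bytes of the SHA-1 hash of MRZ_information, and each key is the first 16 bytes of SHA-1(seed || 32-bit counter) with DES parity bits adjusted (counter 1 for KENC, 2 for KMAC). All function names are hypothetical, and the field values are the sample ones commonly used in the ICAO worked example, not the MRZ shown above.

```python
import hashlib

def check_digit(text: str) -> int:
    """MRZ check digit: weights 7, 3, 1 repeating; '<' counts 0, A..Z count 10..35."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(text):
        if ch.isdigit():
            value = int(ch)
        elif ch == "<":
            value = 0
        else:
            value = ord(ch.upper()) - ord("A") + 10
        total += value * weights[i % 3]
    return total % 10

def mrz_information(doc_number: str, birth: str, expiry: str) -> str:
    """Concatenate the three fields, each followed by its check digit (assumed)."""
    return "".join(f + str(check_digit(f)) for f in (doc_number, birth, expiry))

def _adjust_parity(key: bytes) -> bytes:
    """Set the lowest bit of every byte so that each byte has odd parity (DES keys)."""
    out = bytearray()
    for b in key:
        upper = b & 0xFE
        ones = bin(upper).count("1")
        out.append(upper | (0 if ones % 2 else 1))
    return bytes(out)

def derive_key(k_seed: bytes, counter: int) -> bytes:
    """Assumed derivation: first 16 bytes of SHA-1(seed || counter), parity adjusted."""
    digest = hashlib.sha1(k_seed + counter.to_bytes(4, "big")).digest()
    return _adjust_parity(digest[:16])

def basic_access_keys(doc_number: str, birth: str, expiry: str):
    info = mrz_information(doc_number, birth, expiry).encode("ascii")
    k_seed = hashlib.sha1(info).digest()[:16]
    return derive_key(k_seed, 1), derive_key(k_seed, 2)   # (KENC, KMAC)

# Sample field values (document number padded with '<', dates as YYMMDD).
k_enc, k_mac = basic_access_keys("L898902C<", "690806", "940623")
```

Because both keys are computed only from data printed in the MRZ, a reader must have optical access to the opened document before it can talk to the chip, which is the purpose of Basic Access Control.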
0, a matching algorithm is δ-secure against the wolf attack if WAP < δ. Une, Otsuka, and Imai [2] showed that there exist strong wolves with extremely large p’s for typical matching algorithms of two modalities, fingerprint minutiae and finger-vein patterns (cf. [4], [5]). If we use only FAR as a security measure, then we cannot precisely evaluate the security against the wolf attack. Therefore we should use not only FAR but also WAP to evaluate the security against strong intentional impersonation attacks such as the wolf attack.

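3.3 The False Rejection Rate

The false rejection rate (FRR) is the probability that a user claiming a correct identity will be rejected, namely

\[
\mathrm{FRR} = \operatorname*{Ave}_{u \xleftarrow{\$} U} P[\mathrm{match}(u,u) = \mathrm{reject}]
= \operatorname*{Ave}_{u \xleftarrow{\$} U}\; \operatorname*{Ave}_{s \xleftarrow{\$} X_u} \bigl(1 - P_s(u)\bigr). \tag{4}
\]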

Y. Kojima et al.

4 Our Proposed Matching Algorithm

In this section, we construct a matching algorithm secure against the wolf attack, and show the accuracy and security of the constructed algorithm. We embed a wolf-judgement function WOLF in a traditional one-to-one matching algorithm. Our proposed matching algorithm prop-match is defined as follows. The enrollment phase of prop-match is the same as that of the traditional match. Let WOLF be a function which, for any feature vector s ∈ M and any user w ∈ U, outputs the number of valid users w′ such that w′ ≠ w and dec(s, t_w′) = sim, where t_w′ is the stored template of w′, namely WOLF(s, w) = |{w′ ∈ U | w′ ≠ w, dec(s, t_w′) = sim}|. First, a user v claims the identity id_w and presents his/her biometric sample b_v. For an input value s generated from b_v and the template t stored with id_w, if dec(s, t) = dissim, then prop-match outputs “reject”. Otherwise prop-match calculates WOLF(s, w). If WOLF(s, w) ≤ T − 1, then prop-match outputs “accept”; otherwise it outputs “reject”. The proposed algorithm prop-match thus rejects every sample that is similar to T or more templates other than the claimed one. Intuitively, prop-match recognizes such samples as wolf samples and rejects them. The verification phase of our proposed matching algorithm is described in Fig. 3.
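The verification rule just described can be sketched in a few lines. The sketch below follows the definition of WOLF and prop-match given above, but everything else is an assumption made for illustration: templates and samples are modelled as sets of feature identifiers, and the decision function `naive_dec` is a deliberately weak hypothetical matcher (template contained in sample) chosen so that a wolf sample exists.

```python
from typing import Callable, Dict, FrozenSet

Template = FrozenSet[int]

def naive_dec(sample: Template, template: Template) -> bool:
    """Hypothetical weak matcher: 'sim' if every template feature is in the sample."""
    return template <= sample

def wolf_count(sample: Template, claimed: str,
               templates: Dict[str, Template],
               dec: Callable[[Template, Template], bool]) -> int:
    """WOLF(s, w): number of users w' != w whose template is judged similar to s."""
    return sum(1 for user, t in templates.items()
               if user != claimed and dec(sample, t))

def prop_match(sample: Template, claimed: str,
               templates: Dict[str, Template],
               dec: Callable[[Template, Template], bool],
               T: int) -> bool:
    """Accept only if s matches the claimed template and WOLF(s, w) <= T - 1."""
    if not dec(sample, templates[claimed]):
        return False                       # ordinary one-to-one rejection
    return wolf_count(sample, claimed, templates, dec) <= T - 1

# Hypothetical enrolled users.
templates = {
    "alice": frozenset({1, 2, 3}),
    "bob": frozenset({4, 5, 6}),
    "carol": frozenset({7, 8, 9}),
}
genuine = frozenset({1, 2, 3, 10})         # alice's fresh sample
wolf = frozenset(range(1, 11))             # matches every template under naive_dec

accept_genuine = prop_match(genuine, "alice", templates, naive_dec, T=1)
accept_wolf = prop_match(wolf, "alice", templates, naive_dec, T=1)
accept_impostor = prop_match(frozenset({4, 5, 6}), "alice", templates, naive_dec, T=1)
```

With T = 1 the genuine sample is accepted, the ordinary impostor is rejected by dec alone, and the wolf sample is rejected because it is also similar to two templates other than the claimed one. Note that computing WOLF compares the sample against every stored template, so the cost of verification grows with |U|.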

Fig. 3. Verification phase in our proposed algorithm. The user v ← U sends (b_v, id_w) to the system. The system takes (t, id_w) from the DB, generates s ← X_v from b_v, and outputs match(v, w) = accept if dec(s, t) = sim and WOLF(s, w) ≤ T − 1, and reject otherwise.

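4.1 The False Acceptance Rate for Prop-Match

Let \(\mathrm{FAR}^{\mathrm{prop}}\) denote the false acceptance rate of our proposed algorithm, which is calculated as follows:

\[
\mathrm{FAR}^{\mathrm{prop}} = \operatorname*{Ave}_{(u,v) \xleftarrow{\$} (U \times U)^{\times}}\; \operatorname*{Ave}_{s \xleftarrow{\$} X_u} P_s(v) \sum_{k=0}^{T-1} \sum_{\substack{I \subset U \setminus \{v\} \\ \# I = k}} \prod_{v' \in I} P_s(v') \prod_{v'' \in (U \setminus \{v\}) \setminus I} \bigl(1 - P_s(v'')\bigr). \tag{5}
\]

Here \(\prod_{x \in S} f_x\) denotes the product of the \(f_x\), \(x \in S\). From (2) and (5), it is clear that \(\mathrm{FAR}^{\mathrm{prop}} \le \mathrm{FAR}\), since

\[
\sum_{k=0}^{T-1} \sum_{\substack{I \subset U \setminus \{v\} \\ \# I = k}} \prod_{v' \in I} P_s(v') \prod_{v'' \in (U \setminus \{v\}) \setminus I} \bigl(1 - P_s(v'')\bigr) \le 1.
\]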

A Matching Algorithm Secure against the Wolf Attack

4.2 The Wolf Attack Probability for Prop-Match

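Let \(\mathrm{WAP}^{\mathrm{prop}}\) denote the wolf attack probability of our proposed algorithm, which is calculated as follows:

\[
\mathrm{WAP}^{\mathrm{prop}} = \max_{a \in A}\; \operatorname*{Ave}_{v \xleftarrow{\$} U}\; \operatorname*{Ave}_{s \xleftarrow{\$} X_a} P_s(v) \sum_{k=0}^{T-1} \sum_{\substack{I \subset U \setminus \{v\} \\ \# I = k}} \prod_{v' \in I} P_s(v') \prod_{v'' \in (U \setminus \{v\}) \setminus I} \bigl(1 - P_s(v'')\bigr). \tag{6}
\]

Theorem 1. Our proposed matching algorithm is \(\frac{T}{|U|}\)-secure against the wolf attack, namely we have

\[
\mathrm{WAP}^{\mathrm{prop}} \le \frac{T}{|U|}.
\]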

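Proof. From (6), we have

\[
\begin{aligned}
&\max_{a \in A}\; \operatorname*{Ave}_{v \xleftarrow{\$} U}\; \operatorname*{Ave}_{s \xleftarrow{\$} X_a} P_s(v) \sum_{k=0}^{T-1} \sum_{\substack{I \subset U \setminus \{v\} \\ \# I = k}} \prod_{v' \in I} P_s(v') \prod_{v'' \in (U \setminus \{v\}) \setminus I} \bigl(1 - P_s(v'')\bigr) \\
&\quad = \max_{a \in A} \frac{1}{|U|} \sum_{v \in U} \sum_{s \in M} P(X_a = s)\, P_s(v) \sum_{k=0}^{T-1} \sum_{\substack{I \subset U \setminus \{v\} \\ \# I = k}} \prod_{v' \in I} P_s(v') \prod_{v'' \in (U \setminus \{v\}) \setminus I} \bigl(1 - P_s(v'')\bigr) \\
&\quad = \max_{a \in A} \frac{1}{|U|} \sum_{s \in M} P(X_a = s) \sum_{k=0}^{T-1} (k+1) \sum_{\substack{I \subset U \\ \# I = k+1}} \prod_{v' \in I} P_s(v') \prod_{v'' \in U \setminus I} \bigl(1 - P_s(v'')\bigr) \\
&\quad \le \max_{a \in A} \frac{1}{|U|} \sum_{s \in M} P(X_a = s)\, T \sum_{k=0}^{T-1} \sum_{\substack{I \subset U \\ \# I = k+1}} \prod_{v' \in I} P_s(v') \prod_{v'' \in U \setminus I} \bigl(1 - P_s(v'')\bigr) \le \frac{T}{|U|}.
\end{aligned}
\tag{7}
\]

The result follows.

4.3 The False Rejection Rate for Prop-Match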

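Let \(\mathrm{FRR}^{\mathrm{prop}}\) denote the false rejection rate of our proposed algorithm, which is calculated as follows:

\[
\begin{aligned}
\mathrm{FRR}^{\mathrm{prop}} &= \operatorname*{Ave}_{u \xleftarrow{\$} U} P[\mathrm{match}(u,u) = \mathrm{reject}] \\
&= \operatorname*{Ave}_{u \xleftarrow{\$} U}\; \operatorname*{Ave}_{s \xleftarrow{\$} X_u} \Biggl( \{1 - P_s(u)\} + P_s(u) \sum_{k=T}^{|U|-1} \sum_{\substack{I \subset U \setminus \{u\} \\ \# I = k}} \prod_{v' \in I} P_s(v') \prod_{v'' \in (U \setminus \{u\}) \setminus I} \{1 - P_s(v'')\} \Biggr) \\
&= \mathrm{FRR} + \operatorname*{Ave}_{u \xleftarrow{\$} U}\; \operatorname*{Ave}_{s \xleftarrow{\$} X_u} P_s(u) \sum_{k=T}^{|U|-1} \sum_{\substack{I \subset U \setminus \{u\} \\ \# I = k}} \prod_{v' \in I} P_s(v') \prod_{v'' \in (U \setminus \{u\}) \setminus I} \{1 - P_s(v'')\}.
\end{aligned}
\tag{8}
\]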

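Let Y be a random variable representing the distribution of WOLF on \(\coprod_{u \in U} \{u\} \times X_u\), namely

\[
P(Y = k) = \operatorname*{Ave}_{u \xleftarrow{\$} U}\; \operatorname*{Ave}_{s \xleftarrow{\$} X_u} P[\mathrm{WOLF}(s,u) = k]
= \operatorname*{Ave}_{u \xleftarrow{\$} U}\; \operatorname*{Ave}_{s \xleftarrow{\$} X_u} \sum_{\substack{I \subset U \setminus \{u\} \\ \# I = k}} \prod_{v' \in I} P_s(v') \prod_{v'' \in (U \setminus \{u\}) \setminus I} \bigl(1 - P_s(v'')\bigr). \tag{9}
\]

From (8), we have

\[
\mathrm{FRR}^{\mathrm{prop}} \le \mathrm{FRR} + \sum_{k=T}^{|U|-1} P(Y = k) = \mathrm{FRR} + P(Y \ge T). \tag{10}
\]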

Theorem 2. Let µ and σ be the mean and the standard deviation of Y, respectively. If T = µ + aσ, then we have

\[
\mathrm{FRR}^{\mathrm{prop}} \le \mathrm{FRR} + \frac{1}{a^2}. \tag{11}
\]

Proof. From Chebyshev’s inequality, we have \(P(|Y - \mu| \ge a\sigma) \le \frac{1}{a^2}\). If Y ≥ T, then |Y − µ| ≥ aσ. Therefore we have

\[
P(Y \ge T) \le P(|Y - \mu| \ge a\sigma) \le \frac{1}{a^2}. \tag{12}
\]

From (10), the result immediately follows. □
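Theorems 1 and 2 together suggest a simple way to pick T in practice. The sketch below is one possible reading, not a procedure given by the authors: it assumes that µ and σ are estimated from WOLF counts observed on genuine verification attempts, rounds T up to an integer, and then reports the two bounds T/|U| (Theorem 1) and 1/a² (Theorem 2) for the effective a = (T − µ)/σ. The function name and the sample counts are hypothetical.

```python
import math
from statistics import mean, pstdev

def choose_threshold(wolf_counts, a, num_users):
    """Return (T, WAP bound, bound on the FRR increase) for T >= mu + a*sigma."""
    if a <= 0:
        raise ValueError("a must be positive")
    mu = mean(wolf_counts)
    sigma = pstdev(wolf_counts)
    # Round up so that T is an integer threshold with T >= mu + a*sigma.
    T = max(1, math.ceil(mu + a * sigma))
    if sigma > 0:
        a_eff = (T - mu) / sigma          # T = mu + a_eff * sigma exactly
        frr_increase = min(1.0, 1.0 / a_eff ** 2)
    else:
        # Degenerate case: Y is constant, so P(Y >= T) is 0 whenever T > mu.
        frr_increase = 0.0 if T > mu else 1.0
    return T, T / num_users, frr_increase

# Hypothetical WOLF counts measured on genuine attempts, with |U| = 1000.
counts = [0, 0, 1, 0, 2, 1, 0, 0]
T, wap_bound, frr_increase_bound = choose_threshold(counts, a=3, num_users=1000)
```

The two bounds pull in opposite directions: a larger a lowers the guaranteed FRR increase but raises T and therefore the WAP bound, so the choice of a is the accuracy/security trade-off referred to in the conclusions.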

5 Conclusions

By Theorem 1 and Theorem 2, if T is carefully chosen, then our proposed algorithm prop-match is not only secure but also accurate. We remark that our proposed algorithm does not need to calculate the entropy of the distributions during the verification phase. Therefore it can be more efficient than the algorithm proposed by Inuma, Otsuka, and Imai [1].

References

1. Inuma, M., Otsuka, A., Imai, H.: Theoretical Framework for Constructing Matching Algorithms in Biometric Authentication Systems. In: ICB, pp. 806–815 (2009)
2. Une, M., Otsuka, A., Imai, H.: Wolf Attack Probability: A Theoretical Security Measure in Biometric Authentication Systems. IEICE Transactions on Information and Systems 91(5), 1380–1389 (2008)
3. Matsumoto, T., Matsumoto, H., Yamada, K., Hoshino, S.: Impact of Artificial Gummy Fingers on Fingerprint Systems. In: Proceedings of SPIE, vol. 4677, pp. 275–289 (2002)
4. Ratha, N., Connell, J., Bolle, R.: Enhancing Security and Privacy in Biometrics-based Authentication Systems. IBM Systems Journal 40(3), 614–634 (2001)
5. Miura, N., Nagasaka, A., Miyatake, T.: Feature Extraction of Finger Vein Patterns Based on Iterative Line Tracking and Its Application to Personal Identification. Systems and Computers in Japan 35(7) (2004)

A Region-Based Iris Feature Extraction Method Based on 2D-Wavelet Transform

Nima Tajbakhsh (1), Khashayar Misaghian (2), and Naghmeh Mohammadi Bandari (3)

(1) Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, University of Tehran, Iran
(2) Biomedical Engineering Department, Iran University of Science and Technology, Tehran, Iran
(3) Biomedical Engineering Department, Amirkabir University of Technology, Tehran, Iran

Abstract. Despite significant progress made in iris recognition, handling noisy and degraded iris images is still an open problem and deserves further investigation. This paper proposes a feature extraction method to cope with degraded iris images. This method is founded on applying the 2D-wavelet transform on overlapped blocks of the iris texture. The proposed approach enables us to select the most informative wavelet coefficients providing both essential texture information and enough robustness against the degradation factors. Our experimental results on the UBIRIS database demonstrate the effectiveness of the proposed method, which achieves 4.10% FRR (at FAR = 0.01%) and 0.66% EER.

Keywords: Noisy iris recognition, Region-based image processing, 2D-wavelet transform.

1 Introduction

Security plays a crucial role in our societies. The need for a high level of security against terrorist attacks prompts governments to tighten security measures. Undoubtedly, employing biometric traits constitutes a fundamental part of governments’ efforts to provide national security. Among the proposed biometrics, the iris is known as the most accurate one and is broadly deployed in commercial recognition systems. Pioneering work on iris recognition, the basis of many commercial systems, was done by Daugman [1]. In this algorithm, 2D Gabor filters are adopted to extract orientation-based texture features corresponding to a given iris image. After Daugman’s work, several researchers [2-6] proposed their own feature extraction methods to achieve more compact codes and to accelerate the decision-making process. Although their efforts have led to great progress in terms of computational time and accuracy, iris-based recognition systems still suffer from a lack of acceptability. This mainly originates from the fact that subjects are reluctant to participate in the image acquisition process repeatedly until the system manages to record an ideal iris image. Accordingly, devising methods that are capable of handling low quality iris images can be considered an effective approach to increase the acceptability of iris-based recognition systems. In this paper, we propose a region-based feature extraction method based on the 2D-Discrete Wavelet Transform (2D-DWT) that aims at giving a general representation of the iris texture in a way that is less affected by degradation factors. The rest of the paper is organized as follows: Section 2 introduces related works from the literature. Section 3 presents the proposed feature extraction method. Experimental results are given in Section 4; finally, Section 5 concludes this paper.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 301–307, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Related Works

In this section, we focus on methods that are based on the wavelet transform; for more information about state-of-the-art methods, readers are referred to the comprehensive survey [7] conducted by Bowyer et al. In the iris recognition literature, the wavelet transform constitutes the basis of many well-known feature extraction methods. These methods can roughly be divided into two categories. The first one is comprised of methods utilizing the 1D-wavelet transform as the core of the feature extraction module [3, 5, 6, 8-12]. For instance, Boles and Boashash [3] apply the 1D-wavelet transform at various resolution levels of a virtual circle on an iris image, and Ma et al. [5, 6] utilize the wavelet transform to extract sharp variations of the intensity signals. Methods in the second category utilize the 2D-wavelet transform to extract the iris texture information [4, 13-18]. For instance, Lim et al. [4] use the 2D-wavelet decomposition and generate a feature vector for each given pattern comprised of the resulting coefficients from the HH4 sub-image and the average of the coefficients contained in the HH1, HH2, HH3 sub-images. A rather similar approach based on the 2D-wavelet decomposition is also proposed by Poursaberi and Araabi [18], in which the wavelet coefficients of the LH3, HL3 and LH4, HL4 sub-images are extracted and coded based on the signs of the coefficients. In both methods, global information of the iris texture is obtained by applying the 2D-DWT on the whole iris texture. However, analyzing the texture in this way provides no specific solution for local noisy regions of the texture and motion-blurred iris images. Furthermore, such a general texture representation cannot reveal region-based iris information, which plays a crucial role in the decision-making process. To compensate for the mentioned drawbacks, a possible solution is to apply the 2D-DWT on sub-regions of the iris, which provides more textural information and enables us to select the most discriminating coefficients with regard to the quality of the captured images.

3 Proposed Feature Extraction Method

In the proposed method, we make use of a region-based approach founded on the 2D-DWT for the following reasons:

• Region-based image decomposition gives abundant textural information and consequently yields a more accurate recognition system.
• Partitioning the iris texture splits noisy regions into several blocks, and this potentially reduces the adverse effects of local noisy regions to a minimum.
• The region-based approach permits us to benefit from local and global information of every block and thus provides both essential texture features and a high level of robustness against the degradation factors.
• Selecting just a few coefficients from a block facilitates registration between two iris samples and, as a result, the inherent similarity between images of a subject is better revealed.

The feature extraction method begins with partitioning the iris texture into blocks of 32x32 pixels with 50% overlap in both directions. Then, the 2D-wavelet decomposition is performed on every block. Through an optimization process on the training set, the six wavelet coefficients with the most discriminating power that are least affected by the degradation factors are selected. Positions of the selected coefficients in the resulting sub-images are depicted in Figure 1. By putting together the extracted values for each coefficient, a matrix is created. To encode the six generated matrices, we follow two coding strategies to achieve the best results on the training set. The first coding method is based on the signs of the extracted coefficients, and the second one is founded on the zero crossings of the first derivative of the matrices along the horizontal direction. We apply both coding strategies to encode the two matrices created for the vertical details at the fourth level, which results in four binary matrices. The remaining matrices, generated from the approximation coefficients of the fourth level of decomposition, are coded based on the second coding approach. At last, the eight binary matrices are concatenated and the final binary matrix corresponding to an iris pattern is produced.
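The block-wise pipeline described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the text does not name the wavelet, so a Haar wavelet with averaging normalisation is assumed; only one coefficient per block (a level-4 vertical-detail coefficient at a hypothetical position) is kept instead of the six selected by optimization; and only the sign-based coding is shown. Sub-band naming conventions vary between libraries, and all function names are hypothetical.

```python
def haar_1d(x):
    """One Haar analysis step: pairwise averages (low) and differences (high)."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return low, high

def _filter_columns(mat):
    """Apply haar_1d down every column and return (low, high) matrices."""
    lows, highs = [], []
    for col in zip(*mat):
        lo, hi = haar_1d(list(col))
        lows.append(lo)
        highs.append(hi)
    return [list(r) for r in zip(*lows)], [list(r) for r in zip(*highs)]

def haar_2d(block):
    """One 2D level: returns (approximation, horizontal, vertical, diagonal)."""
    row_low, row_high = [], []
    for row in block:
        lo, hi = haar_1d(row)
        row_low.append(lo)
        row_high.append(hi)
    approx, horiz = _filter_columns(row_low)   # low-pass along rows
    vert, diag = _filter_columns(row_high)     # high-pass along rows
    return approx, horiz, vert, diag

def decompose(block, levels=4):
    """Repeated 2D decomposition of the approximation sub-image."""
    approx, details = block, []
    for _ in range(levels):
        approx, h, v, d = haar_2d(approx)
        details.append((h, v, d))
    return approx, details

def overlapping_blocks(image, size=32, step=16):
    """Yield size x size blocks with 50% overlap (step = size / 2)."""
    for r in range(0, len(image) - size + 1, step):
        for c in range(0, len(image[0]) - size + 1, step):
            yield [row[c:c + size] for row in image[r:r + size]]

def sign_code(values):
    """Sign-based coding: 1 for non-negative coefficients, 0 otherwise."""
    return [1 if v >= 0 else 0 for v in values]

def encode(image):
    """One bit per block from a level-4 vertical-detail coefficient (assumed position)."""
    coeffs = []
    for block in overlapping_blocks(image):
        _, details = decompose(block, levels=4)
        coeffs.append(details[3][1][0][0])
    return sign_code(coeffs)

# Hypothetical normalized iris texture of 64 x 128 pixels (constant, for checking).
image = [[5.0] * 128 for _ in range(64)]
blocks = list(overlapping_blocks(image))
approx4, details4 = decompose(blocks[0], levels=4)
code = encode(image)
```

For a 32x32 block, four decomposition levels leave 2x2 sub-images, so each selected level-4 coefficient summarises a 16x16 region of the block; collecting the same coefficient over all blocks gives one of the matrices that is then binarised.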

4 Experiments

In this section, we first briefly describe the UBIRIS database that is used in our experiments; then the feature extraction methods against which we compare our results are introduced; and finally, the experimental results are presented.

4.1 UBIRIS Database

The UBIRIS database is composed of 1877 images from 241 European subjects captured in two different sessions. The images in the first session are gathered in a way that reduces the adverse effects of the degradation factors to a minimum, whereas the images captured in the second session have irregularities in reflection, contrast, natural luminosity, and focus. Figure 2 shows some samples of the UBIRIS database. The rationale behind choosing this database is to examine the effectiveness of our method in dealing with a variety of degradation factors that may occur as a result of relaxing constraints on subjects’ behavior.


Fig. 1. An overview of the proposed feature extraction method. In this figure, generating two binary matrices corresponding to a coefficient of vertical detail sub-image is depicted.

Fig. 2. Iris samples from the UBIRIS database

4.2 Methods Used for Comparison

To compare our approach with other methods, we use the feature extraction strategies suggested in [18, 19]. We select these methods because the wavelet-based method [18] yields results that are comparable with several state-of-the-art methods, and the method based on the Gauss-Laguerre filter [19], as a filter-based one that generates a binary matrix similar to the IrisCode [1], can be considered a Daugman-like algorithm. Furthermore, the corresponding authors of both papers provided us with all source code, thus permitting a fair comparison. During our experiments, the segmentation method is the same for all methods, no strategy is adopted for detecting the eyelids and eyelashes, and we simply discard the upper half of the iris to eliminate the eyelashes. Moreover, a few iris images suffer from nonlinear texture deformation because of mis-localization of the iris. We deliberately leave them unmodified and let them enter the feature extraction and matching modules. Although segmentation error can significantly increase the overlap between the inter- and intra-class distributions [20], this simulates what happens in practical applications and also permits us to evaluate the robustness of the proposed method and of those suggested in [18, 19] in dealing with texture deformation.

4.3 Results

We evaluate the performance of the proposed method in verification mode. To do this, we create the training set by choosing one high-quality and one low-quality iris image per subject and put all remaining iris images (eight images per subject) in the test set.

Fig. 3. Distribution of inter- and intra-class comparisons obtained through evaluating the proposed method on the UBIRIS database

Fig. 4. ROC plots of the methods suggested in [18, 19] and of the proposed one for the UBIRIS database


Table 1. Comparison between the error rates obtained from the proposed method and the algorithms suggested in [18, 19] for the UBIRIS database

Method            EER (%)   FR      FRR (%) (@ FAR=.01%)
Proposed          0.66      18.41   4.10
Poursaberi [18]   2.65      6.95    7.80
Ahmadi [19]       2.46      6.63    10.15

To compensate for rotation of the eye during the acquisition process, we store twelve additional iris codes for six rotations on either side, obtained by horizontal shifts of 2, 4, 6, 8, 10, and 12 pixels each way in the normalized images. Therefore, to measure the dissimilarity of two iris patterns, thirteen comparisons are made and the minimum distance is taken as the dissimilarity value. To form the inter- and intra-class distributions, the test samples are compared against the two training samples of each subject according to this shifting strategy and the minimum distance is chosen. Figure 3 shows the resulting distributions for inter- and intra-class comparisons. The outliers in the tail of the intra-class distribution arise when comparing two iris samples of a subject of which at least one seriously suffers from the degradation factors. In other words, despite the proposed course of action, some degraded iris images are still mistakenly rejected by the system. The receiver operating characteristic (ROC) curves of the proposed method and the other ones are depicted in Figure 4. As can be seen, the proposed method gives the highest performance. Table 1 enables a quantitative comparison between our method and the implemented ones. From Table 1, it is seen that our method achieves the lowest equal error rate (EER) and the lowest False Rejection Rate (FRR), while providing the maximum Fisher ratio (FR), which indicates greater separability of the inter- and intra-class distributions.
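The thirteen-comparison matching rule can be illustrated with the short sketch below. It rests on several assumptions that the text does not state: the distance between two binary codes is taken to be the normalized Hamming distance, the horizontal shift of the normalized image is modeled as a circular column shift, and `extract` stands for any feature extractor (a hypothetical placeholder, not the authors' implementation).

```python
# One unshifted code plus six shifts on either side: thirteen in total.
SHIFTS = [0] + [s * d for s in (2, 4, 6, 8, 10, 12) for d in (-1, 1)]


def shift_columns(image, s):
    """Circularly shift every row by s columns (positive s shifts right)."""
    return [row[-s % len(row):] + row[:-s % len(row)] for row in image]


def hamming(a, b):
    """Fraction of differing bits between two equally sized binary matrices."""
    total = sum(len(row) for row in a)
    diff = sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return diff / total


def enroll(normalized_image, extract):
    """Store one code per horizontal shift of the normalized iris image."""
    return [extract(shift_columns(normalized_image, s)) for s in SHIFTS]


def dissimilarity(probe_code, stored_codes):
    """Minimum distance over the thirteen stored codes."""
    return min(hamming(probe_code, code) for code in stored_codes)
```

Taking the minimum means that a probe rotated by one of the stored offsets is compared against a code that is aligned with it.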

5 Conclusion

This paper proposed a new feature extraction method to deal with iris images that suffer from local noisy regions and other degradation factors such as motion blur and lack of focus. On the understanding that region-based image processing makes it possible to handle noisy images, we utilized the 2D-DWT to obtain an informative regional representation of the iris. Analyzing the texture in this way enabled us to locate the most discriminating coefficients, which provide sufficient robustness against the degradation factors. Although selecting just a few reliable coefficients means missing some important details of the iris structures, this is the price we have to pay to achieve a robust representation. We evaluated the performance of the proposed approach using the UBIRIS database in verification mode. The experimental results confirmed the efficiency of our approach compared with the methods suggested in [18, 19]: an EER of 0.66% and an FRR of 4.10% (@ FAR=.01%) were obtained.


References

[1] Daugman, J.: High Confidence Visual Recognition of Persons by a Test of Statistical Independence. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(11), 1148–1161 (1993)
[2] Wildes, R.P.: Iris Recognition: An Emerging Biometric Technology, vol. 85, pp. 1348–1363. IEEE Press, Los Alamitos (1997)
[3] Boles, W.W., Boashash, B.: A Human Identification Technique Using Images of the Iris and Wavelet Transform. IEEE Transactions on Signal Processing 46(4), 1085–1088 (1998)
[4] Lim, S., Lee, K., Byeon, O., Kim, T.: Efficient Iris Recognition through Improvement of Feature Vector and Classifier. ETRI Journal 23(2), 61–70 (2001)
[5] Ma, L., Tan, T., Wang, Y., Zhang, D.: Efficient Iris Recognition by Characterizing Key Local Variations. IEEE Transactions on Image Processing 13(6), 739–750 (2004)
[6] Ma, L., Tan, T., Wang, Y., Zhang, D.: Local Intensity Variation Analysis for Iris Recognition. Pattern Recognition 37(6), 1287–1298 (2004)
[7] Bowyer, K.W., Hollingsworth, K., Flynn, P.J.: Image Understanding for Iris Biometrics: A Survey. Computer Vision and Image Understanding (2), 281–307 (2008)
[8] Chena, C., Chub, C.: High Performance Iris Recognition based on 1-D Circular Feature Extraction and PSO–PNN Classifier. Expert Systems with Applications (Article in press) doi:10.1016/j.eswa.2009.01.033
[9] Huang, H., Hu, G.: Iris Recognition Based on Adjustable Scale Wavelet Transform. In: 27th Annual International Conference of the Engineering in Medicine and Biology Society, Shanghai, pp. 7533–7536 (2005)
[10] Huang, P.S., Chiang, C., Liang, J.: Iris Recognition Using Fourier-Wavelet Features. In: 5th International Conference Audio- and Video-Based Biometric Person Authentication, Hilton Rye Town, pp. 14–22 (2005)
[11] Tajbakhsh, N., Araabi, B.N., Soltanian–Zadeh, H.: Noisy Iris Verification: A Modified Version of Local Intensity Variation Method. Accepted in 3rd IAPR/IEEE International Conference on Biometrics, Alghero (2009)
[12] Chena, C., Chub, C.: High performance iris recognition based on 1-D circular feature extraction and PSO–PNN classifier. Expert Systems with Applications (Article in press)
[13] Son, B., Kee, G., Byun, Y., Lee, Y.: Iris Recognition System Using Wavelet Packet and Support Vector Machines. In: 4th International Workshop on Information Security Applications, Jeju Island, pp. 365–379 (2003)
[14] Kim, J., Cho, S., Choi, J., Marks, R.J.: Iris Recognition Using Wavelet Features. The Journal of VLSI Signal Processing (38), 147–156 (2004)
[15] Cho, S., Kim, J.: Iris Recognition Using LVQ Neural Network. In: International conference on signals and electronic systems, Porzan, pp. 155–158 (2005)
[16] Alim, O.A., Sharkas, M.: Iris Recognition Using Discrete Wavelet Transform and Artificial Neural Networks. In: IEEE International Symposium on Micro-Nano Mechatronics and Human Science, Alexandria, pp. 337–340 (2005)
[17] Narote, S.P., Narote, A.S., Waghmare, L.M., Kokare, M.B., Gaikwad, A.N.: An Iris Recognition Based on Dual Tree Complex Wavelet Transform. In: IEEE TENCON 2007, pp. 1–4 (2007)
[18] Poursaberi, A., Araabi, B.N.: Iris Recognition for Partially Occluded Images: Methodology and Sensitivity Analysis. EURASIP Journal on Advances in Signal Processing 2007(1), 12 pages (2007)
[19] Ahmadi, H., Pousaberi, A., Azizzadeh, A., Kamarei, M.: An Efficient Iris Coding Based on Gauss–Laguerre Wavelets. In: second IAPR/IEEE International Conference on Biometrics, Seoul, pp. 917–926 (2007)
[20] Proença, H., Alexandre, L.A.: A Method for the Identification of Inaccuracies in the Pupil Segmentation. In: First International Conference on Availability, Reliability, and Security, Vienna, pp. 227–230 (2006)

A Novel Contourlet Based Online Fingerprint Identification

Omer Saeed, Atif Bin Mansoor, and M Asif Afzal Butt
National University of Sciences and Technology, Pakistan
[email protected]

Abstract. Biometric based personal identification is regarded as an effective method for automatically recognizing an individual's identity. As a method for preserving the security of sensitive information, biometrics has been applied in various fields over the last few decades. In our work, we present a novel core based global matching approach for fingerprint matching using the Contourlet Transform. The core and delta points, along with the ridge and valley orientations, carry strong directional information. This directionality has been exploited as the feature and considered for matching. The obtained ROI is analyzed for its textures using the Contourlet transform, which divides the 2-D spectrum into fine slices by employing Directional Filter Banks (DFBs). Distinct features are then extracted at different resolutions by calculating directional energies for each sub-block from the decomposed subband outputs, and given to a Euclidean distance classifier. Finally, an adaptive majority vote algorithm is employed in order to further narrow down the matching criterion. The algorithm has been tested on a developed database of 126 individuals, enrolled with 8 templates each.

1 Introduction

Fingerprint identification is an important biometric technique for personal recognition. Fingerprints are graphical flow-like ridges and valleys on human fingers. Due to their uniqueness, fingerprints are the most widely used biometric modality today. Fingerprints vary in quality depending upon scanning conditions, sensor quality, surface of the sensor etc. A pre-processing stage is necessary in order to mitigate these effects. It may consist of employing image enhancement techniques like histogram equalization, adaptive histogram, deblurring, Fourier transform and 2D adaptive noise removal filtering etc. In local feature extraction based approaches, minutiae such as ridge endings and bifurcations are used to represent a fingerprint and a matching scheme is adopted. These approaches require very accurate minutia extraction; additionally, they contain limited information compared to the complete fingerprint. This paper investigates a holistic personal identification approach using the Contourlet Transform. It can further be combined with other biometric modalities so as to form a highly accurate and reliable biometric based personal identification system.

J. Fierrez et al. (Eds.): BioID MultiComm2009, LNCS 5707, pp. 308–317, 2009. © Springer-Verlag Berlin Heidelberg 2009

1.1 Algorithm Development

The approach followed for the development of fingerprint based identification system is shown in Figure 1.

Fig. 1. Algorithm Development

2 Image Acquisition

For our work, we have employed Digital Persona’s “U are U 4000-B” to capture fingerprint scans. It is a miniature USB-interfaced optical fingerprint reader that gives on-line scans. It has a resolution of 512 dpi (dots per inch). The scan capture area is 14.6 mm × 18.1 mm and the output is an 8-bit grayscale image. The image acquisition software is developed on the Java platform. The captured image is then displayed on the monitor. The acquired image contains unwanted reflections of the scanner at its base. These unnecessary rows, shown in Figure 2(a), are excluded, as shown in Figure 2(b), by discarding the rows at the base of the image. The image is then saved bit by bit in Windows bitmap (BMP) format. The acquired image has dimensions of 512 × 460 pixels. While giving scans, the users were guided to place their fingers on the scanner such that the vertical axis of the finger is aligned with the vertical axis of the scanner. Provisions have been made in the saving module to visually inspect the fingerprint scan prior to saving it for enrollment and generating templates. If the image quality is not satisfactory, the scan can be discarded, and a new scan is then taken and saved.

Fig. 2. (a) Unwanted reflections at the image’s base (b) Reflections removed

2.1 Pre-processing

Before an image can be further processed for reference point localization, a pre-processing stage is required. As the fingerprint images acquired from the fingerprint reader are not guaranteed to be of perfect quality, pre-processing aids in their enhancement; there is some loss of image integrity, but this trade-off is usually accepted. Fingerprint image enhancement makes the image clearer for the subsequent operations. In our work we have applied pre-processing by histogram equalization, adaptive thresholding, Fourier transform and adaptive binarization.

2.2 Histogram Equalization

Histogram equalization expands the pixel value distribution of an image so as to increase the perceptual information [1]. The original histogram of a fingerprint image is of the bimodal type. After histogram equalization, the histogram occupies the whole range from 0 to 255 and the visualization effect is enhanced.

2.3 Adaptive Thresholding

It is observed that the acquired image is illuminated unevenly. Under uneven illumination, the histogram cannot be easily segmented [2]. To counter this problem, we divided the complete image into sub-images and used a separate threshold to segment each sub-image. The threshold was selected based on the average intensity value of the respective sub-image.

2.4 Enhancement Using Fourier Transform

We divided the image into small processing blocks (32 by 32 pixels) and performed the Fourier transform. The image segments are thus transformed into the Fourier frequency domain. In order to enhance a specific block by its dominant frequencies (the frequencies with the greater number of components for a given threshold), we convolve the FFT of each segment with the respective dominant frequency of that block. Taking the inverse Fourier transform gives the enhanced image. In the resultant enhanced image, the falsely broken ridges are connected while some of the spurious connections are disconnected. Figure 3(a) depicts the acquired fingerprint and Figure 3(b) illustrates the enhanced image after taking the Fourier transform of the respective blocks.

Fig. 3. Enhanced Image (a) Acquired Image (b) Image after Fourier Enhancement

2.5 Adaptive Image Binarization

We performed fingerprint image binarization in order to transform the 8-bit grayscale image into a 1-bit image with value zero for ridges and value one for valleys. After the operation, the ridges in the fingerprint are highlighted in black while the valleys are white. We employed a local adaptive binarization method. The procedure is the same as that of the adaptive thresholding used for segmentation, except that, after segmentation, a pixel is assigned the value one if its value is larger than the mean intensity value of its block, and zero otherwise.
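The local adaptive binarization rule can be sketched as below. This is a minimal sketch, assuming the image is a list of rows of gray levels; the function name `binarize` and the default block size of 16 pixels are illustrative assumptions, since the text does not give the sub-image size used for thresholding.

```python
def binarize(image, block=16):
    """Threshold each block at its own mean intensity.

    Pixels brighter than the block mean become 1 (valley),
    all other pixels become 0 (ridge)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for top in range(0, h, block):
        for left in range(0, w, block):
            rows = range(top, min(top + block, h))
            cols = range(left, min(left + block, w))
            mean = sum(image[r][c] for r in rows for c in cols) / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = 1 if image[r][c] > mean else 0
    return out
```

Because every block uses its own mean, a dark region and a bright region of the same print are both split into ridges and valleys, which a single global threshold would not achieve under uneven illumination.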

3 Reference Point Localization

In order to extract a region of interest, there has to be a reference point or a characteristic feature which can be located every time a fingerprint image is processed by the algorithm. This is called reference point localization or singularity detection. Since a global feature extraction technique is adopted in our work, the core point is the reference point. The core point is the point on the innermost ridge that has the maximum curvature.

3.1 Ridge Flow Estimation

Finally, the enhanced image is segmented into non-overlapping blocks of size 16 × 16 pixels. In each of the segmented blocks, the gradients are calculated first in the x direction (g_x) and then in the y direction (g_y), where g_x are values for cosine and g_y are values for sine [3]. A least-squares approximation value is then calculated for each block. An average value of the gradients is calculated and a threshold is set. Once the ridge orientation estimation for each block is complete, the insignificant ridges are rejected. The point for which the value of this sum is minimum is the point on the innermost ridge, designated as the core point.

4 Region of Interest (ROI) Extraction

Once the core point is localized, we extract the Region of Interest (ROI). The size of the region of interest is of considerable importance, as it is this area from which the features are extracted and considered for subsequent matching. The region around the core point contains greater variations and directional information, so an area of 128 × 128 pixels is cropped around it. There are certain issues pertaining to the ROI extraction. One such problem arises when the core point is located at the extreme margin of the image: cropping a region centered on the core would then include an area outside the boundary of the image, as shown in Figure 4.

Fig. 4. Core point located to the extreme margin of the image

In this research work, when the ROI is drawn around the singularity and any portion of it extends outside the image boundary, the ROI is shifted inwards to contain the relevant area only. The difference between the ROI boundary and the image boundary is calculated for this shift. The entire ROI is moved inwards until it is completely included inside the image’s margin. Figure 5 depicts this process of shifting the ROI to include only the appropriate area. The advantage of the scheme is that no external data is added, such as averaged intensity values or filling of the ROI. The data used in the ROI is the actual data contained in the original image. Additionally, instead of decreasing the size of the ROI, the same size (128 × 128) is retained.
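The inward shift amounts to clamping the top-left corner of the window so that the whole window lies inside the image. The sketch below illustrates this; the names `roi_origin` and `crop` are hypothetical, and the image is assumed to be a list of pixel rows at least as large as the ROI.

```python
def roi_origin(core_row, core_col, height, width, size=128):
    """Top-left corner of a size x size window centered on the core point,
    shifted inwards when it would cross the image boundary."""
    top = core_row - size // 2
    left = core_col - size // 2
    top = max(0, min(top, height - size))
    left = max(0, min(left, width - size))
    return top, left


def crop(image, core_row, core_col, size=128):
    """Cut the ROI out of the image without padding or resizing."""
    top, left = roi_origin(core_row, core_col, len(image), len(image[0]), size)
    return [row[left:left + size] for row in image[top:top + size]]
```

For a core point near a corner, the window no longer stays centered on the core, but it keeps its full size and contains only original pixels.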

5 Feature Vector Generation

In our work, we have employed the contourlet transform for feature extraction.

5.1 Contourlet Transform

The contourlet transform is an efficient directional multi-resolution image representation, utilizing non-separable filter banks developed in the discrete form. Conceptually, the contourlet transform first utilizes a wavelet-like transform for edge detection, such as the Laplacian pyramid, and then utilizes a local directional transform for contour segment detection, such as the directional filter bank, to link point discontinuities into linear structures. Contourlets therefore have elongated supports at various scales, directions and aspect ratios [4], [5]. Contourlets can efficiently handle the intrinsic geometrical structure containing contours. The transform was proposed by Minh Do and Martin Vetterli [5] and provides a sparse representation at both spatial and directional resolutions. The reconstruction is perfect and almost critically sampled, with a small redundancy factor of up to 4/3. The contourlet transform uses a structure similar to that of curvelets [6], [7]: a double filter bank structure comprising the Laplacian pyramid and a directional filter bank, as shown in Figure 6.

Fig. 5. ROI included inside the image boundary (a) ROI outside image margin (b) ROI moved inwards (c) Complete ROI

Fig. 6. Contourlet Transform Structure

5.2 Feature Vector Generation

The transformed ROI is decomposed into sub-bands at four different resolution levels. Figure 7 gives a pictorial view of the decomposition at just two levels with different directional sub-bands. In the actual decomposition, level 1 corresponds to an ROI of 128 × 128 pixels. At further levels, the size of the ROI is determined by the expression $\log_2 N$, i.e. for level 2 it is 64 × 64. Similarly, the sizes at levels 3 and 4 are 32 × 32 and 16 × 16 respectively. At each resolution level “k” the ROI is decomposed into 2 sub-bands.


Fig. 7. Decomposition of the transformed ROI

5.3 Selected Features

As fingerprints contain strong directionality, the related directional energies may be exploited as fingerprint features. The image is decomposed using the DFB (Directional Filter Bank). A five-level decomposition is considered, which yields a total of 60 blocks. Let $S_k^\theta$ denote the subband image at level $k$ and direction $\theta$. Similarly, let $\sigma_k^\theta$ denote the standard deviation of the $k$-th block in the $\theta$-direction subband image, and let $c_k^\theta(x, y)$ be the contourlet coefficient value at pixel $(x, y)$ in the subband block $S_k^\theta$. The directional energy $E_k^\theta$ of that subband block can then be calculated by Equations 1 and 2 [8]:

$$E_k^\theta = \mathrm{nint}\left( \frac{255\,(\sigma_k^\theta - \sigma_{\min})}{\sigma_{\max} - \sigma_{\min}} \right)$$ (1)

where

$$\sigma_k^\theta = \sqrt{ \frac{1}{N} \sum_{(x,y) \in S_k^\theta} \left( c_k^\theta(x, y) - \bar{c}_k^\theta \right)^2 }$$ (2)

Here nint(x) is the function that returns the nearest integer value to x; $\sigma_{\max}$ and $\sigma_{\min}$ are the maximum and minimum standard deviation values for a particular sub-block; $N$ is the number of pixels in the subband $S_k^\theta$; and $\bar{c}_k^\theta$ is the mean of the contourlet coefficients $c_k^\theta(x, y)$ in the subband block $S_k^\theta$.

5.4 Normalized Energies Calculations

The normalized energy for each block is computed from Equation 3. Here $E_k^\theta$ represents the directional energy of sub-band $\theta$ at level $k$, $E_k^{\theta(t)}$ represents the total directional energy of all sub-blocks at level $k$, and $E$ is the normalized energy.

$$E = \frac{E_k^\theta}{E_k^{\theta(t)}}$$ (3)
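The energy computation of Equations 1 to 3 can be sketched as follows. This is a sketch under assumptions rather than the authors' implementation: each sub-block is a list of rows of contourlet coefficients, the minimum and maximum standard deviations are taken over the blocks passed in, the total energy of a level is taken to be the sum of its directional energies, and Python's `round` stands in for nint (it rounds exact halves to the nearest even integer).

```python
import math


def block_std(block):
    """Eq. (2): standard deviation of the coefficients in one sub-block."""
    values = [v for row in block for v in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def directional_energies(blocks):
    """Eq. (1): rescale each block's standard deviation to 0..255."""
    sigmas = [block_std(b) for b in blocks]
    lo, hi = min(sigmas), max(sigmas)
    if hi == lo:                      # all blocks equally flat: avoid 0/0
        return [0 for _ in sigmas]
    return [round(255 * (s - lo) / (hi - lo)) for s in sigmas]


def normalize(energies):
    """Eq. (3): divide each energy by the total energy of its level."""
    total = sum(energies)
    return [e / total if total else 0.0 for e in energies]
```

Applying `normalize` to the energies of one level yields values that sum to one, so the feature vector is less sensitive to the overall contrast of the scan.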

6 Fingerprint Matching

Matching is performed by calculating the normalized Euclidean distance between the input feature vector and the template feature vector. The Euclidean distance between two vectors is calculated by squaring the differences between corresponding elements of the feature vectors, summing them, and taking the square root of the sum.
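A minimal sketch of distance-based matching is given below. The text does not specify how the distance is normalized, so the sketch uses the plain Euclidean distance; `best_match` and the dictionary of labeled templates are hypothetical names introduced only for illustration.

```python
import math


def euclidean(a, b):
    """Square root of the summed squared differences of two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(probe, templates):
    """Return (label, distance) of the template closest to the probe."""
    return min(((label, euclidean(probe, vec)) for label, vec in templates.items()),
               key=lambda pair: pair[1])
```

In a verification setting the returned distance would still be compared against a threshold before the match is accepted.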

7 Experimental Results

The performance evaluation criterion was adopted from that of the Fingerprint Verification Competition 2002 [9]. The end result of the calculations is a genuine and an impostor distribution. When plotted, these distributions give the genuine and impostor curves. Each point on the genuine and impostor curves corresponds to the value for a single matching score. A total of 126 individuals were enrolled. Each individual is enrolled with eight templates in the database, constituting 1008 images. The results for the contourlet based matching are given in Figure 8. An Equal Error Rate of 1.2% has been achieved.
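How an equal error rate follows from the genuine and impostor score distributions can be illustrated with the sketch below. It is a simplified sketch and not the FVC2002 protocol itself: scores are assumed to be distances (smaller means more similar), a comparison is accepted when its score does not exceed the threshold, and the EER is approximated at the candidate threshold where FMR and FNMR are closest.

```python
def error_rates(genuine, impostor, threshold):
    """False match rate and false non-match rate at one threshold."""
    fnmr = sum(s > threshold for s in genuine) / len(genuine)
    fmr = sum(s <= threshold for s in impostor) / len(impostor)
    return fmr, fnmr


def equal_error_rate(genuine, impostor):
    """Scan all observed scores as thresholds; return (EER estimate, threshold)."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        fmr, fnmr = error_rates(genuine, impostor, t)
        gap = abs(fmr - fnmr)
        if best is None or gap < best[0]:
            best = (gap, (fmr + fnmr) / 2, t)
    return best[1], best[2]
```

The threshold returned alongside the EER plays the role of the matching threshold read off the FMR and FNMR curves.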

Fig. 8. (a) ROC based analysis (b) Threshold Vs FMR and FNMR (c) Genuine and Imposter Distributions

7.1 Speed

The registration of an individual takes about 25 seconds, whereas the identification takes about 7.8 seconds on a 1.6 GHz Intel Core Duo processor with 1 GB RAM running the Windows XP operating system. The speed may be increased manifold by a specialized hardware implementation.

8 Adaptive Majority Vote Algorithm

To further narrow our criteria for matching, we have implemented an Adaptive Majority Vote Algorithm. The values of the normalized Euclidean distances are preserved as a matrix, where the number of templates for each enrolled individual is a constant. Its rows correspond to the enrolled individuals, and each row spans an interval of columns holding the distances to that individual's templates. A classifying threshold is defined. Its value is kept greater than that of the matching threshold obtained from the ROCs (Receiver Operating Characteristics). The algorithm searches each row for values less than the classifying threshold. The row with the maximum number of values below the classifying threshold is considered a match. If more than one row has the same number of values below the classifying threshold, the threshold is decremented slightly and the algorithm continues the search on the same criterion. Subsequently, a single interval is located and the smallest value in the interval is checked. If this value is less than the final threshold (the threshold obtained from the ROCs), the template is considered a match; otherwise it is rejected. It may happen that more than one interval has the same maximum number of values below the classifying threshold and that, once the threshold is decreased, all rows are rejected and no value is returned, even though the decrement is very small. Although the probability of this occurrence is very low, the issue has been addressed. When this situation occurs, the classifying threshold is incremented again by one half of the last decrement. This continues until a single interval is returned, and then the previously mentioned matching criteria are applied.
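One possible reading of the voting procedure is sketched below. It is an interpretation, not the authors' code: the distance matrix is represented as a dictionary mapping each enrolled identity to the distances of its templates, the step size and the iteration cap are hypothetical parameters, and after the first overshoot the step is halved on every move so that the search behaves like a bisection.

```python
def adaptive_majority_vote(distances, classify_thr, final_thr, step=0.01, max_iter=50):
    """Return the winning identity, or None if the probe is rejected."""
    thr = classify_thr
    tightened = False   # True once the threshold has been decremented
    overshot = False    # True once a decrement rejected every identity
    for _ in range(max_iter):
        counts = {pid: sum(d < thr for d in ds) for pid, ds in distances.items()}
        top = max(counts.values())
        winners = [pid for pid, c in counts.items() if c == top]
        if top == 0 and not tightened:
            return None                      # nothing is close at all
        if top > 0 and len(winners) == 1:
            pid = winners[0]                 # single interval located
            return pid if min(distances[pid]) < final_thr else None
        if top == 0:
            overshot = True                  # last decrement was too large
        if overshot:
            step /= 2
        thr += step if top == 0 else -step   # back off, or tighten on a tie
        tightened = True
    return None
```

For example, with distances `{"alice": [0.101, 0.122, 0.30], "bob": [0.115, 0.127, 0.40]}`, a classifying threshold of 0.13 and a final threshold of 0.105, the two identities tie twice before the tightened threshold leaves only the first one, whose smallest distance then passes the final check.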

9 Conclusion

The paper investigates a novel contourlet based approach to fingerprint identification and matching. The extracted energy features capture the pattern of ridge and valley flow at various resolutions because of the multi-resolution decomposition through the Contourlet Transform. The results show the effectiveness of the proposed scheme, with an EER of 1.2%. In future, we intend to investigate the combined operation of fingerprint and palm print identification for a multimodal system.


References

1. Gonzalez, R.C., Woods, R.E., Eddins, S.L.: Digital Image Processing Using MATLAB. Prentice-Hall, Inc., Saddle River (2003)
2. Jain, A.K., Maltoni, D.: Handbook of Fingerprint Recognition. Springer, New York (2003)
3. Zhang, Q., Yan, H.: Fingerprint classification based on self organizing maps. Pattern Recognition (2002)
4. Do, M.N.: Directional multi-resolution image representation. Ph.D. dissertation, Department of Communication Systems, Swiss Federal Institute of Technology Lausanne (2001)
5. Do, M.N., Vetterli, M.: Beyond Wavelets, ch. Contourlets. Academic Press, London (2003)
6. Candes, E.J., Donoho, D.L.: Curvelets – a surprisingly effective nonadaptive representation for objects with edges, Stanford Univ. CA Dept. of Statistics, pp. 1–16 (2000), http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.2419
7. Candes, E.J., Donoho, D.L.: Curvelets, multiresolution representation, and scaling laws. In: Aldroubi, A., Laine, A.F., Unser, M.A. (eds.) SPIE, vol. 4119(1), pp. 1–12 (2000), http://dx.doi.org/10.1117/12.408568
8. Park, C.-H., Lee, J.-J., Smith, M., il Park, S., Park, K.-H.: Directional filter bank-based fingerprint feature extraction and matching. IEEE Transactions on Circuits and Systems for Video Technology 14(1), 74–85 (2004)
9. Maio, D., Maltoni, D., Cappelli, R., Wayman, J.L., Jain, A.K.: Fvc2002: Second fingerprint verification competition. In: 3rd International Conference on Pattern Recognition, pp. 811–814 (2002)

Fake Finger Detection Using the Fractional Fourier Transform

Hyun-suk Lee, Hyun-ju Maeng, and You-suk Bae
Dept. of Computer Engineering, Korea Polytechnic University
2121, JeongWang-dong, SiHeung-Si, KyongGi-do, South Korea
{leehsuk,hjmaeng,ysbae}@kpu.ac.kr

Abstract. This paper introduces a new method for detecting fake fingers using the fractional Fourier transform (FrFT). The advantage of this method is that it requires only one fingerprint image. The fingerprint is a texture with interleaved ridges and valleys. When the fingerprint is transformed into the spectral domain, the energy of the fingerprint can be observed. Generally, the energy of live fingerprints is larger than the energy of fake fingerprints. Therefore, the energy in the spectral image of a fingerprint can be a feature for detecting fake fingers. We transformed the fingerprint image into the spatial frequency domain using the 2D fast Fourier transform and detected a specific line in the spectrum image. This line is transformed into the fractional Fourier domain using the fractional Fourier transform, and its standard deviation is used to discriminate between fake and live fingerprints. For the experiments, we made fake fingers of silicone, gelatin, paper and film, and created a fake finger database with which the performance of a fingerprint verification system can be evaluated with higher precision. The experimental results demonstrate that the proposed method can detect fake fingers.

Keywords: fake finger detection, fractional Fourier transform, fingerprint recognition system.

1 Introduction

Fingerprint recognition systems have become popular these days and have been adopted in a wide range of applications because their performance is better than that of other biometric systems and low-cost image acquisition devices have become available [1]. However, the feasibility of fake fingerprint attacks has been reported by some researchers [2]. Some previous work showed that fake fingerprints can actually spoof some fingerprint recognition systems. Since fake fingerprints can cause very serious security problems and even crimes, a reliable method to detect fake fingerprint attacks is strongly required. Practically, there are two ways to detect fake fingerprints in fingerprint recognition systems [3]. One is the hardware-based way, including odor detection [4], blood pressure detection [5], body temperature detection [6], pulse detection [7], and human skin analysis [8]. These methods need additional hardware [3], so their implementations are expensive and bulky. The other is the software-based way, including analysis of perspiration and shade changes between ridges and valleys, comparison of fingerprint image sequences [9], and observation of sweat pores [10]. These methods need complicated algorithms to analyze fingerprint images. However, they do not require any extra hardware or cost, and they can react to fake fingerprint attacks much more flexibly. While many physiological approaches to detecting fake fingerprints have been proposed, in this paper we propose a novel method based on the fractional Fourier transform (FrFT). This paper is organized as follows. Section 2 describes the fabrication of fake fingerprints. Section 3 provides a brief overview of the fractional Fourier transform. In Sections 4 and 5, the proposed method is discussed and evaluated with some experimental results. Finally, in Section 6, we give a short conclusion.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 318–324, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Fabrication of Fake Fingers

There are two ways of fabricating fake fingerprints. One is cloning with a plastic mold with the person's agreement. The other is cloning from a residual fingerprint. Because fabricating a fake fingerprint needs appropriate materials and appropriate processing, it is hard to make one. In particular, fabricating a fake fingerprint from a latent fingerprint requires professional skill. Material and procedure are the two essential factors when fabricating fake fingerprints. The common materials are paper, film, and silicone. Gelatin and synthetic rubber are also used very often for fake fingerprints because they have physical and electrical properties very similar to those of human skin. Recently, prosthetic fingers, clones of whole fingers, have appeared. Prosthetic fingers are still expensive, but they are almost the same as real fingers and can be used semi-permanently. Figure 1 shows the fabrication procedure of fake fingerprints for four materials: silicone, gelatin, film, and paper. In the experiments, we made the four kinds of fake fingerprints using paraffin (candle) molds rather than direct printing of fingerprints.

Fig. 1. The process of fake finger making according to each material

3 Fractional Fourier Transform

The fractional Fourier transform is a generalized form of the conventional Fourier transform. By adding a degree of freedom, the order, the fractional Fourier transform allows spectral analysis in an intermediate space-frequency domain. Equation 1 gives the basic formula of f_a(u), the a-th order fractional Fourier transform of a function f(u). The order a, which ranges from 0 to 1, determines the space-frequency domain of the transform. When a equals 0, the transform is the original function f(u) (the pure space domain), and when a equals 1, the transform is the conventional Fourier transform of f(u) (the pure frequency domain) [11].

$$f_a(u) = \int_{-\infty}^{\infty} \sqrt{1 - i\cot\alpha}\,\exp\!\left[i\pi\left(\cot\alpha\,u^2 - 2\csc\alpha\,uu' + \cot\alpha\,u'^2\right)\right] f(u')\,du', \qquad \alpha = a\pi/2 \tag{1}$$

The fractional Fourier transform can be implemented very simply in optical systems, which greatly reduces computing time [12]. In this paper, we apply the transform to a fingerprint recognition system and attempt to find the transform domain (order) in which fake fingerprints can be identified most clearly.
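Equation 1 can also be evaluated numerically by direct quadrature. The following is a minimal sketch only, not the optical or any fast implementation: the function name `frft_quadrature`, the sampling grid, and the Riemann-sum discretization are assumptions made for illustration. The O(N²) kernel is practical only for short signals, and the sketch is valid for 0 < a ≤ 1 (at a = 0 the kernel degenerates).

```python
import numpy as np

def frft_quadrature(f_samples, u, a):
    """Approximate the a-th order FrFT of f sampled on the uniform grid u.

    Direct Riemann-sum evaluation of the kernel integral; assumes 0 < a <= 1
    and that f is negligible outside the grid.
    """
    alpha = a * np.pi / 2.0
    cot = np.cos(alpha) / np.sin(alpha)
    csc = 1.0 / np.sin(alpha)
    du = u[1] - u[0]
    # Rows index the output coordinate u, columns the integration variable u'.
    U, Up = np.meshgrid(u, u, indexing="ij")
    kernel = np.sqrt(1 - 1j * cot) * np.exp(
        1j * np.pi * (cot * U**2 - 2 * csc * U * Up + cot * Up**2)
    )
    return kernel @ f_samples * du

# Illustrative check signal (not from the paper): a Gaussian.
u = np.linspace(-6.0, 6.0, 512)
gaussian = np.exp(-np.pi * u**2)
transformed = frft_quadrature(gaussian, u, a=0.7)
```

The Gaussian exp(−πu²) is a convenient check because it is an eigenfunction of the transform with eigenvalue 1 for every order, so `transformed` should stay close to `gaussian`.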

4 Proposed Method

In this paper, we propose a fake finger detection method using the fractional Fourier transform. A fingerprint is a texture of interleaved ridges and valleys [3].

Fig. 2. Diagram of the proposed algorithm

Fake Finger Detection Using the Fractional Fourier Transform


Since a live fingerprint and its fake differ in the clarity of the ridge-valley texture, the energies of the two fingerprints differ in the space-frequency domain. Therefore, the energy difference in the space-frequency domain can be a useful indicator for identifying fake fingerprints. Figure 2 shows the computing procedure of the proposed method. First, the fingerprint image is transformed into the frequency domain by the fast Fourier transform (FFT). To obtain the energy distribution of the Fourier-transformed image from the center to the edge, a horizontal line is extracted. Next, this one-dimensional signal is transformed into the fractional Fourier domain, and its standard deviation is calculated with Equation 2:

$$\mathrm{Std} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)^2} \tag{2}$$

Fake fingerprints are identified by this standard deviation. More specifically, if the standard deviation is greater than a certain threshold value, the input fingerprint is judged to be from a live finger; otherwise it is judged to be from a fake finger. Figure 3 summarizes the standard deviations for the respective persons and materials.
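The line extraction, standard deviation and threshold steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the fractional transform is passed in as a callable, the magnitude of the transformed signal is used for the statistic, and the threshold value is left to the caller.

```python
import numpy as np

def center_line_spectrum(image):
    """Magnitude of the centered 2-D FFT along a horizontal line, center to edge."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    rows, cols = spectrum.shape
    return np.abs(spectrum[rows // 2, cols // 2:])

def sample_std(x):
    """Standard deviation with the (n - 1) denominator of Equation 2."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    return float(np.sqrt(np.sum((x - mu) ** 2) / (x.size - 1)))

def is_live(image, fractional_transform, threshold):
    """Judge 'live' when the standard deviation exceeds the threshold.

    fractional_transform is a hypothetical callable mapping a 1-D signal
    to its fractional Fourier transform at the chosen order.
    """
    line = center_line_spectrum(image)
    transformed = np.abs(fractional_transform(line))  # assumption: use magnitude
    return sample_std(transformed) > threshold
```

For example, `is_live(image, lambda s: s, threshold)` skips the fractional step and thresholds the plain Fourier line, which is useful as a baseline when choosing the order.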

Fig. 3. Standard Deviation Average for each material (live, silicone, gelatin, paper and film fingerprints)

5 Experiment Results

5.1 Fake Finger Database

The fake finger database used in this paper consists of fake fingerprint images and the real fingerprint images with which they are compared.


H.-s. Lee, H.-j. Maeng, and Y.-s. Bae

In order to evaluate the performance of the proposed method, a database of fake fingerprints was collected from 15 persons using an optical fingerprint sensor, with the sensor's own fake finger detection function deactivated. The database contains a total of 3,750 fingerprint images from 15 persons: 75 fingers (15 live, 15 silicone, 15 gelatin, 15 paper, and 15 film) × 50 impressions. Sample images from the fake fingerprint database are shown in Figure 4.

Fig. 4. Sample images of the fake finger database

5.2 Results

We evaluated the proposed method using 750 live and 3,000 fake fingerprints. The experimental results are shown in Figures 5 and 6. Figure 5 shows the error rates for all, live, and fake fingerprints as a function of the threshold value. Figure 6 reports the overall error rate as a function of the order a. The best error rate, about 11.4%, is obtained when the order is 0.7. Table 1 compares the performance of the proposed method with other methods. The best error rate of the proposed method is close to the lowest error rate among the compared methods. While the perspiration-based method, which has the lowest error rate, needs to capture image pairs and its performance is susceptible to various factors, the proposed method needs only one image to compute the fractional Fourier energy.
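The threshold sweep behind error-rate curves of this kind can be sketched as below. The scores are hypothetical values chosen for illustration, the function names are assumptions, and the decision rule follows Section 4 (a standard deviation above the threshold is judged live).

```python
import numpy as np

def error_rates(live_std, fake_std, threshold):
    """Return (live error, fake error, overall error) for one threshold."""
    live_std = np.asarray(live_std, dtype=float)
    fake_std = np.asarray(fake_std, dtype=float)
    live_errors = np.sum(live_std <= threshold)  # live fingers judged fake
    fake_errors = np.sum(fake_std > threshold)   # fake fingers judged live
    overall = (live_errors + fake_errors) / (live_std.size + fake_std.size)
    return live_errors / live_std.size, fake_errors / fake_std.size, overall

def best_threshold(live_std, fake_std, candidates):
    """Pick the candidate threshold with the lowest overall error rate."""
    overall = [error_rates(live_std, fake_std, t)[2] for t in candidates]
    best = int(np.argmin(overall))
    return candidates[best], overall[best]

# Hypothetical standard deviations, for illustration only.
live = [5.0, 6.0, 7.0, 3.0]
fake = [1.0, 2.0, 4.0, 2.5]
```

Repeating `best_threshold` for each candidate order a gives one overall error rate per order, which is the kind of curve reported for the order sweep.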

Fig. 5. Error rate for each threshold value (overall, live and fake fingers)


Fig. 6. Error rate for each order ‘a’ (overall)

Table 1. Performance comparison with other methods

Method | Error rate (%) | Precondition
Perspiration-based methods [2,5] | Approximately 10 | Need to capture image pairs
Skin deformation-based method [9] | Approximately 16 | Hardness hypothesis. Need a fingerprint scanner capable of capturing and delivering frames at a proper rate
Band-selective Fourier spectrum based method [3] | Approximately 16 | Only need one image
Proposed method | Approximately 11 | Only need one image

6 Conclusions

This paper proposed a fake fingerprint detection method based on the fractional Fourier transform. The method distinguishes real from fake fingerprints using a single fingerprint image. Experiments on a fake fingerprint database show an error rate of about 11.4% when a certain region of the signal is used after the fractional Fourier transform. We could therefore verify that it is possible to detect fake fingerprints with the fractional Fourier transform. Although it was not applied in this paper, we expect better results with additional pre-processing. To detect fake fingerprints reliably, more research is needed to reduce the error rate and to define each step of the algorithm precisely. We also plan additional tests with various sensors and with a database that includes more fake fingerprint images, in order to obtain more reliable results.


This research proposes a software-based method for detecting fake fingers. The method can detect fake fingerprints from a single fingerprint image. More research is needed to reduce processing time by processing only the required parts of the image. More tests will be carried out with databases that include more fake fingerprint images.

References

[1] Maltoni, D., Maio, D., Jain, A.K., Prabhakar, S.: Handbook of Fingerprint Recognition. Springer, New York (2003)
[2] Matsumoto, T., Matsumoto, H., Yamada, K., Hoshino, S.: Impact of artificial “Gummy” fingers on fingerprint systems. In: Proc. SPIE, vol. 4677 (2002)
[3] Jin, C., Kim, H., Elliott, S.: Liveness Detection of Fingerprint Based on Band-Selective Fourier Spectrum. In: Nam, K.-H., Rhee, G. (eds.) ICISC 2007. LNCS, vol. 4817, pp. 168–179. Springer, Heidelberg (2007)
[4] Baldisserra, D., Franco, A., Maio, D., Maltoni, D.: Fake Fingerprint Detection by Odor Analysis. In: ICBA 2006. Proceedings International Conference on Biometric Authentication, Hong Kong (2006)
[5] Drahansky, M., Notzel, R., Funk, W.: Liveness Detection based on Fine Movements of the Fingertip Surface. In: 2006 IEEE Information Assurance Workshop, June 21-23, 2006, pp. 42–47 (2006)
[6] van der Putte, T., Keuning, J.: Biometrical fingerprint recognition: don’t get your fingers burned. In: Proceedings of IFIP TC8/WG8.8 Fourth Working Conference on Smart Card Research and Advanced Applications, pp. 289–303. Kluwer Academic Publishers, Dordrecht (2000)
[7] Reddy, P.V., Kumar, A., Rahman, S.M.K., Mundra, T.S.: A New Method for Fingerprint Antispoofing using Pulse Oxiometry. In: IEEE on Biometrics: Theory, Applications, and Systems, Washington DC (2007)
[8] Jia, J., Cai, L., Zhang, K., Chen, D.: A New Approach to Fake Finger Detection Based on Skin Elasticity Analysis. In: Lee, S.-W., Li, S.Z. (eds.) ICB 2007. LNCS, vol. 4642, pp. 309–318. Springer, Heidelberg (2007)
[9] Antonelli, A., Cappelli, R., Maio, D., Maltoni, D.: Fake Finger Detection by Skin Distortion Analysis. IEEE Transactions on Information Forensics and Security 1(3), 360–373 (2006)
[10] Parthasaradhi, S.T.V., Derakhshani, R., Hornak, L.A., Schuckers, S.A.C.: Time-Series Detection of Perspiration as a Liveness Test in Fingerprint Devices. IEEE Trans. on Systems, Man, and Cybernetics - Part C 35(3) (2005)
[11] Ozaktas, H.M., Zalevsky, Z., Alper Kutay, M.: The Fractional Fourier Transform: With Applications in Optics and Signal Processing. Wiley, New York (2001)
[12] Wilson, C.L., Watson, C.I., Paek, E.G.: Effect of resolution and image quality on combined optical and neural network fingerprint matching. In: PR, vol. 33(2) (2000)

Comparison of Distance-Based Features for Hand Geometry Authentication

Javier Burgues, Julian Fierrez, Daniel Ramos, and Javier Ortega-Garcia

ATVS - Biometric Recognition Group, EPS, Universidad Autonoma de Madrid, Campus de Cantoblanco, C/ Francisco Tomas y Valiente 11, 28049 Madrid, Spain
{javier.burgues,julian.fierrez,daniel.ramos,javier.ortega}@uam.es

Abstract. A hand-geometry recognition system is presented. The development and evaluation of the system include feature selection experiments using an existing publicly available hand database (50 users, 500 right hand images). The results show that high recognition rates can be achieved using a very small feature vector. Additionally, various experimental findings related to feature selection are obtained. For example, we show that the least discriminative features are related to the palm geometry and thumb shape. Finally, a comparison between the proposed system and a reference one is given, showing the remarkable performance obtained in the present development when considering the best feature combination.

Keywords: Hand geometry, biometrics, feature selection.

1 Introduction

Nowadays, identifying people in order to control access to certain services or facilities is a very important task. The traditional way to assert that a person is authorized to perform an action (e.g. using a credit card) has been the use of a password. This kind of identification usually requires long and complicated passwords to raise the security level, at the cost of user inconvenience. Identification through biometric traits is a possible solution that enables secure identification in a user-convenient way [1]. In biometric systems, users are automatically recognized by their physiological or behavioral characteristics (e.g. fingerprint, iris, face, hand, signature, etc.).

In the present work, we focus on hand biometrics. Traditional hand recognition systems can be split into three modalities: geometry, texture and hybrid. We concentrate our efforts on the first one due to its simplicity. In the literature, several hand geometry recognition systems have been developed [2-4]. For example, in [2] a hand recognition system is presented based on various finger widths, heights, deviations and angles. The work described in [3] treats the fingers individually by rotating and separating them from the hand. Oden et al. [4] used finger shapes represented with fourth-degree implicit polynomials. On the other hand, in [5] only the palm texture information of the hand is used to identify a user. Finally, a third kind of hand recognition method fuses hand geometry and texture, as for example in [6].

As mentioned before, the present work is focused on hand geometry. In particular, we implement and study a distance-based hand verification system based on hand geometry features inspired by previous works [2,8]. These features are compared in order to gain new insights into their discriminative capabilities. As a result, we obtain a series of experimental findings such as the instability of features related to the thumb shape and location. Finally, a comparison between the proposed system and a reference one is given, showing the remarkable performance obtained in the present development when considering the best feature combination.

The rest of the paper is structured as follows. In Section 2 we describe the processing blocks of our authentication system based on hand geometry. Section 3 describes the experimental results and the observations related to feature selection. Finally, conclusions are drawn in Section 4, together with future work.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 325–332, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Distance-Based Hand Geometry Authentication

The global architecture of our system is shown in Fig. 1. The first step is a hand boundary extraction module, from which the hand silhouette is obtained. The radial distance from a fixed reference point is then computed along the silhouette to find the valley and tip coordinates of all fingers. Then, some distance-based measures that use these reference points are calculated to form the feature vector representation of the hand. Given test and enrolled hands, the matching is based on a distance measure between their feature vectors.

2.1 Boundary Extraction

Input images are first converted to gray scale and then binarized using Otsu’s method. A morphological closing with a small circle as structuring element removes spurious irregularities. After that, we search for the connected components present in the image, assuming that the largest component is the hand and the others (if any) are potentially disconnected fingers or noise. Various shape measures are computed for the disconnected components in order to detect disconnected fingers (e.g. due to rings), in which case we reconnect the finger to the hand using morphological operations. Once the hand boundary is extracted, we detect the wrist region. To do so, we search for the segment perpendicular to the major axis of the hand that is closest to the center of the palm and has a length equal to or less than half of the maximum palm width (see Fig. 1 for example images).
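The binarization step can be illustrated with a small NumPy implementation of Otsu's method. This sketch covers only the thresholding; the closing, connected-component and wrist detection steps are not shown. The function name and the assumption of an 8-bit grayscale input are illustrative choices, not details given in the paper.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold of an 8-bit grayscale image (uint8 array)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)               # weight of the class at or below each level
    w1 = 1.0 - w0                      # weight of the class above each level
    mu_cum = np.cumsum(prob * levels)  # cumulative first moment
    mu_total = mu_cum[-1]
    denom = w0 * w1
    between = np.zeros(256)
    valid = denom > 1e-12              # skip levels where one class is empty
    between[valid] = (mu_total * w0[valid] - mu_cum[valid]) ** 2 / denom[valid]
    return int(np.argmax(between))     # level maximizing between-class variance

# Tiny synthetic image: dark background row and bright "hand" row.
gray = np.array([[10, 10, 10], [200, 200, 200]], dtype=np.uint8)
mask = gray > otsu_threshold(gray)
```

Pixels strictly above the returned level form the foreground mask; whether the hand is the bright or the dark class depends on the acquisition setup.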


Fig. 1. Block diagram of the main steps followed in our system to extract features and match two hands. The original image (a) is first binarized (b). The boundary is then calculated, and the plot (c) of the radial distance from a reference point lets us estimate the coordinates of tips and valleys (d). After that, feature extraction is done by measuring some finger lengths and widths (e). Last, given two hands, their matching is based on a distance between their feature sets.

2.2 Tips and Valleys Detection

Once the boundary of the hand is available, we fix a reference point in the wrist, from which the boundary is scanned clockwise while calculating the Euclidean distance to the reference point. The resulting one-dimensional function is examined to find local maxima and minima. Maxima of the curve correspond to finger tips, and minima are associated with finger valleys. Depending on the hand acquisition, the first maximum will correspond to the thumb or to the little finger. This process is depicted in Fig. 1c.

Before feature extraction, we compute a valley point for every finger at each side of its base (left and right). The only two fingers for which a simple analysis of the previous minima yields these valley points are the middle and ring fingers. For the other fingers, we take as reference the only available valley associated with the finger, and then we compute the Euclidean distance between this point and the boundary points at the other side of the finger. The point that yields the minimum distance is selected as the remaining valley point for that finger.
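The radial-distance scan can be sketched as follows. This is a simplified illustration with hypothetical function names: it uses strict neighbor comparisons to find extrema, whereas a real hand boundary is noisy and would normally need smoothing or a minimum-prominence rule, which the paper does not specify.

```python
import numpy as np

def radial_profile(boundary, reference):
    """Euclidean distance from the reference point to each ordered boundary point."""
    boundary = np.asarray(boundary, dtype=float)
    return np.linalg.norm(boundary - np.asarray(reference, dtype=float), axis=1)

def local_extrema(profile):
    """Indices of strict local maxima (tip candidates) and minima (valley candidates)."""
    p = np.asarray(profile, dtype=float)
    inner = np.arange(1, p.size - 1)
    maxima = inner[(p[1:-1] > p[:-2]) & (p[1:-1] > p[2:])]
    minima = inner[(p[1:-1] < p[:-2]) & (p[1:-1] < p[2:])]
    return maxima, minima

# Toy "boundary" whose distances from the origin are 1, 3, 1, 4, 2, 5, 1.
boundary = [(d, 0.0) for d in (1, 3, 1, 4, 2, 5, 1)]
profile = radial_profile(boundary, (0.0, 0.0))
tips, valleys = local_extrema(profile)
```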


Fig. 2. Set of features studied in the proposed hand geometry authentication system

2.3 Feature Extraction

We define the reference point of a finger as the midpoint between its two valley points. The length of the finger is calculated as the Euclidean distance from the tip to the finger reference point. Fig. 2 shows the notation used to name the hand features we propose. For each finger, its length is denoted with the letter ‘L’ and a number that identifies the finger (1 for index, 2 for middle, 3 for ring, 4 for little and 5 for thumb). Finger widths (‘W’) keep the same numbering, with an additional character indicating whether it is the upper (‘u’) or the lower (‘d’) width (see Fig. 2). The thumb has only one width measure, at the middle of the finger, denoted as W5. There are also some palm distance features, named P1, P2 and P3 (see Fig. 2). In the experimental section we will study various combinations of these features.

2.4 Similarity Computation

Once the feature vector has been generated, the next step is to compute the matching score between two hands. Our system is based on a distance measure, so lower values of the matching score represent hands with higher similarity; the matching score therefore represents dissimilarity. If we denote the feature vector of one hand as m1[i], i = 1,…,N, and the feature vector of another hand as m2[i], i = 1,…,N, then their dissimilarity is computed as:

$$d(m_1, m_2) = \sum_{i=1}^{N} \left| m_1[i] - m_2[i] \right| \tag{1}$$

with N being the length of the feature vectors.
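The finger-length feature and the dissimilarity score can be sketched as below. The function names are hypothetical, and the score is taken to be the sum of absolute feature differences, which is an assumption about the exact form of the distance in Equation 1.

```python
import numpy as np

def finger_length(tip, valley_left, valley_right):
    """Distance from the finger tip to the midpoint of its two valley points."""
    reference = (np.asarray(valley_left, float) + np.asarray(valley_right, float)) / 2.0
    return float(np.linalg.norm(np.asarray(tip, float) - reference))

def dissimilarity(m1, m2):
    """Sum of absolute differences between two feature vectors (lower = more similar)."""
    m1 = np.asarray(m1, float)
    m2 = np.asarray(m2, float)
    if m1.shape != m2.shape:
        raise ValueError("feature vectors must have the same length")
    return float(np.sum(np.abs(m1 - m2)))
```

For instance, a tip at (0, 5) with valleys at (−1, 0) and (1, 0) gives a finger length of 5, and two identical feature vectors give a dissimilarity of 0.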


3 Experiments

Section 3.1 details the database used in this work and explains the protocol used to generate genuine and impostor scores. The reference system is summarized in Section 3.2. Finally, Section 3.3 shows the results obtained in the feature selection experiments. The best combination found is included in the final system, whose performance is evaluated in comparison with the reference system.

3.1 Database and Experimental Protocol

The experiments have been carried out using a publicly available database, captured by the GPDS group of the Univ. de Las Palmas de Gran Canaria in Spain [8]. This database contains 50 users with 10 right hand samples per user. The image acquisition was supervised: users could not place the hand on the scanner in an arbitrary position, the scanner surface was clean, the illumination was constant, etc. Hence, high quality images were obtained. To fairly compare the performance of our system with the reference one, both systems were tested over the same database using the same protocol. Impostor scores are obtained by comparing the user model to one hand sample (the sixth one) of all the remaining users. Genuine scores are computed by comparing the last 5 available samples per user with the user's own model (which is constructed with the first hand sample). This protocol uses one sample per user for enrollment and five samples per user for test. Overall system performance is reported by means of DET plots [10].

3.2 Reference System

Fig. 3 shows the processing steps of the recognition system used as reference for comparison with our development. This reference system is fully described and available through [9]. In the reference system the image is first preprocessed and then, for each finger, the histogram of the Euclidean distances of boundary points to the major axis of the finger is computed. The features of the hand boundary are the five normalized histograms. Then, given two hands, the symmetric Kullback-Leibler distance between finger probability densities is calculated in order to measure the degree of similarity.

Fig. 3. Processing steps and feature extraction for the reference system (extracted from [7])

3.3 Experiments

The set of features presented in Sect. 2.3 consists of 17 measures from different zones of the hand. Specifically, there are five finger lengths, nine finger widths and three palm widths. This set of features is based on a selection from the best features proposed in [8] and some features studied in [2]. In our first experiment, some subsets of features were manually chosen and then tested to check their performance. Table 1 shows the results. We observe that leaving the thumb out of the feature set (feature subset 2 vs. feature subset 1) provides a significant performance improvement (from more than 9.6% to less than 1.7% EER). This is in accordance with the results presented in [4], and may be due to the freedom of movement of this finger, which makes it hard to estimate its valley points correctly. Because of this, for the rest of the experiments we discard the features related to the thumb (i.e., L5 and W5). Also interesting, the lengths of the four remaining fingers are all useful, because removing any of them deteriorates the system performance (subsets 3, 4 and 5 vs. subset 2). On the other hand, the palm features considered (P1 to P3) do not provide any benefit (subset 6 vs. subset 2). This may be because these palm features use the three exterior valley points, which are the most difficult to estimate precisely. Finally, in Table 1, we can see that the basic information provided by the finger lengths (subset 2) benefits from the incorporation of the finger widths (subset 7).
The system performance for the feature sets in Table 1 is analyzed over all verification threshold operating points by means of the DET plots in Fig. 4. Fig. 4a shows the DET plots of: (i) the five finger lengths (feature subset 1), (ii) four finger lengths, excluding the thumb (feature subset 2) and (iii) the reference system.

Table 1. EER for different subsets of features. Feature nomenclature is the same as the one used in Fig. 2.

Feature subset ID | Features | Equal Error Rate (%)
1 | L1, L2, L3, L4, L5 | 9.66
2 | L1, L2, L3, L4 | 1.68
3 | L1, L4 | 5.70
4 | L2, L3 | 4.83
5 | L2, L3, L4 | 3.06
6 | L1, L2, L3, L4, P1, P2, P3 | 5.54
7 | L1, L2, L3, L4, W1u, W1d, W2u, W2d, W3u, W3d, W4u, W4d | 1.24
8 | L1, L2, L3, L4, W1u, W1d, W2u, W2d, W3u, W3d, W4u, W4d, P1, P2, P3 | 5.09
Reference system | | 2.97
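EER values such as those in Table 1 are derived from the genuine and impostor dissimilarity scores. The following is a minimal sketch of one common way to estimate the EER from two score lists. The function name and the rule of picking the threshold where the two error rates are closest are assumptions; the paper does not describe its exact procedure. Lower scores mean higher similarity, as in Section 2.4.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Estimate the EER from dissimilarity scores (accept when score <= threshold)."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    # False rejection: genuine comparison scored above the threshold.
    frr = np.array([np.mean(genuine > t) for t in thresholds])
    # False acceptance: impostor comparison scored at or below the threshold.
    far = np.array([np.mean(impostor <= t) for t in thresholds])
    k = int(np.argmin(np.abs(far - frr)))
    return float((far[k] + frr[k]) / 2.0)
```

With finitely many scores the two error curves rarely cross exactly, so the value returned is the average of the two rates at the closest point.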

Fig. 4. (a) Performance obtained using three different feature sets. This experiment shows which fingers must be included in the feature set. (b) DET comparison between four feature sets. In this plot, the influence of palm and finger widths is examined.

Fig. 4b shows the results of the system evaluation with: (i) four finger lengths, excluding the thumb (feature subset 2), (ii) the set used in (i) plus palm widths (P1 to P3) (feature subset 6), (iii) four finger lengths and their associated widths (feature subset 7) and (iv) the reference system. Also interesting, the best Equal Error Rate achieved by the proposed system (1.24%) is lower than that of the reference system (2.97%).

4 Conclusions and Future Work

A new recognition system based on hand geometry has been proposed. In this work, different sets of features have been evaluated and some experimental findings have been obtained. We have observed that the features based on the thumb are the least discriminative. This may be due to its freedom of movement, which makes it hard to estimate correctly the valley points that define this finger. For the four remaining fingers, we have concluded that their lengths and widths are the most discriminative features. Also interesting, the palm widths give poor results, perhaps due to their relation with the thumb valley points. Finally, the results obtained for the best feature combination (1.24% EER) improve on the reference system performance (2.97% EER, as reported in Table 1) over the same database and experimental protocol, a relative improvement of more than 50% in the EER. Future work includes applying feature subset selection methods to the proposed set of features and developing quality detection algorithms to automatically discard low quality images, which worsen the system performance.


Acknowledgements. J. F. is supported by a Marie Curie Fellowship from the European Commission. This work was supported by Spanish MEC under project TEC2006-13141-C03-03.

References

1. Jain, A.K., Ross, A., Prabhakar, S.: An introduction to biometric recognition. IEEE Trans. on Circuits and Systems for Video Technology 14, 4–20 (2004)
2. Sanchez-Reillo, R., Sanchez-Avila, C., Gonzalez-Marcos, A.: Biometric identification through hand geometry measurements. IEEE Trans. on Pattern Analysis and Machine Intelligence 22, 1168–1171 (2000)
3. Yörük, E., Konukoglu, E., Sankur, B.: Shape-Based Hand Recognition. IEEE Trans. on Image Processing 15, 1803–1815 (2006)
4. Oden, C., Ercil, A., Buke, B.: Combining implicit polynomials and geometric features for hand recognition. Pattern Recognition Letters 24, 2145–2152 (2003)
5. Zhang, D., Kong, W.K., You, J., Wong, M.: Online Palmprint Identification. IEEE Trans. on Pattern Analysis and Machine Intelligence 25, 1041–1050 (2003)
6. Kumar, A., Wong, D.C.M., Shen, H.C., Jain, A.K.: Personal authentication using hand images. Pattern Recognition Letters 27, 1478–1486 (2006)
7. Geoffroy, F., Likforman, L., Darbon, J., Sankur, B.: The Biosecure geometry-based system for hand modality. In: ICASSP, vol. 147, pp. 195–197 (2007)
8. González, S., Travieso, C.M., Alonso, J.B., Ferrer, M.A.: Automatic biometric identification system by hand geometry. In: Proceedings. IEEE 37th Annual 2003 International Carnahan Conference on Security Technology, 2003, pp. 281–284 (2003)
9. Dutagaci, H., Fouquier, G., Yoruk, E., Sankur, B., Likforman-Sulem, L., Darbon, J.: Hand Recognition. In: Petrovska-Delacretaz, D., Chollet, G., Dorizzi, B. (eds.) Guide to Biometric Reference Systems and Performance Evaluation. Springer, London (2008)
10. Martin, A., Doddington, G., Kamm, T., Ordowski, M., Przybocki, M.: The DET curve in assessment of detection task performance. In: EUROSPEECH 1997, pp. 1895–1898 (1997)

A Comparison of Three Kinds of DP Matching Schemes in Verifying Segmental Signatures

Seiichiro Hangai, Tomoaki Sano, and Takahiro Yoshida

Dept. of Electrical Engineering, Tokyo University of Science, 1-14-6 Kudankita, Chiyoda-ku, Tokyo, 102-0073, Japan
{hangai,yoshida}@ee.kagu.tus.ac.jp, [email protected]

Abstract. In on-line signature verification, DP (Dynamic Programming) matching is an essential technique for making a reference signature and/or calculating the distance between two signatures. For continuous signatures such as western ones, it is reasonable to match over the whole signature, i.e., from the start of the signature to its end. However, for segmental signatures such as eastern ones (e.g., Japanese or Chinese), DP matching generates unrecoverable errors caused by wrong correspondences of segments, which leads to a worse EER (Equal Error Rate). In this study, we compare three kinds of DP matching schemes: (1) based on First name part and Second name part, (2) based on characters, and (3) based on strokes. In the experiments, the 1.8% average EER of whole DP matching decreases to 0.5% with scheme (1), 0.105% with scheme (2) and 0.00244% with scheme (3) for 14 writers’ signatures with intersession variability over one month. This paper presents the problems in DP matching and the results of signature verification with XY position, pressure, and inclination under the proposed matching schemes.

Keywords: signature verification, DP matching, segmental signature, EER.

1 Introduction

Recently, authentication systems using handwritten signatures have been realized and are starting to be used [1,2]. To improve the security level, many schemes have been proposed, some of which use features selected by G.A. (Genetic Algorithms) [3] or global and local features [4]. We have proposed an on-line signature verification system and shown that pen inclination (azimuth and altitude) data improve the security level [5]. For instance, by combining the pen inclination data with pen position data and pen pressure data, we achieved a 100% verification rate for 24 persons after 2 weeks. However, the intersession variability of signatures decreases the verification rate as days go by. In general, lowering the threshold can compensate for the decrease at the sacrifice of robustness to forgery. A main reason for this decrease is that the reference pattern cannot adapt to changes in the signature over a long term. To reduce this degradation, we proposed a reference renewal method and obtained a verification rate of 98.5% after 9 weeks [6,7]. In these methods, we applied DP matching to make a reference signature from multiple samples and/or to calculate the distance between two signatures in the verification process.

For continuously stroked signatures by western people, it is reasonable to match signatures over the whole stroke, i.e., from the start of the signature to its end, by normal DP matching with a narrow matching window. However, for segmental signatures by eastern (e.g., Japanese or Chinese) people, DP matching over all segments generates unrecoverable errors caused by wrong correspondences between segments in different samples. Ignoring the segmental structure in this way worsens the EER. In this study, we compare three kinds of DP matching schemes for verifying Japanese signatures: (1) matching after dividing into First name part and Second name part, (2) matching after dividing into characters, and (3) matching after dividing into strokes. In this paper, we discuss the problems in applying DP matching to segmental signatures. Experimental results and a comparison of the three kinds of matching for 14 writers’ signatures over one month are also given.

J. Fierrez et al. (Eds.): BioID_MultiComm2009, LNCS 5707, pp. 333–339, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Writing Information Acquired by Tablet and Verification Process

Fig. 1 shows the five kinds of writing data acquired: the x-y position x(t) and y(t), the pen pressure p(t), the pen azimuth θ(t), and the pen altitude φ(t). Fig. 2 shows an example of an acquired signature and its time series data. The errors in matching the i-th test signature S_i to the reference signature S_r are calculated by the following equations:

$$L(t) = \sqrt{\{x_r(t) - x_i(t)\}^2 + \{y_r(t) - y_i(t)\}^2} \tag{1}$$

$$P(t) = \left| p_r(t) - p_i(t) \right| \tag{2}$$

$$A(t) = \cos^{-1}\!\left( \frac{\vec{V}_r(t) \cdot \vec{V}_i(t)}{|\vec{V}_r(t)|\,|\vec{V}_i(t)|} \right) \tag{3}$$

Fig. 1. Acquired five writing data (x-y position on the X and Y axes, azimuth θ, altitude φ, pressure p)


where L(t), P(t), and A(t) are the errors of the xy position, pen pressure, and pen inclination, respectively. $\vec{V}(t)$ is a 3D vector that represents the pen inclination as follows:

$$\vec{V}(t) = \begin{bmatrix} \sin\theta(t)\cos\varphi(t) \\ -\cos\theta(t)\cos\varphi(t) \\ \sin\varphi(t) \end{bmatrix} \tag{4}$$
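The pointwise errors and the inclination vector of Equation 4 can be sketched as follows. The function names are hypothetical, angles are assumed to be in radians, and the two signatures are assumed to be already time-aligned sample by sample; the pressure error is taken as an absolute difference.

```python
import numpy as np

def inclination_vector(theta, phi):
    """3-D pen inclination vector from azimuth theta and altitude phi (radians)."""
    return np.stack(
        [np.sin(theta) * np.cos(phi), -np.cos(theta) * np.cos(phi), np.sin(phi)],
        axis=-1,
    )

def position_error(xr, yr, xi, yi):
    """Euclidean distance between reference and test pen positions."""
    return np.hypot(np.asarray(xr, float) - np.asarray(xi, float),
                    np.asarray(yr, float) - np.asarray(yi, float))

def pressure_error(pr, pi):
    """Absolute pen pressure difference (assumption: magnitude of the difference)."""
    return np.abs(np.asarray(pr, float) - np.asarray(pi, float))

def inclination_error(theta_r, phi_r, theta_i, phi_i):
    """Angle between the reference and test inclination vectors."""
    vr = inclination_vector(np.asarray(theta_r, float), np.asarray(phi_r, float))
    vi = inclination_vector(np.asarray(theta_i, float), np.asarray(phi_i, float))
    cosine = np.sum(vr * vi, axis=-1) / (
        np.linalg.norm(vr, axis=-1) * np.linalg.norm(vi, axis=-1)
    )
    # Clip guards against rounding slightly outside [-1, 1] before arccos.
    return np.arccos(np.clip(cosine, -1.0, 1.0))
```

Since the vector of Equation 4 has unit length by construction, the normalization in `inclination_error` only protects against rounding; it is kept to mirror the angle formula.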

In making reference and/or verifying, we use accumulated errors by the following equations,

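$$L = \frac{1}{\Delta T}\int_{t=T}^{T+\Delta T} L(t)\,dt \tag{5}$$

$$P = \frac{1}{\Delta T}\int_{t=T}^{T+\Delta T} P(t)\,dt \tag{6}$$

$$A = \frac{1}{\Delta T}\int_{t=T}^{T+\Delta T} A(t)\,dt \tag{7}$$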

where T is the start time of accumulation, and ∆T is the accumulation time which depends on matching scheme described later.
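For sampled data, the accumulated error over a window can be approximated with the trapezoidal rule. This is a sketch under assumptions: the function name is hypothetical, and the window [T, T + ΔT] is assumed to start and end on sample times so that it contains at least two samples.

```python
import numpy as np

def windowed_mean_error(err, t, T, dT):
    """Approximate (1/dT) * integral of err(t) over [T, T + dT] by the trapezoidal rule."""
    err = np.asarray(err, dtype=float)
    t = np.asarray(t, dtype=float)
    mask = (t >= T) & (t <= T + dT)
    e, tt = err[mask], t[mask]
    integral = np.sum((e[1:] + e[:-1]) * np.diff(tt)) / 2.0
    return float(integral / dT)

# Illustrative data: a constant error of 2 over one second.
t = np.linspace(0.0, 1.0, 11)
constant_error = np.full_like(t, 2.0)
```

Resetting the accumulation at the start of a part, character or stroke corresponds to calling the function again with a new T and ΔT for that segment.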

Fig. 2. An example of acquired signature and time series data


For the purpose of comparing the EER with respect to each feature, we use three kinds of data sets (XY position data set, pen pressure data set, and pen inclination data set) for the three kinds of DP matching schemes. The reference is made as follows:
1) Inspect the matching part in the signatures for registration, and define the data that has the longest writing time as the 'parent' data. The others are defined as 'child' data.
2) Compand (compress or expand) the time axis of the child data to match the time axis of the parent data by DP matching based on (1) First name part and Second name part, (2) characters, or (3) strokes.
3) After companding, make the reference data set by averaging in each time slot.
Verification is done by comparing the error between the reference data and the acquired data against a threshold, which is varied.
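The reference construction above can be sketched with a textbook DP (dynamic time warping) alignment. This is an illustration, not the authors' implementation: it handles a single scalar feature, uses the absolute difference as local cost with no matching window, and treats the sample with the most points as the parent, which assumes uniform sampling. All function names are hypothetical.

```python
import numpy as np

def dp_match(parent, child):
    """Return the DP matching cost and the alignment path as (parent, child) index pairs."""
    n, m = len(parent), len(child)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(parent[i - 1] - child[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the end to recover which samples were matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(cost[n, m]), path[::-1]

def warp_to_parent(parent, child):
    """Compand the child onto the parent's time axis by averaging matched samples."""
    _, path = dp_match(parent, child)
    sums = np.zeros(len(parent))
    counts = np.zeros(len(parent))
    for p_idx, c_idx in path:
        sums[p_idx] += child[c_idx]
        counts[p_idx] += 1
    return sums / counts

def make_reference(samples):
    """Average all samples in each time slot of the longest (parent) sample."""
    parent = max(samples, key=len)
    aligned = [np.asarray(parent, float) if s is parent else warp_to_parent(parent, s)
               for s in samples]
    return np.mean(aligned, axis=0)
```

In the segmental schemes, the same functions would be applied separately to each part, character or stroke rather than to the whole signature.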

3 Problem in Segmental Signature Verification and Proposal of Three Kinds of DP Matching Schemes

In making a reference using segmental signatures, we often faced a problem caused by matching incorrect points. Fig. 3 shows the accumulated XY position error given by Eq. (5), the XY position error, and the pen pressure along the signature time. The error increases rapidly at the times when the pen pressure becomes zero, marked by asterisks. At those times, the accumulated error is on an upward trend. This means that correct matching becomes difficult once the matching has failed. In order to solve this problem, we propose three kinds of matching schemes.

Fig. 3. Accumulated L(t), L(t) and P(t) along time


3.1 DP Matching After Dividing Signature into First Name Part and Second Name Part

By using the frequency of x coordinates, we divide a signature into characters as shown in Fig. 4. After dividing, we make the First name part and the Second name part manually. DP matching is done for each part by minimizing the accumulated error given in Eq. (5), (6), or (7).

Fig. 4. Dividing signature using frequency of X coordinates (First name part and Second name part)

3.2 DP Matching After Dividing Signature into Characters

After dividing the signature into characters as shown in Fig. 4, DP matching is done for each character separately. The accumulated error is reset at the start of each character.

3.3 DP Matching After Dividing Signature into Strokes

By using pen pressure data, we divide the signature into multiple strokes. No turning point [8,9] is used in segmentation. DP matching is done for each stroke after the accumulated error is reset at the start of each stroke.
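A minimal sketch of stroke division with a per-stroke error reset is given below. The paper does not state the exact segmentation rule, so the sketch assumes that a stroke is a maximal run of samples with non-zero pen pressure (pen-down); the function names are hypothetical.

```python
def split_into_strokes(pressure):
    """Return (start, end) index pairs, end exclusive, for each pen-down run.

    Assumption: pressure > 0 means the pen touches the tablet.
    """
    strokes = []
    start = None
    for t, p in enumerate(pressure):
        if p > 0 and start is None:
            start = t                      # pen-down: a new stroke begins
        elif p <= 0 and start is not None:
            strokes.append((start, t))     # pen-up: the current stroke ends
            start = None
    if start is not None:
        strokes.append((start, len(pressure)))
    return strokes

def per_stroke_errors(errors, strokes):
    """Accumulate the error separately for each stroke, i.e. reset at every stroke start."""
    return [sum(errors[s:e]) / (e - s) for s, e in strokes]
```

Because the accumulation restarts at each stroke, a large error around a pen-up interval cannot propagate into the following stroke.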


4 Performance of Proposed DP Matching Schemes

The three kinds of DP matching schemes are compared using a Japanese signature database with 14 writers, 4 times/day, and 30 days. The number of signatures is 1680. Intersession variability of the EER along days is used as a measure, as shown in Fig. 5, Fig. 6 and Fig. 7. In Fig. 5, the EERs of the three kinds of schemes using XY position are compared with the EER using the whole DP matching scheme. From the figure, the effect of the division is clear. In the case of stroke division, there is no EER between sessions.

Fig. 5. Comparison of Intersession variability along days using XY position

Fig. 6. Comparison of Intersession variability along days using pressure


Fig. 6 shows the EERs using pressure. The EERs are worse than those of XY position. However, the effect of dividing the signature can still be seen. As with pen position, stroke division shows the best performance. Fig. 7 shows the EERs using inclination. The EERs are not much improved even by stroke division.

Fig. 7. Comparison of Intersession variability along days using inclination

5 Conclusion

In this paper, we divide a signature into characters and strokes, and improve the performance of DP matching without increasing EER. Using a Japanese signature dataset, XY position after stroke division can achieve an EER of 0.00244% for 14 writers’ signatures with intersession variability over one month.

References

[1] Cyber-Sign, http://www.cybersign.com/
[2] SOFTPRO, http://www.signplus.com/en/
[3] Galbally, J., Fierrez, J., Freire, M.R., Ortega-Garcia, J.: Feature Selection Based on Genetic Algorithms for On-Line Signature Verification. In: IEEE workshop on AIAT, pp. 198–203 (2007)
[4] Tanaka, M., Bargiela, A.: Authentication Model of Dynamic Signatures using Global and Local Features. In: IEEE 7th workshop on Multimedia Signal Processing, pp. 1–4 (2005)
[5] Hangai, S., Yamanaka, S., Hamamoto, T.: On-Line Signature Verification Based On Altitude and Direction of Pen Movement. In: IEEE ICME 2000 (August 2000)
[6] Yamanaka, S., Kawamoto, M., Hamamoto, T., Hangai, S.: Signature Verification Adapting to Intersession Variability. In: IEEE ICME 2001 (August 2001)
[7] Kawamoto, M., et al.: Improvement of On-line Signature Verification System Robust to Intersession Variability. In: Tistarelli, M., Bigun, J., Jain, A.K. (eds.) ECCV 2002. LNCS, vol. 2359, pp. 168–175. Springer, Heidelberg (2002)
[8] Bierhals, N., Hangai, S., Scharfenberg, G., Kempf, J., Hook, C.: Extraction of Target Areas based on Turning Points for Signature Verification – A Study on Signature/Sign Processed Dynamic Data Format SC37 N2442, Tech. Rep. of Biometrics Security Group (June 2008)
[9] ISO/IEC JTC1/SC37 N 2442 (2008)

Ergodic HMM-UBM System for On-Line Signature Verification⋆

Enrique Argones Rúa, David Pérez-Piñar López, and José Luis Alba Castro

Signal Theory Group, Signal Theory and Communications Department, University of Vigo
{eargones,dperez,jalba}@gts.tsc.uvigo.es

Abstract. We propose a novel approach for on-line signature verification based on building HMM user models by adapting an ergodic Universal Background Model (UBM). State initialization of this UBM is driven by a dynamic signature feature. This approach inherits the properties of the GMM-UBM mechanism, such as minimizing overfitting due to scarcity of user training data and allowing a world-model type of likelihood normalization. This system is experimentally compared to a baseline state-of-the-art HMM-based on-line signature verification system using two different databases: the well-known MCYT-100 corpus and a subset of the signature part of the BIOSECURE-DS2 corpus. The HMM-UBM approach obtains promising results, outperforming the baseline HMM-based system on all the experiments.

1 Introduction

Behavioral biometrics are based on measurements extracted from an activity performed by the user, in a conscious or unconscious way, and they are inherent to his/her own personality or learned behavior. In this sense, behavioral biometrics have some interesting advantages, such as user acceptance and cancelability, but they still lack the level of uniqueness of physiological biometrics. Among all the biometric traits that can be categorized as purely behavioral, the signature, and the way we sign, is the one that has the widest social acceptance for identity authentication. Furthermore, learning the dynamics of the real signer is a very difficult task compared to replicating the shape of a signature. This is the main reason behind the research efforts conducted over the last decade on dynamic or on-line signature verification.

On-line signature verification approaches can be coarsely categorized depending on the feature extraction process and the signature representation and matching strategy. A wide variety of features can be extracted from a dynamic signature and they are usually divided into local and global [1,2,3]. Signature representation and matching strategies must cope with the large intra-user variability inherent to this problem and they can be divided into template-based and statistical-based.

⋆ This work has been partially supported by Spanish Ministry of Science and Innovation through project TEC2008-05894.

J. Fierrez et al. (Eds.): BioID MultiComm2009, LNCS 5707, pp. 340–347, 2009. © Springer-Verlag Berlin Heidelberg 2009


Template-based matching approaches usually rely on Dynamic Time Warping to align training and test signatures with different distance measurements [4][5][6]. Statistical-based approaches mostly rely on representing the underlying distribution of dynamic data with Gaussian Mixture Models [7] or, exploiting the analogy to speech recognition, representing the signature as a first-order Hidden Markov Model with continuous output observations modeled as Gaussian mixtures [8][9].

We propose a novel approach based on building HMM user models by adapting a Universal Background Model (UBM). This UBM is an ergodic HMM with a fixed number of states driven by a dynamic signature feature. This approach has two main advantages: i) it inherits the properties of the GMM-UBM mechanism successfully applied to speaker verification [10], such as minimizing overfitting due to scarcity of user training data and allowing a “world-model” type of likelihood normalization that has also shown good results in signature verification [7], and ii) it allows a great deal of flexibility to automatically accommodate signature-dependent dynamics by adapting the ergodic HMM-UBM to each user's data.

The rest of the paper is organized as follows. Section 2 introduces the baseline system used for comparison, which is largely based on the work in [8]. We use the same set of features and the HMM topology that gave them the best performance, but we do not use any type of score normalization, so that improvements common to both systems do not distort the comparison that is the objective of this work. Section 3 is the theoretical core of the article and explains the ergodic HMM-UBM scheme applied for user adaptation. Section 4 explains the experimental framework, in which we have used two databases: the publicly available MCYT-100 and a subset of the signature part of the BIOSECURE-DS2 database¹. Section 5 shows the experimental results and analyses them.

2 Baseline System: User-Specific HMMs

An HMM [12] is a statistical model with an unobserved first-order discrete Markov process. Each state of this hidden (non-observable) Markov chain is associated with a probability distribution. The observations of an HMM are produced according to these distributions, and hence according to the hidden state. An HMM is formally defined by the following parameters:
– $S = \{S_1, \ldots, S_N\}$: the state set of the Markov chain, where $N$ is the number of states.
– $A = \{a_{ij}\}$; $i, j \in \{1, \ldots, N\}$: the state transition matrix, where $a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i)$.
– $B = \{b_i(x)\}$; $i \in \{1, \ldots, N\}$, where $b_i(x)$ is the probability density function of the observations at state $S_i$.
– $\Pi = \{\pi_i\}$: the set of initial state probabilities, where $\pi_i = P(q_0 = S_i)$.

¹ This database will soon be publicly available and comprises three different multimodal datasets DS1, DS2 and DS3 [11].


Output distributions $b_i(x)$ are commonly modeled as Gaussian Mixture Models (GMMs): $b_i(x) = \sum_{m=1}^{M} w_{i,m}\, \mathcal{N}(x, \mu_{i,m}, \Sigma_{i,m})$, where $\mathcal{N}(x, \mu_{i,m}, \Sigma_{i,m})$ denotes a multivariate normal distribution with mean $\mu_{i,m}$ and covariance matrix $\Sigma_{i,m}$. An HMM is fully characterized by the set $\Lambda = \{A, B, \Pi\}$. In a state-of-the-art HMM-based on-line signature verification system, an HMM $\Lambda_u$ is used to model the $K$ reference signatures provided by the user $u$ at the enrolment stage. The verification score of a given signature $O = \{o_0, \ldots, o_{T-1}\}$ claiming the identity $u$ is calculated as $P(O \mid \Lambda_u)$. This probability is usually approximated using the Viterbi algorithm. For the baseline experiment the number of states was 2, as in the best performing system in [8], but we allowed different numbers of mixtures per state, $M \in \{16, 32, 64, 128\}$. This baseline system and the HMM-UBM presented in the next section share the same feature space. The feature extraction proposed in [8] is used, resulting in vectors with 14 parameters.
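The Viterbi approximation of the verification score can be sketched as follows. This is a minimal illustration, not the system evaluated in the paper: it assumes one-dimensional observations (the paper uses 14-dimensional vectors), strictly positive mixture weights, and log-domain model parameters; all names are hypothetical.

```python
import math

def gmm_logpdf(x, weights, means, variances):
    """Log density of a 1-D Gaussian mixture at x (weights assumed strictly positive)."""
    terms = [math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    # Log-sum-exp for numerical stability.
    return top + math.log(sum(math.exp(t - top) for t in terms))

def viterbi_log_score(obs, log_pi, log_a, emit_logpdfs):
    """Log-probability of the best state path, used as an approximation of log P(O | Lambda).

    log_pi[i]      : log initial probability of state i
    log_a[i][j]    : log transition probability from state i to state j
    emit_logpdfs[i]: function returning log b_i(x)
    """
    n = len(log_pi)
    delta = [log_pi[i] + emit_logpdfs[i](obs[0]) for i in range(n)]
    for x in obs[1:]:
        delta = [max(delta[i] + log_a[i][j] for i in range(n)) + emit_logpdfs[j](x)
                 for j in range(n)]
    return max(delta)
```

With a single state and a single Gaussian component, the score reduces to the sum of the per-sample Gaussian log densities, which is a convenient sanity check.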

3 HMM-UBM System: Adapted HMMs

In a UBM-HMM verification system, a user-independent HMM is trained using signatures from many different users. This HMM plays the role of the UBM. Then, user models are obtained by adapting the HMM statistics to the enrolment signatures. However, the use of a UBM-HMM approach in on-line signature verification poses a new problem: signatures must be labeled as a step prior to Baum-Welch training. This labeling must be designed on a user-independent basis. The usual arbitrary partition of the signature into contiguous segments of approximately the same length is not valid here, since these partitions, and therefore the resulting output distributions, are strongly user-dependent. One solution is to cluster the feature space on the basis of dynamic characteristics. An HMM trained using this strategy can play the role of Universal Background Model (UBM), since the meaning of the states is user-independent. This model will characterize signatures on a common basis, grouping features in meaningful states.

The HMM-UBM system proposed in this paper uses an activity detector for the task of state labeling. The use of an activity detector to drive the initial state labeling results in an ergodic structure in the trained UBM, where states are related to signature dynamics. This activity detector can generate output sequences of states according to the level and persistence of the dynamic input. In our case, the activity detector can be set up to produce sequences with two or five different states. This is illustrated in Figure 1. Two different dynamic characteristics are proposed as activity detector inputs for state labeling: log-velocity and pressure. Logarithms are applied to velocity in order to give the activity detector a better sensitivity to low velocity values.

Fig. 1. Activity detector: two different granularity levels are provided. Output state sequences have two or five differentiated states, which will produce two- or five-state UBMs respectively.

An important advantage of UBM systems is that they are more resilient to overfitting effects, since MAP-adapted versions of the UBM [13] can work as user models. MAP adaptation can produce accurate models despite the scarcity of training samples [10]. Verification scores are obtained as the log-likelihood ratio between the user HMM and the UBM, in the same manner as in a GMM-UBM system [14]. Only adaptation of the output distribution means $\mu_{i,m}$ is performed. Assuming we have observed

$$c_{i,m} = \sum_{s=1}^{S} \sum_{t=0}^{T-1} \gamma_t(i,s)\, \frac{w_{i,m}\, \mathcal{N}(x_t, \mu_{i,m}, \Sigma_{i,m})}{\sum_{k=1}^{M} w_{i,k}\, \mathcal{N}(x_t, \mu_{i,k}, \Sigma_{i,k})}$$

soft counts of samples associated with the $m$-th Gaussian component of the state $S_i$, where $\gamma_t(i,s)$ is the probability of the state $i$ at time $t$ in the enrolment sequence $s$, and $S$ is the number of available enrolment sequences for user $u$, then the MAP estimates of the model means for user $u$ can be written as:

$$\hat{\mu}_{i,m} = \hat{\mu}^{ML}_{i,m}\, \frac{c_{i,m}\, r_{i,m}}{r^{0}_{i,m} + c_{i,m}\, r_{i,m}} + \mu_{i,m}\, \frac{r^{0}_{i,m}}{r^{0}_{i,m} + c_{i,m}\, r_{i,m}}, \tag{1}$$

where $\hat{\mu}^{ML}_{i,m}$ is the ML mean estimate given the enrolment data, $\mu_{i,m}$ is the value of the mean in the UBM, $r_{i,m} = 1/\sigma^{2}_{i,m}$ are the precisions, and $r^{0}_{i,m}$ are the prior precisions. A more complete description of the MAP formulae can be found in [13]. The experiments carried out in this paper involve UBM-HMM systems with initial state labeling driven by pressure or log-velocity, and with several complexity levels. The number of states of the UBMs will be two or five, to account for a coarse or a more precise dynamic activity clustering. In addition, different numbers of mixtures per state, $M \in \{16, 32, 64, 128\}$, are used, as in the baseline system.
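The MAP mean update of Eq. (1) can be illustrated with a short sketch for a single scalar mean. It assumes that the soft count, the ML estimate and both precisions have already been computed; the function and parameter names are hypothetical.

```python
def map_adapt_mean(ubm_mean, ml_mean, soft_count, precision, prior_precision):
    """Eq. (1): precision-weighted interpolation between the ML estimate and the UBM mean."""
    data_weight = soft_count * precision          # c_{i,m} * r_{i,m}
    total = prior_precision + data_weight         # r0_{i,m} + c_{i,m} * r_{i,m}
    return (data_weight * ml_mean + prior_precision * ubm_mean) / total
```

With no enrolment data assigned to a component (soft count zero) the adapted mean stays at the UBM mean, and as the soft count grows it moves towards the ML estimate, which is the behaviour that limits overfitting when enrolment data are scarce.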

4 Experimental Framework

The experiments outlined in this paper were run against the dynamic signature subcorpus of two databases: MCYT-100 and Biosecure Data Set 2 (DS2). Both databases share several common characteristics, but differ in key aspects, such as the skill level of the forgeries. Both databases are described in this section, along with the experimental protocol used.

4.1 MCYT-100

The MCYT bimodal biometric database [15] consists of fingerprint and on-line signature modalities. Dynamic signature sequences were acquired with a WACOM INTUOS A6 USB pen tablet capable of detecting height, so that pen-up movements are also considered. Data was captured at 100 Hz and includes position in both axes, pressure, azimuth angle and altitude angle, both referred to the tablet. The sampling frequency used for acquisition leads to a precise discrete-time signature representation, taking into account the frequency distribution of the biomechanical sequences.

MCYT-100 is a subcorpus of 100 users from the MCYT database. The signature modality used in our experiments includes both genuine signatures and shape-based skilled forgeries with natural dynamics, generating low-force forgeries, as defined in [16]: impostors were asked to imitate the shape while trying to generate a smooth and natural signature. Each user in the dataset has 50 associated signatures, of which 25 are genuine and 25 are skilled forgeries generated by subsequent contributors.

4.2 Biosecure DS2 On-Line Signature Database

BIOSECURE-DS2 (Data Set 2) is part of the Biosecure Multimodal Database captured by 11 international partners under the BioSecure Network of Excellence. For our experiments, we have used the dynamic signature modality of a subset of 104 contributors. Signature acquisition was carried out using a procedure similar to the one conducted in MCYT [15]. Pen coordinates, pressure, azimuth and altitude signals are available. Signatures were captured in two acquisition sessions, producing 30 genuine signatures and 20 forgeries available for each user. Forgery generation involved training impostors with static and dynamic information by means of a dynamic representation of the signature to imitate, obtaining brute-force skilled forgeries, as defined in [16]. This characteristic gives this dataset a higher level of difficulty.

4.3 Experimental Protocols

The user-specific HMM system is used as a baseline to assess the capabilities of the UBM-HMM system. Tests are performed on the MCYT-100 and the DS2 databases. All users from MCYT-100 are pooled in a single group. Only a posteriori results are provided for this dataset. In contrast, users from BIOSECURE-DS2 are divided into two disjoint groups of 50 users each to build an open-set protocol. Two-fold cross-validation is then used to provide a priori results.

The MCYT-100 corpus is divided into two partitions: 10 genuine signatures for each user are defined as the training partition, and the remaining 15 genuine signatures and all 25 low-force skilled forgeries [16] are used for testing. The Biosecure DS2 corpus is also divided into a training partition with 10 genuine signatures for each user and a test partition with 20 genuine signatures and 20 brute-force skilled forgeries. Both user-specific HMM training for the baseline system and HMM adaptation in the HMM-UBM system are performed on the training partitions. In the HMM-UBM system, the second group from BIOSECURE-DS2 is used to train the UBM for the MCYT-100 dataset, whereas the first group UBM is trained using the second group data and vice versa in the BIOSECURE-DS2 dataset.

Table 1. WER%(1.0) of the different systems on the MCYT-100 and BIOSECURE-DS2 datasets. LV and P stand for LogVelocity- and Pressure-driven state labeling respectively.

Dataset                    M     HMM-UBM     HMM-UBM     HMM-UBM    HMM-UBM    User-specific
                                 LV, N = 2   LV, N = 5   P, N = 2   P, N = 5   HMM
MCYT-100                   16    4.71        3.08        5.31       3.75       6.86
(a posteriori WER%(1.0))   32    4.11        3.76        4.43       3.74       5.90
                           64    3.29        3.94        3.47       3.60       4.97
                           128   3.46        3.49        3.52       3.34       7.72
BIOSECURE-DS2              16    6.83        6.99        7.90       7.18       13.24
(a priori WER%(1.0))       32    6.34        8.58        6.44       6.35       8.75
                           64    5.92        5.82        6.64       6.57       7.90
                           128   8.59        6.11        6.30       5.77       8.75

5 Experimental Results

Performance of both systems is evaluated in two ways. DET curves [17] are provided for a visual comparison of systems, and Weighted Error Rates (WER) are also provided for a numerical comparison of system performance. WER is defined as

$$WER(R) = \frac{FRR + R \cdot FAR}{1 + R}.$$

Table 1 shows the WER%(1.0) of the HMM-UBM and user-specific HMM systems on the MCYT-100 and BIOSECURE-DS2 datasets. Comparable results are obtained for all HMM-UBM systems in each dataset. In the case of the MCYT-100 dataset, the best results are obtained for HMM-UBM with LogVelocity-driven initialization, five states and 16 mixtures per state, with a posteriori WER%(1.0) = 3.08. In the case of the BIOSECURE-DS2 dataset, the best results correspond to HMM-UBM with Pressure-driven initialization, five states and 128 mixtures per state. In all cases, HMM-UBM systems obtain better results than the baseline system. For a fair comparison between datasets, the HMM-UBM system with LogVelocity-driven initialization, M = 64, N = 5 was evaluated on the MCYT-100 dataset, using the BIOSECURE-DS2 dataset for threshold computation. WER%(1.0) = 4.49 is obtained, which is still lower than the best WER%(1.0) = 5.77 obtained for the BIOSECURE-DS2 dataset.

Fig. 2. Best HMM-UBM and best user-specific HMM systems’ DET curves on the MCYT-100 and BIOSECURE-DS2 databases (False Acceptance Rate versus False Rejection Rate, both in %). Curves: user-specific HMMs, 64 GM/state, DS2; P HMM-UBM, 5 states, 128 GM/state, DS2; user-specific HMMs, 64 GM/state, MCYT-100; LV HMM-UBM, 5 states, 16 GM/state, MCYT-100.

Figure 2 shows the DET curves of the best HMM-UBM and user-specific HMM systems on both the BIOSECURE-DS2 and MCYT-100 datasets. The HMM-UBM DET curves lie below the baseline DET curves for both the MCYT-100 and BIOSECURE-DS2 datasets, which reinforces the superiority of the proposed UBM-HMM system. A second outcome emerges from these curves: systems obtain worse a posteriori performance on BIOSECURE-DS2 than on the MCYT-100 dataset. This behaviour confirms the BIOSECURE-DS2 dataset as a challenging scenario for current on-line signature verification systems, due to the quality of its forgeries. These high-quality brute-force skilled forgeries cause an error rate increase of about 87% for the best HMM-UBM systems and of 59% for the user-specific HMM system, both referred to their respective MCYT-100 counterparts.
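The WER definition and the relative increases quoted above can be checked with a few lines of code. The sketch below is only illustrative; the function names are hypothetical, and the WER figures used in the usage note are the best values per system taken from Table 1.

```python
def wer(frr, far, r=1.0):
    """Weighted Error Rate: WER(R) = (FRR + R * FAR) / (1 + R)."""
    return (frr + r * far) / (1.0 + r)

def relative_increase(baseline, value):
    """Relative increase of value with respect to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline
```

Using the best entries of Table 1, `relative_increase(3.08, 5.77)` gives the increase for the HMM-UBM systems and `relative_increase(4.97, 7.90)` the increase for the user-specific HMM system, which round to the 87% and 59% reported in the text.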

6 Conclusions and Future Work

In this paper we have presented an HMM-UBM approach for on-line signature verification. This approach is compared to a state-of-the-art on-line verification system based on user-specific HMMs on two different databases: MCYT-100 and BIOSECURE-DS2. The HMM-UBM approach obtains promising results, outperforming the user-specific HMM system in all the experiments. Furthermore, the BIOSECURE-DS2 on-line signature verification database is presented. The brute-force skilled forgeries contained in this corpus make it a challenging scenario for on-line signature verification, as demonstrated in the experiments, where error rates are significantly worse on the BIOSECURE-DS2 database than on the MCYT-100 corpus. Future work will include new experimental comparisons of the HMM-UBM system and other state-of-the-art on-line signature verification systems, including score normalization techniques, such as cohort normalization, that can significantly improve verification performance, as demonstrated in [4,15,8].


References

1. Fiérrez Aguilar, J., Nanni, L., López-Peñalba, J., Ortega García, J., Maltoni, D.: An on-line signature verification system based on fusion of local and global information. In: IEEE International Conference on Audio- and Video-Based Person Authentication, pp. 523–532 (2005)
2. Richiardi, J., Ketabdar, H., Drygajlo, A.: Local and global feature selection for on-line signature verification. In: Proc. IAPR 8th International Conference on Document Analysis and Recognition, ICDAR (2005)
3. Vielhauer, C., Steinmetz, R., Mayerhofer, A.: Biometric hash based on statistical features of online signatures. In: Proceedings of 16th International Conference on Pattern Recognition, 2002, vol. 1, pp. 123–126 (2002)
4. Jain, A.K., Griess, F.D., Connell, S.D.: On-line Signature Verification. Pattern Recognition 35(1), 2963 (2002)
5. Faúndez-Zanuy, M.: On-line signature recognition based on vq-dtw. Pattern Recognition 40(3), 981–992 (2007)
6. Schimke, S., Vielhauer, C., Dittmann, J.: Using adapted levenshtein distance for on-line signature authentication. In: ICPR (2), pp. 931–934 (2004)
7. Richiardi, J., Drygajlo, A.: Gaussian mixture models for on-line signature verification. In: Intl. Multimedia Conf., Proc. ACM SIGMM workshop on Biometrics methods and applications, pp. 115–122 (2003)
8. Fiérrez Aguilar, J., Ortega García, D., Ramos, J.G.: Hmm-based on-line signature verification: Feature extraction and signature modeling. Pattern Recognition Letters 28(16), 2325–2334 (2007)
9. Van, B.L., Garcia-Salicetti, S., Dorizzi, B.: Fusion of hmm’s likelihood and viterbi path for on-line signature verification. In: European Conference on Computer Vision, pp. 318–331 (2004)
10. Reynolds, D.A., Quatieri, T.F., Dunn, R.B.: Speaker Verification using Adapted Gaussian Mixture Models. Digital Signal Processing 10, 19–41 (2000)
11. Ortega, J., et al.: The multi-scenario multi-environment biosecure multimodal database (bmdb). IEEE Transactions on Pattern Analysis and Machine Intelligence (2009)
12. Rabiner, L.R.: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, vol. 77, pp. 257–286. IEEE Computer Society Press, Los Alamitos (1989)
13. Huang, X., Acero, A., Hon, H.W.: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall PTR, Upper Saddle River (2001)
14. Martínez Díaz, M., Fiérrez Aguilar, J., Ortega García, J.: Universal background models for dynamic signature verification. In: IEEE International Conference on Biometrics: Theory, Applications, and Systems, pp. 1–6 (2007)
15. Ortega Garcia, J., Fierrez Aguilar, J., Simon, D., Gonzalez, J., Faundez Zanuy, M., Espinosa, V., Satue, A., Hernaez, I., Igarza, J.-J., Vivaracho, C., Escudero, D., Moro, Q.-I.: Mcyt baseline corpus: a bimodal biometric database. In: IEE Proceedings: Vision, Image and Signal Processing, December 2003, vol. 150, pp. 395–401 (2003)
16. Wahl, A., Hennebert, J., Humm, A., Ingold, R.: Generation and Evaluation of Brute-Force Signature Forgeries. In: Gunsel, B., Jain, A.K., Tekalp, A.M., Sankur, B. (eds.) MRCS 2006. LNCS, vol. 4105, pp. 2–9. Springer, Heidelberg (2006)
17. Martin, A., Doddington, G., Kamm, T., Ordowski, M., Przybocki, M.: The DET Curve in Assessment of Detection Task Performance. In: European Conference on Speech Communication and Technology, pp. 1895–1898 (1997)

Improving Identity Prediction in Signature-based Unimodal Systems Using Soft Biometrics

Márjory Abreu and Michael Fairhurst

Department of Electronics, University of Kent, Canterbury, Kent CT2 7NT, UK
{mcda2,M.C.Fairhurst}@kent.ac.uk

Abstract. System optimisation, where even small individual system performance gains can often have a significant impact on applicability and viability of biometric solutions, is an important practical issue. This paper analyses two different techniques for using soft biometric information (which is often already available or easily obtainable in many applications) to improve identity prediction accuracy of signature-based tasks. It is shown that such a strategy can improve performance of unimodal systems, supporting high usability profiles and low-cost processing.

1 Introduction

Biometrics-based systems for individual identity authentication are now well established [1]. Many factors will influence the design of a biometric system and, in addition to obvious performance indicators such as accuracy rates, issues concerning reliability, flexibility, security and usability are extremely important. It is necessary to understand and evaluate each part of the process in developing these systems. The advantages and disadvantages of the many different modalities available (fingerprint, face, iris, voice, handwritten signature, etc.) are well documented, and a wide variety of different classification/matching techniques have been extensively explored.

There are many proposed approaches which use multimodal solutions to improve accuracy, flexibility and security, though these solutions can be of relatively high cost and diminished usability [2]. Thus, where possible, unimodal solutions are still important and, indeed, attempts to improve the performance of such configurations are widely reported. One approach which aims to improve the accuracy of unimodal systems is to include non-biometric (soft biometric) information in the decision-making process. In the main, the reported work in this area incorporates relevant information (such as gender, age, handedness, etc.) in order to help to narrow the identity-related search space [3], [4], [5], [6], [7], [8], but the precise methodology adopted is sometimes quite arbitrary.

This paper presents an investigation of two rather different but potentially very effective techniques for including soft biometric information in the overall identification decision. Although we will focus in this paper specifically on the handwritten signature as the biometric modality of choice, only one of our proposed approaches is modality-dependent, giving a valuable degree of generality to our study.

J. Fierrez et al. (Eds.): BioID MultiComm2009, LNCS 5707, pp. 348–356, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Techniques for Exploiting Soft Biometrics

The available literature shows a relatively modest number of examples of using soft biometrics to improve accuracy yet, when this approach is adopted, it generally leads to improved performance. In this paper we propose two different ways of using such information more efficiently than has been done hitherto.

2.1 Soft Biometrics as a Tool for Feature Selection

In this approach, the soft biometric information works as a feature selector, the selection being related to the demographic information that is saved in a Knowledge Database of the system. Fig. 1 shows how the process is realised. During the training phase, it is important to understand the relationship between the dynamic features and the soft biometric information, from which the system is able to choose the most suitable features for each user. This is the information that will be stored in the Knowledge Database module and will be used in the feature selection. The feature analysis is carried out, after training the system with all the features, as follows: 1. Select all the static features and test the system saving the partitioned error rates related with the soft biometrics. 2. For each dynamic feature: – Add this feature to the vector of the static features, – Test this new vector in the system, – Save the error rates to each related soft biometric. 3. Once the system is tested with all diﬀerent feature combinations and the partitioned error rates saved, the analysis of each dynamic feature is executed with respect to each soft biometric information: – Save in the Knowledge Database module each dynamic feature that generated better performances than when using only static features. As an example to illustrate the general operation of this method, a hypothetical three-feature (fea1, fea2 and fea3) biometric-based identity prediction task is considered. In this task let us assume that the features fea1 and fea2 are static features and fea3 is a dynamic feature. The soft-biometric information of the user is known and is recorded as either “X” or “Y” (these labels representing appropriate values depending on the particular instance of soft biometric information. For instance, for gender the labels will be “male” and “female”). 
After the training phase with all the features (fea1, fea2 and fea3), the system is first tested with fea1 and fea2 (the static features only). The partitioned error rates for “X” and “Y” for the soft-biometric information might then appear as in Table 1. The next step is to test the system with fea1, fea2 and fea3 and again save the partitioned error rates. This information can also be seen in Table 1. If there is any gain in accuracy with respect to the partitioned error rates, then this dynamic feature is seen to improve performance. In Table 1, adding fea3 reduces the error for “X” from 5.40% to 4.54% but increases the error for “Y” from 4.81% to 5.00%.

Table 1. Example: Fictional illustrative data

System                               Error Mean   Soft-Biometric “X”   Soft-Biometric “Y”
Static Features                      10.21%       5.40%                4.81%
Static Features + Dynamic Feature    9.54%        4.54%                5.00%

The information stored in the Knowledge Database module (Fig. 1) during the training phase is as follows:

– if soft-biometric information is tagged “X” then fea3 can improve accuracy.
– if soft-biometric information is tagged “Y” then fea3 is not contributing to improved accuracy.

Once the features are selected, the corresponding input values are presented to the system and the system computes its result.
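The rule used to fill the Knowledge Database can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the dictionary layout and the function names build_knowledge_db and select_features are introduced here for the example, and the error rates are the fictional values of Table 1.

```python
from typing import Dict, List

# Partitioned error rates (%) per soft-biometric label.
# Values are the fictional figures of Table 1; the layout is an assumption.
static_errors: Dict[str, float] = {"X": 5.40, "Y": 4.81}
errors_with_dynamic: Dict[str, Dict[str, float]] = {
    "fea3": {"X": 4.54, "Y": 5.00},  # static features + fea3
}


def build_knowledge_db(
    static_err: Dict[str, float],
    dynamic_err: Dict[str, Dict[str, float]],
) -> Dict[str, List[str]]:
    """Map each label to the dynamic features that lowered its error rate."""
    knowledge_db: Dict[str, List[str]] = {label: [] for label in static_err}
    for feature, per_label in dynamic_err.items():
        for label, err in per_label.items():
            if err < static_err[label]:  # strict gain over static-only
                knowledge_db[label].append(feature)
    return knowledge_db


def select_features(
    static_features: List[str], label: str, knowledge_db: Dict[str, List[str]]
) -> List[str]:
    """Static features plus the dynamic features stored for this label."""
    return list(static_features) + knowledge_db.get(label, [])


knowledge_db = build_knowledge_db(static_errors, errors_with_dynamic)
print(knowledge_db)                                         # {'X': ['fea3'], 'Y': []}
print(select_features(["fea1", "fea2"], "X", knowledge_db))  # ['fea1', 'fea2', 'fea3']
print(select_features(["fea1", "fea2"], "Y", knowledge_db))  # ['fea1', 'fea2']
```

With the Table 1 values, a user tagged “X” is processed with fea1, fea2 and fea3, while a user tagged “Y” is processed with the static features only.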

Fig. 1. Soft Biometric as feature selector

Fig. 2. Soft Biometric as input features

2.2 Soft Biometric as an Extra Input Feature

In this approach, the soft biometric information functions, in essence, as an extra input feature. The soft biometric information is used in the same way as any other biometric feature and is simply added to the input vector. Fig. 2 shows a schematic illustration of this process. This information can be added into our analysis of system performance, where these additional characteristics effectively define sub-groups within our overall test population. These new information sources contribute to the system input information in the same way as the extracted sample features, which requires the integration of the further features. Using the same example introduced in Section 2.1, the input of the system will be the three features (fea1, fea2 and fea3) and the additional soft biometric feature, designated soft-fea here. In this case, soft-fea will be a binary value, 10 for “X” and 01 for “Y”, using Hamming Distance coding [9].
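The coding of soft-fea can be illustrated as follows. This is a minimal sketch, assuming that the scheme is a one-of-n binary code with one bit per label, as the values 10 and 01 suggest. The function names and the numeric feature values are hypothetical.

```python
from typing import List, Sequence


def encode_soft(label: str, labels: Sequence[str]) -> List[int]:
    """One-of-n binary code: a single 1 in the position of `label`."""
    if label not in labels:
        raise ValueError(f"unknown soft-biometric label: {label!r}")
    return [1 if label == candidate else 0 for candidate in labels]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions in which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def build_input(features: Sequence[float], label: str, labels: Sequence[str]) -> list:
    """Append the coded soft-biometric label to the biometric feature vector."""
    return list(features) + encode_soft(label, labels)


labels = ["X", "Y"]
sample = [0.42, 1.30, 0.07]  # hypothetical values of fea1, fea2 and fea3

code_x = encode_soft("X", labels)  # [1, 0], written "10" in the text
code_y = encode_soft("Y", labels)  # [0, 1], written "01" in the text
print(build_input(sample, "X", labels))  # [0.42, 1.3, 0.07, 1, 0]
print(hamming_distance(code_x, code_y))  # 2
```

In a code of this kind any two distinct labels differ in exactly two bit positions, so no label is numerically closer to one label than to another.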

3 A Case Study

In order to analyse how these two approaches compare in an application based on real data, we now present a practical Case Study.

3.1 Database

The multimodal database used in this work was collected in the Department of Electronics at the University of Kent [10] as part of the Europe-wide BioSecure Project. In this database, there are samples of Face, Speech, Signature, Fingerprint, Hand Geometry and Iris biometrics of 79 users collected in two sessions. In the work reported here we have used the signature samples from both sessions.

Table 2. Signature features

Feature                                   Type
Execution Time                            Dynamic
Pen Lift                                  Dynamic
Signature Width                           Static
Signature Height                          Static
Height to Width Ratio                     Static
Average Horizontal Pen Velocity in X      Dynamic
Average Horizontal Pen Velocity in Y      Dynamic
Vertical Midpoint Pen Crossings           Dynamic
Azimuth                                   Dynamic
Altitude                                  Dynamic
Pressure                                  Dynamic
Number of points comprising the image     Static
Sum of horizontal coordinate values       Static
Sum of vertical coordinate values         Static
Horizontal centralness                    Static
Vertical centralness                      Static

The database contains 25 signature samples for each subject, where 15 are samples of the subject’s true signature and 10 are attempts to imitate another user’s signature. In this investigation we have used only the 15 genuine signatures of each subject. The data were collected using an A4-sized graphics tablet with a density of 500 lines per inch. There are 16 representative biometric features extracted from each signature sample, as identified in Table 2. These features are chosen to be representative of those known to be commonly adopted in signature processing applications. All the available biometric features are used in the classification process as input to the system. During the acquisition of this database, the subjects were required to provide some additional information which constitutes exactly the sort of soft biometric data discussed above.
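The static/dynamic split of Table 2 determines which features make up a static-only input and which make up the full input. The short sketch below records that split as a plain mapping. The variable names are assumptions for illustration; the feature names and types are copied from Table 2.

```python
# Feature name -> type, transcribed from Table 2.
SIGNATURE_FEATURES = {
    "Execution Time": "dynamic",
    "Pen Lift": "dynamic",
    "Signature Width": "static",
    "Signature Height": "static",
    "Height to Width Ratio": "static",
    "Average Horizontal Pen Velocity in X": "dynamic",
    "Average Horizontal Pen Velocity in Y": "dynamic",
    "Vertical Midpoint Pen Crossings": "dynamic",
    "Azimuth": "dynamic",
    "Altitude": "dynamic",
    "Pressure": "dynamic",
    "Number of points comprising the image": "static",
    "Sum of horizontal coordinate values": "static",
    "Sum of vertical coordinate values": "static",
    "Horizontal centralness": "static",
    "Vertical centralness": "static",
}

# Static-only configuration versus the full feature set.
static_features = [n for n, kind in SIGNATURE_FEATURES.items() if kind == "static"]
dynamic_features = [n for n, kind in SIGNATURE_FEATURES.items() if kind == "dynamic"]
all_features = list(SIGNATURE_FEATURES)

print(len(all_features), len(static_features), len(dynamic_features))  # 16 8 8
```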


In particular, we recorded the following information about each user:

– Gender: Male or Female.
– Age information: For our purposes three age groups were identified, namely: under 25 years, 25-60 years and over 60 years.
– Handedness: Right or Left.

In the study reported here, we have explored the users’ characteristics with respect to the age information and handedness information. The possible values assigned to the age features use a Hamming Distance coding [9] and are defined as 100 (< 25), 010 (25-60) or 001 (> 60), depending on the age group. In the same way, the possible values assigned to handedness features are 10 (right) and 01 (left).

3.2 Classification Methods

In order to evaluate the effectiveness of the two proposed approaches, we chose four different classifiers (to allow a comparative study), as described below.

– Fuzzy Multi-Layer Perceptron (FMLP) [11]: This classifier incorporates fuzzy set theory into a multi-layer Perceptron framework, and results from the direct “fuzzyfication” of the MLP at the network level, at the learning level, or at both.
– Support Vector Machines (SVM) [12]: This approach embodies a functionality very different from that of more traditional classification methods and is based on an induction method which minimizes the upper limit of the generalization error related to uniform convergence.
– K-Nearest Neighbours (KNN) [13]: In this method, the training set is seen as composed of n-dimensional vectors, each element representing a point in n-dimensional space. The classifier finds the k nearest neighbours in the whole dataset based on an appropriate distance metric (Euclidean distance in the simplest case).
– Optimized IREP (Incremental Reduced Error Pruning) (JRip) [14]: Decision trees usually use pruning techniques to decrease the error rates on a noisy dataset, one such approach being the Reduced Error Pruning method.

To make the evaluation robust, we chose a ten-fold cross-validation approach because of its relative simplicity, and because it has been shown to be statistically sound in evaluating the performance of classification tasks [15]. In ten-fold cross-validation, the data set is divided equally into ten different subsets. Nine of the ten subsets are used to train the classifier and the tenth subset is used as the test set. The procedure is repeated ten times, with a different subset being used as the test set each time.
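The evaluation protocol can be illustrated with the simplest of the four classifiers. The sketch below is a plain-Python k-nearest-neighbour classifier with Euclidean distance, evaluated by ten-fold cross-validation. It is an illustration only: the function names, the choice k = 3, the shuffling seed and the synthetic two-user data are assumptions, and the study's own tooling and parameters are not stated in the text.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, identity label)


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 3) -> str:
    """Majority label among the k training samples nearest to `query`."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def ten_fold_error(samples: List[Sample], k: int = 3, folds: int = 10,
                   seed: int = 0) -> float:
    """Mean error rate over `folds` train/test splits of `samples`."""
    if len(samples) < folds:
        raise ValueError("need at least one sample per fold")
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    # Deal the samples into `folds` subsets of (near-)equal size.
    parts = [shuffled[i::folds] for i in range(folds)]
    fold_errors = []
    for i, test in enumerate(parts):
        # Train on the other nine subsets, test on subset i.
        train = [s for j, part in enumerate(parts) if j != i for s in part]
        wrong = sum(knn_predict(train, x, k) != y for x, y in test)
        fold_errors.append(wrong / len(test))
    return sum(fold_errors) / folds


# Synthetic, well-separated data for two hypothetical users.
samples: List[Sample] = (
    [((0.0 + 0.01 * i, 0.0), "user_a") for i in range(20)]
    + [((5.0 + 0.01 * i, 5.0), "user_b") for i in range(20)]
)
error = ten_fold_error(samples)
print(f"mean error rate: {error:.2%}")
```

Each sample is used for testing exactly once, and the reported figure is the mean of the ten per-fold error rates.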

3.3 Experimental Results

In order to analyse the performance of our proposed new techniques for using soft biometric information to improve accuracy, it is important first to evaluate the performance of the classifiers without this additional information. Table 3 shows the error rates of the classifiers when all features (All-F) are used and when only static features (Sta-F) are used. The classifier with the lowest error rate in both configurations is FMLP. It is important to note that for every classifier, using all features rather than only static features reduces the error rate by between roughly 5 and 9 percentage points (a reduction of 5 points corresponds to approximately 60 additional samples that are classified correctly). This is perhaps not in itself surprising, but confirms what can readily be determined from the available literature, that using dynamic features improves the accuracy of a signature-based classification task.

Table 3. Results without and with soft-biometrics

Results without soft biometrics
Method   All-F          Sta-F
FMLP     10.21%±2.37    15.97%±3.69
Jrip     11.67%±2.96    18.22%±3.54
SVM      10.92%±2.31    16.31%±3.97
KNN      14.12%±2.31    22.84%±4.01

Age-based results
Method   S-F/Age        All-F+Age      Sta-F+Age
FMLP     8.84%±1.98     9.54%±2.91     12.91%±2.57
Jrip     11.87%±1.74    10.28%±2.57    16.84%±2.91
SVM      8.69%±1.67     8.21%±2.64     13.73%±2.54
KNN      15.34%±1.39    12.47%±2.83    18.81%±2.88

Handedness-based results
Method   S-F/Hand       All-F+Hand     Sta-F+Hand
FMLP     10.27%±2.89    10.81%±3.69    12.09%±1.58
Jrip     12.64%±2.81    11.27%±3.47    15.74%±1.79
SVM      9.57%±2.36     9.83%±3.59     11.94%±1.67
KNN      14.37%±2.47    13.97%±3.77    16.33%±1.88
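The relation between percentage points and sample counts can be checked with simple arithmetic. The sketch below assumes that the error rates are computed over all 79 × 15 = 1185 genuine signatures; the text does not state this explicitly, although it is consistent with the quoted figure of approximately 60 samples for 5 points. The error rates are taken from Table 3 and the variable names are illustrative.

```python
USERS = 79
GENUINE_PER_USER = 15
total_samples = USERS * GENUINE_PER_USER  # assumed evaluation set size

# Error rates (%) without soft biometrics, from Table 3.
all_f = {"FMLP": 10.21, "Jrip": 11.67, "SVM": 10.92, "KNN": 14.12}
sta_f = {"FMLP": 15.97, "Jrip": 18.22, "SVM": 16.31, "KNN": 22.84}

# Error reduction in percentage points when dynamic features are added.
reduction = {m: round(sta_f[m] - all_f[m], 2) for m in all_f}

# The same reduction expressed as additional correctly classified samples.
extra_correct = {m: round(total_samples * reduction[m] / 100) for m in all_f}

five_points = round(total_samples * 5 / 100)

print(total_samples)  # 1185
print(five_points)    # 59
print(reduction)      # {'FMLP': 5.76, 'Jrip': 6.55, 'SVM': 5.39, 'KNN': 8.72}
print(extra_correct)  # {'FMLP': 68, 'Jrip': 78, 'SVM': 64, 'KNN': 103}
```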

Table 3 also shows the results for the same classifiers when the additional soft biometric information (Age and Handedness) is incorporated. For each type of soft biometric information, the three columns show respectively the error rate measured for selected features using soft biometrics (S-F/Age, S-F/Hand), all features plus soft biometrics (All-F+Age, All-F+Hand) and static features plus soft biometrics (Sta-F+Age, Sta-F+Hand). From an analysis of Table 3, it is possible to see that adding selected dynamic features to the static features, or adding soft biometrics either to all features or only to static features, produces a better performance than using only static features. The overall error rates are broadly related to the sophistication (and, generally therefore, complexity) of the classifiers, with the best results being obtained with the SVM and FMLP classifiers.

Analysing the results based on the incorporation of the age-based additional information and using the t-test, it is possible to note that there is no statistical difference between the S-F/Age results and the All-F+Age results while, on the other hand, these results are both statistically better than the Sta-F+Age results. Analysing the results when the handedness information is incorporated and using the t-test, it can be seen that all the results are statistically comparable.

Fig. 3 and Fig. 4 show the comparison among all five different configurations according to the partitioned age bands and left and right handedness. Careful examination of these results shows, however, how choice of classifier and configuration can lead directly to an optimised process with respect to a given task objective (for example, when working within a particular age band, or focusing on a particular population group).

Fig. 3. Result of all classifiers in all configurations partitioned into age bands

Fig. 4. Result of all classifiers in all configurations partitioned into handedness

3.4 Enhancing Off-Line Processing

An issue of particular interest here is related to a question about off-line signature processing and how this might be enhanced. Most implementations of signature verification systems use a wide range of feature types, but almost all assume the availability of the dynamic features which are only extractable when on-line sample acquisition is possible. Yet there are still important applications where only off-line sampling is possible (remote bank cheque clearing, many document-based applications, etc.), and thus it is instructive to consider the performance achievable when only static features can be used for processing. It is generally expected that static-only solutions will return poorer levels of performance than can be achieved with the much richer feature set available when dynamic features are incorporated and, indeed, the results shown above confirm this here. It is interesting to consider the results of enhancing the processing by incorporating soft biometrics, such as can be seen when we add in the age-based information, with the results shown in Fig. 3 and Fig. 4. The improvement in performance which this brings about suggests a very valuable further benefit of an approach which seeks to exploit the availability of soft biometrics as a means of enhancing performance. In this case, such an approach provides the opportunity for significant enhancement in a scenario which has important practical implications yet which is often especially limited by the inherent nature of the task domain.
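The size of this off-line improvement can be read directly from the static-feature columns of Table 3. The following sketch computes the absolute reduction (in percentage points) and the relative reduction in error rate for each classifier when age or handedness information is added to the static features. The helper name gains is an assumption for illustration; the error rates are taken from Table 3.

```python
from typing import Dict, Tuple

# Error rates (%) from Table 3, static features only and with soft biometrics.
sta_f = {"FMLP": 15.97, "Jrip": 18.22, "SVM": 16.31, "KNN": 22.84}
sta_f_age = {"FMLP": 12.91, "Jrip": 16.84, "SVM": 13.73, "KNN": 18.81}
sta_f_hand = {"FMLP": 12.09, "Jrip": 15.74, "SVM": 11.94, "KNN": 16.33}


def gains(baseline: Dict[str, float],
          enhanced: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Per classifier: (absolute reduction in points, relative reduction in %)."""
    result = {}
    for method, base in baseline.items():
        absolute = base - enhanced[method]
        result[method] = (round(absolute, 2), round(100 * absolute / base, 1))
    return result


age_gains = gains(sta_f, sta_f_age)
hand_gains = gains(sta_f, sta_f_hand)

for method in sta_f:
    print(method, "age:", age_gains[method], "handedness:", hand_gains[method])
```

For every classifier in Table 3 both reductions are positive, which is the improvement referred to above.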

4 Concluding Remarks

This paper has introduced two new techniques for including soft biometric information to improve identity prediction accuracy. The results presented are very encouraging, and show how additional information often available or explicitly collected in practical scenarios can be exploited in a way which can enhance the identification process. Although accuracy improvements tend to be modest (perhaps not surprisingly given the small scale of this initial experimental study), the gains afforded can nevertheless make an impact in practical situations and provide a basis for further development of the general strategy proposed. Further work is required to develop optimisation procedures in configurations such as those investigated here, and to extend the analysis to integrate different types of soft biometric information. Already, however, the work reported here is beginning to offer some options to a system designer seeking to improve error rate performance in unimodal systems, providing alternatives to the increased complexity and reduced usability incurred in multibiometric systems.

Acknowledgment

The authors gratefully acknowledge the financial support given to Mrs Abreu from CAPES (Brazilian Funding Agency) under grant BEX 4903-06-4.

References

1. Jain, A., Nandakumar, K., Nagar, A.: Biometric template security. EURASIP 8(2), 1–17 (2008)
2. Toledano, D.T., Pozo, R.F., Trapote, A.H., Gómez, L.H.: Usability evaluation of multi-modal biometric verification systems. Interacting with Computers 18(5), 1101–1122 (2006)
3. Franke, K., Ruiz-del-Solar, J.: Soft-biometrics: Soft-computing technologies for biometric-applications. In: AFSS, pp. 171–177 (2002)
4. Jain, A., Dass, S., Nandakumar, K.: Can soft biometric traits assist user recognition? In: ICBA, pp. 561–572 (2004)
5. Jain, A., Nandakumar, K., Lu, X., Park, U.: Integrating faces, fingerprints, and soft biometric traits for user recognition. In: ECCV Workshop BioAW, pp. 259–269 (2004)
6. Zewail, R., Elsafi, A., Saeb, M., Hamdy, N.: Soft and hard biometrics fusion for improved identity verification. In: MWSCAS, July 2004, vol. 1, pp. I-225–I-228 (2004)
7. Ailisto, H., Vildjiounaite, E., Lindholm, M., Mäkelä, S., Peltola, J.: Soft biometrics - combining body weight and fat measurements with fingerprint biometrics. Pattern Recogn. Lett. 27(5), 325–334 (2006)
8. Jain, A.K., Dass, S.C., Nandakumar, K.: Soft biometric traits for personal recognition systems. In: Zhang, D., Jain, A.K. (eds.) ICBA 2004. LNCS, vol. 3072, pp. 731–738. Springer, Heidelberg (2004)
9. Hamming, R.W.: Error detecting and error correcting codes. The Bell System Technical Journal 29(2), 147–160 (1950)
10. Ortega-Garcia, J., Alonso-Fernandez, F., Fierrez-Aguilar, J., Garcia-Mateo, C., Salicetti, S., Allano, L., Ly-Van, B., Dorizzi, B.: Software tool and acquisition equipment recommendations for the three scenarios considered. Technical Report No.: D6.2.1. Contract No.: IST-2002-507634 (Jun 2006)
11. Canuto, A.: Combining Neural Networks and Fuzzy Logic for Applications in Character Recognition. PhD thesis, Department of Electronics, University of Kent, Canterbury, UK (May 2001) (Adviser: Fairhurst, M.C.)
12. Cristianini, N., Shawe-Taylor, J.: An introduction to support vector machines and other kernel-based learning methods. Robotica 18(6), 687–689 (2000)
13. Arya, S., Mount, D., Netanyahu, N., Silverman, R., Wu, A.: An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45(6), 891–923 (1998)
14. Fürnkranz, J., Widmer, G.: Incremental reduced error pruning. In: ICML 1994, New Brunswick, NJ, pp. 70–77 (1994)
15. Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)

Author Index

Abel, Andrew 65
Abreu, Márjory 348
Akakin, Hatice Cinar 105
Alba Castro, José Luis 340
Alotaibi, Yousef Ajami 162
Antón, Luis 97
Argones Rúa, Enrique 340
Ávila, Andrés I. 187
Atah, Joshua A. 170

Bae, You-suk 318
Bandari, Naghmeh Mohammadi 301
Bashir, Muzaffar 200
Beetz, Michael 122
Behzad, Moshiri 130
Biermann, Michael 220
Bringer, Julien 178
Burgues, Javier 325
Butt, M Asif Afzal 308

Cambria, Erik 252
Carballo, Sara 285
Carbone, Domenico 73
Castrillón, Modesto 97
Chabanne, Hervé 178
Chetouani, Mohamed 65
Chia, Chaw 212

Dimitrova, Desislava 207
Dimov, Dimo 146, 192
Dittmann, Jana 220
Dobrišek, Simon 114
Drygajlo, Andrzej 25, 260

Eckl, Chris 252
Ejarque, Pascual 81
Esposito, Anna 73

Fàbregas, Joan 236
Fairhurst, Michael 348
Fariba, Bahrami 130
Faundez, Marcos 236
Ferrer, Miguel Ángel 236
Fierrez, Julian 154, 285, 325
Freire, David 97
Freire, Manuel 236

Gajšek, Rok 114
Galbally, Javier 285
Garrido, Javier 236
Gkalelis, Nikolaos 138
Gluhchev, Georgi 207
Gonzalez, Guillermo 236
Gonzalez-Rodriguez, Joaquin 49
Grassi, Marco 244
Gómez, David 81
Gros, Jerneja Žganec 57

Hadid, Abdenour 9
Hangai, Seiichiro 333
Havasi, Catherine 252
Henniger, Olaf 268
Hernado, Javier 81
Hernando, David 81
Howells, Gareth 170
Husain, Amir 162
Hussain, Amir 65, 252

Imai, Hideki 293
Inuma, Manabu 293

Jalal, Arabneydi 130

Kanak, Alper 276
Kempf, Jürgen 200
Kevenaar, Tom A.M. 178
Kindarji, Bruno 178
Kojima, Yoshihiro 293
Kyperountas, Marios 89
Kümmel, Karl 220

Laskov, Lasko 192
Lee, Hyun-suk 318
Li, Weifeng 25
Lopez-Moreno, Ignacio 49

Maeng, Hyun-ju 318
Mansoor, Atif Bin 308
Marinov, Alexander 146
Mayer, Christoph 122
Mihelič, Aleš 57
Mihelič, France 114
Milgram, Maurice 65
Misaghian, Khashayar 301
Morales, Aythami 236
Moreno-Moreno, Miriam 154
Muci, Adrialy 187
Müller, Sascha 268

Nguyen, Quoc-Dinh 65
Nolle, Lars 212

Ortega, Javier 236
Ortega-Garcia, Javier 154, 285, 325
Ortego-Resa, Carlos 49
Otsuka, Akira 293

Pavešić, Nikola 1, 114
Pérez-Piñar López, David 340
Pietikäinen, Matti 9
Pitas, Ioannis 89, 138
Přibil, Jiří 41
Přibilová, Anna 41

Radhika, K.R. 228
Radig, Bernd 122
Ramos, Daniel 49, 325
Rawls, Allen W. 17
Riaz, Zahid 122
Ribalda, Ricardo 236
Ricanek Jr., Karl 17
Ringeval, Fabien 65
Riviello, Maria Teresa 73

Saeed, Omer 308
Sankur, Bulent 105
Sano, Tomoaki 333
Scheidat, Tobias 220
Sekhar, G.N. 228
Sheela, S.V. 228
Sherkat, Nasser 212
Shigetomi, Rie 293
Soğukpinar, Ibrahim 276
Staroniewicz, Piotr 33
Štruc, Vitomir 1, 114

Tajbakhsh, Nima 301
Tefas, Anastasios 138
Terán, Luis 260

Venkatesha, M.K. 228
Vielhauer, Claus 220

Yoshida, Takahiro 333

Zhu, Kewei 25
Žibert, Janez 114
Zlateva, Nadezhda 146