Three-Dimensional Television
Signals and Communication Technology

Circuits and Systems Based on Delta Modulation: Linear, Nonlinear and Mixed Mode Processing. D.G. Zrilic. ISBN 3-540-23751-8
Functional Structures in Networks: AMLn – A Language for Model Driven Development of Telecom Systems. T. Muth. ISBN 3-540-22545-5
Radio Wave Propagation for Telecommunication Applications. H. Sizun. ISBN 3-540-40758-8
Electronic Noise and Interfering Signals: Principles and Applications. G. Vasilescu. ISBN 3-540-40741-3
DVB: The Family of International Standards for Digital Video Broadcasting, 2nd ed. U. Reimers. ISBN 3-540-43545-X
Digital Interactive TV and Metadata: Future Broadcast Multimedia. A. Lugmayr, S. Niiranen, and S. Kalli. ISBN 3-387-20843-7
Adaptive Antenna Arrays: Trends and Applications. S. Chandran (Ed.). ISBN 3-540-20199-8
Digital Signal Processing with Field Programmable Gate Arrays. U. Meyer-Baese. ISBN 3-540-21119-5
Neuro-Fuzzy and Fuzzy Neural Applications in Telecommunications. P. Stavroulakis (Ed.). ISBN 3-540-40759-6
SDMA for Multipath Wireless Channels: Limiting Characteristics and Stochastic Models. I.P. Kovalyov. ISBN 3-540-40225-X
Digital Television: A Practical Guide for Engineers. W. Fischer. ISBN 3-540-01155-2
Multimedia Communication Technology: Representation, Transmission and Identification of Multimedia Signals. J.R. Ohm. ISBN 3-540-01249-4
Information Measures: Information and its Description in Science and Engineering. C. Arndt. ISBN 3-540-40855-X
Processing of SAR Data: Fundamentals, Signal Processing, Interferometry. A. Hein. ISBN 3-540-05043-4
Chaos-Based Digital Communication Systems: Operating Principles, Analysis Methods, and Performance Evaluation. F.C.M. Lau and C.K. Tse. ISBN 3-540-00602-8
Adaptive Signal Processing: Application to Real-World Problems. J. Benesty and Y. Huang (Eds.). ISBN 3-540-00051-8
Multimedia Information Retrieval and Management: Technological Fundamentals and Applications. D. Feng, W.C. Siu, and H.J. Zhang (Eds.). ISBN 3-540-00244-8
Structured Cable Systems. A.B. Semenov, S.K. Strizhakov, and I.R. Suncheley. ISBN 3-540-43000-8
UMTS: The Physical Layer of the Universal Mobile Telecommunications System. A. Springer and R. Weigel. ISBN 3-540-42162-9
Advanced Theory of Signal Detection: Weak Signal Detection in Generalized Observations. I. Song, J. Bae, and S.Y. Kim. ISBN 3-540-43064-4
Wireless Internet Access over GSM and UMTS. M. Taferner and E. Bonek. ISBN 3-540-42551-9
The Variational Bayes Method in Signal Processing. V. Šmídl and A. Quinn. ISBN 3-540-28819-8
Topics in Acoustic Echo and Noise Control: Selected Methods for the Cancellation of Acoustical Echoes, the Reduction of Background Noise, and Speech Processing. E. Hänsler and G. Schmidt (Eds.). ISBN 3-540-33212-X
Terrestrial Trunked Radio – TETRA: A Global Security Tool. Peter Stavroulakis. ISBN 3-540-71190-2
Three-Dimensional Television: Capture, Transmission, Display. H.M. Ozaktas and L. Onural (Eds.). ISBN 3-540-72532-6
Haldun M. Ozaktas · Levent Onural (Eds.)
Three-Dimensional Television Capture, Transmission, Display
With 316 Figures and 21 Tables
Prof. Haldun M. Ozaktas
Prof. Levent Onural
Dept. of Electrical Engineering Bilkent University TR-06800 Bilkent, Ankara Turkey
Dept. of Electrical Engineering Bilkent University TR-06800 Bilkent, Ankara Turkey
Library of Congress Control Number: 2007928410

ISBN 978-3-540-72531-2 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com

© Springer-Verlag Berlin Heidelberg 2008

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: by the authors and Integra, India, using a Springer LaTeX macro package
Cover design: eStudio Calamar S.L., F. Steinen-Broo, Pau/Girona, Spain

Printed on acid-free paper
Preface
This book was motivated by, and most of its chapters derived from work conducted within, the Integrated Three-Dimensional Television—Capture, Transmission, and Display project, which is funded by the European Commission 6th Framework Information Society Technologies Programme and led by Bilkent University, Ankara. The project involves 19 partner institutions from 7 countries and over 180 researchers throughout Europe, and extends over the period from September 2004 to August 2008. The project web site is www.3dtv-research.org.

The editors would like to thank all authors who contributed their work to this edited volume. All contributions were reviewed, in most cases by leading experts in the area. We are grateful to all the anonymous reviewers for their help and their fruitful suggestions, which not only provided a basis for accepting or declining manuscripts, but also improved their quality significantly. We would also like to thank our editor, Christoph Baumann at Springer, Heidelberg, for his guidance throughout the process.

We hope that this book will prove useful for those interested in three-dimensional television and related technologies, and that it will inspire further research that will help make 3DTV a reality in the near future.

The editors' work is supported by the EC within FP6 under Grant 511568 with the acronym 3DTV.
About the Authors
Gozde B. Akar received the BS degree from Middle East Technical University, Turkey in 1988 and MS and PhD degrees from Bilkent University, Turkey in 1990 and 1994, respectively, all in electrical and electronics engineering. Currently she is an associate professor with the Department of Electrical and Electronics Engineering, Middle East Technical University. Her research interests are in video processing, compression, motion modeling and multimedia networking.

Anil Aksay received his BS and MS degrees in electrical and electronics engineering from Middle East Technical University, Ankara, Turkey in 1999 and 2001, respectively. Currently he is with the Multimedia Research Group at METU, where he is working as a researcher towards the PhD degree. His research interests include multiple description coding, image and video compression, stereoscopic and multi-view coding, video streaming and error concealment.

A. Aydin Alatan received his BS degree from Middle East Technical University, Ankara, Turkey in 1990, the MS and DIC degrees from Imperial College, London, UK in 1992, and the PhD degree from Bilkent University, Ankara, Turkey in 1997, all in electrical engineering. He was a post-doctoral research associate at Rensselaer Polytechnic Institute and New Jersey Institute of Technology between 1997 and 2000. In August 2000, he joined the faculty of the Electrical and Electronics Engineering Department at Middle East Technical University.

Jaakko Astola received the PhD degree in mathematics from Turku University, Finland in 1978. Between 1979 and 1987 he was with the Department of Information Technology, Lappeenranta University of Technology, Lappeenranta, Finland. From 1987 to 1992 he was an associate professor in applied mathematics at Tampere University, Tampere, Finland. Since 1993 he has been a professor of signal processing at Tampere University of Technology. His research interests include signal processing, coding theory, spectral techniques and statistics.

Richard Bates is a research fellow within the Imaging and Displays Research Group at De Montfort University, where his main interests are software development for 3DTV displays and evaluating the usability and acceptability of 3DTV displays. He holds a PhD on human-computer interaction.

Kostadin Stoyanov Beev is a researcher in the Central Laboratory of Optical Storage and Processing of Information, Bulgarian Academy of Sciences. He received the PhD degree in 2007 in the area of wave processes physics. His MS degree in engineering physics has two specializations: quantum electronics and laser technique, and medical physics. His research interests are in optics, holography, material science, biophysics and evanescent waves.

Kristina Nikolaeva Beeva is a researcher in the Central Laboratory of Optical Storage and Processing of Information, Bulgarian Academy of Sciences. Her MS degree in engineering physics has two specializations: quantum electronics and laser technique, and medical physics. Her research interests are in the fields of optics, holography, laser technique and biophysics.

Philip Benzie obtained an honours degree in electrical and electronic engineering from Aberdeen University (AU) in 2001. His PhD on the application of finite element analysis to holographic interferometry for non-destructive testing was obtained in 2006 at AU. Currently his research interests include holographic imaging, non-destructive testing and underwater holography.

M. Oguz Bici received the BS degree in electrical and electronics engineering from Middle East Technical University (METU), Ankara, Turkey in 2005. Currently he is with the Multimedia Research Group of the Electrical and Electronics Engineering Department, METU, studying towards a PhD degree. His research interests are multimedia compression, error resilient/multiple description coding and wireless multimedia sensor networks.

Cagdas Bilen received his BS degree in electrical and electronics engineering from Middle East Technical University (METU), Ankara, Turkey in 2005. Currently he is an MS student in the Electrical and Electronics Engineering Department and a researcher in the Multimedia Research Group of METU. Among his research topics are image and video compression, error concealment, distributed video coding, multiple description coding, stereoscopic and multiview video coding.

Sukhee Cho received the BS and MS degrees in computer science from Pukyong National University in 1993 and 1995, respectively. She received the PhD degree in electronics and information engineering from Yokohama National University in 1999. She is currently with the Radio and Broadcasting Research Division, Electronics and Telecommunications Research Institute
(ETRI), Daejeon, Korea. Her research interests include stereoscopic video coding, multi-viewpoint video coding (MVC) and 3DTV broadcasting systems.

M. Reha Civanlar received the PhD degree in electrical and computer engineering from North Carolina State University in 1984. He is currently a vice president and director of the Media Lab at DoCoMo USA Labs. He was a visiting professor of computer engineering at Koc University in Istanbul for four years starting in 2002. Before that, he was the head of the Visual Communications Research Department at AT&T Labs Research. He is a recipient of the 1985 Senior Paper Award of the IEEE Signal Processing Society and a fellow of the IEEE.

Edilson de Aguiar received the BS degree in computer engineering from the Espirito Santo Federal University, Vitoria, Brazil in 2002 and the MS degree in computer science from Saarland University, Saarbrücken, Germany in 2004. He is currently a PhD student in the Computer Graphics Group at the Max-Planck-Institut (MPI) Informatik, Saarbrücken, Germany. His research interests include computer animation, motion capture and 3D video.

Stephen DiVerdi is a doctoral candidate at the University of California, Santa Barbara. He received his bachelors degree in computer science from Harvey Mudd College in 2002. His research covers the intersection of graphics, vision, and human-computer interaction, with an emphasis on augmented reality.

Funda Durupınar is a PhD candidate at the Department of Computer Engineering, Bilkent University, Ankara, Turkey. She received her BS degree in computer engineering from Middle East Technical University in 2002 and her MS degree in computer engineering from Bilkent University in 2004. Her research interests include physically-based simulation, cloth modeling, behavioral animation and crowd simulation.

Karen Egiazarian received the MS degree in mathematics from Yerevan State University, Armenia, the PhD degree in physics and mathematics from Moscow Lomonosov State University, and the DrTech degree from Tampere University of Technology. Currently he is a full professor at the Institute of Signal Processing, Tampere University of Technology. His research interests are in the areas of applied mathematics, and signal and image processing.

G. Bora Esmer received his BS degree from Hacettepe University, Ankara in 2001 and the MS degree from Bilkent University, Ankara in 2004. He has been a PhD student in the area of signal processing since 2004. His areas of interest include optical information processing and image processing.

Christoph Fehn received the Dr-Ing degree from the Technical University of Berlin, Germany. He currently works as a scientific project manager at Fraunhofer HHI and as an associate lecturer at the University of Applied Sciences,
Berlin. His research interests include video processing and coding, computer graphics, and computer vision for applications in the area of immersive media, 3DTV, and digital cinema. He has been involved in MPEG standardization activities for 3D video.

Atanas Gotchev received MS degrees in communications engineering and in applied mathematics from the Technical University of Sofia, Bulgaria, the PhD degree in communications engineering from the Bulgarian Academy of Sciences, and the DrTech degree from Tampere University of Technology, Finland. Currently he is a senior researcher at the Institute of Signal Processing, Tampere University of Technology. His research interests are in transform methods for signal, image and video processing.

Uğur Güdükbay is an associate professor at the Department of Computer Engineering, Bilkent University, Ankara, Turkey. He received his PhD degree in computer engineering and information science from Bilkent University in 1994. He then conducted research as a postdoctoral fellow at the Human Modeling and Simulation Laboratory, University of Pennsylvania, USA. His research interests include different aspects of computer graphics, multimedia databases, computational geometry, and electronic arts. He is a senior member of the IEEE and a professional member of the ACM.

Jana Harizanova is a researcher in the Central Laboratory of Optical Storage and Processing of Information of the Bulgarian Academy of Sciences. She recently received her PhD degree in the field of holographic and laser interferometry. Her main research interests include interferometry, diffraction optics and digital signal processing.

Tobias Höllerer is an assistant professor of computer science at the University of California, Santa Barbara, where he leads a research group on imaging, interaction, and innovative interfaces. Höllerer holds a graduate degree in informatics from the Technical University of Berlin and MS and PhD degrees in computer science from Columbia University.

Klaus Hopf received the Dipl-Ing degree in electrical engineering from the Technical University of Berlin, Germany. He has worked within government-funded research projects on the development of videoconferencing systems, and in research projects developing new technologies for the representation of video images in high resolution (HDTV) and 3D.

Namho Hur received the BS, MS, and PhD degrees in electrical and electronics engineering from Pohang University of Science and Technology (POSTECH), Pohang, Korea in 1992, 1994, and 2000. He is currently with the Radio and Broadcasting Research Division, Electronics and Telecommunications Research Institute (ETRI), Daejeon, Korea. As a research scientist, he was with the Communications Research Centre Canada (2003–2004). His main research interests are control theory, power electronics and 3DTV broadcasting systems.
Rossitza Ilieva received the MS degree in physics in 1968 and a post-graduate qualification in optics and holography in 1979 from Sofia University. She has experience in stereo imaging and holography (researcher at CLOSPI, Bulgarian Academy of Sciences, 1975–2005). Since then she has been a researcher in the Electrical and Electronics Engineering Department of Bilkent University, working on holographic 3DTV displays.

Peter Kauff is the head of the Immersive Media & 3D Video Group in the Image Processing Department of Fraunhofer HHI. He has been with HHI since 1984 and has been involved in numerous German and European projects related to digital HDTV signal processing and coding, interactive MPEG-4-based services, as well as in a number of projects related to advanced 3D video processing for immersive tele-presence and immersive media.

Jinwoong Kim received the BS and MS degrees from the Department of Electronics Engineering of Seoul National University in 1981 and 1983, respectively. He received the PhD degree from the Department of Electrical and Computer Engineering of Texas A&M University in 1993. He joined ETRI in 1983, where he is currently a principal member of research staff. He has been involved in many R&D projects, including the TDX digital switching system, an HDTV encoder system and chipset, and MPEG-7 and MPEG-21 core algorithms and applications. He is now 3DTV project leader at ETRI.

Metodi Kovachev received the MS degree in 1961 and the PhD degree in physics in 1982 from Sofia University. He has experience in optical design, stereo imaging, holography, optical processing, and electronics (director and senior researcher at CLOSPI, Bulgarian Academy of Sciences, 1975–2005). Since then he has been a senior researcher in the Electrical and Electronics Engineering Department of Bilkent University, working on holographic 3DTV displays.

Alper Koz received the BS and MS degrees in electrical and electronics engineering from Middle East Technical University, Ankara, Turkey in 2000 and 2002, respectively. He is currently a PhD student and a research assistant in the same department. His research interests include watermarking techniques for image, video, free view, and 3D television.

Hyun Lee received the BS degree in electronics engineering from Kyungpook National University, Korea in 1993 and the MS degree from KAIST (Korea Advanced Institute of Science and Technology), Korea in 1996. He enrolled in a doctoral course at KAIST in 2005. Since 1999, he has been with the Digital Broadcasting Research Division at ETRI (Electronics and Telecommunications Research Institute). His current interests include mobile multimedia broadcasting, digital communications and 3DTV systems.

Wing Kai Lee holds a PhD degree in optical engineering from the University of Warwick. He is currently working as a research fellow in the Imaging and Displays Research Group, De Montfort University. His research interests include 3D display and imaging, high speed imaging, holography, optical non-contact measurements, digital video coding, and medical imaging.
Marcus A. Magnor heads the Computer Graphics Lab at the Technical University of Braunschweig. He received his BA (1995) and MS (1997) in physics and his PhD (2000) in electrical engineering. He established the independent research group Graphics-Optics-Vision at the Max-Planck-Institut Informatik in Saarbrücken and received the venia legendi in computer science from Saarland University in 2005. His research interests entwine around the visual information processing pipeline. Recent and ongoing research topics include video-based rendering, 3DTV, as well as astrophysical visualization.

Aydemir Memişoğlu works as a software engineer at Havelsan A.Ş., Ankara, Turkey. He received his BS and MS degrees in computer engineering from Bilkent University, Ankara, Turkey in 2000 and 2003, respectively. His research interests include different aspects of computer graphics, specifically human modeling and animation.

Philipp Merkle received the Dipl-Ing degree from the Technical University of Berlin, Germany in 2006. He has been with Fraunhofer HHI since 2003. His research interests are mainly in the field of representation and coding of free viewpoint and multi-view video, including MPEG standardization activities.

Karsten Müller received the Dr-Ing degree in electrical engineering and the Dipl-Ing degree from the Technical University of Berlin, Germany, in 2006 and 1997, respectively. He has been with the Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, Berlin since 1997. His research interests are mainly in the field of representation, coding and reconstruction of 3D scenes. He has been involved in MPEG standardization activities.

Andrey Norkin received his MS degree in computer science from Ural State Technical University, Russia in 2001 and the LicTech degree in signal processing from Tampere University of Technology, Finland in 2005. Currently he is a researcher at the Institute of Signal Processing, Tampere University of Technology, where he is working towards his PhD degree. His research interests include image and video coding, and error resilience of compressed images, video, and 3D meshes.

Alex Olwal is a PhD candidate at KTH (the Royal Institute of Technology) in Stockholm. He was a visiting researcher at Columbia University in 2001–2003 and at UC Santa Barbara in 2005. His research focuses on interaction techniques and novel 3D user interfaces, such as augmented and mixed reality. His research interests include multimodal interaction, interaction devices, and ubiquitous computing.

Levent Onural received his PhD in electrical and computer engineering from SUNY at Buffalo in 1985 (BS, MS from METU) and is presently a full professor at Bilkent University. He received a TUBITAK award in 1995 and an IEEE Third Millennium Medal in 2000, and is an associate editor of IEEE Transactions on Circuits and Systems for Video Technology. Currently he is leading the European Commission funded 3DTV Project as the coordinator.
Jörn Ostermann studied electrical engineering and communications engineering. Since 2003 he has been a full professor and head of the Institut für Informationsverarbeitung at the Leibniz Universität Hannover, Germany. He is a fellow of the IEEE, a member of the IEEE Technical Committee on Multimedia Signal Processing, and past chair of the IEEE CAS Visual Signal Processing and Communications (VSPC) Technical Committee. His current research interests are video coding and streaming, 3D modeling, face animation, and computer-human interfaces.

Haldun M. Ozaktas received a PhD degree from Stanford University in 1991. He joined Bilkent University, Ankara in 1991, where he is presently a professor of electrical engineering. In 1992 he was at the University of Erlangen-Nürnberg as an Alexander von Humboldt Fellow. In 1994 he worked as a consultant for Bell Laboratories, New Jersey. He is the recipient of the 1998 ICO International Prize in Optics and the Scientific and Technical Research Council of Turkey Science Award (1999), and is a member of the Turkish Academy of Sciences and a fellow of the Optical Society of America.

Bülent Özgüç is a professor at the Department of Computer Engineering and the dean of the Faculty of Art, Design and Architecture, Bilkent University, Ankara, Turkey. He formerly taught at the University of Pennsylvania, USA, the Philadelphia College of Arts, USA, and Middle East Technical University, Turkey, and worked as a member of the research staff at the Schlumberger Palo Alto Research Center, USA. His research areas include different aspects of computer graphics and user interface design. He is a member of IEEE, ACM and IUA.

Ismo Rakkolainen received a DrTech degree from the Tampere University of Technology, Finland in 2002. He has 26 journal and conference proceedings articles, 1 book chapter, 4 patents and several innovation awards. His primary research interests include 3D display technology, 2D mid-air displays, virtual reality, interaction techniques and novel user interfaces.

Tarik Reyhan received his BS and MS degrees from the Electrical Engineering Department of METU in 1972 and 1975, respectively. He received his PhD degree from the Electrical Engineering Department of the University of Birmingham, UK in 1981. He worked at ASELSAN from 1981 to 2001, and joined the Electrical and Electronics Engineering Department of Bilkent University in 2001. His areas of interest include R&D management, telecommunications, RF design, electronic warfare and night vision.

Simeon Hristov Sainov is a professor in the Central Laboratory of Optical Storage and Processing of Information, Bulgarian Academy of Sciences, and the head of the Holographic and Optoelectronic Investigations Group. He graduated from St. Petersburg State University, Faculty of Physics. His major fields of scientific interest are physical optics, near-field optics, holography and laser refractometry.
Ventseslav Sainov is the director of the Central Laboratory of Optical Storage and Processing of Information of the Bulgarian Academy of Sciences. He has expertise in the fields of light-sensitive materials, holography, holographic and laser interferometry, shearography, non-destructive testing, 3D micro/macro measurements, and optical and digital processing of interference patterns.

Hans-Peter Seidel is the scientific director and chair of the Computer Graphics Group at the Max-Planck-Institut (MPI) Informatik and a professor of computer science at Saarland University. He has received grants from a wide range of organizations, including the German National Science Foundation (DFG), the German Federal Government (BMBF), the European Community (EU) and NATO. In 2003 Seidel was awarded the Leibniz Preis, the most prestigious German research award, by the German Research Foundation (DFG).

Ian Sexton holds a PhD on 3D display architecture, and his research interests include 3D display systems, computer architecture, computer graphics, and image processing. He founded the Imaging and Displays Research Group at De Montfort University, is an active member of the SID, and sits on its UK and Ireland Chapter Committee.

Aljoscha Smolic received the Dr-Ing degree from Aachen University of Technology in 2001. He is a scientific project manager at Fraunhofer HHI and an adjunct professor at the Technical University of Berlin. His research interests include video processing and coding, computer graphics, and computer vision. He has been leading MPEG standards activities for 3D video.

Ralf Sondershaus received his Diplom (MS) degree in computer science from the University of Tübingen in 2001. From 2001 until 2003, he worked in the field of geographic visualization. From 2003, he was a PhD candidate at the Department for Graphical-Interactive Systems (GRIS) at the University of Tübingen, where he received his PhD in 2007. His research interests include compact multi-resolution models for huge surface and volume meshes, volume visualization and geographic information systems (GIS).

Nikolče Stefanoski studied mathematics and computer science with telecommunications as his field of application. Since January 2004 he has been working toward a PhD degree at the Institut für Informationsverarbeitung of the Leibniz Universität Hannover, Germany. His research interests are coding of time-variant 3D geometry, signal processing, and stochastics.

Elena Stoykova is a member of SPIE and scientific secretary of the Central Laboratory of Optical Storage and Processing of Information of the Bulgarian Academy of Sciences. She has expertise in the fields of interferometry, diffraction optics, digital signal processing, and Monte Carlo simulation. She is the author of more than 100 publications in scientific journals and proceedings.
Phil Surman holds a PhD on 3D television displays from De Montfort University. He has been conducting independent research on 3D television for many years. He helped to instigate several European 3DTV projects and is currently working on multi-viewer 3DTV displays at the Imaging and Displays Research Group.

A. Murat Tekalp received the PhD degree in electrical, computer and systems engineering from Rensselaer, Troy, New York in 1984. He has been with Eastman Kodak Company (1984–1987) and the University of Rochester, New York (1987–2005), where he was promoted to Distinguished University Professor. Since 2001 he has been a professor at Koc University, Istanbul, Turkey. He has been selected a Distinguished Lecturer by the IEEE Signal Processing Society and is a fellow of the IEEE.

Christian Theobalt is a postdoctoral researcher at the MPI Informatik in Saarbrücken, Germany and head of the junior research group 3D Video and Vision-based Graphics within the Max-Planck Center for Visual Computing and Communication. His research interests include free-viewpoint and 3D video, markerless optical motion capture, 3D computer vision, and image- and physics-based rendering.

George A. Triantafyllidis received the Diploma and PhD degrees from the Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Greece in 1997 and 2002, respectively. He has been with the Informatics and Telematics Institute, Thessaloniki, Greece from 2000 to 2004 as a research associate and since 2004 as a senior researcher. His research interests include 3D data processing, medical image communication, multimedia signal processing, image analysis and stereo image sequence coding.

Libor Váša graduated from the University of West Bohemia in 2004 with a specialisation in computer graphics and data visualisation. Currently he is working towards his PhD degree in the Computer Graphics Group at the University of West Bohemia in the field of dynamic mesh compression and simplification.

John Watson was appointed to a chair (professorship) in optical engineering at Aberdeen University in 2004. His research interests include underwater holography, subsea laser welding, laser-induced spectral analysis and optical image processing. He is an elected member of the administrative committee of OES and a fellow of the IET and IOP.

Thomas Wiegand is the head of the Image Communication Group in the Image Processing Department of Fraunhofer HHI. He received the Dipl-Ing degree in electrical engineering from the Technical University of Hamburg-Harburg, Germany in 1995 and the Dr-Ing degree from the University of Erlangen-Nuremberg, Germany in 2000. He is associated rapporteur of ITU-T VCEG, associated rapporteur/co-chair of the JVT, and associated chair of MPEG Video.
Mehmet Şahin Yeşil is a PhD student at the Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey. He received his BS and MS degrees in computer engineering from Bilkent University, Ankara, Turkey in 2000 and 2003, respectively. His research interests are computer graphics, cryptography, and computer and network security. He works as an officer in the Turkish Air Force.

Kugjin Yun received the BS and MS degrees in computer engineering from Chunbuk National University, Korea in 1999 and 2001. He joined the Electronics and Telecommunications Research Institute (ETRI) in 2001, and is currently with the Broadcasting System Research Group. His research interests include 3D T-DMB and 3DTV broadcasting systems.

Xenophon Zabulis received his BA, MS and PhD degrees in computer science from the University of Crete in 1996, 1998, and 2001, respectively. He has worked as a postdoctoral fellow at the Computer and Information Science Department, at the interdisciplinary General Robotics, Automation, Sensing and Perception Laboratory, and at the Institute for Research in Cognitive Science, all at the University of Pennsylvania. He is currently a research fellow at the Institute of Informatics and Telematics, Centre for Research and Technology Hellas, Thessaloniki, Greece.
Contents
1 Three-dimensional Television: From Science-fiction to Reality
Levent Onural and Haldun M. Ozaktas . . . . 1

2 A Backward-compatible, Mobile, Personalized 3DTV Broadcasting System Based on T-DMB
Hyun Lee, Sukhee Cho, Kugjin Yun, Namho Hur and Jinwoong Kim . . . . 11

3 Reconstructing Human Shape, Motion and Appearance from Multi-view Video
Christian Theobalt, Edilson de Aguiar, Marcus A. Magnor and Hans-Peter Seidel . . . . 29

4 Utilization of the Texture Uniqueness Cue in Stereo
Xenophon Zabulis . . . . 59

5 Pattern Projection Profilometry for 3D Coordinates Measurement of Dynamic Scenes
Elena Stoykova, Jana Harizanova and Ventseslav Sainov . . . . 85

6 Three-dimensional Scene Representations: Modeling, Animation, and Rendering Techniques
Uğur Güdükbay and Funda Durupınar . . . . 165

7 Modeling, Animation, and Rendering of Human Figures
Uğur Güdükbay, Bülent Özgüç, Aydemir Memişoğlu and Mehmet Şahin Yeşil . . . . 201

8 A Survey on Coding of Static and Dynamic 3D Meshes
Aljoscha Smolic, Ralf Sondershaus, Nikolče Stefanoski, Libor Váša, Karsten Müller, Jörn Ostermann and Thomas Wiegand . . . . 239
9 Compression of Multi-view Video and Associated Data
Aljoscha Smolic, Philipp Merkle, Karsten Müller, Christoph Fehn, Peter Kauff and Thomas Wiegand . . . . 313

10 Efficient Transport of 3DTV
A. Murat Tekalp and M. Reha Civanlar . . . . 351

11 Multiple Description Coding and its Relevance to 3DTV
Andrey Norkin, M. Oguz Bici, Anil Aksay, Cagdas Bilen, Atanas Gotchev, Gozde B. Akar, Karen Egiazarian and Jaakko Astola . . . . 371

12 3D Watermarking: Techniques and Directions
Alper Koz, George A. Triantafyllidis and A. Aydin Alatan . . . . 427

13 Solving the 3D Problem—The History and Development of Viable Domestic 3DTV Displays
Phil Surman, Klaus Hopf, Ian Sexton, Wing Kai Lee and Richard Bates . . . . 471

14 An Immaterial Pseudo-3D Display with 3D Interaction
Stephen DiVerdi, Alex Olwal, Ismo Rakkolainen and Tobias Höllerer . . . . 505

15 Holographic 3DTV Displays Using Spatial Light Modulators
Metodi Kovachev, Rossitza Ilieva, Philip Benzie, G. Bora Esmer, Levent Onural, John Watson and Tarik Reyhan . . . . 529

16 Materials for Holographic 3DTV Display Applications
Kostadin Stoyanov Beev, Kristina Nikolaeva Beeva and Simeon Hristov Sainov . . . . 557

17 Three-dimensional Television: Consumer, Social, and Gender Issues
Haldun M. Ozaktas . . . . 599
1 Three-dimensional Television: From Science-fiction to Reality

Levent Onural and Haldun M. Ozaktas

Department of Electrical Engineering, Bilkent University, TR-06800 Bilkent, Ankara, Turkey
Moving three-dimensional images have been depicted in many science-fiction films. This has contributed to 3D video and 3D television (3DTV) being perceived as ultimate goals in imaging and television technology. This vision of 3DTV involves a ghost-like, yet high quality optical replica of an object that is visually indistinguishable from the original (except perhaps in size). These moving video images would be floating in space or standing on a tabletop-like display, and viewers would be able to peek or walk around the images to see them from different angles or maybe even from behind (Fig. 1.1). As such, this vision of 3DTV is quite distinct from stereoscopic 3D imaging and cinema.

Fig. 1.1. Artist's vision of three-dimensional television (graphic artist: Erdem Yücel)

3D photography, cinema, and TV actually have a long history; in fact, stereoscopic 3D versions of these common visual media are almost as old as their 2D counterparts. Stereoscopic 3D photography was invented as early as 1839. The first examples of 3D cinema were available in the early 1900s. Various forms of early 2D television were developed in the 1920s, and by 1929 stereoscopic 3DTV had been demonstrated. However, while the 2D versions of photography, cinema, and TV have flourished to become important features of twentieth century culture, their 3D counterparts have almost disappeared since their peak around 1950. Our position is that this was not a failure of 3D in itself, but a failure of the then only viable technology for producing 3D, namely stereoscopy (or stereography).

Stereoscopic 3D video is primarily based on the binocular nature of human perception, and it is relatively easy to realize. Two simultaneous conventional 2D video streams are produced by a pair of cameras mimicking the two human eyes, which see the environment from two slightly different angles. Then, one of these streams is shown to the left eye, and the other one to the right eye. Common means of separating the right-eye and left-eye views are glasses with colored transparencies or polarization filters.
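To make the channel-based separation concrete, the following sketch builds a red-cyan anaglyph from a stereo pair. It is a minimal illustration of the colored-transparency approach just described, assuming NumPy and 8-bit RGB inputs; it is not a method prescribed in this chapter.

```python
import numpy as np

def make_anaglyph(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Combine a stereo pair into a red-cyan anaglyph image.

    left, right: HxWx3 RGB arrays of the same shape.
    A red transparency passes the red channel (left view) to the left eye;
    a cyan transparency passes green and blue (right view) to the right eye.
    """
    anaglyph = np.empty_like(left)
    anaglyph[..., 0] = left[..., 0]     # red channel from the left view
    anaglyph[..., 1:] = right[..., 1:]  # green and blue from the right view
    return anaglyph
```

Viewed through red-cyan glasses, each eye receives mostly its intended view; the crosstalk and color distortion inherent in this simple scheme are among the reasons anaglyphs gave way to polarization and other separation techniques.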
Although the technology is quite simple, the necessity of wearing glasses while viewing has often been considered a major obstacle to the wide acceptance of 3DTV. But perhaps more importantly, within minutes after the onset of viewing, stereoscopy frequently causes eye fatigue and feelings similar to those experienced during motion sickness, caused by a mismatch of perceptual cues received by the brain from different sensory sources. Recently, with the adoption of digital technologies in all aspects of motion picture production, it has become possible to eliminate some of the factors which result in eye fatigue. This development alone makes it quite probable that stereoscopic 3D movies will be commonplace within a matter of years. Nevertheless, some intrinsic causes of fatigue may remain as long as stereoscopy remains the underlying 3D technology.

Stereoscopic 3D displays are similar to conventional 2D displays: a vertical screen or a monitor produces the two video channels simultaneously, and special glasses are used to direct one to the left eye and the other to the right eye. In contrast, autostereoscopic monitors are novel display devices where no special glasses are required. Covering the surface of a regular high-resolution digital video display device with a vertical or slanted lenticular sheet, and driving these monitors with so-called interzigged video, one can deliver the two different scenes to the left and the right eyes of the viewer, provided that the viewer stays in the correct position. (A lenticular sheet is essentially a transparent film or sheet of plastic with a fine array of cylindrical lenses. The ruling of the lenses can either be aligned or slanted with respect to the axes of the display.) Barrier technology is another way of achieving autostereoscopy: electronically generated fence-like optical barriers coupled with properly interzigged digital pictures generate the two or more different views required.
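As an illustration of what interzigging involves, the sketch below assigns each sub-pixel of a panel image to one of N views according to the lenticule above it. The pitch and slant values, and the linear phase-to-view mapping, are illustrative assumptions rather than parameters of any particular display.

```python
import numpy as np

def interzig(views, lens_pitch=4.5, slant=1/3):
    """Interleave N views into one panel image for a slanted lenticular sheet.

    views: list of N HxWx3 arrays (the multiview content).
    lens_pitch: lenticule width in sub-pixel columns (illustrative value).
    slant: horizontal lens offset per pixel row (illustrative value).
    Each sub-pixel is assigned to the view whose direction the lenticule
    above it emits that sub-pixel column into.
    """
    n = len(views)
    h, w, _ = views[0].shape
    panel = np.zeros((h, w, 3), dtype=views[0].dtype)
    for y in range(h):
        for x in range(w):
            for c in range(3):  # sub-pixel column index is 3*x + c
                phase = ((3 * x + c) - y * slant) % lens_pitch
                v = int(phase / lens_pitch * n)  # view seen under this lenticule
                panel[y, x, c] = views[v][y, x, c]
    return panel
```

The slant term is what makes the view boundaries run diagonally across the sub-pixel grid, trading some horizontal resolution loss for vertical, which is one reason slanted sheets are preferred over vertically ruled ones.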
It is possible to provide many more views than the two views of classical stereoscopy by using the autostereoscopic approach in conjunction with slanted lenticular sheets or barrier technology. Up to nine views are common, creating horizontal parallax with a viewing angle of about 20 degrees. Classical stereoscopy, with its two views, is not able to yield parallax in response to head movement. People watching three-dimensional scenes expect occlusion and disocclusion effects when they move with respect to the scene; certain parts of objects should appear and disappear as one moves around. This is not possible with two fixed views, producing an unnatural result if the observer is moving. Head-tracking autostereoscopic display devices have been developed to avoid this viewer position constraint; however, serving many users at the same time remains a challenge.

Free viewpoint video (FVV) functionality is another approach to allowing viewer movement. It offers the same functionality familiar from three-dimensional computer graphics: the user can interactively choose a viewpoint and viewing direction within a visual scene. In contrast to pure computer graphics applications, which deal with synthetic images, FVV deals with real-world scenes captured by real cameras. As in computer graphics, FVV relies on a certain three-dimensional representation of the scene. If a virtual view (not an available camera view), corresponding to an arbitrary viewpoint and viewing direction, can be rendered from that three-dimensional representation, free viewpoint video functionality will have been achieved. In most cases, it will be necessary to restrict the navigation range (the allowed virtual viewpoints and viewing directions) to some practical limits. Rendering stereo pairs from the three-dimensional representation not only provides three-dimensional perception, but also supports natural head-motion parallax.

Despite its drawbacks, stereoscopic 3D has found acceptance in some niche markets such as computer games. Graphics drivers that produce stereo video output are freely available. With the use of (very affordable) special glasses, ordinary personal computers can be converted into three-dimensional display systems, allowing three-dimensional games to be played. Stereo video content is also becoming available. Such content is either originally captured in stereo (as in some commercially available movies) or is converted from ordinary two-dimensional video. Two-dimensional to three-dimensional conversion is possible with user-assisted production systems, and is of great interest for content owners and producers.

Stereoscopic 3D, whether in its conventional form as in the old stereoscopic cinema, or in its more modern forms involving autostereoscopic systems, falls far short of the vision of true optical replicas outlined at the beginning of this chapter. To circumvent the many problems and shortcomings of stereoscopy in a radical manner, it seems necessary to abandon the binocular basis of stereoscopy and, returning to basic physical principles, to focus on the goal of true optical reconstruction of optical wave fields. Optically sensitive devices, including cameras and human eyes, do not "reach out" to the environment or the objects in it; they merely register the light incident on them. The light registered by our eyes, which carries the information about the scene, is processed by our visual system and brain, and thus we perceive our environment. Therefore, if the light field which fills a given 3D region can be recorded with all its physical attributes, and then recreated from the recording in the absence of the original object or scene, any optical device or eye embedded in this recreated light field will "see" the original scene, since the light incident on the device or our eyes will be essentially indistinguishable in the two cases. This is the basic principle of holography, a technique known since 1948. Holography is distinct from ordinary photography in that it involves recording the entire optical field, with all its attributes, rather than merely its intensity or projection ("holo" in holography refers to recording of the "whole" field). As expected, the quality of the holographic recording and reconstruction process directly affects the fidelity of the created ghost-like images to their originals. Digital holography and holographic cinema and TV are still in their infancy. However, advances in optical technology and computing power have brought us to the point where we can seriously consider making this technology a reality. It seems highly likely that high quality 3D viewing will be possible as the underlying optics and electronics technologies mature.
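The recording-and-recreation principle can be illustrated numerically. The toy one-dimensional sketch below forms a hologram as the interference intensity |R + O|² and re-illuminates it with the reference wave, recovering a term proportional to the object wave. The waveforms and spatial frequencies are arbitrary assumptions chosen only to show the idea, not a rendering algorithm from this book.

```python
import numpy as np

# Toy 1D illustration of holographic recording and reconstruction.
# O: complex object wave sampled along a line; R: tilted plane reference wave.
n = 1024
x = np.arange(n)
O = 0.2 * np.exp(1j * 2 * np.pi * 0.01 * x)  # weak object wave (illustrative)
R = np.exp(1j * 2 * np.pi * 0.05 * x)        # reference wave (illustrative)

hologram = np.abs(R + O) ** 2                # only intensity is recorded

# Re-illuminating the hologram with R reproduces a term proportional to O
# (plus a zero-order term and a conjugate "twin image" term that appear
# at other spatial frequencies).
reconstruction = hologram * R
spectrum = np.fft.fft(reconstruction)
# A peak appears at the object wave's carrier frequency, showing that the
# full complex field -- amplitude and phase -- was captured by interference.
```

Expanding |R + O|² gives |R|² + |O|² + RO* + R*O; multiplication by R yields an |R|²O term, which is the reconstructed object wave the comments refer to.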
Integral imaging (or integral photography) is an incoherent 3D photographic technique which has been known since 1905. In retrospect, the technique of integral imaging can also be classified as a kind of holography, since this technique also aims to record and reproduce the physical light distribution. The basic principle is to record the incidence angle distribution of the incoming light at every point of recording, and then to regenerate the same angular illumination distribution by proper back projection. The same effect is achieved in conventional holography by recording the phase and amplitude information simultaneously, instead of the intensity-only recording of conventional photography. The phase information is recorded using interference, and therefore holographic recordings require coherent light (lasers). Intensity recording, such as with common optical emulsion or digital photography, loses the direction information.

It is helpful to keep in mind the distinction between 3D displays and 3D television (3DTV). We use the term 3D display to refer to imaging devices which create 3D perception as their output. 3DTV refers to the whole chain of 3D image acquisition, encoding, transport/broadcasting, and reception, as well as display. We have so far mostly discussed the display end of 3DTV technology. An end-to-end 3DTV system requires not only display, but also capture and transmission of the 3D content. Some means of 3D capture were already implicit in our discussion of displays. For example, stereoscopic 3DTV involves a stereoscopic camera, which is nothing but two cameras rigidly mounted side by side with appropriate separation. The recording process in integral imaging is achieved using microlens arrays, whereas holographic recording employs coherent light and is based on optical interference. In these conventional approaches, the modality of 3D image capture is directly related to that of 3D image reconstruction, with the reconstruction process essentially amounting to a reversal of the capture process. In contrast, current research in 3DTV is targeting a quite different approach in which the input capture and output display modalities are completely decoupled and bridged by digital representation and processing.

In recent years, tremendous effort has been invested worldwide to develop convincing 3DTV systems, algorithms, and applications. This includes improvements over the whole processing chain, including image acquisition, three-dimensional representation, compression, transmission, signal processing, interactive rendering, and display (Fig. 1.2). The overall design has to take into account the strong interrelations between the various subsystems. For instance, an interactive display that requires random access to three-dimensional data will affect the performance of a coding scheme that is based on data prediction.

Fig. 1.2. Functional blocks of an end-to-end 3DTV system (from L. Onural, H. M. Ozaktas, E. Stoykova, A. Gotchev, and J. Watson, An overview of the holographic display related tasks within the European 3DTV project, in Photon Management II, SPIE Proceedings 6187, 2006)

The choice of a certain three-dimensional scene representation format is of central importance for the design of any 3DTV system. On the one hand, it sets the requirements for acquisition and signal processing. On the other hand, it determines the rendering algorithms, the degree and mode of interactivity, as well as the need for and means of compression and transmission.
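To suggest how a chosen representation pins down the rest of the chain, here is a hypothetical container for one frame of a multiview-plus-depth representation. The field names and units are our own illustrative choices, not a format defined in this book or by any standard.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CameraParams:
    """Pinhole camera model for one view (illustrative fields)."""
    intrinsics: np.ndarray   # 3x3 matrix K
    rotation: np.ndarray     # 3x3 matrix R
    translation: np.ndarray  # 3-vector t

@dataclass
class MVDFrame:
    """One time instant of a multiview-plus-depth representation.

    Everything a receiver-side renderer needs to synthesize virtual
    views is carried explicitly (per-view texture, per-pixel depth,
    and calibration), so the display never needs to know anything
    about the capture hardware.
    """
    textures: list            # N color images, each HxWx3
    depths: list              # N depth maps, each HxW, in meters
    cameras: list             # CameraParams for each of the N views
    timestamp: float          # capture time in seconds
```

A mesh-based or holographic representation would carry entirely different payloads, which is exactly why this choice constrains acquisition, coding, and rendering all at once.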
Various three-dimensional scene representations are already known from computer graphics and may be adapted to 3DTV systems as well. These include different types of data representations, such as three-dimensional mesh models, multiview video, per-pixel depth, or holographic data representations. Different capturing systems which may be considered include multi-camera systems, stereo cameras, lidar (depth) systems, or holographic cameras. Different advanced signal processing algorithms may be involved on the sender side, including three-dimensional geometry reconstruction, depth estimation, or segmentation, in order to transform the captured data into the selected three-dimensional scene representation. Specific compression algorithms need to be applied for the different data types. Transmission over different channels requires different strategies. The vast amount of data, and the user interaction for FVV functionality essential to many systems, complicate this task even further. On the receiver side, the data needs to be decoded, rendered, and displayed. In many cases this may require specific signal conversion and display adaptation operations. Interactivity needs to be taken care of. Finally, the images need to be displayed. Autostereoscopic displays have already been mentioned, but there are also other, more ambitious types of displays, including volumetric displays, immersive displays and, of course, holographic displays. For those who have set their eyes on the ambitious applications of three-dimensional imaging, the fully interactive, full-parallax, high-resolution holographic display is the ultimate goal. Whether or not this is achievable depends very much on the ability to efficiently handle the vast amounts of raw data required by a full holographic display and the ability to exploit the rapid developments in optical technologies.

Current end-to-end 3DTV systems require tightly coupled functional units: the display and the capture unit must be designed together, and therefore compression algorithms are also quite specific to the system. However, it is quite likely that in future 3DTV systems the techniques for 3D capture and 3D display will be totally decoupled from each other. It is currently envisioned that the information provided by the capture device will provide the basis for the computerized synthesis of the 3D scene. This synthesis operation will heavily utilize 3D computer graphics techniques (which are commonly used in computer animations) to assemble 3D scene information from multiple camera views or other sets of complementary data. However, instead of synthetic data, the 3D scene information will be created from a real-life scene.

Many techniques have been developed for the capture of 3D scene information. A common technique is based on shooting the scene simultaneously from different angles using multiple conventional 2D cameras. Camera arrays with up to 128 cameras have been discussed in the literature. However, acceptable quality 3D scene information can be captured using a much smaller number of cameras, especially if the scene is not too complex. The synthesized 3D video, created from the data provided by the capture unit, can then be either transmitted or stored. An important observation is that 3D scenes actually carry much less information than one may initially think. The difference between 2D images and 3D images is not so much like the difference between a 2D array and a 3D array of numbers, since most objects are opaque and, in any event, our retinas are two-dimensional detectors. The difference is essentially the additional information associated with depth and parallax. Therefore, 3D video is highly compressible.
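This compressibility can be seen in a toy experiment: if two views of a scene differ mainly by horizontal disparity, predicting one view from the other leaves a residual with far less energy than the view itself. The sketch below performs a simple block-based disparity search; it illustrates inter-view redundancy only and is not the coding scheme of any standard discussed later in the book.

```python
import numpy as np

def interview_residual(left, right, max_disp=32, block=16):
    """Predict `right` from `left` by per-block horizontal disparity search
    and return the prediction residual (illustrative, grayscale images)."""
    h, w = right.shape
    residual = np.zeros_like(right, dtype=np.float64)
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            target = right[y:y+block, x:x+block].astype(np.float64)
            best, best_err = None, np.inf
            for d in range(0, min(max_disp, x) + 1):  # candidate shifts
                cand = left[y:y+block, x-d:x-d+block].astype(np.float64)
                err = np.sum((target - cand) ** 2)
                if err < best_err:
                    best_err, best = err, cand
            residual[y:y+block, x:x+block] = target - best
    return residual  # far lower energy than `right` when views are redundant
```

The residual plus one disparity value per block is what an encoder would actually entropy-code, which is why a second view costs much less than a second independent video stream.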
Special-purpose compression techniques have already been reported in the literature, and research in this area is ongoing. Transmission of such data is not too different from transmission of conventional video. For example, video streaming techniques which are commonly used over the Internet can easily be adapted to the 3D case. Nevertheless, such adaptation does require some care, as the usability of incomplete 3D video data is totally different from the usability of incomplete 2D video, and packet losses are common in video streaming.

In order for the display to show the 3D video, the received data in abstract form must first be translated into driving signals for the specific 3D display device to be used. In some cases this can be a challenging problem requiring considerable processing. Development of signal processing techniques and algorithms for this purpose is therefore crucial for the successful realization of 3DTV. Decoupling of image acquisition and display is advantageous in that it can provide complete interoperability by enabling the display of the content on totally different display devices with different technologies and capabilities. For instance, it may be possible to feed the same video stream to a high-end holographic display device, a low-end stereoscopic 3D monitor, or even a regular 2D monitor. Each display device will receive the same content, but will have a different signal processing interface for the necessary data conversion.

In the near future, it is likely that multiview video will be the common mode of 3DTV delivery. In multiview video, a large amount of 2D video data, captured in parallel from an array of cameras shooting the same scene from different angles, will be directly coded by exploiting the redundancy of the data, and then streamed to the receiver. The display at the receiving end, at least in the short term, will then create the 3D scene autostereoscopically. (In the long term, the autostereoscopic display may be replaced with volumetric or holographic displays.) Standardization activities for such a 3DTV scheme are well underway under the International Organization for Standardization Moving Picture Experts Group (ISO MPEG) and International Telecommunication Union (ITU) umbrellas.

Countless applications of 3D video and 3DTV have been proposed. In addition to household consumer video and TV, there are many other consumer applications in areas such as computer games and other forms of entertainment, and video conferencing. Non-consumer applications include virtual reality, scientific research and education, industrial design and monitoring, medicine, art, and transportation. In medicine, 3DTV images may aid diagnosis as well as surgery. In industry, they may aid the design and prototyping of machines or products involving moving parts. In education and science, they may allow unmatched visualization capability. Advances in this area will also be closely related to advances in the area of interactive multimedia technologies in general. While interactivity is a different concept from three-dimensionality, since both are strong trends it is likely they will overlap, and it will not be surprising if the first 3DTV products also feature a measure of interactivity.
Indeed, since interactivity may also involve immersion into the scene, and three-dimensionality is an important aspect of the perception of being immersed in a scene, the connections between the two trends may be greater than might be thought at first.

Although the goals are clear, there is still a long way to go before we have widespread commercial high-quality 3D products. A diversity of technologies is necessary to make 3DTV a reality, and successful realization of such products will require significant interdisciplinary work. The scope of this book reflects this diversity. To better understand where each chapter fits in, it is helpful to refer again to the block diagram in Fig. 1.2.

Chapter 2 presents a novel operational end-to-end prototype 3DTV system with all its functional blocks. The system is designed to operate over a terrestrial Digital Multimedia Broadcasting (T-DMB) infrastructure for delivery to mobile receivers.

Chapters 3, 4, and 5 deal with different problems and approaches associated with the capture of 3D information. In Chap. 3, a novel 3D human motion capture system, using simultaneous multiple video recordings, is presented after an overview of various human motion capture systems. Chapter 4 shows that it is possible to reconstruct 3D objects from stereo data by utilizing texture information. A totally different 3D shape capture technique, based on pattern projection, is presented in detail in Chap. 5.

Representation of dynamic 3D scenes is essential, especially when the capture and display units are decoupled. In decoupled operation, the data captured by the input unit is not directly forwarded to the display; instead, an intermediate 3D representation is constructed from the data. This representation is then used for display-specific rendering at the receiving end. Chapters 6 and 7 present examples of representation techniques within an end-to-end 3DTV system. In Chap. 6, a detailed overview of modeling, animation, and rendering techniques for 3D is given. Chapter 7, on the other hand, details a representation for the more specific case where the object is a moving human figure.

Novel coding and compression techniques for 3DTV are presented in Chaps. 8 and 9. Chapter 8 deals specifically with the compression of 3D dynamic wire-mesh models. Compression of multi-view video data is the focus of Chap. 9, which provides the details of an algorithm closely related to ongoing standardization activities.

Transport (transmission) of 3DTV data requires specific techniques which are distinct from their 2D counterparts. Issues related to streaming 3D video are discussed in Chap. 10. Chapter 11 discusses the adaptation of the multiple description coding technique to 3DTV.

Watermarking of conventional images and video has been widely discussed in the literature. However, the nature of 3D video data requires novel watermarking techniques specifically designed for such data. Chapter 12 discusses 3D watermarking techniques and proposes novel approaches for this purpose.

Different display technologies for 3DTV are presented in Chaps. 13, 14, and 15. Chapter 13 gives a broad overview of the history of domestic 3DTV displays together with contemporary solutions.
immaterial pseudo-3D display with 3D interaction, based on a unique commercial 2D floating-in-the-air fog-based display. Chapter 15 gives an overview and the state of the art of spatial light modulator based holographic 3D displays. Chapter 16 discusses in detail the physical and chemical properties of novel materials for dynamic holographic recording and 3D display.

Finally, the last chapter discusses consumer, social, and gender issues associated with 3DTV. We believe that early discussion and investigation of these issues is important for many reasons. Discussion of consumer issues will help in evaluating the technologies and potential products, and will guide developers, producers, sellers, and consumers. Discussion of social and gender issues may help shape public decision making and allow informed consumer choices. We believe that it is both an ethical and a social responsibility for scientists and engineers involved in the development of a technology to be aware of and contribute to awareness regarding such issues.

We believe that this collection of chapters provides good coverage of the diversity of topics that collectively underlie the modern approach to 3DTV. Though it is not possible to cover all relevant issues in a single book, we believe this collection provides a balanced exposure for those who want to understand the basic building blocks of 3DTV systems from a broad perspective. Readers wishing to further explore the areas of 3D video and television may also wish to consult four recent collections of research results [1, 2, 3, 4] as well as a series of elementary tutorials [5].

Parts of this chapter appeared in or were adapted from [6] and [7]. This work is supported by the EC within FP6 under Grant 511568 with the acronym 3DTV.
References

1. 3D Videocommunication: Algorithms, Concepts and Real-Time Systems in Human Centred Communication. O. Schreer, P. Kauff, and T. Sikora, editors. Wiley, 2005.
2. Three-Dimensional Television, Video, and Display Technologies. B. Javidi and F. Okano, editors. Springer, 2002.
3. Special issue on three-dimensional video and television. M. R. Civanlar, J. Ostermann, H. M. Ozaktas, A. Smolic, and J. Watson, editors. Signal Processing: Image Communication, Vol. 22, issue 2, pp. 103–234, February 2007.
4. Special issue on 3-D technologies for imaging and display. B. Javidi and F. Okano, editors. Proceedings of the IEEE, Vol. 94, issue 3, pp. 487–663, March 2006.
5. K. Iizuka. Welcome to the wonderful world of 3D (4 parts). Optics and Photonics News, Vol. 17, no. 7, p. 42, 2006; Vol. 17, no. 10, p. 40, 2006; Vol. 18, no. 2, p. 24, 2007; Vol. 18, no. 4, p. 28, 2007.
6. L. Onural. Television in 3-D: What are the prospects? Proceedings of the IEEE, Vol. 95, pp. 1143–1145, 2007.
7. M. R. Civanlar, J. Ostermann, H. M. Ozaktas, A. Smolic, and J. Watson. Special issue on three-dimensional video and television (guest editorial). Signal Processing: Image Communication, Vol. 22, pp. 103–107, 2007.
2 A Backward-compatible, Mobile, Personalized 3DTV Broadcasting System Based on T-DMB

Hyun Lee, Sukhee Cho, Kugjin Yun, Namho Hur and Jinwoong Kim

Electronics and Telecommunications Research Institute (ETRI), 161 Gajeong-dong, Yuseong-gu, Daejeon, 305-350, Republic of Korea
2.1 Introduction

Mobile reception of broadcasting services has recently received much attention worldwide. Digital multimedia broadcasting (DMB), digital video broadcasting-handheld (DVB-H), and MediaFLO are examples. Among them, commercial terrestrial DMB (T-DMB) service was launched in Korea, for the first time anywhere, to provide mobile multimedia services in 2005. The Telecommunications Technology Association (TTA) of Korea and ETSI in Europe established a series of specifications for T-DMB video and data services based on the Eureka-147 digital audio broadcasting (DAB) system [1, 2, 3, 4].

Multimedia-capable mobile devices are becoming the core of the portfolio of multimedia information generation and consumption platforms. People are expected to depend more and more on mobile devices for their access to and use of multimedia information. Various acquisition and display technologies are also being developed to meet the ever-increasing demand of users for higher-quality multimedia content. Reality is one of the major references for judging high quality, and there have been many research activities on 3DTV and ultra-high definition TV (UDTV) concepts and systems [12, 13, 14]. UDTV is the result of a research direction that aims to provide immersive reality through ultra-high image resolution, wider coverage of the color gamut, and a big screen for a frame-less feeling. Research on 3DTV, on the other hand, pursues the direction of providing the feeling of depth, especially by exploiting human stereopsis. A 3D display that requires no glasses and gives a natural feeling of depth without eye fatigue has long been the goal of 3DTV researchers, and it remains a very challenging area. Even though a perfect 3D display is still a long way off, recent developments in display technology allow us to implement high-quality autostereoscopic 3D displays on small multimedia devices with a reasonable cost overhead.

Providing mobility and increased reality in multimedia information services is thus a promising direction for the future. Specifically, 3D AV service
over the T-DMB system is attractive because (1) glasses-free 3D viewing on a small display is relatively easy to implement and well suited to a single-user environment like T-DMB, (2) T-DMB is a new medium and thus has more flexibility for adding new services on top of existing ones, and (3) the 3D AV handling capability of 3D T-DMB terminals has great potential to generate new types of services if combined with other components such as a built-in stereo camera. We believe that a portable, personal 3D T-DMB system will be a valuable stepping stone towards realizing the ideal 3DTV for multiple users at home.

In this chapter, we investigate various issues in implementing 3D AV services on the T-DMB system. There are four major issues in the new system development: (1) content creation, (2) compression and transmission, (3) display, and (4) service scenarios and business models. We first look into the system and functional requirements of the 3D T-DMB system, and then propose solutions for a cost-effective and flexible system implementation that meets these requirements. In Sect. 2.2, we give an overview of the T-DMB system in terms of system specification and possible services. In Sect. 2.3, 3D T-DMB requirements and major implementation issues are described together with efficient solutions. In Sect. 2.4, results of the system implementation along with simulation results are presented. In Sect. 2.5, various service scenarios for 3D AV content and business models are covered. Finally, we draw some conclusions in Sect. 2.6.
2.2 Overview of T-DMB System

The T-DMB system [3] is based on the Eureka-147 DAB system,¹ which it uses in particular as its physical layer. Due to the robustness of coded orthogonal frequency division multiplexing (COFDM) technology against multi-path fading, mobile receivers moving at high speed can reliably receive multimedia data, which are sensitive to channel errors [5]. Enhancements over Eureka-147 DAB for the video service are achieved by highly efficient source coding for compressing multimedia data, object-based multimedia handling for interactivity, synchronization among the video, audio, and auxiliary data streams, multiplexing of the media streams, and enhanced channel error correction.
¹ European countries started research on DAB technology by establishing the Eureka-147 joint project at the end of the 1980s. The objective of the project was to develop a DAB system able to provide high-quality audio service with mobile reception. In 1994, ETSI (European Telecommunications Standards Institute) adopted the basic DAB standard ETSI EN 300 401 [1]. The ITU-R issued Recommendations BS.1114 and BO.1130 relating to satellite and terrestrial digital audio broadcasting, recommending the use of Eureka-147 DAB, referred to as "Digital System A", in 1994.
2.2.1 T-DMB Concept

The core concepts of T-DMB are personality, mobility and interactivity. Personality means that T-DMB can provide individual users with personal services through portable devices (mobile phone, PDA, laptop PC and DMB receiver). Mobility is another important concept of T-DMB: it offers seamless reception of broadcast content at any time, anywhere. Last but not least is interactivity, which enables bidirectional services linked with the mobile communication network. Examples of such services are pay per view (PPV), on-line shopping, and Internet service.

2.2.2 T-DMB Protocol Stack

The broadcasting protocol stack supported in T-DMB is shown in Fig. 2.1. T-DMB accommodates audio and data services as well as a video service, and includes MPEG-2 and MPEG-4 technologies. For data services, Eureka-147 DAB supports a variety of transport protocols such as multimedia object transfer (MOT), IP tunneling, and transparent data channel (TDC) [2]. On top of these protocols, the DAB system can transport various kinds of data, such as:

• Program-associated data (PAD): text information related to the audio program, such as audio background facts, a menu of future broadcasts and advertising information, as well as visual information such as singer images or CD covers;
• Non-program-associated data (NPAD): travel and traffic information, headline news, stock market prices and weather forecasts, among other things.
2.2.3 Video Services

The main service of T-DMB is the video service, rather than basic CD-quality audio and data services. Figure 2.2 shows how the various types of services are composed for transmission.
Fig. 2.1. T-DMB broadcasting protocol stack [17]
Fig. 2.2. The conceptual transmission architecture for the video service²
There are two transmission modes for visual data services: the packet mode data channel and the stream mode data channel.

• Packet mode: the basic data transport mechanism. The data are organized in data groups which consist of a header, a data field of up to 8191 bytes and, optionally, a cyclic redundancy check (CRC);
• Stream mode: provides a constant data rate that is a multiple of 8 kbps or 32 kbps, depending on the coding profile. T-DMB video service data are normally carried in stream mode².
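As a quick illustration of the stream-mode rate constraint, the rate actually allocated to a service can be computed by rounding the required bitrate up to the next multiple of the applicable granularity. The following is a minimal sketch, not a normative procedure; the 32 kbps granularity is just one of the two values mentioned above:

import math

def stream_mode_rate(required_kbps, granularity=32):
    # Stream mode carries data at a constant rate that must be a
    # multiple of 8 or 32 kbps depending on the coding profile.
    return math.ceil(required_kbps / granularity) * granularity

print(stream_mode_rate(548))  # a 548 kbps service would occupy 576 kbps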
Details of constructing the video service are shown in Fig. 2.3, which depicts the internal structure of the video multiplexer in Fig. 2.2. The video, audio, and auxiliary data information which make up a video service are multiplexed into an MPEG-2 TS, and outer error correction bits are added. The multiplexed and outer-coded stream is transmitted over the stream mode data channel. The initial object descriptor (IOD) generator creates the IOD, and the object descriptor (OD)/binary format for scene description (BIFS) generator creates OD/BIFS streams that comply with ISO/IEC 14496-1 [6]. Advanced video coding (AVC, MPEG-4 Part 10), which has high coding efficiency for multimedia broadcasting services at a low data transfer rate, is used to encode the video content, and bit sliced arithmetic coding (BSAC) is used to encode the audio content. The video and audio encoders generate encoded bit streams compliant with the AVC Baseline profile and BSAC, respectively. BIFS is also adopted to encode interactive data related to the video content.

² © European Telecommunications Standards Institute 2006. © European Broadcasting Union 2006. Further use, modification, redistribution is strictly prohibited. ETSI standards are available from http://pda.etsi.org/pda/ and http://www.etsi.org/services_products/freestandard/home.htm
Fig. 2.3. The conceptual architecture of the video multiplexer for a video service²
Each media stream is first encapsulated into an MPEG-4 sync layer (SL) packet stream, compliant with the ISO/IEC 14496-1 Systems standard [6]. The section generator creates sections compliant with ISO/IEC 13818-1 [8] for the input IOD, OD, and BIFS. Each PES packetizer then generates a PES packet stream compliant with ISO/IEC 13818-1 for each SL packet stream. The TS multiplexer combines the input sections and PES packet streams into a single MPEG-2 transport stream complying with ISO/IEC 13818-1. The MPEG-2 transport stream is encoded for forward error correction using Reed-Solomon coding and convolutional interleaving, and finally fed into the DAB sub-channel as a stream data service component.
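The encapsulation chain just described (elementary stream to SL packets, SL to PES, PES to MPEG-2 TS, then RS(204,188) outer coding) can be summarized structurally as follows. This is a minimal sketch of the layering only; the packet headers are simplified placeholders rather than the normative formats, and a real encoder would compute actual Reed-Solomon parity:

from dataclasses import dataclass

@dataclass
class SLPacket:              # MPEG-4 sync layer packet (ISO/IEC 14496-1)
    es_id: int
    payload: bytes

def sl_to_pes(sl):
    # PES packetization (ISO/IEC 13818-1); real header fields omitted
    return b"PES" + sl.es_id.to_bytes(2, "big") + sl.payload

def pes_to_ts(pes, pid):
    # Split into 188-byte TS packets (4-byte header + 184-byte payload),
    # stuffing the last packet
    packets = []
    for i in range(0, len(pes), 184):
        chunk = pes[i:i + 184].ljust(184, b"\xff")
        header = bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
        packets.append(header + chunk)
    return packets

def outer_code(ts_packet):
    # RS(204,188): 16 parity bytes appended to each 188-byte TS packet
    parity = bytes(16)       # placeholder for the computed parity
    return ts_packet + parity

stream = [outer_code(p)
          for p in pes_to_ts(sl_to_pes(SLPacket(3, b"\x00" * 500)), pid=0x101)]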
2.3 Requirements and Implementation Issues of 3D T-DMB

2.3.1 3D T-DMB Requirements

The requirements of the 3D T-DMB system are as follows:

(1) Backward compatibility: Like other broadcasting services, the new 3D T-DMB service should be backward-compatible with the existing 2D T-DMB. This means that users with 2D T-DMB terminals should be able to receive 3D T-DMB services and view the content on their 2D displays. 3D T-DMB is based on the stereoscopic presentation of 3D visual scenes, and the video information can easily be represented as a reference 2D
video plus some type of additional information. 2D T-DMB receivers can use only the reference video information for 2D presentation. 3D T-DMB receivers, on the other hand, can use both data to generate a stereoscopic video (a left-view image and a right-view image for each video frame) and render it on a 3D display.

(2) Forward compatibility: This means that new 3D T-DMB terminals can receive 2D T-DMB services and view 2D visual information. It basically requires that the display of a 3D T-DMB terminal can be switched between 2D and 3D mode. 3D T-DMB terminals should have a 2D mode in which they function exactly like 2D T-DMB terminals.

(3) 2D/3D switchable display: This is one of the essential requirements of the 3D T-DMB system, not only for the forward compatibility mentioned in (2) above, but also for providing various 2D/3D hybrid visual services. The latter will be explained in detail in Sect. 2.5.

(4) Low transmission overhead: The T-DMB system has a limited bit budget for signal transmission. Table 2.1 shows the available bandwidth for each operating mode, and Table 2.2 shows typical service allocations of T-DMB broadcasters in Korea. There is a trade-off between lowering the overhead bitrate and the 3D visual quality presented to the users. It is therefore very important to use highly efficient video compression schemes as well as efficient multiplexing and transmission schemes.

In addition to these requirements, other factors such as ease of content creation, flexibility in adapting to display evolution, and overall system safety in terms of '3D eye strain' should also be carefully taken care of in order to make 3D T-DMB a platform for viable and long-lasting 3D services.

2.3.2 3D T-DMB System Architecture and Transport

The 3D T-DMB system provides stereoscopic video as well as 3D surround audio. Stereoscopic video can be represented either as a video plus corresponding depth information [7] or as two (left and right) videos. Though the former has advantages in terms of flexibility and efficiency, the difficulty of acquiring accurate depth information for general scenes is still a big challenge. In our system, the video input is therefore defined as two video signals which are to be compressed and transmitted. The system is designed to handle and carry 3D surround sound as well, which is rendered adaptively to the speaker configuration of the 3D T-DMB terminal (for example, stereo for portable devices and 5.1 channels in a car audio environment).

Table 2.1. Available bitrate for each protection level [1]

Protection level                        1-A   2-A   3-A   4-A   1-B   2-B   3-B   4-B
Convolutional coding rate               1/4   3/8   1/2   3/4   4/9   4/7   2/3   4/5
Available bitrate per ensemble (kbps)   576   864   1152  1728  1024  1312  1536  1824
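The available bitrates in Table 2.1 follow directly from the convolutional coding rate applied to the gross capacity of one ensemble. A quick check, assuming a gross MSC capacity of 2304 kbps (an assumption; the value is not stated in the table):

from fractions import Fraction

GROSS_KBPS = 2304   # assumed gross MSC capacity of one DAB ensemble

coding_rates = {"1-A": "1/4", "2-A": "3/8", "3-A": "1/2", "4-A": "3/4",
                "1-B": "4/9", "2-B": "4/7", "3-B": "2/3", "4-B": "4/5"}

for level, rate in coding_rates.items():
    print(level, float(GROSS_KBPS * Fraction(rate)))

This reproduces the A-level entries and the 1-B and 3-B entries of Table 2.1 exactly; 2-B and 4-B evaluate to 1316.6 and 1843.2 kbps, slightly above the tabulated 1312 and 1824 kbps, presumably because actual sub-channel sizes are quantized to whole capacity units.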
Table 2.2. Typical service allocation of T-DMB broadcasters in Korea (Protection Level: 3-A)

Broadcaster   Service   Number of Channels   Bitrate (kbps)   Contents
KBS           Video     1                    544              KBS1 TV
              Audios    3                    128              Music, Business News
MBC           Video     1                    544              MBC TV
              Audios    3                    128              Radio, Business News
SBS           Video     1                    544              SBS TV
              Audios    3                    128              Radio, Traffic Information
YTN DMB       Video     1                    512              YTN TV
              Audios    2                    160              Music, Traffic Information
              Data      1                    329              Data
Korea DMB     Video     1                    544              Korea DMB TV
              Audios    2                    128              Music, Culture
U1 Media      Videos    2                    512/544          U1 Media TV/KBS2 TV
Figure 2.4 shows the internal structure of the 3D T-DMB transmitting server. The proposed 3D T-DMB media processor consists of 3D video encoding, 3D audio encoding, MPEG-4 Systems encoding, MPEG-4 over MPEG-2 encapsulation, and channel coding parts. The MPEG-4 Systems encoding part is modified from its 2D counterpart so that it can generate SL packets which include 3D T-DMB signals in a backward-compatible way. The MPEG-4 over MPEG-2 encapsulator converts SL packets to MPEG-2 TS packets. Note that the program specific information (PSI) is also utilized in making MPEG-2 TS packets in the same block.

We now look into the crucial idea of representing AV objects in MPEG-4 Systems in more detail. In a 3D T-DMB system, we have four AV objects in total, i.e., Vl, Va, As, and Aa, which are the left video, additional video, stereo audio, and additional audio, respectively. To meet the backward compatibility requirement, we propose a scheme using two ODs, each OD consisting of two ESs. The two ODs are assumed to be independent, but the two ESs in each OD are assumed to be dependent. The dependence of the ESs can be indicated simply by assigning the boolean value 'TRUE' to StreamDependenceFlag and by assigning the same ES_ID to dependsOn_ES_ID in the ES_Descriptor, as scripted in Fig. 2.5. Next, according to the definition of MPEG-4 Systems, we assign 0x21 and 0x40 to the objectTypeIndication of Vl and As in the DecoderConfigDescriptor, respectively. In the case of Va and Aa, on the other hand, we assign 0xC0 and 0xC1 to the objectTypeIndication, indicating user-private streams in MPEG-4 Systems. Note that the dependent ESs of Va and Aa are ignored by current 2D T-DMB receivers, thus disturbing none of their functions, but are identified by new 3D T-DMB receivers.
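The backward compatibility thus hinges on how a receiver treats the objectTypeIndication values: a 2D terminal keeps only the object types it knows (0x21 for AVC video, 0x40 for audio) and drops the dependent streams, while a 3D terminal also accepts the user-private types 0xC0 and 0xC1. A minimal sketch of this selection logic, with hypothetical ES_ID values and descriptor parsing omitted:

KNOWN_2D = {0x21, 0x40}             # AVC video (Vl), MPEG-4 audio (As)
KNOWN_3D = KNOWN_2D | {0xC0, 0xC1}  # plus user-private types for Va, Aa

def select_streams(es_list, is_3d_terminal):
    # es_list: (es_id, objectTypeIndication) pairs from the ES_Descriptors
    known = KNOWN_3D if is_3d_terminal else KNOWN_2D
    return [(es_id, oti) for es_id, oti in es_list if oti in known]

streams = [(3, 0x21), (4, 0xC0), (10, 0x40), (11, 0xC1)]
print(select_streams(streams, is_3d_terminal=False))  # only Vl and As survive
print(select_streams(streams, is_3d_terminal=True))   # all four streams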
Fig. 2.4. Block diagram of the T-DMB system including 3D AV services
ObjectDescriptor {            // OD for 3D Video
  ObjectDescriptorID 3
  esDescr [                   // Description for Video (Left-view images) ES
    ES_Descriptor {
      ES_ID 3
      OCRstreamFlag TRUE
      OCR_ES_ID 5
      muxInfo muxInfo { ... }
      decConfigDescr DecoderConfigDescriptor {
        streamType 4                  // Visual Stream
        bufferSizeDB 15060000
        objectTypeIndication 0x21     // reserved for ISO use
        decSpecificInfo DecoderSpecificInfoString { ... }
      }
      slConfigDescr SLConfigDescriptor { ... }
    }
  ]
  esDescr [                   // Description for 3D Additional Video Data (Right-view images) ES
    ES_Descriptor {
      ES_ID 4
      StreamDependenceFlag TRUE
      dependsOn_ES_ID 3
      OCRstreamFlag TRUE
      OCR_ES_ID 5
      muxInfo muxInfo { ... }
      decConfigDescr DecoderConfigDescriptor {
        streamType 4                  // Visual Stream
        bufferSizeDB 15060000
        objectTypeIndication 0xC0     // User Private
        decSpecificInfo DecoderSpecificInfoString { ... }
      }
      slConfigDescr SLConfigDescriptor { ... }
    }
  ]
}

Fig. 2.5. Backward compatible OD scheme for 3D video
2.3.3 Content Creation

Since the resolution of T-DMB video is QVGA (320 × 240), we basically need to capture or generate two QVGA video signals for stereoscopic 3D T-DMB. At the moment, most 2D T-DMB content is obtained by down-sampling SDTV (Standard Definition TV) or HDTV (High Definition TV) resolution content. As the T-DMB service becomes more popular in the future, we can expect DMB-specific content to be produced as well. The down-sampling ratio ranges from 4:1 up to 27:1 (for instance, SD material at 640 × 480 contains four times as many pixels as a QVGA frame, while full-HD material at 1920 × 1080 contains 27 times as many), which can remove details of the image in some critical regions. Thus, unlike the 2D DMB case, this down-sampling can lead to a major quality deterioration of the resulting 3D video. Another problem is that, due to the small disparity values permitted on 3D T-DMB displays, large disparities in HDTV scenes cause discomfort when they are converted to DMB content by simple down-sampling. Disparities in CG content should also be limited appropriately.

2.3.4 Video Compression

Three factors should be considered for 3D T-DMB video compression: the limited transmission bandwidth of the T-DMB system, backward compatibility with the existing 2D DMB service, and, finally, exploiting the characteristics of human stereopsis. The basic concept of compression technology for stereoscopic video is a spatio-temporal prediction structure, since there exists inter-view redundancy between the different-view scenes captured at the same time. So far, MPEG-2 MVP (multi-view video profile), MPEG-4 based visual coding using temporal scalability, and AVC/H.264 based multi-view video coding (MVC) have served as typical stereoscopic video coding (SSVC) schemes with a spatio-temporal prediction structure. Their prediction structures are basically the same, except for the structure derived from the multiple reference frames of AVC/H.264.

For 3D video encoding in our 3D T-DMB system, we propose residual-downsampled stereoscopic video coding (RDSSVC), which downsamples the residual data, based on AVC/H.264. Figure 2.6 shows a block diagram of the stereoscopic video coding based on AVC/H.264. In order to guarantee a monoscopic 2D video service over the conventional T-DMB system, left-view images are coded with a temporal-only prediction structure without downsampling, while right-view images are coded with the spatio-temporal prediction structure on AVC/H.264 shown in Fig. 2.7. Motion and disparity are estimated and compensated at the original resolution. The residual data representing the prediction errors are downsampled before the transform and quantization. The downsampled residual data are then transformed, quantized, and coded by CAVLC. In decoding, after the inverse quantization and inverse transform, the reconstructed residual data are upsampled to the original resolution and compensated with the pixels in the corresponding blocks of the reference pictures, which have already been decoded.
Fig. 2.6. Structure of residual-downsampling based stereoscopic video coding
The left-view sequence is encoded and decoded exactly following the H.264 specification. By down-sampling the residual data before transmission and up-sampling on the receiving side, apparent visual quality degrades in return for bit savings. However, the final perceptual quality of the stereoscopic 3D video will be about the same as the 'full bitrate' version due to the 'additive nature' of human stereopsis [15]. Using RDSSVC we could thus reduce the resulting bitrates while maintaining the perceptual quality of the stereoscopic video.
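The essential residual-downsampling step can be sketched as follows. This is a minimal illustration under stated assumptions: the reduction is by half in the horizontal direction (as Sect. 2.4.3 details), but the averaging and column-repetition filters used here are assumptions, not the codec's actual filters:

import numpy as np

def downsample_residual(residual):
    # Halve the horizontal resolution by averaging neighboring columns
    # (an assumed filter; the actual anti-aliasing filter is not specified)
    return 0.5 * (residual[:, 0::2] + residual[:, 1::2])

def upsample_residual(half):
    # Restore the original width by repeating columns (assumed reconstruction)
    return np.repeat(half, 2, axis=1)

residual = np.random.randn(16, 16)   # a motion/disparity prediction residual
loss = upsample_residual(downsample_residual(residual)) - residual
# 'loss' is the irreversible error that no bit allocation can recover,
# which is why the gain of RDSSVC shrinks at high bitrates (Sect. 2.4.3).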
Fig. 2.7. Structure of reference frames in stereoscopic video coding
2.3.5 Display

For the stereoscopic 3D T-DMB system, we use a parallax-barrier type autostereoscopic LCD display. We implemented displays with two different resolutions: one VGA (640 × 480) and the other QVGA (320 × 240). A parallax-barrier 3D display has several merits, such as easy implementation of a 2D/3D switching function, low cost overhead and a small increase in physical size. Though an autostereoscopic display without eye-tracking has a narrow, fixed viewing zone, the nature of the T-DMB use pattern (holding the portable device in one hand and viewing the screen, usually for personal use) makes it a commercially acceptable display for stereoscopic image and video services. Allowing a larger viewing zone (in terms of viewing distance and head movement) while keeping crosstalk to a minimum is crucial to the acceptance of autostereoscopic displays. The VGA resolution has advantages over QVGA in higher perceptual quality and greater suitability for 2D/3D hybrid services.

2.3.6 3D Audio

For the 3D audio signal, the 5.1-channel input signals are encoded in two paths: the first path is SSLCC (sound source location cue coding) encoding, which processes the input signal to generate a stereo audio As and additional surround information, and the second path is BSAC encoding of the stereo audio signal. If a T-DMB terminal has multi-channel speakers, the SSLCC-encoded information is added to the basic stereo audio to reproduce multi-channel 3D audio [16]. In addition to the 3D reproduction capability of the 3D T-DMB system, the BIFS-based object handling functions inherent in the MPEG-4 standards enable various interactive play modes for audio.
2.4 Implementation and Simulation Results

2.4.1 System Implementation

Using a Pentium-IV PC and an autostereoscopic display, we implemented a prototype 3D T-DMB terminal. Figure 2.8 shows a photograph of the prototype system. The 3D T-DMB terminal consists of an MPEG-4 over MPEG-2 de-capsulator, an MPEG-4 Systems decoder, a 3D video decoder, a 3D audio decoder, and a scene compositor. The MPEG-4 over MPEG-2 de-capsulator reconstructs the SL packets of Vl, Va, As, Aa, and OD/BIFS. The PSI analyzer parses the IOD information and hands it over to the IOD decoder. The MPEG-4 Systems decoding recovers the ESs of the 3D audio-visual signals as well as OD/BIFS from the SL packets. According to the OD information, the 3D video decoder and the 3D audio decoder determine how to decode the signals. Next, the decoded video and audio signals enter the 3D video generator and the 3D audio generator, respectively.
Fig. 2.8. Prototype of a 3D T-DMB receiver and its structure
If the display (VGA resolution) mode is set to '3D', the 3D video generator enlarges the left and right images to 320 × 480 each, by up-sampling them by a factor of two in the vertical direction, and interleaves the left and right video signals. In the case of '2D' mode, the 3D video generator enlarges the left image to 640 × 480 and outputs the left video signal. Similarly, the 3D audio generator produces multi-channel 3D audio signals by mixing As and Aa in the case of '3D' mode. Normally, the stereo audio (As) is fed into the scene compositor. Finally, the scene compositor produces the synchronized video and audio signals.
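The spatial multiplexing step for the parallax-barrier panel can be illustrated as follows. This is a minimal sketch assuming NumPy image arrays and a column-by-column left/right assignment; which column carries which view depends on the actual barrier geometry, and the function name is illustrative:

import numpy as np

def compose_3d_frame(left, right):
    # Spatially multiplex two 320x480 views into one 640x480 frame
    # by interleaving columns (left on even, right on odd columns).
    h, w = left.shape[:2]
    out = np.empty((h, 2 * w) + left.shape[2:], dtype=left.dtype)
    out[:, 0::2] = left
    out[:, 1::2] = right
    return out

# 320x240 inputs are first doubled vertically, as described above:
left  = np.repeat(np.random.rand(240, 320, 3), 2, axis=0)  # -> 480x320
right = np.repeat(np.random.rand(240, 320, 3), 2, axis=0)
frame = compose_3d_frame(left, right)                      # -> 480x640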
2.4.2 Backward Compatibility

The backward compatibility of the 3D T-DMB system was stressed in the previous section. To verify this property of the proposed 3D T-DMB system, we tested it with 3D T-DMB bitstreams satisfying the new syntax and semantics explained above. We have verified that the proposed system satisfies the required backward compatibility with the T-DMB system. As mentioned previously, the conventional T-DMB receiver ignores the elementary streams of the additional video (Va) and the additional audio (Aa), because the 'OD/BIFS Decoder' in the 'MPEG-4 Systems Decoding' block simply ignores the Va and Aa of unidentified (new) objectTypeIndication.

2.4.3 Video Coding

It is also important to evaluate how much gain is achieved by the proposed stereoscopic video coding. Hence, we compared the performance of the proposed coding method with Simulcast and the conventional SSVC. The three coding methods were implemented using the JSVM (joint scalable video model) 4.4 reference software of H.264/AVC. We used two sequences, 'Diving' and 'Soccer', of 320 × 240 (QVGA) resolution with 240 frames as test sequences; they were captured by our stereoscopic camera. Figure 2.9 shows one frame each of the 'Diving' and 'Soccer' sequences. In Simulcast, the images of the left and right views are encoded as I-pictures every second and the remaining images are encoded as P-pictures. In the temporal prediction, the three closest preceding frames are referenced, as shown in Fig. 2.10.
Fig. 2.9. One frame for ‘Diving’ (left) and ‘Soccer’ (right) sequences
Fig. 2.10. Structure of reference frames in Simulcast
For the proposed and the conventional stereoscopic video coding, left-view images are encoded as I-pictures every second, while the remaining images and all right-view images are encoded as P-pictures. For P-pictures, the structure of reference frames is shown in Fig. 2.7. The coding method for left-view images is exactly the same as that of Simulcast. The coding of right-view images, on the other hand, includes temporal prediction from the two closest preceding frames and spatial prediction from the left-view frame of the same time instant, in both the proposed and the conventional stereoscopic video coding.

In the proposed coding, residual-downsampling is done by reducing the resolution of the residual data by half in the horizontal direction. In principle, residual-downsampling is performed for macroblocks with prediction modes of size 8x4 and above among the Inter modes, because the transform operates on 4x4 blocks. In detail, the coding of left-view images includes all existing prediction modes in AVC without residual-downsampling: the Skip mode, the Inter modes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4), and the Intra modes (16x16, 4x4). In the coding of right-view images, on the other hand, we select whether residual-downsampling should be performed according to the prediction mode type, excluding the Skip and Intra modes; macroblocks of the 4x8 and 4x4 types are also excluded from residual-downsampling.

We show the coding results as RD (rate-distortion) curves of the proposed coding, the conventional stereoscopic video coding, and Simulcast in Figs. 2.11 and 2.12. In the experiments, coding bitrates are allocated using eleven QP values from 22 to 42 in steps of 2. For 3D T-DMB, the total coding bitrate available for 3D video is under 768 kbps, and the right-view video should be encoded at under 384 kbps within that budget. Hence, Figs. 2.11 and 2.12 show PSNR values versus coding bitrates of under 768 kbps and under 384 kbps, respectively. Figure 2.11 presents the total coding efficiency for the left- and right-view sequences. The proposed coding method has higher efficiency, by up to 0.2 dB and by 0.2 ∼ 2.2 dB, than the conventional stereoscopic video coding and Simulcast, respectively, for both test sequences. The RD curves in Fig. 2.12 present the coding efficiency for the right-view sequence only. The proposed coding method has higher efficiency, by up to 0.7 dB and by 0.2 ∼ 2.3 dB, than the conventional stereoscopic video coding and Simulcast, respectively.
Fig. 2.11. PSNR versus total bitrates of left and right-view sequences: (a) 'Diving' sequence; (b) 'Soccer' sequence
It is noticeable in the RD curves that the gain of the proposed coding method becomes bigger as the coding bitrate decreases: as the bitrate becomes higher, the irreversible loss caused by residual downsampling cannot be restored regardless of how many bits are allocated. In practical situations, the bitrate of the right-view images can be reduced much below that of the left-view images, because human perception is dominated by the high-quality component of a stereo pair. This can be achieved by separate rate control of each view. Hence, future work should study rate control and test the subjective quality of stereoscopic video coding in order to obtain better coding efficiency.
Fig. 2.12. PSNR versus bitrates of right-view sequence: (a) 'Diving' sequence; (b) 'Soccer' sequence
2.5 3D T-DMB Service Scenarios and Business Models

3D visual services in the 3D T-DMB system can be divided into several categories. The main service type will be the full 3D AV service, which presents 3D video with accompanying 3D surround audio. Since 3D audio has its own merits, it may be offered with or without 3D video. T-DMB broadcasters usually have more than one audio channel in their channel portfolio, and some users prefer audio channels to video channels. 3D images can be sent together with music programs so that users can enjoy a 3D image slide show while listening to music from the audio channels. In order not to exhaust users with eye-straining full-3D video programs, we can combine 2D with 3D visual content into hybrid types of content. We can select a highlight scene from a program and show it in eye-catching 3D video, with all other portions of the program in plain 2D video. For advertising videos, the main objects can be presented in 3D, overlapped inside conventional 2D visual content; it is well known that 3D visual scenes hold the viewer's gaze longer than presentations in 2D format. We could also adopt a Picture-in-Picture (PiP) style for this kind of service. Figures 2.13 and 2.14 show examples of hybrid video services.

In order for various types of 3D or 2D/3D hybrid services to be supported by the 3D T-DMB system, issues like the data compression format, multiplexing and synchronization among several audio-visual objects, and methods of signalling/identification of new service types must be addressed. We are currently working on these issues for an integrated 3D T-DMB system implementation.
Fig. 2.13. The example of 2D/3D hybrid service in T-DMB (a partial 3D image over a 2D background image)
Fig. 2.14. An example of 3D PiP service in T-DMB (2D PiP images over a 3D background image)
2.6 Concluding Remarks

We introduced a 3D T-DMB prototype system which can provide 3D AV services over T-DMB while maintaining backward compatibility with the T-DMB system. We implemented a prototype transmitting server and 3D T-DMB terminals, and have verified that the proposed concept of 3D AV services over T-DMB works well. Under the limited bandwidth and on the small display, subjective tests have shown that the developed system provides acceptable depth feeling and video quality. A similar approach could be applied to various applications such as terrestrial digital television, digital cable television, IPTV, and so on. T-DMB is a very attractive platform for successful commercial 3DTV trials due to its service characteristics: a small display, a single viewer and a new medium. If the 3D T-DMB service is widely accepted, 'big' 3DTV at home will naturally be the next step.
Acknowledgement

This work was supported by the IT R&D program of MIC/IITA, [2007-S00401] Development of Glasses-free Single-user 3D Broadcasting Technologies.
References

1. ETSI EN 300 401 (2000) Radio Broadcasting Systems; Digital Audio Broadcasting (DAB) to Mobile, Portable and Fixed Receivers
2. ETSI EN 301 234 (2006) Digital Audio Broadcasting (DAB); Multimedia Object Transfer (MOT) Protocol
3. ETSI TS 102 428 (2005) Digital Audio Broadcasting (DAB); DMB Video Service; User Application Specification
4. TTAS.KO-07.0026 (2004) Radio Broadcasting Systems; Specification of the Video Services for VHF Digital Multimedia Broadcasting (DMB) to Mobile, Portable and Fixed Receivers
5. Hoeg W, Lauterbach T (2003) Digital Audio Broadcasting: Principles and Applications of Digital Radio. John Wiley & Sons, England
6. ISO/IEC 14496-1 (2001) Information Technology – Generic Coding of Audio-Visual Objects – Part 1: Systems
7. ISO/IEC CD 23002-3 (2006) Auxiliary Video Data Representation
8. ISO/IEC 13818-1 (2000) Information Technology – Generic Coding of Moving Pictures and Associated Audio Information: Systems, Amendment 7: Transport of ISO/IEC 14496 Data Over ISO/IEC 13818-1
9. ITU-R Rec. BS.775-1 (1994) Multichannel Stereophonic Sound System With and Without Accompanying Picture
10. ITU-R Rec. BT.500-10 (2000) Methodology for the Subjective Assessment of the Quality of Television Pictures
11. Cho S, Hur N, Kim J, Yun K, Lee S (2006) Carriage of 3D Audio-Video Services by T-DMB. In: Proceedings of the International Conference on Multimedia & Expo (ICME). Toronto, pp. 2165–2168
12. Javidi B, Okano F (2002) Three-Dimensional Television, Video, and Display Technologies. Springer-Verlag, Berlin
13. Schreer O, Kauff P, Sikora T (2005) 3D Video Communication: Algorithms, Concepts and Real-Time Systems in Human Centered Communication. John Wiley, England
14. Sugawara M, Kanazawa M, Mitani K, Shimamoto H, Yamashita T, Okano F (2003) Ultrahigh-Definition Video System With 4000 Scanning Lines. SMPTE Journal 112:339–346
15. Pastoor S (1991) 3D Television: A Survey of Recent Research Results on Subjective Requirements. Signal Processing: Image Communication 4:21–32
16. Seo J, Moon HG, Beack S, Kang K, Hong JK (2005) Multi-channel Audio Service in a Terrestrial-DMB System Using VSLI-Based Spatial Audio Coding. ETRI Journal 27:635–638
17. Lee G, Yang K, Kim K-Y, Hahm Y, Ahn C, Lee S-I (2006) Design of Middleware for Interactive Data Services in the Terrestrial DMB. ETRI Journal 28(5):652–655
3 Reconstructing Human Shape, Motion and Appearance from Multi-view Video

Christian Theobalt¹, Edilson de Aguiar¹, Marcus A. Magnor² and Hans-Peter Seidel¹

¹ MPI Informatik, Saarbrücken, Germany
² TU Braunschweig, Braunschweig, Germany
3.1 Introduction

In recent years, an increasing research interest in the field of 3D video processing has been observed. The goal of 3D video processing is the extraction of spatio-temporal models of dynamic scenes from multiple 2D video streams. These scene models comprise descriptions of the shape and motion of the scene as well as of its appearance. With these dynamic representations at hand, one can display the captured real-world events from novel synthetic camera perspectives. In order to put this idea into reality, algorithmic solutions to three major problems have to be found: the problem of multi-view acquisition, the problem of scene reconstruction from image data, and the problem of scene display from novel viewpoints.

Human actors are presumably the most important elements of many real-world scenes. Unfortunately, it is well known to researchers in computer graphics and computer vision that both the analysis of the shape and motion of humans from video and their convincing graphical rendition are very challenging problems. To tackle these difficulties, we propose in this chapter three model-based approaches for capturing the motion as well as the dynamic geometry of moving humans. By applying a fast dynamic multi-view texturing method to the captured time-varying geometry, we are able to render convincing free-viewpoint videos of human actors.

This chapter is a roundup of several algorithms that we have recently developed. It shall serve as an overview and make the reader aware of the most important research questions by illustrating them on state-of-the-art research prototypes. Furthermore, a detailed list of pointers to related work shall enable the interested reader to explore the field in greater depth on their own.

For all the proposed methods, human performances are recorded with only eight synchronized video cameras. In the first algorithmic variant, a template model is deformed to match the shape and proportions of the captured human actor, and it is made to follow the motion of the person by means of a marker-free optical motion capture approach. The second variant extends the first
one, and enables the estimation not only of the shape and motion parameters of the recorded subject, but also the reconstruction of dynamic surface geometry details that vary over time. The last variant shows how we can incorporate high-quality laser-scanned shapes into the overall work-flow. With any of the presented method variants, the human performances can be rendered in real-time from arbitrary synthetic viewpoints. Time-varying surface appearance is generated by means of dynamic multi-view texturing from the input video streams.

The chapter is organized as follows. We begin with details about the multi-view video studio and the camera system we employ for data acquisition, Sect. 3.2. In Sect. 3.3 we review the important related work. Our template body model is described in Sect. 3.4. The first marker-less algorithm variant is described in Sect. 3.5; here, the details of our silhouette-based analysis-through-synthesis approach are also explained. The second algorithmic variant, which enables capturing of time-varying surface details, is described in Sect. 3.6. Finally, Sect. 3.7 presents our novel approach to transfer the sequence of poses of the template model to a high-quality laser scan of the recorded individual. The nuts and bolts of the texturing and blending method are explained in Sect. 3.8. In Sect. 3.9, we present and discuss results obtained with each of the described algorithmic variants. We conclude the chapter with an outlook on future directions in Sect. 3.10.
3.2 Acquisition – A Studio for Multi-view Video Recording

The input to our system consists of multiple synchronized video streams of a moving person, so-called MVV sequences, that we capture in our multi-view video studio. The spatial dimensions of the room, which are 11 by 5 meters, are large enough to allow multi-view recording of dynamic scenes from a large number of viewpoints. The ceiling has a height of approximately 4 m. The walls and floor are covered with opaque black curtains and a carpet, respectively, to avoid indirect illumination in the studio.

The studio features a multi-camera system that enables us to capture a volume of approx. 4 × 4 × 3 m with eight externally synchronized video cameras. We employ Imperx MDC-1004 cameras that feature a 1004 × 1004 CCD sensor with linear 12 bits-per-pixel resolution and a frame rate of 25 fps. The imaging sensors can be placed in arbitrary positions, but typically we resort to an approximately circular arrangement around the center of the scene. Optionally, one of the cameras is placed in an overhead position. The cameras are calibrated into a common coordinate frame. Color consistency across cameras is ensured by applying a color-space transformation to each camera stream. The lighting conditions in the studio are fully controllable. While recording a sequence, the image data are captured in parallel by
eight frame grabber cards and streamed in real-time to a RAID system consisting of sixteen hard drives. Our studio now also features a Vitus Smart™ full-body laser scanner. It enables us to capture high-quality triangle meshes of each person prior to recording her with the camera system.
3.3 Related Work

Since the work presented here jointly solves a variety of algorithmic subproblems, we can capitalize on a huge body of previous work in the fields of optical human motion capture, optical human model reconstruction, image-based rendering and mesh-based animation processing. We now give a brief overview of important related work in each of these fields.

3.3.1 Human Motion Capture

By far the most widely used commercial systems for human motion capture are marker-based optical acquisition setups [1]. They make use of the principle of moving light displays [2]. Optical markers, which are made either of a retroreflective material or of LEDs, are placed on the body of the tracked subject. Several special-purpose high-frame-rate cameras (often with specialized light sources) are used to record the moving person. The locations of the markers in the video streams are tracked, and their 3D trajectories over time are reconstructed by means of optical triangulation [3]. A kinematic skeleton is then matched to the marker trajectories to parameterize the captured motion in terms of joint angles [4]. The main algorithmic problems that have to be solved are the unambiguous optical tracking of the markers over time as well as the establishment of marker correspondences across multiple camera views [5]. Today, many commercial marker-based capturing systems are available, e.g. [6].

Although the accuracy of the measured motion data is fairly high, the application of marker-based systems is sometimes cumbersome and often impossible. The captured individuals typically have to wear special body suits. It is thus not possible to capture humans wearing normal everyday apparel, and therefore the captured video streams cannot be employed for further processing, e.g. texture reconstruction. However, for the application we have in mind, the latter is essential.

Marker-free motion capture approaches bridge this gap and enable the capturing of human performances without special modification of the scene [7]. The principle, as well as the challenge, behind marker-free optical motion capture methods is to invert the nonlinear multi-modal map from the complex pose space to the image space by looking at specific features. Most methods in the literature use some kind of kinematic body model to track the motion. The models typically consist of a linked kinematic chain of bones and interconnecting joints, and are fleshed out with simple geometric primitives in
order to model the physical outline of the human body. Typical shape primitives are ellipsoids [8, 9], superquadrics [10, 11, 12], and cylinders [13, 14]. Implicit surface models based on metaballs are also feasible [15]. Many different strategies have been suggested for bringing such a 3D body model into optimal accordance with the poses of the human in multiple video streams.

Divide-and-conquer methods track each body segment separately using image features, such as silhouettes [16], and mathematically constrain their motion to preserve connectivity. Conceptually related are constraint propagation methods that narrow the search space of correct body poses by finding features in the images and propagating constraints on their relative placement within the model and over time [17, 18]. In [17], a general architectural framework for human motion tracking systems was proposed which is still used in many marker-free capturing methods, e.g. analysis-by-synthesis. According to this principle, model-based tracking consists of a prediction phase, a synthesis phase, an image analysis phase, and a state estimation phase. In other words, at each time step of a motion sequence the capturing system first makes a prediction of the current pose, then synthesizes a view with the model in that pose, compares the synthesized view to the actual image, and updates the prediction according to this comparison. Different tracking systems differ in the algorithmic strategy they employ at each stage.

Analysis-through-synthesis methods search the space of possible body configurations by synthesizing model poses and comparing them to features in the image plane. The misalignment between these features, such as silhouettes [15], and the corresponding features of the projected model drives a pose refinement process [19, 20, 21]. Physics-based approaches derive forces acting on the model which bring it into optimal accordance with the video footage [22, 23]. Another way to invert the measurement equation from pose to image space is to apply inverse kinematics [24], a process known from robotics which computes a body configuration that minimizes the misalignment between the projected model and the image data. Inverse kinematics inverts the measurement equation by linearly approximating it. The method in [25] fits a kinematic skeleton model fleshed out with cylindrical limbs to one or several video streams of a moving person. A combination of a probabilistic region model, the twist parameterization for rotations, and optical flow constraints from the image enables an iterative fitting procedure. An extension of this idea is described in [26], where, in addition to the optical flow constraints, depth constraints from real-time depth image streams are also employed. Rosenhahn et al. have formulated the pose recovery problem as an optimization problem. They employ conformal geometric algebra to mathematically express distances between silhouette cones and shape model outlines in 3D; the optimal 3D pose is obtained by minimizing these distances [27].

Recently, the application of statistical filters in the context of human motion capture has become very popular. Basically, all such filters employ a
process model that describes the dynamics of the human body, and a measurement model that describes how an image is formed from the body in a certain pose. The process model enables prediction of the state at the next time step, and the measurement model allows for the refinement of the prediction based on the actual image data. If the noise is Gaussian and the model dynamics can be described by a linear model, a Kalman filter can be used for tracking [9]. However, the dynamics of the complete human body are non-linear. A particle filter can handle such non-linear systems and enables tracking in a statistical framework based on Bayesian decision theory [28]. At each time step, a particle filter uses multiple predictions (body poses) with associated probabilities. These are refined by looking at the actual image data (the likelihood). The prior is usually quite diffuse, but the likelihood function can be very peaky. The performance of statistical frameworks for tracking sophisticated 3D body models has been demonstrated in several research projects [13, 29, 30, 31].

In another category of approaches that has recently become popular, dynamic 3D scene models are reconstructed from multiple silhouette views and a kinematic body model is fitted to them [32]. A system that fits an ellipsoidal model of a human to visual hull volumes in real-time is described in [8]. The employed body model is very coarse and approximates each limb of the body with only one quadric. In [9], a system for off-line tracking of a more detailed kinematic body model using visual hull models is presented. The method described in [33] reconstructs a triangle mesh surface geometry from silhouettes, to which a kinematic skeleton is fitted. Cheung et al. also present an approach for body tracking from visual hulls [34].

In contrast, we propose three algorithmic variants of motion capture that employ a hardware-accelerated analysis-through-synthesis approach to capture time-varying scene geometry and pose parameters [35, 36, 37] from only eight camera views. By an appropriate decomposition of the tracking problem into subproblems, we can robustly capture body poses without having to resort to computationally expensive statistical filters.

3.3.2 Human Model Reconstruction

For faithful tracking, but also for convincing renditions of virtual humans, appropriate human body models have to be reconstructed in the first place. These models comprise correct surface descriptions, descriptions of the kinematics, and descriptions of the surface deformation behavior. Only a few algorithms have been proposed in the literature to automatically reconstruct such models from captured image data. In the work by Cheung et al. [34], a skeleton is estimated from a sequence of shape-from-silhouette volumes of the moving person. A special sequence of moves has to be performed with each limb individually in order to make model estimation feasible. In the approach by Kakadiaris et al. [22], body models are estimated from multiple video streams in which the silhouettes of the moving person
have been computed. With their method too, skeleton reconstruction is only possible if a prescribed sequence of movements is followed. In [38, 39], an approach to automatically learn kinematic skeletons from shape-from-silhouette volumes is described that does not employ any a priori knowledge and does not require predefined motion sequences. The method presented in [40] similarly proposes a spectral clustering-based approach to estimate kinematic skeletons from arbitrary 3D feature trajectories. In [41], a method is described that captures the surface deformation of the upper body of a human by interpolating between different range scans; the body geometry is modeled as a displaced subdivision surface. A model of the body deformation in dependence on the pose parameters is obtained by the method described in [42]: a skeleton model of the person is known a priori, the motion is captured with a marker-based system, and the body deformation is estimated from silhouette images and represented with needles that change in length and whose endpoints form the body surface. Recently, Anguelov et al. [43] have presented a method to learn a parameterized human body model that captures both variations in shape and variations in pose from a database of laser scans. In contrast, two of our algorithmic variants describe a template-based approach that automatically builds the kinematic structure and the surface geometry of a human from video data.

3.3.3 Free-viewpoint Video

Research in free-viewpoint video aims at developing methods for photo-realistic, real-time rendering of previously captured real-world scenes. The goal is to give users the freedom to interactively navigate their viewpoint freely through the rendered scene. Early research that paved the way for free-viewpoint video was presented in the field of image-based rendering (IBR). Shape-from-silhouette methods reconstruct geometry models of a scene from multi-view silhouette images or video streams. Examples are image-based [44, 45] and polyhedral visual hull methods [46], as well as approaches performing point-based reconstruction [47]. The combination of stereo reconstruction with visual hull rendering leads to a more faithful reconstruction of surface concavities [48]. Stereo methods have also been applied to reconstruct and render dynamic scenes [49, 50], some of them employing active illumination [51]. On the other hand, light field rendering [52] is employed in the 3DTV system [53] to enable simultaneous scene acquisition and rendering in real-time.

In contrast, we employ a complete parameterized geometry model to pursue a model-based approach to free-viewpoint video [35, 54, 55, 56, 57, 37]. Through commitment to a body model whose shape is made consistent with the actor in multiple video streams, we can capture a human's motion and dynamic surface texture [36]. We can also apply our method to capture personalized human avatars [58].
3.3.4 Mesh-based Deformation and Animation

In the last algorithmic variant we explain in this chapter, we map the motion that we captured with our template model to a high-quality laser scan of the recorded individual. To this end, we employ a method for motion transfer between triangle meshes that is based on differential coordinates [59, 60]. The potential of these methods has already been stated in previous publications; however, the focus has always been on deformation transfer between synthetic moving meshes [61]. Using a complete set of correspondences between different synthetic models, [62] can transfer the motion of one model to the other. Following a similar line of thinking, [63, 64] propose a mesh-based inverse kinematics framework based on pose examples, with potential application to mesh animation. More recently, [65] presented a multi-grid technique for efficient deformation of large meshes, and [66] presented a framework for performing constrained mesh deformation using gradient-domain techniques. Both methods are conceptually related to our algorithm and could also be used for animating human models. However, none of these papers provides a complete integration of the surface deformation approach with a motion acquisition system, nor does any of them provide a comprehensive user interface. We capitalize on and extend ideas in this field in order to develop a method that allows us to easily make a high-quality laser scan of a person move in the same way as the performing subject. Realistic motion for the scan as well as non-rigid surface deformations are generated automatically.
3.4 The Adaptable Human Body Model

While 3D object geometry can be represented in many ways, we employ a triangle mesh representation, since it offers a closed and detailed surface description and can be rendered very fast on graphics hardware. Since the template human model should be able to perform the same complex motion as its real-world counterpart, it is composed of multiple rigid-body parts that are linked by a hierarchical kinematic chain. The joints between segments are suitably parameterized to reflect the kinematic degrees of freedom of the object. Besides the object pose, the shape and dimensions of the separate body parts must also be customized in order to optimally reproduce the appearance of the real subject. A publicly available VRML geometry model of a human body is used as our template model, Fig. 3.1a. It consists of 16 rigid body segments: one each for the upper torso, lower torso, neck, and head, and pairs for the upper arms, lower arms, hands, upper legs, lower legs, and feet. A hierarchical kinematic chain connects all body segments. 17 joints with a total of 35 joint parameters define the pose of the template model. For global positioning, the model provides three translational degrees of freedom, which influence the position of the skeleton root, i.e. located at the
36
C. Theobalt et al.
Fig. 3.1. (a) Surface model and the underlying skeletal structure - spheres indicate joints and the different parameterizations used. (b) Schematic illustration of local vertex coordinate scaling by means of a Bézier scaling curve. (c) The two planes in the torso illustrate the local scaling directions
pelvis. Different joints in the body model provide different numbers of rotational degrees of freedom, in the same way as the corresponding joints in an anatomical skeleton do. Figure 3.1a shows the individual joints in the kinematic chain of the body model, and the respective joint color indicates whether it is a 1-DOF hinge joint, a 3-DOF ball joint, or a 4-DOF extended joint [35]. In addition to the pose parameters, the model provides anthropomorphic shape parameters that control the bone lengths as well as the structure of the triangle meshes defining the body surface. The first set of anthropomorphic parameters consists of a uniform scaling that scales the bone as well as the surface mesh uniformly in the direction of the bone axis. In order to match the geometry more closely to the shape of the real human, each segment features four one-dimensional Bézier curves, $B_{+x}(u)$, $B_{-x}(u)$, $B_{+z}(u)$, $B_{-z}(u)$, that are used to scale individual coordinates of each vertex in the local triangle mesh. The scaling is performed in the local +x, -x, +z, and -z directions of the coordinate frame of the segment, which are orthogonal to the direction of the bone axis. Figure 3.1b shows the effect of changing the Bézier scaling values, using the arm segment as an example. Intuitively, the four scaling directions lie on two orthogonal planes in the local frame. For illustration, we show these two planes in the torso segment in Fig. 3.1c.
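As a concrete illustration of this scaling scheme, the following Python sketch evaluates a one-dimensional Bézier curve at the normalized bone-axis coordinate of each vertex and scales the local x- and z-coordinates accordingly. It is a minimal sketch, not the chapter's implementation: the function names, the use of cubic control polygons, and the assumption that the bone axis coincides with the local y-axis are all hypothetical.

```python
import numpy as np
from math import comb

def bezier_1d(control_points, u):
    """Evaluate a 1D Bezier curve at parameter u in [0, 1]
    via the Bernstein basis (cubic if 4 control points are given)."""
    n = len(control_points) - 1
    return sum(comb(n, k) * (1 - u) ** (n - k) * u ** k * c
               for k, c in enumerate(control_points))

def scale_segment_vertices(vertices, ctrl_px, ctrl_nx, ctrl_pz, ctrl_nz):
    """Scale the vertices of one body segment in its local frame.
    Assumes the bone axis is the local y-axis, so u is the normalized
    y-coordinate; the +x/-x and +z/-z sides are scaled by separate
    Bezier curves, as described in Sect. 3.4."""
    v = vertices.copy()
    y_min, y_max = v[:, 1].min(), v[:, 1].max()
    u = (v[:, 1] - y_min) / (y_max - y_min + 1e-12)
    for i in range(len(v)):
        sx = bezier_1d(ctrl_px if v[i, 0] >= 0 else ctrl_nx, u[i])
        sz = bezier_1d(ctrl_pz if v[i, 2] >= 0 else ctrl_nz, u[i])
        v[i, 0] *= sx   # scale along local +x or -x
        v[i, 2] *= sz   # scale along local +z or -z
    return v
```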
3.5 Silhouette-based Analysis-through-synthesis

The challenge in applying model-based analysis for free-viewpoint video reconstruction is to find a way to automatically and robustly adapt the geometry model to the appearance of the subject as it was recorded by the video cameras. In general, we need to determine the parameter values that achieve the best match between the model and the video images. Treating this task as an optimization problem, the silhouettes of the actor, as seen from the different camera viewpoints, are used to match the
model to the video images (an idea used in similar form in [67]): the model is rendered from all camera viewpoints, and the rendered images are thresholded to yield binary masks of the model silhouettes. The rendered model silhouettes are then compared to the corresponding image silhouettes [35, 54, 55, 57]. As comparison measure, the number of silhouette pixels that do not overlap is determined. Conveniently, the exclusive-or (XOR) operation between the rendered model silhouette and the segmented video-image silhouette yields exactly those pixels that do not overlap. The energy function thus evaluates to:

$$E_{\mathrm{XOR}}(\mu) = \sum_{i=0}^{N} \sum_{x=0}^{X} \sum_{y=0}^{Y} \left( P_s(x,y) \wedge \overline{P_m(x,y,\mu)} \right) \vee \left( \overline{P_s(x,y)} \wedge P_m(x,y,\mu) \right) \qquad (3.1)$$
where μ denotes the model parameters currently considered, e.g. pose or anthropomorphic parameters, N is the number of cameras, and X and Y are the dimensions of the image. P_s(x, y) is the 0/1 value of the pixel (x, y) in the captured image silhouette, while P_m(x, y, μ) is its equivalent in the reprojected model image given that the current model parameters are μ. Fortunately, this XOR energy function can be evaluated very efficiently in graphics hardware (Fig. 3.2). An overview of the framework used to adapt the model parameter values such that the mismatch score becomes minimal is shown in Fig. 3.2. A standard numerical optimization algorithm, such as Powell's method [68], runs on the CPU.
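The following Python/NumPy sketch illustrates how the mismatch measure (3.1) can be evaluated; in the actual system this evaluation is carried out on the GPU, so the CPU-side code and the scipy.optimize usage shown in the comment are illustrative assumptions only.

```python
import numpy as np

def xor_energy(image_silhouettes, model_silhouettes):
    """Silhouette mismatch (3.1): count pixels covered by exactly one
    of the two binary masks, summed over all camera views.
    Both inputs: lists of HxW boolean arrays, one per camera."""
    return sum(int(np.logical_xor(ps, pm).sum())
               for ps, pm in zip(image_silhouettes, model_silhouettes))

# Hypothetical usage inside the optimization loop, with a stand-in
# renderer for the GPU step:
#   def energy(mu):
#       pm = [render_model_silhouette(mu, cam) for cam in cameras]
#       return xor_energy(image_silhouettes, pm)
#   best_mu = scipy.optimize.minimize(energy, mu0, method="Powell").x
```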
Fig. 3.2. Hardware-based analysis-through-synthesis method: To match the geometry model to the multi-video recordings of the actor, the image foreground is segmented and binarized. The model is rendered from all camera viewpoints and the boolean XOR operation is executed between the foreground images and the corresponding model renderings. The number of remaining pixels in all camera views serves as matching criterion. Model parameter values are varied via numerical optimization until the XOR result is minimal. The numerical minimization algorithm runs on the CPU while the energy function evaluation is implemented on the GPU
As a direction-set method, it always maintains a number of candidate descent directions in parameter space. The optimal descent along one direction is computed using Brent's line-search method. For each new set of model parameter values, the optimization routine invokes the matching-function evaluation on the graphics card. One valuable benefit of model-based analysis is the low-dimensional parameter space compared to general reconstruction methods. The parameterized model provides only a few dozen degrees of freedom that need to be determined, which greatly reduces the number of potential local minima. Furthermore, many high-level constraints are implicitly incorporated, and additional constraints can be easily enforced by making sure that all parameter values stay within their anatomically plausible range during optimization. Finally, temporal coherence is straightforwardly maintained by allowing only some maximal rate of change in parameter value from one time step to the next. The silhouette-based analysis-through-synthesis approach is employed for two purposes: the initialization of the geometry of the model (Sect. 3.5.1) and the computation of the body pose at each time step (Sect. 3.5.2).

3.5.1 Initialization

In order to apply the silhouette-based framework to real-world multi-view video footage, the generic template model must first be initialized, i.e. its proportions must be adapted to the subject in front of the cameras. This is achieved by applying the silhouette-based analysis-through-synthesis algorithm to optimize the anthropomorphic parameters of the model. This way, all segment surfaces can be deformed until they closely match the stature of the actor. During model initialization, the actor stands still for a brief moment in a pre-defined pose to have his silhouettes recorded from all cameras. The generic model is rendered for this known initialization pose and, without user intervention, its proportions are automatically adapted to the silhouettes of the individual. First, only the torso is considered. Its position and orientation are determined approximately by maximizing the overlap of the rendered model images with the segmented image silhouettes. Then the poses of arms, legs and head are recovered by rendering each limb in a number of orientations close to the initialization pose and selecting the best match as starting point for refined optimization. Following the model hierarchy, the optimization itself is split into several sub-optimizations in lower-dimensional parameter spaces (Sect. 3.5.2). After the model has been coarsely adapted in this way, the uniform scaling parameters of all body segments are adjusted. The algorithm then typically alternates around 5–10 times between optimizing joint parameters and segment scaling parameters until it converges. Finally, the Bézier control parameters of all body segments are optimized in order to fine-tune the outline of each segment such that it complies with the
Fig. 3.3. (a) Template model geometry. (b) Model after 5 iterations of pose and scale refinements. (c) Model after adapting the Bézier scaling parameters
recorded silhouettes. Figure 3.3 shows the initial model shape, its shape after five iterations of pose and scale optimization, and its shape after Bézier scaling. From this point on, the recovered anthropomorphic shape parameters remain fixed.

3.5.2 Marker-free Pose Tracking

The analysis-through-synthesis framework enables us to capture the pose parameters of a moving subject without having the actor wear any specialized apparel. The individualized geometry model automatically tracks the motion of the actor by optimizing the 35 joint parameters for each time step. The model silhouettes are matched to the segmented image silhouettes of the actor such that the model performs the same movements as the human in front of the cameras, Fig. 3.4. At each time step an optimal stance of the model is found by performing a numerical minimization of the silhouette-XOR energy functional (3.1) in the space of pose parameters. To efficiently avoid local minima, the model parameters are not all optimized simultaneously. Instead, the hierarchical structure of the model is
Fig. 3.4. (a) One input image of the actor performing. (b) Silhouette XOR overlap. (c) Reconstructed pose of the template body model with kinematic skeleton
exploited. We effectively constrain the search space by exploiting structural knowledge about the human body, knowledge about feasible body poses, temporal coherence in the motion data, and a grid-sampling preprocessing step. Model parameter estimation is performed in descending order with respect to the individual impact of the segments on silhouette appearance and their position along the kinematic chain of the model. First, the position and orientation of the torso are varied to find its 3D location. Next, arms, legs and head are considered. Finally, hands and feet are examined, Fig. 3.5. Temporal coherence is exploited by initializing the optimization for one body part with the pose parameters found in the previous time step. Optionally, a simple linear prediction based on the two preceding parameter sets is feasible. In order to cope with fast body motion that can easily mislead the optimization search, we precede the numerical minimization step with a regular grid search. The grid search samples each dimension of the parameter space at regularly spaced values and checks whether each corresponding limb pose is valid. Using the arm as an example, a valid pose is defined by two criteria. Firstly, the wrist and the elbow must project into the image silhouettes in every camera view. Secondly, the elbow and the wrist must lie outside a bounding box defined around the torso segment of the model. For all valid poses found, the error function is evaluated, and the pose that exhibits the minimal error is used as starting point for a direction-set downhill minimization. The result of this numerical minimization specifies the final limb configuration. The parameter range from which the grid search draws sample values is adaptively changed based on the difference in pose parameters of the two preceding time steps. The grid sampling step can be computed at virtually no cost and significantly increases the convergence speed of the numerical minimizer. The performance of the silhouette-based pose tracker can be further improved by capitalizing on the structural properties of the optimization problem. First, the XOR evaluation can be sped up by restricting the computation to a sub-window in the image plane and excluding non-moving body parts
Fig. 3.5. Body parts are matched to the silhouettes in hierarchical order: the torso first, then arms, legs and head, finally hands and feet. Local minima are avoided by a limited regular grid search for some parameters prior to optimization initialization
3 Reconstructing Human Shape, Motion and Appearance from MVV
41
from rendering. Second, the optimization of independent sub-chains can be performed in parallel. A prototype implementation using 5 PCs and 5 GPUs, together with the improved XOR evaluation, exhibited a speed-up of up to a factor of 8. Details can be found in [54].
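A minimal sketch of the grid-sampling preprocessing step described above is given below. The validity predicate, the per-dimension sampling ranges, and the function names are assumptions for illustration; the actual system applies this per limb in a low-dimensional parameter subspace, which keeps the number of grid samples small.

```python
import itertools
import numpy as np

def grid_search_init(energy, center, radius, steps, is_valid_pose):
    """Sample each joint-parameter dimension at regularly spaced values
    around the previous time step's solution; keep the valid pose with
    the lowest silhouette-XOR energy as the start point for the
    subsequent direction-set minimization."""
    axes = [np.linspace(c - r, c + r, steps)
            for c, r in zip(center, radius)]
    best, best_e = None, np.inf
    for pose in itertools.product(*axes):
        pose = np.asarray(pose)
        if not is_valid_pose(pose):   # e.g. wrist/elbow must project
            continue                  # into all image silhouettes
        e = energy(pose)
        if e < best_e:
            best, best_e = pose, e
    return best if best is not None else np.asarray(center)
```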
3.6 Dynamic Shape and Motion Reconstruction

In the method described in Sect. 3.5, we have presented a framework for robustly capturing the shape and motion of a moving subject from multi-view video. However, the anthropomorphic shape parameters of the model are captured from a static initialization posture and, in turn, remain static during the subsequent motion tracking. Unfortunately, by this means we are unable to capture subtle time-varying geometry variations on the body surface, e.g. due to muscle bulging or wrinkles in clothing. This section presents an extension to our original silhouette-based initialization method that bridges this gap. By taking not only silhouette constraints but also a color-consistency criterion into account, we are also able to reconstruct dynamic geometry details on the body model from multi-view video [36]. To this end, our algorithm simultaneously optimizes the body pose and the anthropomorphic shape parameters of our model (Sect. 3.4). Our novel fitting scheme consists of two steps: spatio-temporally consistent (STC) model reconstruction and dynamic shape refinement. The algorithmic workflow between these steps is illustrated in Fig. 3.6. Our method expects a set of synchronized multi-view video sequences of a moving actor as input. The STC model is characterized by two properties. Firstly, at each time step of video its pose matches the pose of the actor in the input streams. Secondly, it features a constant set of anthropomorphic shape parameters that have been reconstructed not only from a single initialization posture, but from a set of postures taken from sequences in which the person performs arbitrary motion. To reconstruct the STC representation, we employ our silhouette-based analysis-through-synthesis approach within a spatio-temporal optimization procedure, Sect. 3.6.1.
Fig. 3.6. Visualization of the individual processing steps of the Dynamic Shape and Motion Reconstruction Method
The spatio-temporally consistent scene representation is consistent with the pose and shape of the actor at multiple time steps, but it still only features constant anthropomorphic parameters. In order to reconstruct dynamic geometry variations, we compute appropriate vertex displacements for each time step of video separately. To this end, we jointly employ a color- and silhouette-consistency criterion to identify slightly inaccurate surface regions of the body model, which are then appropriately deformed by means of a Laplacian interpolation method, Sect. 3.6.2.

3.6.1 Spatio-temporal Model Reconstruction

We commence the STC reconstruction by shape-adapting the template model using the single-pose initialization procedure described in Sect. 3.5.1. Thereafter, we run a two-stage spatio-temporal optimization procedure that alternates between pose estimation and segment deformation. Here, it is important to note that we do not estimate the respective parameters from single time steps of multi-view video, but always consider a sequence of captured body poses. In the first step of each iteration, the pose parameters of the model at each time step of video are estimated using the method proposed in Sect. 3.5.2. In the second step, the Bézier control values for each of the 16 segments are computed by an optimization scheme. We find scaling parameters that optimally reproduce the shape of the segments in all body poses simultaneously. A modified energy function sums over the silhouette-XOR overlap contributions at each individual time step. Figure 3.7 shows the resulting spatio-temporally consistent model generated with our scheme.

3.6.2 Dynamic Shape Refinement

The spatio-temporal representation that we now have at our disposal is globally silhouette-consistent with a number of time steps of the input video sequence. However, although the match is globally optimal, it may not exactly match the silhouettes of the actor at each individual time step. In particular, subtle changes in body shape that are due to muscle bulging or deformation of the apparel are not modeled in the geometry. Furthermore, certain types
Fig. 3.7. (a) Adaptable generic human body model; (b) initial model after skeleton rescaling; (c) model after one and (d) several iterations of the spatio-temporal scheme
3 Reconstructing Human Shape, Motion and Appearance from MVV
43
of geometry features, such as concavities on the body surface, cannot be captured from silhouette images alone. In order to capture these dynamic details in the surface geometry, we compute per-vertex displacements for each time step of video individually. To this end, we also exploit the color information in the input video frames. Assuming a purely Lambertian surface reflectance, we estimate appropriate per-vertex displacements by jointly optimizing a multi-view color-consistency and a multi-view silhouette-consistency measure. Regularization terms that assess mesh distortions and visibility changes are also employed. The following steps are performed for each body segment and each time step of video:

3.6.2.1 Identification of Color-inconsistent Regions

We use color information to identify, for each time step of video individually, those regions of the body geometry which do not fully comply with the appearance of the actor in the input video images. To numerically assess the geometry misalignment, we compute for each vertex a color-consistency measure similar to the one described in [69]. By applying a threshold to the color-consistency measure, we can decide whether a vertex is in a photo-consistent or photo-inconsistent position. After calculating the color-consistency measure for all vertices, all photo-inconsistent vertices are clustered into contiguous photo-inconsistent surface patches by means of a region-growing method. In Fig. 3.6, color-inconsistent surface patches are marked in grey.

3.6.2.2 Computation of Vertex Displacements

We randomly select a set of vertices M out of each color-inconsistent region that we have identified in the previous step. For each vertex j ∈ M with position v_j we compute a displacement r_j in the direction of the local surface normal that minimizes the following energy functional:

$$E(v_j, r_j) = w_I E_I(v_j + r_j) + w_S E_S(v_j + r_j) + w_D E_D(v_j + r_j) + w_P E_P(v_j, v_j + r_j) \qquad (3.2)$$

• E_I(v_j + r_j) is the color-consistency measure.
• The term E_S(v_j + r_j) penalizes vertex positions that project into image-plane locations that are very distant from the boundary of the silhouette of the person. The inner and outer distance fields for each silhouette image can be pre-computed by means of the method described in [70].
• E_D(v) regularizes the mesh segment by measuring the distortion of triangles (a small numeric sketch follows this list). We employ a distortion measure which is based on the Frobenius norm κ [71]:

$$\kappa = \frac{a^2 + b^2 + c^2}{4\sqrt{3}\,A} - 1, \qquad (3.3)$$

where a, b and c are the lengths of the edges of a triangle and A is the area of the triangle. For an equilateral triangle the value is 0; for degenerate triangles it approaches infinity. To compute E_D(v_j + r_j) for a displaced vertex j at position v_j + r_j, we average the κ values of the triangles adjacent to j.
• The term E_P(v_j, v_j + r_j) penalizes visibility changes that are due to moving a vertex j from position v_j to position v_j + r_j. It has a large value if, at position v_j + r_j, the number of cameras that see the vertex is significantly different from the number of cameras that see it at v_j.
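The announced numeric sketch of the distortion measure (3.3) follows; it is a direct transcription of the formula and assumes triangle vertices given as NumPy 3-vectors.

```python
import numpy as np

def frobenius_distortion(p0, p1, p2):
    """Triangle distortion kappa from (3.3): 0 for an equilateral
    triangle, growing without bound as the triangle degenerates."""
    a = np.linalg.norm(p1 - p0)
    b = np.linalg.norm(p2 - p1)
    c = np.linalg.norm(p0 - p2)
    area = 0.5 * np.linalg.norm(np.cross(p1 - p0, p2 - p0))
    return (a**2 + b**2 + c**2) / (4.0 * np.sqrt(3.0) * area) - 1.0
```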
The weights w_I, w_S, w_D, w_P are straightforwardly found through experiments and are chosen in such a way that E_I and E_S dominate. We use the L-BFGS-B method [72], a quasi-Newton algorithm, to minimize the energy function E(v_j, r_j). After calculating the optimal displacement for all vertices in M individually, these displacements are used to smoothly deform the whole region by means of a Laplace interpolation method.

3.6.2.3 Laplacian Model Deformation

Using a Laplace interpolation method (see e.g. [73, 74]), each color-inconsistent region is deformed such that it globally complies with the sampled per-vertex displacements. The new positions of the vertices in a region form an approximation to the displacement constraints. Formally, the deformed vertex positions are found via a solution to the Laplace equation

$$Lv = 0, \qquad (3.4)$$

where v is the vector of vertex positions and the matrix L is the discrete Laplace operator. The matrix L is singular, and we hence need to add suitable boundary conditions to (3.4) in order to solve it. We reformulate the problem as

$$\min_v \left\| \begin{pmatrix} L \\ K \end{pmatrix} v - \begin{pmatrix} 0 \\ d \end{pmatrix} \right\|^2 \qquad (3.5)$$

This equation is solved in each of the three Cartesian coordinate directions (x, y and z) separately. The matrix K and the vector d impose the individual sampled per-vertex constraints, which are satisfied in a least-squares sense:

$$K_{ij} = \begin{cases} w_i & \text{if a displacement is specified for } i, \\ w_i & \text{if } i \text{ is a boundary vertex}, \\ 0 & \text{otherwise}. \end{cases} \qquad (3.6)$$

The elements of d are:

$$d_i = \begin{cases} w_i \cdot (v_i + r_i) & \text{if a displacement is specified for } i, \\ w_i \cdot v_i & \text{if } i \text{ is a boundary vertex}, \\ 0 & \text{otherwise}. \end{cases} \qquad (3.7)$$
The values w_i are constraint weights, v_i is the position coordinate of vertex i before deformation, and r_i is the displacement for i. The least-squares solution to (3.5) is found by solving the linear system:

$$\begin{pmatrix} L \\ K \end{pmatrix}^{T} \begin{pmatrix} L \\ K \end{pmatrix} x = \left(L^2 + K^2\right) x = \begin{pmatrix} L \\ K \end{pmatrix}^{T} \begin{pmatrix} 0 \\ d \end{pmatrix} \qquad (3.8)$$
Appropriate weights for the displacement constraints are easily found through experiments. After solving the three linear systems individually for the x-, y- and z-coordinate directions, the new deformed body shape, which is both color- and silhouette-consistent with all input views, is obtained.
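The following sketch illustrates the soft-constrained Laplacian solve of (3.5)–(3.7) with SciPy's sparse machinery. It is a simplified illustration, not the chapter's implementation: a uniform graph Laplacian and a single constraint weight are assumed, and lsqr stands in for whichever least-squares solver the authors use.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def deform_region(vertices, neighbors, constraints, w=1.0):
    """Solve the soft-constrained Laplace system (3.5) per coordinate.
    vertices:    (n, 3) positions before deformation
    neighbors:   adjacency list of the region's mesh graph
    constraints: dict {vertex index: target position} covering the
                 sampled displaced vertices and the boundary vertices."""
    n = len(vertices)
    # Uniform (graph) Laplacian L: degree on the diagonal, -1 per edge
    rows, cols, vals = [], [], []
    for i, nbrs in enumerate(neighbors):
        rows.append(i); cols.append(i); vals.append(float(len(nbrs)))
        for j in nbrs:
            rows.append(i); cols.append(j); vals.append(-1.0)
    L = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
    # Diagonal constraint matrix K and right-hand side d, cf. (3.6), (3.7)
    K = sp.lil_matrix((n, n))
    d = np.zeros((n, 3))
    for i, target in constraints.items():
        K[i, i] = w
        d[i] = w * np.asarray(target)
    A = sp.vstack([L, K.tocsr()]).tocsr()
    rhs = np.vstack([np.zeros((n, 3)), d])
    # Least-squares solution, one Cartesian coordinate at a time
    out = np.empty((n, 3))
    for k in range(3):
        out[:, k] = spla.lsqr(A, rhs[:, k])[0]
    return out
```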
3.7 Confluent Laser-scanned Human Geometry

The commitment to a parameterized body model enables us to make the motion estimation problem tractable. However, a model-based approach also implies a couple of limitations. Firstly, a template model is needed for each type of object that one wants to record. Secondly, the first two methods cannot handle people wearing very loose apparel such as dresses or wide skirts. Furthermore, while a relatively smooth template model enables easy fitting to a wide range of body shapes, more detailed geometry specific to each actor would improve rendering quality even more. Thus, to overcome some of the limitations imposed by the previous template-based methods, we present in this section a method to incorporate a static high-quality shape prior into our framework. Our new method enables us to make a very detailed laser scan of an actor follow the motion that was previously captured from multi-view video. The input to our framework is an MVV sequence of a real actor performing. We can now apply any of the model-based tracking approaches, either the one from Sect. 3.5 or Sect. 3.6, to estimate the shape and motion of the subject. Our algorithm now provides a simple and very efficient way to map the motion of the moving template onto the static laser scan. Subsequently, the scan imitates the movements of the template, and non-rigid surface deformations are generated on-the-fly. Thus, no manual skeleton transfer or blending-weight assignment is necessary. To achieve this goal, we formulate the motion transfer problem as a deformation transfer problem. To this end, a sparse set of triangle correspondences between the template and the scan is specified, Fig. 3.8a. The transformations of the marked triangles on the template model are mapped to their counterparts on the scan. Deformations for in-between triangles are interpolated on the surface by means of a harmonic field interpolation. The surface of the deformed scan at each time step is computed by solving a Poisson system. Our framework is based on the principle of differential mesh editing and only requires the solution of simple linear systems to map poses of the template
Fig. 3.8. (a) The motion of the template model is mapped onto the scan by specifying only a small number of correspondences between individual triangles. (b) The pose of the real actor, captured by the template model, is accurately transferred to the high-quality laser scan
to the scan. Due to this computational efficiency, we can map postures even to scans consisting of several tens of thousands of triangles at nearly interactive rates. As an additional benefit, our algorithm implicitly solves the motion retargeting problem, which gives us the opportunity to map input motions to target models with completely different body proportions. Figure 3.8b shows an example in which we mapped the motion of our moving template model onto a high-quality laser scan. This way, we can easily use detailed dynamic scene geometry as the underlying shape representation during free-viewpoint video rendering. For details on the correspondence specification procedure and the deformation interpolation framework, we refer the reader to [37].
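To give an impression of the harmonic interpolation step, the sketch below solves the graph Laplace equation over the triangle-adjacency graph for one scalar channel, with values fixed at the correspondence triangles. This is a strong simplification of the actual deformation-transfer pipeline, which interpolates transformations and solves a Poisson system for the deformed surface; all names here are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def harmonic_interpolate(num_faces, face_adjacency, fixed):
    """Interpolate a scalar field over the triangle adjacency graph:
    values are prescribed at the correspondence triangles ('fixed',
    a dict {face index: value}) and harmonic everywhere else, which
    spreads the sparse constraints smoothly across the scan surface."""
    free = [f for f in range(num_faces) if f not in fixed]
    idx = {f: k for k, f in enumerate(free)}
    rows, cols, vals = [], [], []
    rhs = np.zeros(len(free))
    for f in free:
        k = idx[f]
        nbrs = face_adjacency[f]
        rows.append(k); cols.append(k); vals.append(float(len(nbrs)))
        for g in nbrs:
            if g in fixed:
                rhs[k] += fixed[g]       # known neighbor moves to rhs
            else:
                rows.append(k); cols.append(idx[g]); vals.append(-1.0)
    A = sp.csr_matrix((vals, (rows, cols)), shape=(len(free), len(free)))
    sol = spla.spsolve(A, rhs)
    field = np.zeros(num_faces)
    for f, v in fixed.items():
        field[f] = v
    for f in free:
        field[f] = sol[idx[f]]
    return field
```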
3.8 Free-viewpoint Video with Dynamic Textures

By combining any of the three methods for capturing dynamic scene geometry (Sects. 3.5, 3.6 or 3.7) with dynamic texture generation, we can create and render convincing free-viewpoint videos that reproduce the omni-directional appearance of the actor. Since time-varying video footage is available, the model texture does not have to be static. Lifelike surface appearance is generated by using the projective texturing functionality of modern GPUs. Prior to display, the geometry of the actor as well as the calibration data of the video cameras are transferred to the GPU. During rendering, the viewpoint information, the shape of the model, the current video images, as well as the visibility and blending coefficients ν_i, ω_i for all vertices and cameras are continuously transferred to the GPU. The color of each rendered pixel c(j) is determined by blending all l video images I_i according to

$$c(j) = \sum_{i=1}^{l} \nu_i(j) \cdot \rho_i(j) \cdot \omega_i(j) \cdot I_i(j) \qquad (3.9)$$
where ω_i(j) denotes the blending weight of camera i, ρ_i(j) is the optional view-dependent rescaling factor, and ν_i(j) ∈ {0, 1} is the local visibility. During texture pre-processing, the weight products ν_i(j)ρ_i(j)ω_i(j) have been normalized to ensure energy conservation. Technically, (3.9) is evaluated for each fragment by a fragment program on the GPU. The rasterization engine interpolates the blending values from the triangle vertices. By this means, time-varying cloth folds and creases, shadows, and facial expressions are faithfully reproduced, lending a very natural, dynamic appearance to the rendered object. The computation of the blending weights and the visibility coefficients is explained in the following two subsections.

3.8.1 Blending Weights

The blending weights determine the contribution of each input camera image to the final color of a surface point. If the surface reflectance can be assumed to be approximately Lambertian, view-dependent reflection effects play no significant role, and high-quality, detailed model texture can still be obtained by blending the video images cleverly. Let θ_i denote the angle between a vertex normal and the optical axis of camera i. By emphasizing for each vertex individually the camera view with the smallest angle θ_i, i.e. the camera that views the vertex most head-on, a consistent, detail-preserving texture is obtained. A visually convincing weight assignment has been found to be

$$\omega_i = \frac{1}{\left(1 + \max_j(1/\theta_j) - 1/\theta_i\right)^{\alpha}} \qquad (3.10)$$
where the weights ω_i are additionally normalized to sum to unity. The parameter α determines the influence of vertex orientation with respect to the camera viewing direction and the impact of the most head-on camera view per vertex, Fig. 3.9. Singularities are avoided by clamping the value of 1/θ_i to a maximal value.
Fig. 3.9. Texturing results for different values of the control factor α: (a) α = 0, (b) α = 3, (c) α = 15
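A small sketch of the weight computation (3.10) and the blend (3.9) is given below. Note that in the actual pipeline the weight products are normalized during texture pre-processing rather than at blending time, so the per-pixel normalization here is a simplification, and all function names are assumptions.

```python
import numpy as np

def blending_weights(thetas, alpha, theta_min=1e-2):
    """Per-vertex camera blending weights from (3.10).
    thetas: angles between the vertex normal and each camera's optical
    axis; 1/theta is clamped (via theta_min) to avoid singularities."""
    inv = 1.0 / np.maximum(np.asarray(thetas, dtype=float), theta_min)
    w = 1.0 / (1.0 + inv.max() - inv) ** alpha
    return w / w.sum()                  # normalize to sum to unity

def blend_pixel(colors, visibility, weights, rho=None):
    """Pixel color from (3.9): weighted sum of the l camera samples.
    colors: (l, 3); visibility: (l,) values in {0, 1}; rho: optional
    view-dependent rescaling factors (3.11)."""
    rho = np.ones(len(colors)) if rho is None else np.asarray(rho)
    w = np.asarray(visibility, dtype=float) * rho * np.asarray(weights)
    s = w.sum()
    return (w[:, None] * np.asarray(colors)).sum(axis=0) / s \
        if s > 0 else np.zeros(3)
```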
Although it is fair to assume that everyday apparel has purely Lambertian reflectance, in some cases the reproduction of view-dependent appearance effects may be desired. To serve this purpose, our method provides the possibility to compute view-dependent rescaling factors, ρ_i, for each vertex on-the-fly while the scene is rendered:

$$\rho_i = \frac{1}{\varphi_i} \qquad (3.11)$$
where φ_i is the angle between the direction towards the output (virtual) camera and the direction towards input camera i.

3.8.2 Visibility

Projective texturing on the GPU has the disadvantage that occlusion is not taken into account, so hidden surfaces also get textured. The z-buffer test, however, allows determining for every time step which object regions are visible from each camera. Due to inaccuracies in the geometry model, it can happen that the silhouette outlines in the images do not correspond exactly to the outline of the model. When projecting video images onto the model, a texture seam belonging to some frontal body segment may fall onto another body segment farther back, Fig. 3.10a. To avoid such artifacts, extended soft shadowing is applied. For each camera, all object regions of zero visibility are determined not only from the actual position of the camera, but also from several slightly displaced virtual camera positions. Each vertex is tested for visibility from all of these camera positions. A triangle is textured by a camera image only if all three of its vertices are completely visible from that camera. While too generously segmented silhouettes do not affect rendering quality, too small outlines can cause annoying untextured regions. To counter such rendering artifacts, all image silhouettes are expanded by a couple of pixels prior to rendering. Using a morphological filter operation, the object outlines of all video images are dilated to copy the silhouette boundary pixel color values to adjacent background pixel positions, Fig. 3.10b.
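The dilation step can be sketched with standard morphological tools, e.g. as follows; the use of SciPy and the nearest-foreground color fill are illustrative assumptions rather than the system's GPU implementation.

```python
import numpy as np
from scipy import ndimage

def dilate_silhouette(image, mask, pixels=2):
    """Expand the segmented silhouette mask by a few pixels, copying
    the color of the nearest silhouette pixel into the newly covered
    background positions (cf. Fig. 3.10b).
    image: (H, W, 3) video frame; mask: (H, W) boolean silhouette."""
    grown = ndimage.binary_dilation(mask, iterations=pixels)
    # Coordinates of the nearest foreground pixel for every position
    _, (iy, ix) = ndimage.distance_transform_edt(
        ~mask, return_indices=True)
    out = image.copy()
    ring = grown & ~mask                  # newly covered pixels only
    out[ring] = image[iy[ring], ix[ring]]
    return out, grown
```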
Fig. 3.10. (a) Small differences between object silhouette and model outline cause erroneous texture projections. (b) Morphologically dilated segmented input video frames that are used for projective texturing
3.9 Results

We have presented a variety of methods that enable us to capture the shape and motion of moving people. Coupling any of these methods with a texturing approach enables us to generate realistic 3D videos. In the following, we briefly discuss the computational performance of each of the described methods and comment on the visual results obtained when using them to generate free-viewpoint renderings. Free-viewpoint videos reconstructed with each of the described methods can be rendered in real-time on a standard PC. During display, the user can interactively choose an arbitrary viewpoint onto the scene. We have applied our method to a variety of scenes, ranging from simple walking motions and fighting performances to complex and expressive ballet dance. Ballet dance performances are ideal test cases, as they exhibit rapid and complex motion. The silhouette-based analysis-through-synthesis method (Sect. 3.5) has proven capable of robustly following human motion involving fast arm movements, complex twisted poses of the extremities, and full-body turns (Fig. 3.11). Even on a single Intel Xeon 1.8 GHz PC featuring a fairly old NVIDIA GeForce 3 graphics board, it takes approximately 3 to 14 seconds to determine a single pose. On a state-of-the-art PC, fitting times in the range of a second are feasible, and with a parallel implementation almost interactive frame rates can be achieved. Figures 3.11 and 3.12 show the visual results obtained by applying the dynamic texturing scheme to generate realistic time-varying surface appearance. A comparison to the true input images confirms that the virtual viewpoint
Fig. 3.11. Variant I: Novel viewpoints are realistically synthesized. Two distinct time instants are shown on the left and right with input images above and novel views below
Fig. 3.12. Variant I: Conventional video systems cannot offer moving viewpoints of scenes frozen in time. With our framework freeze-and-rotate camera shots of body poses are possible. The pictures show such novel viewpoints of scenes frozen in time for different subjects and different types of motion
renditions look very lifelike. By means of clever texture blending, a contiguous appearance with only a few artifacts can be achieved. Although clever texture blending can cloak most geometry inaccuracies in the purely silhouette-fitted body model, a dynamically refined shape representation can lead to an even better visual quality. Improvements in rendering are due to improved geometry and, consequently, fewer surface blending errors during projective texturing. Figure 3.13a illustrates the typical geometry improvements that we can achieve through our spatio-temporal shape refinement approach. Often, we can observe improvements at silhouette boundaries. Due to the computation of per-time-step displacements, we can capture shape variations that are not representable by means of varying Bézier parameters. In general, exploiting temporal coherence already at the stage of STC model reconstruction leads to a better approximation of the true body shape, which is most clearly visible when the actor strikes extreme body postures, as they occur in Tai Chi, Fig. 3.13b. However, the shape improvements come at the cost of having to solve a more complex optimization problem. In a typical setting, it takes around 15 minutes on a 3.0 GHz Pentium IV with a GeForce 6800 to compute the spatio-temporally consistent model. For STC
Fig. 3.13. Variant II: (a) The right subimage in each pair of images shows the improvements in shoulder geometry that are achieved by capturing time-varying surface displacements in comparison to pure silhouette matching (left subimages). (b) Different time instants of a Tai Chi motion; the time-varying shape of the torso has been recovered
reconstruction, we do not use the full 100 to 300 frames of a sequence but rather select 5 to 10 representative postures. The per-time-step surface deformation typically takes around 4 minutes using our non-optimized implementation. Both purely template-based algorithmic variants are subject to a couple of limitations. Firstly, there certainly exist extreme body postures, such as the fetal position, that cannot be faithfully recovered with our motion estimation scheme. However, in all the test sequences we processed, body postures were captured reliably. Secondly, the application of clever texture blending enables us to generate free-viewpoint renderings at high quality. Nonetheless, the commitment to a segmented shape model may cause some problems. For instance, it is hard to implement a surface skinning scheme given that the model comprises individual geometry segments. Furthermore, it is hard to capture the geometry of people wearing fairly loose apparel, since the anthropomorphic degrees of freedom provide only a limited deformation range. Even the per-time-step vertex displacements cannot capture all crisp surface details, since we implicitly apply a smoothness constraint during the Laplacian deformation. Fortunately, our third algorithmic variant (Sect. 3.7) allows us to overcome some of the limitations imposed by the template model. It allows us to apply a laser-scanned shape model in our free-viewpoint rendering framework. While the template model is still
employed for capturing the motion, a scanned surface mesh is used during rendering. To make the scan mimic the captured motion of the actor, we apply a mesh deformation transfer approach that only requires the user to specify a handful of triangle correspondences. As an additional benefit, the motion transfer approach implicitly generates realistic non-rigid surface deformations. To demonstrate the feasibility of our method, we have acquired full-body scans of several individuals prior to recording their motion with our multi-camera setup. Figure 3.14 shows two free-viewpoint renditions of a dynamically textured animated scan next to ground-truth input images. In both cases, the renderings convincingly reproduce the true appearance of the actor. The high-detail human model enables us to faithfully display also the shape of wider apparel that cannot be fully reproduced by a deformable template. Also, the shapes of different heads with potentially different hair styles can be modeled more reliably. The scan model also has a number of conceptual advantages. For instance, a contiguous surface parameterization can now be computed, which facilitates higher-level processing operations on the surface appearance, as they are, for instance, needed in reflectance estimation [75, 76]. Nonetheless, there remain a few limitations. For instance, subtle time-varying geometry details that are captured in the scans, such as folds in the trousers, appear to be imprinted into the surface while the scan moves. We believe that this can be solved by applying a photo-consistency-based deformation of the surface at each time step. Also, the placement of the correspondences by the user requires some training. In our experience, though, even inexperienced people rather quickly gained a feeling for correct correspondence placement. Despite these limitations, this is the first approach of its kind that enables capturing the motion of a high-detail shape model from video without any optical markings in the scene.
Fig. 3.14. Variant III: Our Confluent 3D Video approach enables the creation of free-viewpoint renditions with high-quality geometry models. Due to the accurate geometry the rendered appearance of the actor (left sub-images) nicely corresponds to his true appearance in the real world (right sub-images)
3.10 Conclusions

In this chapter, three very powerful and robust methods to capture the time-varying shape, motion and appearance of moving humans from only a handful of multi-view video streams were presented. By applying a clever dynamic 3D texturing method that maps camera images onto the moving geometry representations, we can render realistic free-viewpoint videos of people in real-time on a standard PC. In combination, the presented methods enable high-quality reconstruction of human actors in a completely passive manner, which had not been possible before. Our model-based variants also open the door for attacking challenging new reconstruction problems that have been hard to approach due to the lack of a decent dynamic scene capture technology, e.g. cloth tracking. Depending on the application in mind, each of the methods has its own advantages and disadvantages. Thus, the question of which one is ultimately the best has to be answered by the user.
Acknowledgements This work is supported by EC within FP6 under Grant 511568 with the acronym 3DTV.
References

1. B. Bodenheimer, C. Rose, S. Rosenthal, and J. Pella. The process of motion capture: Dealing with the data. In Proc. of Eurographics Computer Animation and Simulation, 1997.
2. G. Johansson. Visual perception of biological motion and a model for its analysis. In Perception and Psychophysics, 14(2):201–211, 1973.
3. M. Gleicher. Animation from observation: Motion capture and motion editing. In Computer Graphics, 4(33):51–55, November 1999.
4. L. Herda, P. Fua, R. Plaenkers, R. Boulic, and D. Thalmann. Skeleton-based motion capture for robust reconstruction of human motion. In Proc. of Computer Animation 2000, IEEE CS Press, 2000.
5. M. Ringer and J. Lasenby. Multiple-hypothesis tracking for automatic human motion capture. In Proc. of European Conference on Computer Vision, 1:524–536, 2002.
6. www.vicon.com.
7. T.B. Moeslund and E. Granum. A survey of computer vision-based human motion capture. In CVIU, 81(3):231–268, 2001.
8. K.M. Cheung, T. Kanade, J.-Y. Bouguet, and M. Holler. A real time system for robust 3D voxel reconstruction of human motions. In Proc. of CVPR, 2:714–720, June 2000.
9. I. Mikić, M. Trivedi, E. Hunter, and P. Cosman. Articulated body posture estimation from multicamera voxel data. In Proc. of CVPR, 1:455ff, 2001.
10. C. Sminchisescu and B. Triggs. Kinematic jump processes for monocular 3D human tracking. In Proc. of IEEE International Conference on Computer Vision and Pattern Recognition, I:69–76, 2003.
11. D.M. Gavrila and L.S. Davis. 3D model-based tracking of humans in action: A multi-view approach. In CVPR 96, 73–80, 1996.
12. I.A. Kakadiaris and D. Metaxas. Model-based estimation of 3D human motion with occlusion based on active multi-viewpoint selection. In Proc. CVPR, 81–87, Los Alamitos, California, USA, 1996. IEEE Computer Society.
13. H. Sidenbladh, M.J. Black, and J.D. Fleet. Stochastic tracking of 3D human figures using 2D image motion. In Proc. of ECCV, 2:702–718, 2000.
14. L. Goncalves, E. DiBernardo, E. Ursella, and P. Perona. Monocular tracking of the human arm in 3D. In Proc. of CVPR, 764–770, 1995.
15. R. Plaenkers and P. Fua. Articulated soft objects for multi-view shape and motion capture. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10), 2003.
16. A. Mittal, L. Zhao, and L.S. Davis. Human body pose estimation using silhouette shape analysis. In Proc. of Conference on Advanced Video and Signal-based Surveillance (AVSS), 263ff, 2003.
17. J. O'Rourke and N.I. Badler. Model-based image analysis of human motion using constraint propagation. In PAMI, 2(6), 1980.
18. Z. Chen and H. Lee. Knowledge-guided visual perception of 3D human gait from a single image sequence. In IEEE Transactions on Systems, Man and Cybernetics, 22(2):336–342, 1992.
19. N. Grammalidis, G. Goussis, G. Troufakos, and M.G. Strintzis. Estimating body animation parameters from depth images using analysis by synthesis. In Proc. of Second International Workshop on Digital and Computational Video (DCV'01), 93ff, 2001.
20. R. Koch. Dynamic 3D scene analysis through synthesis feedback control. In PAMI, 15(6):556–568, 1993.
21. G. Martinez. 3D motion estimation of articulated objects for object-based analysis-synthesis coding (OBASC). In VLBV 95, 1995.
22. I.A. Kakadiaris and D. Metaxas. 3D human body model acquisition from multiple views. In Proc. of ICCV'95, 618–623, 1995.
23. Q. Delamarre and O. Faugeras. 3D articulated models and multi-view tracking with silhouettes. In ICCV99, 716–721, 1999.
24. S. Yonemoto, D. Arita, and R. Taniguchi. Real-time human motion analysis and IK-based human figure control. In Proc. of IEEE Workshop on Human Motion, 149–154, 2000.
25. C. Bregler and J. Malik. Tracking people with twists and exponential maps. In Proc. of CVPR 98, 8–15, 1998.
26. M. Covell, A. Rahimi, M. Harville, and T.J. Darrell. Articulated pose estimation using brightness and depth constancy constraints. In Proc. of IEEE International Conference on Computer Vision and Pattern Recognition, 2:438–445, 2000.
27. B. Rosenhahn, T. Brox, and J. Weickert. Three-dimensional shape knowledge for joint image segmentation and pose tracking. To appear in International Journal of Computer Vision, 2006.
28. J. Deutscher, A. Blake, and I. Reid. Articulated body motion capture by annealed particle filtering. In Proc. of CVPR'00, 2:2126ff, 2000.
29. T. Drummond and R. Cipolla. Real-time tracking of highly articulated structures in the presence of noisy measurements. In Proc. of ICCV, 2:315–320, 2001.
30. J. MacCormick and M. Isard. Partitioned sampling, articulated objects, and interface-quality hand tracking. In Proc. of European Conference on Computer Vision, 2:3–19, 2000.
31. H. Sidenbladh, M. Black, and R. Sigal. Implicit probabilistic models of human motion for synthesis and tracking. In Proc. of ECCV, 1:784–800, 2002.
32. C. Theobalt, M. Magnor, P. Schüler, and H.-P. Seidel. Combining 2D feature tracking and volume reconstruction for online video-based human motion capture. In Proc. of the 10th Pacific Conference on Computer Graphics and Applications (Pacific Graphics 2002), pages 96–103, Beijing, China, 2002. IEEE.
33. A. Bottino and A. Laurentini. A silhouette based technique for the reconstruction of human movement. In CVIU, 83:79–95, 2001.
34. G. Cheung, S. Baker, and T. Kanade. Shape-from-silhouette of articulated objects and its use for human body kinematics estimation and motion capture. In Proc. of CVPR, 2003.
35. J. Carranza, C. Theobalt, M.A. Magnor, and H.-P. Seidel. Free-viewpoint video of human actors. In Proc. of SIGGRAPH'03, 569–577, 2003.
36. E. de Aguiar, C. Theobalt, M. Magnor, and H.-P. Seidel. Reconstructing human shape and motion from multi-view video. In 2nd European Conference on Visual Media Production (CVMP), pages 42–49, London, UK, December 2005. The IEE.
37. E. de Aguiar, R. Zayer, C. Theobalt, M. Magnor, and H.-P. Seidel. A framework for natural animation of digitized models. Research Report MPI-I-2006-4-003, Saarbruecken, Germany, July 2006. Max-Planck-Institut fuer Informatik.
38. C. Theobalt, E. de Aguiar, M. Magnor, H. Theisel, and H.-P. Seidel. Marker-free kinematic skeleton estimation from sequences of volume data. In ACM Symposium on Virtual Reality Software and Technology (VRST 2004), pages 57–64, Hong Kong, China, November 2004. ACM.
39. E. de Aguiar, C. Theobalt, M. Magnor, H. Theisel, and H.-P. Seidel. M3: Marker-free model reconstruction and motion tracking from 3D voxel data. In 12th Pacific Conference on Computer Graphics and Applications, PG 2004, pages 101–110, Seoul, Korea, October 2004. IEEE.
40. E. de Aguiar, C. Theobalt, and H.-P. Seidel. Automatic learning of articulated skeletons from 3D marker trajectories. In Proc. of ISVC'06, 2006.
41. B. Allen, B. Curless, and Z. Popovic. Articulated body deformations from range scan data. In Proc. of ACM SIGGRAPH 02, 612–619, 2002.
42. P. Sand, L. McMillan, and J. Popovic. Continuous capture of skin deformation. In ACM Transactions on Graphics, 22(3):578–586, 2003.
43. D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rogers, and J. Davis. SCAPE – shape completion and animation of people. In ACM Transactions on Graphics (Proc. of SIGGRAPH'05), 24(3):408–416, 2005.
44. W. Matusik, C. Buehler, R. Raskar, S.J. Gortler, and L. McMillan. Image-based visual hulls. In Proc. of ACM SIGGRAPH 00, 369–374, 2000.
45. S. Würmlin, E. Lamboray, O.G. Staadt, and M.H. Gross. 3D video recorder. In Proc. of IEEE Pacific Graphics, 325–334, 2002.
46. T. Matsuyama and T. Takai. Generation, visualization, and editing of 3D video. In Proc. of 1st International Symposium on 3D Data Processing Visualization and Transmission (3DPVT'02), 234ff, 2002.
47. M.H. Gross, S. Würmlin, M. Näf, E. Lamboray, C.P. Spagno, A.M. Kunz, E. Koller-Meier, T. Svoboda, L.J. Van Gool, S. Lang, K. Strehlke, A. Vande Moere, and O.G. Staadt. blue-c: a spatially immersive display and 3D video portal for telepresence. In ACM Transactions on Graphics (Proc. of SIGGRAPH'03), 22(3):819–827, 2003.
48. M. Li, H. Schirmacher, M. Magnor, and H.-P. Seidel. Combining stereo and visual hull information for on-line reconstruction and rendering of dynamic scenes. In Proc. of IEEE Multimedia and Signal Processing, 9–12, 2002.
49. C. Lawrence Zitnick, S. Bing Kang, M. Uyttendaele, S. Winder, and R. Szeliski. High-quality video view interpolation using a layered representation. In ACM TOG (Proc. SIGGRAPH'04), 23(3):600–608, 2004.
50. T. Kanade, P. Rander, and P.J. Narayanan. Virtualized reality: Constructing virtual worlds from real scenes. In IEEE MultiMedia, 4(1):34–47, 1997.
51. M. Waschbüsch, S. Würmlin, D. Cotting, F. Sadlo, and M. Gross. Scalable 3D video of dynamic scenes. In Proc. of Pacific Graphics, 629–638, 2005.
52. M. Levoy and P. Hanrahan. Light field rendering. In Proc. of ACM SIGGRAPH'96, 31–42, 1996.
53. W. Matusik and H. Pfister. 3D TV: A scalable system for real-time acquisition, transmission, and autostereoscopic display of dynamic scenes. In ACM Transactions on Graphics (Proc. of SIGGRAPH'04), 23(3):814–824, 2004.
54. C. Theobalt, J. Carranza, M. Magnor, and H.-P. Seidel. A parallel framework for silhouette-based human motion capture. In Vision, Modeling and Visualization 2003 (VMV-03): Proc., pages 207–214, Munich, Germany, November 2003.
55. C. Theobalt, J. Carranza, M. Magnor, and H.-P. Seidel. Enhancing silhouette-based human motion capture with 3D motion fields. In Jon Rokne, Reinhard Klein, and Wenping Wang, editors, 11th Pacific Conference on Computer Graphics and Applications (PG-03), pages 185–193, Canmore, Canada, October 2003. IEEE.
56. C. Theobalt, J. Carranza, M. Magnor, and H.-P. Seidel. Combining 3D flow fields with silhouette-based human motion capture for immersive video. In Graphical Models, 66:333–351, September 2004.
57. C. Theobalt, J. Carranza, M. Magnor, and H.-P. Seidel. 3D video – being part of the movie. In ACM SIGGRAPH Computer Graphics, 38(3):18–20, August 2004.
58. N. Ahmed, E. de Aguiar, C. Theobalt, M. Magnor, and H.-P. Seidel. Automatic generation of personalized human avatars from multi-view video. In VRST '05: Proc. of the ACM Symposium on Virtual Reality Software and Technology, pages 257–260, Monterey, USA, December 2005. ACM.
59. M. Alexa, M.-P. Cani, and K. Singh. Interactive shape modeling. In Eurographics Course Notes, 2005.
60. O. Sorkine. Differential representations for mesh processing. In Computer Graphics Forum, 25(4), 2006.
61. R. Zayer, C. Rössl, Z. Karni, and H.-P. Seidel. Harmonic guidance for surface deformation. In Marc Alexa and Joe Marks, editors, Proc. of Eurographics 2005, 24:601–609, 2005.
62. R.W. Sumner and J. Popovic. Deformation transfer for triangle meshes. In ACM Transactions on Graphics, 23(3):399–405, 2004.
63. R.W. Sumner, M. Zwicker, C. Gotsman, and J. Popovic. Mesh-based inverse kinematics. In ACM Transactions on Graphics, 24(3):488–495, 2005.
64. K.G. Der, R.W. Sumner, and J. Popovic. Inverse kinematics for reduced deformable models. In ACM Transactions on Graphics, 25(3):1174–1179, 2006.
65. L. Shi, Y. Yu, N. Bell, and W.-W. Feng. A fast multigrid algorithm for mesh deformation. In ACM Transactions on Graphics, 25(3):1108–1117, 2006.
66. J. Huang, X. Shi, X. Liu, K. Zhou, L.-Y. Wei, S.-H. Teng, H. Bao, B. Guo, and H.-Y. Shum. Subspace gradient domain mesh deformation. In ACM Transactions on Graphics, 25(3):1126–1134, 2006.
67. H.P.A. Lensch, W. Heidrich, and H.-P. Seidel. A silhouette-based algorithm for texture registration and stitching. In Graphical Models, 63(4):245–262, 2001.
68. W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C++. Cambridge University Press, 2002.
69. P. Fua and Y.G. Leclerc. Object-centered surface reconstruction: Combining multi-image stereo and shading. In International Journal of Computer Vision, 16(1):35–55, 1995.
70. M.N. Kolountzakis and K.N. Kutulakos. Fast computation of the Euclidean distance maps for binary images. In Information Processing Letters, 43(4):181–184, 1992.
71. P.P. Pebay and T.J. Baker. A comparison of triangle quality measures. In Proc. of the 10th International Meshing Roundtable, 327–340, 2001.
72. R. Byrd, P. Lu, J. Nocedal, and C. Zhu. A limited memory algorithm for bound constrained optimization. In SIAM Journal on Scientific Computing, 16(5):1190–1208, 1995.
73. G. Farin. Curves and Surfaces for CAGD: A Practical Guide. Morgan Kaufmann, 1999.
74. Y. Lipman, O. Sorkine, D. Cohen-Or, D. Levin, C. Rössl, and H.-P. Seidel. Differential coordinates for interactive mesh editing. In Franca Giannini and Alexander Pasko, editors, Shape Modeling International 2004 (SMI 2004), pages 181–190, Genova, Italy, 2004. IEEE.
75. C. Theobalt, N. Ahmed, E. de Aguiar, G. Ziegler, H.P.A. Lensch, M. Magnor, and H.-P. Seidel. Joint motion and reflectance capture for creating relightable 3D videos. Research Report MPI-I-2005-4-004, Saarbruecken, Germany, April 2005. Max-Planck-Institut fuer Informatik.
76. C. Theobalt, N. Ahmed, H.P.A. Lensch, M. Magnor, and H.-P. Seidel. Enhanced dynamic reflectometry for relightable free-viewpoint video. Research Report MPI-I-2006-4-006, Saarbrücken, Germany, 2006. Max-Planck-Institut fuer Informatik.
4 Utilization of the Texture Uniqueness Cue in Stereo Xenophon Zabulis Informatics and Telematics Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
4.1 Introduction

The depth cue arising from the assumption of texture uniqueness has been widely utilized in approaches to shape-from-stereo. Despite the recent growth of methods that utilize spectral information (color) or silhouettes to three-dimensionally reconstruct surfaces from images, the depth cue due to the texture uniqueness constraint remains relevant, as it is utilized by a significant number of contemporary stereo systems [1, 2]. Certainly, combination with other cues is necessary for maximizing the quality of the reconstruction, since they provide additional information and since the texture-uniqueness cue exhibits well-known weaknesses, e.g. in cases where texture is absent or at the so-called "depth discontinuities". The goal of this work is to provide an approach to the utilization of the texture uniqueness constraint that is prolific in terms of accuracy, precision, and efficiency, and that can thereafter be combined with other depth cues. The uniqueness constraint assumes that a given pixel from one image can match no more than one pixel from the other image [3, 4]. In stereo methods, the uniqueness constraint is extended to assume that, locally, each location on a surface is uniquely textured. The main advantages of the cue derived from the uniqueness constraint over other depth cues are the following. It is independent of silhouette extraction, which requires an accurate segmentation (e.g. [5]). It is also independent of any assumption requiring that cameras surround the scene (e.g. [6]) or lie on the same baseline (e.g. [7, 8]). Moreover, it does not require that the cameras are spectrally calibrated, as in voxel carving/coloring approaches (e.g. [9, 10, 11]). The locality of the cue due to the uniqueness constraint facilitates multi-view and parallel implementations for real-time applications [12, 13, 14]. Traditionally, stereo correspondences have been established through a similarity search, which matched image neighborhoods based on their visual similarity [15, 16]. After the work in [17], volumetric approaches have emerged that establish correspondences among the acquired images after
backprojecting them onto a hypothetical surface. Space-sweeping methods [17, 18, 19, 20, 21, 22, 23] backproject the images onto a planar surface, which is "swept" along depth to evaluate different disparities. Orientation-optimizing methods [24, 25, 26, 27, 28] compare the backprojections onto hypothetical surface segments (surfels), which are evaluated at a range of potential locations and orientations in order to find the location and orientation at which the evaluated backprojections match best. The relation to the traditional, neighborhood-matching way of establishing correspondences is the following. In volumetric methods, a match is detected as a location at which the similarity of backprojections is locally maximized. This location is considered to be the 3D position of the surface point, and the image points onto which it projects are considered to correspond. Orientation-optimizing methods compensate for the projective distortion in the similarity-matching process and have been reported to be of superior accuracy to window-matching and space-sweeping approaches [1, 29]. The reason is that the matching process is more robust when the compared textures are relieved of the projective distortion, which is different for each camera. On the other hand, their computational cost is increased by a factor equal to the number of evaluated orientations of the hypothetical surface segment.

The remainder of this chapter is organized as follows. In Sect. 4.2, relevant work is reviewed. In Sect. 4.3, the uniqueness cue and relevant theory are defined in the context of volumetric approaches and an accuracy-increasing extension to this theory is proposed. In Sect. 4.4, computational optimizations that accelerate the above method and increase its precision are proposed. In addition, the theoretical findings of Sect. 4.3 are employed to define a space-sweeping approach of enhanced accuracy, which is then combined with orientation-optimizing methods into a hybrid method. Finally, in Sect. 4.5, conclusions are drawn and the utilization of the contributions of this work by stereo methods is discussed.
4.2 Related Work

Both traditional and volumetric techniques optimize a similarity criterion along a spatial direction in the world or in the image, to determine the disparity or position of points on the imaged surfaces. In traditional stereo (e.g. [15, 16]) or space-sweeping [17, 18, 19, 20, 21, 22, 23], a single orientation is considered, typically the one frontoparallel to the cameras. Orientation-optimizing techniques [24, 25, 26, 27, 28] consider multiple orientations at an additional computational cost, but also provide an estimate of the normal of the imaged surface. As benchmarked in the literature [1, 2] and explained in Sect. 4.3.2, the accuracy of sweeping methods is limited in the presence of surface slant, compared to methods that account for surface orientation.
Although space-sweeping approaches produce results of accuracy similar to traditional, neighborhood-matching algorithms [1, 29], they exhibit decreased computational cost. Furthermore, the time-efficiency of sweeping methods is reinforced when they are implemented to execute on commodity graphics hardware [23, 30, 31]. Due to its "single-instruction multiple-data" architecture, graphics hardware executes the warping and convolution operations, which are essential for the space-sweeping approach, in parallel. Regarding the shape and size of sweeping surfaces, it has been shown [32] that projectively expanding this surface (as in [30, 31, 32, 33, 34]) exploits the available pixel resolution better than implementing the sweep as a simple translation of the sweeping plane [17, 18, 19, 20, 21, 22, 23]. In this context, a more accurate space-sweeping method is proposed in Sect. 4.4.3.

In orientation-optimizing approaches, the size of the hypothetical surface patches has been formulated as constant [24, 25, 26, 27, 28]. Predefined sets of sizes have been utilized in [28] and [24], in a coarse-to-fine acceleration scheme. However, the evaluated sizes were the same for any location and orientation of the patch, rather than modulated as a function of them.

Metrics for evaluating the similarity of two image regions (or backprojections) fall into two main categories: photoconsistency [20, 35] and texture similarity [21, 22, 23, 36, 37], the latter typically implemented using the SAD, SSD, NCC, or MNCC [38] operators (a minimal sketch of these operators is given at the end of this section). The difficulty with the photoconsistency metric is that radiometric calibration of the cameras is required, which is difficult to achieve and retain. In contrast, the NCC and MNCC metrics are not sensitive to the absence of radiometric calibration, since they compare the correlation of intensity values rather than their differences [39]. Finally, some sweeping-based stereo reconstruction algorithms match sparse image features [17, 18, 19] but are, thus, incapable of producing dense depth maps.

Global optimization approaches have also utilized the uniqueness constraint [7, 26, 40, 41, 42, 43], but they can yield local minima of the overall cost function and are much more difficult to parallelize than local volumetric approaches. As in local methods, the similarity operator is either an oriented backprojection surface segment (e.g. [26]) or an image neighborhood (e.g. [7]). Thus, regardless of how the readings of this operator are utilized by the reconstruction algorithm, the proposed enhancement of the hypothetical surface-patch operator should only improve the accuracy of these approaches.

Finally, the assumption of surface continuity [3, 4] has been utilized for resolving ambiguities as well as correcting inaccuracies (e.g. [7, 41]). In traditional stereo, some approaches to enforce this constraint are to filter the disparity map [13], to bias disparity values to be coherent with neighboring ones [7], or to require inter-scanline consistency [42]. The continuity assumption has also been utilized in 2D [44], but seeking continuity in the image-intensity domain. The assumption has also been enforced to improve the quality of reconstruction in post-processing; an abundance of approaches for 3D filtering of the results exists in the deformable-models literature (see [45] for a review).
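For concreteness, the window-similarity operators named above can be sketched as follows. This is a minimal Python illustration rather than the implementation of any of the cited systems; the MNCC form, which replaces the geometric mean of the two variances in the denominator with their arithmetic mean, follows the operator attributed to [38] and used in [39].

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences (lower is more similar)."""
    return np.abs(a - b).sum()

def ssd(a, b):
    """Sum of squared differences (lower is more similar)."""
    return ((a - b) ** 2).sum()

def ncc(a, b):
    """Normalized cross-correlation (higher is more similar)."""
    a0, b0 = a - a.mean(), b - b.mean()
    return (a0 * b0).sum() / np.sqrt((a0 ** 2).sum() * (b0 ** 2).sum())

def mncc(a, b):
    """Modified NCC: arithmetic instead of geometric mean of variances,
    which avoids instability when one window is nearly textureless."""
    a0, b0 = a - a.mean(), b - b.mean()
    return 2.0 * (a0 * b0).mean() / (a0.var() + b0.var() + 1e-12)
```

The small constant in the MNCC denominator is a practical guard against division by zero and is not part of the operator's definition.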
4.3 The Texture Uniqueness Cue in 3D

In this section, the texture uniqueness cue is formulated volumetrically, i.e. in 3D, and it is shown that this formulation can lead to more accurate reconstruction methods than the traditional 2D formulation. Next, the spatial extent over which textures are matched is considered and an accuracy-increasing extension to orientation-optimizing approaches is proposed. It is noted that, henceforth, images are assumed to portray Lambertian surfaces, which can also be locally approximated by planar patches. Extension of these concepts beyond the Lambertian domain can be found in [46].

4.3.1 Uniqueness Cue Formulation

Let a calibrated image pair $I_{i=1,2}$ be acquired from two cameras with centers $\vec{o}_{1,2}$ and principal axes $\vec{e}_{1,2}$; the cyclopean eye is at $\vec{o} = \frac{\vec{o}_1 + \vec{o}_2}{2}$ and the mean optical axis is $\vec{e} = \frac{\vec{e}_1 + \vec{e}_2}{2}$. Let also a planar surface patch $S$, of size $\alpha \times \alpha$, be centered at $\vec{p}$, with unit normal $\vec{n}$. Backprojecting $I_i$ onto $S$ yields the image $w_i(\vec{p}, \vec{n})$:

$$w_i(\vec{p}, \vec{n}) = I_i\left(P_i \cdot \left(\vec{p} + R(\vec{n}) \cdot [x'\ y'\ 0]^T\right)\right), \qquad (4.1)$$
where $P_i$ is the projection matrix of $I_i$, $R(\vec{n})$ is a rotation matrix such that $R(\vec{n}) \cdot [0\ 0\ 1]^T = \vec{n}$, and $x', y' \in [-\frac{\alpha}{2}, \frac{\alpha}{2}]$ are local coordinates on $S$. When $S$ is tangent to a world surface, the $w_i$ are identical images of the surface pattern (see Fig. 4.1 left). Thus $I_1(P_1 \vec{x}) = I_2(P_2 \vec{x})$, $\forall \vec{x} \in S$, and therefore their similarity is optimal. Otherwise, the $w_i$ are dissimilar, because they are collineations from different surface regions.

Fig. 4.1. Left: A surface is projectively distorted in images $I_{1,2}$, but the collineations $w'_{1,2}$ from a planar patch tangent to this surface are not (from [28], © 2004 IEEE). Right: Illustration of the geometry for (4.4)
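In code, evaluating (4.1) amounts to sampling an image at the projections of an $r \times r$ lattice of points spanning $S$. The sketch below is illustrative rather than the chapter's implementation: it assumes a 3 × 4 projection matrix, grayscale images, and bilinear sampling.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def backproject(image, P, p, R_n, alpha, r):
    """w_i(p, n) of Eq. (4.1): sample `image` on an r x r lattice spanning
    the alpha x alpha patch S centered at p with orientation R(n)."""
    t = np.linspace(-alpha / 2, alpha / 2, r)
    xg, yg = np.meshgrid(t, t)                        # local (x', y') on S
    local = np.stack([xg, yg, np.zeros_like(xg)], axis=-1).reshape(-1, 3)
    world = p + local @ R_n.T                         # lattice lifted into 3D
    homog = P @ np.vstack([world.T, np.ones(len(world))])
    u, v = homog[0] / homog[2], homog[1] / homog[2]   # pixel coordinates
    return map_coordinates(image, [v, u], order=1).reshape(r, r)  # bilinear
```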
Assuming a voxel tessellation of space, the locations of surface points and corresponding normals can be recovered by estimating the positions at which similarity is locally maximized along the direction of the surface normal or, otherwise put, which exhibit a higher similarity value than their (two) neighbors in that direction. Such a location will henceforth be referred to as a similarity local maximum or, simply, a local maximum. To localize the similarity local maxima, the function $\vec{V}(\vec{p}) = s(\vec{p})\,\vec{\kappa}(\vec{p})$ is evaluated, where

$$s(\vec{p}) = \max_{\vec{n}} \left( \mathrm{sim}\left(w_1(\vec{p}, \vec{n}),\, w_2(\vec{p}, \vec{n})\right) \right), \qquad (4.2)$$

$$\vec{\kappa}(\vec{p}) = \arg\max_{\vec{n}} \left( \mathrm{sim}\left(w_1(\vec{p}, \vec{n}),\, w_2(\vec{p}, \vec{n})\right) \right). \qquad (4.3)$$
Here $s(\vec{p})$ is the optimal correlation value at $\vec{p}$ and $\vec{\kappa}(\vec{p})$ the optimizing orientation. The best-matching backprojections are $w'_{1,2} = w_{1,2}(\vec{p}, \vec{\kappa})$. The metric sim can be SAD, SSD, NCC, MNCC, etc. To evaluate sim, an $r \times r$ lattice of points is assumed on $S$. In addition, a threshold $\tau_c$ is imposed on $s$ so that local maxima of small similarity value are not interpreted as surface occurrences.

The parameterization of $\vec{n}$ requires two dimensions and can be expressed in terms of longitude and latitude, which define any orientation within a unit hemisphere. To treat different eccentricities of $S$ equally, the orientation $\vec{c} = [x_c\ y_c\ z_c]^T$ that corresponds to the pole of this hemisphere points to $\vec{o}$; that is, $\vec{c} = \vec{p} - \vec{o}$ (see Fig. 4.1 right). The parameterized orientations $\vec{n} = [x_i, y_i, z_i]^T$ are then:

$$x_i = \frac{z_c x_c \cos\omega \sin\psi - y_c N_1 \sin\omega \sin\psi + x_c N_2 \cos\psi}{N_1 N_2},$$
$$y_i = \frac{z_c y_c \cos\omega \sin\psi + x_c N_1 \sin\omega \sin\psi + y_c N_2 \cos\psi}{N_1 N_2}, \qquad (4.4)$$
$$z_i = \frac{z_c \cos\psi - N_2 \cos\omega \sin\psi}{N_1},$$

where $N_1 = \sqrt{x_c^2 + y_c^2 + z_c^2}$, $N_2 = \sqrt{x_c^2 + y_c^2}$, $\omega \in [0, 2\pi)$, and $\psi \in [0, \frac{\pi}{2})$. The corresponding rotation that maps $[0\ 0\ 1]^T$ to a particular orientation $[x_k\ y_k\ z_k]^T$ is $R = R_x \cdot R_y$, where $R_x$ is the rotation matrix for a $\cos^{-1} z_k$ rotation about the $xx'$ axis. If $x_k \neq 0$, then $R_y$ is the rotation matrix for a $\tan^{-1}(y_k / x_k)$ rotation about the $yy'$ axis; otherwise, $R_y$ is the $3 \times 3$ identity matrix. The computational cost of the optimization for a voxel is $O(N r^2)$, where $N$ is the number of orientations evaluated for $\vec{n}$.

A reconstruction of the locations and corresponding normals of the imaged surfaces can be obtained by detecting the similarity local maxima that are due to the occurrence of a surface. These maxima can be detected as the positions where $s$ is maximized along the direction of the surface normal [28]. An estimate of the surface normal is provided by $\vec{\kappa}$ since, according to (4.3), $\vec{\kappa}$ should coincide with the surface normal. A suitable algorithmic approach to the computation of the above location is given by a 3D version of the Canny edge detector [47]. In this version, the gradient is also 3-dimensional; its magnitude is given by $s(\vec{p})$ and its direction by $\vec{\kappa}(\vec{p})$.
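A direct, exhaustive implementation of (4.2)-(4.4) can be sketched as below, reusing `backproject` and `mncc` from the earlier sketches. The helper `rotation_to` is one valid choice of $R(\vec{n})$ (the in-plane orientation of the lattice is left unconstrained by the requirement $R(\vec{n}) \cdot [0\ 0\ 1]^T = \vec{n}$), and the angular step and cone opening are illustrative values, not the chapter's.

```python
import numpy as np

def rotation_to(n):
    """One valid R(n) with R(n) @ [0, 0, 1] = n (third column is n)."""
    a = np.array([1.0, 0, 0]) if abs(n[0]) < 0.9 else np.array([0, 1.0, 0])
    u = np.cross(n, a); u /= np.linalg.norm(u)
    return np.column_stack([u, np.cross(n, u), n])

def orientations(c, step=np.deg2rad(5.0), psi_max=np.deg2rad(60.0)):
    """Enumerate unit normals n(omega, psi) per Eq. (4.4); the pole of the
    hemisphere points along c = p - o (assumes c is not on the z-axis)."""
    xc, yc, zc = c
    n1, n2 = np.sqrt(xc**2 + yc**2 + zc**2), np.sqrt(xc**2 + yc**2)
    for omega in np.arange(0.0, 2 * np.pi, step):
        for psi in np.arange(0.0, psi_max, step):
            xi = (zc * xc * np.cos(omega) * np.sin(psi)
                  - yc * n1 * np.sin(omega) * np.sin(psi)
                  + xc * n2 * np.cos(psi)) / (n1 * n2)
            yi = (zc * yc * np.cos(omega) * np.sin(psi)
                  + xc * n1 * np.sin(omega) * np.sin(psi)
                  + yc * n2 * np.cos(psi)) / (n1 * n2)
            zi = (zc * np.cos(psi) - n2 * np.cos(omega) * np.sin(psi)) / n1
            yield np.array([xi, yi, zi])

def optimize_voxel(I1, I2, P1, P2, p, o, alpha, r):
    """s(p) and kappa(p) of Eqs. (4.2)-(4.3) by exhaustive search."""
    best_s, best_n = -np.inf, None
    for n in orientations(p - o):
        w1 = backproject(I1, P1, p, rotation_to(n), alpha, r)
        w2 = backproject(I2, P2, p, rotation_to(n), alpha, r)
        s = mncc(w1, w2)            # any of the metrics sketched earlier
        if s > best_s:
            best_s, best_n = s, n
    return best_s, best_n
```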
The non-maxima suppression step of Canny's algorithm performs, in essence, the detection of similarity local maxima, since it rejects all voxels in $\vec{V}$ that are not local maxima along the surface normal. A robust implementation of the above approach is achieved by following the work in [48], but substituting the 3D gradient with $\vec{V}$.

4.3.2 Search Direction and Accuracy of Reconstruction

Both optimizing the orientation of the hypothesized patch $S$ and detecting similarity maxima in the direction of the surface normal increase the accuracy of the final reconstruction. The claimed increase in accuracy for the uniqueness cue is theoretically expected due to the following proposition: the spatial error of surface reconstruction is a monotonically increasing function of the angle between the normal vector of the imaged surface and the spatial direction over which a similarity measure is optimized (proof in Appendix A). This proposition constitutes a mathematical explanation of why space-sweeping approaches are less accurate than orientation-optimizing methods. Intuitively, the inaccuracy is due to the fact that the backprojections on $S$, which is oriented differently than the imaged surface, do not correspond to the same world points, except for the central point of $S$. The above proposition also explains why similarity local maxima are optimally recovered when backprojections are evaluated tangentially to the surface to be reconstructed. In addition, when the search direction for local maxima is in wide disagreement with the surface normal, inhibition of valid maxima occurs, deteriorating the quality of the reconstruction even further. The reason is that an inaccurate search direction may point to and, thus, suppress validly occupied neighboring voxels. When $\vec{\kappa}$ is more accurate, this suppression attenuates, because $\vec{\kappa}$ points perpendicularly to the surface.

The following experiments confirm that detecting similarity local maxima along $\vec{\kappa}$ and optimizing $\vec{n}$ provide a more faithful reconstruction. The first experiment uses computer simulation to show that this improvement of accuracy occurs even in synthetic images, where noise and calibration errors are absent. In the experiment, a binocular camera pair that obliquely imaged a planar surface was simulated. A planar patch $S$, oriented so that its normal was equal to $\vec{e}$, was swept along depth. At each depth, the locations of the surface points that were imaged on $S$, through the backprojection process, were calculated for each camera. Thus, at a given depth, a point on $S$ indicates a pair of world points occurring someplace on the imaged surface. For each depth, the distances of such pairs of world points were summed. In Fig. 4.2(a-c), the setup as well as the initial, middle and final positions of the patch are shown. Figure 4.2(d) shows the sum of distances obtained for each depth value, for an $r \times r$, $r = 11$ grid on $S$. According to the prediction, the minimum of this summation function does not occur at $\delta = 0$, which is the correct depth. The dislocation of this minimum is the predicted depth error for this setup.
Fig. 4.2. Left to right: (a) initial, (b) middle and (c) final position of a hypothetical patch (magenta). Line pencils show the lines of sight from the optical centers to the imaged surface, through the patch. The middle line plots the direction of $\vec{e}$. The plot on the right (d) shows the sum of distances of points imaged through the same point on the patch as a function of $\delta$; it indicates that the maximum of similarity is obtained at $\delta \neq 0$ ($< 0$, in this case) (from [55], © 2006 IEEE)
The experiment shows that even in ideal imaging conditions, space-sweeping is guaranteed to yield some error if the scene includes any significant amount of slant.

In the second experiment (see Fig. 4.3), a well-textured planar surface was reconstructed considering an $S$ which assumed either solely the frontoparallel orientation, as in space-sweeping, or a set of orientations within a cone of opening $\gamma$ around $\vec{e}$. Judging by the planarity of the reconstructed surface, the least accurate reconstruction was obtained by space-sweeping. Notice that in (c), due to the compensation for the projective distortion, the backprojections $w'_{1,2}$ were more similar than in (b). As a result, higher similarity values were obtained and, thus, more local maxima exhibited a similarity value higher than the threshold $\tau_c$. Figure 4.3(d) is discussed in Sect. 4.3.3.

4.3.3 Optimizing Accuracy in Discrete Images

In this subsection, the size $\alpha$ of $S$ and the corresponding image areas onto which $S$ projects are studied. A modulation of $\alpha$ is proposed to increase the accuracy of the patch operator as it has been formulated to date [24, 25, 26, 28]. Finally, integration with the surface continuity assumption is demonstrated to relieve the result of residual inaccuracies.

In discrete images, the number of pixels subtended by the projection of $S$ is proportional to the reciprocal of the squared distance and to the cosine of the relative obliqueness of $S$ to the cameras. Thus, in (4.3), when $\alpha$ is constant, the greater the obliqueness the fewer the image pixels from which the $r \times r$ image samples are obtained. Therefore, there will always be some obliqueness after which the same intensity values start to be multiply sampled. In this case, as obliqueness and/or distance increase, the population of these intensities will tend to exhibit reduced variance, because they are sampled from decreasingly fewer pixels. Thus, a bias in favor of greater slants and distances is predicted: mathematically, because the variance occurs in the denominator of the correlation function; intuitively, because less image area now supports the similarity matching of backprojections on $S$ and, as a consequence, this matching becomes less robust.
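The dependence just described can be made explicit under a pinhole model; the sketch below is an approximation for a small patch, with $f$ denoting focal length in pixels.

```python
import numpy as np

def footprint_pixels(alpha, d, omega, f):
    """Approximate pixel area subtended by the projection of an
    alpha x alpha patch at distance d, slanted by omega relative to the
    viewing ray, for a pinhole camera with focal length f in pixels:
    the area shrinks with cos(omega) and with the squared distance."""
    return (f * alpha / d) ** 2 * np.cos(omega)

# Once footprint_pixels(...) drops below the r*r samples drawn on S,
# the same intensities start to be multiply sampled and the bias arises.
```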
Fig. 4.3. Comparison of methods for similarity local maxima detection. Clockwise from top left: (a) image from a horizontally arranged binocular pair (baseline 156 mm), showing a XZ section of space at which $\vec{V}$ was calculated three ways: (b) plane-sweeping, (c) optimizing $\vec{n}$, and (d) updated assuming surface continuity (see Sect. 4.3.3). In (a), checker size was 1 cm² and the target was ≈ 1.5 m from the cameras. In the maps (b-d), dark dots are local maxima, white lines are $\vec{\kappa}$, voxel size was 125 mm³, r = 21 × 21, α = 20 mm. In (c), γ = 60° and the orientations of $\vec{n}$ were parameterized every 1°. Notice that the last two methods detect more local maxima, although the same similarity threshold $\tau_c$ was used in all three conditions (from [55], © 2006 IEEE)
To observe the predicted phenomenon, surface orientation was estimated and compared to ground truth. Experiments were conducted with both real and synthetic images, to stress the point that the discussed inaccuracy cannot be attributed to noise or calibration errors and that, therefore, it must be contained in the information loss due to image discretization. In the first experiment (see Fig. 4.4), a binocular image pair was synthesized to portray a square, textured and planar piece of surface. Equations (4.2) and (4.3) were then evaluated for the central point on the surface. The similarity values obtained for each orientation of $\vec{n}$ were arranged to form a longitude-latitude map, which can be read as follows. The longitude and latitude axes correspond to the dimensions defined by, respectively, modulating $\psi$ and $\omega$
Fig. 4.4. Two textures rendered on a 251 × 251 mm planar surface patch (left) and the corresponding similarity values obtained by rotating an $S$ concentric with the patch (right). In the maps, camera pose is at (0, 0), crosses mark the maximal similarity value and circles mark ground truth. In the experiment, α = 100 mm, r = 15, γ = 90°. The binocular pair was ≈ 1.5 m from the patch and its baseline was 156 mm. The angular parameterization of $\vec{n}$ was in steps of 0.5°. The errors for the two conditions, measured as the angle between the ground-truth normal and the estimated one, were 2.218° (top) and 0.069° (bottom) (from [56], © 2007 IEEE)
in (4.4); coordinates (0, 0) correspond to $\vec{c}$. In the map, lighter pixels indicate a high similarity value and darker the opposite (henceforth, this convention is followed for all similarity maps in this chapter). Due to the synthetic nature of the images, which facilitated a perfect calibration, only a small amount of the predicted inaccuracy was observed. In the second experiment, calibration errors give rise to even more misleading local maxima and, also, the similarity value at very oblique orientations of $\vec{n}$ ($> 60°$) is observed to reach extreme positive or negative values. In Fig. 4.5, to indicate the rise of the spurious maximum at the extremes of the correlation map, the optimization was computed twice: once for γ = 120° and once for γ = 180°. In both cases, the global maximum occurred at the extreme border of this map, thus corresponding to a more oblique surface normal, relative to the camera, than ground truth.

The above phenomenon can be alleviated if the size of the backprojection surface $S$ is modulated so that its image area remains invariant. In particular, the side of $S$ (or its diameter, for a circular $S$) is modulated as:

$$\alpha = \frac{\alpha_0 \cdot d}{d_0 \cdot \cos\omega}; \qquad \omega = \cos^{-1}\left(\frac{\vec{v} \cdot \vec{n}}{|\vec{v}|\,|\vec{n}|}\right), \qquad (4.5)$$

where $\vec{v} = \vec{p} - \vec{o}$, $d = |\vec{v}|$, $\omega$ is the angle between $\vec{v}$ and $\vec{n}$, and $d_0$, $\alpha_0$ are initial parameters. Notice that even for a single location, size is still varied as
Fig. 4.5. Comparison of techniques. The experiment of Fig. 4.4 is repeated for the first two frames of the "Venus" Middlebury sequence and for two different γs: 120° (top map) and 180° (middle map). In the experiment, α = 250 length units, the baseline was 100 length units and r = 151. The surface point for which (4.2) and (4.3) were evaluated is marked with a circle (left). The projection of $S$ subtended an area of ≈ 50 pixels. The bottom map shows the increase in accuracy obtained by the size-modulation of $S$ with respect to obliqueness (see forward in text). The mapping of similarity values to intensities is individual for each map (from [56], © 2007 IEEE)
$\vec{n}$ is varied. Figures 4.5 and 4.6 show the angular and spatial improvement in accuracy induced by the proposed size-modulation. They compare the reconstructions obtained with a patch whose size was modulated as above against those obtained with a constant-sized $S$, as practiced to date in [24, 25, 26, 28]. A "side-effect" of the above modulation is that the larger the distance and the obliqueness of a surface, the lower the spatial frequency at which it is reconstructed. This effect is considered a natural tradeoff, since distant and oblique surfaces are also imaged at lower frequencies.

Assuming surface continuity has been shown to reduce inaccuracies due to noise or lack of resolution in a wide variety of methods and especially in global-optimization approaches (see Sect. 4.2). To demonstrate the compatibility of the proposed operator with these approaches and suppress residual inaccuracies, the proposed operator is implemented with feedback obtained from a surface-smoothing process. Once local maxima have been detected in $\vec{V}$, the computed $\vec{\kappa}$s are updated as follows. For voxels where a similarity local maximum occurs, $\vec{\kappa}$ is replaced by the normal of the least-squares fitting plane through the neighboring occupied voxels. For an empty voxel $\vec{p}_e$, the updated value of $\vec{\kappa}$ is $\sum_j \beta_j \vec{\kappa}(\vec{p}_j) / \sum_j \beta_j$, where $j$ enumerates the occupied voxels within $\vec{p}_e$'s neighborhood. After the update, local maxima are re-detected in the new $\vec{V}$. The results are more accurate, because similarity local maxima are detected along a more accurate estimation of the normal. Note that if the optimization of $\vec{n}$ is avoided, the initial local maxima are less accurately localized and so are the updated $\vec{\kappa}$s.
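The size modulation of (4.5) is a one-line computation; the absolute value below is a practical guard for the sign of the dot product and is not part of (4.5).

```python
import numpy as np

def modulated_patch_size(p, o, n, alpha0, d0):
    """Side alpha of S per Eq. (4.5): the patch grows with distance and
    obliqueness so that its image footprint stays approximately invariant."""
    v = p - o
    d = np.linalg.norm(v)
    cos_omega = abs(np.dot(v, n)) / (d * np.linalg.norm(n))  # abs(): sign guard
    return alpha0 * d / (d0 * cos_omega)
```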
Fig. 4.6. Comparison of techniques. Shown is a stereo pair (left column) and three separate calculations of s across a vertical section through the middle of the foreground surface. The bottom figures are zoomed-in detail of the part that corresponds to the foreground, and the z-axes (horizontal in the maps) are logarithmic. In the bottom figures, ground truth is marked with a dashed line. In the 2nd column, a fine α was used, hence the noisy response at the background. Using a coarse α (3rd column) yields a smoother response at greater distances, but diminishes any detail that could be observed at short range. In the 4th column, α is projectively increased, thus normalizing the precision of reconstruction by the area that a pixel images at that distance (from [56], © 2007 IEEE)
4.4 Increasing Performance

Two techniques are proposed for increasing the performance of the proposed implementation of the uniqueness cue. The first aims at the provision of high-precision results and the second at the reduction of its computational cost. In addition, a hybrid approach is proposed that combines the rapid execution of space-sweeping with the increased accuracy of the proposed orientation-optimizing method. To enhance the accuracy of the space-sweeping part of this approach, an enhanced version of space-sweeping is introduced that is based on the conclusions of Sect. 4.3.2.

4.4.1 Precision

In volumetric methods, the required memory and computation increase by a cubic factor as voxel size decreases and, thus, computational requirements are quite demanding for high-precision reconstructions. The proposed technique refines the initial voxel-parameterized reconstruction to sub-voxel precision, given $\vec{V}$ and the detected similarity local maxima as input. The local maxima are in voxel parameterization and are treated as a coarse approximation of the imaged surface. The method densely interpolates through the detected similarity local maxima. This interpolation is guided by $\vec{V}$, in order for the result to pass through the locations where similarity ($s$, or $|\vec{V}|$) is locally maximized.

To formulate the interpolation, $S_f$ henceforth refers to the 0-isosurface of $G = |\nabla \vec{V}|$, or otherwise to the set of locations at which $G = 0$. In the vicinity of the detected local maxima, similarity is locally maximized and, thus, the derivative's norm ($G$) should be 0. The result is defined as the localization of $S_f$ at the corresponding regions. Guiding the interpolation with the locations of $S_f$ utilizes the obtained similarity values to accurately increase the precision of the reconstruction and not blindly interpolate through the detected similarity local maxima. The interpolation utilizes the Radial Basis Function (RBF) framework in [49] to approximate the isosurface. This framework requires pivots to guide the interpolation, which in the present case are derived from the detected local maxima. For each one of them, the values of $G$ at the locations $\vec{p}_{1,2} = \vec{p}_m \pm \lambda\vec{\kappa}$, where $\vec{p}_m$ is the position of the local maximum, are estimated by trilinear interpolation. The pivots are assigned values $\xi_{1,2} = c \cdot G(\vec{p}_{1,2})$, where $c$ is $-1$ for the closer of the two pivot points to the camera and $1$ for the other. Values $\xi_{1,2}$ are of opposite sign, to constrain the 0-isosurface to occur in between them. The value of $\lambda$ is chosen to be less than voxel size (i.e. 0.9) to avoid interference with local maxima occurring at neighboring voxels [50]. Function $G$ is approximated in high resolution by the RBF framework and the isosurface is extracted by the Marching Cubes algorithm [51]. The result is represented as a mesh of triangles. The proposed approach is, in essence, a search for the zero-crossings of $G$.

The computational cost of the above process is much less than the cost of evaluating $\vec{V}$ at the precision that is interpolated. However, it is still a computationally demanding process of complexity $O(N^3)$, where $N$ is the number of data points. Even though the optimization in [50], which reduces complexity to $O(N \log N)$, was adopted, the number of data points in wide-area reconstructions can be too large to obtain results in real time. To at least parallelize the process, the reconstruction volume can be tessellated into overlapping cubes and the RBF can be computed independently in each. Due to the overlap of the cubes, no significant differences were observed between fitting the RBF directly to the whole reconstruction and fitting it in the individual cubes. The partial meshes are finally merged as in [52].
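A compact sketch of this refinement follows, using off-the-shelf interpolation and isosurface-extraction routines in place of the original implementation; the pivot signs are assigned along $\vec{\kappa}$ rather than by camera proximity, and the mapping of mesh vertices back to world coordinates is omitted for brevity.

```python
import numpy as np
from scipy.ndimage import map_coordinates
from scipy.interpolate import RBFInterpolator
from skimage.measure import marching_cubes

def refine_to_subvoxel(maxima, kappas, G, lam=0.9, upsample=4):
    """Pin the 0-isosurface of G = |grad V| between pivot pairs placed
    along kappa at each detected local maximum, then extract a mesh."""
    pts, vals = [], []
    for p_m, k in zip(maxima, kappas):
        for c in (-1.0, +1.0):
            q = p_m + c * lam * k
            # per the text, the -1 sign should go to the pivot closer to
            # the camera; the opposite signs force a zero crossing between
            g = map_coordinates(G, q.reshape(3, 1), order=1)[0]  # trilinear
            pts.append(q)
            vals.append(c * g)
    pts = np.asarray(pts)
    rbf = RBFInterpolator(pts, np.asarray(vals))
    # evaluate the RBF on a finer grid around the pivots ...
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    axes = [np.linspace(lo[d], hi[d], int((hi[d] - lo[d]) * upsample) + 2)
            for d in range(3)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    field = rbf(grid.reshape(-1, 3)).reshape(grid.shape[:3])
    # ... and extract the refined surface as a triangle mesh
    verts, faces, _, _ = marching_cubes(field, level=0.0)
    return verts, faces
```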
4.4.2 Acceleration

Two hierarchical, coarse-to-fine iterative methods are proposed for the acceleration of the search for similarity local maxima.

The first is an iterative coarse-to-fine search that reduces the number of evaluated $\vec{n}$s in (4.3). In this formula, the exhaustive search computes $s$ for every $\vec{n}$ within a cone of opening $\gamma$. At each iteration $i$: (a) the cone is canonically sampled and the optimizing direction $\vec{\kappa}_i$ is selected amongst the sampled directions, (b) the sampling gets exponentially denser, but (c) only the samples within the opening of an exponentially narrower cone around $\vec{\kappa}_{i-1}$ are evaluated.
At each iteration, the opening $\gamma_i$ of the cone is reduced as $\gamma_{i+1} = \gamma_i / \delta$, $\delta > 1$ (in our experiments $\delta = 2$). Iterations begin with $\vec{\kappa}_1 = \vec{c}$ and end when $\gamma_i$ falls below a precision threshold $\tau_\gamma$. For a voxel at $\vec{p}$, the parameterized normals are given by (4.4), by modulating $\psi$ to be in $[0, \gamma_i]$ and setting $\vec{c} = \vec{p} - \vec{o}$. In Fig. 4.7, the accuracy of the proposed method is shown as a function of computational effort. As ground truth, the result of the exhaustive search was considered, which required 10800 invocations of the similarity function. It can be seen that after 3 iterations, which correspond to a speedup > 7, the obtained surface-normal estimate is inaccurate by less than 3°. After 7 iterations, accuracy tends to be better than 1.5° (speedup ≈ 2). Given the correction of the surface normal in Sect. 4.3.3, the residual minute inaccuracies may be neglected without consequences for the quality of the reconstruction, and the process is stopped at the 3rd iteration. In practice, a speedup of ≈ 20 is obtained, since in our implementation only the 1st iteration is performed if all samples are below the threshold.

The second method reduces the number of evaluated voxels by iteratively focusing computational effort on the volume neighborhoods of similarity local maxima. It is based on a scale-space treatment of the input images. At each iteration, $\alpha_i = \alpha_0 / 2^i$ and $I_{1,2}$ are convolved with a Gaussian of $\sigma_i = \sigma_0 / 2^i$. Also, voxel volume is reduced by $1/2^3$ and correlation is computed only in the neighborhoods of the local maxima detected in the previous iteration. The effect of these modulations is that at initial scales correspondences are evaluated for coarse-scale texture, while finer scales utilize more image detail. Their purpose is to efficiently compare $w_{1,2}$ at coarse scales. At these scales, the projections of the points on $S$ in the image are sparse and, thus, even a minute calibration error causes significant miscorrespondence of their projections. Smoothing, in effect, decreases image resolution and, thus, more correspondences are established at coarse scales. In Fig. 4.8, the method is demonstrated. Utilizing 3 iterations of the above algorithm, no errors in the first iteration that led to a void in the final reconstruction were observed, although this tolerance is of course a function of the available image resolution. In our experiments, a speedup of ≈ 5 was observed on average.
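The first of the two methods can be summarized in a few lines; `sample_cone` is a hypothetical helper that canonically samples a cone of unit directions, `score(n)` evaluates the similarity $s$ of (4.2) for a candidate normal, and the defaults mirror the experimental values quoted above.

```python
import numpy as np

def coarse_to_fine_kappa(score, c, gamma=np.deg2rad(60.0), delta=2.0,
                         tau_gamma=np.deg2rad(1.0), n_samples=30):
    """Shrink the search cone around the running best orientation."""
    kappa = c / np.linalg.norm(c)      # iterations begin with kappa_1 = c
    while gamma > tau_gamma:
        kappa = max(sample_cone(kappa, gamma, n_samples), key=score)
        gamma /= delta                 # gamma_{i+1} = gamma_i / delta
    return kappa
```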
Fig. 4.7. Mean (left) and variance (middle) of the error of the angular optimization as a function of the computational effort, measured in similarity-metric invocations. In the experiment, the results of 10³ estimations were averaged; $\gamma_1 = 60°$, $\delta = 2$ and $\tau_\gamma = 1°$. The right plot illustrates the hierarchical evolution of considered orientations, as points on the unit sphere
Fig. 4.8. Coarse-to-fine localization of similarity local maxima. A digital photograph from a ≈ 40 cm-baseline binocular pair and, superimposed, a section of space that perpendicularly intersects the imaged piece of rock. The 2D map on the right shows the result of local maxima detection across this section. Marked in white are the detected local maxima at voxel precision and in gray the result of their sub-voxel approximation. These local maxima were then reprojected onto the original image and marked (left). At the bottom, the three maps show the result of the coarse-to-fine computation of $\vec{V}$, for the same section of space. In the experiment, $\alpha_0 = 8$ cm, voxel = (4 cm)³, r = 21, $\sigma_0 = 5$ (from [57], © 2005 IEEE)
4.4.3 Sphere-sweeping

In this subsection, the geometry of space-sweeping is revisited and a spherical parameterization of the sweeping surface is proposed and evaluated. The proposed approach substitutes the backprojection plane of space-sweeping with a spherical sector that projectively expands from the cyclopean eye outwards. Using this backprojection surface, a visibility ray $\vec{v}$ departing from the optical center is always perpendicular to the backprojection surface, for any eccentricity ε in the field of view (FOV) (see Fig. 4.9 left). Thus, the number of sampled image pixels per unit area of the backprojection surface is maximized. In contrast, a frontoparallel planar surface is imaged increasingly slanted relative to $\vec{v}$ as ε moves to the periphery of the image and, therefore, lower accuracy is expected, based on the conclusion of Sect. 4.3.3 (see [32] for a proof).

The method is formulated as follows. Let a series of concentric and expanding spherical sectors $S_i$ at corresponding distances $d_i$ from the cyclopean
Fig. 4.9. Illustration of the sector- (left) and voxel- (right) based volume tessellations. Visibility is naturally expressed in the first representation, whereas in the second, traversing voxels diagonally is required (from [32], © 2006 IEEE)
eye ($C$). Their openings μ and λ, in the horizontal and vertical directions respectively, are matched to the horizontal and vertical FOVs of the cameras and tessellated with an angular step of $c$. Parameterization variables ψ and ω are determined as $\psi \in \{c \cdot i - \mu;\ i = 0, 1, 2, \ldots, 2\mu/c\}$ and $\omega \in \{c \cdot j - \lambda;\ j = 0, 1, 2, \ldots, 2\lambda/c\}$, with $[\mu/c] = \mu/c$ and $[\lambda/c] = \lambda/c$. For both ψ and ω, the value 0 corresponds to the orientation of the mean optical axis $\vec{e}$. To generate the sectors $S_i$, a corresponding sector $S_0$ is first defined on a unit sphere centered at $O = [0\ 0\ 0]^T$. A point $\vec{p} = [x\ y\ z]^T$ on $S_0$ is given by $x = \sin\psi$, $y = \cos\psi \sin\omega$, $z = \cos\psi \cos\omega$. Its corresponding point $\vec{p}'$ on $S_i$ is then:

$$\vec{p}' = d_i \left[ R_z(-\theta)\, R_y(-\phi)\, \vec{p} + C \right], \qquad (4.6)$$

where $R_y$ and $R_z$ are rotation matrices for rotations about the $yy'$ and $zz'$ axes, $\vec{v}_{1,2}$ are unit vectors along the principal axes of the cameras, $\vec{v} = (\vec{v}_1 + \vec{v}_2)/2$, and θ (longitude) and φ (colatitude) are $\vec{v}$'s spherical coordinates. Computational power is conserved, without reducing the granularity of the reconstructed depths, by parameterizing $d_i$ on a disparity basis [53]: $d_i = d_0 + \beta i$, $i = 1, 2, \ldots, i_N$, where $d_0$ and $i_N$ define the sweeping interval and β is modulated so that the farthest distance is imaged at sufficient resolution.

The rest of the sweeping procedure is similar to plane-sweeping and is, thus, only overviewed. For each $S_i$, the stereo images (≥ 2) are sampled at the projections of $S_i$'s points on the acquired images, thus forming two $(2\mu/c \times 2\lambda/c)$ backprojection images. Backprojecting and locally comparing images is straightforwardly optimized by a GPU as a combination of image-difference and convolution operations (e.g. [37, 54]). The highest local maximum along a ray of visibility is selected as the optimum depth. Correlation values are interpolated along depth to obtain subpixel accuracy.

Parameterizing the reconstruction volume into sectors instead of voxels provides a useful surface parameterization, because the data required to compute visibility are already structured with respect to rays from the optical center. These data refer to a sector-interpretable grid (see Fig. 4.9), but
are structured in memory in a conventional 3D matrix. The application of visibility then becomes more natural, because the oblique traversal of a regular voxel space is sensitive to discretization artifacts. Finally, computational acceleration obtained by graphics hardware is equally applicable to the resulting method as to plane-sweeping [32].
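The generation of a sweeping sector can be sketched directly from the parameterization above. The bracket placement follows (4.6) as printed, and `Rz`, `Ry` are assumed helpers returning rotation matrices about the $z$ and $y$ axes.

```python
import numpy as np

def sector_point(psi, omega):
    """Point on the unit-sphere sector S0 for angles (psi, omega)."""
    return np.array([np.sin(psi),
                     np.cos(psi) * np.sin(omega),
                     np.cos(psi) * np.cos(omega)])

def sweep_sector(d_i, mu, lam, c_step, theta, phi, C, Rz, Ry):
    """Points of the expanding sector S_i at distance d_i (Eq. 4.6)."""
    R = Rz(-theta) @ Ry(-phi)
    psis = c_step * np.arange(int(2 * mu / c_step) + 1) - mu
    omegas = c_step * np.arange(int(2 * lam / c_step) + 1) - lam
    return np.array([d_i * (R @ sector_point(p, w) + C)
                     for p in psis for w in omegas])
```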
Fig. 4.10. Comparison of plane- and sphere-sweeping. Each row shows an image from a 156 mm-baseline binocular pair (left) and a planar section (as in Fig. 4.8) of space illustrating the reconstructions obtained from plane (middle) and sphere (right) sweeping. The reconstruction points shown are the ones that occur within the limits of the planar section. In the first two rows, the section intersects the reconstructed surface perpendicularly. In the 3rd row, the section is almost parallel to the viewing direction and intersects the imaged sphere perpendicularly to its equator. In the last row, the section is almost parallel to the image plane. The limits of the sections and the reconstructed points are reprojected onto the images of the left column. The superiority of spherical sweeping is pronounced at the sections of the reconstructions that correspond to the periphery of the image (from [32], © 2006 IEEE)
The proposed approach was compared to plane-sweeping on the same data and under the same experimental conditions. Images were 480 × 640 pixels and target surfaces occurred from 1 m to 3 m from the cameras. In Fig. 4.10, slices along depth that were extracted from the reconstructions are compared. Almost no difference between the two methods can be observed in the reconstructions obtained from the center of the images (top row). The results differ the most when comparing reconstructions obtained from the periphery of the images (remaining rows). In terms of reconstructed area, sphere-sweeping provided ≈ 15% more reconstructed points.

4.4.4 Combination of Approaches

The methods presented in this chapter were combined into a stereo algorithm that couples the efficiency of space-sweeping with the accuracy of orientation optimization. Results are shown in Figs. 4.11, 4.12 and 4.13. The algorithm is initiated by reconstructing a given scene with the sphere-sweeping method (see Sect. 4.4.3). Then the proposed orientation-optimizing operator is employed and similarity local maxima are detected. Indicatively, performing
Fig. 4.11. Reconstruction results. Images from three 20 cm-baseline binocular pairs (1st row). Demonstration of coarse to fine spatial refinement of reconstruction for the 1st binocular pair (2nd row), using the approach of Sect. 4.4.2. Multiview reconstruction of the scene (3rd row)
Fig. 4.12. Image from a binocular pair and reconstruction, utilizing the precision enhancement of Sect. 4.4.1. Output is a point cloud and image size was 640 × 480. The last row compares patch-based reconstructions for plane-sweeping (left), orientation optimization (middle) and enforcement of continuity (right) (from [55], © 2006 IEEE)
the correlation step for the last example required 286 sec on a Pentium at 3.2 GHz, where image size was ≈ 10⁶ pixels, voxel size was 10 cm, r = 21, and γ = 60°. A wide outdoor-area reconstruction, in Fig. 4.11, demonstrates the multiview expansion of the algorithm. For multiple views, at the end of each scale iteration the space-carving rule [9] is applied to detect empty voxels and further reduce the computation at the next scale. At the last scale, the $\vec{V}$s obtained from each view are combined with the algorithm in [28].
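At a high level, the combined algorithm can be summarized as follows. Every name below stands for a method described in this chapter, implemented either by one of the earlier sketches or by a hypothetical stand-in (`sphere_sweep`, `voxels_near`, `cyclopean`, `nonmax_suppression_3d`, `gradient_norm`, and the parameters `ALPHA`, `R_LATTICE`, `TAU_C`), so this is an outline rather than runnable production code.

```python
def hybrid_reconstruction(I1, I2, P1, P2):
    depth = sphere_sweep(I1, I2, P1, P2)       # Sect. 4.4.3: fast initial estimate
    o = cyclopean(P1, P2)                      # cyclopean eye from calibration
    V = {}
    for p in voxels_near(depth):               # search only near the initial surface
        s, kappa = optimize_voxel(I1, I2, P1, P2, p, o, ALPHA, R_LATTICE)
        if s > TAU_C:                          # similarity threshold of Sect. 4.3.1
            V[tuple(p)] = (s, kappa)
    maxima, kappas = nonmax_suppression_3d(V)  # 3D Canny-style maxima detection
    return refine_to_subvoxel(maxima, kappas, gradient_norm(V))  # Sect. 4.4.1
```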
Fig. 4.13. Image from a 40 cm-baseline binocular pair and reconstruction, utilizing the precision enhancement of Sect. 4.4.1. The output isosurface is represented as a mesh and texture is directly mapped onto it
4.5 Conclusion

This chapter is concerned with the depth cue due to the assumption of texture uniqueness, one of the most powerful and widely utilized approaches to shape-from-stereo. The factors that affect the accuracy of the uniqueness cue were studied, and the reasons that render orientation-optimizing methods superior in accuracy to traditional and space-sweeping approaches were explained. Furthermore, the proposed orientation-optimizing techniques improve the accuracy and precision of orientation optimization as practiced to date. Acceleration of the orientation-optimization approach is achieved by the introduction of two coarse-to-fine techniques that operate in the spatial and angular domains of the patch-based optimization. Finally, a hybrid approach is proposed that utilizes the rapid execution of a novel, accuracy-enhanced version of space-sweeping to obtain an initial approximation of the reconstruction result. This result is then refined, based on the proposed techniques for the orientation optimization of hypothetical surface patches.

The proposed extensions to the implementation of the uniqueness cue to depth can be integrated with diverse approaches to stereo. The size-modulation
proposed in Sect. 4.3.3 is directly applicable to any approach that utilizes a planar surface patch, either in simple (binocular) or in multi-view stereo. Furthermore, the orientation-optimizing implementation of the uniqueness constraint has been demonstrated to be compatible with the assumption of surface continuity and is, thus, applicable to global-optimization approaches to stereo. A future research avenue of this work is the integration of the uniqueness cue with other cues to depth.

The next direction of this work is the utilization of parallel hardware for the real-time computation of wide-area reconstructions, based on the fact that each of the methods proposed in this chapter is massively parallelizable. Computing the similarity $s$ and orientation $\vec{\kappa}$ can be performed independently for each voxel and each orientation. Furthermore, the detection of similarity local maxima can also be computed in parallel, if the voxel space is tessellated into overlapping voxel neighborhoods. Regarding the proposed sweeping method, the similarity for each depth layer and for each pixel within this layer can also be computed independently. Once the similarity values for each depth layer are available, the detection of the similarity-maximizing depth value for each column of voxels along depth can also be performed independently.
A Appendix

It is shown that $s$ is maximized at the locations corresponding to the imaged surface only when values of $s$ are computed from collineations that are parallel to the surface. Definitions are initially provided. Let:

• Cameras at T and Q image a locally planar surface.
• The sweeping direction $\vec{v}$, given a base point (e.g. the cyclopean eye), defines a line L, which intersects the imaged surface at K. The two general types of possible configurations are shown in Fig. 4.14.
• The hypothetical backprojection planar patch S, onto which the acquired images are backprojected, is on L, centered at D ∈ L and oriented as $\vec{v}$.
• Function $b(X_1, X_2)$, $X_{1,2} \in \mathbb{R}^3$, is the intersection of the line through $X_1$ and $X_2$ with the imaged surface. The surface point that is imaged at some point A on S is $b(A, T)$, where T is the optical center. In the figure, $B = b(A, Q)$ and $C = b(A, T)$.
• Point O is the orthogonal projection of A on the surface.
• θ and φ are the acute angles formed by the optical rays through A, from T and Q.
Assuming texture uniqueness, the backprojection images of B and C are predicted to be identical only when B and C coincide. Thus the distance |BC| for some point on S is studied, assuming that when |BC| → 0 correlation of backprojections is maximized. As seen in Fig. 4.14,
Fig. 4.14. Proof figures. See corresponding text
• |BC| is either |OB| + |OC| (middle) or |OC| − |OB| (left),
• |AO| = |AB| sin θ = |AC| sin φ, |OC| = |AC| cos φ, |OB| = |AB| cos θ,
• φ < θ < π/2.
Thus, |BC| is either $|AO|\frac{\tan\theta + \tan\phi}{\tan\theta \tan\phi}$ or $|AO|\frac{\tan\theta - \tan\phi}{\tan\theta \tan\phi}$. Both quantities are positive: the first because θ, φ ∈ (0, π/2], and the second because θ > φ (see left in the figure above). Therefore, the monotonicity of |BC| as a function of δ = |KD| is fully determined by |AO|. In the special case where θ or φ is π/2, say θ, $|BC| = |OC| = |AC|\cos\phi = |AO|/\tan\phi$ and |BC| is a monotonically increasing function of δ. Thus, the similarity of the backprojection images on S is indeed maximized when D coincides with K. This case corresponds to ψ = 0 (see forward in text). Thus, when $\vec{v} = \vec{n}$, it is only for point D that b(D, T) and b(D, Q) coincide when δ = 0. The geometry for all other points on S is shown in Fig. 4.14 (right). From the figure, it is seen that in this case b(A, T) and b(A, Q) coincide only when δ > 0 (|KD| > 0). For all the rest of the points the depth error is:
$$|KD|^2 = r^2 \tan^2 \omega, \qquad (4.7)$$

which shows that the error is determined not only by $r$ ($= |AD|$) but also by the "incidence angle" $\psi = \frac{\pi}{2} - \omega$ between $\vec{v}$ and $\vec{n}$. Equation (4.7) shows that when $\vec{v} = \vec{n}$ (ω = 0) the similarity of backprojections is maximized at the location of the imaged surface (when δ = 0). In contrast, when ω → π/2 (or, when the surface is imaged from an extremely oblique angle), the reconstruction error (|KD|) tends to infinity. Since (4.7) holds for every point on S, it is concluded that the error in reconstruction is a monotonically increasing function of ω, i.e. of the angle between the search direction and the normal of the imaged surface.
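A short numeric check of (4.7), with $r$ factored out, makes the divergence at grazing incidence explicit:

```python
import numpy as np

# Eq. (4.7): depth error |KD| = r * tan(omega). With r factored out, the
# error grows without bound as the search direction departs from the normal:
for deg in (0, 15, 30, 60, 80, 89):
    print(f"omega = {deg:2d} deg -> |KD|/r = {np.tan(np.deg2rad(deg)):.2f}")
```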
Acknowledgement This work is supported by the EC within FP6 under Grant 511568 with the acronym 3DTV.
References

1. D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1-3):7-42, 2002.
2. M. Z. Brown, D. Burschka, and G. D. Hager. Advances in computational stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8):993-1008, 2003.
3. D. Marr and T. Poggio. Cooperative computation of stereo disparity. Science, 194:283-287, 1976.
4. D. Marr and T. Poggio. A computational theory of human stereo vision. In Royal Society of London Proceedings Series B, Vol. 204, pp. 301-328, 1979.
5. A. Laurentini. The visual hull concept for silhouette-based image understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(2):150-162, 1994.
6. K. M. Cheung, T. Kanade, J. Y. Bouguet, and M. Holler. A real time system for robust 3D voxel reconstruction of human motions. In IEEE Computer Vision and Pattern Recognition, Vol. 2, pp. 714-720, 2000.
7. V. Kolmogorov and R. Zabih. Multi-camera scene reconstruction via graph cuts. In European Conference on Computer Vision, Vol. 1, pp. 379-393, 2002.
8. M. Okutomi and T. Kanade. A multiple-baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(4):353-363, 1993.
9. K. N. Kutulakos and S. M. Seitz. A theory of shape by space carving. International Journal of Computer Vision, 38(3):197-216, 2000.
10. W. Culbertson, T. Malzbender, and G. Slabaugh. Generalized voxel coloring. In Vision Algorithms: Theory and Practice, pp. 100-115, 1999.
11. G. Slabaugh, B. Culbertson, T. Malzbender, M. Livingston, I. Sobel, M. Stevens, and R. Schafer. Methods for volumetric reconstruction of visual scenes. International Journal of Computer Vision, 57(3):179-199, 2004.
12. J. Lanier. Virtually there. Scientific American, 284(4):66-75, 2001.
13. J. Mulligan, X. Zabulis, N. Kelshikar, and K. Daniilidis. Stereo-based environment scanning for immersive telepresence. IEEE Transactions on Circuits and Systems for Video Technology, 14(3):304-320, 2004.
14. N. Kelshikar, X. Zabulis, K. Daniilidis, V. Sawant, S. Sinha, T. Sparks, S. Larsen, H. Towles, K. Mayer-Patel, H. Fuchs, J. Urbanic, K. Benninger, R. Reddy, and G. Huntoon. Real-time terascale implementation of tele-immersion. In International Conference on Computational Science, pp. 33-42, 2003.
15. N. Ayache. Artificial Vision for Mobile Robots: Stereo Vision and Multisensory Perception. MIT Press, Cambridge, MA, 1991.
16. E. Trucco and A. Verri. Introductory Techniques for 3-D Computer Vision. Prentice Hall, New Jersey, 1998.
17. R. T. Collins. A space-sweep approach to true multi-image matching. In IEEE Computer Vision and Pattern Recognition, pp. 358-363, 1996.
18. J. Bauer, K. Karner, and K. Schindler. Plane parameter estimation by edge set matching. In 26th Workshop of the Austrian Association for Pattern Recognition, pp. 29-36, 2002.
19. C. Zach, A. Klaus, J. Bauer, K. Karner, and M. Grabner. Modelling and visualizing the cultural data set of Graz. In Virtual Reality, Archeology, and Cultural Heritage, 2001.
20. K. N. Kutulakos and S. M. Seitz. A theory of shape by space carving. International Journal of Computer Vision, 38(3):199-218, 2000.
21. C. Zhang and T. Chen. A self-reconfigurable camera array. In Eurographics Symposium on Rendering, 2004.
22. T. Werner, F. Schaffalitzky, and A. Zisserman. Automated architecture reconstruction from close-range photogrammetry. In CIPA International Symposium, 2001.
23. C. Zach, A. Klaus, B. Reitinger, and K. Karner. Optimized stereo reconstruction using 3D graphics hardware. In Workshop of Vision, Modelling, and Visualization, pp. 119-126, 2003.
24. A. Bowen, A. Mullins, R. Wilson, and N. Rajpoot. Light field reconstruction using a planar patch model. In Scandinavian Conference on Image Analysis, pp. 85-94, 2005.
25. R. Carceroni and K. Kutulakos. Multi-view scene capture by surfel sampling: From video streams to non-rigid 3D motion, shape & reflectance. International Journal of Computer Vision, 49(2-3):175-214, 2002.
26. O. Faugeras and R. Keriven. Complete dense stereovision using level set methods. In European Conference on Computer Vision, pp. 379-393, 1998.
27. A. S. Ogale and Y. Aloimonos. Stereo correspondence with slanted surfaces: Critical implications of horizontal slant. In IEEE Computer Vision and Pattern Recognition, Vol. 1, pp. 568-573, 2004.
28. X. Zabulis and K. Daniilidis. Multi-camera reconstruction based on surface normal estimation and best viewpoint selection. In IEEE International Symposium on 3D Data Processing, Visualization and Transmission, pp. 733-740, 2004.
29. D. Scharstein, R. Szeliski, and R. Zabih. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. In IEEE Workshop on Stereo and Multi-Baseline Vision, pp. 131-140, 2001.
30. M. Li, M. Magnor, and H.-P. Seidel. Hardware-accelerated rendering of photo hulls. Eurographics, 23(3), 2004.
31. R. Yang, G. Welch, and G. Bishop. Real-time consensus-based scene reconstruction using commodity graphics hardware. In Pacific Graphics, 2002.
32. X. Zabulis, G. Kordelas, K. Mueller, and A. Smolic. Increasing the accuracy of the space-sweeping approach to stereo reconstruction, using spherical backprojection surfaces. In International Conference on Image Processing, 2006.
33. M. Pollefeys and S. Sinha. Iso-disparity surfaces for general stereo configurations. In European Conference on Computer Vision, pp. 509-520, 2004.
34. V. Nozick, S. Michelin, and D. Arquès. Image-based rendering using plane-sweeping modelisation. In International Association for Pattern Recognition - Machine Vision Applications, pp. 468-471, 2005.
35. S. M. Seitz and C. R. Dyer. Photorealistic scene reconstruction by voxel coloring. International Journal of Computer Vision, 35(2):151-173, 1999.
36. R. Szeliski. Prediction error as a quality metric for motion and stereo. In International Conference on Image Processing, Vol. 2, pp. 781-788, 1999.
37. I. Geys, T. P. Koninckx, and L. J. Van Gool. Fast interpolated cameras by combining a GPU-based plane sweep with a max-flow regularisation algorithm. In IEEE International Symposium on 3D Data Processing, Visualization and Transmission, pp. 534-541, 2004.
38. H. Moravec. Robot Rover Visual Navigation. Computer Science: Artificial Intelligence, pp. 105-108, 1980/1981.
39. J. Mulligan and K. Daniilidis. Real time trinocular stereo for tele-immersion. In International Conference on Image Processing, pp. 959-962, Thessaloniki, Greece, 2001.
40. S. Paris, F. Sillion, and L. Quan. A surface reconstruction method using global graph cut optimization. In Asian Conference on Computer Vision, 2004.
41. J. Kim, V. Kolmogorov, and R. Zabih. Visual correspondence using energy minimization and mutual information. In International Conference on Computer Vision, Vol. 2, pp. 1033-1040, 2003.
42. Y. Ohta and T. Kanade. Stereo by intra- and inter-scanline search using dynamic programming. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7(2):139-154, 1985.
43. I. J. Cox, S. L. Hingorani, S. B. Rao, and B. M. Maggs. A maximum likelihood stereo algorithm. Computer Vision and Image Understanding, 63(3):542-567, 1996.
44. P. Mordohai and G. Medioni. Dense multiple view stereo with general camera placement using tensor voting. In IEEE International Symposium on 3D Data Processing, Visualization and Transmission, pp. 725-732, 2004.
45. J. Montagnat, H. Delingette, and N. Ayache. A review of deformable surfaces: topology, geometry and deformation. Image and Vision Computing, 19:1023-1040, 2001.
46. H. Jin, S. Soatto, and A. Yezzi. Multi-view stereo beyond Lambert. In IEEE Computer Vision and Pattern Recognition, pp. 171-178, 2003.
47. J. F. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679-698, 1986.
48. O. Monga, R. Deriche, G. Malandain, and J. P. Cocquerez. Recursive filtering and edge tracking: Two primary tools for 3D edge detection. Image and Vision Computing, 9(3):203-214, 1991.
49. G. Turk and J. F. O'Brien. Modelling with implicit surfaces that interpolate. ACM Transactions on Graphics, 21(4):855-873, 2002.
50. J. C. Carr, R. K. Beatson, J. B. Cherrie, T. J. Mitchell, W. R. Fright, B. C. McCallum, and T. R. Evans. Reconstruction and representation of 3D objects with radial basis functions. In ACM - Special Interest Group on Graphics and Interactive Techniques, pp. 67-76, 2001.
51. W. Lorensen and H. Cline. Marching cubes: A high resolution 3D surface construction algorithm. Computer Graphics, 21(4):163-169, 1987.
52. G. Turk and M. Levoy. Zippered polygon meshes from range images. In ACM - Special Interest Group on Graphics and Interactive Techniques, pp. 311-318, 1994.
53. J. X. Chai, X. Tong, S. C. Chan, and H. Y. Shum. Plenoptic sampling. In ACM - Special Interest Group on Graphics and Interactive Techniques, pp. 307-318, 2000.
54. C. Zach, K. Karner, and H. Bischof. Hierarchical disparity estimation with programmable 3D hardware. In International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, pp. 275-282, 2004.
55. X. Zabulis and G. Kordelas. Efficient, precise, and accurate utilization of the uniqueness constraint in multi-view stereo. In Third International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), University of North Carolina, Chapel Hill, 2006.
56. X. Zabulis and G. D. Floros. Modulating the size of backprojection surface patches, in volumetric stereo, for increasing reconstruction accuracy and robustness. In IEEE 3DTV Conference, Kos, Greece, 2007.
57. X. Zabulis, A. Patterson, and K. Daniilidis. Digitizing archaeological excavations from multiple views. In Proceedings of IEEE 3-D Digital Imaging and Modeling, 2005.
5 Pattern Projection Profilometry for 3D Coordinates Measurement of Dynamic Scenes

Elena Stoykova, Jana Harizanova and Ventseslav Sainov

Central Laboratory of Optical Storage and Processing of Information, Bulgarian Academy of Sciences
Introduction

Three-dimensional time-varying scene capture is a key component of dynamic 3D displays. Fast, remote, non-destructive parallel acquisition of information, an inherent property of optical methods, makes them extremely suitable for capturing in 3D television systems. Recent advances in computers, image sensors and digital signal processing have become a powerful vehicle that motivates the rapid progress in optical profilometry and metrology and stimulates development of various optical techniques for precise measurement of 3D coordinates in machine design, industrial inspection, prototyping, machine vision, robotics, biomedical investigation, 3D imaging, the game industry, cultural heritage protection, advertising, information exchange and other fields of modern information technologies. To meet the requirements of capturing for the needs of dynamic 3D displays, the optical profilometric methods and systems must ensure accurate automated real-time full-field measurement of absolute 3D coordinates in a large dynamic range without loss of information due to shadowing and occlusion. The technical simplicity, reliability and cost of capturing systems are also crucial factors. Some of the already commercialized optical systems for 3D profilometry of real objects are based on laser scanning. As the scanning of surfaces is realized one-dimensionally in space and time (point by point or line by line) at limited speed, especially for large-scale scenes in outdoor conditions, these systems are subject to severe errors caused by vibration, air turbulence and other environmental influences and are not applicable for measurement in real time. Among existing techniques, the methods which rely on a functional relationship of the sought object data with the phase of a periodic fringe pattern projected onto and reflected from the object occupy a special place as a full-field metrological means with non-complex set-ups and processing algorithms that are easy to implement in outdoor and industrial environments. Pattern Projection Profilometry (PPP) includes a wide class of optical methods for contouring and shape measurement going back to the
classical shadow and projection moiré topography [1, 2] and the well-known triangulation, widely used since ancient times. Nowadays, pattern projection systems enable fast, non-ambiguous, precise measurement of the surface profile of a wide variety of objects, from plastic zones in the notch of micro-cracks in fracture mechanics [3] and micro-components [4] to cultural heritage monuments [5]. An optimized system equipped with a spatial light modulator (SLM) provides measurement accuracy up to 5·10⁻⁵ of the object size [1]. The main goal of this Chapter is to consider phase-measuring methods in pattern projection profilometry as a promising branch of structured light methods for shape measurement, emphasizing the possibility to apply these methods for time-varying scene capture in dynamic 3D displays. The Chapter consists of three Sections. Section 5.1 gives the basic principles of PPP and Phase Measuring Profilometry (PMP), describes the means for generation of sinusoidal fringe patterns, formulates the tasks of phase demodulation in a profilometric system and points out the typical error sources influencing the measurement. Section 5.2 deals with phase-retrieval methods. They are divided in two groups – multiple-frame and single-frame methods, or temporal and spatial methods. Following this division, we start with the phase-shifting approach, which is outlined with its pros and cons. Special attention is dedicated to error-compensating algorithms and generalized phase-shifting techniques. Among the spatial methods, the Fourier transform method is discussed in detail. The generic limitations, important accuracy issues and different approaches for carrier removal are highlighted. Space-frequency representations such as the wavelet and windowed Fourier transforms for phase demodulation are also considered. Other pointwise strategies for demodulation from a single frame, such as quadrature filters, the phase-locked loop and regularized phase tracking, are briefly presented. The problem of phase unwrapping, which is essential for many of the phase retrieval algorithms, is explained with a classification of the existing phase-unwrapping approaches. The Chapter also includes the experimental set-ups developed by the Central Laboratory of Optical Storage and Processing of Information at the Bulgarian Academy of Sciences (CLOSPI-BAS), as well as the technical solutions of problems associated with measurement of the absolute 3D coordinates of the objects and the loss of information due to the shadowing effect. In the end, we discuss the phase demodulation techniques from the point of view of observation of fast dynamic processes and the current development of real-time measurements in the PMP. This work is supported by the EC within FP6 under Contract 511568 "3DTV".
5.1 Pattern Projection Profilometry

5.1.1 General Description

The principle of PPP is elucidated with the scheme depicted in Fig. 5.1. The optical axes of both the projector system and the observation system are crossed at
Fig. 5.1. Schematic of pattern projection profilometry
a certain plane called the reference plane. Although there exist methods based on random pattern projection, the PPP generally relies on structured light projection [1]. In structured light techniques a light pattern of regular structure, such as a single stripe, multiple stripes, gradients, grids, binary bars, or intensity-modulated fringes, e.g. a sine wave, is projected onto the object. When observed from another angle, the light pattern reflected by the object appears deformed by the object's shape. Analysis of the deformed image captured with a CCD camera yields the 3D coordinates of the object, provided the positions of the camera, the projector and the object are known. The procedure to obtain the required geometric relationships for calculation of coordinates is called camera calibration [6]. The accuracy of the measurement crucially depends on correct determination of the stripe orders of the reflected patterns and on their proper connection to the corresponding orders in the projected patterns. This presumes one or more patterns to be projected – the simpler the pattern structure, the bigger the number of patterns required to derive the object's profile. For example, in the so-called Gray-code systems [7] several binary patterns of varying spatial frequency are projected. The number of projections needed to compensate for the scarce information in binary pattern projection is substantially reduced by intensity or colour modulation of the projected patterns. Projection of more complicated patterns with an increased number of stripes and intensity differences between the stripes enables more accurate but more difficult interpretation of the captured images. A detailed review and classification of coded patterns used in projection techniques for coordinate measurement is presented in [8]. The patterns are unified in three subdivisions based on spatial, temporal (time-multiplexing) or direct codification. The first group comprises patterns whose points are coded using information from the neighbouring pixels. The advantage of such an approach is the capability to measure time-varying scenes. Its disadvantage is the complicated decoding stage, since due to the shadowing effect the surrounding area cannot always be recovered. The time-multiplexing approach
is based on measurement of intensity values for every pixel as a sequence in time. In practice, this is achieved by successive projection of a set of patterns onto the object surface, which limits its application to static measurements. The codeword for a given pixel is usually formed by the sequence of intensity values for that pixel across the projected patterns. The third subdivision is based on direct codification, which means that each point of the pattern is identified just by itself. There are two ways to obtain pixel coordinates using this type of pattern: by increasing the range of colour values or by introducing periodicity in the pattern. These techniques are very sensitive to noise due to vibration, shadowing, saturation or ill-illumination. Thus, preliminary calibration is needed in order to eliminate the colour of the objects using one or more reference images, which makes the method inapplicable for measurement of time-varying scenes. An attractive approach among structured light methods is phase measuring profilometry (PMP) [9, 10], or fringe projection profilometry, in which the parameter being measured is encoded in the phase of a two-dimensional (2D) periodic fringe pattern. The main obligatory or optional steps of the PMP are shown schematically in Fig. 5.2. The phase measuring method enables determination of 3D coordinates of the object with respect to a reference plane or of absolute 3D coordinates of the object itself. The phase extraction requires a limited number of patterns and for some methods may need only one pattern, thus making real-time processing possible. Nowadays, the PMP is a highly sensitive tool in machine vision, computer-aided design, manufacturing, engineering, virtual reality, and medical diagnostics. A possibility for real-time remote shape control without the simultaneous physical presence of the two objects by using comparative digital holography is shown in [11]. For the purpose, the digital hologram of the master object is recorded at one location and transmitted via the Internet or a telecommunication network to the location of the tested object, where it is fed into a spatial light modulator (SLM).

5.1.2 Methods for Pattern Projection

In general, the pattern projected onto the object in the PMP is described by a periodic function, f ∈ [−1, 1]. Most of the developed algorithms
Fig. 5.2. Block-scheme of phase-measuring profilometry
in the PMP presume a sinusoidal profile of the fringes, which means that these algorithms are inherently free of errors only for perfect sinusoidal fringe projection. Projection of purely sinusoidal fringes is not an easy task. Fringes that fulfil the requirement f = cos[. . .] can be projected by coherent light interference of two enlarged and collimated beams. As the fringes are in focus in the whole space, this method makes large-depth and large-angle measurements possible, however at a limited lateral field of measurement, restricted by the diameter of the collimating lens. The main drawbacks of interferometrically created fringes are the complexity of the set-up and its vulnerability to environmental influences, as well as the inevitable speckle noise produced by coherent illumination. An interesting idea for keeping the advantages of coherent illumination while avoiding the speckle noise is proposed in [12], where the light source is created by launching ultra-short laser pulses into highly nonlinear photonic crystal fibres. Using a conventional imaging system with different types of single-, dual-, and multiple-frequency diffraction gratings, such as an amplitude or phase sinusoidal grating or a Ronchi grating, enlarges the field of measurement and avoids the speckle noise, however at the expense of higher harmonics in the projected fringes. In such systems, care should be taken to decrease the influence of the higher harmonics, e.g. by defocused projection of a Ronchi grating or by using an area modulation grating to encode an almost ideal sinusoidal transparency, as described in [9, 13]. A new type of projection unit based on a diffractive optical element in the form of a saw-tooth phase grating is described in [14]. The use of a programmable SLM, e.g. a liquid crystal display (LCD) [15, 16] or a digital micro-mirror device (DMD) [17, 18, 19], permits very precise control of the spacing, colour and structure of the projected fringes [20, 21], and miniaturization of the fringe projection system, enabling applications in space-restricted environments [22]. Synthetic fringe patterns produced by an SLM, however, also suffer from the presence of higher harmonics. The discrete nature of the projected fringes entails tiny discontinuities in the projected pattern that lead to loss of information. This problem is more serious for LCD projectors, whereas the currently available DMD chips with 4k × 4k pixel resolution make the digital fringe discontinuities a minor problem [23]. For illustration, Figs. 5.3–5.5 show schematically implementations of the PMP based on a classical Mach-Zehnder interferometer (Fig. 5.3) [24], on DMD projection (Fig. 5.4) [25] and on a phase grating (Fig. 5.5) [26]. The wrapped phase maps and 3D reconstructions of the objects for these three types of illumination are presented in Fig. 5.6.

5.1.3 Phase Demodulation

The 2D fringe pattern (FP) that is phase modulated by the physical object being measured may be represented by the following mathematical expression:

I(r, t) = IB(r, t) + IV(r, t) f[ϕ(r, t) + φ(r, t)]    (5.1)
Fig. 5.3. Fringe projection system based on a Mach-Zehnder interferometer: L – lens; BS – beam splitter; SF – spatial filter; P – prism; PLZT – phase-stepping device
Fig. 5.4. Fringe projection system based on computer-generated fringe patterns: L – lens
Fig. 5.5. Fringe projection system based on a sinusoidal phase grating; L – lens
where IB(r, t) is a slowly varying background intensity at a point r(x, y) and a moment t, IV(r, t) is the fringe visibility, which is also a low-frequency signal, and ϕ(r, t) is the phase term related to the measured parameter, e.g. the object profile. The phase term φ(r, t) is optional, being introduced during the formation of the waveform f or during the phase evaluation. The continuous FP (5.1) recorded at a moment t is imaged onto a CCD camera and digitized for further analysis as a 2D matrix of quantized intensities Iij ≡ I(x = i∆x, y = j∆y) with dimensions Nx × Ny, where ∆x and ∆y are the sampling intervals along the X and Y axes and define the spatial resolution, Nx is the number of columns and Ny is the number of rows. The camera spatial resolution is a crucial parameter for techniques based on the principle of optical triangulation. The brightness of each individual matrix element (pixel) is given by an integer that varies from the minimum intensity, equal to 0, to the maximum intensity, equal e.g. to 255. The purpose of computer-aided fringe analysis is to determine ϕ(r, t) across the pattern and to extract the spatial variation of
Fig. 5.6. Wrapped phase maps and 3D reconstruction of objects obtained with sinusoidal fringes generated using a) interferometer, b) DMD, c) phase grating
the parameter being measured. In the case of profilometry, once the phase of the deformed waveform is restored, non-ambiguous depth or height values can be computed. The process of phase retrieval is often called phase evaluation or phase demodulation. The fringe density in the FP is proportional to the spatial gradient of the phase [27]. Hence evaluation of the fringe density is also close to phase demodulation. In general, the phase retrieval includes the following steps:

(i) A phase evaluation step, in which a spatial distribution of the phase, the so-called phase map, is calculated using one or more FPs. As the phase retrieval involves nonlinear operations, implementation of many algorithms requires some constraints to be applied.

(ii) The output of the phase evaluation step, in most cases, yields phase values wrapped onto the range −π to π, which entails restoration of the unknown multiple of 2π at each point. The phase unwrapping step is central to these algorithms, especially for realization of automatic fringe analysis.

(iii) Elimination of additional phase terms introduced to facilitate phase measurement, by an adequate least-squares fit, an iterative process or some other method, is sometimes required.

Historically, the PMP has emerged from classical moiré topography [28], in which the fringes modulated by the object surface create a moiré pattern. In the dawn of moiré topography, operator intervention was required for assignment of fringe orders and determination of fringe extrema or their interpositions. Over the years, phase-measuring systems with coherent and non-coherent illumination that realize the principles of moiré, speckle and holographic interferometry have been extensively developed for measurement of a wide range of physical parameters such as depth, surface profile, displacement, strain, deformation, vibration, refractive index, fluid flow, heat transfer, temperature gradients, etc. The development of interferometric methodology, image processing, and computer hardware governs the rapid progress in automation of fringe analysis. Gradually, a host of phase evaluation algorithms have been proposed and tested. A detailed overview of phase estimation methods is given in [29]. In terms of methodology, most algorithms fall into either of two categories: temporal or spatial analysis. A common feature of temporal analysis methods is that the phase value of a pixel is extracted based on the phase-shifted intensities of this pixel. Spatial analysis methods extract a phase value by evaluating the intensity of a neighbourhood of the pixel being studied [30]. A temporal analysis method is phase-shifting profilometry. Typical spatial analysis methods are the Fourier transform methods with and without carrier fringes. Recently, the wavelet transform method has started to gain popularity. A crucial requirement for implementation of any algorithm is the ability for automatic analysis of FPs [31]. Another important requirement for capture of 3D coordinates is to perform the measurement in real time. From this point of view, the methods capable of extracting phase information from a single frame are the most promising. In order to replace
the conventional 3D coordinate measurement machines using contact styli, the PMP should be able to measure diffusely reflecting surfaces and to derive correct information about discontinuous structures such as steps, holes, and protrusions [32].

5.1.4 Conversion from Phase Map to Coordinates

Usually, in the PMP the depth of the object is determined with respect to a reference plane. Two measurements are made, for the object and for the reference plane, that yield two phase distributions ϕobj(x, y) and ϕref(x, y), respectively. The object profile is retrieved from the phase difference, ∆ϕ(x, y) = ϕobj(x, y) − ϕref(x, y). Calibration of the measurement system, i.e. how to calculate the 3D coordinates of the object surface points from a phase map, is another important issue of all full-field phase measurement methods. The geometry of a conventional PMP system is depicted in Fig. 5.7. The reference plane is normal to the optical axis of the camera and passes through the cross-point of the optical axes of the projector and the camera. The plane XC OYC of the Cartesian coordinate system (OXC YC ZC) coincides with the reference plane, and the axis ZC passes through the camera center. The plane P, which is taken to pass through the origin of (OXC YC ZC), is normal to the optical axis of the projector. The Cartesian coordinate system (OXP YP ZP), with the plane XP OYP coinciding with the plane P and the axis ZP passing through the center of the projector system, can be transformed to (OXC YC ZC) by rotations around the XC axis, YC axis, and ZC axis in sequence, through the angles α, β, and γ, respectively. The mapping between the depth and the phase difference depends on the positions and orientations of the camera and projector, the fringe spacing, the location of the reference plane, etc. It is important to note that the mapping is described by a non-linear function [33]. According to
Fig. 5.7. Geometry of the pattern projection system. The depth (or height) of the object point A with respect to the reference plane R is hA
the geometry depicted in Fig. 5.7, the phase difference at the camera pixel (x = i∆x, y = j∆y) is connected to the depth (or height) hij of the current point A on the object, as viewed by the camera at (x = i∆x, y = j∆y), with respect to the reference plane R by the expression [33]:

\Delta\varphi(x = i\Delta x,\, y = j\Delta y) \equiv \Delta\varphi_{ij} = \frac{a_{ij} h_{ij}}{1 + b_{ij} h_{ij}}    (5.2)
where the coefficients aij = aij(LP, LC, d, α, β, γ) and bij = bij(LP, LC, d, α, β, γ) depend on the coordinates of point A, the rotation angles between (OXC YC ZC) and (OXP YP ZP), the fringe spacing d, and the distances LP and LC of the projector and the camera, respectively, to the reference plane. If the PMP is used for investigation of a specularly reflective surface, which acts as a mirror, the phase of the FP recorded by the CCD is distorted proportionally to the slope of the tested object [34]. In this simple model it is assumed that the lateral dimensions, given usually by the x and y coordinates, are proportional to the image pixel index (i, j). However, this simplified model gives inaccurate formulas in case of lens distortion and if the magnification varies from point to point, which destroys the proportionality of the x and y coordinates to the image index (i, j) [35]. Reliable conversion of the phase map to 3D coordinates needs a unique absolute phase value. This phase value can be obtained using some calibration mark, e.g. one or several vertical lines with known positions on the projector in digital fringe projection. Calibration of a PMP system based on DMD digital fringe projection is addressed in [23], where a new phase-coordinate conversion algorithm is described. In [36] calibration is governed by a multi-layer neural network trained by using the data about the FP's irradiance and the height directional gradients obtained for the test object. In this way, it is not necessary to know explicitly the geometry of the profilometric system.

5.1.5 Error Sources

An important issue of all phase determination techniques is their accuracy and noise tolerance. It seems logical to adopt the following general model of the recorded signal:

I(r, t) = Nm(r, t){IB(r, t) + IV(r, t) f[ϕ(r, t) + φ(r, t) + Nph(r, t)]} + Na(r, t)    (5.3)
where the terms Nm(r, t), Na(r, t) and Nph(r, t) comprise the possible deterministic and random error sources. Depending on the processing algorithm and the experimental realization of the profilometric measurement, multiple error sources of different nature will affect the accuracy of phase restoration; hence, the 3D profile recovery becomes a challenging task. Environmental error sources such as mechanical vibration, turbulent and laminar air flows in the optical path, dust diffraction, parasitic fringes and ambient light, which occur
during the acquisition of fringe data, are unavoidable and especially crucial in interferometric set-ups. Error sources in the measurement system, such as the digitization error, a low sampling rate due to insufficient resolution of the camera, nonlinearity error, electronic noise, thermal or shot noise, imaging errors of the projector and the camera, background noise, calibration errors, optical system aberrations, beam power fluctuations and nonuniformity, frequency or temporal instability of the illuminating source, spurious reflections, defects of optical elements, low precision of the digital-data processing hardware, etc., occur in nearly all optical profilometric measurement systems, leading to random variations of the background and fringe visibility. Measurement accuracy can be improved by taking special measures, e.g. by using a high-resolution SLM to reduce the digitization error of the projector, by defocusing the projected FPs, or by selecting a CCD camera with a higher data depth (10 or 12 bits versus 8 bits). To reduce the errors due to calibration, a coordinate measuring machine can be used to provide the reference coordinates and to build an error-compensating map [37]. Speckle noise affects systems with coherent light sources [38, 39]. A special emphasis should be put on the systematic and random error sources, Nph(r, t), that influence the measured phase. Error sources such as miscalibration of the phase-shifting device, or non-parallel illumination which causes non-equal spacing in the projected pattern along the object, introduce a non-linear phase component. Methodological error sources such as shadowing, discontinuous surface structure, low surface reflectivity, or saturation of the image-recording system produce unreliable phase data. The accuracy of the measurement depends on the algorithm used for phase retrieval. For the local (pointwise) methods, the calculated output at a given point is affected by the values registered successively at this point or at neighbouring points, whereas in global methods all image points affect the calculated value at a single point. A theoretical comparison of three phase demodulation methods in the PMP in the presence of white Gaussian noise is made in [40].
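For illustration only, the sketch below (Python/NumPy; the object phase, background, visibility and all noise levels are invented for the example) synthesizes one digitized FP according to the error model (5.3), with a multiplicative term Nm, phase noise Nph and additive noise Na:

import numpy as np

rng = np.random.default_rng(0)
Ny, Nx = 256, 256
y, x = np.mgrid[0:Ny, 0:Nx]

# Toy object phase (a smooth bump) riding on projected carrier fringes
phi = 3.0 * np.exp(-((x - 128)**2 + (y - 128)**2) / 4000.0) + 2 * np.pi * x / 16.0

I_B = 120.0 + 0.05 * x                              # slowly varying background I_B(r)
I_V = 90.0 * np.ones((Ny, Nx))                      # fringe visibility I_V(r)

N_m = 1.0 + 0.02 * rng.standard_normal((Ny, Nx))    # multiplicative noise N_m(r, t)
N_ph = 0.01 * rng.standard_normal((Ny, Nx))         # phase noise N_ph(r, t)
N_a = 2.0 * rng.standard_normal((Ny, Nx))           # additive detector noise N_a(r, t)

I = N_m * (I_B + I_V * np.cos(phi + N_ph)) + N_a    # Eq. (5.3) with f = cos
I_8bit = np.clip(np.round(I), 0, 255).astype(np.uint8)   # 8-bit camera quantization

Passing such synthetic frames through a demodulation algorithm is a convenient way to observe how each term propagates into the recovered phase.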
5.2 Phase Retrieval Methods

5.2.1 Phase-shifting Method

5.2.1.1 General Description

A typical temporal analysis method is the phase-shifting (PS) algorithm, in which the phase value at each pixel on a data frame is computed from a series of recorded FPs that have undergone a phase shift described by a function φ(r, t). If the reference phase φi, i = 1, 2, . . . , M, is kept constant during the capture time and is changed by steps between two subsequent FPs, the method is called phase-stepping or phase-shifting profilometry (PSP). In this case, to determine the values of IB(r), IV(r) and ϕ(r) at each point, at least three FPs (M = 3) are required. In the phase-integration modification of the method, the reference phase is changed linearly in time during the measurement [41].
The PSP is well accepted in many applications due to its well-known advantages such as high measurement accuracy, rapid acquisition, good performance at low contrast and intensity variations across the FP, and the possibility to determine the sign of the wavefront. The PSP can ensure accuracy better than 1/100th of the wavelength in determination of surface profiles [42]. As in all profilometric measurements, the PSP operates either in a comparative mode with a reference surface or in an absolute mode. Usually, in the PSP, phase evaluation relies on sinusoidal pattern projection:

I(r, t) = I0(r, t) + IV(r, t) cos[ϕ(r, t) + φ(r, t)]    (5.4)
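As a minimal sketch of (5.4) (the object phase, fringe period and intensity constants below are arbitrary choices, not values from the text), a phase-stepped sequence can be generated as follows; it is reused by the retrieval examples further below:

import numpy as np

def phase_stepped_frames(phi_obj, steps, I0=128.0, IV=100.0):
    """Frames I_m = I0 + IV*cos(phi_obj + phi_m), one per phase step phi_m."""
    return np.stack([I0 + IV * np.cos(phi_obj + p) for p in steps])

y, x = np.mgrid[0:128, 0:128]
# Carrier fringes of period 20 px plus a smooth object term
phi_true = 2 * np.pi * x / 20.0 + 1.5 * np.exp(-((x - 64)**2 + (y - 64)**2) / 500.0)
frames = phase_stepped_frames(phi_true, steps=2 * np.pi * np.arange(4) / 4)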
Violation of the assumption f[. . .] = cos(. . .) causes systematic errors in the evaluated phase distribution. Two approaches are broadly used in phase shifting, one based on equal phase steps – typically multiples of π/2 – and the other based on arbitrary phase steps. These two approaches are usually referred to as conventional and generalized PSP [43, 44]. A modification of the method with two successive frames shifted at known phase steps and one frame shifted at an unknown phase step is proposed in [45]. All these phase-shifting algorithms can also be called digital heterodyning [46]. The most general approach for phase retrieval in the PSP with M FPs shifted at known phase steps is the least-squares technique [47, 48]. Under the assumption that the background intensity and visibility have only pixel-to-pixel (intra-frame) variation, in the digitized FPs

I_{ij}^m = B_{ij}^m + V_{ij}^m \cos(\varphi_{ij} + \phi_m), \quad m = 1, 2, \ldots, M    (5.5)

where B_{ij} = I_B(i\Delta x, j\Delta y) and V_{ij} = I_V(i\Delta x, j\Delta y), i = 1, 2, \ldots, N_y, j = 1, 2, \ldots, N_x, we have

B_{ij}^1 = B_{ij}^2 = \ldots = B_{ij}^M = B_{ij} \quad \text{and} \quad V_{ij}^1 = V_{ij}^2 = \ldots = V_{ij}^M = V_{ij}    (5.6)
Assuming also that the phase steps are known, the object phase is obtained from minimization of the least-squares error between the experimental intensities \hat{I}_{ij}^m and the calculated intensity distribution:

S_{ij} = \sum_{m=1}^{M} (\hat{I}_{ij}^m - I_{ij}^m)^2 = \sum_{m=1}^{M} (B_{ij} + a_{ij} \cos\phi_m + b_{ij} \sin\phi_m - \hat{I}_{ij}^m)^2    (5.7)
The unknown quantities a_{ij} = V_{ij} \cos\varphi_{ij} and b_{ij} = -V_{ij} \sin\varphi_{ij} are found as the least-squares solution of the equation

\hat{\Omega}_{ij} = \begin{pmatrix} B_{ij} \\ a_{ij} \\ b_{ij} \end{pmatrix} = \hat{\Xi}_{ij}^{-1} \hat{Y}_{ij}    (5.8)

where

\hat{\Xi}_{ij} = \begin{pmatrix}
M & \sum_{m=1}^{M} \cos\phi_m & \sum_{m=1}^{M} \sin\phi_m \\
\sum_{m=1}^{M} \cos\phi_m & \sum_{m=1}^{M} \cos^2\phi_m & \sum_{m=1}^{M} \cos\phi_m \sin\phi_m \\
\sum_{m=1}^{M} \sin\phi_m & \sum_{m=1}^{M} \cos\phi_m \sin\phi_m & \sum_{m=1}^{M} \sin^2\phi_m
\end{pmatrix}
\quad \text{and} \quad
\hat{Y}_{ij} = \begin{pmatrix}
\sum_{m=1}^{M} I_{ij}^m \\
\sum_{m=1}^{M} I_{ij}^m \cos\phi_m \\
\sum_{m=1}^{M} I_{ij}^m \sin\phi_m
\end{pmatrix}
The phase estimate is obtained in each pixel as

\hat{\varphi}_{ij} = \tan^{-1}(-b_{ij}/a_{ij})    (5.9)
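Because the steps φm are the same for every pixel, the 3 × 3 normal matrix of (5.8) can be formed once and the system solved for all pixels at the same time. The vectorized sketch below (an illustration assumed here, not code from the text) implements (5.5)–(5.9):

import numpy as np

def lsq_phase(frames, steps):
    """Least-squares phase retrieval for arbitrary known steps, Eqs. (5.5)-(5.9).
    frames: (M, Ny, Nx) recorded intensities; steps: (M,) phase steps phi_m."""
    M = len(steps)
    c, s = np.cos(steps), np.sin(steps)
    Xi = np.array([[M,       c.sum(),       s.sum()],
                   [c.sum(), (c * c).sum(), (c * s).sum()],
                   [s.sum(), (c * s).sum(), (s * s).sum()]])   # matrix Xi of Eq. (5.8)
    Y = np.stack([frames.sum(axis=0),
                  np.tensordot(c, frames, axes=1),
                  np.tensordot(s, frames, axes=1)])            # vector Y of Eq. (5.8)
    B, a, b = np.linalg.solve(Xi, Y.reshape(3, -1)).reshape(Y.shape)
    return np.arctan2(-b, a)    # wrapped phase estimate, Eq. (5.9)

Applied to the synthetic frames generated above, lsq_phase(frames, 2*np.pi*np.arange(4)/4) returns the wrapped version of the true phase; atan2 is used instead of a plain arctangent so that the quadrant, i.e. the sign information, is preserved. For equally spaced steps the matrix becomes diagonal, which is the synchronous-detection case discussed next.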
In the case of so-called synchronous detection, the M FPs are equally spaced over one fringe period, φm = 2πm/M, and the matrix Ξij becomes diagonal. A more general approach is to take M equally shifted FPs and to determine the phase from

\varphi(x, y) = \tan^{-1} \left[ \sum_{m=1}^{M} b_m I_m(x, y) \Big/ \sum_{m=1}^{M} a_m I_m(x, y) \right]    (5.10)
The number of frames or "buckets" usually gives the name of the algorithm. Popular algorithms are the 3-frame algorithm with a step of 120° or 90°, as well as the 4-frame and 5-frame algorithms with a step of 90°:

\hat{\varphi} = \arctan \frac{I_4 - I_2}{I_1 - I_3}, \qquad \hat{\varphi} = \arctan \frac{2(I_4 - I_2)}{I_1 - 2 I_3 + I_5}, \qquad \alpha_i = (i - 1)\frac{\pi}{2}    (5.11)
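In code, the two estimators of (5.11) are one-liners; the frames are assumed to be stepped by 90°, and atan2 replaces the arctangent to keep the quadrant (a sketch, not code from the text):

import numpy as np

def phase_4frame(I1, I2, I3, I4):
    """4-frame algorithm, steps of 90 degrees, Eq. (5.11)."""
    return np.arctan2(I4 - I2, I1 - I3)

def phase_5frame(I1, I2, I3, I4, I5):
    """Schwider-Hariharan 5-frame algorithm, steps of 90 degrees, Eq. (5.11)."""
    return np.arctan2(2.0 * (I4 - I2), I1 - 2.0 * I3 + I5)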
5.2.1.2 Accuracy of the Measurement

The choice of the number of frames depends on the desired speed of the algorithm, the sensitivity to phase-step errors and to the harmonic content of the function f[. . .], and the accuracy of the phase estimation. Errors in the phase step, φ(r, t), and a nonsinusoidal waveform are the most common sources of systematic errors in the PSP [46, 49]. A nonsinusoidal signal may be caused by the non-linear response of the detector [46]. The phase shift between two consecutive images can be created using different means depending on the experimental realization of the profilometric system. Different phase-shifting devices are often subject to nonlinearity and may not ensure good repeatability. Miscalibration of phase shifters may be
the most significant source of error [50]. In fringe-projection applications precise linear translation stages are used [51]. In interferometric systems the phase shifter is usually a mirror mounted on a piezoelectric transducer (PZT). In such systems instability of the driving voltage, nonlinearity, linear temperature drift and hysteresis of the PZT device, and tilt of the mirror affect the accuracy of the measurement. In the scheme presented in Fig. 5.3, a special feedback is introduced to keep the value of the phase step constant. For large-scale objects it is more convenient to create a phase shift by slightly changing the wavelength of the light source. A phase-shifting system with a laser diode (LD) source has been proposed in [52] and [53], in which the phase shift is created by a change of the injection current of the LD in an unbalanced interferometer. The phase shift can also be introduced by digitally controlling the SLM that is used for generation of the fringes. As an example, an electrically addressed SLM (EA-SLM) is used to display a grating pattern in [42]. In [54] a DMD microscopic system is designed in which the three colour channels of the DMD projector are programmed to yield intensity profiles with a 2π/3 phase shift. Using the colour-channel switching characteristic and removing the colour filter, the authors succeed in projecting grey-scale fringes and, by proper synchronization between the CCD camera and the DMD projection, in performing one 3D measurement within 10 ms. A comprehensive overview of the overall error budget of the phase-shifting measurement is made in [55, 56]. The analysis in [57] divides the error sources in the PSP into three groups. The first group comprises systematic errors with a sinusoidal dependence on the measured phase, such as the phase-step errors and the detector non-linearities. The second group includes random error sources that may also cause a sinusoidal ϕ-dependence of the measured phase error. Such sources are the instability of the light source, random reference phase fluctuations, and mechanical vibrations. The third group of errors consists of random errors which are not correlated to the measured phase, i.e. different noises that introduce random variation across the FP. Such noises are the detector output noise and the quantization noise of the measured intensity. According to [57], the systematic phase-step error \delta\phi_i = \phi_i - \hat{\phi}_i, i = 1, 2, \ldots, M, given by the difference between the theoretical phase step \phi_i for the i-th frame and the mean
value of the phase step that is actually introduced by the phase shifter, \hat{\phi}_i, may be presented as a series \delta\phi_i = \varepsilon_1 \phi_i + \varepsilon_2 \phi_i^2 + \varepsilon_3 \phi_i^3 + \ldots, with coefficients \varepsilon_1, \varepsilon_2, \varepsilon_3, \ldots that depend on the phase shifter. The error analysis in [57] has indicated the linear and quadratic phase-step deviations as one of the main error sources degrading the phase measurement accuracy. If only the linear term is kept in \delta\phi_i, the error induced in ϕ at all points of the FP for most of the PS algorithms is given by [57]:

\delta\varphi = \sum_{i=1}^{M} \frac{\partial\varphi}{\partial I_i} \frac{\partial I_i}{\partial \phi_i} \delta\phi_i    (5.12)
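A quick numerical experiment (assumed values: a 10% linear step miscalibration and 4-frame demodulation) makes the practical meaning of (5.12) visible; the residual error oscillates at twice the measured phase, as discussed next:

import numpy as np

eps = 0.10                                      # 10% linear miscalibration of the step
phi = np.linspace(-np.pi, np.pi, 1000)          # true phase values
nominal = np.array([0.0, 0.5, 1.0, 1.5]) * np.pi
I = [100 + 80 * np.cos(phi + (1 + eps) * p) for p in nominal]  # actual steps too large

phi_hat = np.arctan2(I[3] - I[1], I[0] - I[2])  # demodulation assumes nominal steps
err = np.angle(np.exp(1j * (phi_hat - phi)))    # wrapped error, oscillates ~ sin(2*phi)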
The linear approximation (5.12) leads to a dependence of the systematic error, δϕ, on cos 2ϕ and sin 2ϕ [24, 58, 59, 60]. In fact, as shown in [57], the quadratic and cubic terms in (5.12) also lead to a cos 2ϕ dependence of the systematic error. The influence of miscalibration and non-linearity of the phase shifter for different phase-stepping algorithms is studied in [61]. The other frequently addressed systematic error is the non-linearity caused by the detector. To study its effect on the measured phase, [57] uses a polynomial description of the intensity error, \delta I_i = \hat{I}_i - I_i = \alpha_1 I_i^2 + \alpha_2 I_i^3 + \alpha_3 I_i^4 + \ldots,
where \alpha_1, \alpha_2, \alpha_3 are constants. The detector non-linearity introduces higher harmonics in the recorded FP. Calculations and simulations made by different authors show that a linear approximation in \delta I_i leads to a dependence of the systematic phase error, δϕ, on cos(M ϕ); e.g. for the four-step algorithm δϕ depends on cos(4ϕ). An important source of intensity error is the quantization error in the video cameras and frame grabbers used for data acquisition. By rounding or truncating values in the analog-to-digital conversion, the quantization changes the real intensity values in the FP and causes an error in the phase estimate which depends on the number of intensity levels. Quantization is a non-linear operation. The first quantization error analysis in the PSP was made in the thesis of Koliopoulos [62] and further developed by Brophy in [63]. Brophy [63] studies how the frame-to-frame correlations of the intensity error influence the phase error. In the absence of frame-to-frame correlation the phase variance ⟨δϕ²⟩ decreases as 1/M. Brophy assumes in the analysis that the intensity quantization error expressed in grey levels is uniformly distributed in the interval [−0.5, 0.5]. This source of error does not exclude frame-to-frame correlation. As a result, ⟨δϕ²⟩ may increase with the number of frames. For a Q-level quantization Brophy obtains the formula ⟨δϕ²⟩^{1/2} = 1/(√3 Q). Specific algorithms could be designed in this case to decrease the phase variance. By introducing a characteristic polynomial method, Zhao and Surrel [64, 65] succeed in avoiding the necessity to determine the inter-frame correlation of intensities in the calculation of the phase variance. For the purpose, the phase in (5.10) can be taken as the argument of a linear combination
S(\varphi) = \sum_{m=1}^{M} c_m I_m = \frac{1}{2} I_V P(\varsigma) \exp(j\varphi)    (5.13)

in which the characteristic polynomial is defined by

P(\varsigma) = \sum_{m=1}^{M} c_m \varsigma^m    (5.14)
where \varsigma = \exp(j\phi) and c_m = a_m + j b_m. Surrel [66] shows that the error-compensating behaviour of any phase-shifting algorithm can be determined by analyzing the location and multiplicity of the roots of P(\varsigma). This approach permits finding
the sensitivity of the phase-shifting algorithms also to the harmonic content of the FP [66] and obtaining a simplified expression for the phase quantization error. It has been found that for quantization with 8 or more bits this error is negligible for noiseless FPs, if the intensity is spread over the whole dynamic range of the detection system. The analysis and simulations made in [67] show that in the most common CCD cameras a nominal 6-bit range is used from the available 8-bit range, which leads to a phase error of the order of 0.178 radians. The accuracy is increased by a factor of four if a 12-bit camera is used [67]. Algorithms with specific phase steps to minimize the errors from miscalibration and nonsinusoidal waveforms have been derived using the characteristic polynomial. It has been found that a (j + 3)-frame algorithm eliminates the effects of linear phase-shift miscalibration and harmonic components of the signal up to the j-th order. Vibration as a source of error is essential in interferometric set-ups. For example, testing of flat surfaces needs a very high accuracy of 0.01 µm. Vibration induces blurring and random phase errors during acquisition of the successive frames in the temporal PS. For this reason, interferometric implementation of the temporal PSP is appropriate whenever the atmospheric turbulence and the mechanical conditions of the interferometer remain constant during the time required for obtaining the interferograms [31]. The analysis made in [68] shows that low-frequency vibrations may cause considerable phase error, whereas high-frequency vibration leads to a reduced modulation depth [68]. In [68] a (2 + 1) algorithm is proposed in which two interferograms, separated by a quarter-wave step, are required to calculate the phase. A third normalizing interferogram, averaged over two phases that differ by 180°, makes it possible to evaluate the background intensity. A thorough analysis of the vibration degrading effect is made in [69, 70]. Applying a Fourier analysis, an analytical description of the influence of small-amplitude vibrations on the recorded intensity is obtained, and the relationship between the Fourier spectrum of the phase error and the vibration noise spectrum is found by introduction of the phase-error transfer function, which gives the sensitivity of the PS measurement to different noise frequency components. It is shown that immunity to vibration noise increases for the algorithms with a higher number of recorded patterns. A max-min scanning method for phase determination is described in [71], and it is shown in [50] that it has a good tolerance to small-amplitude low-frequency and high-frequency noise. Lower accuracy of phase demodulation and phase unwrapping should be expected in the image zones with low fringe modulation or contrast, e.g. in areas with low reflectivity. Fringe contrast is an important characteristic for finding an optimal unwrapping path and for optimal processing of phase data such as filtering, improving visualization, and masking [72]. However, using high fringe contrast as a quality criterion of good data is not always reliable, because this feature of the FPs is insensitive to such surface structure changes as steps. In areas with steps which do not cast a shadow the fringe
contrast is high but the phase data are unreliable. Evaluation of the fringe contrast from several successively recorded images is inapplicable for real-time measurement. In [73] the fringe contrast and the quality of the phase data are evaluated from a single FP by a least-squares approach. It is rather complicated to perform in situ monitoring of the phase step, e.g. by incorporating additional interferometric arms. The more practical approach is the so-called self-calibration of the phase steps [74], which makes use of the redundancy of the FPs. Some of the developed self-calibrating algorithms are pointwise, whereas others take into account the information contained in the whole FP. However, most of the developed self-calibration methods put restrictions on the number and quality of the FPs and on the performance of the phase shifters. Over the years, different approaches for deriving error-compensating algorithms have been proposed [49, 75]. Hibino [49] divides the PS algorithms into three categories according to their ability to compensate systematic phase-step errors. The first group comprises algorithms without immunity to systematic phase-step errors, e.g. the synchronous detection algorithm. The second group consists of the error-compensating algorithms able to eliminate linear or nonlinear phase-step errors. The third group of algorithms compensates for systematic phase-step errors in the presence of harmonic components of the signal. To justify the compensating properties of the proposed algorithms, different approaches have been invented, such as averaging of successive samples [76], a Fourier description of the sampling functions [77], and an analytical expansion of the phase error [57]. Currently, the five-frame algorithm proposed by Schwider and Hariharan [78] is very popular. Hariharan et al. show that the error of the five-frame algorithm has a quadratic dependence on the phase-step error. A new four-interferogram method for compensating linear deviations of the phase step is developed in [79]. To increase the accuracy, algorithms based on more frames have started to appear [76]. Algorithms derived in [80], based on seven or more camera frames, prove to have low vulnerability to some phase-step errors and to low-frequency mechanical vibration. In [81] three new algorithms with π/2 phase steps are built, based on the Surrel six-frame algorithm with a π/2 step [82], and four modifications of the conventional four-frame algorithm with a phase step of π/2 are studied using a polynomial model for the phase-step error. The ability to compensate errors is analyzed by the Fourier spectra analysing method. The main conclusion of the analysis is that it is possible to improve the performance of π/2 algorithms by an appropriate averaging technique. A self-calibrating algorithm proposed in [83] relies on the assumption of constant arbitrary phase steps between the consecutive FPs and a quasi-uniform distribution of the measured phase, taken modulo 2π, in the range (0, 2π) over the recorded FP. When the assumed phase steps differ from the actual ones, the probability density distribution of the retrieved phase is no longer uniform and exhibits two maxima. Applying an iterative fitting procedure to a histogram built for the retrieved phase permits finding the actual phase steps and correcting
the demodulated phase. The algorithm is further improved in [84], where the visibility of fringes across the FP is assumed to be constant whereas the background is allowed to have only intra-frame variations. The improved algorithm introduces a feedback to adjust the supposed phase shifts until the calculated visibility map becomes uniform. The merit of the algorithm is its operation at arbitrary phase steps, however at the expense of the constant-visibility requirement. A general approach to diminish or eliminate some error sources in PS interferometry is proposed in [85]. A model for Nph(r, t) is built which takes into account the phase-step error and considers an interferometer with a spherical Fizeau cavity. A generic algorithm for elimination of the mechanical vibration during the measurement is also described. Reference [77] adopts a Fourier-based analysis to determine suitable sampling functions for the design of a five-frame PS algorithm that is insensitive to background variation when a laser diode is used as a phase shifter. Criteria are defined to check the algorithm's vulnerability to the background change. In addition, the authors evaluate the influence of the linear phase-shift miscalibration and the quadratic non-linearity of the detector error. An accurate method for estimation of the phase step between consecutive FPs is proposed in [51] for the case of a five-frame algorithm with an unknown but constant phase step, which permits calculating the phase step as a function of coordinates and using the so-called lattice-site representation of the phase angles. In this representation the distance of the corresponding lattice site to the origin depends on the phase step. In the ideal case all lattice sites that correspond to a given phase step but to different phases lie on a straight line passing through the origin of the coordinate system whose axes represent the numerator and the denominator in the equation for phase-step calculation [78]. The error sources deform the shape and spread of both the histogram and the lattice-site representation patterns. Application of the latter to analysis of the behaviour of four- and five-frame algorithms is made in [86]. It is proven that the lattice-site representation outperforms the histogram approach for detection of errors in the experimental data. A phase shifter in an interferometric set-up is vulnerable to both translational and tilt-shift errors during shifting, which results in a different phase-step value in every pixel of the same interferogram. An iterative algorithm that compensates both translational and tilt-shift errors is developed in [87], based on the fact that the 2D phase distribution introduced by the phase shifter is a plane. This plane can be determined by a first-order Taylor series expansion, which makes it possible to transform the nonlinear equations defining the phase-shift plane into linear ones. By using an iterative procedure both errors can be minimized. A liquid-crystal SLM may produce a nonlinear and spatially nonuniform phase shift [75].

5.2.1.3 Generalized Phase-shifting Technique

In the conventional phase-shifting algorithms the phase steps are known and uniformly spaced. In this case simple trigonometry permits derivation of
explicit formulas for the object phase calculation. It is also assumed that the background and visibility of the fringes have only pixel-to-pixel variation but remain constant from frame to frame. In the generalized PSP, which in recent years has gained increasing popularity because of the advantage of using arbitrary phase steps, these steps are unknown and should be determined from the recorded FPs. This is a frequently solved task in the PSP, e.g. for calibration of the phase shifter. Determination of the phase step between two consecutive interferograms is similar to signal-frequency estimation, which has attracted a lot of attention in the signal-processing literature. However, it is more complicated due to the fact that the background intensity (the dc component) is involved in the processed signal [88]. Determination of the phase step is equivalent to the task of phase-step calibration which, generally speaking, can be performed by using two approaches: fringe tracking or calculation of the phase step from the recorded FPs [89]. In fringe tracking the size of the phase step is obtained from the displacement of fringes, following some characteristic features of the fringes, e.g. the positions of their extrema after performing fringe skeletonizing to find the centers of dark or bright interference lines [79]. An extensive overview of algorithms for determination of unknown phase steps from the recorded FPs is made in [90]. Phase-step determination in a perturbing environment is analyzed in [91]. Several methods, such as the Fourier series method and iterative linear and non-linear least-squares methods, are compared on the basis of computer simulations, which prove the reliability of all of them for the derivation of the phase step. Historically, the development of self-calibrating algorithms starts with the first phase-stepping algorithm proposed by Carré in 1966 [92]. The algorithm is designed to operate at an arbitrary phase step, φ, which is determined during the processing under the assumption of linear phase-step errors. It requires four phase-shifted images Ii = I0 + IV cos[ϕ + (i − 1.5)φ], i = 0, . . . , 3, under the assumption of the same background intensity, modulation, and phase step for all recorded images. The accuracy of the Carré algorithm depends on the phase step. Carré recommends the value of 110 degrees as most suitable. The accuracy of the algorithm has been studied both theoretically [57] and by computer simulations [55] for the phase step π/2. Computer simulations and experiments performed in [79] for the case of white additive noise, and the Fourier analysis made in [93], confirm the conclusion of Carré that the highest accuracy is observed at 110 degrees. In [94] a search for the best step that minimizes the error of the Carré algorithm is made by means of a linear approximation of the Taylor series expansion of the phase error. The linear approximation yields correct results only in the case of small error expansion coefficients. The obtained results also indicate φ = 110° as the best choice, but only when the random intensity fluctuations (additive noise) are to be minimized. This value is not recommendable for compensation of a phase-step error or a systematic intensity error. The authors draw attention to the fact that the numerator in the Carré algorithm should be positive, which is fulfilled only for perfect images without noise. A number
of other algorithms with a fixed number of equal unknown phase steps have recently been proposed in [95, 96, 97]. Use of a fixed number of equal steps is certainly a weak point in measurement practice. This explains the urge of the phase-shifting community to elaborate more sophisticated algorithms with randomly distributed arbitrary unknown phase steps. Direct real-time evaluation of a random phase step in generalized PS profilometry without calibration of the phase shifter is realized in [98], where the phase step is calculated using a Fourier transform of straight Fizeau fringes that are simultaneously generated in the same interferometric set-up. The necessity of having an additional optical set-up puts a limitation on the method's application. Evaluation of the phase steps by the Lissajous figure technique is described in [99, 100]. The phase is determined by ellipse fitting, based on the Bookstein algorithm, of a Lissajous figure obtained when two phase-shifted fringe profiles are plotted against each other. The algorithm, however, is sensitive to noise and easily affected by low modulation of the FP. An improvement of the algorithm is proposed in [100], where the Lissajous figures and elliptic serial least-squares fitting are used to calculate the object phase distribution. The algorithm has both immunity to errors in φ and the possibility for its automatic calibration. Reduction of the phase error caused by linear and quadratic deviations of the phase step by means of a self-calibrating algorithm is proposed in [59]. The estimates of the phase steps are derived from each FP, and the exact phase difference between the consecutive patterns is calculated. Numerical simulation proves the efficiency of the algorithm for up to 10% linear and 1% quadratic phase deviations, as do experiments with a Twyman–Green interferometer for gauge calibration. A phase-calibration algorithm for phase steps less than π that uses only two normalized FPs is proposed in [101]. For the purpose, a region that contains a full fringe (a region with object phase variation of at least 2π) is chosen. The phase step is retrieved by simple trigonometry. A method for evaluation of irregular and unknown phase steps is described in [102], based on introduction of a carrier frequency in the FPs. The phase steps are determined from the phases of the first-order maximum in the spectra of the recorded phase-shifted FPs in the Fourier domain. The Fourier analysis can be applied to a subregion of the FP with high quality of the fringes. This straightforward and simple method works well only in the case of FPs with narrow spectra. Algorithms that exploit a spatial carrier use a relatively small number of interferograms. An improvement of the Fourier transform method based on whole-field data analysis is proposed in [103]. The phase step is obtained by minimization of the total energy of the first-order spectrum of the difference of two consecutive FPs, with one of them multiplied by a factor exp(jφ), where φ is equal to the estimated value of the phase step to be determined. Simulations and experiments prove that the algorithm is effective, robust against noise, and easy to implement. Based on a quadrature filter approach, Marroquin et al. propose in [104] an iterative fitting technique that simultaneously yields the phase steps and the object phase, which
is assumed to be smooth. In [89] a method is proposed that requires only two phase-stepped images. The phase step is estimated as an arccosine from the correlation coefficient of both images, without a requirement for constant visibility and background intensity. The method can show position-dependent phase-step differences, but it is strictly applicable only to areas with a linear phase change. To overcome the errors induced in the phase step by different sources, it is desirable to develop a pointwise algorithm that can compute the phase step and the object phase at each pixel [105]. The first attempt to deduce an algorithm with unknown phase steps using the least-squares approach belongs to Okada et al. [106]. It was soon followed by several proposals of self-calibrating least-squares PS algorithms [45, 107, 108, 109]. The essence of the least-squares approach is to consider both the phase steps and the object phase as unknowns and to evaluate them by an iterative procedure. This approach is especially reliable for FPs without spatial carrier fringes, presenting stable performance in the case of nonlinear and random errors in the phase step. The number of equations which can be constructed from M FPs, each consisting of Nx × Ny pixels, is M × Nx × Ny, whereas the number of unknowns is 3Nx × Ny + M − 1. This entails the requirement M × Nx × Ny ≥ 3Nx × Ny + M − 1 to ensure the object phase retrieval. To have stable convergence, the least-squares PS algorithms with unknown phase steps need comparatively uniformly spaced initial phase steps that are close to the actual ones. These algorithms usually are effective only at small phase-step errors and require long computational time. They are not able to handle completely random phase steps. As a rule, these methods are either subject to a significant computational burden or require at least five FPs for reliable estimation. The least-squares approach is accelerated in [109, 110], where the computationally extensive pixel-by-pixel calculation of the phase-step estimate is replaced with a 2 × 2 matrix equation for cos φ and sin φ. The phase step is determined iteratively as \hat{\phi} = \tan^{-1}(\sin\phi / \cos\phi) until the difference between two consecutive phase-step estimates falls below a predetermined small value. The limitations of the least-squares approach are overcome by an advanced iterative algorithm proposed in [111] and [112], which consists of the following consecutive steps:

i) Using a least-squares approach, the object phase is estimated in each pixel under the assumption of known phase steps and intra-frame (pixel-to-pixel) variations of the background intensity and visibility.

ii) Using the extracted phase distribution, the phase steps \phi_n = \tan^{-1}(-d_n / c_n) are updated by minimization of the least-squares error
S_n = \sum_{i=1}^{N_y} \sum_{j=1}^{N_x} (\hat{I}_{ij}^n - I_{ij}^n)^2 = \sum_{i=1}^{N_y} \sum_{j=1}^{N_x} (B^n + c_n \cos\varphi_{ij} + d_n \sin\varphi_{ij} - \hat{I}_{ij}^n)^2    (5.15)
under the assumption of inter-frame (frame-to-frame) variations of the background intensity and visibility, B_{ij}^n = B^n and V_{ij}^n = V^n, with c_n = V^n \cos\phi_n and d_n = -V^n \sin\phi_n.
iii) If the pre-defined convergence criteria are not fulfilled, steps i) and ii) are repeated.

An improved iterative least-squares algorithm is constructed in [108], which minimizes the dependence of the differences between the recorded intensities and their recalculated values on the phase-step errors. An iterative approach is considered in [113], where the phase steps are estimated by modelling an inter-frame intensity correlation matrix using the measured FPs. This makes the method faster, more accurate and less dependent on the quality of the FPs. The smallest eigenvalue of this matrix yields the random error of the intensity measurement. As few as four FPs are required for phase-step estimation. The developed iterative procedure is rather simplified in comparison with the methods that rely on pixel-to-pixel calculation. An accuracy of 2 × 10⁻³ rad has been achieved. A pointwise iterative approach for the phase-step determination, based on the linear predictive property and least-squares minimization of a special unbiased error function, is proposed in [88]. The algorithm works well only for a purely sinusoidal profile of the FP. Phase retrieval and simultaneous reconstruction of the object wavefront in PS holographic interferometry with arbitrary unknown phase steps is proposed in [107]. Assuming a uniform spatial distribution of the phase step over the recorded interferogram, the authors obtain the following relationship between consecutive interferograms:
Phase retrieval and simultaneous reconstruction of the object wave front in PS holographic interferometry with arbitrary unknown phase steps is proposed in [107]. Assuming a uniform spatial distribution of the phase step over the recorded interferogram, the authors obtain the following relationship between consecutive interferograms:

p_n = \frac{\overline{|I_{n+1} - I_n|}}{4\sqrt{I_0 I_r}} = \frac{2}{\pi} \left| \sin\frac{\varphi_{n+1} - \varphi_n}{2} \right|    (5.16)
where I0 and Ir are the intensities of the object and the reference waves. The parameter pn can be determined for all recorded interferograms, which further permits restoration of the complex amplitude of the object wave. The process is repeated iteratively until the difference φn+1 − φn becomes less than a small predetermined value. Computer simulations prove that the algorithm works well for any number of patterns M > 3. An extension of the algorithm is proposed in [114] for the case when only the intensity of the reference beam must be measured. The need for iterations, however, makes it unsuitable for real-time measurement, as the authors recommend at least 20 iterations within 1 min to reach the desired high accuracy. To avoid iterations and the alternating estimation of the object phase and the phase step, Qian et al. [115] propose to apply a windowed Fourier transform to a local area with carrier-like fringes in two consecutive FPs. The objective of [43] is to develop a generalized PS interferometry with multiple PZTs in the optical configuration that operates under illumination with a spherical beam in the presence of higher harmonics and white Gaussian intensity noise. These goals are achieved by a super-resolution frequency estimation approach in which the Z-transform is applied to the phase-shifted FPs, and their images in the Z-domain are multiplied by a polynomial called an annihilating filter. The zeros of this filter in the Z-domain should coincide with the frequencies in the fringes.
Hence, the parametric estimation of the annihilating filter provides the desired information about the phase steps. Pixelwise estimation of arbitrary phase steps from an interference signal buried in noise, in the presence of nonsinusoidal waveforms, by rotational invariance is proposed in [105]. First, a positive semidefinite autocorrelation matrix, which depends only on the step between the samples, is built from the M phase-shifted records at each pixel (i, j). The signal is separated from the noise by a canonical decomposition into positive definite Toeplitz matrices formed from the autocorrelation estimates. The phase steps are determined as frequency estimates from the eigendecomposition of the signal autocorrelation matrices. The exact number of harmonics in the signal is required. The method is extended to retrieve two distinct phase distributions in the presence of higher harmonics and arbitrary phase steps introduced by multiple PZTs [116]. In [117] the problem of using two or more PZTs in PS interferometry with arbitrary phase steps in the presence of random noise is solved by a maximum-likelihood approach. The developed algorithm should allow for compensation of a non-sinusoidal wavefront and for non-collimated illumination.

5.2.1.4 Phase Unwrapping

As has already been mentioned, the presence of the inverse trigonometric function arctan in the PS algorithms introduces ambiguity in the measured phase distribution. The calculated phase is wrapped into the interval (−π, +π). The process of removing 2π crossovers (unwrapping) can simply be described as subtracting or adding multiples of 2π to the wrapped phase data [118], which is equivalent to assigning the fringe order at each point:

\varphi_{unw}(i, j) = \varphi_{wr}(i, j) + 2\pi k(i, j)    (5.17)
where ϕunw(i, j) is the unwrapped phase at the pixel (i, j), ϕwr(i, j) is the experimentally obtained wrapped phase at the same point, and k(i, j) is an integer that counts the 2π crossovers from a starting point with a known phase value to the point (i, j) along a continuous path. The phase unwrapping problem is therefore one of estimating the correct value of k(i, j) in order to reconstruct the initial true signal [119]. The described unwrapping procedure performs well only for a noise-free, correctly sampled FP without abrupt phase changes due to object discontinuities [120]. The basic error sources that deteriorate the unwrapping process are i) speckle noise, ii) digitization and electronic noise in the sampled intensity values, iii) areas of low or null fringe visibility, and iv) violation of the sampling theorem [44, 120, 121]. In addition, phase unwrapping algorithms should distinguish between authentic phase discontinuities and those caused by object peculiarities, coalescence [122], shadowing, or non-informative zones due to the limited detector visibility range.
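For a noise-free, correctly sampled profile, the estimation of k(i, j) in (5.17) reduces to accumulating a counter along the path. The following Python sketch illustrates this for a single row; the threshold of π is the standard sampling-theorem choice, not a tunable parameter.

import numpy as np

def unwrap_1d(wrapped):
    """Unwrap a 1D wrapped-phase profile by accumulating 2*pi corrections,
    i.e. estimating k(i) in (5.17) along a continuous path."""
    unwrapped = np.array(wrapped, dtype=float)
    k = 0
    for i in range(1, len(unwrapped)):
        d = wrapped[i] - wrapped[i - 1]
        if d > np.pi:      # a -2*pi crossover occurred
            k -= 1
        elif d < -np.pi:   # a +2*pi crossover occurred
            k += 1
        unwrapped[i] = wrapped[i] + 2 * np.pi * k
    return unwrapped

It is precisely this simple procedure that breaks down under the error sources listed above, which motivates the classification of more robust algorithms that follows.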
Over the years much research has been aimed at developing different unwrapping techniques [123], which have to find the middle ground between alleviation of the computational burden and reduction of the influence of the phase ambiguities [124]. One of the major problems in phase unwrapping is how to treat the unreliable data that may disturb restoration of the actual data. A principal categorization of the algorithms that attack the major error sources is proposed in [125, 126], where three basic classes are outlined:
i) Global class. In global-class algorithms the solution is formulated in terms of minimization of a global function. The most popular phase unwrapping approaches [125, 127, 128, 129, 130, 131, 132] are based on the solution of an unweighted or weighted least-squares problem [123, 133]. All the algorithms in this class are known to be robust but computationally intensive. The presence of noise and other fringe discontinuities, however, leads to corrupted results because of the generalized least-squares approach applied. To overcome this disadvantage, time-consuming post-processing has to be utilized [126].
ii) Region class. An essential feature of these algorithms is the subdivision of the wrapped data into regions. Each region is processed individually, and on this basis larger regions are formed until all wrapped phase values are processed. This restricts local errors to the processed zone of the FP, preventing their propagation into the other regions. There are two groups of region algorithms: 1) tile-based and 2) region-based. In the tile-based approach [134, 135] the phase map is divided into a grid of small tiles, unwrapped independently by line-by-line scanning techniques, after which the regions are joined together. However, this algorithm is not successful in processing very noisy data. The region-based approach, proposed initially by Geldorf [136] and upgraded by other researchers [119, 128, 137, 138, 139, 140], relies on forming uniform regions of continuous phase. Each pixel is compared to its neighbour: if the phase difference is within a predefined value, the pixel and its neighbour are attached to the same region; otherwise, they belong to different regions. After that the regions are shifted with respect to each other to eliminate the phase discontinuities.
iii) Path-following class, in which data unwrapping is performed along an integration path. The path-following algorithms can be subclassified into three groups: 1) path-dependent methods, 2) residue-compensation methods and 3) quality-guided path methods. The first group is characterized by phase integration along a previously defined path (i.e. linear scanning, spiral scanning, multiple scan directions [141]); the simplest example of this type is proposed by Schafer and Oppenheim [142]. Despite their benefit of being fast, these methods are not reliable in the presence of noise and other error sources, because of the fixed integration path. The residue-compensation methods rely on finding the nearest residues (defined as unreliable phase data) and connecting them in pairs of opposite polarity by a branch-cut [143].
Uncompensated residues can also be connected to the image border pixels. The unwrapping procedure is realized without crossing any placed branch-cut, which limits the possible integration paths. Other similar approaches [144, 145] are also based on the branch-cut unwrapping strategy. These methods produce fast results, but an inappropriately placed branch-cut can lead to the isolation of some phase zones and a discontinuous phase reconstruction. Quality-guided path-following algorithms unwrap the most reliable data first, while the least reliable data are deferred in order to avoid error spreading. The choice of integration path depends on pixel quality in the sense of a quality map, first proposed by Bone [146], who uses a second difference as a criterion for data reliability: a threshold is set, and all phase data whose calculated second derivatives lie under it are unwrapped in any order. The method is improved [147, 148] by introducing an adaptive threshold with increasing value, whose implementation allows all data to be processed. However, when a reliable quality map is not available, the method fails in the phase restoration. The accuracy of the produced quality map assures successful performance of the method [149] with different types of phase quality estimators, such as correlation coefficients [123, 150], phase-derivative variance [151, 152] or fringe modulation [77, 153]. For illustration of some of the discussed phase unwrapping methods we processed the wrapped phase map (Fig. 5.8) of two real objects – a plane and a complicated relief surface – experimentally produced by two-spacing projection PS interferometry [5]. The results are shown in Fig. 5.9. The Goldstein algorithm (Fig. 5.9a) identifies the low-quality phase values but does not create correct branch-cuts. The main advantage of this algorithm is minimization of the branch-cut length, which allows fast data processing. However, this approach is not efficient in the case of phase maps with sharp discontinuities.
Fig. 5.8. Wrapped phase map of a test (left) and a real (right) object
Fig. 5.9. Unwrapped phase map for a) Goldstein method, b) mask-cut method, c) minimum Lp-norm method, d) weighted multigrid method, e) conjugate gradient method, f) least-squares method, g) quality-guided path following method and h) Flynn method
The same bad result is observed when implementing the mask-cut algorithm (Fig. 5.9b), which upgrades the Goldstein method by introducing a quality map to guide the branch-cut placement. In comparison with the Goldstein method, the incorrect interpretation of the phase data can be attributed to the low accuracy of the quality map. Phase unwrapping with all four minimum-norm methods fails (Fig. 5.9) in the case of a complex phase map with low-quality noisy regions and discontinuities. A possible reason is the absence of a good quality map. Increasing the number of iterations improves the quality of the demodulated phase, but at the expense of a longer computation time. The quality-guided path-following method (Fig. 5.9g) successfully demodulates the processed phase map. The regions with bad quality values (due to noise and shadowing) are recognized thanks to the quality map that guides the integration path. The algorithm is fast and reproduces the small details successfully, which makes it suitable for processing complex phase maps. The Flynn method (Fig. 5.9h) also provides phase reconstruction by effectively identifying phase discontinuities, as a result of its main benefit – performing well without an accurate quality map. However, in comparison with the quality-guided path-following method it renders details and flat surfaces more poorly and is more time consuming. The involvement of the arctan function in the phase retrieval is an obstacle to achieving the two main goals of the PMP: high measurement accuracy and unambiguous full-field measurement. Among the solutions of this problem is the so-called temporal phase unwrapping method [154, 155], which performs pixel-by-pixel unwrapping along the time coordinate by projecting a proper number of FPs at different frequencies. In this way, propagation of the unwrapping error to the neighbouring pixels is avoided.
The first projected pattern in the temporal sequence consists of a single fringe, and the phase changes from −π to +π across the field of view [156]. As the number of fringes increases at subsequent time values as n = 2, 3, . . . , N, the phase range increases as (−nπ, nπ). For each n, M phase-shifted FPs are recorded. The measured intensity therefore depends on the pixel coordinates, the current number of fringes, and the number of phase-shifted patterns. The analysis in [44, 157] shows that the error in the depth determination scales as N⁻¹ to N⁻³/². Obviously, temporal unwrapping is suitable for applications where the goal is to derive the phase difference. Modifications of the original scheme have been tested with the aim of reducing the number of FPs used. As an example, in [158] two sinusoidal gratings with different spacings are used for fringe projection. The grating with the higher spatial frequency ensures the sensitivity of the measurement, while the coarse grating creates a reference pattern for the phase unwrapping procedure. Projection of tilted grids for determination of the absolute coordinates is proposed in [159]. In [20] an SLM is used to project fringes for surface contouring with a time-varying spatial frequency, e.g. linearly increasing, and the 3D coordinates are restored pixel by pixel by temporal unwrapping. In [157] an exponential increase of the spatial frequency of the fringes is used, which enhances the unwrapping reliability and reduces the time for data acquisition and phase demodulation. In [160] temporal unwrapping is combined with digital holography. The method requires a time-coded projection, which is a serious limitation.
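A hedged sketch of the pixelwise temporal recursion may clarify the idea; the linear fringe-number sequence n = 1, 2, . . . , N is the one described above, while the array layout and the wrapping convention are assumptions of the example.

import numpy as np

def wrap(p):
    return (p + np.pi) % (2 * np.pi) - np.pi

def temporal_unwrap(phase_maps):
    """Pixelwise temporal unwrapping of wrapped phase maps recorded with
    fringe numbers n = 1, 2, ..., N (phase_maps[0] is the single-fringe map,
    which by construction contains no wraps)."""
    total = phase_maps[0].astype(float)
    for n in range(1, len(phase_maps)):
        predicted = total * (n + 1) / n   # expected phase at fringe number n+1
        total = predicted + wrap(phase_maps[n] - wrap(predicted))
    return total                          # unwrapped phase of the densest map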
This limitation is overcome in [161], where the authors propose projection of a single FP obtained by merging two sinusoidal gratings with two different spacings 1/f1 > 1/f2. The following FP is recorded:

I(x, y) = I_B(x, y) + I_1(x, y) + I_2(x, y) = I_B(x, y) + I_{V1}(x, y) \cos[2\pi f_1 x + \varphi(x, y)] + I_{V2}(x, y) \cos[2\pi f_2 x + \varphi(x, y)]    (5.18)

Two phase maps ϕ1,2(x, y) are derived from the components I1,2(x, y), which are isolated from the registered FP and multiplied by the signals cos(2πf1,2 x) and sin(2πf1,2 x), respectively. Due to the relation f1 ϕ2(x, y) = f2 ϕ1(x, y), higher sensitivity is achieved, at least within the non-ambiguity interval of ϕ1(x, y). In [162] two Ronchi gratings of slightly different spacings are used for fringe generation. The small difference in spacings is grounds to conclude that at a given point (x, y) both ϕ1,2(x, y) and their difference are monotone functions of the object depth or height, h. This allows for coarse and fine estimation of h. A multifrequency spatial-carrier fringe projection system is proposed in [22]. The system is based on two-wavelength lateral shearing interferometry and varies the spatial-carrier frequency of the fringes either by changing the wavelength of the laser light or by slight defocusing. In [163] a white-light Michelson interferometer produces varying-pitch gratings of different wavelengths, which are captured and separated by a colour video camera using the red, green and blue channels.
Parallel and absolute measurement of the surface profile with wavelength-scanning interferometry is given in [32]. Using Michelson and Fizeau interferometers, the authors report measuring objects with steps and narrow dips. Multiwavelength contouring for objects with steps and discontinuities is further improved in [164] by an optimization procedure for determining the minimum number of wavelengths that are necessary for phase demodulation. A pair of coarse and fine phase diffraction gratings is used for simultaneous illumination of an object at two angles in a PS interferometric system for flatness testing. The synthetic wavelength is 12.5 mm, and a height resolution of 0.01 mm is achieved. A PS approach without phase unwrapping is described in [165]. It includes calculation of the partial derivatives to build a 2D map of the phase gradient, and numerical integration to find the phase map. The method proves to be less sensitive to phase-step errors and does not depend on the spatial nonuniformity of the illuminating beam or on the shape of the FP boundary. Projection of a periodic sawtooth-like light structure and the PS approach are combined in [166]. Projection of such a pattern is simpler in comparison with the sinusoidal profile. The phase demodulation procedures are described for right-angle triangle teeth and isosceles triangle teeth. The method requires uniform reflectivity of the surface. The recommended phase step is half the period of the projected pattern.

5.2.2 Absolute Coordinates Determination

Projection of two FPs with different spatial frequencies can be used for measurement of 3D coordinates, as proposed in [167]. The method is based on the generation in the (x′, y′, 0) plane of fringes with spacings d1 and d2 that are parallel to the y′ axis (Fig. 5.10). The y and y′ axes are perpendicular to the plane of the drawing.
Fig. 5.10. Basic set-up for absolute coordinates determination
The phase of the projected fringes is determined as ϕ′_i = 2πx′/d_i, i = 1, 2. The phase is reconstructed in the xyz coordinate system, with the z axis oriented parallel to the optical axis of the CCD camera. The angle α is the inclination angle of the illumination axis z′ with respect to the observation axis z. The phase maps are determined by the five-step algorithm for each of the spacings. The smaller of the spacings is chosen to allow ten pixels per fringe period. The phase Δϕ_i(x, y) can be represented in the xyz coordinate system as
\Delta\varphi_i(x, y) = \varphi_i(x, y) - \varphi_0 = \frac{2\pi}{d_i} \cdot \frac{l x \cos\alpha + l z(x, y) \sin\alpha}{l - z(x, y) \cos\alpha + x \sin\alpha} - \varphi_0,    (5.19)
where i = 1, 2; z(x, y) is the relief of the object at the point (x, y), l is the distance from the object to the exit pupil of the illumination objective, and ϕ0 is an unknown calibration constant. Subtracting the obtained phase distributions and assuming Δϕ2(x, y) − Δϕ1(x, y) = 2πn_{x,y}, we obtain the expression for the coordinate z in the form

z(x, y) = \frac{n_{x,y}(l + x \sin\alpha) + \chi l x \cos\alpha}{n_{x,y} \cos\alpha - \chi l \sin\alpha}, \qquad \chi = \frac{d_2 - d_1}{d_1 d_2}    (5.20)

The vertical interference fringes, generated with collimated laser light and a Michelson interferometer (one mirror is mounted on a phase-stepping device), are projected onto the plane (x′, y′, 0). Different spacings of the interference patterns are used for successive illumination of the object surface (d1 = 1 mm, d2 = 2 and 6 mm). The angle α of the object illumination is 30 deg. The wrapped phase maps at different spacings of the projected FPs are presented in Fig. 5.11. Figure 5.12 gives the 3D reconstruction of the object. The method's sensitivity mainly depends on the accuracy with which the phase difference is measured, i.e., on the accuracy of the n_{x,y} estimation. The influence of inaccuracies in determining l and α can be neglected. The measurement accuracy increases with the difference (d1 − d2) and with the illumination angle α, but it is not uniform over the length of the object and decreases as its transverse size increases. It is interesting to compare the obtained result with the two-wavelength holographic contouring of the same object presented in [168, 169].
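Once the two unwrapped phase maps are available, (5.20) is a pointwise formula; the following Python fragment evaluates it. The argument names and the use of a common length unit for d1, d2, l and x are assumptions of the sketch.

import numpy as np

def height_from_two_spacings(dphi1, dphi2, d1, d2, l, alpha, x):
    """Depth z(x, y) from the difference of two unwrapped phase maps,
    following (5.20); x is the array of lateral coordinates."""
    n_xy = (dphi2 - dphi1) / (2 * np.pi)
    chi = (d2 - d1) / (d1 * d2)
    sa, ca = np.sin(alpha), np.cos(alpha)
    return (n_xy * (l + x * sa) + chi * l * x * ca) / (n_xy * ca - chi * l * sa)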
Fig. 5.11. Phase maps obtained for different spacings of the projected interference patterns after median filtration and low-quality zones detection; left) d1 = 1 mm, d2 = 2 mm; right) d1 = 1 mm, d2 = 6 mm
Fig. 5.12. Reconstructed 3D image from the difference phase map
In reconstruction with a single wavelength of the two-wavelength recorded hologram, the object's image is modulated, as a result of the interference of the two reconstructed images, by sinusoidal contouring fringes in the normal direction separated at a distance Δz, which depends on both recording wavelengths and on the angle between the reference and the object beam (surface normal). A 10 mW CW temperature-stabilized diode laser, emitting two wavelengths separated by Δλ ∼ 0.08 nm in the red spectral region (∼635 nm), is used for recording a single-exposure reflection (Denisyuk-type) hologram onto a silver-halide light-sensitive material. The illumination angle is 30 deg. The image reconstructed in white light is shown in Fig. 5.13. The step between the contouring fringes is Δz = 1.83 mm.

5.2.3 Fourier Transform Method

5.2.3.1 Basic Principle and Limitations

The most common and simple way to demodulate the phase from a single FP is to use the Fourier transform for analysis of the fringes. Almost three decades of intensive research and application have made the Fourier-transform-based technique a well-established method in holography, interferometry and fringe projection profilometry. In two works [170, 171] published within a year, in 1982 and 1983, by Takeda and co-workers, it is shown that the 1D version of the Fourier transform method can be applied both to interferometry [170] and to PPP [171]. Soon after that, the method gained popularity under the name of Fourier fringe analysis (FFA) [172, 173, 174, 175, 176].
Fig. 5.13. Reconstructed image in white light illumination of reflection hologram
For 3D shape measurement the method becomes known as Fourier transform profilometry (FTP). The FTP surpasses in sensitivity, and avoids all the drawbacks of, the previously existing conventional moiré technique used for 3D shape measurement, such as the need to assign the fringe order, poor resolution, and the inability to discern concave from convex surfaces [177, 178]. Computer-aided FFA is capable of registering a shape variation that is much less than one contour fringe in moiré topography [171]. Some years later, the 1D Fourier transform method was extended to process 2D patterns – first by applying the 1D transform to carrier fringes parallel to one of the coordinate axes [179, 180] and then by generalization of the method to two dimensions [175]. Actually, as reported in [174], the algorithm proposed in [175] had been in use since 1976 for the processing of stellar interferograms. The ability of FFA to distinguish fully automatically between a depression and an elevation in the object shape laid the ground for automated processing in the FTP [171]. The main idea of the FFA is to add a linearly varying phase to the FP, i.e. to use in (5.1) φ(r) = 2πf0 · r, which can be done e.g. by tilting one of the mirrors in the interferometric setup or by using a diffraction grating for fringe projection. Obviously, the introduction of the carrier frequency f0 = (f0x, f0y) is equivalent to adding a plane in the phase space, as shown in Fig. 5.14. The expression for the recorded intensity becomes:
Fig. 5.14. Left: pattern with open fringes; middle: 3D presentation of the phase map without carrier removal; right: pattern with closed fringes
I(r) = I_B(r) + I_V(r) f[\varphi(r) + 2\pi f_0 \cdot r] = I_B(r) + I_V(r) \sum_{p=1}^{\infty} A_p \cos\left\{ p \left[ \varphi(r) + 2\pi f_0 \cdot r \right] \right\}    (5.21)
where the dependence on the time variable, t, is omitted, since we consider the case of phase retrieval from a single FP. The purpose of introducing the carrier frequency is to create a FP with open fringes, in which the phase change is monotonic (Fig. 5.14). The further processing of (5.21) is straightforward and includes the following steps:
i) Fourier transform of the carrier-frequency FP that is modulated by the object,

D(f) = D_B(f) + \sum_{p = -\infty,\; p \neq 0}^{\infty} D_p(f - p f_0)    (5.22)

with D_p(f) = F\{\frac{1}{2} I_V(r) A_p e^{jp\varphi(r)}\} and D_B(f_x, f_y) = F\{I_B(x, y)\}, where F{. . .} denotes the Fourier transform and f = (f_x, f_y) is the spatial frequency;
ii) selection of the fundamental spectrum that corresponds to one of the two first diffraction orders, D_1(f − f_0) or D_1(f + f_0), by proper asymmetric bandpass filtering;
iii) removal of the carrier frequency, D_1(f − f_0) → D_1(f);
iv) inverse Fourier transform back to the spatial domain, F^{-1}\{D_1(f)\};
v) extraction of the phase information from the resulting complex signal \Psi(r) = \frac{1}{2} I_V(r) A_1(r) e^{j\varphi(r)} in the spatial domain, whose argument is the sought phase:

\varphi(r) = \tan^{-1} \frac{\mathrm{Im}[\Psi(r)]}{\mathrm{Re}[\Psi(r)]}    (5.23)
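The five steps translate almost line by line into code. The Python sketch below uses a rectangular frequency window purely for brevity — the apodized windows discussed later in this section behave better — and the carrier is assumed to be known and expressed in cycles per pixel.

import numpy as np

def ffa_demodulate(I, f0, half_width):
    """Single-frame Fourier fringe analysis (steps i-v above):
    FFT, select one first-order lobe with a bandpass window, shift it
    to the origin, inverse FFT, and take the argument of the result.
    I: 2D fringe pattern; f0: (f0y, f0x) carrier in cycles per pixel."""
    Ny, Nx = I.shape
    D = np.fft.fftshift(np.fft.fft2(I))
    fy = np.fft.fftshift(np.fft.fftfreq(Ny))
    fx = np.fft.fftshift(np.fft.fftfreq(Nx))
    # i)-ii) asymmetric bandpass filter centred on the +f0 lobe only
    mask = (np.abs(fy[:, None] - f0[0]) < half_width) & \
           (np.abs(fx[None, :] - f0[1]) < half_width)
    D1 = D * mask
    # iii) remove the carrier by translating the lobe to the origin
    cy = int(round(f0[0] * Ny))
    cx = int(round(f0[1] * Nx))
    D1 = np.roll(np.roll(D1, -cy, axis=0), -cx, axis=1)
    # iv)-v) inverse transform and phase extraction (wrapped result)
    psi = np.fft.ifft2(np.fft.ifftshift(D1))
    return np.arctan2(psi.imag, psi.real)

The integer rounding of the carrier bins in this sketch is exactly the source of the frequency-shift bias error analysed later in Sect. 5.2.3.2.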
As is seen, the introduction of the carrier frequency separates in the Fourier domain both counterparts of the fundamental spectrum from each other and from the background intensity contribution concentrated around the zero frequency (Fig. 5.15). Due to the global character of the Fourier transform,
Fig. 5.15. Schematic of the Fourier fringe analysis
the phase estimate calculated at an arbitrary pixel depends on the whole recorded FP. This means that any part of the pattern influences all other parts and vice versa. The successive steps of the FFA are illustrated in Fig. 5.16. Similar to the PS technique, the FFA returns a phase value modulo 2π, which needs further unwrapping. As can be seen from (5.23), the phase is restored without the influence of the terms IB(x, y) and IV(x, y). This means that the Fourier algorithm is not vulnerable to the noise sources that create IB(x, y), e.g. stray light from the laboratory environment, unequal intensities in the two arms of the interferometer or the dark signal of the imaging system, nor to the noise contributions in IV(x, y), e.g. nonuniform intensity distribution of the illuminating beam, optical noise or nonuniform response of the CCD [181]. In most cases the higher-harmonics content is ignored, and the recorded pattern with open fringes looks like [173]:

I(x, y) = I_B(x, y) + \Psi(x, y) \exp[2\pi j(f_{0x} x + f_{0y} y)] + \Psi^*(x, y) \exp[-2\pi j(f_{0x} x + f_{0y} y)]    (5.24)

Fig. 5.16. Single frame phase retrieval with FFA
The 2D Fourier transform of (5.24) can be written in the form:

D(f_x, f_y) = D_B(f_x, f_y) + D_1(f_x - f_{0x}, f_y - f_{0y}) + D_1^*(f_x + f_{0x}, f_y + f_{0y})    (5.25)

where the asterisk represents the complex conjugate. If the background, the visibility and the phase vary slowly in comparison to (f0x, f0y), the amplitude spectrum is a trimodal function with a broadened zero peak DB(fx, fy) and with D1 and D1* placed symmetrically with respect to the origin. In this case, the three parts of the spectrum in (5.25) can be well isolated from each other. A 2D bandpass filter centered at (f0x, f0y) extracts the single spectrum D1(fx − f0x, fy − f0y), which is then shifted to the origin in the frequency domain (Fig. 5.15). The amplitude of the zero-order spectrum at each point in the frequency domain exceeds the amplitudes of the first orders at least twice, which restricts the size of the filter window to remain, roughly speaking, less than halfway between the zero- and first-order maxima. If after the filtering D1(fx − f0x, fy − f0y) remains where it is, a tilt is introduced in the restored height distribution [182]. The inverse Fourier transform of D1(fx, fy) yields, at least theoretically, the complex signal Ψ(x, y). The FTP uses optical geometries similar to those of projection moiré topography [171]. The most common and easiest to implement is the crossed-optical-axes geometry, like the one depicted in Fig. 5.1. For the measurement with a reference plane, the phase change is determined from Δϕ(x, y) = Im{log[Ψ(x, y)Ψr*(x, y)]} [183], where Ψr*(x, y) corresponds to the reference plane and is obtained after the inverse Fourier transform of the filtered positive or negative counterpart of the fundamental spectrum. The necessary condition to avoid overlapping of the spectra in (5.25), if we assume without loss of generality that the carrier fringes are parallel to the y axis, is given by [173]:

f_{0x} + \frac{1}{2\pi} \frac{\partial\varphi(x, y)}{\partial x} > 0    (5.26)

In addition, the conditions (f_{nx})_{min} > (f_{1x})_{max} for n > 1 and (f_{1x})_{min} ≥ (f_B)_{max} should be satisfied (Fig. 5.15), where 2πf_{nx} = n[2πf_{0x} + ∂ϕ(x, y)/∂x] and (f_B)_{max} is the maximal frequency of the background spectrum. The above inequalities entail

\left| \frac{\partial\varphi(x, y)}{\partial x} \right| \le \frac{2\pi f_{0x}}{3} \quad \text{or} \quad \left| \frac{\partial h(x, y)}{\partial x} \right| \le \frac{L_0}{3d}    (5.27)
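As a purely illustrative check of (5.27) — the geometry below is assumed, not taken from an experiment — let L0 = 1 m and d = 0.25 m:

\left| \frac{\partial h(x, y)}{\partial x} \right| \le \frac{L_0}{3d} = \frac{1\,\mathrm{m}}{3 \times 0.25\,\mathrm{m}} \approx 1.33

so surface slopes steeper than about 53° (arctan 1.33) with respect to the reference plane would violate the condition and alias the first order into the neighbouring spectra.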
When the height variation exceeds the limitation (5.27), aliasing errors hamper the phase retrieval. Application of the Fourier transform technique without spatial heterodyning was proposed by Kreis [44, 176]. By applying a proper bandpass filtering to the Fourier transform

D(f_x, f_y) = D_B(f_x, f_y) + D_1(f_x, f_y) + D_1^*(f_x, f_y)    (5.28)

of I(x, y) = I_B(x, y) + Ψ(x, y) + Ψ*(x, y), estimates D̂1(fx, fy) and D̂1*(fx, fy) can be derived and the phase distribution restored from

\hat\varphi(x, y) = \arg F^{-1}[D(f_x > 0, f_y > 0)]    (5.29)
However, the distortions in the restored phase due to the possible overlapping of D1(fx, fy) and D1*(fx, fy) are more severe in this case than with spatial heterodyning. This technique is appropriate for objects which cause a slowly varying phase modulation centered about some dominant spatial frequency. In view of the obvious relations fx(x, y) = ∂ϕ(x, y)/∂x and fy(x, y) = ∂ϕ(x, y)/∂y, the phase estimate increases monotonically along X and Y [184], and the sign of the local phase variation is not restored.

5.2.3.2 Accuracy Issues and Carrier Removal

Obviously, the two possible ways to improve the accuracy of the FFA are to vary the carrier frequency or the width of the filter window in the frequency domain. To ensure a monotonic phase change throughout the FP, the carrier frequency should be chosen large enough; however, it may then happen that the carrier fringe period, (f0x² + f0y²)^{-1/2}, exceeds the spatial resolution of the CCD camera. Therefore, high-resolution imaging systems are required for the measurement of steep object slopes and step discontinuities [185]. Besides, the introduction of the spatial carrier entails, as a rule, a change in the experimental setup, which may require sophisticated and expensive equipment that is not always available. In addition, a change in the carrier frequency can hardly be synchronized with the dynamic behaviour of the object. The width of the Fourier-plane window affects the accuracy of phase restoration and the spatial resolution in opposite ways [186]. The three terms in (5.25) are continuous functions throughout the Fourier domain. If the filter width is taken too large, information from the rejected orders of the Fourier transform will leak into the processed frequency window, leading to phase distortions. Decreasing the width worsens the spatial resolution. A trade-off between the accuracy of phase determination and the spatial resolution is required. Obviously, for a real FP that is corrupted by noise, the demodulated phase estimate, ϕ̂(x, y), differs from the real phase given by (5.23). Since the noise covers the whole Fourier transform plane, a decrease in filter width leads to considerable noise reduction. For optimal filtering, prior information on the
noise and the bandwidth of the modulating signals is required. This dependence of the filter parameters on the problem to be solved makes automatic processing of the fringe patterns difficult [187]. Use of a tight square-profile filter window leads to 'filter ringing', causing distortions in the restored phase distribution [182]. A phase accuracy of approximately 0.01 fringe is obtained for a Gaussian apodization window centered at the carrier frequency [172]. The main advantages of a Gaussian filter are its continuous nature and the absence of zeros in its Fourier transform. Use of a 2D Hanning window, which provides better suppression of noise, is reported in [188]. The background phase distribution caused by optical aberrations can also be eliminated using a differential mode [181]. A substantial reduction of the background intensity can be achieved by the normalization procedure developed in [189], which includes determination of two enveloping 2D functions eb(x, y) and ed(x, y) obtained by applying surface fitting to the centre lines of the bright and dark fringes. The normalized fringe pattern is
I_n(x, y) = A \frac{I(x, y) - e_d(x, y)}{e_b(x, y) - e_d(x, y)} + B    (5.30)
where A and B are normalization constants. The wrapped phase of In(x, y) remains the same as for the non-normalized pattern, but the contribution of the background is strongly diminished. A transform-domain denoising technique for the processing of speckle FPs, based on the discrete cosine transform with a sliding window and adaptive thresholding, is developed in [190]. To decrease the noise influence, a method to enhance the FP by modifying the local intensity histogram before the Fourier transform is proposed in [184]. The modification is based on a monotonic transformation from the real intensity values to the ideal values, thus removing the noise without worsening the contrast. A background removal is proposed in [191] for the case of continuous registration of FPs by adding the patterns in series. After normalization to the grey-level range of the CCD camera, the intensity distribution of the resulting pattern gives the background estimate for a high number of added patterns. The method proves to be especially efficient for low-carrier-frequency FPs, for which the zero- and first-order peaks overlap to a great extent. An improvement of the spatial resolution without loss of phase demodulation accuracy is proposed and verified in [186]. The idea is to make use of the two complementary outputs of an interferometer, taking into account that the locations of constructive interference in the plane of the first output correspond to destructive interference at the second output, i.e. we have:

I_1(x, y) = I_{B1}(x, y) + I_{V1}(x, y) \cos[\varphi_1(x, y) + 2\pi(f_{0x} x + f_{0y} y)]    (5.31)
I_2(x, y) = I_{B2}(x, y) - I_{V2}(x, y) \cos[\varphi_2(x, y) + 2\pi(f_{0x} x + f_{0y} y)]    (5.32)

If precautions are taken to ensure perfectly equal contrasts and gains while recording the two interferograms with two different cameras, the zero-order spectrum vanishes upon subtraction of the Fourier spectra of both patterns.
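The cancellation is easy to verify numerically. The following Python fragment builds two ideally balanced complementary patterns with an assumed quadratic object phase and checks that their difference contains no background term:

import numpy as np

# Sketch: subtracting the complementary interferometer outputs
# (5.31)-(5.32) cancels the background term before the Fourier step,
# assuming equal contrasts and gains (IB1 = IB2, IV1 = IV2).
y, x = np.mgrid[0:256, 0:256]
phi = 2e-5 * (x - 128) ** 2                 # assumed object phase
carrier = 2 * np.pi * (0.1 * x + 0.05 * y)  # assumed carrier term
I1 = 100 + 40 * np.cos(phi + carrier)       # first output
I2 = 100 - 40 * np.cos(phi + carrier)       # second, pi-shifted output
diff = I1 - I2                              # = 80*cos(...): zero order gone
print(np.allclose(diff, 80 * np.cos(phi + carrier)))  # True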
This permits the size of the filter window applied to the first-order spectrum, and hence the spatial resolution, to be increased by a factor of 2. The FFA method with two complementary interferograms is very useful for images with high spatial frequencies, in which the fundamental spectrum is not well localized, or for the case of undersampling [181]. Elimination of the zero order by registration of two FPs phase-shifted by π using a defocused image of a Ronchi grating is proposed in [192]. The authors report the contribution of the higher orders to be 25% in comparison with the fundamental spectrum. Projection of a quasi-sinusoidal wave and the π phase-shifting technique increase the acceptable height variation to |∂h(x, y)/∂x| ≤ L0/d [183]. A modification of the FFA which makes it suitable for a special class of closed FPs is proposed in [173]. The goal is achieved by transforming the closed FP into an open FP in a polar coordinate system using x = X + r cos θ, y = Y + r sin θ, where X, Y are the coordinates of the center of the closed FP in the Cartesian coordinate system and the point (X, Y) is chosen as the origin of the polar coordinate system. The FP in the r–θ space consists of straight open fringes that permit application of the conventional FFA. However, this is true only for a concave or convex phase surface with the origin of the polar coordinate system coinciding with the apex of the wavefront. The phase retrieved in the r–θ space is transformed back to the Cartesian coordinate system, and the phase map of the closed-fringe pattern is recovered. The Fourier transform is calculated using a discrete Fourier transform (DFT). Use of the DFT leads to so-called leakage for frequencies that are not integer multiples of (1/NxΔx, 1/NyΔy) [193]. Several authors point out that the error induced in the retrieved phase by the leakage effect is inevitable, owing to the discretization of the image by the CCD and the non-integer number of fringes within the image [172, 191, 193, 194]. The distortions caused by the leakage are negligible if the carrier frequencies ensure an integer number of fringes within the image and the object height distribution is also concentrated within the image [194], i.e. no phase distortions occur at the image boundaries. To avoid the leakage effect when large objects are monitored with a non-vanishing height at the image borders, a method is proposed in [194] in which the full image is divided into overlapping subimages by a window that slides along the axis normal to the carrier fringes, e.g. the X axis. The window width is chosen approximately equal to one fringe period. If this width is NW, then Nx − NW consecutive subimages are processed. The next step is to apply the Fourier transform successively to all rows parallel to the X-axis of each subimage, thus achieving a local phase demodulation. The sliding pace is one discretization step per subimage, which in practice ensures phase recovery at each point of the image and explains why the method is called the interpolated or regressive 1D Fourier transform [194]. Briefly, the fringe pattern in each subimage I(xk, . . . , xk+NW, yl) is modelled by a single-frequency sine-wave Ik,l(x) = Ak,l sin(2πfk,l x + ϕk,l) with frequency fk,l and phase ϕk,l connected to the height at the point (kΔx, lΔy).
The Fourier transform of the sine-wave leads to a set of two non-linear equations which, when solved for the two largest Fourier coefficients, yield the required frequency fk,l and phase ϕk,l [194]. The frequencies evaluated by the proposed approach are not limited to the frequencies of the Fourier transform, and no leakage occurs. The necessity of finding the two largest spectral lines of the locally computed FFT involves sorting operations, which slightly increases the computational burden. Sine-wave modelling gives good results in image regions without abrupt phase changes, i.e. for smooth objects without height discontinuities. The discrete nature of the Fourier spectrum may cause distortions in the recovered phase at the step of removal of the heterodyning effect. If the sampling interval in the frequency domain is considerably large, it is difficult to translate the positive or negative component of the fundamental spectrum by exactly (f0x, f0y) to the origin. If the bias error in the shifted position of the fundamental spectrum is (δf0x, δf0y), the retrieved phase is given by

\hat\varphi(x, y) = \varphi(x, y) \exp\{-2\pi j (x\,\delta f_{0x} + y\,\delta f_{0y})\}    (5.33)
with |δf0x,y| ≤ 0.5Δf0x,y, where Δf0x and Δf0y give the resolution in the frequency domain. This modulation of the true phase may lead to considerable phase shifts in some parts of the object. Distortions in the recovered phase due to the discrete nature of the Fourier spectrum are studied in [175, 180]. The approach proposed there relies on evaluating the background and the carrier frequency by a least-squares fit of a plane in the part of the recorded image that is not affected by the object. The evaluated phase plane is subtracted from the retrieved phase in the spatial domain. However, this approach is rather cumbersome due to its inevitable dependence on the proper choice of the object-free area. An efficient approach is proposed in [195] to evaluate the phase map ψ(x, y) = 2πf0x x + ϕ(x, y) from the FP by computing the mean value of its first phase derivative along the X-axis,

\bar\psi'(x, y) = \left\langle \frac{\partial\psi(x, y)}{\partial x} \right\rangle_S = 2\pi f_{0x} + \left\langle \frac{\partial\varphi(x, y)}{\partial x} \right\rangle_S    (5.34)

where ⟨·⟩_S denotes averaging over the entire image S. It is reasonable to assume that the expectation of the derivative is given by 2πf0x. Thus, subtraction of the mean-value estimate (5.34) from the 2D map of the first phase derivative along the X-axis is expected to yield the first derivative of the phase modulation caused by the object. The first derivative is calculated as the difference of the phase values at two adjacent pixels, ψ(xi+1) − ψ(xi). In [196] carrier removal is performed using an orthogonal polynomial curve-fitting algorithm. For this purpose, the intensity distribution along one row parallel to, e.g., the X-axis is modelled by a sine-wave whose Fourier transform, Fs(ω), can be represented theoretically by [196]:
F_s(\omega) = \frac{a}{j\omega - \zeta} + \frac{a^*}{j\omega - \zeta^*}    (5.35)
where ζ is the pole of Fs(ω). By fitting Fs(ω) with an orthogonal polynomial and using a least-squares approach, an estimate of the carrier frequency can be obtained from (5.35) as ω̂ = ζ̂. The algorithm is based on the assumption that the carrier frequency is the same throughout the whole image. To find the carrier frequency, [191] makes use of the sampling theorem applied to the amplitude of the Fourier transform in the spatial frequency domain. Using the interpolation formula [191]:

|D(f_x, f_y)| = \sum_{m,n} |D_{mn}| \,\mathrm{sinc}[\pi(f'_x - m)]\, \mathrm{sinc}[\pi(f'_y - n)], \qquad f'_{x,y} = f_{x,y} (\Delta f_{x,y})^{-1}    (5.36)
with Dmn = D(mΔfx, nΔfy), one is able to calculate the carrier frequency precisely. An important drawback of carrier removal based on a frequency shift, or on the techniques described in [191, 195, 196], is the inability to remove a possible non-linear component of the carrier frequency. Such a situation is encountered when divergent or convergent illumination is used for grating projection onto a large- or small-scale object, which yields a carrier FP with unequal spacing, for which carrier removal by a frequency shift fails [197]. To deal with this case, Takeda et al. proposed in [171] to use a reference plane. This solution entails implications such as the need for two measurements and the careful adjustment of the reference plane, and it increases the overall uncertainty of the measurement. Srinivasan et al. [198] propose a phase mapping approach without a reference plane. Methods have also been developed that directly estimate a phase-to-height relationship from the geometry of the measurement system without estimating the carrier frequency. A profilometry method for a large object under divergent illumination is developed in [199] with at least three different parallel reference planes for calibration of the geometrical parameters of the system. The calibration permits direct conversion of the phase value, composed of both the carrier and the shape-related components, to a height value. However, highly accurate determination of the geometrical parameters is required, which makes the calibration process very complicated. A general approach for the removal of a nonlinear-carrier phase component in the crossed-optical-axes geometry is developed in [200] for divergent projection of the grating with a light beam directed at an angle α to the normal of the reference plane and a CCD camera looking normally at it. If for this optical geometry the carrier fringes are projected along the Y-axis, the phase induced by them depends on the x-coordinate in a rather complicated way [200]:
\phi(x) = 2\pi \int_0^x f_{0x}(u)\, du = 2\pi p L_1 H \int_0^x (L_2 + u \sin\beta)^{-1} \left[ H^2 + (d + u)^2 \right]^{-1/2} du + \phi(0)    (5.37)
where φ(0) is the initial carrier phase angle, p is the grating pitch, β is the angle between the grating and the reference plane, and L1, L2, H and d are distances characterizing the optical geometry. The authors propose to use a power series expansion for φ(x):
\phi(x) = \sum_{n=0}^{\infty} a_n x^n    (5.38)
and to determine the coefficients a0, a1, . . . , an, . . . by a least-squares method minimizing the error function:

\Omega(a_0, a_1, \ldots, a_N) = \sum_{(x,y) \in S} \left[ a_0 + a_1 x + \ldots + a_N x^N - \varphi(x, y) \right]^2    (5.39)

where S comprises all image points, ϕ(x, y) is the unwrapped phase, and the number N ensures an acceptable fit to ϕ(x, y).
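A least-squares polynomial fit of this kind is a one-liner in practice. The sketch below, under the stated assumption that the carrier varies along x only and with an illustrative polynomial degree, fits (5.38) to the unwrapped phase over the whole image and subtracts it; the residual approximates the object-induced phase modulation.

import numpy as np

def remove_nonlinear_carrier(phase_unwrapped, x, deg=4):
    """Fit the power series (5.38) to the unwrapped phase by least
    squares, as in (5.39), and subtract it. x is a 2D grid of the
    x-coordinates with the same shape as phase_unwrapped."""
    coeffs = np.polyfit(x.ravel(), phase_unwrapped.ravel(), deg)
    carrier = np.polyval(coeffs, x)
    return phase_unwrapped - carrier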
The method is generalized for carrier fringes with an arbitrary direction in the spatial plane. The phase-to-height conversion becomes much simpler after successful elimination of the nonlinear carrier. The reliability of the FFA is thoroughly studied in [193] at all steps involved in the phase demodulation, by means of a 1D model of an artificial, ideal, noise-free open-fringe sinusoidal FP with constant magnitudes of IB(r), IV(r) and ϕ(r) throughout the image. The purpose of the analysis is to identify only the errors inherent in the FFA. The filter used in the spatial frequency domain is a rectangular window apodized by a Gaussian function. As a result, an improved formula for phase derivation from the complex signal Ψ(x, y) is proposed. One of the most serious problems of the FTP arises from objects with large height discontinuities, which are not band-limited and thus hinder the application of Fourier analysis. In addition, discontinuous height steps and/or spatially isolated surfaces may cause problems with the phase unwrapping [158, 201]. A modification of the FFA proposed in [182] makes phase unwrapping unnecessary by simple elimination of any wraps in the calculated phase distribution. This is achieved by a proper orientation of the projected fringes and by independently choosing the angle between the illumination and viewing directions, θ, and the fringe spacing, L, so as to fulfil the requirement h(x, y)L⁻¹ sin θ ≤ 1 after removal of the carrier frequency. Obviously, the method is efficient only for comparatively flat objects, at the expense of decreased resolution. A two-wavelength interferometer is developed in [202]. The recorded pattern is given by

I(x, y) = I_B(x, y) + I_{V1}(x, y) \cos[2\pi f_{1x} x + \varphi_1(x, y)] + I_{V2}(x, y) \cos[2\pi f_{2x} x + \varphi_2(x, y)]    (5.40)
where ϕ1,2 = 2πh(x, y)/λ1,2 and f1,2x are inversely proportional to the two wavelengths used.
The Fourier transform of the recorded pattern yields

D(f_x, f_y) = D_B(f_x, f_y) + \sum_{k=1}^{2} \left[ D_1^k(f_x - f_{kx}, f_y) + D_1^{k*}(f_x + f_{kx}, f_y) \right]    (5.41)
Two first-order spectra are selected, e.g. D1¹ and D1²*, by bandpass filtering and are shifted towards the origin of the coordinate system. After the inverse Fourier transform one obtains:

\Psi(x, y) = I_V(x, y) \cos[\Xi(x, y)] \exp[j\Gamma(x, y)]    (5.42)
where precautions are taken to provide IV(x, y) = IV1(x, y) = IV2(x, y), with Ξ(x, y) = πh(x, y)/Λ1 and Γ(x, y) = πh(x, y)/Λ2. Here Λ1 = λ1λ2/(λ1 + λ2) is the average wavelength and Λ2 = λ1λ2/(λ2 − λ1) is the synthetic wavelength. As can be seen, the phase modulation using the synthetic wavelength substantially increases the range of the interferometric measurement without the need for phase unwrapping. Correct restoration of the 3D object shape and accurate phase unwrapping across big height variations and surface discontinuities can be achieved using multiple phase maps with various sensitivities. A method called the spatial-frequency multiplexing technique was proposed in [203]. The idea is extended in [204] to a technique termed multichannel FFA. The key idea of the method is that phase discontinuities which are due not to the processing algorithm but to surface discontinuities will appear at the same location in FPs generated with differing carrier frequencies. These FPs can be projected simultaneously onto the object surface if FFA is used [203]. The spectra that correspond to the multiple FPs used are separated in the frequency space by means of a set of bandpass filters tuned to the carrier frequencies of the fringes. An FFA interferometric technique for automated profilometry of diffuse objects with discontinuous height steps and/or surfaces spatially isolated from one another is designed and tested in [201]. It makes use of spatiotemporal specklegrams produced by a wavelength-shift interferometer with a laser diode as a frequency-tunable light source. The necessity of recording and processing multiple FPs under the stringent requirement of a vibration-free environment is the main drawback of this approach. Phase demodulation and unwrapping by FFA for discontinuous objects and big height jumps is developed further in [205]; the merit of this work is that all the necessary information is derived from a single FP. This is achieved by combining the spatial-frequency multiplexing technique with the Gushov and Solodkin unwrapping algorithm [206]. The FP projected on the object consists of multiple sinusoids with different carrier frequencies:
I(x, y) = I_B(x, y) + I_V(x, y) \sum_{k=1}^{K} \cos[\varphi_k(x, y) + 2\pi(f_{kx} x + f_{ky} y)]    (5.43)
Defining a set of simultaneous congruence equations for the real height distribution and height distributions corresponding to wrapped phase maps ϕk (x, y)
and using the Gushov and Solodkin algorithm, phase unwrapping can be done pixelwise. A coaxial optical sensor system is described in [207] for absolute shape measurement of 3D objects with large height discontinuities and holes, without shadowing. This is achieved by a depth-of-focus method with a common image plane for pattern projection and observation. The FFA is applied for evaluation of the contrast, IV(x, y). The absolute height distribution is determined from the translation distance of the image plane that ensures maximum fringe contrast at each pixel, as in white-light interferometry. Absolute phase measurement using the FFA and temporal phase unwrapping is developed in [188]. Use of a four-core optical fibre for pattern projection and 2D Fourier analysis is demonstrated in [208]. The projected FP is formed as a result of the interference of the four wave fronts emitted from the four cores located at the corners of a square. In its essence, the FTP is based on determination of the quadrature component of the signal, i.e. it is described by an approximation of the Hilbert transform. Kreis [209] was the first to apply a 2D generalized Hilbert transform for phase demodulation. However, the discontinuity of the Hilbert transform operator used at the origin leads to ringing in the regions where the phase gradient is close to zero. The accuracy of this approximation depends on the bandwidth of the processed signal. Contrary to the traditional opinion that it is not possible to find a natural isotropic extension of the Hilbert transform beyond one dimension and to apply the analytic signal concept in multiple dimensions, a novel 2D quadrature (or Hilbert) transform is developed in [210, 211] as the combined action of two multiplicative operators: a two-dimensional spiral phase signum function in the Fourier space and an orientational phase spatial operator. The quadrature component of the FP, Ĩ(r) = I(r) − IB(r), obtained after the removal of the background, is obtained from the approximation

j \exp[j\theta(r)]\, I_V(r) \sin[\varphi(r)] \cong F^{-1}\{ S(f)\, F[\tilde I(r)] \}    (5.44)
where θ(r) is the fringe orientation angle and S(f) is the 2D spiral phase signum function

S(f) = \frac{f_x + j f_y}{\sqrt{f_x^2 + f_y^2}}    (5.45)
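A compact Python rendering of (5.44)–(5.45) is given below; the orientation map θ(r) is assumed to be known (in practice it must itself be estimated from the pattern), and the DC sample of S(f) is set to zero by convention.

import numpy as np

def spiral_quadrature(I_ac, theta):
    """Quadrature component of a background-free fringe pattern via the
    2D spiral phase signum function (5.44)-(5.45); theta is a map of the
    local fringe orientation angle, assumed known here."""
    Ny, Nx = I_ac.shape
    fy = np.fft.fftfreq(Ny)[:, None]
    fx = np.fft.fftfreq(Nx)[None, :]
    r = np.sqrt(fx ** 2 + fy ** 2)
    S = np.where(r > 0, (fx + 1j * fy) / np.where(r > 0, r, 1), 0)
    Q = np.fft.ifft2(S * np.fft.fft2(I_ac))
    # undo the orientational phase factor j*exp(j*theta) of (5.44)
    return np.real(Q / (1j * np.exp(1j * theta)))

The returned array approximates IV(r) sin[ϕ(r)], which together with the in-phase component Ĩ(r) = IV(r) cos[ϕ(r)] yields the wrapped phase via arctan2.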
The new transform shows effective amplitude and phase demodulation of closed FPs. A vortex phase element has been applied in [212] for demodulation of FPs. The phase singularity of the vortex filter transforms the FP into a pattern with open fringes in the form of spirals, which allows elevations and depressions in the object to be distinguished.

5.2.4 Space-frequency Representations in Phase Demodulation

5.2.4.1 Wavelet Transform Method

The FFA, as a global approach, exhibits unstable processing for patterns with low fringe visibility, non-uniform illumination and low SNR, as well as in the presence of local image defects, which influence the entire demodulated phase map [213].
Since the spatial gradient of the phase is proportional to the fringe density, information about the latter is a step towards phase demodulation. This observation paves the ground for the introduction of space-frequency methods in fringe analysis. New phase retrieval tools have been applied, such as the windowed Fourier transform (WFT) [214, 215] and the continuous wavelet transform (CWT) [216]. The wavelet transform is a method that can detect the local characteristics of signals, which explains the extensive research on wavelet processing of FPs and interferograms during the last decade [217, 218]. It can be very useful for patterns with a great variation of the density and orientation of the fringes – the case in which the standard FFA fails [28]. The other methods that are capable of ensuring localized phase retrieval, such as the WFT or the regularised phase-tracking algorithm, generally require a priori information about the fringe density and orientation. In the wavelet transform analysis it is not necessary to choose a filter in the frequency domain. An extensive review of the wavelet transform can be found in [219]. The idea of applying the CWT to 2D fringe data was proposed independently in [220] and [213]. The CWT can be applied both to open and to closed fringes. The CWT shows promising results as a denoising tool for interferograms in holography and speckle interferometry [221], and as a method to improve bad fringe visibility in laser plasma interferometry [213]. In white-light interferometry, the CWT proves to be very effective for detecting the zero optical path length [222]. Wavelets can also be very useful for finding the zones with a constant law of variation of the fringes [223]. The CWT of a 1D function I(x) is defined as
\Phi(a, b) = \int_{-\infty}^{\infty} I(x)\, \psi^*_{a,b}(x)\, dx = \sqrt{a} \int_{-\infty}^{\infty} \hat\psi^*(a f_x)\, D(f_x) \exp(j b f_x)\, df_x    (5.46)
, a = 0 and b are scaling and translation where ψa,b (x) = |a|−1/2 ψ x−b a ˆ (x) = F [ψ(x)]. The kernel of the transform is parameters which are real, and ψ a single template waveform, the so-called mother wavelet ψ(x) which should satisfy the admissibility condition [219], in order to have a zero mean and should present some regularity to ensure the local character of the wavelet transform both in the space and frequency domains. The above conditions mean that the wavelet can be considered as an oscillatory function in the spatial domain and a bandpass filter in the frequency domain. The scaling factor entails the change of the width of the analyzing function thus making possible analysis of both high-frequency and low-frequency components of a signal with good resolution. Usually, the mother wavelet is normalized to have a unit norm [219]. The wavelet transform decomposes the input function over a set of scaled and translated versions of the mother wavelet. The Fourier transform of the daughter wavelet ψa,b (x) is
\hat\psi_{a,b}(f_x) = |a|^{1/2}\, \hat\psi(a f_x) \exp(-j b f_x)    (5.47)
where ψ̂(fx) is the Fourier transform of the mother wavelet. The wavelet transform Φ(a, b), plotted in the spatial coordinate/spatial frequency space, gives information about the frequency content, proportional to 1/a, at a given position b. In the case of FPs the translation parameter b follows the natural sampling of the FP, given by the pixel number, b → n, n = 0, . . . , N, where N is the total number of pixels. The parameter a ∈ [amin, amax] is usually discretized applying log sampling, a = 2^m, where m is an integer. Finer sampling is given by [219, 224]:

\psi_{\nu n}(x) = 2^{-(\nu-1)/N_\nu}\, \psi\!\left( 2^{-(\nu-1)/N_\nu} x - n \right), \quad \nu = 1, \ldots, N_\nu    (5.48)

where the fractional powers of 2 are known as voices and the spatial coordinate x is also given in units of a pixel number.
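In code, such a voice-refined scale grid is generated in one line; the octave and voice counts below are illustrative assumptions.

import numpy as np

n_octaves, n_voices = 6, 8
# a = 2**(m + v/Nv): n_voices logarithmically spaced scales per octave
scales = 2.0 ** (np.arange(n_octaves * n_voices) / n_voices)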
One should distinguish between the CWT and the discrete wavelet transform, which employs a dyadic grid and orthonormal wavelet basis functions and exhibits zero redundancy. The squared modulus of the CWT, |Φ(a, b)|², is the measure of a local energy density in the x–fx space. The energy of the wavelet ψa,b(x) in the x–fx space is concentrated in the so-called Heisenberg box centered at (b, η/a) with lengths aΛx and Λf/a along the spatial and frequency axes respectively, where

\int_{-\infty}^{\infty} x^2 |\psi(x)|^2\, dx = \Lambda_x^2, \qquad \frac{1}{2\pi} \int_0^{\infty} (f_x - \eta)^2 |\hat\psi(f_x)|^2\, df_x = \Lambda_f^2, \qquad \frac{1}{2\pi} \int_0^{\infty} f_x |\hat\psi(f_x)|^2\, df_x = \eta.

The area of the box, ΛxΛf, remains constant.
The plot of the CWT of I(x) = IB(x) + IV(x) cos[2πf0x(x)x + ϕ(x)] as a function of position and frequency is called a scalogram [215, 216]. The huge amount of information contained in the CWT Φ(a, b) can be condensed if one considers the local extrema of Φ(a, b). Two definitions of CWT maxima are widely used:
i) wavelet ridges, used for the determination of the instantaneous frequency and defined by

\frac{d}{da} \left( \frac{|\Phi(a, b)|^2}{a} \right) = 0;    (5.49)

ii) wavelet modulus maxima, used to localize singularities in the signals and defined by

\frac{d}{db}\, |\Phi(a, b)|^2 = 0.    (5.50)

The choice of a proper analyzing wavelet is crucial for the effective processing of the FPs. The wavelet most frequently used in interferometry and profilometry is the truncated form of the Morlet wavelet, which is a plane wave modulated by a Gaussian envelope:

\psi(x) = \pi^{-1/4} \exp(j\omega_0 x) \exp\left( -\frac{x^2}{2} \right)    (5.51)
It is well-suited for the processing of pure sinusoids or modulated sinusoids; ω0 is the central frequency. The correction term exp(−ω0²/2), which is introduced in the complete form of the Morlet wavelet, ψ(x) = π^{-1/4} exp(jω0x − ω0²/2) exp(−x²/2), to correct for the non-zero mean of the complex sinusoid, is usually neglected as it vanishes at high values of ω0. Usually ω0 > 5 is chosen to ensure ψ̂(fx = 0) ≈ 0 [221], which means that the Morlet wavelet has five 'significant' oscillations within a Gaussian window. The Morlet wavelet provides high frequency resolution. The Fourier transform and energy spectrum of the Morlet wavelet are

\hat\psi(f_x) = \sqrt{2}\, \pi^{1/4} \exp\left[ -\frac{(f_x - \omega_0)^2}{2} \right] \quad \text{and} \quad |\hat\psi(f_x)|^2 = 2\sqrt{\pi} \exp\left[ -(f_x - \omega_0)^2 \right]    (5.52)

For complex or analytic wavelets the Fourier transform is real and vanishes for negative frequencies. Thus the Morlet wavelet removes the negative frequencies and avoids the zero-order contribution [221]. The Morlet wavelet produces a bandpass linear filtering around the frequency ω0/a. Two other wavelets used in fringe analysis are the Mexican hat wavelet [216] and the Paul wavelet of order n [225]. The apparatus of the wavelet ridges can be used for phase demodulation of FPs if the analyzing wavelet is constructed as ψ(x) = g(x) exp(jω0x), where g(x) is a symmetric window function and ω0 > 2Λf, i.e. ψ(x) practically rejects negative frequencies. The CWT of the AC component of one row or column of the FP I(x, y) takes the form:
−1/2
∞
IV (x) cos ϕ(x)g
−∞
x−b a
ω 0 exp −j (x − b) a
dx =Z(ϕ) + Z(−ϕ)
(5.53)
where

Z(φ) = (a^(−1/2)/2) ∫_{−∞}^{∞} I_V(x + b) exp[jφ(x + b)] g(x/a) exp[−j(ω₀/a)x] dx     (5.54)
The integral in (5.54) is difficult to evaluate analytically, but when the visibility and phase of the fringes vary only slightly over the support of the analyzing wavelet ψ_{a,b}, we can simplify (5.54) using first-order Taylor expansions of I_V(x) and φ(x):

I_V(x + b) ≈ I_V(b) + x I_V′(b),  φ(x + b) ≈ φ(b) + x φ′(b)     (5.55)

where I_V′(b) and φ′(b) are the first-order derivatives of I_V and φ with respect to x. In view of the symmetric character of g(x) and the condition ω₀ > 2Λ_f, one obtains [216]:
Φ(a, b) ≈ Z(φ) = (√a/2) I_V(b) ĝ(a[ω₀/a − φ′(b)]) exp[jφ(b)]     (5.56)
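Equations (5.48)–(5.56) translate directly into a simple numerical demodulation recipe: compute the CWT over log-sampled scales, follow the modulus maximum (the ridge) across positions, and read the phase along it. The sketch below (Python; the test signal, window support and all parameter values are illustrative assumptions, not taken from the chapter) outlines this:

```python
import numpy as np

# Ridge-based phase demodulation with a Morlet-type wavelet, cf.
# (5.48)-(5.56); the test signal and all parameters are illustrative.
N = 512
x = np.arange(N)
phi_true = 0.2 * x + 10 * np.sin(2 * np.pi * x / N)  # assumed test phase
signal = np.cos(phi_true)                            # AC part of one FP row

w0 = 6.0                                   # central frequency (w0 > 5)
scales = 2.0 ** np.arange(2.0, 6.0, 0.05)  # log sampling, 20 voices per octave

cwt = np.empty((len(scales), N), dtype=complex)
for i, a in enumerate(scales):
    t = np.arange(-4 * a, 4 * a + 1)       # truncated wavelet support
    psi = np.pi ** -0.25 * np.exp(1j * w0 * t / a - (t / a) ** 2 / 2)
    # correlation of the signal with psi_{a,b}, cf. (5.53)
    cwt[i] = np.convolve(signal, np.conj(psi[::-1]), mode='same') / np.sqrt(a)

ridge = np.argmax(np.abs(cwt) ** 2, axis=0)   # ridge condition, cf. (5.49)
phase_wrapped = np.angle(cwt[ridge, x])       # phase along the ridge, cf. (5.56)
inst_freq = w0 / scales[ridge]                # instantaneous frequency estimate
```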
For the first-order approximation (5.56) to hold, the contributions of the second-order terms must remain negligible, i.e. the second-order derivatives |I_V″(b)| and |φ″(b)| must remain small compared with terms of order ω₀²/a² [216].

For silver halide emulsions, holographic sensitivity values higher than 1100 cm/J are obtained. This characteristic is strongly dependent on the grain size (increasing the grain size leads to higher sensitivity) and is inversely proportional to the spatial resolution and the noise level. The highest values of spatial resolution reach 10000 lines/mm [17]. The refractive index modulation is ∼0.02. Another excellent feature is the perfect stability of the recorded holograms [18]. The holograms exhibit both amplitude and phase modulation. An additional possibility for obtaining diffraction efficiencies up to 100% and pure phase recording is to bleach the holograms, but the "price" is a significant increase of the noise level. Recording materials revealing phase modulation and high diffraction efficiency, without the bleaching process and the high noise levels, are the dichromated gelatins. However, they are media for permanent holographic recording and also require "wet" post-processing. Nevertheless, as mentioned in the introduction, the current state of the art in diffractive and holographic optics requires materials without the "wet" processing that is inherent in the recording process of silver halide emulsions (as well as of dichromated gelatin). A series of contemporary applications, including the 3D display, require dynamic media. As a consequence, there is an increasing necessity to substitute silver halide emulsions. This has resulted in tremendous efforts in the development of holographic recording materials, ranging from crystals such as doped LiNbO3, LiTaO3, KNbO3 and Sn2P2S6 to different types of thermoplastics, photopolymers, azobenzene polyester films, liquid crystals and composite materials [15, 19].

16.3.1 Polymeric Recording Materials

The first approach to substituting silver halide emulsions is to use polymeric recording media. Photopolymers have been used in practice for almost two centuries. From the middle of the XIX century, the so-called photoresists found application in polygraphy. Due to this technology, printed plates for electronic devices have been produced since 1940. Later (in 1960) the first integrated circuit was created; this initiated the rapid growth of microelectronics as well as of one of its foundations – photolithography. Nowadays holographic technology is one of the most promising areas of photopolymer application. Polymeric layers are extremely attractive for forming diffractive optical elements of various shapes and functions.

The photoinduced changes in polymeric recording media can be a consequence of various photochemical reactions of electron-excited molecules and the subsequent physical and chemical processes. These conversions are limited to the illuminated areas of the material and their degree is dependent on the
activating light intensity. The rate of these changes determines the modulation of the optical parameters. Usually one or two molecules take part in the primary reactions; in the simplest case the reaction is recognized as monomolecular. The parameter quantum efficiency (Φ) is introduced to describe the process: it represents the ratio of the number of reacted molecules to the number of absorbed photons. The change in the concentration c of the photosensitive molecules in an optically thin layer is described by the differential equation [20]:

dc/dt = −I₀σΦc,

where I₀ is the light intensity. The concentration decreases exponentially during the exposure:

c = c₀ exp(−I₀σΦt) = c₀ exp(−σΦH),

where H = I₀t is the exposure, measured in number of photons per unit area of the layer for the duration of the exposure (also called quantum exposure). If the exposure is measured in radiation power per unit area (W/cm²), the equation acquires the form:

c = c₀ exp(−(I₀/hν)σΦt) = c₀ exp(−(σΦ/hν)H),

where ν is the frequency of the activating radiation. The kinetics described by the last two equations is valid for monomolecular reactions. However, it is also applicable to bimolecular reactions when the excited molecule interacts with surrounding particles situated in its immediate vicinity. Such a reaction is called pseudo-monomolecular and takes place at a significantly higher concentration of the second reagent in comparison with that of the photosensitive molecules.

The primary photochemical reaction changes the qualitative composition of the material, influencing at the same time the optical properties by itself. Their alteration is a consequence of both the polymer matrix modifications and the subsequent dark processes like diffusion of unreacted species, different kinds of relaxation processes, etc. The different types of photo-modification are:
• Crosslinking;
• Photodestruction;
• Photoisomerization;
• Photopolymerization.
Crosslinking is the formation of intermolecular bonds, leading to insolubility of the illuminated areas of the polymer. During photodestruction, the length and weight of the polymer chains decrease, which results in an increase of the solubility. Photoisomerization is usually accomplished by cis-trans conformation of azo-derivative compounds. The primary photochemical reaction can
initiate polymer formation from low-weight compounds – monomers. In this case the primary photomodification is accomplished by the molecules of the photoinitiator, which form chemically reactive particles with unpaired electrons – free radicals. The interaction of a radical with a monomer molecule initiates chain polymerization reactions, creating molecules with hundreds and thousands of units. Different types of polymer recording media, depending on the chemical mechanism, are described below. Among them are the so-called photoresists, photochromic azopolymers, anthracene-containing polymers and photopolymers.

The photoresists change their solubility in some solvents after illumination. Two types – positive and negative photoresists – are recognized, depending on which area is dissolved after exposure, respectively the illuminated or the dark one. The first type comprises polymers containing compounds that increase the solubility of the polymer molecules under light illumination; such media are the phenol-formaldehyde resins. The negative photoresists under short-wavelength light exposure exhibit breaking of chemical double bonds, which later crosslink, forming bigger molecules insoluble in some organic solvents like xylene, benzine and others. Both types are used for holographic recording. The anthracene-containing polymers exhibit the photodimerization mechanism, leading to formation of derivative molecules in an excited state, which form pairs with a molecule in the basic state. In consequence, both photochromism and photorefraction occur, due to changes respectively in the absorption wavelength and the molecular polarizability (see below).

A wide class of organic holographic recording media are the azo materials. The specific element in their structure is the azo-group, of which there can be one or more. The azo-group links two phenyl rings with a double nitrogen bond (−N=N−). Molecules with high values of the photoinduced anisotropy are obtained on the basis of azo-dyes. The azo-group exists in two configurations (Fig. 16.1) – trans and cis; the first one is more stable. The trans-cis isomerization is accomplished under light illumination, while the reverse cis-trans transformation can also be performed by thermal relaxation (or optically, at another wavelength). The change in the absorption wavelength between the two isomers is referred to as photochromism.
Fig. 16.1. Trans-cis isomerization of azobenzene molecules (trans → cis under illumination hν; back-conversion under hν′ or thermal relaxation, k_BT)
A large class of organic materials exhibits such changes. For the aim of holographic recording, azobenzene liquid crystalline and amorphous polymers exhibiting photoisomerization and surface relief creation mechanisms have been elaborated. In azobenzene polymers and liquid crystalline materials the azo-group can be connected to the molecular chain, and the isomerization leads to a change in the refractive index [21]. There are also materials exhibiting an amorphous-liquid crystalline phase transition as a consequence of the cis-trans isomerization [22].

In particular, photopolymers are materials in which the recording mechanism is the polymerization process. Photopolymers possess two irrefutable advantages. On the one hand, due to the chain character of the reactions, they have high quantum efficiency, resulting in high sensitivity values. On the other hand, the activating radiation does not interact directly with the monomers but with added photoinitiator molecules; this allows shifting the sensitivity spectrum far from the initial monomer absorption peaks in order to sensitize the material to a proper laser wavelength. Photopolymerization is a chemical process in which separate molecules (monomers) connect under light illumination, resulting in alteration of the mechanical and optical parameters of the medium. The volumetric refractive index and/or the thickness of the layer (surface relief creation) are changed; thus, phase modulation occurs. The whole process is "dry". Generally the recording proceeds by the following mechanism. Polymerization takes place during exposure in the light regions of the interference pattern; its rate is proportional to the light intensity. A dark process follows, leading to uniform redistribution of the unreacted monomer all over the layer. The last stage is fixing – the whole area is illuminated with uniform UV light. Higher densities and refractive index are obtained in the areas where the initial polymerization has taken place along with the subsequent mass transfer. In many cases this process does not require a dark stage and post-fixing. Different systems with two monomers have been developed in order to realize one-step processing; such systems are the two-component materials described below. A polymer binder is also used in order to obtain single-step recording [23].

The photopolymerization process can proceed by three different mechanisms [24]. The chemical reaction can consist of double-bond opening by a cationic, anionic or free-radical mechanism. This type is referred to as addition or chain polymerization due to the fast process kinetics. In this case one end of the double bond becomes chemically active and can link covalently to another molecule. As a consequence, the double bond of the other molecule becomes activated after the covalent reaction and reacts with another monomer; thus the process repeats itself until the reaction is terminated. Such monomers are derived from the acrylate, methacrylate or vinyl families. If multifunctional acrylates or vinyl monomers (containing more than one reactive group) are used, the resulting network is highly crosslinked.
Other types of monomers can react with only one other molecule, i.e. at least two functional groups are necessary in order to form a polymer network. To obtain crosslinking, again multifunctional reactants are required. In many cases the process involves the loss of a small molecule as a reaction product. The step-polymerization kinetics is very different from the addition one: the molecular weight increases gradually during the polymerization process, in contrast to the very rapid chain growth in the case of addition reactions. Such reactions take place, for example, in the formation of polyurethanes from diols reacting with diisocyanates; each alcohol can react with one isocyanate to link two molecules, until the remaining alcohol (or isocyanate) is exhausted. The third type of polymerization chemistry is connected to ring-opening reactions. In this case one of the reactants has to contain a cyclic structure. Such reactions can lead to the formation of a highly crosslinked network. Common reactants in such reactions contain epoxide groups – a three-membered ring with an oxygen atom as a member of the ring. The ring can be opened in the presence of nucleophilic groups, like thiol; simultaneously with the ring opening, the oxygen atom picks up hydrogen to form an alcohol.

The molecular conversions lead to changes of the density and the refractive index of the material. An analysis of the dependence of the refractive index modulation on the monomer-polymer conversion is presented below. In fact, after all phototransformations exhibited by polymer materials (as well as after their eventual post-processing), they undergo to some extent refractive index changes [13]. If this change Δn reaches values ∼10⁻⁴ or higher, the material is considered to possess photorefractive properties. In order to analyze these properties, the Lorentz-Lorenz formula describing the refractive index n of a mixture of particles is applied. It is convenient to use it in the following form:

(n² − 1)/(n² + 2) = Σᵢ Rᵢcᵢ

According to this correlation, n is determined by the concentrations cᵢ of the material components and their refractions Rᵢ. The refraction characterizes the contribution of a particle to the refractive index of the material. It is proportional to the molecular polarizability αᵢ:

Rᵢ = (4π/3)αᵢ
The bigger the change in the polarizability of the photoproduct molecules, the higher the difference in the refractive index obtained. On account of this, photorefraction can be a consequence of processes leading to an alteration of the qualitative composition that is accompanied by a change in the polarizability of the components. Another reason for photorefractive modulation to arise is connected to the concentration alteration of the components cᵢ, leading to density (ρ) changes,
which is the case exhibited by the photopolymers. Such a case can be examined by the Lorentz-Lorenz formula. If the number of i-type particles in the volume of the photosensitive material with mass m is Nᵢ, then:

Σᵢ Rᵢcᵢ = (1/m) Σᵢ RᵢNᵢρ.

It can be considered that under the activating illumination a conversion from the k- to the l-component takes place. The indexes k and l refer to the initial and the photoproduct molecules respectively, whose concentrations change due to the photochemical conversions; this is accompanied by density (ρ) changes in the material. Then:

Δ[Σᵢ Rᵢcᵢ] = Σᵢ RᵢΔcᵢ = (1/m) Σᵢ RᵢΔ(Nᵢρ),
Δ(Nᵢρ) = Nᵢ(H)ρ(H) − Nᵢρ = (Nᵢ + ΔNᵢ)(ρ + Δρ) − Nᵢρ = ΔNᵢρ + (Nᵢ + ΔNᵢ)Δρ,
(1/m) Σᵢ RᵢΔ(Nᵢρ) = (1/m) Σᵢ RᵢNᵢΔρ + (R_k/m)ΔN_k(ρ + Δρ) + (R_l/m)ΔN_l(ρ + Δρ).

Since the quantity of the formed product is N_p = ΔN_l = −ΔN_k (ΔN_k < 0), the above equation can be written in the following manner:

Δ[Σᵢ Rᵢcᵢ] = (1/m) Σᵢ RᵢNᵢΔρ + (N_p/m)(ρ + Δρ)(R_l − R_k) = (Δρ(H)/ρ) Σᵢ Rᵢcᵢ + c_p(H)ΔR,

where Δρ(H) is the density alteration caused by the exposure H, c_p(H) is the concentration of the photoproduct, and ΔR = R_l − R_k is the change of the refraction of the activated particles due to the photoreaction. From the Lorentz-Lorenz formula we can write:

Δ[(n² − 1)/(n² + 2)] ≈ 6nΔn(H)/(n² + 2)².

As a result, the dependence of the refractive index on the molecular conversion and the density changes can be expressed as [20]:

Δn(H) = [(n² + 2)²/(6n)] [(Δρ(H)/ρ) Σᵢ Rᵢcᵢ + c_p(H)ΔR] = [(n² + 2)(n² − 1)/(6n)] (Δρ(H)/ρ) + [(n² + 2)²/(6n)] c_p(H)ΔR
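As a quick numerical illustration of the last expression (all inputs are assumed, order-of-magnitude values; none of them come from the chapter):

```python
import numpy as np

# Delta_n(H) from the Lorentz-Lorenz analysis above;
# all input values are assumed, order-of-magnitude, for illustration only.
n = 1.50          # refractive index of the layer (assumed)
drho_rel = 5e-3   # relative density change Delta_rho/rho (assumed)
c_p = 0.3         # photoproduct concentration c_p(H) (assumed)
dR = 2e-3         # refraction change Delta_R = R_l - R_k (assumed)

density_term = (n**2 + 2) * (n**2 - 1) / (6 * n) * drho_rel
conversion_term = (n**2 + 2)**2 / (6 * n) * c_p * dR
print(f"dn(H) ~ {density_term + conversion_term:.1e}")  # ~4e-3 for these inputs
```

For these inputs both contributions are of comparable size, which is why photopolymer formulations tune density change and photoproduct refraction together.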
Unfortunately, the photopolymerization is not a local process: it propagates beyond the boundaries of the illuminated volume. In the less illuminated or dark regions the polymerization is accelerated or initiated by diffused light, thermal reactions, as well as radical diffusion. As a consequence, the refractive index modulation (Δn) diminishes as the spatial frequency increases. Another problem is connected to layer shrinkage, originating from the closer packing of the molecules that accompanies the polymerization process; it results in an undesired blue shift of the Bragg wavelength during recording. These problems are nowadays limited by the addition of supplementary compounds that do not take part in the polymerization process, i.e. they are neutral compounds in the recording process. The elaboration of such composites, along with other relatively new dynamic materials, is described in Part 4 of the chapter, presenting the recent recording media development.

16.3.2 Photorefractive Crystals

The materials described above represent a major part of the organic recording media. The efforts to develop non-silver holographic materials are also directed at the elaboration of holographic inorganic crystals. They have been extensively studied in order to achieve characteristics similar to silver halides along with reversibility of the recording process. Thus, the photorefractive crystals have become a very widely studied class of recording materials. The photorefractive effect refers to spatial modulation of the refractive index under nonuniform illumination, through space-charge-field formation [25] and electro-optical nonlinearity. The effect is a consequence of drift or diffusion separation of charge carriers photogenerated by a spatially modulated light distribution, which become trapped and produce a nonuniform space charge distribution. The more mobile charges migrate out of the illuminated region owing to the photovoltaic effect and ultimately are trapped in the dark regions of the crystal [26]. The resulting internal space-charge electric field modulates the refractive index. This effect was first discovered in 1966 as an optical damage mechanism in electro-optical crystals [27]. Soon after, photorefraction was recognized as potentially useful for image processing and storage [28]. Most applications and quantitative analyses of photorefractive materials nowadays are connected to the holographic technique. The charge migration in the case of two coherent overlapping beams results in a sinusoidal space-charge field that modulates the refractive index. Thus a refractive index modulation (grating) is obtained, forming a read-write-erase capable hologram; this feature is due to the possibility of erasing the pattern by uniform illumination. Typically, the holograms are π/2 phase shifted with respect to the illuminating interference pattern. A consequence is the energy transfer between the two light beams interfering in the medium, recognized as asymmetric two-beam coupling. In the case of sufficiently strong coupling, the gain may exceed the absorption and the reflection losses in the sample, so optical amplification can occur.
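For a rough feeling of when amplification "wins": with an exponential gain coefficient Γ and an intensity absorption coefficient α (the standard two-beam-coupling bookkeeping, not a formula given in this chapter), the net signal growth over an interaction length d scales as exp[(Γ − α)d]. A minimal sketch with assumed values:

```python
import numpy as np

# Net two-beam-coupling amplification exp[(Gamma - alpha) * d];
# Gamma, alpha and d are assumed illustrative values.
Gamma = 10.0  # exponential gain coefficient, cm^-1 (assumed)
alpha = 2.0   # intensity absorption coefficient, cm^-1 (assumed)
d = 0.5       # interaction length, cm (assumed)

print(f"net amplification: {np.exp((Gamma - alpha) * d):.1f}x")  # ~54.6x
```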
It is important not to confuse the photorefractive mechanism with the large number of other local effects, such as photochromism, thermochromism, thermorefraction, generation of excited states, etc., which also lead to photoinduced refractive index modulation. On a basic level, to achieve the photorefractive effect both photoconductivity and a dependence of the refractive index on the electric field are required. The recording process consists of several steps. First, charge carriers are excited by the inhomogeneous illumination, leading to the appearance of spatially modulated currents. Thus a charge density pattern is obtained and space charge fields arise. Due to the electro-optical effect, the necessary refractive index modulation is obtained [29]. According to the steps described in the previous paragraph, the first physical process is the generation of mobile charges in response to the light field; it can be represented as formation of electrons and holes. Drift, bulk photovoltaic effect and diffusion are involved in the formation of the charge density pattern. The drift current is a consequence of the Coulomb interaction of an external electric field with the charge carriers. The bulk photovoltaic currents are typical for noncentrosymmetric crystals; sometimes they are referred to as "photogalvanic currents" in order to distinguish them from the usual photovoltaic effect. Apart from the host material, this effect also depends on the doping and annealing of the crystals. Since the electron orbitals of the defects are oriented with respect to the crystal lattice in order to minimize the free energy, light polarization sensitivity is observed. Lorentz forces can cause additional currents and influence the photovoltaic effect; however, the magneto-photorefractive effect is negligible even in the presence of very strong fields. The other transport process is connected to the diffusion currents, which are a consequence of the spatial variation of the free charges due to the inhomogeneous illumination. Besides the question of where the charges originate, another key question is where the charges are trapped. These microscopic processes determine macroscopic properties like absorption, absorption changes, conductivity and holographic sensitivity. Different models of these processes have been developed; among them are the one-center model without and with electron-hole competition, the two-center model and the three-valence model. On the basis of these models, more complex systems, like the three-valence case with electron-hole competition, can also be described. To choose an appropriate charge transport model, not only the host material but also the light intensity (cw or pulsed laser regime), doping and thermal annealing should be considered [19]. The presence of trapping sites to hold the mobile charges is required, especially when longer lifetime storage is desired. In general terms, a trapping site is a local region of the material where the charges are prevented from participating in transport for a certain time. The last requirement for photorefractive media is connected to the presence of refractive index modulation as a consequence of the local electric fields initiated by the illumination, charge generation and redistribution. If
the material exhibits a large electro-optic effect, the refractive index modulation magnitude Δn is related to the space charge field E_sc as follows:

Δn = −(1/2) n³ r_e E_sc,

where r_e is an effective electro-optical coefficient. According to this dependence of Δn on the field E_sc, sinusoidal variations of E_sc lead to sinusoidal refractive index modulation. Field-dependent refractive index modulation can also occur as a consequence of the quadratic (Kerr) orientational effect; it is connected to light-induced birefringence in photorefractive materials [30]. At first, all known materials exhibiting the photorefractive mechanism were inorganic crystals such as LiNbO3, KNbO3, BaTiO3, Bi12SiO20, SrxBa1−xNbO3 (0 ≤ x ≤ 1), InP:Fe, GaAs, multiple-quantum-well semiconductors and several others [31]. Very early, the crucial influence of thermal treatment and of the dopants was discovered [32, 33]. In order to improve the holographic characteristics of these crystals, the influence of different doping agents has been examined. In the following, the main features of some of the most widely studied types of crystals and the doping elements used are considered. A further description is available in [19].

16.3.2.1 LiNbO3 and LiTaO3

LiNbO3 and LiTaO3 crystals are among the most studied photorefractive materials. The highest values of the refractive index modulation for photorefractive crystals according to the recent literature (∼2 × 10⁻³) are obtained in LiNbO3:Fe. The charge transport is a consequence of the bulk photovoltaic effect; the dominant carriers are electrons. At low intensities the process is well described by the one-center charge model and no light-induced absorption changes are observed; the photoconductivity increases linearly with the light intensity. In contrast, if light-induced absorption changes appear, the one-center charge model is not sufficient to describe the processes, and the two-center model is used. The most widely used dopants for these crystals are Fe and Cu; the presented features are observed in both Fe- and Cu-doped crystals. In double-doped crystals, like LiNbO3:Fe,Mn, a photochromic effect is observed. Other employed dopants are Mg and Zn. Mg does not influence the photovoltaic effect but enlarges the conductivity of the crystal. The addition of Zn (∼2–5%) also leads to a conductivity increase, but at the same time results in higher holographic sensitivity. Sensitization for infrared recording is realized in LiNbO3:Fe and LiNbO3:Cu; it is performed by green-pulse excitation for subsequent infrared exposure [34].

16.3.2.2 BaTiO3

BaTiO3 crystals exhibit a sublinear increase of the photoconductivity with respect to the light intensity. Light-induced absorption changes are observed.
Therefore the two-center and three-valence models are used to describe the charge transport processes [35]. The transport mechanisms are diffusion and drift in an externally applied electric field, but in doped materials an appreciable photovoltaic effect contributes as well. Usually the charge carriers are holes. Fe, Rh, Co, Ce, Mb, Cr are used as dopants. Iron doping leads to additional absorption, but no significant improvements of the photorefractive performance are observed; the photorefractive effect is strongly influenced by thermal annealing. Rhodium doping improves the photorefractive effect in the red and infrared region, along with light-induced absorption changes. If double doping is performed – BaTiO3:Fe,Rh – the charge transport becomes more complicated. The response time is improved in the case of Co-doping. The cerium-doped crystals exhibit higher light-induced absorption changes. Other dopants used in BaTiO3 crystals are Mb, Cr and Nb, although the best performance is observed in materials doped with Rh, Co and Ce. Another way to improve the response time is to heat the crystals.

16.3.2.3 Barium-strontium Titanate and Barium-calcium Titanate

Since tetragonal BaTiO3 crystals are very difficult to produce (small growth rates, up to 0.2 mm/h, are achieved), a possible solution is to use appropriate mixed crystals. Such crystals are Ba1−xSrxTiO3 (BST) and Ba1−xCaxTiO3 (BCT), 0 ≤ x ≤ 1. The one-center model successfully describes the charge transport in BST, while BCT shows sublinear conductivity and light-induced absorption, and the two-level model should be used. Bulk photovoltaic fields are observed in BCT, but they are not significant. In both crystals hole conductivity dominates; the dominant charge driving forces are diffusion and external electric fields. A possible way to improve these crystals is to use the knowledge of the charge transport in BaTiO3 and employ dopants like Rh, Co and Ce.

16.3.2.4 KNbO3

Potassium niobate crystals also exhibit sublinear conductivity and light-induced absorption changes; the two-level model is employed to describe the charge transport. The charge carriers in doped crystals are usually holes, while in undoped materials electron-hole competition takes place. The driving forces are diffusion and drift in external electric fields. Due to the large conductivity, the bulk photovoltaic fields are negligible, although bulk photovoltaic currents are present. Blue light irradiation enhances absorption in the infrared; this effect can be used for infrared holographic recording. KNbO3 crystals doped with Ir show higher effective trap density, but no significant improvements of the response time are observed. Other possible dopants are Ni and Rh. The KNbO3:Rh crystals are electron-conductive; they show a photorefractive response time about 30 times smaller and a photoconductivity more than two orders of magnitude larger than undoped crystals. An
inconvenience is the complicated crystal growth. Other dopants for KNbO3 are Cu, Ru, Mn, Rb, Na, Ta, Ce, Co. Among them, copper, rhodium and manganese favorably influence the photorefractive properties.

16.3.2.5 Other Crystals

Mixed potassium tantalate-niobate crystals – KTa1−xNbxO3, 0 ≤ x ≤ 1 (KTN) – are also objects of investigation. Again, Ir doping is used to increase the effective trap density. Other relatively well examined crystals are the strontium-barium niobates; Ce and Rh are used as dopants in these materials. With the Sn2P2S6 crystal, sensitivity values of ∼5000 cm/J are obtained [36], which is relatively high for photorefractive crystals. Another group of photorefractive materials used for holographic recording is the sillenite-type crystals. Bismuth oxide crystals Bi12MO20, where M = Si, Ge, Ti, attract special attention owing to their high photosensitivity and high carrier mobility permitting fast response times [37, 38]. Moreover, these crystals can be easily doped. An attractive feature is doping with Ru, Re and Ir, which shifts the transmission spectra to the red and the near-infrared spectral range; thus holographic recording using He-Ne and low-cost diode lasers is possible. For example, Ru-doped Bi12TiO20 exhibits this effect most significantly [39]. A similar improvement of the photorefractive sensitivity towards longer wavelengths is also observed in Ru-doped KNbO3 crystals [40]. However, growth of KNbO3:Ru is rather complicated, since the incorporation of ruthenium is possible only if the crystals are grown at high speed, which diminishes the crystal quality and results in more defects in the crystal structure [19]. Usually the holographic sensitivity of photorefractive crystals is between tens and hundreds of cm/J. Spatial frequencies exceeding 2000 lines/mm are obtained. On the other hand, the refractive index modulation is relatively low – up to 10⁻³ – compared to other types of materials (silver halides, polymers).

16.3.3 Liquid Crystals

Another type of material finding application in dynamic holographic recording is liquid crystals. Nowadays they can be found almost everywhere due to the huge success they have achieved in the area of conventional displays. Nevertheless, liquid crystals continue to be extensively studied for various contemporary applications, being also very attractive for holographic display realization. At the basis of the specific liquid crystal behavior is the combination of solid and isotropic liquid properties; in other words, they possess at the same time some properties typical of liquids along with others peculiar to crystals. A more precise denomination is mesomorphic materials, since they exhibit aggregation states appearing between the solid and liquid phases. They can also be called anisotropic liquids. From a macroscopic point of view they resemble liquids, but the strong anisotropy of their physical properties makes them more similar
to crystals [41]. At the same time, effects typical for crystals, like the Pockels effect, are not observed in liquid crystals. Depending on the material, one or more mesomorphic states can appear if some thermodynamic parameter is changed. If the volume and the pressure are kept constant, the liquid crystalline state appears under concentration or temperature variation. Thus liquid crystals are divided into two main groups – lyotropic and thermotropic. The first ones show a mesomorphic state if the concentration of the chemical compounds is changed, while the thermotropics reach these states through temperature variation. The mesomorphic state is observed in compounds exhibiting molecular orientational order. Usually this is the case for molecules elongated in some direction (rod-like). The typical liquid crystal molecule length is about 20–40 Å, while the molecules are usually only ∼4–5 Å broad. They are diamagnetic but with a constant dipole moment. Different categories of chemical structures exhibiting the liquid crystalline state exist, and new types continue to be synthesized nowadays. The organic materials exhibiting the mesomorphic state can nevertheless be divided into several groups by symmetry considerations; thus, nematic, smectic and cholesteric phases are distinguished. Nematic liquid crystals are characterized by long-range orientational order and free spatial translation of the molecular weight centers. They are optically uniaxial media with an unpolar crystallographic structure – i.e. the directions of the molecular ends are homogeneously distributed. A layered structure is typical for smectic liquid crystals. According to the Zachmann-Demus classification, the following smectic mesophases can be distinguished:

• Smectic phase A – consists of layers freely moving with respect to each other, whose surfaces are formed by the molecular ends. The molecules are directed orthogonally to the layer surface and parallel to each other. Inside the layers the molecules do not have translational order, so they can move in two directions and spin around their long axes. This modification of smectics appears at the highest temperatures and under additional heating is transformed into the nematic or cholesteric (see below) mesophase or an isotropic liquid.
• Smectic phase B – the molecules in the layers, being orthogonal to the surface and parallel to each other, form a hexagonal packing. Ordinary and slanted B smectics are distinguished (in the case of slanted B smectics the molecular tilt with respect to the layer surface differs from π/2).
• Smectic phase C – the long molecular axes (parallel to each other within a layer) form a temperature-dependent angle with the layer surface. Liquid crystalline compounds possessing optical activity can generate a chiral mesophase; in this case each following layer is turned with respect to the previous one, so a twisted structure is obtained. If a certain orientation of the molecular dipoles is present, such a structure possesses ferroelectric properties. The chiral smectics C, having a dipole moment perpendicular to the long axis and recognized as ferroelectric liquid crystals, generally exhibit a submillisecond or even microsecond switching time and thus attract significant attention. Unfortunately, the use of these extremely attractive dynamic features for practical applications is limited due to technological problems connected to the specific orientation of the ferroelectric liquid crystals.
• Smectic phase D – optically isotropic. X-ray structural analysis studies have detected not a layered structure but a quasi-cubic lattice.
• Smectic phase E – characterized by a very high degree of three-dimensional order; rotation around the long molecular axis is absent.
• Smectic F and G phases exist as well, being less studied and an object of current investigations.
The cholesteric liquid crystals consist of optically active molecules, in which the direction of the long molecular axis in each following layer (a layer consisting of parallel-oriented molecules moving freely in two directions) forms a certain angle with the direction of the molecules in the previous one. Thus a spiral structure is formed, with a pitch dependent on the type of the molecules and on external influences. This pitch corresponds to rotation of the molecular orientation axis (the director – see below) by 2π, although the period of the alteration of the physical properties is equal to π. The so-called chiral nematics – molecules with a typical nematic structure but possessing in addition optical activity – are also counted among the cholesteric liquid crystals. The predominant molecular orientation in a liquid crystal is characterized by a unit vector n, fulfilling the requirement n = −n; it is recognized as the director. In nematic liquid crystals the director coincides with the optical axis direction. In the cholesteric mesophase it changes its direction along the cholesteric spiral and its components can be expressed as: n_x = cos φ, n_y = sin φ, n_z = 0. In the A and B smectic phases the director coincides with the normal to the smectic planes, i.e. it coincides with the optical axis similarly to the case of nematic liquid crystals. In C and H smectics the director is deflected from the layer normal and coincides with one of the two optical axes. The equivalence of the director orientations (the n = −n condition) is a consequence of the fact that macroscopic polarization effects are not observed in liquid crystals. A series of monographs examine the LC properties in detail [42, 43, 44]. The most attractive features (large optical anisotropy and field-applied refractive index modulation) are a consequence of their anisotropic nature (anisotropy in the dielectric and diamagnetic permittivity) and electro-optical behavior. The electro-optical LC properties are governed by free energy minimization in an external electric or magnetic field, leading to reorientation (and reorganization) of the LC molecules. In the case of positive anisotropy, the LC director tends to follow the applied field direction. If the anisotropy is
negative, the LC molecules rotate in the direction perpendicular to the external field. If the LC director does not satisfy the minimum free energy condition in the initial state, then for a sufficiently strong applied external field director reorientation takes place until a new stationary distribution is obtained. This effect is known as the Freedericksz transition and requires fields sufficient to overcome the elastic forces. The relation between the applied field E and the director angle θ is given by the expression [45]:

(Ed/2) √(Δε/4πK) = ∫₀^θm dθ/√(sin²θm − sin²θ) = F(k),

where d is the LC thickness, K the elastic coefficient and θm the angle of director deviation in the middle of the layer. The elliptic integral F(k) is tabulated for arbitrary values of k = sin θm < 1. For relatively small deviation angles it can be expanded in series; if the expansion is limited to the first two terms, the expression can be written as:

E = (π/d) √(4πK/Δε) [1 + (1/4) sin²θm + . . .].

As a consequence, a deformation θm ≠ 0 is possible only if the applied field exceeds a given value E₀, which represents the threshold voltage for the Freedericksz transition:

E₀ = (π/d) √(4πK/Δε).

This effect changes the optical properties of the LCs due to the refractive index alteration:

n_eff = n_o n_e / √(n_o² cos²θ + n_e² sin²θ)
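A minimal numerical sketch of the threshold and of the director-angle dependence of the effective index (Python; all material parameters are assumed typical-order values in Gaussian units, matching the formulas above, and are not taken from the chapter):

```python
import numpy as np

# Freedericksz threshold E0 = (pi/d)*sqrt(4*pi*K/deps) and the effective
# index n_eff(theta); all material parameters are assumed, Gaussian units.
K = 7e-7      # elastic constant, dyn (about 7 pN, typical nematic; assumed)
deps = 10.0   # dielectric anisotropy (assumed)
d = 1e-3      # cell thickness, cm (10 um)

E0 = (np.pi / d) * np.sqrt(4 * np.pi * K / deps)
print(f"E0 ~ {E0:.2f} statvolt/cm")  # ~3 statvolt/cm -> E0*d ~ 0.9 V threshold

n_o, n_e = 1.5, 1.7   # ordinary/extraordinary indices (assumed)
theta = np.deg2rad(np.arange(0, 91, 15))
n_eff = n_o * n_e / np.sqrt(n_o**2 * np.cos(theta)**2 + n_e**2 * np.sin(theta)**2)
```

The resulting threshold voltage of roughly one volt is why LC-based devices can be driven with such low control signals.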
Thus, the possibility to reorient the LC under an external field (in most practical applications, an electric one) enables control of its optical properties. This is an extremely attractive feature, finding application in a wide area of modern science and technology; in fact, the most prominent applications are connected to the display area. For the aim of holographic recording, LC systems containing azo-dyes [46, 47, 48] are studied. Usually nematic LCs are used, because they are rather sensitive to weak perturbation forces induced by electric, magnetic or optical fields. Nematics are also well known for their nonlinear optical-axis reorientational effects [49]. The nonlinear optical properties can be increased dramatically by doping the LC with traces of dye molecules, usually ranging from 0.5% to 2% [50]. This is a possible way to increase the
diffraction efficiency of holographic gratings. The increase is a consequence of the absorption of the laser radiation, leading to photoexcitation of the dye molecules and initiating the mechanism for large refractive index changes due to LC director reorientation. The LC director axis reorientation has been attributed to intermolecular interactions between azo-dye and LC molecules [51, 52] and to an optically induced d.c. space-charge field (examined below) [53, 54]. Some of the obtained holographic characteristics of dye-doped liquid crystals are: sensitivity in the range 440–514 nm, spatial frequency above 1000 lines/mm and refractive index modulation up to 0.1 [47, 48].
16.4 Recent State of the Art, New Materials Development and Applicability of the Different Media Types to Holographic Display Systems

The described materials constitute the major part of the non-silver recording media. For the aim of holographic 3D display realization, however, the recording material additionally has to be dynamic. Among the presented holographic media, photorefractive crystals and liquid crystalline and polymeric materials containing azo-groups or exhibiting phase transitions [55] are representatives of the dynamic recording materials. The photorefractive crystals are among the earliest media largely considered as storage media suitable for read-write hologram recording. Usually they are doped with transition metals such as iron or rare-earth ions like praseodymium, and grown in large cylinders in the same way as semiconductor materials; thus, large samples can be polished for thick hologram recording. In the last several years, photorefractive materials have continued to be a subject of intensive studies. Most of the available literature concerns LiNbO3 crystals. Some of the works are related to switching optimization [56] or to studying the dependence of the photorefractive effect on the composition and the light intensity [57]. Another direction of the recent investigations is connected to the utilization of new doping elements. In a cerium-doped crystal the photovoltaic constant is measured to be only one third of that of the iron-doped one [58]. Other dopants recently used in combination with supplementary elements are In and Cr [59, 60]; the higher the doping level of Cr, the larger the absorbance around 660 nm observed in double-doped LiNbO3:Cr:Cu crystals. Along with LiNbO3 crystals, LiTaO3 materials are also a subject of intensive investigation [61]. Similarly to LiNbO3, two-step infrared sensitization and recording in LiTaO3 crystals has been reported [62]; it has been realized via the pyroelectric effect. Nevertheless, the other types of crystals also continue to be studied [63], and new materials like Gd3Ga5O12 [64, 65] are being developed. The main advantage of photorefractive crystals is that no development of the holograms is required and all the processes are completely reversible.
Unfortunately, mainly the difficult crystal growth and sample preparation limit the applications of these materials. Also, the thickness of photorefractive crystals is typically several mm; for optical storage applications thicker media are preferable, but for realizing devices comprising layers and many elements this is rather undesirable. Thus, it would be an inconvenience in the 3D holographic display realization.

16.4.1 Photorefractive Organic Materials

In fact, all known photorefractive materials until 1990 were inorganic crystals. Photorefractivity in an organic crystal was first reported by the ETH Zurich group (in 1990) [66]. The material was a carefully grown nonlinear organic crystal, 2-cyclooctylamino-5-nitropyridine, doped with 7,7,8,8-tetracyanoquinodimethane. Although the growth of high-quality doped organic crystals is a difficult process, since most of the dopants are expelled during the crystal preparation [67], there have been some subsequent investigations of such media [68, 69]. On the other hand, polymeric and/or glassy materials can be doped relatively easily with various molecules of different sizes. Also, polymers may be formed into thin films of different shapes, as well as applied in total internal reflection and waveguide configurations according to the application requirements [67]. The first polymeric photorefractive material was composed of an optically nonlinear epoxy polymer, bisphenol-A-diglycidylether 4-nitro-1,2-phenylenediamine, which was made photoconductive by doping with 30 wt% of the hole transport agent diethylaminobenzaldehyde-diphenylhydrazone. The first publication of its application in holography is presented in [70]. Another approach is not to dope electro-optical polymers with charge transport molecules, but to synthesize a fully functionalized side-chain polymer with multifunctional groups [71, 72]. Nevertheless, a faster and easier to implement approach is the guest-host chemical design; it enables a better way to test different combinations of polymers and molecules with photosensitivity, transport and optical activity [13]. Along with the other advantages, a further motivation to elaborate photorefractive polymers comes from a particular figure-of-merit consideration comparing the refractive index changes possible in different materials for an equal density of trapped charges. It can be defined as

Q = n³ r_e / ε_r,
where n is the refractive index, r_e the effective electro-optical coefficient and ε_r the relative low-frequency dielectric constant. Q approximately measures the ratio of the optical nonlinearity to the screening of the internal space-charge distribution by medium polarization. It is established that for inorganic materials Q does not vary very much, due to the fact that the optical nonlinearity is driven mostly by the large ionic polarizability. In contrast, the
nonlinearity in organics is a molecular property arising from the asymmetry of the electronic charge distribution in the ground and excited states [25]. As a consequence, the large electro-optic coefficients are not accompanied by large DC dielectric constants. Thus, an improvement in Q by more than 10 times is possible with organic photorefractive materials. On a basic level, the recording mechanism in photorefractive polymers does not differ from the one in photorefractive crystals, but different constituents bring forth the required properties. Examples of good charge generators (providing the first process in photorefractive materials) are donor-acceptor charge transfer complexes like carbazole-trinitrofluorenone, fullerenes such as C60, or the dye aggregates well known in the photographic industry. In order to obtain dynamic recording, a reduction/oxidation process is required – the charge generation site has to oxidize back to its original state. In photorefractive polymers the holes are more mobile. The charge (hole) transport function is generally provided by a network of oxidizable molecules situated close enough to each other to provide hopping motion. Examples of such transporting molecules are carbazoles, hydrazones and arylamines, which are electron-rich and consequently have a low oxidation potential. Energetics requires the highest occupied energy level of the photogenerator to be lower than that of the transporting molecules. The physical processes initiating the charge transport are diffusion, as a consequence of charge density gradients, or drift in an externally applied electric field; both generally proceed by charge transfer from transport site to transport site. In most polymeric materials the ability of the generated charges to move by diffusion alone at zero electric field is quite limited, so drift in the applied field is the dominant mechanism for charge transport. The other element of the photorefractive effect, especially when longer grating lifetimes are desired, is the presence of trapping sites that hold the mobile charges. In polymer photorefractive materials, transporting molecules with lower oxidation potential are used as deep hole traps [73]. The first efforts to describe the photorefractive effect in polymers applied the standard one-carrier model used for inorganic crystals. According to this model the space charge field E_sc is expressed by [74]:

E_sc = mE_q(E_o + iE_d)/(E_q + E_d − iE_o),

where E_q = eN_A[1 − (N_A/N_D)]/(ε_oε_rK_G) is the trap-density-limited space charge field for wavevector K_G, ε_o is the permittivity of free space, m = 2√(I₁I₂)/(I₁ + I₂) is the modulation depth of the optical intensity pattern, N_D the density of donors, N_A the density of acceptors providing partial charge compensation, and E_d = k_BTK_G/e the diffusion field, where k_B is Boltzmann's constant.
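The expression above can be evaluated directly; the following sketch (Python; all densities, fields and beam intensities are assumed illustrative values, not taken from the chapter) computes the complex space-charge field:

```python
import numpy as np

# One-carrier model space-charge field Esc = m*Eq*(Eo + i*Ed)/(Eq + Ed - i*Eo);
# all material and beam parameters below are assumed, illustrative values.
e = 1.602e-19          # elementary charge, C
kB = 1.381e-23         # Boltzmann constant, J/K
eps0 = 8.854e-12       # vacuum permittivity, F/m

eps_r = 4.0            # relative dielectric constant (assumed)
N_A = 1e22             # acceptor density, m^-3 (assumed)
N_D = 1e24             # donor density, m^-3 (assumed)
K_G = 2 * np.pi / 2e-6 # grating wavevector for a 2 um period
I1, I2 = 1.0, 0.5      # writing beam intensities (arbitrary units)
E0 = 5e7               # applied field, V/m (~50 V/um bias, as quoted below)

m = 2 * np.sqrt(I1 * I2) / (I1 + I2)
Eq = e * N_A * (1 - N_A / N_D) / (eps0 * eps_r * K_G)
Ed = kB * 300 * K_G / e          # diffusion field at T = 300 K

Esc = m * Eq * (E0 + 1j * Ed) / (Eq + Ed - 1j * E0)
print(f"|Esc| = {abs(Esc):.2e} V/m, phase = {np.angle(Esc):.2f} rad")
```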
The corresponding equation for organic photorefractive materials [75] is quite similar, with the density of acceptors replaced by the density of traps and with additional field-dependent terms arising from the field dependence of the mobility and of the quantum efficiency. Moreover, several additional physical effects should be taken into account. They are connected to the presence of shallow traps along with deep ones, evidenced by the sublinear intensity dependence; the more complicated field dependence of the photogeneration efficiency, especially if sensitizers like C60 are used; as well as the different trapping mechanism. In fact, the main reason for the complexity of the photorefractive effect in polymers was observed a bit later. In 1994, Moerner and co-workers altered the picture of electro-optical nonlinearity by discovering an effect which does not exist in inorganic materials [30]. It is connected to orientational processes in the polymeric material as a consequence of the charge-field formation. The latest photorefractive polymers, strongly exhibiting these effects, showed significantly higher performance (diffraction efficiency up to 100%, compared to values of the order of several percent). It was achieved by improvement of the fabrication conditions and addition of a plasticizer. The plasticizer reduces the glass transition temperature, enabling orientation of the electro-optic chromophores. The reorientation of the birefringent chromophores enhances the refractive index modulation, which is fully reversible, reaching amplitude values ∼0.007 with response times of 100–500 ms. The other reason for the higher performance of these materials is the utilization of sample preparation conditions allowing application of higher electric fields. Thus, the year 1994 was a turning point in photorefractive polymer development. The chromophore design was changed – the Pockels effect was replaced by orientational birefringence as the main driving mechanism. At its basis is orientational photorefractivity, in which the refractive index of the material is modulated by controlling the orientation of optically anisotropic dopant molecules with a permanent dipole moment. It is a consequence of internal space charge field generation, driven by absorption, charge generation, separation and trapping, similarly to the traditional photorefractive materials.

16.4.2 Liquid Crystals

In the same year (1994), the first photorefractive liquid crystal materials were reported [76, 77]. The low-molar-mass liquid crystalline material 4′-pentyl-4-biphenylcarbonitrile, doped with small amounts of the sensitizing laser dye rhodamine 6G, was used. In fact, the ultimate extension of orientational photorefractivity is to consider materials consisting entirely of long, rod-shaped birefringent molecules which can be easily oriented in an external electric field – i.e. the liquid crystals [26]. It is well known that nematic liquid crystals possess large optical nonlinearities associated with director axis reorientation as a consequence of optical or electric field application. In many aspects
they are ideal for observing the photorefractive effect, due to their specific design for obtaining an orientational response. Also, no nonlinear dopant is necessary, since the liquid crystal is itself the birefringent component. It is essential that in liquid crystals 100% of the medium contributes to the birefringence, in contrast to the percentage of the nonlinear optical dopant in other systems. Furthermore, the field required for the molecular response to the space charge field is lower by an order of magnitude compared to polymers: the required field for photorefractive liquid crystal reorientation is ∼0.1 V/µm, while in polymer systems it is ∼50 V/µm. Along with these advantages, the figure of merit, similarly to polymers, was rapidly improved within several years as a consequence of the elaboration of new liquid crystal mixtures and the better understanding of the charge transfer processes. Usually the liquid crystal is sandwiched between two indium tin oxide (ITO)-coated glass slides treated with a surfactant to induce perpendicular alignment of the director (with respect to the face of the glass slides), i.e. homeotropic alignment [78]. The cell thickness is typically between 10 and 100 µm, fixed by Mylar spacers. The theoretical treatment is based on a steady-state solution assumption for the current density, resulting in the following expression for the diffusion field in the liquid crystal [76, 79]:

E_sc = −(mk_BTq/2e_o) [(D₊ − D₋)/(D₊ + D₋)] [σ_ph/(σ_ph + σ_d)] sin qx,
where m is the modulation index, σ_ph the photoconductivity, σ_d the dark conductivity, e_o the charge of the proton, and D₊ and D₋ the diffusion constants for the cations and anions, respectively. This equation determines the critical factors influencing the magnitude of the space-charge field: the photoconductivity relative to the dark conductivity, and the difference in the diffusion coefficients of the cations and anions, which allows one set of charges to preferentially occupy the illuminated or the dark regions of the interference pattern. Even today, the complete theoretical understanding of photorefractivity in polymers and liquid crystals is challenging. According to [49], a full theory should take into account effects like the mobility of various charge carriers, standard space-charge generation, space-charge fields due to optical fields and to conductivity and dielectric anisotropies [80], torques on the director axis, as well as flows and instabilities of the nematic liquid crystal. A large increase of the orientational photorefractive effect was first reported by Wiederrecht et al. [81]. It was obtained basically by two improvements: a eutectic liquid crystal mixture was used, along with an organic electron donor and acceptor combination with a well-defined and efficient photo-induced charge transfer mechanism. The eutectic mixture lowers the liquid-crystalline-to-solid phase transition and provides better photorefractive performance due to the greater reorientation angle of the molecules, a consequence of the lower orientational viscosity. The employed liquid crystals were low-molar-mass compounds with higher birefringence.
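Returning to the diffusion-field expression above: its amplitude is capped by the factor σ_ph/(σ_ph + σ_d) and by the ion diffusion asymmetry. A small sketch (Python; the asymmetry, grating period and conductivity ratios are assumed for illustration) makes the saturation with the photo-to-dark conductivity ratio explicit:

```python
import numpy as np

# Amplitude prefactor of the LC diffusion field from the expression above:
# |Esc| ~ (m*kB*T*q/2e) * (Dp-Dm)/(Dp+Dm) * s/(1+s), with s = sigma_ph/sigma_d.
# All parameter values are assumed for illustration.
kB, T, e = 1.381e-23, 300.0, 1.602e-19
q = 2 * np.pi / 2e-6      # grating wavevector for a 2 um period
m = 1.0                   # full modulation
asym = 0.3                # (D+ - D-)/(D+ + D-), assumed ion asymmetry

for s in (0.1, 1.0, 10.0, 100.0):   # photo-to-dark conductivity ratio
    Esc = (m * kB * T * q / (2 * e)) * asym * s / (1 + s)
    print(f"sigma_ph/sigma_d = {s:>6}: |Esc| ~ {Esc:.1f} V/m")
```

For these assumptions the field saturates near 10⁴ V/m (∼0.01 V/µm), consistent in order of magnitude with the reorientation fields quoted above.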
Another direction in these investigations is to use high-molecular-mass liquid crystals. New liquid crystal composites were developed containing both a low-molar-mass liquid crystal and a liquid crystal polymer, or a high-molar-mass liquid crystal [77, 82, 83]. These composites in many respects have the best photorefractive figures of merit for strictly nematic liquid crystals [26]. Net beam-coupling coefficients greater than 600 cm⁻¹ at applied voltages of ∼0.11 V/µm were obtained with intensities lower than 10 mW per beam from a He-Ne laser; the gratings operated in the Bragg regime. Later studies have examined the influence of the polymer liquid crystal molecular weight on the temporal and gain coefficients of photorefractive gratings [84]. An improvement in the response time was found for the lower molecular weight polymers; this fact was attributed to an overall decrease in the composite viscosity. The response time was shorter than 15 ms in the Bragg regime. Furthermore, the required applied voltage was lowered to ∼0.1 V/µm. In fact, the improvements in the holographic characteristics of these materials are connected to composite materials development. On the one hand, the possibility to combine the very large reorientational effects exhibited by low-molar-mass LCs with the longer grating lifetimes and higher resolution of nonmesogenic polymers is extremely attractive. On the other hand, the combination of liquid crystals and photopolymers enables the realization of switchable gratings, which find wide application in practice. These materials are recognized as polymer dispersed liquid crystals and will be discussed in detail in the following. In order to summarize the current performance of the organic photorefractive materials, some data on the obtained characteristics are adduced and compared to those of the photorefractive crystals. To gain efficient charge generation and thus high holographic sensitivity, it is required to sensitize the recording material to some proper laser wavelength. In contrast to inorganic crystals, the spectral sensitivity of photorefractive organics can be changed using proper dopants. During recent decades, numerous sensitizers have been developed [85]. As a result, the spectral sensitivity of these materials is nowadays tuned through the entire visible spectrum and the near infrared, up to 830 nm. The spatial resolution is similar to that of inorganic crystals. Parameters such as the dynamic range and the material stability are also comparable with photorefractive crystals. The last two characteristics are not directly connected to the requirements on a holographic display material, but mostly to storage applications. For the purpose of holographic display realization, a critical parameter is the time response of the medium. It is a rather complex process in photorefractive polymers and liquid crystals, depending on several factors, including photogeneration efficiency, drift mobility and the field-induced orientational dynamics of the chromophores; it is also strongly dependent on the applied electric field. Thus, the fundamental research concerning the photorefractive materials (inorganic crystals, polymers and liquid crystals) has pointed out the limits of
their applicability in real-time holography [86]. Inorganic photorefractive crystals and photorefractive polymers have a relatively slow response; in addition, the latter require bias voltages on the order of 10 kV across samples of typical thickness. Nevertheless, the latest photorefractive liquid crystals and polymer dispersed liquid crystals exhibit better performance. In conclusion, it can be pointed out that the most promising mechanism for holographic 3D display applications is the reorientation of birefringent chromophores. It can be driven by the creation of an internal electric field (in photorefractive media) or by optical cis–trans conformational processes (in dye-doped liquid crystals). As a consequence, liquid crystalline materials are currently considered the leading candidate for those applications that do not require long storage times. The molecular reorientational effect (cis–trans conformation) is a much faster process than charge generation, transport and trapping (as in the electro-optical effect in photorefractive materials); in this sense, it is preferable for 3D display applications. The dye-doped liquid crystal materials are in fact quite new. Their performance, and the very possibility of realizing holographic recording in them, is a consequence of the so-called Janossy effect. In 1990, Janossy discovered that the optical reorientation of liquid crystals can be enhanced by up to two orders of magnitude by doping with certain dichroic dyes, provided the dye molecule is excited anisotropically [87]. Such dyes are known to undergo photoinduced conformational changes, as is the case for azo-dyes [88]. The excellent dynamic performance achieved by dye-doped liquid crystals has attracted significant scientific interest. The possibility of employing azobenzene dye-doped twisted-nematic liquid crystals for recording polarization holographic gratings is currently being studied [89]; high diffraction efficiencies (exceeding 45%) have been obtained. The polarization is rotated when the laser beam is diffracted in the medium, and this rotation angle can be controlled through the twist angle of the sample cells. Layer undulations in cholesteric cells have also been used as switchable, weakly polarization-dependent 2D diffraction gratings of both Raman–Nath and Bragg types [90]. These experiments open a new possibility of extending liquid crystal applications through the use of chiral structures. Using transverse-periodically aligned nematic liquid crystals, polarization-induced switching between the diffraction orders of a transverse-periodic nematic LC cell has been realized [91]. Relatively new approaches to liquid crystalline materials also involve doping with carbon nanotubes, and combined fullerene-C60 and dye doping [92, 93]. The applicability of dye-doped liquid crystal materials to recording at relatively high spatial frequencies has been shown in [47].

16.4.3 Polymers

In fact, the enhancement of optical reorientation by dye doping is not limited to liquid crystals, but is also present in isotropic liquids and amorphous
polymers. Although polymers are most often regarded as a promising medium for high-density optical storage, and continue to be studied in this direction [94], they also find applications in real-time holography. Azo-containing materials are among the most studied dynamic polymeric holographic media. It is established that the modification of the optical properties of azo materials is due to the efficient photoisomerization of the –N=N– bond in the azobenzene group, initiated by the absorbed light; photoreorientation with polarized light is also well known [95, 96]. The molecular reorientation, a consequence of angular hole burning due to multiple trans–cis–trans photoisomerization cycles, leads to photoinduced birefringence and dichroism [97]. The reversible photoisomerizations can also initiate mass transport, resulting in surface relief formation (surface diffraction gratings) [98]. The polymer mass redistribution induced by an interference pattern of two laser beams takes place well below the polymer’s glass transition temperature. Different mechanisms have been proposed to explain the origin of surface relief gratings in azobenzene-functionalized polymers. They include thermal gradient mechanisms, asymmetric diffusion upon the creation of a concentration gradient [99], isomerization pressure [100], a mean-field theory (based on electromagnetic forces) [101], permittivity gradients [102] and gradient electric forces [103]. Besides surface relief creation, photochromic conversion has attracted strong interest [104]. Overall, both azobenzene LC and amorphous polymers exhibiting photoisomerization and surface relief creation show excellent holographic characteristics. The photo-isomerization mechanism allows a wider spectral sensitivity – up to 633 nm – while the surface relief materials usually work in the range 244–532 nm. The spatial frequencies achieved are 6000 and 3000 lines/mm, respectively. The refractive index modulation exceeds 0.1. Nevertheless, the pursuit of dynamic holographic materials is enlarging the range of available media, mostly through the development of composites that combine the advantages of different materials such as liquid crystals and various polymers.

16.4.4 Polymer Dispersed Liquid Crystals

Polymer dispersed liquid crystals (PDLC) are relatively new materials, developed over the last two decades. Although first considered for other applications, they have since found wide use in holographic recording. The first applications of PDLCs were the so-called “smart windows”, formed by liquid crystal droplets homogeneously distributed in a polymer matrix, whose optical behavior is electrically controlled. Later, the recording of switchable holographic gratings enabled broad application as holographic optical elements. Another direction in PDLC development is the use of the photorefractive effect to obtain reversible recording. The first descriptions of PDLCs were given by Fergason in 1985 [105] and by Doane [106] and Drzaic [107] in 1986. Their main advantage is the possibility of combining the unique properties of liquid crystals with the ability to
realize photoinduced processes in the medium, including optical recording. The structures consist of micron or sub-micron birefringent liquid crystal droplets embedded in an optically transparent polymer matrix. The structure is fixed during a phase separation process. Phase separation in PDLCs can be accomplished by several mechanisms. The thermal method relies on cooling a common solution of a thermoplastic material and a liquid crystal (TIPS – thermally induced phase separation). Another way is to use a common solvent and evaporate it (SIPS – solvent-induced phase separation). Nevertheless, the most established technique nowadays is the polymerisation of monomeric precursors homogenized with the liquid crystal (PIPS – polymerisation-induced phase separation), which can be initiated optically (by UV irradiation). The next stage consists of free-radical reactions initiating monomer-to-polymer conversion, which increases the polymer molecular weight in the presence of large volume fractions of liquid crystal. The final morphology consists of randomly dispersed liquid crystal domains whose form, volume proportion and size are determined by the intensity of the illuminating light, the volume ratio of the compounds in the pre-polymer mixture, and the temperature [108]. It is essential to note that the resulting morphology determines the subsequent electro-optical properties of the film. Depending on the liquid crystal concentration, two main types of morphology are observed after the phase separation process. For relatively low amounts of liquid crystal, the morphology is of the “Swiss cheese” type – spherical or ellipsoidal droplets completely surrounded by the polymer matrix. The other type of morphology consists of two continuous phases (polymeric and liquid crystalline) and is described as a “sponge” morphology. It is usually observed at liquid crystal concentrations exceeding 50%. A typical feature of this morphology is the coalescence of the liquid crystal droplets [109]. At a given liquid crystal concentration, the droplet size and distribution are determined by the polymerization kinetics. If the liquid crystal is extracted from the structure, the morphology can be observed by electron microscopy techniques. After the initial droplet formation, the droplet size increases during polymerization as a consequence of liquid crystal diffusion away from the areas where the polymer concentration (due to the monomer-to-polymer conversion) increases rapidly. The droplet size and distribution are determined not only by the diffusion process, but also by the propagation of the polymer network, which leads to “gelation” above a given molecular weight and matrix density. At that point, diffusion diminishes significantly and the droplet size (and shape) is fixed. Diameters from 0.02 to several micrometers are obtained. Control of the diameter is required to optimize the subsequent electro-optical properties of the material. The droplet distribution is random, except when a special surface treatment is performed in order to create a preliminary orientation of a layer of the material. Some kind of ordering is present within the
droplets – usually nematic – but the overall direction of the molecular axes (the director) differs from droplet to droplet. In general, the director configuration within a droplet depends on the surface interactions, the elastic constants of the liquid crystal, and the presence and amplitude of an externally applied field. In most cases, the optical axis (determined by the dipole moment) coincides with the molecular axis. As a consequence of the chaotic director distribution, the material is “opaque” and strongly scatters light, owing to the refractive index mismatch between the droplets and the polymer matrix. Usually, the ordinary refractive index of the liquid crystal is chosen to be similar to that of the polymer. This matching is exploited to “switch off” the highly scattering mode through the application of a sufficiently strong electric field. (The same effect can be observed under a magnetic field.) The field (electric, in practical applications) has to overcome the elastic forces of the liquid crystal in order to induce molecular reorientation. This reorientation corresponds to the Freedericksz transition [110], which explains its threshold behavior. The electric field is applied in such a manner that the incident light “sees” only the ordinary refractive index of the liquid crystal; in consequence, the material becomes transparent [111]. When the electric field is removed, the liquid crystal returns to its initial distribution, governed by the elastic forces. Thus, two states of the PDLC film are obtained: a highly scattering “switched off” state and a transparent “switched on” state. The average refractive index in the initial state is

n_0 = (n_e + 2n_o)/3.

When a sufficiently strong electric field is applied, the effective refractive index is n_o. As a consequence, the optical anisotropy obtained is

∆n_eff = n_0 − n_o = ∆n/3.

In the case of low volume fractions of liquid crystal, ∆n_eff is smaller than ∆n/3. To increase the optical anisotropy, the percentage content of liquid crystal in the pre-polymer mixture should be increased. Another advantage of PDLCs with higher liquid crystal concentration is the lower electric field required for director reorientation. It is precisely this control, through an applied electric field, of the scattering from the birefringent liquid crystal droplets that underlies one of the most attractive applications of PDLCs – display technology. Conventional liquid crystal displays made from twisted nematics remain, to some extent, relatively expensive and difficult to produce; they also require additional optical elements (polarizers). On the basis of controlled light scattering, PDLCs find application in optoelectronics as switchable transmission windows, temperature sensors, color filters with variable optical density, etc. [112].
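A quick numerical check of the refractive index relations above, using assumed but typical nematic indices close to the E7 figures quoted later in this section (∆n ∼ 0.21–0.28):

# Illustrative check of the PDLC index relations, with assumed values
# for a nematic such as E7: n_o ~ 1.52, n_e ~ 1.74.
n_o, n_e = 1.52, 1.74

n_0 = (n_e + 2 * n_o) / 3        # average index of a randomly oriented droplet
dn_eff = n_0 - n_o               # anisotropy switched by the field
dn = n_e - n_o                   # full birefringence of the liquid crystal

print(f"n_0 = {n_0:.3f}")        # ~1.593
print(f"dn_eff = {dn_eff:.3f}")  # ~0.073
print(f"dn/3 = {dn / 3:.3f}")    # ~0.073, equal as expected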
If such a solution of reactive monomers and liquid crystal is illuminated with a spatially modulated light distribution (an interference pattern), the exposure initiates a counter-diffusion process. It consists of liquid crystal transport to the dark regions of the interference pattern, driven by monomer diffusion to the bright areas and by the growth of the polymer network in the bright regions. The monomer diffusion is initiated by the concentration gradient due to the monomer-to-polymer conversion in the bright regions. The phase separation process takes place in the dark parts of the material, where the liquid crystal is confined in droplets with sizes usually smaller than 0.5 µm. The liquid crystal droplets have randomly oriented directors. This method of forming holographic structures was first utilized by Sutherland and co-workers in 1993, who reported the recording of transmission diffraction gratings [113]. The following year, the recording of reflection diffraction gratings was also realized [14]. The holographically formed PDLC structure consists of polymer-rich and liquid-crystal-rich layers, corresponding to the light distribution. Again, scattering occurs due to the refractive index mismatch; owing to the periodicity of the structure, this scattering is coherent and leads to reconstruction of the holographic information. Thus, the medium exhibits phase modulation, and the structures are known as holographic polymer dispersed liquid crystals (HPDLC). As in conventional PDLCs, the ordinary refractive index of the liquid crystal is chosen to match that of the polymer. As a result, the application of an electric field switches off the diffraction structure, since the index modulation disappears. Again, when the electric field is removed, the liquid crystal restores its initial configuration, governed by the elastic forces. The consequence of this mechanism is the reversible switching of the diffraction grating [114]. Sensitizing HPDLC in the visible spectral range – in practice, at suitable laser wavelengths – is accomplished by adding an appropriate combination of dye and photoinitiator. The role of the dye is to shift the absorption peak of the material into the desired spectral range, while the photoinitiator is important for initiating the free-radical polymerization processes. The process is considered to proceed by the following mechanism [115, 116]: photon absorption is accompanied by excitation of the dye molecule; an electron is then transferred from the excited dye molecule to the initiator, usually belonging to the group of amines, so that a pair of ion radicals is formed; this is immediately followed by proton transfer from the co-initiator to the anion radical of the initiator. As a result, a neutral amine radical is obtained, which initiates the photopolymerization. The co-initiator concentration significantly influences the efficiency of free-radical formation; higher efficiency results in a higher polymerization rate, which in turn affects the size and anisotropy of the droplets. In order to obtain high diffraction efficiency along with high spatial resolution in the case of reflection diffraction gratings, a morphology with a high concentration of small liquid crystal droplets is required [117]. Multifunctional monomers exhibit extremely fast photopolymerization – the necessary time is on the order of seconds. Another feature
is that they form a highly crosslinked network. As a result, liquid crystal droplet growth is limited and the droplet size does not exceed 0.5 µm. Usually two kinds of monomers are used in HPDLC recipes. One type polymerizes by free-radical bond opening (addition polymerization), while the other exhibits a combination of free-radical and step reactions. Acrylate monomers with functionality higher than 4 satisfy the requirement of achieving significant molecular weight within several seconds. Urethane derivatives with functionality between 2 and 6 are also used. N-vinyl pyrrolidinone (NVP) is often used as a reactive diluent to homogenize the initial mixture. Another class of monomers is the commercially available Norland resins, of which the most widely used is NOA 65 (Norland Optical Adhesive). The other basic component of the PDLC pre-polymer mixture is the liquid crystal. The most widely used liquid crystals are nematics possessing positive anisotropy; high values of ∆n and ∆ε are required. An important criterion for the choice of material is the match between its ordinary refractive index and that of the polymer. Some of the most often employed liquid crystals are E7 and the BL series, with ∆n and ∆ε values of ∼0.21–0.28 and ∼13–18, respectively [10]. An advantage of these liquid crystals is their good compatibility with the acrylate and NOA monomers. Another class is the TL series; these have limited solubility, but are distinguished by good stability, resistance and low driving voltages. Another approach to decreasing the driving voltages is to add surfactant-like compounds (such as octanoic acid), whose role is to reduce the surface interactions. Since the morphology determines the properties of the film to a major degree [118], the specific organization inside the droplets is an important object of investigation. One of the most frequently applied methods is transmission imaging with a polarization microscope. The birefringent liquid crystal droplets change the polarization state of the light: linear polarization is converted to elliptical, and from the degree of polarization rotation the organization inside the droplet with respect to the optical field can be estimated. Three different nematic organizations are distinguished. Radial and axial configurations are a consequence of normal anchoring at the droplet surface (homeotropic alignment); in the case of tangential (homogeneous) alignment of the liquid crystal molecules at the droplet surface, the configuration is bipolar [119]. Morphological investigations are also performed by scanning and transmission electron microscopy [120]. These provide information on the droplet distribution in the polymer matrix, but not on the configuration inside the droplets. The organization inside a droplet is determined by parameters such as droplet size and shape, as well as by the surface interactions with the surrounding polymer matrix. These parameters depend on the specific compounds used as well as on the recording kinetics. It should be pointed out that HPDLC possess very attractive properties as a medium for switchable holographic recording, as is apparent from the active investigations carried out by many research groups in recent years.
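As an order-of-magnitude guide to the driving voltages just mentioned, the classical Freedericksz threshold of a planar cell can be computed from the elastic constant and the dielectric anisotropy. The values below are assumptions for illustration; confined droplets with strong surface anchoring require substantially higher fields.

import math

EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)

def freedericksz_threshold_voltage(k_elastic, delta_eps):
    """Threshold voltage (V) of the Freedericksz transition in a planar cell:
    V_th = pi * sqrt(K / (eps0 * delta_eps)); notably independent of thickness."""
    return math.pi * math.sqrt(k_elastic / (EPS0 * delta_eps))

# Assumed values: splay elastic constant K ~ 10 pN, delta_eps ~ 15 (E7/BL-like).
v_th = freedericksz_threshold_voltage(k_elastic=10e-12, delta_eps=15)
print(f"V_th ~ {v_th:.2f} V")  # ~0.9 V for a free planar cell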
Mainly the high degree of refractive index modulation, the volume character of the recorded gratings, and the unique anisotropic properties and electro-optical behavior attract scientific attention. In addition, the whole recording process consists of a single step and allows the application of different optical schemes and geometries. As a consequence of the development and optimization of new HPDLC materials, the following holographic characteristics have been obtained:

• Spectral sensitivity over almost the whole visible range, as well as in the infrared (770–870 nm), through the use of different dyes;
• Holographic sensitivity (S) exceeding 3 × 10³ cm/J;
• Spatial frequency > 6000 mm⁻¹;
• Refractive index modulation ∆n ∼ 0.05 (see the illustrative calculation below).
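Since these gratings operate in the Bragg regime, Kogelnik’s coupled-wave result for a lossless transmission phase grating relates such figures to the achievable diffraction efficiency. The sketch below is illustrative only; the layer thickness and the effective modulation n1 (typically well below the peak ∆n above) are assumed values.

import math

def kogelnik_efficiency(n1, thickness, wavelength, bragg_angle_rad=0.0):
    """Diffraction efficiency of a lossless transmission phase grating at Bragg
    incidence (Kogelnik): eta = sin^2(pi * n1 * d / (lambda * cos(theta)))."""
    nu = math.pi * n1 * thickness / (wavelength * math.cos(bragg_angle_rad))
    return math.sin(nu) ** 2

# Assumed example: effective modulation n1 = 0.015 over a 10 um HPDLC layer,
# read out at 532 nm near normal incidence.
eta = kogelnik_efficiency(n1=0.015, thickness=10e-6, wavelength=532e-9)
print(f"eta ~ {eta:.0%}")  # ~60%; efficiency peaks at 100% when nu = pi/2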
Owing to these characteristics, HPDLC are nowadays intensively investigated for a wide variety of practical applications. At the same time, the fundamental problems of optimizing material performance [121, 122] and the underlying physical processes in such systems [123, 124] are also the object of extensive investigation. The study of the mesophase confined in small volumes, where surface interactions play a major role, is an interesting and topical problem. There is no exact theoretical treatment of the simultaneous photopolymerization, phase separation and mass transfer processes responsible for the diffraction structures in HPDLC. On the other hand, HPDLC find wide application as holographic optical elements in areas such as photonic crystals [125], high-density information recording [126], electrically controlled diffractive elements [127], tunable-focus lenses [128], electro-optical filters [129], and interconnectors and other elements for fiber optics [127]. They have recently been used as elements in information security systems [130] and as feedback elements of compact lasers in order to switch the generated wavelength [131]. Polarization holographic gratings in PDLC have also been reported [132]. First considered for display applications, HPDLC remain one of the most attractive candidates for various approaches to color and, especially, 3D displays. HPDLC enable color separation, applicable also to image capture [133]. Another approach is to use waveguide holograms [134]. Investigations of HPDLC in total internal reflection geometries have already been performed. Slanted transmission diffraction gratings, in which the applied electric field controls the total internal reflection conditions for the horizontally polarized light vector (electric vector parallel to the plane of incidence), have been realized [135]. Stetson and Nassenstein holographic gratings, representing total internal reflection and evanescent wave recording in extremely thin layers, have also been successfully recorded [136, 137]. The application of HPDLC in these numerous and diverse areas is a consequence of the possibility of creating different morphologies through the choice of compounds, concentration changes, and the recording geometry and kinetics.
As mentioned above, another direction of PDLC development is the synthesis of photorefractive polymer dispersed liquid crystals (PR PDLC) as a medium for reversible holographic recording. Their development follows from that of photorefractive organic materials. In PR PDLC the polymer typically provides the photoconductive properties required for the formation of the space-charge field, while the liquid crystal provides the refractive index modulation through its orientational nonlinearity. The major advantage of PR PDLC is that they require a significantly lower electric field than polymer composites. The first PR PDLC systems were reported in 1997 by two groups [118, 138, 139]. The polymer/liquid crystal mixtures were similar, based on PMMA (poly-methyl methacrylate) polymers and the liquid crystals E49 and E44; the sensitizers employed were different. The recorded gratings also differed depending on the regime obtained – Bragg or Raman–Nath. Since these first experiments, the performance of PR PDLC materials has been considerably improved: internal diffraction efficiencies approaching 100% at applied voltages of only about 8 V/µm have been obtained [140, 141]. Despite such promising characteristics, the remaining weak points of PR PDLC are the relatively high scattering loss and the slow photorefractive dynamics due to the low mobility. The scattering losses are connected to the relatively high liquid crystal concentrations, which result in larger droplets than in HPDLC. Although these disadvantages, most notably the response time, were successfully overcome by substituting PVK (poly-N-vinylcarbazole) for PMMA, this came at the cost of a low diffraction efficiency of only a few percent [142]. As a result, a number of physical studies have been conducted on various PR PDLC systems in order to gain a better understanding of the photorefractive mechanism and to optimize their performance [143, 144, 145, 146].
16.5 Conclusion

The realization of a 3D holographic display is still a challenging task. It requires encoding of the 3D scene in terms of optical diffraction, transformation into the fringe patterns of the hologram, signal conversion for a spatial light modulator, and display in real time [1]. The ultimate element of such a device should be a fast dynamic holographic material possessing high spatial resolution capability. Another problem is that the available spatial light modulators scarcely satisfy the demands of holographic display systems. Since the critical point is their poor spatial resolution, the most probable solution is to synthesize the whole diffraction structure in parts, i.e. to transfer the diffraction structure from the spatial light modulator to a reversible recording medium by multiplication (the QinetiQ concept). Thus, the final device should comprise a certain number of elements, including switchable diffractive optics and reversible recording media. The best
candidates for the switchable optical elements seem to be the nano-sized composite polymer dispersed liquid crystals. They possess the main advantages of organic media: simple (dry), one-step processing; high sensitivity; suitable mechanical characteristics (plasticity) allowing easy integration into different compact devices; and high signal-to-noise ratio and spatial resolution. Recently, most efforts have been directed at improving the electro-optical performance, employing total internal reflection holographic recording set-ups, and developing new PDLC mixtures. An ideal reversible material, however, is still lacking. As mentioned above, the difficult crystal growth and sample preparation limit the applications of photorefractive crystals; another disadvantage is their relatively high price. Despite the progress in photorefractive organic materials, a number of challenges remain. Among them is the need to optimize each material individually, owing to the inability to maximize steady-state and dynamic performance at the same time. Sub-millisecond response times have not yet been achieved. Overall, the ideal material should simultaneously have low operating voltages and a fast response. Moreover, no complete theoretical treatment exists. Other promising candidates are the dye-doped liquid crystals and photochromic materials. An advantage of these recording media is the absence of an electric field in the write and read processes. The required properties of this class of materials can be summarized as follows: thermal stability of both isomers; resistance to fatigue during cyclic write and erase processes; fast response; high sensitivity [104]. Another approach is to use biological materials, taking advantage of properties refined by natural evolution. One such material is the photochromic retinal protein bacteriorhodopsin, contained within the purple membrane of haloarchaea species, usually encountered in hypersaline environments [147, 148]. A possible problem – or perhaps an advantage – is the natural evolution of the biological species depending on the ambient conditions, and the resulting change in their properties. Spectral sensitivity in the range 520–640 nm, along with very high S values exceeding 10⁶ cm/J, has been obtained; the spatial resolution is higher than 1000 lines/mm, and more than 10⁶ write–erase cycles have been achieved. To illustrate and compare some of the material types presented in the text, selected holographic characteristics are summarized in Table 16.1. Materials for both permanent and dynamic (reversible and switchable) holographic recording are considered. Again, we emphasize the important application of holographic optical elements (both permanent and switchable) in the area of 3DTV. According to a statement by the 3DTV NoE project coordinator Levent Onural in a BBC interview, purely holographic television is expected within the next 10–15 years. On the other hand, autostereoscopic displays for 3DTV are already commercially available. Thus, the main challenge is to realize a screen with multiple viewing zones, where a holographic technique has certain advantages over lenticular systems.
Table 16.1. Comparison between some holographic characteristics of different recording materials. Each entry lists: volume (V) or surface (S) recording; spectral range; holographic sensitivity S (cm/J); spatial frequency (lines/mm); response time*; driving voltage**; refractive index modulation ∆n; thickness (µm); rewritability/number of read cycles; temperature range (°C); lifetime.

PERMANENT STORAGE
Silver halide: V/S; < 1100 nm; S > 1100; up to 10000 lines/mm; ∆n 0.02; 7–20 µm; not rewritable; < 100 °C; lifetime: years.
Dichromated gelatin: V; < 700 nm; S ∼ 100; > 5000 lines/mm; ∆n 0.022; 15–35 µm; not rewritable; < 200 °C; lifetime: years.
Photopolymers: V; 514, 532 and 650–670 nm; S 0.5–6.7 · 10³; > 5000 lines/mm; ∆n 0.012; 5–500 µm; not rewritable; < 100 °C***; lifetime: > 10 years.

DYNAMIC RECORDING
LiNbO3: V; S 0.02–0.1 (up to 40); > 2000 lines/mm; response 0.5–20 s; driving field ∼ kV/cm; ∆n 2 · 10⁻³; thickness > 10000 µm; rewritable; < 500 °C; lifetime: years.
LiTaO3: V; 350–650 and 800–1000 nm; S 30–3000; > 2000 lines/mm; response 0.1–20 s; driving field ∼ kV/cm; ∆n 10⁻³; thickness > 10000 µm; rewritable; < 450 °C; lifetime: years.
KNbO3: V; 300–550 and 400–900 nm; S 30–3000; > 2000 lines/mm; response 1 ms–1 s; driving field ∼ kV/cm; ∆n 10⁻⁴; thickness > 10000 µm; rewritable; 50–200 °C; lifetime: years.
Sn2P2S6: V; 550–1100 nm; S 1000–5000; > 2000 lines/mm; response 0.5–500 ms; driving field ∼ kV/cm; ∆n 3 · 10⁻⁴; thickness > 10000 µm; rewritable; < 66 °C; lifetime: years.
Azobenzene LC and amorphous polymers (photo-isomerisation): S/V; 488, 514, 532 and 633 nm; S ∼ 10²; > 6000 lines/mm; response ∼ 10² s; ∆n 0.1; 2–10 µm; rewritable; < 80–120 °C***; lifetime: years.
Azobenzene LC and amorphous polymers (surface reliefs): S/V; 244–532 nm; S ∼ 10²; > 3000 lines/mm; response ∼ 10² s; ∆n 0.1; 3–5 µm; rewritable; < 80–120 °C***; lifetime: years.
Photochromics: S/V; visible; S 3 · 10²; > 1600 lines/mm; response ∼ ms; ∆n 10⁻³; thickness > 100 µm; > 10⁶ read cycles; lifetime: years.
PDLC (PIPS/TIPS/photorefraction): S/V; 360–532 and 770–870 nm; S > 3 · 10³; > 6000 lines/mm; response ∼ ms; driving voltage ∼ 10 V/µm; ∆n 0.05; 20–100 µm; not rewritable for PIPS/TIPS, rewritable for PR; < 45–100 °C***; lifetime: years.
Dye-doped nematic: S/V; 440–514 nm; S 3 · 10³; > 1000 lines/mm; response ∼ ms; driving voltage 0.1 V/µm (PR); ∆n 0.1; 10–20 µm; rewritable; < 48–95 °C; lifetime: years.
Bacteriorhodopsin in gelatine matrix: V/S; 520–640 nm; S 4.7 · 10⁶; > 1000 lines/mm; response ∼ ms; ∆n 2 · 10⁻³; 30–40 µm; rewritable, > 10⁶ read cycles; −20 to 40 °C; lifetime: > 10 years.

* For dynamic media only (for materials exhibiting permanent storage this is usually the time needed to obtain any diffracted signal from the hologram).
** Where an electric field is employed to switch the structure.
*** Strongly dependent on the molecular weight and the polymer type.
Most probably, investigations in the field of materials for displays and switchable diffractive devices will in the near future concentrate on nanoparticle–liquid crystal composites. Recently developed nanoparticle dispersions have shown excellent holographic characteristics [149, 150]. Their main advantage is the possibility of obtaining extremely high refractive index modulation, since materials like TiO2 have refractive indices of almost 3. Low shrinkage and good sensitivity are also obtained. In general, the process is similar to HPDLC grating formation: the photopolymerization process initiates mass transfer of the components. The challenge is to combine low-energy-consumption liquid crystal devices with the possibility of enhancing the modulation by nanoparticle redistribution, and to obtain the formation of reversible diffractive structures.
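A crude linear-mixing estimate illustrates why such high-index nanoparticles are attractive; both the mixing rule and the numbers below are simplifying assumptions (a proper treatment would use an effective-medium model such as Maxwell Garnett).

def linear_mixing_index(fill_fraction, n_particle, n_host):
    """Very crude effective index of a nanoparticle-polymer composite via
    linear volume-fraction mixing (illustrative only)."""
    return fill_fraction * n_particle + (1 - fill_fraction) * n_host

# Assumed: TiO2-like particles (n ~ 2.5 in the visible) in a polymer host
# (n ~ 1.5); holographic exposure redistributes the particles between fringes.
n_depleted = linear_mixing_index(0.05, 2.5, 1.5)  # particle-depleted regions
n_enriched = linear_mixing_index(0.25, 2.5, 1.5)  # particle-enriched regions
print(f"index contrast ~ {n_enriched - n_depleted:.2f}")  # ~0.20 here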
Acknowledgement

This work is supported by the EC within FP6 under Grant 511568 with the acronym 3DTV.
References

1. V. Sainov, E. Stoykova, L. Onural, H. Ozaktas, Proc. SPIE, 6252, 62521C (2006).
2. J.R. Thayn, J. Ghrayeb, D.G. Hopper, Proc. SPIE, 3690, 180 (1999).
3. I. Sexton, P. Surman, IEEE Signal Process., 16, 85 (1999).
4. J. Kollin, S. Benton, M.L. Jepsen, Proc. SPIE, 1136, 178 (1989).
5. J. Thayn, J. Ghrayeb, D. Hopper, Proc. SPIE, 3690, 180 (1999).
6. T. Shimobaba, T. Ito, Opt. Rev., 10, 339 (2003).
7. D. Dudley, W. Duncan, J. Slaughter, Proc. SPIE, 4985, 14 (2003).
8. www.holographicimaging.com.
9. www.qinetiq.com.
10. T.J. Bunning, L.V. Natarajan, V. Tondiglia, R.L. Sutherland, Annu. Rev. Mater. Sci., 30, 83 (2000).
11. V. Sainov, N. Mechkarov, A. Shulev, W. De Waele, J. Degrieck, P. Boone, Proc. SPIE, 5226, 204 (2003).
12. S. Guntaka, V. Sainov, V. Toal, S. Martin, T. Petrova, J. Harizanova, J. Opt. A: Pure Appl. Opt., 8, 182 (2006).
13. H. Coufal, D. Psaltis, G. Sincerbox, Holographic Data Storage, Springer: Berlin, (2000).
14. K. Tanaka, K. Kato, S. Tsuru, S. Sakai, J. Soc. Inf. Disp., 2, 37 (1994).
15. R. Collier, C. Burckhardt, L. Lin, Optical Holography, Academic Press: New York, London (1971).
16. T. Petrova, P. Popov, E. Jordanova, S. Sainov, Opt. Mater., 5, (1996).
17. H. Bjelkhagen, Silver Halide Recording Materials for Holography and Their Processing, Springer-Verlag, Heidelberg, New York (1995); Vol. 66.
18. Ts. Petrova, N. Tomova, V. Dragostinova, S. Ossikovska, V. Sainov, Proc. SPIE, 6252, 155 (2006).
19. K. Buse, Appl. Phys. B, 64, 391 (1997).
20. V. Mogilnai, Polymeric Photosensitive Materials And Their Application (in Russian), BGU: (2003).
21. G. Ponce, Tsv. Petrova, N. Tomova, V. Dragostinova, T. Todorov, L. Nikolova, J. Opt. A: Pure Appl. Opt., 6, 324 (2004).
22. T. Yamamoto, M. Hasegawa, A. Kanazawa, T. Shiono, T. Ikeda, J. Mater. Chem., 10, 337 (2000).
23. S. Blaya, L. Carretero, R. Madrigal, A. Fimia, Opt. Mater., 23, 529 (2003).
24. P.S. Drzaic, Liquid Crystal Dispersions, World Scientific: Singapore, (1995).
25. O. Ostroverkhova, W.E. Moerner, Chem. Rev., 104, 3267 (2004).
26. G.P. Wiederrecht, Annu. Rev. Mater. Res., 31, 139 (2001).
27. A. Ashkin, G.D. Boyd, J.M. Dziedzic, R.G. Smith, A.A. Ballman, et al., Appl. Phys. Lett., 9, 72 (1966).
28. C.R. Giuliano, Phys. Today, April, 27 (1981).
29. K. Buse, Appl. Phys. B, 64, 273 (1997).
30. W. Moerner, S. Silence, F. Hache, G. Bjorklund, J. Opt. Soc. Am. B, 11, 320 (1996).
31. P. Gunter, J. Huignard, Photorefractive Effects and Materials, Springer-Verlag: New York, (1988); Vol. 61–62.
32. J. Amodei, W. Phillips, D. Staebler, Appl. Opt., 11, 390 (1972).
33. G. Peterson, A. Glass, T. Negran, Appl. Phys. Lett., 19, 130 (1971).
34. E. Krätzig, K. Buse, Two-Step Recording in Photorefractive Crystals, In Photorefractive Materials and their Applications, P. Günter, J.P. Huignard (Eds.), Springer-Verlag: Berlin, Heidelberg (2006).
35. H. Kröse, R. Scharfschwerdt, O.F. Schirmer, H. Hesse, Appl. Phys. B, 61, 1 (1995).
36. A.A. Grabar, I.V. Kedyk, M.I. Gurzan, I.M. Stoika, A.A. Molnar, Yu.M. Vysochanskii, Opt. Commun., 188, 187 (2001).
37. V. Marinova, M. Hsieh, S. Lin, K. Hsu, Opt. Commun., 203, 377 (2003).
38. V. Marinova, Opt. Mater., 15, 149 (2000).
39. V. Marinova, M. Veleva, D. Petrova, I. Kourmoulis, D. Papazoglou, A. Apostolidis, E. Vanidhis, N. Deliolanis, J. Appl. Phys., 89, 2686 (2001).
40. K. Buse, H. Hesse, U. van Stevendaal, S. Loheide, D. Sabbert, E. Krätzig, Appl. Phys. A, 59, 563 (1994).
41. F. Simoni, Nonlinear Optical Properties of Liquid Crystal and Polymer Dispersed Liquid Crystals, World Scientific: Singapore, (1997).
42. W.H. de Jeu, Physical Properties of Liquid Crystalline Materials, Gordon and Breach: New York, (1980).
43. I.C. Khoo, F. Simoni, Physics of Liquid Crystalline Materials, Gordon and Breach: Philadelphia, (1991).
44. P.G. de Gennes, The Physics of Liquid Crystals, Oxford University Press, London, (1974).
45. L. Blinov, Electro and Magnitooptics of Liquid Crystals, Nauka: Moscow, (1978).
46. S. Slussarenko, O. Francescangeli, F. Simoni, Y. Reznikov, Appl. Phys. Lett., 71, 3613 (1997).
47. F. Simoni, O. Francescangeli, Y. Reznikov, S. Slussarenko, Opt. Lett., 22, 549 (1997).
48. H. Ono, T. Sasaki, A. Emoto, N. Kawatsuki, E. Uchida, Opt. Lett., 30, 1950 (2005).
49. I. Khoo, IEEE J. Quantum Electron., 32, 525 (1996).
50. Y. Wang, G. Carlisle, J. Mater. Sci: Mater. Electron., 13, 173 (2002).
51. T. Kosa, I. Janossy, Opt. Lett., 20, 1230 (1995).
52. T. Galstyan, B. Saad, M. Denariez-Roberge, J. Chem. Phys., 107, 9319 (1997).
53. I. Khoo, S. Slussarenko, B. Guenther, M. Shin, P. Chen, W. Wood, Opt. Lett., 23, 253 (1998).
54. S. Martin, C. Feely, V. Toal, Appl. Opt., 36, 5757 (1997).
55. T. Yamamoto, M. Hasegawa, A. Kanazawa, T. Shiono, T. Ikeda, J. Mater. Chem., 10, 337 (2000).
56. L. Ren, L. Liu, D. Liu, J. Zu, Z. Luan, Opt. Lett., 29, 186 (2003).
57. W. Yan, Y. Kong, L. Shi, L. Sun, H. Liu, X. Li, Di. Zhao, J. Xu, S. Chen, L. Zhang, Z. Huang, S. Liu, G. Zhang, Appl. Opt., 45, 2453 (2006).
58. X. Yue, A. Adibi, T. Hudson, K. Buse, D. Psaltis, J. Appl. Phys., 87, 4051 (2000).
59. Q. Li, X. Zhen, Y. Xu, Appl. Opt., 44, 4569 (2005).
60. Y. Guo, L. Liu, D. Liu, S. Deng, Y. Zhi, Appl. Opt., 44, 7106 (2005).
61. M. Muller, E. Soergel, K. Buse, Appl. Opt., 43, 6344 (2004).
62. H. Eggert, J. Imbrock, C. Bäumer, H. Hesse, E. Krätzig, Opt. Lett., 28, 1975 (2003).
63. V. Marinova, S. Lin, K. Hsu, M. Hsien, M. Gospodinov, V. Sainov, J. Mater. Sci: Mater. Electron., 14, 857 (2003).
64. J. Carns, G. Cook, M. Saleh, S. Guha, S. Holmstrom, D. Evans, Appl. Opt., 44, 7452 (2005).
65. M. Ellabban, M. Fally, R. Rupp, L. Kovacs, Opt. Express, 14, 593 (2006).
66. K. Sutter, P. Gunter, J. Opt. Soc. Am. B, 7, 2274 (1990).
67. W. Moerner, A. Grunnet-Jepsen, C. Thompson, Annu. Rev. Mater. Res., 27, 585 (1997).
68. J. Hulliger, K. Sutter, R. Schlesser, P. Gunter, Opt. Lett., 18, 778 (1993).
69. G. Knopfle, C. Bosshard, R. Schlesser, P. Gunter, IEEE J. Quantum Electron., 30, 1303 (1994).
70. S. Ducharme, J. Scott, R. Twieg, W. Moerner, Phys. Rev. Lett., 66, 1846 (1991).
71. L. Yu, W. Chan, Z. Bao, S. Cao, Macromolecules, 26, 2216 (1992).
72. B. Kippelen, K. Tamura, N. Peyghambarian, A. Padias, H. Hall, Phys. Rev. B, 48, 10710 (1993).
73. G. Malliaras, V. Krasnikov, H. Bolink, G. Hadziioannou, Appl. Phys. Lett., 66, 1038 (1995).
74. G. Valley, F. Lam, Photorefractive Materials and Their Applications I, P. Gunter, J. Huignard (Eds.), Springer Verlag: Berlin, (1988).
75. J. Schildkraut, A. Buettner, J. Appl. Phys., 72, 1888 (1992).
76. E. Rudenko, A. Shukhov, J. Exp. Theor. Phys. Lett., 59, 142 (1994).
77. I. Khoo, H. Li, Y. Liang, Opt. Lett., 19, 1723 (1994).
78. I. Khoo, Liquid Crystals: Physical Properties and Nonlinear Optical Phenomena, Wiley: New York, (1995).
79. N. Tabiryan, A. Sukhov, B. Zeldovich, Mol. Cryst. Liq. Cryst., 136, 1 (1986).
80. G. Wiederrecht, B. Yoon, M. Wasielewski, Science, 270, 1794 (1995).
81. G. Wiederrecht, B. Yoon, M. Wasielewski, Science, 270, 1794 (1995).
82. H. Ono, N. Kawatsuki, Opt. Lett., 24, 130 (1999).
83. H. Ono, T. Kawamura, N. Frias, K. Kitamura, N. Kawatsuki, H. Norisada, Adv. Mater., 12, (2000).
84. H. Ono, A. Hanazawa, T. Kawamura, H. Norisada, N. Kawatsuki, J. Appl. Phys., 86, 1785 (1999).
85. K. Law, Chem. Rev., 93, 449 (1993).
86. S. Bartkiewicz, K. Matczyszyn, K. Janus, Real time holography - materials and applications, In EXPO 2000, Hannover, (2000).
87. I. Janossy, A.D. Lloyd, B. Wherrett, Mol. Cryst. Liq. Cryst., 179, 1 (1990).
88. K. Ichimura, Chem. Rev., 100, 1847 (2000).
89. H. Ono, T. Sasaki, A. Emoto, N. Kawatsuki, E. Uchida, Opt. Lett., 30, 1950 (2005).
90. B. Senyuk, I. Smalyukh, O. Lavrentovich, Opt. Lett., 30, 349 (2005).
91. H. Sarkissian, S. Serak, N. Tabiryan, L. Glebov, V. Rotar, B. Zeldovich, Opt. Lett., 31, 2248 (2006).
92. I. Khoo, Opt. Lett., 20, 2137 (1995).
93. W. Lee, C. Chiu, Opt. Lett., 26, 521 (2001).
94. L. Dhar, MRS Bulletin, 31, 324 (2006).
95. A. Osman, M. Fischer, P. Blanche, M. Dumont, Synth. Metals, 115, 139 (2000).
96. J. Delaire, K. Nakatani, Chem. Rev., 100, 1817 (2000).
97. A. Sobolewska, A. Miniewicz, E. Grabiec, D. Sek, Cent. Eur. J. Chem., 4, 266 (2006).
98. A. Natansohn, P. Rochon, Photoinduced motions in azobenzene-based polymers, In Photoreactive Organic Thin Films, Z. Sekkat, W. Knoll (Eds.), Academic Press, San Diego, (2002); pp. 399.
99. P. Lefin, C. Fiorini, J. Nunzi, Opt. Mater., 9, 323 (1998).
100. C. Barrett, A. Natansohn, P. Rochon, J. Chem. Phys., 109, 1505 (1998).
101. I. Naydenova, L. Nikolova, T. Todorov, N. Holme, P. Ramanujam, S. Hvilsted, J. Opt. Soc. Am. B, 15, 1257 (1998).
102. O. Baldus, S. Zilker, Appl. Phys. B, 72, 425 (2001).
103. J. Kumar, L. Li, X. Jiang, D. Kim, T. Lee, S. Tripathy, Appl. Phys. Lett., 72, 2096 (1998).
104. E. Kim, J. Park, S. Cho, N. Kim, J. Kim, ETRI J., 25, 253 (2003).
105. J.L. Fergason, SID Int. Symp. Dig. Tech. Pap., 16, 68 (1985).
106. J.W. Doane, N.A. Vaz, B.-G. Wu, S. Žumer, Appl. Phys. Lett., 48, 269 (1986).
107. P.S. Drzaic, J. Appl. Phys., 60, 2142 (1986).
108. T.J. Bunning, L.V. Natarajan, V.P. Tondiglia, G. Dougherty, R.L. Sutherland, J. Polym. Sci., Part B: Polym. Phys., 35, 2825 (1997).
109. T.J. Bunning, L.V. Natarajan, V. Tondiglia, R.L. Sutherland, Polymer, 36, 2699 (1995).
110. P.G. De Gennes, J. Prost, The Physics of Liquid Crystals, (2nd ed.), Oxford University Press: New York, (1993).
111. J.W. Doane, N.A. Vaz, B.-G. Wu, S. Žumer, Appl. Phys. Lett., 48, 269 (1986).
112. G. Montgomery, J. Nuno, A. Vaz, Appl. Opt., 26, 738 (1987).
113. R.L. Sutherland, L.V. Natarajan, V.P. Tondiglia, T.J. Bunning, Chem. Mater., 5, 1533 (1993).
114. R. Pogue, R. Sutherland, M. Schmitt, L. Natarajan, S. Siwecki, V. Tondiglia, T. Bunning, Appl. Spectroscopy, 54, 12A (2000).
115. D. Neckers, J. Chem. Ed., 64, 649 (1987).
116. D. Neckers, J. Photochem. Photobiol., A: Chem., 47, 1 (1989).
117. K. Tanaka, K. Kato, M. Date, Jpn. J. Appl. Phys., 38, L277 (1999).
118. T.J. Bunning, L.V. Natarajan, V.P. Tondiglia, G. Dougherty, R.L. Sutherland, J. Polym. Sci., Part B: Polym. Phys., 35, 2825 (1997).
119. R. Ondris-Crawford, E.P. Boyko, B.G. Wagner, J.H. Erdmann, S. Žumer, J.W. Doane, J. Appl. Phys., 69, 6380 (1991).
120. T. Bunning, L. Natarajan, V. Tondiglia, R. Sutherland, D. Vezie, W. Adams, Polymer, 36, 2699 (1995).
121. G. De Filpo, J. Lanzo, F.P. Nicoletta, G. Chidichimo, J. Appl. Phys., 84, 3581 (1998).
122. L. Petti, G. Abbate, W.J. Blau, D. Mancarella, P. Mormile, Mol. Cryst. Liq. Cryst., 375, 785 (2002).
123. D.R. Cairns, C.C. Bowley, S. Danworaphong, A.K. Fontecchio, G.P. Crawford, Le Li, S. Faris, Appl. Phys. Lett., 77, 2677 (2000).
124. A. Mertelj, L. Spindler, M. Copic, Phys. Rev. E., 56, 549 (1997).
125. R. Sutherland, V. Tondiglia, L. Natarajan, S. Chandra, T. Bunning, Opt. Express, 10, 1074 (2002).
126. L. Criante, K. Beev, D.E. Lucchetta, F. Simoni, S. Frohmann, S. Orlic, Proc. SPIE, 5939, 61 (2005).
127. G. Crawford, Optics and Photonics News, April, 54 (2003).
128. H. Ren, Y. Fan, S. Wu, Appl. Phys. Lett., 83, 1515 (2003).
129. R. Sutherland, L. Natarajan, V. Tondiglia, T. Bunning, Proc. SPIE, 3421, 8 (1998).
130. L. Luccheti, S. Bella, F. Simoni, Liq. Cryst., 29, 515 (2002).
131. D. Lucchetta, L. Criante, O. Francescangeli, F. Simoni, Appl. Phys. Lett., 84, 4893 (2004).
132. D.E. Lucchetta, R. Karapinar, A. Manni, F. Simoni, J. Appl. Phys., 91, 6060 (2002).
133. T. Fiske, L. Silverstein, J. Colegrove, H. Yuan, SID Int. Symp. Dig. Tech. Pap., 31, 1134 (2000).
134. T. Suhara, H. Nishihara, J. Koyama, Opt. Commun., 19, 353 (1976).
135. H. Xianyu, J. Qi, R. Cohn, G. Crawford, Opt. Lett., 28, 792 (2003).
136. K. Beev, L. Criante, D. Lucchetta, F. Simoni, S. Sainov, J. Opt. A: Pure Appl. Opt., 8, 205 (2006).
137. K. Beev, L. Criante, D. Lucchetta, F. Simoni, S. Sainov, Opt. Commun., 260, 192 (2006).
138. H. Ono, N. Kawatsuki, Opt. Lett., 22, 1144 (1997).
139. A. Golemme, B. Volodin, E. Kippelen, N. Peyghambarian, Opt. Lett., 22, 1226 (1997).
140. N. Yoshimoto, S. Morino, M. Nakagawa, K. Ichimura, Opt. Lett., 27, 182 (2002).
141. J. Winiarz, P. Prasad, Opt. Lett., 27, 1330 (2002).
142. R. Termine, A. Golemme, Opt. Lett., 26, 1001 (2001).
143. H. Ono, H. Shimokawa, A. Emoto, N. Kawatsuki, Polymer, 44, 7971 (2003).
144. G. Cipparrone, A. Mazzulla, P. Pagliusi, Opt. Commun., 185, 171 (2000).
145. H. Ono, H. Shimokawa, A. Emoto, N. Kawatsuki, J. Appl. Phys., 94, 23 (2003).
146. A. Golemme, B. Kippelen, N. Peyghambarian, Chem. Phys. Lett., 319, 655 (2000).
147. B. Yao, Z. Ren, N. Menke, Y. Wang, Y. Zheng, M. Lei, G. Chen, N. Hampp, Appl. Opt., 44, 7344 (2005).
148. A. Fimia, P. Acebal, A. Murciano, S. Blaya, L. Carretero, M. Ulibarrena, R. Aleman, M. Gomariz, I. Meseguer, Opt. Express, 11, 3438 (2003).
149. N. Suzuki, Y. Tomita, Jpn. J. Appl. Phys., 42, L927 (2003).
150. Y. Tomita, N. Suzuki, K. Chikama, Opt. Lett., 30, 839 (2005).
17 Three-dimensional Television: Consumer, Social, and Gender Issues

Haldun M. Ozaktas
Bilkent University, TR-06800 Bilkent, Ankara, Turkey
This chapter is based on a series of discussions which were planned and carried out within the scope of the Integrated Three-Dimensional Television— Capture, Transmission, and Display project, which is a Network of Excellence (NoE) funded by the European Commission 6th Framework Information Society Technologies Programme. The project involves more than 180 researchers in 19 partner institutions from 7 countries throughout Europe and extends over the period from September 2004 to August 2008. The scope of the discussions encompassed consumer expectations and behavior, including current perceptions of three-dimensional television (3DTV), its potential for novelty and mass consumption, and other consumer and nonconsumer applications and markets for the technology. Other areas discussed included the social dimensions of 3DTV in both consumer and non-consumer spheres, and how it compares with other high-impact technologies. Gender related issues were also discussed to some degree. Neither the manner in which the discussions were conducted, nor the way in which they were processed, was based on a scientific methodology. All discussions were informal in nature, with the moderator periodically putting up discussion points to raise new issues or focus the discussion. Most participants in these discussions were technical professionals or academicians with backgrounds in engineering and science, who were members of the Network of Excellence. A number of professionals from other areas also enriched the discussions and a small sample of laypersons and potential consumers were interviewed briefly. Our reporting here by no means represents a sequential record of the live and e-mail discussions which spanned a period of over two years. Opinions provided at different times and places were montaged thematically to achieve a unified presentation and were heavily edited. The discussions given here may seem naive (or worse, misguided) to social scientists with more sophisticated skills and more experience in thinking about such issues. Our hope is that if the content of this chapter does not actually illuminate the issues within its scope, it may at least shed light on the level of thinking and the concerns of those who are actively developing the technology.
In this case, we hope that this chapter will serve as an insider record of the ruminations of the developers of a technology about its implications, during an intermediate stage of its development. It will certainly be interesting to consider these in retrospect ten or twenty years from now.
Part I: Introduction

17.1 Introduction

It would certainly be a mistake to look upon three-dimensional television (3DTV) as solely the latest in the line of media technologies from radio to black-and-white television to color television, although therein may lie its greatest economic potential. Complicated chains and networks of causality underlie the interaction between many technologies and society. It is important to distinguish between the impacts of social and technological entities, although they are intimately related. Television as a social institution has been thoroughly discussed, generally with a negative tone. Totally different, however, is the legacy of television in the sense of broadcasting technology, or in the sense of the cathode ray tube (CRT), the central technology in conventional television. This technology has been perfected for consumer television units, but today finds many applications, most notably in computer display terminals. Indeed, it can be argued that the CRT has had a greater impact in computing than in television. (Ironically, the liquid crystal display (LCD) found a place in computing first, and then later in television sets.) Lastly, it is important to make a distinction between 3D displays and 3D television (3DTV). Here we use the term 3D display to refer to imaging devices which create 3D visual output. 3DTV refers to the whole chain of 3D image acquisition, encoding, transport/broadcasting, reception, as well as display. We must also be cautious when referring to the impact of a technology on society, as it implies that there is only one-way causation; technology may have an impact on society, but society also has an effect on technology. Such considerations complicate any prediction regarding the impact of 3DTV. However, it seems very likely that it will have an important impact. Home video, cable, broadcast, and games are potentially highly rewarding areas for early-entrance companies, since it may take a while before the technology can be emulated by others. Widespread public acceptance of this technology is very difficult to predict and will depend largely on the quality attained. If only mediocre quality is feasible, market penetration may be shallow and short-lived, relying more on novelty aspects which are known to wear off quickly. People may prefer a high-quality two-dimensional image to a medium-quality three-dimensional one, especially if there are limitations on viewing angle, contrast, equipment size, and cost. Even so, three-dimensional television has been so heavily portrayed in film and fiction that a significant number of consumers may show interest despite possible shortcomings. On the other hand,
if reasonably high quality can be attained, even at an initially high price, it is possible, and indeed likely, that the technology may supplant ordinary television in at least some contexts. The potential consumer market should not blind one to the opportunities in other more specialized applications. Most of these will not demand as high a quality as consumer applications, and may involve customers willing to pay higher prices. While it is not clear that 3DTV would be widely used for computer display terminals, there are a wide variety of specialized applications. These may include sophisticated computer games, professional simulators and virtual reality systems, teleconferencing, special-purpose applications including scientific and industrial visualization, inspection, and control, medical visualization and remote diagnosis and treatment including telesurgery, environmental monitoring, remote operation in hazardous environments, air traffic control, architectural and urban applications, and virtual preservation of perishable objects of cultural heritage. If we accept that two-dimensional imaging and display technologies have had a positive impact in modern society, it seems almost certain that the above applications will produce a positive impact, even if 3DTV does not become a standard item in every home. For instance, the fact that people still travel to meet face-to-face is evidence that even modern teleconferencing cannot fully replace physical proximity. If 3DTV can come close enough, this would have a large impact on how meetings are conducted. This would not only include official and corporate meetings (reducing the cost of products and services to society), but also the meetings of civil society organizations, potentially increasing public participation at all levels. Three-dimensional television should not be seen in isolation from other trends in media technology, most importantly interactive or immersive technologies. Clichés holding television responsible for the drop in theater attendance or reading will gain new strength if such technologies become widespread. The main question will again focus on what the new technologies will replace/displace. In summary, the potential applications of the technology fall into two main categories: a three-dimensional replacement of present-day television and a variety of specialized applications. The impact of the latter could be moderate to high benefits to society in economic and welfare terms. The impact of the former is less predictable but there is the potential for very high economic returns to those who own the technologies.
17.2 Historical Perspective

I. Rakkolainen provided an extended account of pertinent historical observations, summarized at length in this section. He noted that many have dreamed of Holodeck- or Star Wars-like 3D displays and that 3D images have attracted
interest for over a century. The general public was excited about 3D stereophotographs in the 19th century, 3D movies in the 1950s, holography in the 1960s, and is now excited by 3D computer graphics and virtual reality (VR). The technology of 3D displays has deeply intrigued the media and the public. Hundreds of different principles, ideas, and products have been presented with potential applications to scientific visualization, medical imaging, telepresence, games, and 3D movies. The broad field of VR has driven the computer and optics industries to produce better head-mounted displays and other types of 3D displays; however, most such VR efforts involve wearing obtrusive artifacts, an experience in stark contrast with the ease of watching TV. Immersion is an experience that encloses the user in a synthetically generated world. Contemporary 3D displays try to achieve this through elaborate schemes, but this is not only a matter of technology; the most important factor for immersion is not technical fidelity but the user’s attitude and possibly the skill of the content author. A theater scene or a novel can be quite “immersive” although it does not involve very advanced technology. Just before the first photographs were made in 1839, stereo viewing was invented. The first stereoscope tried to reproduce or imitate reality with the aid of an astonishing illusion of depth. A decade later, when less cumbersome viewing devices were developed, stereoscopic photography became popular. The stereo image pairs immersed the viewer in real scenes (they are still popular in the form of toys). Then, starting at the end of the 19th century, moving pictures reproduced a world of illusion for the masses. The idea of synthetically reproduced reality is not new and does not necessarily rely on digital technology. In 1860 the astronomer and scientist Herschel wrote about his vision of representing scenes in action and handing them down to posterity. Cinema and TV have somewhat fulfilled his vision. I. Rakkolainen went on to list a large number of popular mass-produced cameras of the late 19th century, each of which used slightly different technologies and designs with no standards: Academy, Brownie, Buckeye, Comfort, Compact, Cosmopolitan, Delta, Eclipse, Filmax, Frena, Harvard, Kamaret, Kodak, Kombi, Lilliput, Luzo, Nodark, Omnigraphe, Photake, Poco, Simplex, Takiv, Velographe, Verascope, Vive, Weno, Wizard, and Wonder. Only Kodak survived and became a huge business. The Kodak camera was by no means a superior technology. It used a roll film long enough for 100 negatives, but the key element to its success was perhaps that Kodak provided a photofinishing service for customers; apparently having to do the lab work was an obstacle for many. Rakkolainen believes that this resembles the current situation with 3DTV. The same enthusiasm that greeted photography, stereographs, and the Lumiere brothers’ Cinematographe at the end of the 19th century is now seen with 3DTV, virtual reality, and other related technologies.
Part II: Consumer Expectations and Behavior

17.3 Current Public Perceptions of “Three-dimensional Television”

What do lay people think of when confronted with the phrase “three-dimensional television”? This question was posed to people from different social and educational backgrounds. Among the brief answers collected we note the following:

• People think of the image/scene jumping out, or somehow extending from the front of the screen. Although not everyone had seen Princess Leia projected by R2D2 in Star Wars, the idea of a crystal ball is widespread in folklore. However, most people seem to imagine a vertical display like conventional TV, rather than a horizontal tabletop scenario.
• So-called “three-dimensional” computer games which are not truly three-dimensional, but where the action takes place in a three-dimensional domain, as opposed to early computer games which take place in “flatland.”
• “Nothing.”
M. Kunter noted that some thought of 3DTV as a full 3D projection of objects into the room, but nobody referred to the "Holodeck" scenario (being and acting in a virtual reality environment). This may be connected to A. Boev's distinction between what he refers to as convergent and divergent 3D displays. He defines convergent 3D as the case where the user stays outside the presentation; the presentation can be seen from different points of view, like observing a statue or attending the theater. He defines divergent 3D as the case where the user is inside the presentation and is able to look around and change points of view. Boev noted that this is often compared to an immersive multimedia-type game, and is in some ways like listening to radio theater, which also puts the user in a similar state of mind of "being inside" the presentation. The observation that a radio play makes one feel inside, compared to TV where one feels outside, seems very important; the perception of insideness, which is considered an aspect of realism, does not necessarily increase with the amount of information conveyed.

According to I. Rakkolainen, 3DTV may take many different forms; it may be similar to today's TVs but with 3D enhancements, IMAX-like partially immersive home projection screens, tracking head-mounted displays, holographic displays, or perhaps "immaterial" images floating in the air. He emphasized that we should be open-minded about the possibilities. A. Boev noted that the very use of the term 3DTV was limiting in that it forced people to think of a box and excluded other modalities. M. Karamüftüoğlu asked whether 3DTV would be immersive and/or interactive, or merely offer depth information. He also noted that the TV and computer box might disappear, with all such technologies converging to a ubiquitous, pervasive presence.
M. Özkan told an interesting anecdote exemplifying the power of media and marketing: he had asked TV sales staff in electronics stores whether they had heard about 3DTV, and amazingly they said that it was "coming soon." What they were referring to was not any stereoscopic display, but a device displaying miniature football players on a table-like surface; they had seen it on a TV program featuring the 3DTV NoE! Furthermore, they linked this "near-future product" to recent price cuts in plasma and LCD TV screens. This anecdote is powerful evidence of how certain images can capture the public imagination. H. M. Ozaktas recalled that one US telephone commercial from about ten years ago showed a family reunion for a child's birthday party taking place through teleconferencing. The image took up a whole wall, making it seem that the remote participants were in the other half of the room. Clearly, the makers of the commercial were trying to similarly capture the imagination of their audience.

F. Porikli observed that there is an imagination gap between the generation who grew up watching Star Wars episodes and earlier generations. People who have watched IMAX movies tended to imagine 3DTV as a miniature version of the movie theater in their homes. Younger generations are more open to the idea of a holographic tabletop display. In any event, people imagine that they will be able to move freely in the environment and still perceive the content in full 3D (which can lead to disappointment if the viewer position needs to be restricted). Since conventional TV viewing is a passive activity, people do not usually have the expectation that they should be able to interact with the scene or have any effect on the program they are watching.
17.4 Lay Persons' Expectations

What do lay people expect from such a product? What features, functions, and quality do they expect?

Today people take high-quality 2D images for granted, and it would be unrealistic to expect them to put up with even moderately lower-quality images in 3DTV. If the images are not clear and crisp, or if they are hard to look at in any way, it is unlikely that people will watch. For a significant amount of TV content, 2D screens are already realistic enough, as M. Kautzner noted. Although in a technical sense one may think that 3D is more "realistic" than 2D, that may be a fallacy. "Realisticness" is very psychological: a clear, crisp color image is very realistic to a lot of people watching TV or a film, whereas a 3D image which deviates even a little from this crispness and contrast may look terrible. Humans possibly do not really miss true 3D information, since they can deduce enough of it from the content. Human imagination is such that even if we see a reduced representation of reality, such as a black-and-white photo, a 2D image, or even a sketchy caricature, we can complete it in our minds and visualize its realistic counterpart. Black-and-white photos are quite realistic despite the loss of color information.
Other than an arrow flying towards you or a monster jumping at you (contrived actions familiar from the old colored-glass 3D films), it is not clear exactly what information 3DTV is going to convey that will be important to viewers. Thus if the only thing 3DTV has to offer is the novelty factor, it will not become a mass-market product. The opposite argument could be that, by the same token, people did not really need color information either; black-and-white TV was just fine, but color TV still caught on. Nevertheless, the introduction of color did not entail much sacrifice of quality; G. Ziegler remarked that 3DTV will have a difficult time if it is of lower quality, and this will be all the more true if it is difficult to watch or strains the eyes.

Some consumers expect the same kind of "aquarium" as contemporary TV, but somehow conveying some sense of depth. Other consumers expect to be able to move around the display freely and to see the view from different angles. Another group of consumers totally lacks any vision of "true" 3D and merely expects 3D graphics on a flat panel, as in current 3D computer games. And a significant group of consumers seems to have hardly any idea of what the term might imply. These observations imply that it may be important to educate potential consumers that the 3D we are talking about is something more than the 3D of a perspective drawing.

F. Porikli noted that while non-entertainment users of 3DTV may be willing to forego several comfort or convenience features that are not pertinent to the application, the expectations of household entertainment consumers may be higher. People do not like the idea of wearing goggles or markers or beacons, and they certainly do not like having limited viewing positions or low resolution. Consistency of 3D image quality with respect to viewer motion and position is another important factor. As for price, Porikli believed that any display product costing over 5000 USD is not likely to be widely accepted.

I. Rakkolainen argued that rather than trying to achieve a perfect 3D display, tricks and approximations must be used to obtain a reasonably priced and good-enough display for general use. Indeed, while R&D group A may be focused on "true" or "real" holographic reconstruction, R&D group B may get to market with a sloppy pseudo- or quasi-3D product, really not deserving the name, which nevertheless satisfies these conditions. The question is, which aspects of 3DTV will be important and attractive to consumers, and which will be irrelevant? Maybe true 3D parallax and the ability to walk around, which are hallmarks of true 3D, may not matter; maybe people will be comfortable simply with more depth cues on a flat screen. J. Kim noted that in fact, in many cases 2D cues are sufficient for depth perception.

N. Stefanoski commented on the issue of whether more information is always desirable. In some cases, conveying the maximum amount of information may be desirable (perhaps for sports events, teleconferencing, or virtual shops), but in other cases there will not be much consumer desire to choose the viewing perspective. In fact, in some cases fixing the perspective may be desired for artistic reasons or by genre convention (hiding the face of the murderer in a mystery film).
A. Boev noted the importance of studying consumer expectations. Although consumers are often "taught" what they need in the case of some products, for a novel and potentially expensive product it may be important to know what the buyers expect. The Nintendo Virtual Boy was promoted as a "3D game system," which made people expect images floating in the air. When people realized it only worked with glasses, almost everybody was heavily disappointed, and it was a failure.

A. Boev also emphasized the importance of two-way compatibility: 3DTV sets should be able to display 2D programs, and 2D sets should be able to display a 2D version of 3D programs. This would be a general expectation based on the history of the transition to color. F. Porikli also emphasized that at the very least, any 3DTV should be backward compatible with 2D content.

D. Kaya-Mutlu noted that TV is here being conceived largely as a visual medium, as a conveyor of visual information, and the viewer's relation to TV is being conceived mainly as a matter of decoding/processing visual information, as a matter of visual perception. This is understandable if the major contribution of 3DTV is taken to be the enhancement of images. She pointed out that this misses other important components of TV content, such as talk, and more importantly other social functions beyond being an information conveyor, such as providing background sound, serving as an accompaniment, or serving as a means to structure unstructured home time. These are all functions of the household TV set, and whether they will transfer to 3DTV may be an important determinant of its acceptance.
17.5 Sources of Public Perceptions and Conceptions

What past or present technologies or fictional sources have influenced people's conceptions of such a technology? Some of the answers collected were:

• Colored (or polarized) stereo glasses.
• Three-dimensional IMAX movies and other theme park movies.
• Depictions of such technologies in science fiction movies and novels, such as Star Trek and Star Wars.
• Still holograms.
• 3D computer games or similar rendered objects on conventional TV.
• Virtual reality or augmented reality.
S. Fleck noted in particular that 3D theaters in Disneyland, Europa-Park, and similar parks, as well as IMAX theaters, might have had the greatest influence; 3D versions of Terminator 2 and The Muppet Show are popular examples.
17.6 Potential for Novelty Consumption

Is there a novelty-oriented segment of the population willing to pay for expensive, relatively low-quality early consumer models?

R. Civanlar did not think that an early model of low quality regarding resolution, color, etc. would be acceptable; consumers are too accustomed to high-resolution, crisp images. However, low quality or restrictions on the 3D features may be acceptable, since consumers have not yet developed high expectations in that regard. Audiences might at first rush to watch the new 3DTV tabletop football games, but the novelty would quickly fade after a couple of times, and people would probably return to the comfort of their 2D sets. M. Kunter made a similar comment about IMAX theaters, which remain a tourist attraction but have never become established as cultural institutions like common movie theaters.

G. Ziegler thought that there may be a subculture of science fiction enthusiasts who would gladly pay for initially expensive hardware, not so much for the content they would watch as for the excitement of the experience they are familiar with from science fiction. He noted that 3D already has the status of a hobby with specialist suppliers such as www.stereo3d.com, which evaluates all kinds of exotic hardware from small and large companies. Purchasers of this equipment are not ordinary consumers but hobbyists who sometimes modify the hardware. Ultimately, however, this group is small and without large buying power. Ziegler also noted that certain rich urban singles often have an interest in such gadgets; for them the design is of paramount importance, even more so than the features. G. Ger and Ö. Sandıkçı both thought that certain high-income customers might buy such a product for the sake of novelty if it were a status symbol, but they felt that such an outcome is socially divisive and not desirable.

I. Rakkolainen emphasized that it might make more sense to target the early models at businesses rather than at consumers of novelties; businesses, the military, and other special-applications customers can pay significantly greater amounts and take greater risks. He pointed out that some rich consumers might buy expensive technology if it gave them something new, fun, and useful, but he wondered whether there are enough such consumers. The same seems to be the case for non-mainstream customers who are so attracted by the novelty that they are willing to put up with low quality. (P. Surman was of the opinion that such populations are more likely to be motivated by being the first to own a product, rather than being thrilled by the novelty factor.) Therefore, focusing on non-consumer markets seems to be strategically more advantageous. Ziegler also noted that major companies like to use novel yet expensive technologies at fairs for promotional purposes. F. Porikli supported Rakkolainen, noting that some people pay huge sums for expensive artwork and hobby cars, so there is obviously a market for everything, but how big is that market?
Without convincing content support, Porikli thought it unlikely that expensive 3DTV products will ever reach any but the richest people. He also emphasized the importance of the non-household market: research labs, assisted surgery and diagnosis in medical settings, military applications, and video conferencing.

V. Skala believed that the handheld game industry might be an engine for future development. A. Ö. Yöntem also believed that there will be a significant demand for game consoles with 3D displays, generating considerable revenue. R. Ilieva pointed out that one option in introducing 3DTV to the masses would be an approach involving small changes to ordinary TV sets. H. M. Ozaktas noted that, for instance, K. Iizuka of the University of Toronto has produced simple add-ons for cellular phones allowing them to transmit stereo images. Similar approaches may be technically possible for 3DTV, but it is not clear whether these would interest consumers.

I. Rakkolainen reported a quotation from Alan Jones in the newsletter 3rd Dimension (www.veritasetvisus.com). Jones suggested that a new level of technology must drop to 5–10 times the price of its predecessor to get users interested; when the price drops to only double, it starts getting widespread acceptance from early adopters. The price must fall to about 1.5 times the predecessor's before it can become a truly mass product. In summary, Jones felt that the future for 3D displays is bright, but they will not displace 2D because there will continue to be uses for which 3D is not necessary.
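Jones's rule of thumb amounts to mapping a price ratio to an adoption stage, which a short sketch can make concrete. The following Python fragment is a minimal illustration only; the function name, stage labels, and the exact boundaries between the reported figures are our own assumptions, not Jones's.

    def adoption_stage(price_ratio):
        # price_ratio: price of the new technology divided by the price
        # of its predecessor, following Jones's reported thresholds.
        if price_ratio <= 1.5:
            return "mass product"      # ~1.5x: can become a truly mass product
        elif price_ratio <= 2:
            return "early adopters"    # ~2x: acceptance from early adopters begins
        elif price_ratio <= 10:
            return "user interest"     # 5-10x: users start to get interested
        else:
            return "too expensive"     # beyond the interest threshold

    # Example: a 3D display selling at three times the price of a 2D set.
    print(adoption_stage(3.0))  # -> "user interest"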
17.7 The Future Acceptance of Three-dimensional Television

Will commercial three-dimensional television replace two-dimensional television, or will it remain a novelty limited to only a certain fraction of consumers?

Many people seem to think that 3DTV may replace common television in the future. However, this thinking may reflect nothing more than a simple-minded extrapolation of a linear conception of progress moving from radio to television to color to 3D. It is important to understand the acceptance of new technologies in the context of competition between rival market forces. D. Kaya-Mutlu recounted that in the 1950s, when TV had become a serious alternative to cinema, the American movie industry introduced technological novelties to attract viewers back to the movie theaters. The first novelty was 3D movies, but these did not have a long-lasting impact on viewers. The special eyeglasses that were required were the main reason for audience resistance; this is why many present researchers consider it important to develop approaches which do not necessitate the wearing of such equipment. The next novelty was Cinerama, which created an illusion of three-dimensionality without special eyeglasses. It garnered more public interest, but it too became a tourist attraction in a few places, probably because it required a different and complex exhibition environment.
Finally, it was the wide-angle process, CinemaScope, introduced by Twentieth Century Fox in 1953, that had the most long-lasting effect among these novelties. CinemaScope movies offered a wider screen image with color and stereo sound, and therefore contrasted sharply with the small, black-and-white TV image. CinemaScope movies also augmented the impression of telepresence. HDTV, which combines a large, high-quality image with good-quality sound, is an extension of this concept into the private home. Kaya-Mutlu thought that although it seems to be nothing more than a high-resolution version of ordinary 2DTV, HDTV could be a rival to 3DTV in the home market.

G. Ger noted that the different phases of acceptance of a new technology must be carefully studied to avoid strategic mistakes. She gave several examples of failed technologies, such as 3D movies and the picture phone. The picture phone had been the subject of science fiction for a long time; the public was already familiar with the concept and was even anticipating it. It seemed like the logical next step: just as television followed radio, picture phones would follow ordinary phones. The engineers working on it probably thought that price was the only obstacle and that it would surely drop in time. But it turned out that although people sometimes wanted to see the person they were talking to, more often they did not. Perhaps you are unshaven or without makeup during the weekend, or perhaps you do not want your body language to tell your boss that you are lying when you say you are too ill to come to work. Ger underlined that a technology is accepted if it fits well with the existing culture and needs of society. On the other hand, H. M. Ozaktas noted that the present acceptance and popularity of free internet-based video-telephony (most notably Skype) should make us rethink these explanations; this could provide a lot of new data regarding people's acceptance and the factors underlying it. There are some useful questions: how do people use Skype, when do they prefer no video, when do they prefer to remain silent or inaccessible, and how do they combine voice and the accompanying chat features?

Ö. Sandıkçı noted that about eighty percent of new products fail, mostly because of a lack of understanding of consumers and their needs. She also gave the example of 3D movies, noting that they had the image of being weird and juvenile kid stuff, which probably guaranteed their failure. She warned that the association of 3DTV with 3D movies could hurt the success of 3DTV. She talked of the need to think about how the technology will fit into people's lives. For instance, referring to the tabletop 3DTV scenario, she noted that in a typical living room layout, the TV is not in the middle of the room. Therefore either the technology may affect the way people furnish their living rooms, or, if it asks for a major change from people, it may face resistance.

D. Kaya-Mutlu had already noted the fallacy of conceiving of TV merely as a visual medium, and of the viewer's relation to TV as merely a matter of decoding/processing visual information. Another important component of TV content is talk, and many TV programs are talk-oriented.
More importantly, ethnographic research on TV audiences (within a cultural studies framework), exploring the significance of TV in the everyday lives of families and housewives, has shown that TV has several social functions beyond being an information conveyor. This body of research has shown that the pleasures derived from TV content are not merely textual (which includes both the visual and the aural information). For example, James Lull, in his article "The Social Uses of Television" (1980), develops a typology of the uses of home TV which are not directly related to the content of TV programs. Distinguishing between structural and relational uses of TV, Lull points to the use of TV as background sound, as an accompaniment, and as a means by which family members, especially housewives, structure unstructured home time. Lull also discusses how TV regulates the relations between family members. It has also been shown that TV is watched in an unfocused manner, at the same time as conversation and other domestic activity. Another aspect of this unfocused watching is the growing practice of zapping among channels.

Kaya-Mutlu said that Ö. Sandıkçı was very right in pointing to the need to talk about how the technology will fit into the lives of people. Since 3DTV will cater to an audience whose expectations and viewing habits/styles have been shaped by 2DTV content, its realistic images may not be enough to attract a wide audience. She thought that 3DTV assumes a focused/attentive viewer, while some evidence shows that many viewers watch TV in a distracted manner (for example, there are housewives who "watch" TV without even looking at the screen; they are perhaps more appropriately referred to as "TV listeners" than TV viewers). At the least, one can argue that 3DTV's popularity may depend on a radical change in the viewing habits and styles of the majority of viewers.

M. Karamüftüoğlu suggested that in order to avoid failure, it is important to talk with sociologists, philosophers, cultural theorists, and media artists. The essential ingredient of a successful commercial design is to iterate the design through interaction with potential consumers. He also noted the possibility of consumer resistance to obtrusive gadgets.

One of the most important issues brought up in this context was that of content. A 3DTV set does not mean anything without 3D content. The content is the real product; the set is just the device needed to view it. What is to be sold is the content, not the set. For instance, a typical CD or DVD player costs much less than any reasonable CD or DVD collection. And for content to be produced there must be demand, which can come only from customers already owning 3DTV sets, creating a chicken-and-egg situation. Therefore, Y. Yardımcı speculated that even if there were a small group of customers attracted to the novelty, and even if they could support the production of the sets, would they reach the threshold necessary to justify the production of content? On the other hand, Yardımcı also cited research showing that purchases of high-definition (HD) television sets were rising at a faster rate than the number of viewers receiving HD programming; this was paradoxically setting the stage for a boom in content production and thus the solution of the chicken-and-egg problem.
F. Porikli, on the other hand, noted that a lesson learned from HDTV acceptance was that without sufficient content, it is not realistic to expect people to make such an investment. R. Civanlar thought that acceptance will probably depend on the type of content. People may be willing to pay extra for 3D sports viewing. As for entertainment, special movies that use 3D effects would have to be produced. He mentioned a Sony theater in New York City that frequently shows high-quality 3D feature films and is usually full even though the tickets are not cheap. On the other hand, although this particular theater has been around for ten years or so, no new ones have opened. He believes that Sony produces special movies for this theater, probably not to make money but for reasons of prestige and promotion.

Another aspect of the content issue was brought forward by D. Kaya-Mutlu, who noted that each medium has its own esthetics. For example, CinemaScope promotes long shots instead of close-ups, whereas the small low-resolution TV screen promotes close-ups. That is partly why many major cinematic productions look cramped and are less pleasant to watch on TV. Kaya-Mutlu suggested that the growing popularity of HDTV is likely to prompt some changes in TV esthetics; these may also be valid for 3DTV.

I. Rakkolainen suggested that 3DTV technology could be used with many different kinds of content. Apart from broadcast TV, some of the content categories could include 3D games, virtual reality, and web content. He also noted that an interim appliance might be an ordinary TV most of the time but be switchable to 3D, perhaps with less resolution, when there is a special broadcast. According to G. Ziegler, in order for big media companies to produce the content that would drive consumer demand, what is desperately needed is a common standard for 3DTV productions (at first, this would be for stereo 3DTV systems). Naturally, standardization requires a certain degree of maturity of a technology; he noted that some of the new stereo movies available may set a de facto standard until better standards are agreed upon.

F. Porikli believed that consumers will not stubbornly stick to 2DTV if the quality of 3DTV matches expectations. Even though transitions, such as that to HDTV, are painful, people find it difficult to go back once they are accustomed to higher-quality content. Nevertheless, 2D displays will continue to be used in many applications due to their cost, their smaller size, and their robustness. Another issue is whether 3DTV would ever become standard or whether only selected special programs would be 3D. This may also depend on the restrictions and requirements 3D shooting brings to stage and set, an issue which does not seem to be widely discussed. The transition from radio to TV brought tremendous changes, whereas the transition from monochrome to color brought only minor ones. If the requirements coming from 3D shooting are excessive, it might not be worth the trouble and cost for programs where it does not have a special appeal, and it may be limited to specific program categories, including sports and certain film genres.
J. Kim noted that since it is not yet clear what shape 3DTV will take, it is not easy to comment on public acceptance. 3DTV may evolve from stereoscopic 3DTV requiring special glasses, to multi-view autostereoscopic 3DTV, and finally to holographic 3DTV. Each type of 3DTV could engender a different response. For the first two types (which lay 3D functionality on top of existing 2D without replacing it), the primary determinant of acceptance will be how the added 3D video services fit users' needs for specific content types. These first two types of 3DTV should be backward compatible: users should be able to switch to 2D viewing without losing anything more than depth perception. They should also be able to handle 2D/3D hybrid content. 3DTV systems, especially early ones, might exhibit various distortions, which would induce psychological fatigue with extended viewing. J. Kim therefore predicted that only selected programs would be shown in 3D mode.
17.8 Other Consumer Applications of Three-dimensional Television

Other than being a three-dimensional extension of common TV and video, what other consumer applications of 3DTV can you think of (games, hobbies, home movies, automotive, smart apartments, etc.)?

I. Rakkolainen believes that it is useful to distinguish between passive applications, such as TV, video, and navigation aids in cars, and interactive applications, such as games. He also noted that the success of different applications will depend on the size and format of the displays that can be produced. He said that games and entertainment applications hold a lot of promise, because they can be adapted to many different display types. Indeed, many consider games an important potential application of 3DTV. A. Ö. Yöntem believed that a 3D game console designed to be connected to a 3DTV set would be very attractive to consumers. F. Porikli also agreed that since current TV displays support games, hobbies, entertainment content, etc., it was likely that 3D displays would also do so.

While many understand these to be more realistic and immersive versions of existing computer games, G. Ziegler suggested several less immediate examples. He first noted the success of the EyeToy, a camera-based device with simple 2D motion tracking; this is used for games, but it has other applications: it can be a personal training assistant that supervises your daily exercises. If optical 3D motion capture works reliably, such systems could easily be extended in exciting ways. He also noted DDR (Dance Dance Revolution), the Japanese dancing game, as an example of consumer interest in such devices and activity games.

Video conferencing is another potential application area. While many kinds of systems are already available for remote multi-party conferencing, they have still not replaced face-to-face meetings. Precisely what important features of face-to-face interaction are lost, and whether they can be provided by 3DTV, remain interesting questions.
F. Porikli commented that just as Skype users find it difficult to go back to making traditional phone calls, people who experience 3D teleconferencing may not be willing to go back to conventional teleconferencing.

P. Surman noted that the display systems being developed have the capacity to present different images to different viewers, and this could be exploited for certain purposes, such as targeted advertising, where a viewer is identified and an image intended specifically for that viewer is not seen by anyone else. This could work for more than one viewer. Such technology could also be used to block undesirable scenes from young viewers. A TV set which can display two channels simultaneously to viewers sitting in different spots has already been introduced, marketed as a solution to family conflicts about which station to watch.

G. Ziegler noted potential applications mixing the concepts of 3DTV and augmented reality, where multi-camera recordings are projected into augmented reality environments. He also noted that there were several possibilities, such as the use of a webcam or a head-mounted display for "mixed reality" 3DTV viewing. Other applications include a virtual tourist guide and a virtual apartment walk-through. J. Kim mentioned TV home shopping; people would like to see the goods as if they were in a store and would appreciate the added 3D depth perception and the ability to look around objects. Only the goods for sale need to be shown in 3D, against a 2D background that includes the host and any other information. This mode of 2D/3D hybrid presentation could also be used for other programs, such as news and documentaries. Applications of 3D displays to mobile phones were suggested by A. Ö. Yöntem, who argued that consumers would like to see a miniature of the person they are talking to. In this context, he also proposed the intriguing idea that 3D displays may form the basis of 3D "touch screens," although there are many questions about how to detect the operator's finger positions and purposeful motions.
17.9 Non-consumer Markets for Three-dimensional Television

What major markets other than the mass consumer market may arise? In other words, what could be the greatest non-consumer applications of 3DTV, in areas such as medicine, industry, and scientific and professional applications? Do these constitute a sizable market?

We have already noted I. Rakkolainen's position that such technologies should initially target business customers in high-cost professional areas like medical, military, and industrial visualization, followed by medium-cost applications like marketing and advertising. P. Surman also noted that such applications constitute a sizable market and that these niche markets can justify a more expensive product.
This could be useful for the development of a commercial TV product, as it could take ten years to develop an affordable TV display but less time to produce a more expensive one; the niche markets would drive the development. Likewise, M. Özkan believed that due to the cost of initial products, professional areas such as military training, industrial design, and medicine were likely early application areas. R. Ilieva, along with others, believes that there is considerable potential in the medicine, education, and science markets. While most agreed that industrial markets could tolerate higher prices, it was not clear that they would tolerate lower quality. T. Erdem noted that industrial applications may require even higher quality than consumer 3DTV applications. The consensus was that it depended on the application and could go either way.

G. Ziegler noted that 3DTV research may have many spill-over effects, in areas such as image analysis and data compression. This could lead to advances in areas such as real-time camera calibration, industry-level multi-camera synchronization, real-time stereo reconstruction, and motion tracking. For instance, classic marker-based motion tracking (also used in the movie industry) might become obsolete with the advent of more advanced markerless trackers that stem from the problem of 3D data generation for free viewpoint video (FVV) rendering. Other applications might include remote damage repair, space missions, spying and inspection operations, remote surgery, minimally invasive surgery, and, regrettably, military operations such as remote-controlled armed robots. F. Porikli added remote piloting and virtual war-fields to the potential list of military applications.

An interesting point was made regarding professional applications. In some professional areas, the existing values, norms, vested interests, or skill investments of practitioners may result in resistance to the technology. While most physicians are used to adapting to sophisticated new equipment, years of clinical training and experience with 2D images may make them resistant to, or uncomfortable with, working with 3D images. Also, their expectations of quality may be quite different from those of general consumers. As with many technologies, the issues of de-skilling and retraining arise. Many professionals learn over years of experience to "feel" the objects they work with, and when the technology changes, they cannot "feel" them any longer and feel almost disabled.

As a specific example, S. Fleck noted that his group has been doing research in the field of virtual endoscopy for years and has asked surgeons about their opinion of 3D visualization. While about two-thirds said that they would appreciate such capabilities, it was important for them to be able to use any such technology in a hassle-free way with very low latency and high spatial resolution. They also insisted on maintaining the option of falling back on the 2D visualization they were used to; they wanted to be sure that whatever extras the new technology might bring, they would not lose anything they were accustomed to. This is quite understandable given the critical nature of their work.
K. Ward, a doctor herself, observed great fear among the medical profession that new technology may not be as safe; anything that doctors do not have experience with feels less safe to them, and they hesitate to risk a bad outcome for their patients. M. Özkan noted another potential reason for resistance from the medical establishment, which in theory should greatly benefit from 3D visualization in both training and practice. He underlined the resistance to even lossless digital image compression techniques for fear of costly malpractice lawsuits, and so was pessimistic regarding the adoption of 3D techniques in practice, but thought they may be more acceptable in training, especially remote training.

J. Kim reported on trials in Korea applying different kinds of information technology to medicine. He referred to two big issues: broadband network connections among remotely located hospitals and doctors for collaborative operation and treatment, and the exploitation of 3D visualization technologies for education and real practice. Accurate 3D models of human organs and bones, and their 3D visualization, would be very time- and cost-efficient in educating medical students. Doctor-to-doctor connections for collaborative operations are considered even more necessary and useful than doctor-patient connections for remote diagnosis and treatment. Kim believes the medical field will surely be one of the major beneficiaries of 3DTV.

H. M. Ozaktas noted that many examples of resistance to new technology are available in consumer applications as well; a new car design with different positions of the brakes, accelerator, and gearstick would not easily be accepted even if tests showed it was safer and gave the driver better control. Likewise, despite its clear inferiority, the QWERTY keyboard is still standard, and very few people attempt to learn one of the available ergonomic keyboard layouts.

A number of participants, including C. Türün and V. Skala, emphasized the education market. Skala gave several examples from three-dimensional geometry where students had difficulty visualizing shapes and concepts; 3DTV may help them improve these skills. Indeed, the traditional book culture as well as the more recent visual culture are both heavily invested in 2D habits of thinking. H. M. Ozaktas agreed that perception of 3D objects may be improved with the use of 3D imaging in education, but argued that the applications to education should not be limited to this, suggesting that we should be able to, for instance, show simulations of a vortex in fluid mechanics or the propagation of a wave in electromagnetics. However, even very low-tech 2D animations which can add a lot to understanding are not often used in educational settings, despite their availability. Ozaktas gave the example of simple animations or simulations of electromagnetic waves and how useful they could be, but noted that most electromagnetics courses do not include such simulations. He concluded that customary habits and possibly organizational obstacles may come before technical obstacles in such cases.
Part III: Social Impact and Other Social Aspects

17.10 Impact Areas of Three-dimensional Television

Will the greatest impact of 3DTV be in the form of consumer broadcasting and video (that is, the three-dimensional version of current TV and video), or will the greatest impact be in other areas such as medicine, industry, and scientific applications?

P. Surman believed that the greatest impact will be in the form of consumer broadcasting and video, since this will potentially be the most widespread application. I. Rakkolainen agreed that this may be the case in the long run; in the meantime, the greatest impact will be in special experiences created by the entertainment industry with high-end equipment. He noted that there are already very low-cost head-mounted displays for PCs and game consoles; they have not yet sold well, although they could become popular within 5–10 years. V. Skala also agreed that in the long run consumer 3DTV will have the greatest impact, but meanwhile other professional areas will have a larger impact.
17.11 Social Impact of Three-dimensional Television

Television is currently understood as being a social ill. Its negative effects, including those on children, have been widely documented and are considered to far outweigh its positive aspects. In this light, what will be the effect of 3DTV technology? Will it further such social ills? Will it have little effect? Can it offer anything to reduce these ills?

There is a vast literature regarding the negative effects of ordinary television on children. The negative effects mentioned include the conveying of a distorted picture of real life, excessive exposure to violence, obesity due to the replacement of active play, unsociability due to the replacement of social encounters, and negative developmental effects due to the replacement of developmentally beneficial activities. In the early years, additional negative effects include negative influences on early brain development as a result of the replacement of real-person stimuli, and exposure to fast-paced imagery which affects the wiring of the brain, potentially leading to hyperactivity and attention deficit problems.

S. Sainov noted that during their holographic exhibitions, children 2–5 years old and older people with minor mental deteriorations were very much impressed by 3D images; this suggests that the psychological impact of 3D images on TV screens should be taken into account. P. Surman noted that children are fascinated by 3D, suggesting this may be due to their greater ability to fuse stereo images.

R. Ilieva commented that although TV is a social ill, it has also had important positive aspects; it has brought knowledge of the world to low-income people who cannot travel and do not have access to other sources of information.
3D technology can have a positive impact on science education, but it is not clear how much 3D can add to the general information dissemination function of TV. P. Surman thinks it will have little effect, since there was no noticeable difference when color took over from monochrome. F. Porikli, however, thinks that both positive and negative impacts would be enhanced, since 3DTV has the potential to become a more convincing and effective medium than conventional TV.

A. Boev noted that a social ill is something that hinders the basic functions of society, such as socializing; by this definition, TV is a social ill but Skype is not, and playing computer games is a social ill but writing in web forums is not. He said that even reading too much could be a social ill. He argued that maybe the main feature which makes TV a social ill is its lack of interactivity. If TV were truly interactive (and went beyond just calling the TV host to answer questions), it would not be such a social ill. M. Karamüftüoğlu noted that this view can be criticized; for instance, some would argue that certain forms of Internet communication, such as chats, are poor substitutes for face-to-face human communication, that they distance people from their immediate family and friends, and that they actually have an antisocial effect. Karamüftüoğlu believes that 3DTV can be less of a social ill than present-day television only if it can be made to, and is used to, convey more human knowledge. This should involve bodily, embodied tactile interaction and immersion with affectivity and subjectivity.

G. Ziegler pointed out that there has been a radical transformation in the social isolation associated with playing computer games; many games are now networked and played interactively, sometimes in large role-playing communities. While these games may separate you from your local community, they make you a member of other communities. Potentially such games may still be considered a social ill, given that they may isolate people from family, school, and work contacts. On the other hand, being able to network with others who share common interests, without being limited to people in one's immediate social environment, appears to be beneficial.

I. Rakkolainen pointed out that it is not so easy to claim that TV (or networked games, Skype, or books) is a social ill for all. Some people get seriously addicted to TV, watching it 10 hours a day, while others may get addicted to excessive gardening, football, music playing, drawing, virtual reality, drugs, sex, and so forth in an attempt to escape real life. He argued that if anybody gets addicted to these things, the reason is usually not in the particular thing or technology, but somewhere deeper in their personality or history. Nevertheless, he agreed that new lucrative technologies can make the old means of escape even more effective. Will 3D technologies have such an effect? Some people were already addicted to computer games in the 1980s, but the advent of superior graphics, 3D displays, and virtual reality will make it much easier for the masses to get immersed. Interactive technologies are more immersive, as they require continuous attention.
In the end, the implications of A. Boev's thesis that interactivity might improve the status of TV remained an open issue. M. Özkan and J. Kim agreed with Rakkolainen that the technology is not intrinsically good or bad; it is how it is used that makes it good or bad. Lack of social interaction seems to be a growing problem in the developed world, and Özkan was not sure that it was fair to blame TV for it. He noted that the erosion of extended families and the disappearance of communal structures and living conditions, such as old-style neighborhood interactions and neighborhood shops, have all contributed to social isolation. V. Skala agreed with him that the major negative aspects of TV were based on its use as a tool to boost consumption and to indoctrinate people; he added that TV programmers do not try to produce value but just use sophisticated psychological techniques to keep people passively watching TV. On the other hand, Özkan continued, TV could potentially be used as an effective and economical education tool if public-interest broadcasting were more widespread. Given the current trends, the move to 3DTV may increase its power and negative effects, a point also agreed to by Kim.

N. Stefanoski suggested that the spectrum of different applications of 3DTV technology will be much wider than that of traditional television, with potential applications in the areas of medicine (telesurgery, surgery training, surgery assistance), industry (CAD), and the military (training and simulation in virtual environments). Immersive 3D environments could be created to improve the social environment of elderly and handicapped people, helping them to have more realistic-looking visual contact with other people and to interact with them. Thus, in judging the overall effects of the technology, we should not focus only on consumer 3DTV, but also consider the array of potential non-consumer applications which may have considerable benefit to society.

D. Kaya-Mutlu noted that the question under discussion frames TV around the "effects" model of mass communication. However, this model/theory of "strong effects" or "uniform influences" was challenged in the 1930s, 40s, and 50s (e.g., by the uses and gratifications approach to media consumption). It was shown that the media do not affect everybody uniformly; individual psychological differences, social categories (e.g., age, sex, income, education), and social relationships (e.g., family, friends, acquaintances) affect people's perception and interpretation of media content. In the 1980s, culturalist studies of audiences showed that consumers are not passive recipients of the meanings and identities encoded in the media. These studies redefined media consumption as a site of struggle. For example, Stuart Hall has argued that representations of violence on TV are not violence per se but discourses/messages about violence. Hall, David Morley, and other audience researchers showed that, depending on their social, cultural, and discursive dispositions, viewers are able to negotiate and even resist media messages.
17.12 Comparison to the Move to Color

Will the effects of moving to three dimensions be similar to the effects of moving to color?

A. Boev believes that merely adding another dimension (color, depth, even haptics or olfaction) to TV is not going to greatly affect the social impact of such a medium. On the other hand, P. Surman believes that once viewers have become accustomed to watching 3D images, 2D images will appear dull and lifeless. R. Ilieva also believes that the move to 3D will be more important than the move to color. C. Türün noted that present 3D displays are not yet of sufficient quality to allow us to imagine what it might be like to experience 3DTV where the images are almost impossible to distinguish from the real thing. If such a high-quality image is hanging in the air and is not physically attached to a screen, people might have an experience which is difficult for us to imagine now. In this case, the move to 3D would be much more significant than the move to color, and may be comparable to the difference between a still photograph and a moving picture. G. Ziegler agreed with Türün, underlining that present display technologies which are not quite "true" 3D are not significantly different from ordinary television. He also noted that a more challenging target than the move to color might be the move to moving pictures: could we ever create the awe that the first moving pictures generated in the audience? M. Özkan also agreed that only the transition to a "real" 3DTV system could be much more important than the transition to color.

However, many believe that in any event, interactive TV or immersive TV is almost certain to have a much larger impact than 3DTV. In other words, the addition of the third dimension may have less impact than the possibility of interaction or immersion. G. Ziegler noted that even very convincing 3D displays, if confined to relatively restricted spaces and viewing conditions, will be far from creating the immersive experience of even ordinary cinema. This seems to indicate that simply the size of the display and the architecture can have more of an immersive effect than the perception of three dimensions. Therefore, he suggested that it would be worth investigating large-scale 3D display options, even if they were not true 3D or offered only a limited amount of depth perception; these might create a much more breathtaking experience than true 3D systems in confined or restricted viewing conditions.
17.13 Economic Advantages of Leadership

If Europe, the Far East, North America, or some other bloc becomes the first to establish standards and offer viable 3DTV technologies, especially to the home consumer market, what economic advantages may be expected?
P. Surman noted that for Europe, the value added could come from licensing and from the fact that there are no overwhelming barriers to displays being manufactured in Europe. R. Ilieva believed that if Europe became the first player to establish the standards and offer viable 3DTV technologies, especially to the consumer market, the economic advantages would compare to those for CDs and DVDs. V. Skala did not think that Europe would be the main player. He also expressed pessimism regarding standards: they would take a long time to develop, and meanwhile the market would have already moved on. He thought that while there will be similar principles of coding, there will also be many variations (like NTSC, PAL, and SECAM). He guessed that major Far Eastern countries may once again take the lead.

G. Ziegler was also skeptical of Europe's capacity to provide leadership, based on observation of earlier technologies such as HDTV and GPS. Nevertheless, he argued that if the EU could set forth certain common standards, it might give media producers and hardware manufacturers a huge home market for the fruits of their latest research; and if the rest of the world finds the new medium desirable, then these companies will have an advantage. He also underlined that it is the media standardization and the following media content which generate the revenue, not the hardware. M. Özkan also agreed; the size of the EU market makes it viable for consumer electronics companies to achieve economies of scale in producing a new EU-standard 3DTV. However, he also noted that the added value is not in the hardware but in the content.

For traditional broadcast TV, the commercial model was simple: consumers pay for the equipment (so manufacturers target the end consumer, making branding and marketing important) and advertising revenue pays for the content. With the move to digital TV, it has mostly been the service provider who pays for the equipment and recoups this cost from the monthly service charge to the end consumer. On most equipment, the manufacturer's brand is either invisible or is clearly dominated by the brand of the service provider. A parallel business model has been in the works for game platforms. In such "service provider" subsidized equipment models, low-cost manufacturers have a clear advantage, and famous brands end up having a cost disadvantage because of the brand marketing costs (among other things) they incur. However, establishing a standard obviously creates a great advantage for those companies who own the intellectual property and patents. Hence, although "manufacturing" of the equipment might be done by non-European companies (or even European companies doing it offshore), if intellectual property is developed early on and included in the standards, that can establish a clear and lasting advantage.

C. Türün thought that with the current rate of development, no place is any more advanced than any other. But if something radical were to be achieved by a European company, such as an application much more extraordinary than mundane TV or film content, we would be able to talk about real economic advantages.
17.14 Comparison with Other High-impact Technologies

How large might the impact of consumer 3DTV technology be, compared to other related established consumer technologies such as audio and video, cellular communications, etc.?

P. Surman noted that the impact of 3DTV technology is likely to be high due to the high proportion of time people devote to watching TV. Also, viewing patterns are likely to change in the future; the TV set, the most familiar and easy-to-operate device in the home, will evolve into a media access gateway serving the information society. In contrast, A. Smolic said that when he imagines the world before TV, he believes that the move to 3D would be nowhere near as important as the introduction of TV itself. 3DTV should be considered more as another step in the development of TV than as a revolutionary new technology. Moreover, he predicted that 3DTV would not spread quickly and broadly, but rather would develop from niches; perhaps it would never completely replace 2DTV.

G. Ziegler approached this issue somewhat differently. Rather than thinking of end-to-end consumer TV, he looked at acquisition, compact representation, rendering, and display technologies separately. With regard to acquisition, for example, he noted that being able to acquire 3D surfaces of yourself or your surroundings could be of great interest for immersive online games, where you could quickly create a 3D avatar of yourself. If 3D tracking can be made good enough, it will open up exciting new possibilities for game entertainment and would likely be popular. As for compact representation, being able to compress 3D video so that you can store it on a DVD could be of interest for documentaries, but not for feature films, since they may remain a rather passive experience with 3D being merely an add-on (as in IMAX 3D theaters). However, in video conferencing, 3D would probably increase the feeling of telepresence, and might be successful. Free viewpoint rendering is probably mostly of interest for documentaries or plays; it implies a radical change in filmmaking, since the director's control over point of view is lost. For this reason, free viewpoint movies may not appear soon. But in video conferencing it would be very desirable to change viewpoints. In summary, many kinds of 3D displays could have market potential, provided they did not cause eye strain, there was a standardized media format (which did not soon become obsolete), and there was interesting high-quality media content.

S. Fleck noted that apart from TV programming in the conventional sense, many other forms of content for consumer 3DTV may emerge. The example he used was Google Earth; there have been attempts to produce basic anaglyphic (involving red and blue glasses) stereoscopic screenshots so that it would be possible to experience "3D" in Google Earth.
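The anaglyph composition Fleck refers to is simple enough to sketch. The following Python fragment is a minimal sketch, not any particular tool's implementation; it assumes two same-sized RGB screenshots captured from slightly offset viewpoints (the filenames are hypothetical) and that the Pillow and NumPy libraries are available.

    import numpy as np
    from PIL import Image

    def make_anaglyph(left_path, right_path, out_path):
        # Load both views as RGB arrays of identical shape.
        left = np.asarray(Image.open(left_path).convert("RGB"))
        right = np.asarray(Image.open(right_path).convert("RGB"))
        anaglyph = np.empty_like(left)
        anaglyph[..., 0] = left[..., 0]     # red channel from the left-eye view
        anaglyph[..., 1:] = right[..., 1:]  # green and blue from the right-eye view
        Image.fromarray(anaglyph).save(out_path)

    # Hypothetical filenames; viewed through red-blue (or red-cyan) glasses,
    # each eye sees mostly its own view, producing an impression of depth.
    make_anaglyph("left.png", "right.png", "anaglyph.png")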
17.15 Social Impact of Non-consumer Applications

Which non-consumer applications of 3DTV—including such areas as medicine, industry, the environment, and the military—may have significant beneficial or harmful impacts on society? (Example: telesurgery.) What could be the extent and nature of these impacts?

M. Özkan noted the steady increase of non-conventional training methods in the military. The U.S. Department of Defense is specifically supporting software companies to develop game-based e-learning and training systems. Such systems allow the military to train their personnel for very different environments and situations without risking lives. These users would likely welcome a realistic 3DTV system; even higher-cost early implementations of 3DTV may find use in military training systems. (Such systems can obviously also be used for more humane applications, such as disaster readiness, first aid, and humanitarian aid; unfortunately, funds for these applications are much more limited.) G. Ziegler also noted implications for the battlefield, such as more realistic virtual training environments. He was concerned that 3D visualization might make war even more of a "computer game" to commanders and thus distance and alienate them from the resulting human suffering.

Özkan also noted that industrial designers, specifically designers and manufacturers of 3D parts and systems, have always been very aggressive users of any 3D software and hardware that shortens production times. However, these applications might require very high resolution, creating the need for higher-quality industrial-grade 3DTV. I. Rakkolainen observed that the trend of modern society to become very visual will be enhanced by any future 3D technologies; undesirable uses will be enhanced along with the desirable ones—violent video games will become even more realistic.

G. Ziegler believed that telesurgery will have a huge impact. Young students have no problem looking at a screen while using surgical instruments in simulated surgery. Apparently, kids who have grown up with computer games will be quite ready to adapt to augmented reality, making such applications a reality. Another example he gave was in the area of aircraft maintenance. Paper aircraft repair manuals can weigh many kilograms and are very hard to handle during maintenance because their pages easily become soiled. Augmented reality displays could provide the page content, but 3DTV technology could do even more by superimposing objects from the manual, such as screws, onto the field of view, making it easier for maintainers to do their job. Video conferencing is one area where the quality of face-to-face meetings still cannot be recreated. If 3DTV could make video conferencing more satisfactory, Ziegler pointed out, it could reduce business travel considerably, with enormous time and cost savings. H. M. Ozaktas added that this might be an incentive for companies to invest in rather expensive preliminary 3DTV models.
17.16 Implications for the Perception of Reality

Referring to the tabletop football scenario, M. Karamüftüoğlu noted that the first thing that comes to mind when 3DTV is mentioned is a telepresence type of experience. 3DTV is usually described as being realistic in the sense that it captures all the information and not just a 2D projection of it. However, the perspective in the artist's depiction of tabletop football was not the natural perspective of a person in the stadium or on the field; it was rather a bird's-eye view. The distance between the human observer and the players is very short and the scale is very different.

Karamüftüoğlu made two further points regarding realism. First, the greater the amount of information conveyed, and the more absolute the information becomes, the less there is for viewers to do, and they will be pushed into an ever more passive role; this is already a criticism of conventional TV. Only the final dimension of realism—interactivity—could possibly reverse this. Indeed, the 3DTV image is absolute and real only in the passive sense; realism in a broader sense must also include interactivity. The main distinguishing feature between the real and a reproduction is not 2D versus 3D, nor black-and-white versus color, but the possibility of interaction with the actual scene as opposed to isolation from it. He explained that nothing short of tactile, bodily interaction with the objects would bring back realism.

TV content can be divided into two broad categories, factual programs and fictional programs, noted D. Kaya-Mutlu. Realism, especially in the case of fictional programs, should not be construed solely as realism of the images. One also needs to consider "emotional realism" (the recognition of the real on a connotative level). For example, the relevance of the characters and events in a film to the everyday lives of viewers may contribute much more to the impression of reality than the characters' realistic appearance; it is not the appearance of the characters which makes them realistic but rather their actions, their relation to events and to other characters in the film. The contribution of 3DTV to increasing realism should be evaluated within this broader understanding of realism. Nevertheless, Kaya-Mutlu suggested that it is interesting to think about the contributions of 3D images to the production or enhancement of emotional realism itself.

Another very interesting topic was brought up by G. Ziegler and elaborated in a response by H. M. Ozaktas. Ziegler talked about the implications of mixing real and animated characters, especially animated characters with realistic skin; this produces an erosion of trust in video images. Animated images are obviously not to be trusted, and everyone knows that still photographic images can be altered by "photomontage." However, realistic video images are still trusted because people know they must be real shots of real people; people "know" that such images cannot be fabricated. But as everything becomes fully manipulable, we are entering an era in which no media content can be taken to constitute "evidence"; one can believe it only to the extent that one trusts the source, but there is no true first-hand witnessing-at-a-distance any
longer, since virtually all forms of transmitted data will be manipulable, even the most "realistic" ones of today. And people will know it.

I. Rakkolainen joined this discussion by suggesting that with the digitalization of all forms of media, it would be possible to create immersive and interactive 3D experiences; eventually, synthetic 3D objects would be indistinguishable from real objects captured with a camera. H. M. Ozaktas found this to be a strong statement, with interesting implications. Ozaktas recalled the famous essay "The Work of Art in the Age of Mechanical Reproduction" by the Frankfurt School author Walter Benjamin, who wrote in the first half of the 20th century. With "recordings" on media no longer merely representing true things, but becoming indistinguishable from them, art and esthetic theorists will have a lot to theorize about. Ozaktas surmised that perhaps one day someone will write "The Work of Art in the Age of Indistinguishable Reproduction."

Ziegler thought that the ability to use digitized characters in computer games and perhaps even films will make it possible to create or simulate digital actors, whether modeled on real people or entirely invented. This could drastically reduce film production costs and also encourage pirate productions, with a host of interesting and complex implications. Ozaktas noted that the ability to pirate an actor digitally would open up copyright and ownership issues far more complex than the current pirate copying and distribution of copyrighted material. Also, the "star-making" industry will be transformed. A star may be a real person, but it will no longer be necessary for him or her to actually act, or even be able to act; he or she will merely be the model for the digital representation used in the films. In some cases, the star will not correspond to any real person, and will be nothing more than an item of design in a studio. Ziegler also brought up the possibility of "reviving" actors or personalities who are no longer alive through skeletal animation with realistic skin. The traditional capital of an actor was his or her actual physical presence, but now it would be merely his or her image, a commodity that can be sold and hired, even after the actor is dead. Celebrities like football stars or top models can be in Tokyo this morning for a televised fashion show and in Rio tomorrow for a talk show. Indeed, since most of us have never seen these people in the flesh, it is conceivable that totally imaginary personalities could be synthesized for public consumption. People may or may not know that these people do not actually exist, but perhaps it will be commonly accepted that they may not exist, just as we do not mind watching fictional films, knowing the events are not true. K. Köse noted that in certain circumstances, the availability of very convincing images could open the door to "persona piracy," the ability to convince others that you are someone else.

Ziegler also asked whether such media would offer alternative means of escape, involving addiction, and concluded that yes, but since there are already enough paths to escape, this will not have a substantial impact. Mastery of even the earliest computer and video games demonstrated the potential for addiction and escapism. Realism may increase this considerably. But will highly
realistic and interactive media allow qualitatively new levels of escapism from the real world? Presently, many substitutes for the real world exist, including imitation eggs and sugar, but in the social realm, substitutes are usually poor, nowhere near the real thing. If this changes, more people may prefer the more controllable, lower-risk nature of artificial experiences, leading to a society of isolated individuals simulating human experiences with quite genuine sensory accuracy. Although sensorily, and therefore cognitively, equivalent, these experiences will not be socially equivalent and will have an effect on how society operates. Unless, of course, computers are programmed to synchronize and coordinate the virtual experiences of individuals so that the resulting experiences and actions are in effect comparable to present social interactions and their consequences. For example, person A is hooked up to a simulator virtually experiencing an interaction with person B, person B is likewise apparently interacting with A, and the two computers are linked such that the two simulations are mutually consistent. In that case, the distinction between simulation and true interaction disappears; such simulation is effectively a form of remote communication and interaction.

Ziegler also argued that children who experience such media will not be impressed by conventional fairy tales and will lose interest in them. Generalizing to other aspects of human culture, it may be argued that poetry, novels, music, and even traditional cinema images may no longer be able to capture the interest of audiences. The erosion of interest in poetry and the theater in the 20th century may support this, but the continued interest in at least some forms of printed content and the plastic arts provides a counterexample.

M. Karamüftüoğlu compared the 3DTV tabletop football scenario with Rembrandt's painting The Anatomy Lesson. He noted how the way the professor held his hand conveyed knowledge about human anatomy, and contrasted this embodied, tactile/haptic human knowledge with the disembodied, absolute/objective machinic knowledge of the tabletop football scenario. Indeed, engineers attempt to produce more "realistic" images by going from black-and-white to color, from 2D to 3D, and so forth, but they often seem to be moving towards such objective, physical/machinic knowledge at the expense of more human knowledge. In the physicist's objective understanding of color, each wavelength corresponds to a different color, whereas human understanding of color is based on the three primaries, which are rooted in human physiology. Objective knowledge is independent of the human observer; thus "true 3D" aims to reconstruct a light field as exactly as possible and thereby preserve as much objective information as possible. An artist's rough sketch or caricature, on the other hand, may do very poorly in terms of any objective measure of fidelity and information preservation, but may carry a very high degree of information about human nature, even including such intangible things as psychological states.

These discussions are not restricted to 3DTV, but are applicable to any technology that, like 3DTV, increases the accuracy and realism of remote experiences. For example, odor-recording technologies allowing the
recording and playback of smells are being developed. These devices analyze odors and then reproduce them by combining an array of non-toxic chemicals.
17.17 Interactivity

Although interactivity is not a defining characteristic of 3DTV, it comes up constantly in relation to it. Interactivity is a very important trend in all TV-type media, but it is possibly even more crucial for, perhaps even inseparable from, 3DTV. Interactive "TV" is almost certain to have a greater impact, good and bad, than 3DTV, regardless of the number of units sold.

A. Boev discussed different kinds of entertainment or "media" and their differing degrees of interactivity. Live events, such as theater and sports, offer a degree of interactivity, with the opportunity to throw eggs, shout, sing, and engage in hooliganism. Historically, in some forms of staging, the audience was allowed to shout, argue, or even decide on the course of action (as with gladiators). Nevertheless, with the exception of certain experimental theaters, the audience is generally in a safe place, close to the action without risking too much. Different degrees of interactivity are sought by different audiences in different contexts, and ignoring this fact might lead to the rejection of a product. There might be a few forms of interactivity—such as the choice of point of view—which are special to 3DTV.

In the same context, D. Kaya-Mutlu noted that interactivity necessitates an active viewer. However, in her view, the popularity of TV is based on its being a passive medium. Interactivity may not be much desired in a medium so strongly associated with leisure.
Part IV: Gender-Related Issues

17.18 Effect on Gender Inequality and Gender Bias

Can you think of any aspect or application of 3DTV that will increase or decrease gender inequality or bias in the social and cultural sense?

Most participants who expressed an opinion on this issue did not believe there will be any major effect, apart from what is discussed in the following section. However, it is important to underline that most of the discussion participants—as well as most developers of 3DTV technology—are male. G. Ziegler pointed out that it is worth looking into such biases in the areas of computer games, general computer usage, and usage of other "high-tech" consumer gadgets (although the insights gained may not be specific to 3DTV). He suggested that women become more interested in using computers when the available applications and games have a considerable component of social interaction. Likewise, if 3DTV becomes a tool of social interaction, more women will become interested in it.
Can you think of any applications of 3DTV that will benefit or harm men or women to a greater degree? (Example: a medical application treating a disease more common in men or in women.)

Not many examples were put forward, with the exception of the application areas suggested in the following section. Of these, the effects of pornography are largely perceived as negative, and more so for women. On the other hand, it was argued that training and education applications may benefit women to a greater degree. Whether entertainment applications which selectively target men or women can be said to "benefit" them is open to debate. It was also commented that the application of 3DTV to shopping may be viewed as potentially exploitative.
17.19 Gender-differentiated Targeting of Consumers

D. Kaya-Mutlu noted that ethnographic audience studies have shown that TV viewing is a gender-differentiated activity, not only in program preferences but also in viewing styles. Researchers have found that while men prefer programs such as news, current affairs, documentaries, and adventure films, women prefer quiz shows, serials, soap operas, and fantasy movies (mostly talk-oriented TV genres for which 3D may not be necessary). But the home is a site of power relations, and when there is a clash of tastes, masculine preferences prevail. Researchers have also identified differences between the viewing styles of men and women: while men watch TV in a focused and attentive manner, women watch it in a distracted manner (i.e., together with at least one other domestic activity, such as ironing or feeding the children). Kaya-Mutlu concluded that these gendered program preferences and viewing habits imply that 3DTV, which seems to favor visual information and encourages focused, attentive viewing, may be more responsive to male demands and tastes; this may encourage producers to reserve 3D for prime-time programs, since most daytime programs are addressed to children and housewives.

Can you think of any applications of 3DTV that will target either men or women as primary consumers? (Example: broadcasting of male-dominated sports events.)

Three applications were noted that may target men as primary consumers: male-dominated sports events, games applications, and pornography. It was suggested that such applications may help 3DTV by building initial niche markets. Y. Yardımcı and A. Smolic noted that, in particular, 3D pornography may become a popular industry, but many participants expressed that they would be uncomfortable from an ethical perspective with building 3DTV's success on such grounds. If 3DTV does become a new tool for disseminating this type of content, the result will be to increase its negative and exploitative impact on society. It was also commented that in any event,
it cannot be taken for granted that 3D will make this type of content more attractive to its consumers.

A number of shopping-related applications that may target women as primary consumers were noted by G. B. Akar. 3D virtual malls were described as game-like environments in which one can navigate through shops. A 3D dress-up simulator would allow selected garments to be mixed and matched on an avatar and viewed from different angles. Finally, the use of 3D in telemarketing was noted as an enhancement that might make home shopping more attractive.

Akar also suggested that the application of 3DTV in the areas of professional and vocational training (for instance, to become a surgeon, pilot, or technician) has the potential to benefit women particularly, because some studies seem to show that women are more inclined towards visual learning. She added that a similar potential exists in K-12 education, especially in subjects such as geometry, chemistry, biology, and physics, where visualization or simulation of complex structures or phenomena is vital. This may also help increase the interest of women in science and engineering.

Will it be important for companies to target one or the other gender to sell 3DTV consumer equipment or applications?

A. Boev and G. Ziegler mentioned market studies of which gender dominates the decision to purchase consumer electronics, and of which features are decisive (technical parameters, design, etc.). Furthermore, some studies seem to indicate sex differences in perception and spatial ability, which may have implications especially for immersive technologies. Given that these issues are highly charged and there are many unresolved claims, it is difficult to draw any meaningful conclusions.
Acknowledgments

We would like to thank the following participants of the Network of Excellence for their contributions to the discussions which formed the basis of this work: Gozde B. Akar (Middle East Technical University, Ankara, Turkey), Atanas Boev (Tampere University of Technology, Tampere, Finland), Reha Civanlar (Koç University, Istanbul, Turkey, now DoCoMo Labs, Palo Alto, USA), Sven Fleck (University of Tübingen, Tübingen, Germany), Rossitza Ilieva (Bilkent University, Ankara, Turkey), Matthias Kautzner (Fraunhofer Institute for Telecommunications/Heinrich-Hertz-Institut, Berlin, Germany), Kıvanç Köse (Bilkent University, Ankara, Turkey), Matthias Kunter (Technical University of Berlin, Berlin, Germany), Haldun M. Ozaktas (Bilkent University, Ankara, Turkey), Mehmet Özkan (Momentum AŞ, Istanbul, Turkey), Ismo Rakkolainen (FogScreen Inc., Helsinki, Finland), Simeon Sainov (Bulgarian Academy of Sciences, Sofia, Bulgaria), Vaclav Skala (University of West Bohemia in Plzen, Plzen, Czech Republic), Aljoscha Smolic (Fraunhofer
Institute for Telecommunications/Heinrich-Hertz-Institut, Berlin, Germany), Nikolce Stefanoski (University of Hannover, Hannover, Germany), Philip Surman (De Montfort University, Leicester, United Kingdom), Cemil Türün (Yogurt Technologies Ltd., Istanbul, Turkey), Yasemin Yardımcı (Middle East Technical University, Ankara, Turkey), Ali Özgür Yöntem (Bilkent University, Ankara, Turkey), Gernot Ziegler (Max Planck Institute for Informatics, Saarbrücken, Germany).

We are especially grateful to the following external participants for their contributions to the discussions: Güliz Ger (Bilkent University, Faculty of Business Administration, Department of Management, Ankara, Turkey), an expert on the sociocultural dimensions of consumption, consumption and marketing in transitional societies and groups, and related issues of globalization, modernity, and tradition; Murat Karamüftüoğlu (Bilkent University, Faculty of Art, Design, and Architecture, Department of Communication and Design, Ankara, Turkey), an expert on information retrieval theory, design, and evaluation, computer-mediated communication and collaborative work, computer semiotics, philosophical foundations of information systems, and the organizational, social, and political implications of information systems; Dilek Kaya-Mutlu (Bilkent University, Faculty of Art, Design, and Architecture, Department of Graphic Design, Ankara, Turkey), an expert on film studies with an emphasis on audience studies, film reception, and Turkish cinema; Jinwoong Kim (ETRI, Radio and Broadcasting Research Laboratory, Daejeon, Republic of Korea), 3DTV Project Leader; Fatih Porikli (Mitsubishi Electric Research Labs, Cambridge, USA), Principal Member and Computer Vision Technical Leader; Özlem Sandıkçı (Bilkent University, Faculty of Business Administration, Department of Management, Ankara, Turkey), an expert on culturally oriented issues in marketing, including advertising reception, gender and advertising, consumption culture, and the relationships between modernity, postmodernity, globalization, and consumption.

Special thanks go to Gozde B. Akar (Middle East Technical University, Faculty of Engineering, Department of Electrical and Electronics Engineering, Ankara, Turkey) and Yasemin Soysal (University of Essex, Department of Sociology, Colchester, United Kingdom) for their critical comments on Part IV. We would also like to take this opportunity to thank Levent Onural of Bilkent University for his support as leader of the Network of Excellence on Three-Dimensional Television. Finally, we are grateful to Kirsten Ward for careful editing of the manuscript. This work is supported by the EC within FP6 under Grant 511568 with the acronym 3DTV.