115 64 52MB
English Pages 722 [697] Year 2024
Jinyang Liang Editor
Coded Optical Imaging
Coded Optical Imaging
Jinyang Liang Editor
Coded Optical Imaging
Editor Jinyang Liang Laboratory of Applied Computational Imaging Institut National de la Recherche Scientifique Université du Québec Varennes, QC, Canada
ISBN 978-3-031-39061-6 ISBN 978-3-031-39062-3 https://doi.org/10.1007/978-3-031-39062-3
(eBook)
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland Paper in this product is recyclable.
To my family: Jiajing, Grace, and Joy To my parents: Jingsha and Ping To my mentors, students, colleagues, and friends
Preface
Coded optical imaging is an emerging field of research. As a sub-field of computational imaging, it synergizes optical engineering and computational techniques. Different from conventional optical imaging, it uses a patterned mask, which is generated by various meanings and can be placed at any position in the imaging system, to code the scene during the physical data acquisition. The captured data, as an intermediate setup, often have no resemblance to the scene. The image of the scene and its associated information are recovered by computational image reconstruction. Thanks to this rather unique paradigm, coded optical imaging is able to surpass its conventional counterparts in image quality, dimensionality, and functionality, which, in turn, opens up many unprecedented applications. To date, coded optical imaging has become an extensively studied topic and has made contributions to both scientific advancements and industrial innovations. Despite the burgeoning development of coded optical imaging, thus far, there lacks a timely and comprehensive compilation of the field. Historically, coded aperture imaging has separate origins: one came from engineering instrumentation in astrophysics and metrology in the 1970s–1980s; the other originated from computer vision and graphics in the 1990s. The former aimed to use computers to enhance the performance of imaging devices, while the latter targeted optics to assist computers. Because of these different emphases, coded optical imaging is rarely discussed as a united discipline, but rather presented as specific techniques among various disciplines, including structured light, wavefront engineering, lensless imaging, and computational photography. In terms of existing literature, reviews of coded-aperture imaging solely discuss its operation with high-energy radiation, leaving out recent advances in the optical regime. Books and reviews on computational imaging and computational photography focus on the fundamentals of computer vision or graphics using unmodified consumer cameras. The different origins also created an inherent gap between researchers in optics and in computer science to keep up with the latest developments in one another’s fields. All these reasons call for a designated venue to present coded optical imaging as an independent discipline, scrutinize its similarity and difference to related fields, and survey representative technical development and novel applications as a pedagogical reference for the benefit of a broad readership from diverse communities.
vii
viii
This book is intended to capture the state-of-the-art in coded optical imaging. For clarification, each term is defined as follows. First, “coded” refers to the use of patterns to spatially and/or temporally modulate the complex amplitude or the intensity of light. The coding components must provide prior information that is recorded in physical data acquisition and leveraged in computational image reconstruction. This definition excludes the modulation from many ordinate components, such as the parabolic phase term added by a lens, a linear phase ramp imparted by a grating, and uniform intensity change enabled by an attenuator. Second, “optical” refers to detecting photons in the spectral range from extreme ultraviolet to far infrared. Finally, “imaging” takes place in at least two spatial dimensions. This book contains seven parts. A total of 37 invited chapters from leading experts cover a wide range of topics from basic concepts to the latest progress. The book starts with Part I “Fundamentals in Coded Optical Imaging.” The field of coded optical imaging is introduced in Chap. 1. Then, Chaps. 2– 4 discuss the fundamentals of essential components in both hardware and software, including various encoders for optical imaging (Chap. 2) as well as convex optimization (Chap. 3) and machine learning (Chap. 4) for image reconstruction. Following Part I, selected techniques that represent the current state of development are presented in Chaps. 5–37. Because the research content of coded optical imaging is relatively divergent, to avoid exhaustive enumeration, instead of compartmentalizing these techniques by applications, they are categorized according to the measured photon tags in the plenoptic function. Part II “Coded Planar Imaging” contains seven chapters. This part starts by introducing how two passive encoders, namely metasurfaces for diffractive optical neural networks (Chap. 5) and a zone plate for lensless imaging (Chap. 6), can realize attractive features in coded optical imaging systems. Then, Chap. 7 shows how spatiotemporally encoding can clean motion blur. The last four chapters in this part discuss various spatial encoding techniques in representative coded optical imaging techniques, including single-pixel imaging (Chap. 8), spatial frequency domain imaging (Chap. 9), wavefront shaping (Chap. 10), and ptychography (Chap. 11). Part III “Coded Depth Imaging” contains six chapters that extend the coverage of coded optical imaging to three-dimensional (3D) space. The first two chapters introduce innovative dipole-spread function engineering for super-resolution microscopy (Chap. 12) and coded aperture correlation holography for 3D imaging (Chap. 13). The next two chapters survey two representative structured light 3D imaging approaches—fringe projection profilometry (Chap. 14) and grid-index-based surface profilometry (Chap. 15). Finally, time-of-flight depth sensing (Chap. 16) and optical diffraction tomography (Chap. 17) are discussed as two representative methods relying on temporal and angular encoding. Part IV “Coded Light-Field Imaging” has four chapters. The first two chapters cover how various structured light patterns (Chap. 18) and a backgroundoriented schlieren setup (Chap. 19) can be used as coded illumination for light field imaging. The last two chapters present how spatial light modulators (Chap. 20) and coded masks (Chap. 21) can be inserted in the detection side for light field imaging.
Preface
Preface
ix
Part V “Coded Temporal Imaging” has seven chapters. The first three chapters cover spectrally encoded illumination (Chap. 22), spatial multiplexing (Chap. 23), and sampling (Chap. 24) for ultrafast imaging. Then, Chap. 25 surveys the latest development of compressed ultrafast photography (CUP)—the world’s fastest real-time imaging modality. Chapter 26 extends the concept of CUP to ordinary cameras based on the charge-coupled device (CCD) and complementary-metal-oxide-semiconductor (CMOS) technology for highspeed imaging. The last two chapters present on-sensor coding achieved by a shuffled rolling shutter (Chap. 27) and multi-tap charge modulators (Chap. 28). Part VI “Coded Spectral Imaging” contains six chapters. Chapter 29 discusses coded aperture snapshot spectral imager—a powerful compressedsensing imaging modality to retrieve high-dimensional information. Chapter 30 surveys coded Raman spectroscopy using spatial light modulators. Chapter 31 presents the implementation of spatial multiplexing in imaging spectroscopy. The next two chapters present the implementation of chaotic masks (Chap. 32) and diffractive optical elements (Chap. 33) as an optical encoder for computational hyperspectral imaging. The final chapter in this part details the on-sensor encoding of a hyperspectral camera (Chap. 34). Part VII “Coded Polarization Imaging” has three chapters. Chapter 35 describes structured illumination microscopy for fluorescence polarization imaging. Chapter 36 discusses the implementation of dielectric-metasurfacebased full-Stokes polarization detection and imaging polarimetry. Chapter 37 presents on-sensor polarization encoding in phase-shifting digital holography for advanced quantitative phase imaging. This book serves as a reference for professionals and a supplement for trainees in computational imaging, information optics, optical engineering, computer graphics, image reconstruction techniques, and other related fields. It can also be used as a field guide for senior undergraduate students and graduate students who aim to have a quick grasp of the fundamental knowledge of coded optical imaging. Efforts have been made to allow using this book more as a textbook for a graduate-level class. Selected chapters can be used to cover a one-semester course. In particular, Chaps. 1–4 can serve as an introduction to this field. Fundamentals of operating principles of various techniques and specific examples in Chaps. 5–37 can also be adapted for course materials. Alternatively, many chapters can be re-grouped for specific topics. For example, Chaps. 27, 28, 34, and 37 are implementations of various on-sensor encoding mechanisms. Chapters 2, 8, 9, 14, 18, 23, 31, and 35 share a common topic the applications of spatial sinusoidal patterns. Many chapters can also be put together for the topic of compressed sensing in optical imaging. The successful completion of this book is due almost entirely to the enthusiasm of each one of the 104 authors from 12 countries on 5 continents for sharing their knowledge of coded optical imaging with a broad audience. The entire editorial process of this book was conducted during the trying time of the pandemic. Despite many practical difficulties, the authors responded to this challenge positively and with much enthusiasm. The manuscripts were expected to be 10 pages in length. However, all of them ended up
x
Preface
being much longer, and several chapters had more than 30 pages! I sincerely thank all authors for their trust, patience, support, cooperation, and dedication throughout the compilation of this book. As the editor, my appreciation also goes to Merry Stuber, Amrita Unnikrishnan, Suhani Jain and other editorial staff at Springer Nature, who gave me much useful information, tips, and assurance no matter how confusing or unpleasant the occasions seem to be. I also wish to thank my students Yingming Lai, Cheng Jiang, Miguel Marquez, and Hanzi Liu for proofreading and logistics as well as Dr. Patrick Kilcullen for cover image preparation. Finally, I thank my wife Jiajing and my daughters Grace and Joy, who gave me tremendous support and encouragement to edit this book on many late nights. Varennes, QC, Canada 4 January 2024
Jinyang Liang
Contents
Part I Fundamentals in Coded Optical Imaging 1 Introduction to Coded Optical Imaging . . . . . . . . . . . . . . . . . . . Jinyang Liang
3
2 Encoders for Optical Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yingming Lai and Jinyang Liang
15
3 Convex Optimization for Image Reconstruction . . . . . . . . . . . . Henry Arguello and Miguel Marquez
37
4 Machine Learning in Coded Optical Imaging . . . . . . . . . . . . . . Weihang Zhang and Jinli Suo
55
Part II Coded Planar Imaging 5 Diffractive Optical Neural Networks . . . . . . . . . . . . . . . . . . . . . . Minhan Lou and Weilu Gao
73
6 Zone Plate-Coded Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jiachen Wu and Liangcai Cao
95
7 Spatiotemporal Phase Aperture Coding for Motion Deblurring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shay Elmalem and Raja Giryes
109
8 Single-Pixel Imaging and Computational Ghost Imaging . . . . Ming-Jie Sun
131
9 Spatial Frequency Domain Imaging . . . . . . . . . . . . . . . . . . . . . . . Rolf B. Saager
143
10 Imaging Through Scattering Media Using Wavefront Shaping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuecheng Shen 11 Coded Ptychographic Imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . Shaowei Jiang, Tianbo Wang, and Guoan Zheng
165 181
xi
xii
Contents
Part III Coded Depth Imaging 12
13
Dipole-Spread Function Engineering for Six-Dimensional Super-Resolution Microscopy . . . . . . . . . . . . . . . . . . . . . . . . . . . Tingting Wu and Matthew D. Lew
207
Three-Dimensional Imaging Using Coded Aperture Correlation Holography (COACH) . . . . . . . . . . . . . . . . . . . . . . . Joseph Rosen, Nathaniel Hai, and Angika Bulbul
225
14
Fringe Projection Profilometry . . . . . . . . . . . . . . . . . . . . . . . . . . Cheng Jiang, Yixuan Li, Shijie Feng, Yan Hu, Wei Yin, Jiaming Qian, Chao Zuo, and Jinyang Liang
241
15
Grid-Index-Based Three-Dimensional Profilometry . . . . . . . . Elahi Ahsan, QiDan Zhu, Jun Lu, Yong Li, and Muhammad Bilal
287
16
Depth from Time-of-Flight Imaging . . . . . . . . . . . . . . . . . . . . . . Mohit Gupta
307
17
Illumination-Coded Optical Diffraction Tomography . . . . . . . Andreas Zheng, Hui Xie, Yanping He, Shiyuan Wei, Tong Ling, and Renjie Zhou
323
Part IV Coded Light-Field Imaging 18
Light-Field Imaging with Patterned Illumination . . . . . . . . . . Depeng Wang, Kekuan Wang, Feng Xing, and Diming Zhang
345
19
Plenoptic Background Oriented Schlieren Imaging . . . . . . . . Elise Hall, Jenna Davis, Daniel Guildenbecher, and Brian Thurow
357
20
Coded-Aperture Light Field Imaging Using Spatial Light Modulators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jingdan Liu and Jinyang Liang
21
Compressive Coded-Aperture Light Field Imaging . . . . . . . . . Saghi Hajisharif, Ehsan Miandji, and Christine Guillemot
369 385
Part V Coded Temporal Imaging 22
Continuous High-Rate Photonically Enabled Compressed Sensing (CHiRP-CS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mark Aaron Foster
405
23
MUltiplexed Structured Image Capture (MUSIC) . . . . . . . . . Zhili Zhang and Mark Gragston
421
24
Sampling-Based Two-Dimensional Temporal Imaging . . . . . . Qiyin Fang, Morgan Richards, and Yiping Wang
437
Contents
xiii
25 Compressed Ultrafast Photography . . . . . . . . . . . . . . . . . . . . . . . Peng Wang and Lihong V. Wang
453
26 Compressed High-Speed Imaging . . . . . . . . . . . . . . . . . . . . . . . . . Xianglei Liu and Jinyang Liang
481
27 Shuffled Rolling Shutter Camera . . . . . . . . . . . . . . . . . . . . . . . . . Esteban Vera, Felipe Guzman and Nelson Diaz
499
28 Ultra-High-Speed Charge-Domain Temporally Compressive CMOS Image Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Keiichiro Kagawa and Hajime Nagahara
515
Part VI Coded Spectral Imaging 29 Coded Aperture Snapshot Spectral Imager . . . . . . . . . . . . . . . . Xin Yuan, Zongliang Wu, and Ting Luo 30 Coded Raman Spectroscopy Using Spatial Light Modulators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mark A. Keppler, Zachary A. Steelman, and Joel N. Bixler 31 Spatial Frequency Multiplexing in Spectroscopy . . . . . . . . . . . Elias Kristensson 32 Multispectral Three-Dimensional Imaging Using Chaotic Masks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vijayakumar Anand, Soon Hock Ng, Daniel Smith, Denver Linklater, Jovan Maksimovic, Tomas Katkus, Elena P. Ivanova, Joseph Rosen, and Saulius Juodkazis 33 Encoded Diffractive Optics for Hyperspectral Imaging . . . . . . Henry Arguello, Laura Galvis, Jorge Bacca, and Edwin Vargas 34 Chirped Spectral Mapping Photography Using a Hyperspectral Camera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dalong Qi, Shian Zhang, Yunhua Yao, Jiali Yao, Chengzhi Jin, and Yilin He
533
549 565
581
593
607
Part VII Coded Polarization Imaging 35 Polarization Structured Illumination Microscopy . . . . . . . . . . . Xin Chen, Wenyi Wang, Meiqi Li, and Peng Xi
631
36 Full-Stokes Imaging Polarimetry Using Metasurfaces . . . . . . . Ting Xu, Yilin Wang, Yongze Ren, and Qingbin Fan
667
37 Compact Snapshot Phase-Shifting Digital Holographic Imaging Systems Using Pixelated Polarization Camera . . . . . . Hanzi Liu, Vinu R. V., Ziyang Chen, Jinyang Liang, and Jixiong Pu Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
683
705
Part I Fundamentals in Coded Optical Imaging
1
Introduction to Coded Optical Imaging Jinyang Liang
Abstract
This chapter introduces coded optical imaging. The presentation starts with a discussion of the importance and limitations of conventional optical imaging, followed by the introduction of optical encoding as the solution to overcome the existing limitations. Then, the operating principle of coded optical imaging is reviewed. The ensuing discussion focuses on the origins of coded optical imaging and representative major events in the development of this field. Next, the reasons that contribute to the prosperity of coded optical imaging are presented, and its advantages over conventional optical imaging are described. Keywords
Optical imaging · Computational imaging · Multi-dimensional imaging · Optical engineering · Data acquisition · Optical encoding · Coded aperture · Spatial light modulators · Computational techniques · Image reconstruction
J. Liang (o) Laboratory of Applied Computational Imaging, Centre Énergie Matériaux Télécommunications, Institut National de la Recherche Scientifique, Université du Québec, Varennes, QC, Canada e-mail: [email protected]
1.1
Importance of Optical Imaging
From billboards to galleries, from TikTok™ to Instagram™, visual information is ubiquitous. The adage, “a picture is worth a thousand words”, constantly emphasizes that images, as an indispensable constituent in visual information, can convey meanings much more effectively than verbal descriptions. Especially in this fast-paced era of big data, images have brought diverse, inclusive, and efficient presentations into our daily lives. To “show and tell”, optical imaging instruments are necessary to record scenes, process captured data, and present images to spectators. Conventionally, optical imaging instruments operate by collecting the light intensity from the object (e.g., via transmission, scattering, and reflection) and using the imaging components to form an image of the object on the detector (Fig. 1.1). This point-to-point mapping between the object and the captured image best illustrates “what you see is what you get”. Guided by this operating principle, the development of optical imaging instruments has focused on a sole purpose— to maximize the resemblance between the captured image and the scene. Toward this goal, many revolutionary technologies have been developed. Early examples include the research of photosensitive materials in 1717 [1], the development of color photography in 1861 [2], and the invention of the digital camera in 1975 [3]. After the world enters the twenty-first century,
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 J. Liang (ed.), Coded Optical Imaging, https://doi.org/10.1007/978-3-031-39062-3_1
3
4
J. Liang
Image Sensor
Imaging system
Pinhole
Scene Scene Fig. 1.1 Operating principle of conventional optical imaging systems
the blooming of optical imaging techniques has largely expanded human vision to see the world as never possible before. For example, the in situ image storage camera built upon the charge-coupled device (CCD) increases the imaging speed to 100 million frames per second [4]. Advanced microscopy has revealed fine spatial features from single molecules to organelles [5]. Compound camera lenses are designed to minimize distortion over a large field of view [6]. Besides advancing scientific frontiers, optical imaging has largely impacted our daily lives. Phase detection has been widely used in autofocusing, first enabled by Samsung’s Galaxy S5 [7]. Structured illumination has been implemented in iPhone X for threedimensional (3D) imaging [8]. These new technical abilities have largely propelled the prosperity of social media.
Fig. 1.2 Schematic of a pinhole camera
function. The system performance is also limited by each constituent in terms of the brightness of the light source, sensitivity, size, and speed of the detector, as well as sensing dimensions and resolution of the imaging components. Although certain engineering solutions could overcome these limits, they come with a considerable cost in high expense, bulky size, and limited application scope. Although exciting research outcomes have kept improving these technical specifications, certain specifications are inherently limited by physics. For example, the CCD sensor’s frame rate, limited by the electron’s diffusion speed, can hardly top 1 billion frames per second [9]. The spatial resolution, bounded by the diffraction limit, is difficult to exceed 250 nm [10]. The sensitivity of the sensor is also bounded by random thermal and electronic noise [11].
1.3 1.2
Limitation in Conventional Optical Imaging
Despite this remarkable progress, the development of conventional optical imaging techniques is bounded by its operating principle. From the aspect of image acquisition, the two-dimensional (2D) intensity-only recording automatically discards rich information contained in many photon tags—three spatial coordinates, two angles of incidence, time-of-arrival, wavelength, and polarization—in the photon’s plenoptic
Light Encoding as a Solution
The limitations of conventional optical imaging can be overcome by implementing coded apertures in imaging systems. The coded aperture, in its original form, is basically a mask containing many pinholes. As early as 400 BC, the phenomenon of imaging using a pinhole was documented by the Chinese philosopher Mozi [12]. Pinhole imaging was later implemented in the “camera obscura” (Latin for “dark chamber”) for early photography [13] and drawing aid [14] (Fig. 1.2). This lensless setup can form a
1 Introduction to Coded Optical Imaging
5
(a)
(b)
(c) Sensor
Workstation
Imaging system
Scene
Encoding mask
Reconstructed image
Fig. 1.3 Coded optical imaging. (a) This example shows a twelve-hole mask used in a camera obscura. (b) This example shows irregularly distributed gaps between leaves producing replicated images of a solar eclipse. (c) Oper-
ating principle of coded optical imaging. ((b) is reprinted under a Creative Commons Attribution 2.0 Generic license from Wikimedia Commons: The Dappled Sunlight under the Trees was Very Strange Before and After Totality, Hammer, V.H., © 2017)
distortion-free image of objects across a wide angular field and over a large range of distances. However, the pinhole size must remain small (e.g., 0.5 mm in diameter to image objects 0.25 m from the film) to produce a sharp image. An increase in pinhole size will drop the clarity of the image. Due to the low light throughput, in most practical cases, the exposure time of the pinhole camera is insufferably long even with the most sensitive films. This overriding drawback largely limits the application scope of the pinhole camera to stationary objects. To overcome the inevitable trade-off between the spatial resolution and the light throughput in a single-pinhole camera [14], a coded aper-
ture was devised by putting multiple pinholes together with a certain format. An early example of a coded aperture—a 3 × 4 pinhole array— can be found in the publication in the fifteenth century [15] (Fig. 1.3a). Other types of coded aperture, e.g., randomly distributed gaps between leaves, widely exist in nature (Fig. 1.3b) [16]. With a coded aperture, element images produced by each pinhole are overlapped on a film that records them all simultaneously. In this data acquisition process, the incident light from a scene, denoted by I, is modulated by one or multiple coded apertures with various other hardware to produce the recorded data, denoted by E. Mathematically, this process can be expressed by a forward model of
6
J. Liang
E = OI,
.
(1.1)
where the operator O stands for the overall operation performed by the hardware. Different from conventional optical imaging, the acquired data often bear no apparent resemblance to the scene. Hence, the acquired image is reconstructed computationally to retrieve the image of the scene, denoted by .Iˆ, via a backward model of Iˆ = O' E,
.
(1.2)
'
where O stands for the inverse operation of O and is performed by the software. Therefore, coded optical imaging is a hybrid imaging scheme that synergizes physical data acquisition and computational image reconstruction (Fig. 1.3c).
1.4
Origins of Coded Optical Imaging
In modern science, coded optical imaging is developed from separate origins. The first major implementation of coded aperture imaging was in the field of astrophysics [17]. Because of the high energy of cosmic radiations (e.g., x-rays or gamma rays), it is challenging to find proper components to reflect and/or refract them. Therefore, by offering the ability to block or transmit these
high-energy photons, coded masks were implemented as a key device to enable the imaging of these kinds of radiations [18]. Their robust performance, practical merits, and distortion-free imaging propel the use of coded-aperture imaging in many high-energy telescopes in satellites [19] (Fig. 1.4). In data acquisition, the captured image can be represented as the correlation of the objects and the coded mask [20]. The ensuing image reconstruction typically employs correlation to recover them. This reconstruction approach demands the autocorrelation function of the coded aperture to approximate a delta function, which sparks the development and implementation of various types of coded apertures, such as the random encoding mask [21], the Fresnel zone plate [22], and a modified uniform redundancy array [23]. Despite its initial implementation with high-energy radiation, the operating principle of coded-aperture imaging is universally applicable to other electromagnetic spectral ranges. Thus, coded masks were soon introduced to optical imaging [24]. Optical encoding also occurs in diverse locations in data acquisition, from the illumination of the scene to the sensor on the camera. Complementary with many high-quality reflecting and refracting components of light, coded masks endow the resulting optical imaging systems with many new functions.
Fig. 1.4 Coded mask of the Burst Alert Telescope in the Neil Gehrels Swift Observatory. (Reprinted with permission from NASA: https://imagine.gsfc.nasa.gov/observatories/learning/swift/mission/bat.html)
1 Introduction to Coded Optical Imaging
The second origin lies in the early development of fringe-projection profilometry. In 1982, Takeda, Ina, and Kobayashi invented the Fouriertransform-based fringe-pattern analysis [25]. Using a Michelson interferometer, they encoded a 3D object by projecting a sinusoidal fringe pattern. Then, they extracted the information in the spatial frequency domain, solved the phase wrapping problem, and reconstructed the 3D profile. In 1984, this technique was extended by Srinivasan, Liu, and Halioua to phase-shift profilometry [26]. Using a laser-illuminated shearing polarization interferometer, they generated precise phase shifts, contributing to accurate 3D shape measurement. These early works marked a successful attempt to use optical encoding to enhance the imaging dimensionality, which led to the later development of the vastly popular phase-shifting fringe projection profilometry in diverse applications [27–29]. Besides its origins in optical engineering, coded optical imaging also finds its root in the fields of computer vision. In the 2000s, more researchers in this area started to jointly consider image capture and postprocessing in improving the output image quality. It was found that adding a coded mask, despite reducing the overall light throughput, allowed the capture of additional visual information, which improved the output image quality [30–32]. Coded photography, a branch of computational imaging, provides new means and new ideas for breaking through many restrictive factors in traditional imaging systems.
1.5
Emergence of Coded Optical Imaging
The early developments in light encoding and imaging sparked the research interest in the synergy between optical engineering and computational techniques in imaging. Researchers embraced the idea of “taking the data before seeing the image” and sought to implement the coded masks in many ways to enhance the performance of optical imaging systems. These attempts and activities led to the emergence of coded optical
7
imaging. At the turn of the century, more breakthroughs largely propelled the frontier in coded optical imaging. A few examples are briefly described below in chronological order (Fig. 1.5). In 1995, the pioneering work of Dowski and Cathey [33] implemented wavefront encoding for the first time. A cubic phase profile was added at the aperture diaphragm of an optical system to generate a special point spread function that was insensitive to defocusing but sensitive to depth. The data acquisition was followed by a deconvolution process to recover a clear image with an extended depth of focus. This method inspired numerous follow works in point-spread-function engineering [34–38]. In 1999, Nordin et al. designed and fabricated the first micropolarizer array (MPA) for imaging polarimetry in the 3–5 μm region [39]. The array contained 128 × 128 unit cells, each of which was composed of a 2 × 2 array of wire-grid micropolarizers at three distinct angular orientations. Integrated with a focal plane array, it enabled the measurement of the first three Stokes vector components in each pixel of an imaging polarimeter. Similar designs were later implemented in MPAbased visible polarimetry for full Stokes measurement [40]. MPA-based polarization sensors have been commercialized by Sony [41]. Also in 1999, Kodama, Okada, and Kato invented the high-speed sampling camera (HISAC) [42]. A 2D-1D fiber array was used with a streak camera. At the input end, this fiber bundle was arranged as a 15 × 15 array to sample the 2D spatial information of a dynamic scene. At the output end, this fiber array was remapped to a 1 × 255 configuration that accommodates the 1D imaging requirement of the streak camera. Later implemented in other coded optical imaging modalities, the sampling operation by a pinhole array generated void spaces on a sensor that could be filled with information of other photon tags (e.g., spectrum [43, 44]). In 2006, Baraniuk, Kelly, et al. invented the famous single-pixel camera based on the compressed sensing theory [45, 46]. This modality used binary pseudo-random patterns displayed on a digital micromirror device (DMD) to encode the scene, which was then focused onto a single-pixel
8
J. Liang
Phase-shifting 3D profilometry Srinivasan, Liu, and Halioua (1984)
1985
1990
Digital micromirror device (DMD)–based projector Digital Projection Ltd. (1997)
Micropolarizer-array-based infrared imaging polarimetry Nordin et al. (1997)
Coded aperture snapshot spectral imager Brady, Gehm, et al. (2007) Dappled photography Veeraraghavan, Raskar, et al. (2007)
1995
Liquid-crystal-display (LCD) projector Epson (1989)
Wavefront front coding for depthof-focus extension Dowski and Cathey (1995)
High-speed sampling camera Kodama, Okada, and Kato (1999) 2000
2005
2010
Single-pixel camera via compressed sensing Baraniuk, Kelly, et al. (2006) Phase-coded aperture for lensless imaging Chi and George (2009)
Fig. 1.5 Representative major events in the development of coded optical imaging before 2010
detector. The original scene was reconstructed via a compressed sensing-based algorithm. Since then, compressed sensing has become a hot research area and is widely used in coded optical imaging. In 2007, also based on compressed sensing theory, Brady, Gehm, et al. invented the coded aperture snapshot spectral imager [47]. This
imaging modality used a static-coded aperture to encode the scene and used a dispersive element (e.g., a prism) to shear the scene. This work enabled single-shot measurement of a threedimensional datacube, which inspired many following works in single-shot high-dimensional imaging [48].
1 Introduction to Coded Optical Imaging
In the same year, Veeraraghavan, Raskar, et al. developed the dappled photography technology [46], which used a coded aperture consisting of a summation of 2D cosine harmonics to encode the scene. As the light propagation sheared the Fourier spectrum of the coded aperture, different slices of the generated replicas of the light field were captured by the sensor, which enabled reconstruction the full light-field datacube. This method can reconstruct the 4D light field at full camera resolution without any additional refractive element, revealing the unique advantages of coded light-field optical imaging [49, 50]. In 2009, Chi and George developed a lensless imaging system [51]. A phase-coded aperture of the uniform redundant array was used to produce specific diffraction patterns on a 2D detector when the wavefront from a point source object passes through the phase screen. An iterative phase retrieval method and correlationtype processing were applied for system design and image recovery. This work grafted the codedaperture imaging used with high-energy radiation into the optical regime. It inspired many followup works in lensless imaging [52].
1.6
Prosperity of Coded Optical Imaging
Today, coded optical imaging has been widely used in diverse scientific disciplines, ranging from quantum optics to materials characterization. It has also been applied to various industrial products, from gaming to mobile communications. Its prosperity is contributed to the advances in three major areas. The first contributor is the incessant progress in nanofabrication that has made the coded apertures of new types and with high quality, including metasurfaces for wavefront encoding [53], coded photocathodes for ultrafast imaging [54], and coded phase patterns for diffractive deep neural networks [55]. A representative example is the evolution of the fabrication of wiregrid micropolarizers that are widely used in multispectral encoding [56] and polarization encoding [57]. Early attempts in the 1960s
9
deposited metals using vacuum evaporation at nearly grazing incidence to the surface of a diffraction grating [58]. However, it was not possible to precisely control the size of each micropolarizer, the deposited wires were not uniform, and the micropolarizer worked only in the infrared region. In the 1990s, using more advanced techniques, such as radiofrequency sputtering and interference lithography, it became possible to fabricate uniform metal wire grids with micrometer-level size on a glass substrate. The resulting micropolarizers were not only compatible with sensors based on the CCD or complementary-metal-oxide-semiconductor (CMOS) technology [39] but also worked with visible light [59]. After the 2000s, further advances in nanofabrication, such as reactive ion etching and electron beam lithography, enable the direct fabrication of these micropolarizers on top of CCD and CMOS sensors [60], which eventually leads to Sony’s commercialization. Extended from this contribution, the synergy between nanofabrication and microelectronics also significantly contributes to coded optical imaging. One of the most significant outcomes from this synergy is the development of two most commonly used spatial light modulators (SLMs)—DMDs and liquid-crystal (LC) SLMs (Fig. 1.5). These devices contain millions of micrometer-size pixels, each of which can independently modulate the intensity, phase, and/or polarization of incident light. The DMD, invented by Larry Hornbeck from Texas Instruments in 1987, is a binary amplitude SLM [61]. The first DMD-based projector was introduced by Digital Projection Ltd. in 1997. The LC-SLM is based on the electronically addressable liquid crystal display (LCD) technique. The first functional prototype was the active-matrix LCD with thinfilm transistors, which was made by the team led by T. Peter Brody and Fang-Chen Luo at Westinghouse Electric Corporation in 1972 [62, 63]. The first color LCD projector was released by Epson in 1989 [64]. The commercialization of these two projectors marks the technical maturity of reconfigurable and adaptive generation of coded apertures for light. DMDs and LCSLMs also become the first choices in coded
10
J. Liang
optical imaging. These complex SLMs are realized by advanced processes of surface micromachining manufacturing. For example, physical vapor deposition and dry plasma ashing are indispensable to the construction of supporting structures of the DMD’s micromirrors that are built directly on top of a CMOS wafer [65]. As another example, extreme ultraviolet lithography enables the fabrication of static random-access memory-based circuitry with a small footprint suiting the increasingly smaller pixels in LC-SLMs. The final contributor is the development of new computational frameworks in imaging science. Of particular interest is the effort to apply compressed sensing in spatial and temporal domains to overcome the limitations of conventional optical imaging systems [66]. Another remarkable research trend is the rise of deep-learning approaches for coded optical imaging. Leveraging targeted training, deep neural networks (DNNs) can adapt themselves well to specific systems or specific scenes. Not relying on the precise acquisition of prior knowledge, they are more flexible than the analytical-modeling-based methods. Recent advances in DNNs also enhance their links with the hardware by optimizing the coded mask [67] and providing feedback on the system’s status [68].
1.7
Advantages of Coded Optical Imaging
Although increasing the system’s complexity, light coding opens up many new routes to enhance the performance of optical imaging systems. First, from the perspective of signal processing, coded masks optically realize many advanced mathematical models. For example, sinusoidal fringes at different orientations allow multiplexed measurement of the scene, followed by information extraction in the spatial frequency domain [69]. As another example, a random mask allows compressively sensing of the scene with far fewer measurements than the number of unknowns [70]. Meanwhile, the prior information
provided by light coding embeds the underlying mathematical models in image reconstruction [71]. The results, which carry the meritorious properties in these mathematical models, improve the performance of the optical imaging systems. For example, many coded optical imaging systems enjoy the Jacquinot advantage (i.e., high light throughput in data acquisition) [72] and the Fellgett advantage (i.e., high signal-to-noise ratio and contrast in the reconstructed image) [73]. Moreover, from the point of view of system design, coded masks break the point-to-point relationship in most conventional optical imaging systems. This enhanced flexibility balances the resources required for data acquisition in the physical domain and information extraction in the computational domain. As a result, the detectors, which possess meritorious specifications (e.g., in detection sensitivity and responsive spectrum) but are limited in functionality (e.g., imaging dimension and frame rate), can be included in the system design. For example, coded optical imaging can use a highly sensitive point detector to capture 2D images with singlephoton sensitivity [74]. With compressed sensing enabled by a coded mask, an ultrafast streak camera, which is conventionally operated with a line field of view, can capture ultrafast 2D movies [75–78]. Also, with compressed sensing, coded optical imaging has been shown to speed up an ordinary CCD camera by over 105 times [79, 80]. Consequently, these new imaging methods break the limitations in manufacturing processes and working conditions. They also bring many practical merits, including low power consumption and cost as well as high reliability and maintainability. Furthermore, in coded optical imaging, the concept of synthetic consideration and joint optimization of optical engineering and image reconstruction largely expand the technical ability and application scope of imaging systems. This disruptive redefinition of imaging problems has fundamentally overturned the inherent perception that the ultimate purpose of imaging should be to create a perfect resemblance of the scene on the sensor. Rather, the key is to rationally de-
1 Introduction to Coded Optical Imaging
sign the task distribution in various parts of the coded optical imaging system to retain information about the scene to the maximum extent possible. Meanwhile, another key is to co-design the reconstruction algorithm, which uses the prior information of the coded mask to unmix multidimensional (e.g., spatial, temporal, and spectral) information in reconstruction. Inspired by this new design paradigm, coded optical imaging has become the first choice in high-dimensional imaging, which permits simultaneously measuring information from multiple photon tags in the plenoptic function. It also enables imaging multi-contrasts beyond intensity, including phase, refractive index, and coherence. These expansions in detection ability, in turn, enable unprecedented applications. These new and improved systems are quickly deployed to explore new applications across numerous disciplines, including biomedicine, physics, and materials science.
1.8
Conclusions
Coded optical imaging is a hybrid imaging paradigm that joins physical data acquisition and computational image reconstruction. It is not a simple stack-up of conventional optical imaging and digital image processing. Rather, the core of this computational imaging paradigm is coded masks that bring together optical engineering and computational techniques. The synergy overcomes the limitations while retaining the fortes in each field, resulting in a “1 + 1 > 2” effect in optical imaging. “Less is more” [81]. Coding the light during the imaging process, although reducing the overall light throughput, attaches more information to the data captured by the sensor and enables informative recovery compared to conventional optical imaging systems. Leveraging the advantages of nanofabrication, microelectronics, and computational frameworks, coded optical imaging has become a major international research focus in optical imaging. This multi-disciplinary research field integrates geometric optics, information optics, computational optics, computer vision, modern signal processing, and many other theories. Its
11
prosperity has led to many new imaging modalities with unprecedented functions, which in turn, sparks new applications.
References 1. RB, L., Tom Wedgwood the First Photographer. London: Duckworth, 1903. 2. From Charles Mackintosh’s waterproof to Dolly the sheep: 43 innovations Scotland has given the world, in The Independent. 2016. 3. Präkel, D., The visual dictionary of photography. 2010: Ava Publishing. 4. Lazovsky, L., et al. CCD sensor and camera for 100 Mfps burst frame rate image capture. in Defense and Security. 2005. SPIE Proceeding. 5. Wang, L.V. and S. Hu, Photoacoustic tomography: in vivo imaging from organelles to organs. Science, 2012. 335(6075): p. 1458–1462. 6. Smith, W.J., Modern optical engineering: the design of optical systems. 2008: McGraw-Hill Education. 7. How fast is Galaxy S5 fast auto focus? Why Galaxy S5 fast auto focus matters?; Available from: https:// galaxys5guide.com/samsung-galaxy-s5-featuresexplained/galaxy-s5-fast-auto-focus. 8. Zhang, S., High-speed 3D shape measurement with structured light methods: A review. Optics and Lasers in Engineering, 2018. 106: p. 119–131. 9. Etoh, T., et al. Toward 1Gfps: Evolution of ultrahigh-speed image sensors-ISIS, BSI, multi-collection gates, and 3D-stacking. in Electron Devices Meeting (IEDM), 2014 IEEE International. 2014. IEEE. 10. Schermelleh, L., et al., Super-resolution microscopy demystified. Nature Cell Biology, 2019. 21(1): p. 72– 84. 11. Mullikin, J.C., et al. Methods for CCD camera characterization. in Image Acquisition and Scientific Imaging Systems. 1994. Spie. 12. Johnston, I., , in The Mozi. 2010, The Chinese University of Hong Kong Press. p. 466-577. 13. Hammond, J.H., The camera obscura: a chronicle. The camera obscura: a chronicle, 1981. 14. Snyder, L.J., Eye of the beholder: Johannes Vermeer, Antoni van Leeuwenhoek, and the reinvention of seeing. 2015: WW Norton & Company. 15. Bettinus, M., Apiaria universae philosophiae mathematicae. Vol. 2. Ferronius. 16. Hammer, V.H., The dappled sunlight under the trees was very strange before and after totality. 2017: McNary Field, Corvallis, OR, United States 17. Cie´slak, M.J., K.A. Gamage, and R. Glover, Codedaperture imaging systems: Past, present and future development—A review. Radiation Measurements, 2016. 92: p. 59–71. 18. Fenimore, E.E. and T.M. Cannon, Coded aperture imaging with uniformly redundant arrays. Applied optics, 1978. 17(3): p. 337–347.
12 19. Braga, J., Coded aperture imaging in high-energy astrophysics. Publications of the Astronomical Society of the Pacific, 2019. 132(1007): p. 012001. 20. Cannon, T. and E. Fenimore, Coded aperture imaging: many holes make light work. Optical Engineering, 1980. 19(3): p. 193283. 21. Dicke, R., Scatter-hole cameras for x-rays and gamma rays. The astrophysical journal, 1968. 153: p. L101. 22. Tipton, M.D., et al., Coded aperture imaging using on-axis Fresnel zone plates and extended gamma-ray sources. Radiology, 1974. 112(1): p. 155–158. 23. Gottesman, S.R. and E.E. Fenimore, New family of binary arrays for coded aperture imaging. Applied optics, 1989. 28(20): p. 4344–4352. 24. Liang, J., Punching holes in light: recent progress in single-shot coded-aperture optical imaging. Reports on Progress in Physics, 2020. 83(11): p. 116101. 25. Takeda, M., H. Ina, and S. Kobayashi, Fouriertransform method of fringe-pattern analysis for computer-based topography and interferometry. JosA, 1982. 72(1): p. 156–160. 26. Srinivasan, V., H.-C. Liu, and M. Halioua, Automated phase-measuring profilometry of 3-D diffuse objects. Applied optics, 1984. 23(18): p. 3105–3108. 27. Geng, J., Structured-light 3D surface imaging: a tutorial. Advances in Optics and Photonics, 2011. 3(2): p. 128–160. 28. Jiang, C., et al., High-speed dual-view band-limited illumination profilometry using temporally interlaced acquisition. Photonics Research, 2020. 8(11): p. 1808–1817. 29. Zuo, C., et al., Micro Fourier Transform Profilometry (μFTP): 3D shape measurement at 10,000 frames per second. Optics and Lasers in Engineering, 2018. 102: p. 70–91. 30. Levin, A., et al., Image and depth from a conventional camera with a coded aperture. ACM transactions on graphics (TOG), 2007. 26(3): p. 70-es. 31. Liang, C.-K., et al., Programmable aperture photography: multiplexed light field acquisition, in ACM SIGGRAPH 2008 papers. 2008. p. 1–10. 32. Raskar, R., A. Agrawal, and J. Tumblin, Coded exposure photography: motion deblurring using fluttered shutter, in Acm Siggraph 2006 Papers. 2006. p. 795– 804. 33. Dowski, E.R. and W.T. Cathey, Extended depth of field through wave-front coding. Applied optics, 1995. 34(11): p. 1859–1866. 34. Shechtman, Y., et al., Multicolour localization microscopy by point-spread-function engineering. Nature photonics, 2016. 10(9): p. 590–594. 35. Shechtman, Y., Recent advances in point spread function engineering and related computational microscopy approaches: from one viewpoint. Biophysical reviews, 2020. 12(6): p. 1303–1309. 36. Wen, G., et al., High-fidelity structured illumination microscopy by point-spread-function engineering. Light: Science & Applications, 2021. 10(1): p. 1–12.
J. Liang 37. Boniface, A., et al., Transmission-matrix-based pointspread-function engineering through a complex medium. Optica, 2017. 4(1): p. 54–59. 38. Weinberg, G. and O. Katz, 100,000 frames-persecond compressive imaging with a conventional rolling-shutter camera by random point-spreadfunction engineering. Optics Express, 2020. 28(21): p. 30616–30625. 39. Nordin, G.P., et al., Micropolarizer array for infrared imaging polarimetry. JOSA A, 1999. 16(5): p. 1168– 1174. 40. Hsu, W.-L., et al., Polarization microscope using a near infrared full-Stokes imaging polarimeter. Optics Express, 2015. 23(4): p. 4357–4368. 41. Sony. Polarization Image Sensor Technology Polar sens™. Available from: https://www.sony-semicon. com/en/technology/industry/polarsens.html. 42. Kodama, R., K. Okada, and Y. Kato, Development of a two-dimensional space-resolved high speed sampling camera. Review of scientific instruments, 1999. 70(1): p. 625–628. 43. Cao, X., et al., A prism-mask system for multispectral video acquisition. IEEE transactions on pattern analysis and machine intelligence, 2011. 33(12): p. 2423– 2435. 44. Dwight, J.G. and T.S. Tkaczyk, Lenslet array tunable snapshot imaging spectrometer (LATIS) for hyperspectral fluorescence microscopy. Biomedical optics express, 2017. 8(3): p. 1950–1964. 45. Takhar, D., et al. A new compressive imaging camera architecture using optical-domain compression. in Computational Imaging IV. 2006. SPIE. 46. Duarte, M.F., et al., Single-pixel imaging via compressive sampling. IEEE signal processing magazine, 2008. 25(2): p. 83–91. 47. Gehm, M.E., et al., Single-shot compressive spectral imaging with a dual-disperser architecture. Optics express, 2007. 15(21): p. 14013–14027. 48. Gao, L. and L.V. Wang, A review of snapshot multidimensional optical imaging: Measuring photon tags in parallel. Physics Reports, 2016. 616: p. 1–37. 49. Liu, J., et al., Coded-aperture broadband light field imaging using digital micromirror devices. Optica, 2021. 8(2): p. 139–142. 50. Inagaki, Y., et al. Learning to capture light fields through a coded aperture camera. in Proceedings of the European Conference on Computer Vision (ECCV). 2018. 51. Chi, W. and N. George, Phase-coded aperture for optical imaging. Optics Communications, 2009. 282(11): p. 2110–2117. 52. Boominathan, V., et al., Recent advances in lensless imaging. Optica, 2022. 9(1): p. 1–16. 53. Chen, H.-T., A.J. Taylor, and N. Yu, A review of metasurfaces: physics and applications. Reports on progress in physics, 2016. 79(7): p. 076401. 54. Lai, Y., et al., Single-Shot Ultraviolet Compressed Ultrafast Photography. Laser & Photonics Reviews, 2020. 14(10): p. 2000122.
1 Introduction to Coded Optical Imaging 55. Lin, X., et al., All-optical machine learning using diffractive deep neural networks. Science, 2018. 361(6406): p. 1004–1008. 56. Ono, S., Snapshot multispectral imaging using a pixel-wise polarization color image sensor. Optics Express, 2020. 28(23): p. 34536–34573. 57. Zhang, Z., et al., Nano-fabricated pixelated micropolarizer array for visible imaging polarimetry. Review of scientific instruments, 2014. 85(10): p. 105002. 58. Bird, G.R. and M. Parrish, The wire grid as a nearinfrared polarizer. JOSA, 1960. 50(9): p. 886–891. 59. Guo, J. and D. Brady, Fabrication of thin-film micropolarizer arrays for visible imaging polarimetry. Applied Optics, 2000. 39(10): p. 1486–1492. 60. Gruev, V., R. Perkins, and T. York, CCD polarization imaging sensor with aluminum nanowire optical filters. Optics express, 2010. 18(18): p. 19087–19094. 61. Liang, J., et al., Grayscale laser image formation using a programmable binary mask. Optical Engineering, 2012. 51(10): p. 108201. 62. Brody, T., et al., A 6× 6-in 20-lpi electroluminescent display panel. IEEE transactions on electron devices, 1975. 22(9): p. 739–748. 63. Fischer, A., Liquid crystal image display panel with integrated addressing circuitry. 1974, Google Patents. 64. Epson. Breakthrough after relentless market research and steady efforts: Looking back at the history of visual innovation. Available from: https://80th.epson. com/en/journey/chapter5_2. 65. Gong, C. and T. Hogan, CMOS compatible fabrication processes for the digital micromirror device. IEEE Journal of the Electron Devices Society, 2014. 2(3): p. 27–32. 66. Rani, M., S.B. Dhok, and R.B. Deshmukh, A systematic review of compressive sensing: Concepts, implementations and applications. IEEE Access, 2018. 6: p. 4875–4894. 67. Yang, C., et al., Optimizing codes for compressed ultrafast photography by the genetic algorithm. Optica, 2018. 5(2): p. 147–151. 68. Marquez, M., et al., Deep-learning supervised snapshot compressive imaging enabled by an end-to-end
13
69.
70.
71.
72.
73.
74. 75.
76.
77.
78.
79.
80.
81.
adaptive neural network. IEEE Journal of Selected Topics in Signal Processing, 2022. 16(3). Oh, W.-Y., et al., High-speed polarization sensitive optical frequency domain imaging with frequency multiplexing. Optics Express, 2008. 16(2): p. 1096– 1103. Willett, R.M., R.F. Marcia, and J.M. Nichols, Compressed sensing for practical optical imaging systems: a tutorial. Optical Engineering, 2011. 50(7): p. 072601. Mait, J.N., G.W. Euliss, and R.A. Athale, Computational imaging. Advances in Optics and Photonics, 2018. 10(2): p. 409–483. Caroli, E., et al., Coded aperture imaging in Xand gamma-ray astronomy. Space Science Reviews, 1987. 45(3–4): p. 349–403. Hirschfeld, T., Fellgett’s advantage in UV-VIS multiplex spectroscopy. Applied Spectroscopy, 1976. 30(1): p. 68–69. Morris, P.A., et al., Imaging with a small number of photons. Nature communications, 2015. 6(1): p. 1–6. Qi, D., et al., Single-shot compressed ultrafast photography: a review. Advanced Photonics, 2020. 2(1): p. 014003. Gao, L., et al., Single-shot compressed ultrafast photography at one hundred billion frames per second. Nature, 2014. 516(7529): p. 74–77. Liang, J., L. Zhu, and L.V. Wang, Single-shot realtime femtosecond imaging of temporal focusing. Light: Science & Applications, 2018. 7: p. 42. Wang, P., J. Liang, and L.V. Wang, Single-shot ultrafast imaging attaining 70 trillion frames per second. Nature Communications, 2020. 11: p. 2091. Liu, X., et al., Fast wide-field upconversion luminescence lifetime thermometry enabled by single-shot compressed ultrahigh-speed imaging. Nature Communications, 2021. 12: p. 6401. Liu, X., et al., Single-shot compressed opticalstreaking ultra-high-speed photography. Optics Letters, 2019. 44(6): p. 1387–1390. Raskar, R. Less is more: coded computational photography. in Asian Conference on Computer Vision. 2007. Springer.
2
Encoders for Optical Imaging Yingming Lai and Jinyang Liang
Abstract
Codification, which modulates photon tags in specific dimensions, is an important operation that affects the performance of coded optical imaging systems, such as the imaging speed, spatial resolution, and application scope. An encoder, embodying the deployed codification strategy, is the core component in coded optical imaging systems. Benefiting from the advances in electronic devices and precise control technologies in recent decades, diverse types of optical signal encoders have enabled the development of many notable coded optical imaging modalities. This chapter reviews existing optical signal encoders. In the first part, based on the number of modulated photon tags, they are classified into one-dimensional, two-dimensional, and high-dimensional encoders. Their principles and some representative techniques are described. The second part of this chapter presents the implementation of fixed and reconfigurable optical signal encoders. In each category, representative techniques and processes are introduced. Finally, a summary concludes this chapter.
Keywords
Optical imaging · Computational imaging · Coded aperture · Optical modulation · Optical instrumentation · Temporal encoding · Spatial encoding · Spectral encoding · Photomask fabrication · Spatial light modulator · Micro-electromechanical systems
Y. Lai · J. Liang (✉) Laboratory of Applied Computational Imaging, Centre Énergie Matériaux Télécommunications, Institut National de la Recherche Scientifique, Université du Québec, Varennes, QC, Canada e-mail: [email protected]; [email protected]
2.1 Introduction
The encoding of optical signals plays a vital role in the entire imaging process of coded optical imaging [1]. In data acquisition, codification is an indispensable constituent of the optical systems that physically encode data. In image reconstruction, the encoders provide prior information that helps deployed algorithms to retrieve the targeted optical signal [2]. In practice, the encoding method needs to be carefully chosen, because it affects the light throughput efficiency, as well as the quality and recoverability of the collected data. Different encoding operations in data acquisition also lead to different forward models that determine the framework of algorithms used in image reconstruction, which correspondingly affects the computational cost and the quality of results [3, 4]. Intensity- or phase-based encoding can be performed in various dimensions of the optical signal. The specific modulation strategy depends on
the requirements and capabilities for fabricating the encoding components. The original encoding designs used perforated opaque panels and could only modulate the intensity of the optical signal in the spatial dimension [4, 5]. Later, with the development of sophisticated optical instruments and emerging optical signal processing strategies, encoders acting on other photon tags were developed [6–9], which enabled multidimensional coded optical imaging. This chapter presents the major strategies and representative techniques of optical encoders for imaging. In the first section, we review in detail the one-dimensional (1D), two-dimensional (2D), and high-dimensional optical signal encoders. In the second section, we describe popular techniques for implementing fixed and reconfigurable encoding. In the last section, a summary concludes this chapter.
2.2 Designs of Optical Encoders
The design of the encoding strategy mainly depends on the nature of the optical signal and the capability of the optical system. Optical signals are complex with many photon tags, such as space, angle, time, spectrum, and polarization [1]. These photon tags can all be modulated. However, current mainstream detectors such as charge-coupled devices (CCDs) and complementary metal-oxide-semiconductor (CMOS) cameras are only sensitive to the intensity of light in 2D space. Therefore, most coded optical imaging systems are limited to certain dimensions. For example, temporal modulation of optical signals provides 1D encoding. Popularly used random binary masks provide 2D spatial encoding for compressed sensing (CS) imaging. It is worth noting that some imaging strategies can achieve the encoding of optical signals in high dimensions, although the encoded optical signal still needs to be converted into a 2D intensity distribution to be recorded by electronic detectors [10]. In this section, we will introduce the current mainstream encoders for optical imaging in the order of 1D, 2D, and high-dimensional encoding. Depending on
the encoding strategies and the adopted encoding elements, these encoders cover the dimensions of time, space, angle, spectrum, and polarization. The principles and the representative techniques of the encoders will also be discussed.
2.2.1 One-Dimensional Encoding
The main branch of 1D encoding of optical signals is achieved by controllable time-gating techniques. Advances in the electronic control of light sources and detectors have made it possible to encode light signals through temporal modulation [11]. For example, on the detection side, temporal encoding can be achieved by temporal control of the shutter. On the illumination side, electronic signals trigger a light source to generate pulses of different durations. Precise synchronization is required to achieve accurate temporal encoding. The 1D temporal encoding techniques on the detector side are more common owing to the greater controllability and integrability of commercially available cameras. As an example, global-level temporal encoding of the exposure has been used for motion deblurring [8]. The technique integrates a fast, high-contrast shutter in front of the system. Different from conventional exposure, where the shutter is kept open during the entire exposure, this extra shutter follows an open-close pattern during the single exposure through a set of near-optimal coded triggers shown in Fig. 2.1a. Such coded exposure makes the image deconvolution in traditional motion deblurring far less ill-posed, and the deblurred image can then be obtained by a least-squares estimation [8]. A representative illumination-side 1D temporal encoding technique is time-encoded grayscale illumination (TEGI) microscopy [12]. The microscope features a custom LED array capable of generating programmed illumination with 8-bit grayscale intensity. In a single exposure, the display of the LED array is divided into dozens of time steps. Temporal encoding is achieved by assigning different gray levels to these time steps while maintaining a spatially uniform intensity. The signal recovery can be viewed as a nonconvex
optimization problem, which can be solved by nonlinear reconstruction [13]. TEGI’s capability is demonstrated by removing motion blur while imaging microscopic objects in motion [14]. Compared to traditional illumination (i.e., illumination is on for the entire exposure), the coded illumination preserves more high-frequency components. Compared to time-encoded binary illumination [8], the grayscale encoding results in a smaller condition number (Fig. 2.1b).
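The conditioning argument behind coded exposure can be made concrete with a few lines of code. The following minimal Python sketch, which is an illustration rather than the implementation of Ref. [8], compares a conventional "box" exposure with a random binary flutter code; the number of time slices and the purely random code (Ref. [8] instead searches for near-optimal codes) are assumptions for demonstration.

```python
# Circular convolution with the exposure code is a diagonal scaling in the
# Fourier domain, so the condition number of the corresponding circulant
# matrix indicates how invertible the deblurring problem is.
import numpy as np

rng = np.random.default_rng(0)
n = 52                                               # time slices in one exposure (assumed)
box_code = np.ones(n)                                # shutter open for the whole exposure
flutter_code = rng.integers(0, 2, n).astype(float)   # random binary open/close sequence

def condition_number(code):
    """Condition number of the circulant blur matrix built from the code."""
    spectrum = np.abs(np.fft.fft(code))
    return spectrum.max() / max(spectrum.min(), 1e-12)

print("box exposure:       %.2e" % condition_number(box_code))
print("fluttered exposure: %.2e" % condition_number(flutter_code))
# The box code has near-zero Fourier coefficients (ill-posed deconvolution);
# a broadband binary code typically keeps all coefficients away from zero.
```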
2.2.2 Two-Dimensional Encoding
The most popular 2D encoding method modulates light intensity in space by employing a 2D mask composed of opaque and transparent parts [4]. The earliest coded apertures, comprising multiple pinholes, were developed to resolve the conflict between the signal-to-noise ratio and the resolution of single-pinhole imaging [5]. Different pattern designs offer different benefits in noise reduction, image resolution, and computational cost. Unlike direct imaging systems that produce a single readable image, coded aperture imaging introduces a more complex point spread function (PSF). The images captured on the detector therefore cannot be interpreted directly. In the process of restoring the original image, the pattern of the coded aperture and the corresponding PSFs are the key prior information.
In 2D coded optical imaging, not all masks allow the original signal to be recovered accurately. The essential property that identifies an applicable coded-aperture observation matrix is its orthogonality [15]. Let the optical signal be x and the observation matrix be A. The acquired measurement y can then be expressed as

$$ y = \mathbf{A}x + e, \qquad (2.1) $$

where e is the error term (e.g., noise and background in the measurement). To recover the optical signal x from the measurement y, we need the inverse of the sensing matrix A; in other words, the inverse matrix A^{-1} should exist and be as accessible as possible. When a matrix is orthogonal, it has full rank, and both its inverse and its transpose exist. This property intrinsically guarantees a unique solution in the reconstruction [16]. The optical signal can then be recovered by a simple matrix inversion. A well-known example of an orthogonal 2D encoder is the Hadamard matrix [17]. The inverse of an n × n Hadamard matrix H is its transpose H^T divided by its order n, i.e., H^{-1} = H^T/n.
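As a concrete illustration of Eq. (2.1) with an orthogonal encoder, the minimal Python sketch below builds a Sylvester-type Hadamard matrix, simulates a noiseless measurement, and recovers the signal with H^{-1} = H^T/n; the order of 64 and the noiseless assumption are illustrative.

```python
import numpy as np

def hadamard(n):
    """Sylvester-type Hadamard matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

n = 64
H = hadamard(n)                           # observation matrix A in Eq. (2.1)
x = np.random.default_rng(1).random(n)    # a toy 1D scene

y = H @ x                                 # measurement y = A x (e = 0 assumed)
x_rec = (H.T / n) @ y                     # recovery with H^{-1} = H^T / n

print(np.allclose(x_rec, x))              # True: orthogonality gives exact recovery
```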
Fig. 2.1 Temporal optical encoder. (a) Differences among traditional exposure, time-encoded binary exposure, and time-encoded grayscale illumination. (b) Fourier transforms of traditional exposure, binary
encoding, and grayscale encoding shown in (a). (Adapted by permission from Optica: Optics Letters, Motion deblurring with temporally coded illumination in an LED array microscope, Ma C. et al., © 2015)
Fig. 2.2 Representative 2D spatial optical encoders. (a) 64 × 64 Hadamard pattern. (b) Square 101 × 101 MURA pattern. (c) 64 × 64 discrete Fourier transform matrix. (d) 64 × 64 pseudo-random pattern. (e) 64 × 64 8-bit pseudo-random grayscale mask. (f) Deep-learning-optimized mask. ((f) is reprinted by permission from IEEE: IEEE Journal of Selected Topics in Signal Processing, Deep-learning supervised snapshot compressive imaging enabled by an end-to-end adaptive neural network, Marquez M. et al., © 2022)
Patterns generated from such an orthonormal basis can serve as spatial modulation masks for coded optical imaging. Figure 2.2a shows a mask generated from a Hadamard matrix of order 64. A representative further development of 2D spatial orthogonal encoders is the modified uniformly redundant array (MURA), which suppresses the noise term compared with general orthogonal masks [6, 18, 19]. The MURA mask is derived from quadratic residues. Its side length must be a prime number of the form L = 4m + 1 (m = 1, 2, 3, ...). The MURA pattern can be expressed as

$$ A_{i,j} = \begin{cases} 0, & \text{if } i = 0 \\ 1, & \text{if } j = 0,\ i \neq 0 \\ 1, & \text{if } C_i C_j = +1 \\ 0, & \text{otherwise,} \end{cases} $$

where

$$ C_i \ (\text{or } C_j) = \begin{cases} +1, & \text{if } i \ (\text{or } j) \text{ is a quadratic residue modulo } L \\ -1, & \text{otherwise.} \end{cases} $$
The inverse (decoding) pattern of the MURA is

$$ G_{i,j} = \begin{cases} +1, & \text{if } i + j = 0 \\ +1, & \text{if } A_{i,j} = 1\ (i + j \neq 0) \\ -1, & \text{if } A_{i,j} = 0\ (i + j \neq 0). \end{cases} $$
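The two formulas above translate directly into code. The following minimal Python sketch (a toy implementation under the stated L = 4m + 1 prime assumption) generates a MURA mask and its decoding pattern and checks that their circular cross-correlation is a single dominant peak on a flat pedestal, which is what makes simple correlation-based decoding work.

```python
import numpy as np

def quadratic_residues(L):
    """Set of nonzero quadratic residues modulo L."""
    return {(k * k) % L for k in range(1, L)}

def mura(L):
    residues = quadratic_residues(L)
    C = np.array([1 if i in residues else -1 for i in range(L)])
    A = np.zeros((L, L), dtype=int)
    for i in range(L):
        for j in range(L):
            if i == 0:
                A[i, j] = 0
            elif j == 0:
                A[i, j] = 1
            elif C[i] * C[j] == 1:
                A[i, j] = 1
    G = np.where(A == 1, 1, -1)            # decoding pattern
    G[0, 0] = 1
    return A, G

A, G = mura(101)
corr = np.real(np.fft.ifft2(np.fft.fft2(A) * np.conj(np.fft.fft2(G))))
off_peak = np.delete(corr.ravel(), 0)
print("peak:", corr[0, 0], " off-peak spread:", off_peak.max() - off_peak.min())
```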
Figure 2.2b shows a square MURA mask with a size of 101 × 101 pixels. The MURA matrices are designed to be orthogonal, and the cross-correlation between the mask and its decoding pattern is approximately a δ-function [20]. The noise terms are independent of the image structure for all elements. This linear, convolution-based optical encoder has been widely used thanks to its simplicity and the correspondingly low computational load [21]. In addition to encoding the spatial dimension, orthogonal 2D encoders have also been implemented in the Fourier domain for light-field imaging. A light field describes the amount of light traveling in every direction at every point of a scene and contains a wealth of information about the spatial and angular properties of the optical signal [22]. Coded aperture light field (CALF) imaging is a
promising method to capture light fields with high angular and spatial resolutions at a low cost [23]. In CALF imaging, multiple orthogonal masks encode the aperture in the Fourier domain, creating a complex, structured light field that encodes both spatial and angular information. The captured light-field data are then processed using computational algorithms, which typically involve Fourier and inverse Fourier transforms to separate and manipulate the frequency components [24]. Advanced CALF imaging can efficiently capture 2D spatial and 2D angular information at a camera's full pixel count at a video rate [25]. Apart from the orthogonal-matrix and linear-convolution schemes, various coding methods linked to nonlinear optimization models have emerged in recent years, which can significantly improve the performance of coded optical imaging. These advances lead to new strategies for 2D spatial encoding of optical signals. Among them, CS is the most prominent one. In 2006, the well-known single-pixel imaging (SPI) technique was developed using CS, demonstrating that high-resolution images can be accurately extracted from incomplete measurements when the signal is sparse (i.e., compressible) in a certain domain [26–32]. Assume the compressible signal x can be expressed in a sparse basis w as x = wx', where x' denotes the coefficients in this basis, and that the image acquisition is described by an observation matrix Φ. The measurement y can then be formulated as
$$ y = \Phi \cdot w x' + e. \qquad (2.2) $$
Here, the observation matrix Φ needs to meet the restricted isometry property (RIP) [33, 34], which essentially requires that the observation matrix be mutually uncorrelated with the sparse basis w. The RIP equivalently means that the rows of the observation matrix Φ behave approximately like an orthonormal system. In this way, the signal can be recovered from massively compressed measurements, with low loss or even exactly, by iteratively estimating the solution of an ℓ1-norm minimization problem [35]. In practice, there is no explicit and general way to construct CS matrices that satisfy RIP [36, 37]. Rather, it is common to examine whether a known type of matrix satisfies RIP. The earliest matrix proven to satisfy RIP is the Fourier matrix (Fig. 2.2c) [38], which is uncorrelated with most matrices constructed from fixed orthonormal bases. Later, matrices constructed from random entries (Fig. 2.2d) were also proven to meet RIP over the widest range of sparsity levels, implying that random matrices are suitable for most measurement scenarios [39]. RIP also provides a valuable framework for guiding matrix optimization. As an example, Fig. 2.2e represents a pseudo-random grayscale coded aperture designed to increase the dynamic range of compressed spectral imaging [40]. Moreover, deep learning has emerged as an effective tool for optimizing the aforementioned coded masks. Figure 2.2f shows a mask optimized by a deep neural network specifically designed for snapshot compressive optical imaging [41].
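To illustrate the CS recovery pipeline of Eq. (2.2), the minimal Python sketch below measures a synthetic sparse signal through a random observation matrix and recovers it from far fewer measurements than unknowns. Orthogonal matching pursuit is used here as a simple stand-in for the iterative ℓ1-norm solvers cited above; the Gaussian random entries and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 256, 96, 8                                  # unknowns, measurements, sparsity

w = np.linalg.qr(rng.standard_normal((n, n)))[0]      # an orthonormal sparse basis
x_coeff = np.zeros(n)
support = rng.choice(n, k, replace=False)
x_coeff[support] = rng.standard_normal(k)
x = w @ x_coeff                                       # the compressible scene x = w x'

Phi = rng.standard_normal((m, n))                     # random observation matrix
y = Phi @ x                                           # measurement (e = 0 assumed)

A = Phi @ w                                           # effective sensing matrix
atoms = A / np.linalg.norm(A, axis=0)                 # normalized columns for matching

residual, picked = y.copy(), []
for _ in range(k):                                    # orthogonal matching pursuit
    picked.append(int(np.argmax(np.abs(atoms.T @ residual))))
    coef, *_ = np.linalg.lstsq(A[:, picked], y, rcond=None)
    residual = y - A[:, picked] @ coef

x_rec_coeff = np.zeros(n)
x_rec_coeff[picked] = coef
print("relative error:", np.linalg.norm(w @ x_rec_coeff - x) / np.linalg.norm(x))
```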
2.2.3 High-Dimensional Encoding
Besides the 1D and 2D encoders for optical imaging, there are techniques to simultaneously encode optical signals in multiple photon tags. Despite more complexity in system configurations and more computational load in image reconstruction, these high-dimensional coded optical imaging approaches can capture more information than their 1D and 2D counterparts.
2.2.3.1 Spatiospectral Encoding
The spectrum is an important photon tag for implementing coded optical imaging. Spectral encoding can be achieved by modulating the light intensity or by leveraging the wavelength-time relationship in the spectral-spatial domain [42, 43]. Spectral encoding techniques are mainly based on modulating the light propagation behavior or the refractive index of the propagation medium [1]. As an example, a spatiospectral encoder has enabled the development of snapshot compressive hyperspectral cameras [44, 45]. The encoder relies on a set of modified Fabry-Perot (FP) cavities with different FP gaps that produce varying intensity modulations at different wavelengths (Fig. 2.3). In a single
Fig. 2.3 Fabry-Perot (FP) cavity array for spectral encoding in 2D space. (a) Schematic of an FP cavity array. (Reprinted by permission from Optica: Optics Letters, Multi-aperture snapshot compressive hyperspectral camera, Oiknine Y. et al., © 2018). (b) Transmittance-encoded patterns at four different wavelengths created by FP filters. (Reprinted by permission from Springer: Nature Photonics, Video-rate hyperspectral camera based on a CMOS-compatible random array of Fabry–Pérot filters, Yako M. et al., © 2023)
exposure, the FP array applies different spectral encodings to the input scene. A hyperspectral (x, y, λ) datacube is retrieved by the two-step iterative shrinkage/thresholding algorithm [46]. As another example, coded aperture snapshot spectral imaging (CASSI) is a typical 3D encoding method combining the 2D spatial and spectral dimensions. It was developed to measure the hyperspectral information of an object [47]. As shown in Fig. 2.4, CASSI uses a spatially coded aperture to encode the optical signal. At the same time, a dispersive component, such as a grating or a prism, disperses the wavelengths in the encoded optical signal to different spatial positions on the detector [7]. When the coded aperture satisfies RIP, the spectrum of the object can be reconstructed from the measurement by CS-based algorithms [48, 49].
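The CASSI forward model just described (mask first, then a wavelength-dependent shear, then integration on the detector) can be sketched in a few lines of Python; the toy cube size and the one-pixel-per-band shear are illustrative assumptions, not values from any specific CASSI system.

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nx, nl = 64, 64, 8                        # spatial size and number of spectral bands
cube = rng.random((ny, nx, nl))               # toy hyperspectral scene
mask = rng.integers(0, 2, (ny, nx))           # binary coded aperture

coded = cube * mask[:, :, None]               # spatial encoding (same mask for every band)
detector = np.zeros((ny, nx + nl - 1))        # sensor wide enough for the sheared cube
for l in range(nl):
    detector[:, l:l + nx] += coded[:, :, l]   # disperse band l by l pixels, then integrate

print(detector.shape)                         # one 2D snapshot encodes the 3D datacube
```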
2.2.3.2 Spatio-Spectral-Temporal Encoding
Encoded imaging in the spectral domain combined with the spatial and temporal dimensions can be achieved through optical fibers [50]. By chirping ultrafast laser pulses, Karpf et al. [51] developed a spectral-temporal-spatial encoded architecture named spectro-temporal laser imaging by diffractive excitation (SLIDE) (Fig. 2.5). In this system, a broadband Fourier-domain mode-locked (FDML) laser is temporally modulated by an electro-optic modulator (EOM, detailed in Sect. 2.3.2.1) to generate a series of discrete pulses in different spectral bands, which are successively amplified to high peak powers. The spectro-temporally encoded pulses are sent onto a diffraction grating, which spatially separates them for line scanning by a galvanometer scanner. Each pulse illuminates a distinct pixel at a specific time point. The fluorescence signals are recorded by a high-speed photodetector and a fast oscilloscope to quantify the lifetimes in one line. Finally, the galvanometer scanner moves the excitation across the field of view to acquire a 2D lifetime map [51].
2.2.3.3 Spatiotemporal Encoding
Coded aperture compressive temporal imaging (CACTI) and compressed ultrafast photography (CUP) are two representative techniques combining temporal and 2D spatial encoding. The core components of CACTI are a temporally controlled mechanical translation stage and a spatially coded aperture (Fig. 2.6a). During exposure, the translation stage moves the coded aperture over time. The spatial information corresponding to each time interval in the event is modulated by the shifted version of the coded aperture. Pixel-wise modulation over the entire sensor is implemented in this process [52].
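A minimal Python sketch of this CACTI encoding step is given below (toy sizes; the one-pixel mask translation per sub-frame is an assumption): each temporal frame is multiplied by a correspondingly shifted copy of the mask, and the detector integrates the products into a single snapshot. CUP, discussed next, instead keeps the mask fixed and shears the encoded datacube.

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nx, nt = 64, 64, 8                         # spatial size and number of temporal frames
video = rng.random((ny, nx, nt))               # toy dynamic scene
mask = rng.integers(0, 2, (ny, nx + nt))       # coded aperture wide enough to be translated

snapshot = np.zeros((ny, nx))
for t in range(nt):
    shifted = mask[:, t:t + nx]                # mask shifted by t pixels at frame t
    snapshot += video[:, :, t] * shifted       # pixel-wise modulation, then integration

print(snapshot.shape)                          # one coded 2D measurement of (x, y, t)
```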
Fig. 2.4 Schematic of a typical CASSI system. (Reprinted by permission from IEEE: IEEE Signal Processing Magazine, Compressive coded aperture spectral imaging, Arce G. et al., © 2014)
CUP is slightly different from CACTI in the encoding model. As shown in Fig. 2.6b, the coded aperture of the entire event datacube is fixed to a single pattern, while the encoded datacube is subjected to a shearing operation so that the frames are deflected to different spatial positions on the detector. Akin to CACTI, CUP employs the continuous temporal shearing operation. Its imaging speed is jointly determined by the shearing speed and the pixel size of the camera [53– 57]. In general, both CACTI and CUP generate an observation matrix containing multiple suboperators. The random mask used in these two techniques guarantees that the observation matrix satisfies RIP and that the signal can be recovered by CS-based algorithms [54, 58, 59]. Pixel-level temporal encoding on the detector side is another strategy to achieve temporal encoding over a 2D spatial domain. The representative technology is compressive temporal imaging using a shuffled rolling shutter [60, 61]. A rolling shutter is a common image-capturing method, which sequentially acquires and reads data from each row of pixels on the camera (Fig. 2.7a). By shuffling the position of the pixels in each shutter-
opening subinterval, this shuffled rolling shutter achieves temporal modulation at the pixel level [62]. As shown in Fig. 2.7b, when the shuffled rolling shutter camera samples different spatial pixels across the detector, each captured image can be considered an encoded temporal integration. High-speed video can be recovered from a snapshot using a CS-based reconstruction algorithm.
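The sampling patterns in Fig. 2.7 can be illustrated with the minimal Python sketch below: in a normal rolling shutter all pixels in a row share one exposure sub-interval, whereas the shuffled rolling shutter assigns each pixel in a column a permuted sub-interval, yielding pixel-level temporal coding. The 8 × 8 × 8 toy size and the per-column permutation are illustrative assumptions rather than the exact scheme of Refs. [60–62].

```python
import numpy as np

rng = np.random.default_rng(0)
ny, nx, nt = 8, 8, 8                               # rows, columns, temporal sub-intervals

normal = np.zeros((ny, nx, nt), dtype=int)
for r in range(ny):
    normal[r, :, r % nt] = 1                       # whole row r samples sub-interval r

shuffled = np.zeros((ny, nx, nt), dtype=int)
for c in range(nx):
    perm = rng.permutation(nt)                     # per-column shuffle of the schedule
    for r in range(ny):
        shuffled[r, c, perm[r % nt]] = 1

# Number of distinct sub-intervals sampled within the first row:
print(np.unique(np.argmax(normal[0], axis=-1)).size,    # 1: the whole row shares one slot
      np.unique(np.argmax(shuffled[0], axis=-1)).size)  # typically several slots
```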
2.2.3.4 Spatio-polarization Encoding
The polarization dimension is also an important degree of freedom for coded optical imaging when combined with the other dimensions [55, 63]. Polarization reflects rich information about the physical, chemical, and biological properties of objects. Similar to temporal encoding, polarization encoding can be implemented on either the illumination or the detection side [64–66]. In the former methods, 2D varying coded structured illumination is used to modulate light with different polarization states [64]. In the latter approaches, Schonbrun et al. integrated spatially varied polarizer filters onto the camera sensor to encode light's polarization states
Fig. 2.5 Principle of spectro-temporal laser imaging by diffractive excitation (SLIDE). Each pulse has both a unique wavelength and time (spectro-temporal encoding) leading to a sequential and pixel-wise illumination. Fluorescence signals are recorded at high-detection bandwidth to enable fluorescent lifetime imaging at high speed.
EOM: Electro-optic modulator; FDML: Fourier-domain mode-locked. (Reprinted by permission from Springer: Nature Communications, Spectro-temporal encoded multiphoton microscopy and fluorescence lifetime imaging at kilohertz frame-rates, Karpf S. et al., © 2020)
[65, 66]. Figure 2.8 shows a typical modality of encoding in the domains of polarization and spectrum. A chiral disperser is used to transform an array of micropolarizers into an array of color filters for spectral encoding [65]. In image reconstruction, the optical information can be recovered by either linear or nonlinear reconstruction.
2.3 Implementation of Optical Encoders
Section 2.2 shows that optical encoding can be implemented statically or dynamically. The former is realized mainly by fabricating a coded mask and integrating it into the imaging system, which greatly saves space and makes the system compact. Nevertheless, a fixed coded aperture cannot be easily changed once prepared. Especially for multiple-shot imaging using different coded masks, this limitation can introduce additional errors due to pattern misalignment [67]. In contrast, dynamic optical encoding mainly uses specialized electro-optic devices to modulate optical signals. Despite the capability of easily changing the encoding patterns without modifying the optical system, this approach usually requires a more complex setup and extra space in the optical path [54]. This section introduces some typical and noteworthy techniques in both approaches.
Fig. 2.6 Operating principle of two coded-aperture spatiotemporal imaging modalities. (a) Coded aperture compressive temporal imaging. (b) Compressed ultrafast photography. C: spatial encoding operator, F: image blurring operator, D: image distortion operator, S: shearing operator, T: spatiotemporal integration operator. ((a) is reprinted
by permission from Optica: Optics Express, Coded aperture compressive temporal imaging, Llull P. et al., © 2013; (b) is adapted by permission from Springer: Light: Science & Applications, Single-shot real-time femtosecond imaging of temporal focusing, Liang, J. et al., © 2014)
2.3.1 Fixed Encoding
Fixed encoding uses static coded masks that work as specific components in imaging systems. The preparation of static coded masks benefits from the development of microfabrication, which is critical to modern microelectronics, microelectromechanical systems (MEMSs), and microanalytical systems [68–70]. In microfabrication, patterning is a key process for achieving these goals [71]. Techniques for making photomasks, such as inkjet printing and lithography, are inherently suitable for making masks for encoded optical imaging.
2.3.1.1 High-Resolution 2D/3D Printing
High-resolution 2D printing is a direct writing technique that generates arbitrary patterns
Fig. 2.7 Illustration of the acquisition schemes of (a) normal and (b) shuffled rolling shutter. (Reprinted by permission from Felipe Guzman and Esteban Vera: Fourier
Transform Spectroscopy 2021, Improved compressive temporal imaging using a shuffled rolling shutter, Guzmán F. et al., © 2021)
Fig. 2.8 Encoding polarization and spectrum. (a) Single micropolarizer pixel consisting of a wire grid polarizer. Scale bar: 2 μm. (b) A 4 × 4 array of micropolarizer is transformed into a 4 × 4 array of color filters by the chiral dispersive element. (c) Poincaré sphere representation of linear and circular birefringence. RCP: right-handed circular polarization, LCP: left-handed circular polarization, V:
vertically polarized, H: horizontally polarized. (d) Chiral dispersive element transforms broadband linear polarized light to spectrally rotated linearly polarized light. θ in : angle of the input polarizer, θ out : angle of the output polarizer. (Adapted by permission from Optica: Optics Letters, Polarization encoded color camera, Schonbrun E. et al., © 2014)
including coded masks with design flexibility. The representative technology in 2D printing is high-resolution inkjet printing, which is an additive manufacturing technology that applies functional materials with mechanical, optical, or chemical properties to print components for electronics, sensing, chemical processing, life sciences, and optics [72–74]. Figure 2.9a
shows a pseudo-random mask prepared by high-resolution 2D inkjet printing that can be used in a single-shot compressed temporal imaging system [75, 76]. During the printing process, the ink is pumped through a nozzle to form a picoliter-sized jet of droplets. Uniformly spaced and sized droplets are obtained by applying periodic dithers [77]. Commercial inkjet printers can reach a
Fig. 2.9 Advanced printing technologies for coded mask generation. (a) A plastic mask with a random pattern fabricated by a high-resolution 2D inkjet printer. (b) A
cyclic S-mask fabricated by a 3D printer. Inset: a zoomedin view of the transmissive pixels captured by an optical microscope. Size of an encoding pixel: 200 μm
spatial resolution of 5 μm. The advantages of inkjet printing include high reliability, low cost, and the ability to manufacture color-coded apertures [73, 78]. Inkjet printing is a noncontact deposition technology. The typical distance between the substrate and the inkjet head is around 1 mm. Therefore, the risk of damaging fragile substrates is low. In recent years, benefiting from the rapid development and popularization of 3D printers, high-resolution 3D printing has become an option for preparing photomasks [79]. Mainstream 3D printing can be seen as an extension of 2D printing into the z-direction. 3D printing is achieved by accurately layering or voxelating the 3D model in the height dimension and then constructing each layer or voxel by curing methods, such as ultraviolet (UV) curing. Figure 2.9b shows a cyclic S-pattern prepared by 3D printing. Unlike 2D printing, which only prepares the photomask on standard substrates, 3D printing can directly fabricate photomasks with any desired thickness above the smallest layer height [80]. The limitations of 3D printing are its relatively low resolution and small printable size. For example, curing may result in unwanted joints when the features approach the highest resolution of the device. The uniformity of the entire printed pattern may also decrease as the features become
smaller [81, 82]. With the maturing of the 3D printing industry, there is still considerable potential for fabricating coded apertures with high resolution and high quality [83].
2.3.1.2 Drilling
Making coded apertures by drilling is a process that can be traced back to coded aperture imaging with multiple pinholes. Unlike inkjet printing, which creates a mask by adding opaque sections to a transparent substrate, drilling creates transmissive sections on an opaque substrate [84]. This method is commonly used to pattern rigid materials, such as metals and alloys [85], which are commonly used for pinhole imaging under high-energy conditions, such as x-rays and gamma rays [86]. The advantages of the drilling process are its low cost and universal applicability to diverse substrates. Nevertheless, its resolution is highly dependent on the drilling tools [87]. A small drill allows for high patterning resolution but is prone to breaking, which can damage the mask. Moreover, the drilling method usually generates the transmissive pixels (i.e., holes) one by one, making it difficult to fabricate photomasks with high pixel counts. The development of high-energy lasers and their wide industrial applications provide new capabilities for drilling-based photomask
Fig. 2.10 Drilling for coded mask generation. An aluminum mask with a cyclic pattern fabricated by laser drilling. Inset: a zoomed-in view of the drilled holes captured by an optical microscope
fabrication. Lasers are a well-established microetching tool in many aspects of microelectronics manufacturing due to their noncontact nature, scalable output power, and high-precision control with computers (Fig. 2.10) [88–90]. Laser drilling removes material by the target absorbing laser power, generating intense localized heat and causing the target to melt and evaporate. This means that the material cannot be transparent at the laser wavelength [91, 92]. The scope and quality of laser drilling depend on the target material and laser properties. In general, wavelength limits the smallest 3D spatial feature that can be machined by laser drilling. For example, shorter wavelengths, such as deep UV, can be focused to a smaller spot size but also have a lower penetration depth [93]. In addition, laser drilling is limited by the target thickness. The Rayleigh length of the laser focus needs to be larger than the target’s thickness [91].
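As a quick sanity check of the Rayleigh-length condition just mentioned, the snippet below evaluates the standard Gaussian-beam formula z_R = πw_0²/λ for an assumed deep-UV drilling laser and focal spot; the numbers are illustrative, not values from this chapter.

```python
import math

wavelength = 355e-9     # m, an assumed UV drilling laser
waist = 5e-6            # m, assumed focused beam waist radius

z_R = math.pi * waist ** 2 / wavelength
print(f"Rayleigh length: {z_R * 1e6:.0f} um")   # ~221 um; much thicker targets would
                                                # extend beyond the tight focal region
```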
2.3.1.3 Lithography
Photolithography can be used to fabricate coded apertures with high resolution and at large scale. This technique is efficient for the mass production of microstructures, such as the ones in integrated circuits [94]. In lithography, radiation-sensitive polymer materials are used to create patterns in substrates. A typical process for fabricating photomasks by photolithography includes five steps: coating, exposure, development, etching, and stripping. During coating, the photosen-
sitive material is spin-coated to form a resist layer on a substrate. The resist film is then exposed through a mask or directly with a finely focused electron beam. After exposure, the resist film is typically developed by immersing it in a developer solvent, resulting in the creation of a 3D relief image. For a positive resist, exposure causes the resist film to become soluble in the developer, leading to the formation of a positive image of the mask; conversely, for a negative resist, the exposed areas become harder to dissolve, resulting in a negative image. During etching, which transfers the resist image onto the substrate, the remaining resist film functions as a protective mask. Ultimately, the remaining resist film is stripped, leaving the desired pattern on the substrate [95, 96]. Advanced microlithography has been used to fabricate high-resolution optical coded apertures, as shown in Fig. 2.11a, b. In addition to traditional coded apertures for intensity modulation, lithography can also produce phase masks by controlling the material's optical thickness. Combined with deposition and ion etching, lithography can create spatially varying polarization filters at the nanometer scale, producing coded apertures for the polarization dimension [97, 98]. Photolithography also provides the capability of creating coded apertures directly on the sensor. In the chip fabrication process, a coded sensor can be obtained by forming a coded mask directly on the sensor (Fig. 2.11c) [99]. The constructed coded optical imaging system thus does not require extra
Fig. 2.11 Coded masks generated by representative lithography techniques. (a) Chromium mask used in the ultraviolet-compressed ultrafast photography system. Inset: high-resolution partial view of the mask with an encoding pixel size of 40 μm, captured by an optical microscope. (b) Electron micrographs of a MURA-coded aperture with 10 nm holes manufactured by lithography. (c) Phase mask directly fabricated on an image sensor by imprint lithography. ((b) is reprinted by permission from IUCr: Journal of Synchrotron Radiation, A coded aperture microscope for X-ray fluorescence full-field imaging, Siddons D. et al., © 2020; (c) is adapted by permission from Creative Commons License (CC BY 4.0: https://creativecommons.org/licenses/by/4.0): IEEE Photonics Journal, Fabrication of integrated lensless cameras via UV-imprint lithography, Lee Y. et al., 2022)
space or optical components between the coded aperture and the sensor, which can greatly reduce the size of the coded optical imaging system and avoid deviations introduced by the extra optical path [54, 63]. Large quantities of finished coded apertures can be prepared by lithography at one time [100]. However, lithography is relatively expensive compared with other methods.
2.3.1.4 Metasurface
Benefiting from the development of metamaterials, fabricating coded apertures from metasurfaces has become another promising option for researchers. The metasurface aperture consists of a waveguide structure of subwavelength size, with resonant metamaterial elements etched into the surface [101, 102]. By designing different patterns, structures, and periods, metasurface-encoded apertures can encode intensity and phase in space, spectrum, and polarization [103–105]. The advantage of preparing coded apertures from metasurfaces is their unique ability to encode hybrid degrees of freedom of optical signals. A limitation of the metasurface coded aperture is that suitable humidity and temperature are required during operation [106, 107], and the prepared metasurfaces require suitable storage conditions. As an example, Fig. 2.12a, b shows the schematic and the transmission spectrum of a metasurface composed of subwavelength photonic sieves. Such metasurfaces have been used to improve the performance of SPI. The small size of the metasurface makes the entire system compact. In addition, the metasurface structure of the
Fig. 2.12 Metasurface-based coded masks. (a) Scanning electron microscopy image (left image) of a fabricated metasurface sample containing photon sieves (images in the right column). (b) Simulated transmission spectrum of the metasurface in (a). (Adapted by permission from Creative Commons License (CC BY 4.0: https://creativecommons.org/licenses/by/4.0): Nanophotonics, Single-pixel imaging based on large capacity spatial multiplexing metasurface, Yan J. et al., © 2022)
photonic sieve pattern is simple to fabricate, which is more efficient [108].
2.3.2 Reconfigurable Encoding
The ability to control the encoding masks precisely and efficiently contributes to enhancing the quality of coded optical imaging. A useful way to accomplish this goal is reconfigurable encoding. A notable impetus is the invention of spatial light modulators (SLMs). An SLM is a versatile device that can modulate the polarization, amplitude, and/or phase of light at high refresh rates [109]. Most SLM technologies were originally developed for digital displays, where large arrays of individually electronically addressable pixels must rapidly modulate light [110]. SLM devices with such capabilities have also been directly integrated into optical systems to encode optical signals. For example, electro-optic modulators (EOMs) have been used as high-speed temporal encoders [111]. MEMS-based digital micromirror devices (DMDs) and liquid crystal (LC)-SLMs have been widely used to produce high-resolution spatial masks [112, 113]. An SLM can be programmed to refine the coded aperture based on measurement feedback. Compared to their fixed counterparts, programmable dynamic coded apertures enhance the performance and versatility of imaging systems.
2.3.2.1 Electro-optic Modulator (EOM)
An EOM is an electronically controlled device for temporally encoding an optical signal. EOM-based light modulation relies on the electro-optic (EO) effect, which electrically changes the optical properties of certain materials [114]. EOMs are divided into the electro-refractive type and the electro-absorption type [111]. The former is based on the birefringence of certain EO materials. An EO crystal has two orthogonal axes, called the "fast" axis and the "slow" axis, with different refractive indices. When an electric field is applied to the EO crystal, the orthogonally polarized components of the incident light propagate at different speeds along the fast and slow axes, resulting in a phase difference [115]. When the half-wave voltage is applied, the phase difference is tuned to π, which rotates the polarization by 90°. The EO response associated with the second-order nonlinear susceptibility is called the Pockels effect, while that associated with the third-order nonlinear susceptibility is called the Kerr effect. In electro-absorption EOMs, the energy states of the material's electrons change. When an electric field is applied, the field effect tilts the band edges of the material. The accompanying reduction of the bandgap allows the material to absorb light whose wavelength previously lay in the transmission band. The optical signal at the corresponding wavelength is thereby modulated. This effect is evident in direct
Fig. 2.13 Principle of EOM-based temporal modulation of an optical signal
bandgap semiconductors and is most efficient for modulating photons whose energy is close to the bandgap [111]. Due to their sensitive response, high speed, and general availability, EOMs are widely used for temporally encoding the amplitude, phase, wavelength, and polarization of optical signals, as well as any combination of the above [51, 116–118]. Among them, amplitude modulation is the most common way to temporally encode optical signals using an EOM. Figure 2.13 shows how an electro-refractive EOM, together with a pair of polarizers, temporally modulates the optical signal intensity. The first polarizer sets the polarization of the optical beam with respect to the crystal's principal axes. When no electric field is applied, the EO crystal acts as an ordinary transparent medium. When external time-varying electrical signals are applied, the EO crystal changes the polarization state of the beam accordingly. The subsequent analyzer transmits only the component with the specific polarization state [119]. Hence, different transmittances for the incident beam are achieved. To date, some tested EO materials achieve a modulation speed of around 50 gigahertz (GHz) [120].
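The resulting transmittance curve can be sketched numerically. The snippet below assumes the common configuration in which the input polarization is at 45° to the crystal axes and the analyzer is crossed with the input polarizer, so the transmittance follows T(V) = sin²(πV/(2Vπ)); the half-wave voltage and drive values are illustrative assumptions.

```python
import numpy as np

V_pi = 3.0                                    # assumed half-wave voltage (volts)
V = np.linspace(0.0, 2.0 * V_pi, 9)           # applied drive voltages

T = np.sin(np.pi * V / (2.0 * V_pi)) ** 2     # transmittance after the analyzer
for v, t in zip(V, T):
    print(f"V = {v:4.2f} V  ->  T = {t:.2f}")  # T rises to 1 at V_pi, back to 0 at 2 V_pi
```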
2.3.2.2 Digital Micromirror Device (DMD)
DMDs use a MEMS-based method to modulate light. As shown in Fig. 2.14, each DMD consists of an array of micromirrors (the DMD's pixel units), whose tilt can be individually controlled electronically to be either +12° or −12° from the surface normal. The code value "1" is realized when a micromirror is in the binary state that reflects the incident light toward the detector, while "0" means the micromirror is in the other state, reflecting light away from
the detector [121]. This technique has been used for binary and grayscale amplitude modulation in various fields of beam shaping and optical imaging [29, 122–126]. Ideally, all binary coded apertures can be implemented dynamically by a DMD. Grayscale coded apertures can also be achieved by micromirror dithering or error diffusion [124, 127]. Capable of dynamically displaying patterns and controlling pixel amplitudes sequentially at high speeds, the DMD is a popular choice for encoding optical signals. The DMD has a fast refresh rate of up to tens of kilohertz (kHz). Because the technology is based on aluminum mirrors, the optical losses in the UV-visible-near-infrared bands are small. DMDs also support displays with more than one million pixels [128]. Using these technical features, DMDs can modulate optical signals with high speed, high accuracy, and high resolution. They have been used for spatial, spectral, temporal, and hybrid encoding of optical signals [25, 53, 129]. Although conventionally considered an amplitude SLM, the DMD has also been used to modulate the phase thanks to techniques such as Lee holography [130] and super-pixel modulation [131, 132]. The wide commercial availability of DMDs offers many low-cost options. DMDs have excellent reliability and come with support tools that make them popular to use. In practice, the DMD generally has a different resolution or pixel count from the detector, so a one-to-one pixel correspondence between the DMD and the detector is not guaranteed, which affects the quality of the measurements of the encoded optical signals. Although this constraint can be addressed by adjusting the magnification ratio of the imaging
Fig. 2.14 Digital micromirror device (DMD). (a) Photo of a DMD chip and a zoomed-in view. (b) The "ON" and "OFF" positions of micromirrors during operation. (c) The mirror in the "ON" state projects light to the detector. The mirror in the "OFF" state reflects the light out of the system. (Adapted by permission from Texas Instruments®, DLP Products, https://www.ti.com/dlp-chip/overview.html)
Fig. 2.15 Liquid crystal spatial light modulator. (a) Typical structure of an LC-SLM. (b) Uniformly redundant array (URA) mask generated by an LC-SLM. ((a) is reprinted by permission from AIP: Review of Scientific Instruments, An interferometric method for local phase
modulation calibration of LC-SLM using self-generated phase grating, Zhao Z. et al., © 2018; (b) is reprinted by permission from Optica: Optics Express, Optical imaging with phase-coded aperture, Chi W. et al., © 2011)
optics between the DMD and the detector, doing so increases the system complexity. Due to the initial tilt settings of the micromirrors, the DMD also needs to be placed in the Littrow condition in the optical system to minimize distortion. Despite improving the encoding quality, this arrangement restricts the system's field of view [54].
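The error-diffusion route to grayscale coding mentioned above can be sketched in a few lines. The minimal Python example below converts a target grayscale aperture in [0, 1] into the binary ON/OFF map a DMD can display, with Floyd-Steinberg weights spreading the quantization error to neighboring micromirrors; the 64 × 64 random target is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
gray = rng.random((64, 64))                        # target grayscale coded aperture

img = gray.copy()
binary = np.zeros_like(img, dtype=int)
h, w = img.shape
for y in range(h):
    for x in range(w):
        on = 1 if img[y, x] >= 0.5 else 0          # quantize to an ON/OFF mirror state
        err = img[y, x] - on
        binary[y, x] = on
        if x + 1 < w:                img[y, x + 1]     += err * 7 / 16
        if y + 1 < h and x - 1 >= 0: img[y + 1, x - 1] += err * 3 / 16
        if y + 1 < h:                img[y + 1, x]     += err * 5 / 16
        if y + 1 < h and x + 1 < w:  img[y + 1, x + 1] += err * 1 / 16

# The binary pattern preserves the average transmittance of the grayscale target.
print(round(gray.mean(), 3), round(binary.mean(), 3))
```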
2.3.2.3 Liquid Crystal Spatial Light Modulator
The LC-SLM is probably the most widely used SLM, as LC technology was one of the first digital display technologies, and its success has made LCs relatively inexpensive and ubiquitous. The typical structure of a reflective LC-SLM is shown in Fig. 2.15a. It resembles a "sandwich". The
Fig. 2.16 Microshutter array in NASA's James Webb Space Telescope. (a) Overview. (b) SEM image showing several microshutters. (Adapted by permission from NASA: Microshutters Webb/NASA, https://webb.nasa.gov/content/about/innovations/microshutters.html)
upper part is the cover glass with a transparent conductive film. The middle layer contains the LC molecules. The bottom part is the silicon substrate containing discrete reflective pixels [133]. LC-SLMs are typically phase-modulating devices that achieve controlled phase retardation at the pixel level through electronic manipulation of the LC alignment axis. When a voltage is applied, the LC molecules twist, resulting in a change in the effective birefringence. This electro-optic effect is known as the electrically controlled birefringence effect. Within an LC-SLM, the degree of LC rotation determines the relative phase shift of light at each pixel, meaning that the incident light must be polarized at a specific angle to enable efficient modulation [134]. LC-SLMs are commonly used in reflection mode because the LC control substrate is usually silicon with a highly reflective layer. While transmission-mode LC-SLMs are available, they often offer lower resolution and luminous flux compared with their reflective counterparts. LC-SLMs can generate phase and intensity masks and modulate the complex amplitude of the optical signal (Fig. 2.15b). A phase shift of 2π per pixel over 4160 × 2464 pixels can be achieved with only a small change in amplitude [135]. LC-SLMs can typically modulate light over a broad spectral region, ranging from 400 to 1800 nm [136]. The total phase shift varies with the specific wavelength. Antireflection coatings on the cover glass can be used to increase the throughput in specific spectral ranges. Throughput also depends on factors other than reflection, such as the fill fac-
tor and the light utilization efficiency. LC-SLMs are sensitive to light polarization, which allows them to achieve intensity modulation with polarization optics [134]. LC-SLMs can display variable complex modulations at high rates [137]. Optimized masks can be dynamically generated using electronically controlled LC-SLMs. The modulation speed of an LC-SLM is typically in the range of commercial display refresh rates (60–120 Hz). A higher modulation rate of 1–2 kHz can be achieved by shortening the LC response time or decreasing the pixel count [138].
2.3.2.4 Microshutter Array
Coded optical imaging has also been implemented in astronomy for spectral imaging, which reveals clues about the state, temperature, speed, distance, and composition of stars and galaxies. The microshutter array is part of the near-infrared spectrograph (NIRSpec) in NASA's James Webb Space Telescope (Fig. 2.16a) [139]. It consists of an array of 248,000 dynamically controllable micro-windows. Individual shutters are patterned with a torsion flexure that permits them to open by 90 degrees with minimized stress concentration (Fig. 2.16b). The microshutter array was developed to image many objects in one measurement. Before an observation, the status of each microshutter is set as a magnetic arm sweeps past, depending on whether it receives an electrical signal telling it to open or remain closed [140]. An open shutter lets light from a selected target in a particular part of the sky pass through NIRSpec, while a closed shutter blocks
unwanted light from any objects. The spectra of up to 100 objects can be recorded simultaneously with high spectral resolution [141].
2.4 Summary
This chapter reviews several representative methods for encoding optical signals and the ways they are implemented. In general, depending on the capabilities of the chosen instruments, the encoding of optical signals can be realized in different dimensions. Different encoding methods have their benefits and drawbacks in terms of size, capability, simplicity, and cost. The optimal encoding method should be selected based on the particular use case and application environment of the imaging system. Many more emerging technologies can realize coding with different degrees of freedom. For more detailed content, readers can refer to the literature listed in this chapter.
References 1. J. Liang, “Punching holes in light: recent progress in single-shot coded-aperture optical imaging,” Rep. Prog. Phys., 83 (11), p. 116101, (2020). 2. J. Liang and L. V. Wang, “Single-shot ultrafast optical imaging,” Optica, 5 (9), pp. 1113–1127, (2018). 3. M. J. Cie´slak, K. A. A. Gamage, and R. Glover, “Coded-aperture imaging systems: Past, present and future development—a review,” Radiat. Measur., 92 pp. 59–71, (2016). 4. T. M. Cannon and E. E. Fenimore, “Coded aperture imaging: many holes make light work,” Optical Engineering, 19 (3), p. 193283, (1980). 5. R. H. Dicke, “Scatter-hole cameras for x-rays and gamma rays,” Astrophys. J., 153 p. L101, (1968). 6. E. E. Fenimore, “Coded aperture imaging: predicted performance of uniformly redundant arrays,” Appl. Opt., 17 (22), pp. 3562–3570, (1978). 7. G. R. Arce, D. J. Brady, L. Carin, H. Arguello, and D. S. Kittle, “Compressive coded aperture spectral imaging: an introduction,” IEEE Signal Process. Mag., 31 (1), pp. 105–115, (2014). 8. R. Raskar, A. Agrawal, and J. Tumblin, “Coded exposure photography: motion deblurring using fluttered shutter,” Acm Siggraph 2006 Papers, pp. 795– 804, (2006). 9. Y. Lai et al., “Compressed ultrafast tomographic imaging by passive spatiotemporal projections,” Opt. Lett., 46 (7), pp. 1788–1791, (2021).
10. A. A. Wagadarikar, N. P. Pitsianis, X. Sun, and D. J. Brady, “Spectral image estimation for coded aperture snapshot spectral imagers,” in Image Reconstruction from Incomplete Data V, 2008, vol. 7076: SPIE, p. 707602. 11. J. R. Moffitt, C. Osseforth, and J. Michaelis, “Timegating improves the spatial resolution of STED microscopy,” Opt. Express, 19 (5), pp. 4242–4254, (2011). 12. C. Ma, Z. Liu, L. Tian, Q. Dai, and L. Waller, “Motion deblurring with temporally coded illumination in an LED array microscope,” Opt. Lett., 40 (10), pp. 2281–2284, (2015). 13. W. H. Richardson, “Bayesian-based iterative method of image restoration,” J. Opt. Soc. Am., 62 (1), pp. 55–59, (1972). 14. S. S. Gorthi, D. Schaak, and E. Schonbrun, “Fluorescence imaging of flowing cells using a temporally coded excitation,” Opt. Express, 21 (4), pp. 5164– 5170, (2013). 15. D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, 52 (4), pp. 1289–1306, (2006). 16. F. M. Roummel, T. H. Zachary, and M. W. Rebecca, “Compressive coded aperture imaging,” in Computational Imaging VII, 2009, vol. 7246: SPIE, p. 72460G. 17. L. D. Baumert and M. Hall Jr, “A new construction for Hadamard matrices,” (1965). 18. S. R. Gottesman and E. E. Fenimore, “New family of binary arrays for coded aperture imaging,” Appl. Opt., 28 (20), pp. 4344–4352, (1989). 19. A. Busboom, H. Elders–Boll, and H. D. Schotten, “Uniformly redundant arrays,” Exp. Astron., 8 pp. 97–123, (1998). 20. E. E. Fenimore and T. M. Cannon, “Uniformly redundant arrays: digital reconstruction methods,” Appl. Opt., 20 (10), pp. 1858–1864, (1981). 21. T. M. Cannon and E. E. Fenimore, “Tomographical imaging using uniformly redundant arrays,” Appl. Opt., 18 (7), pp. 1052–1057, (1979). 22. Y. Inagaki, Y. Kobayashi, K. Takahashi, T. Fujii, and H. Nagahara, “Learning to capture light fields through a coded aperture camera,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 418–434. 23. Y.-P. Wang, L.-C. Wang, D.-H. Kong, and B.-C. Yin, “High-resolution light field capture with coded aperture,” IEEE Trans. Image Process., 24 (12), pp. 5609–5618, (2015). 24. A. Veeraraghavan, R. Raskar, A. Agrawal, A. Mohan, and J. Tumblin, “Dappled photography: Mask enhanced cameras for heterodyned light fields and coded aperture refocusing,” ACM Trans. Graph., 26 (3), p. 69, (2007). 25. J. Liu, C. Zaouter, X. Liu, S. A. Patten, and J. Liang, “Coded-aperture broadband light field imaging using digital micromirror devices,” Optica, 8 (2), pp. 139–142, (2021).
2 Encoders for Optical Imaging 26. E. J. Candes and T. Tao, “Near-optimal signal recovery from random projections: universal encoding strategies?,” IEEE Trans. Inf. Theory, 52 (12), pp. 5406–5425, (2006). 27. E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory, 52 (2), pp. 489–509, (2006). 28. E. J. Candès, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Commun. Pure Appl. Math., 59 (8), pp. 1207–1223, (2006). 29. P. Kilcullen, T. Ozaki, and J. Liang, “Compressed ultrahigh-speed single-pixel imaging by swept aggregate patterns,” Nat. Commun., 13 (1), p. 7879, (2022). 30. P. Kilcullen, C. Jiang, T. Ozaki, and J. Liang, “Camera-free three-dimensional dual photography,” Opt. Express, 28 (20), pp. 29377–29389, (2020). 31. M. F. Duarte et al., “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag., 25 (2), pp. 83–91, (2008). 32. D. Takhar et al., “A new compressive imaging camera architecture using optical-domain compression,” in Computational Imaging IV, 2006, vol. 6065: SPIE, pp. 43–52. 33. E. J. Candès, “The restricted isometry property and its implications for compressed sensing,” Comptes Rendus Math., 346 (9), pp. 589–592, (2008). 34. R. Chartrand and V. Staneva, “Restricted isometry properties and nonconvex compressive sensing,” Inverse Probl., 24 (3), p. 035020, (2008). 35. H. Arguello and G. R. Arce, “Restricted isometry property in coded aperture compressive spectral imaging,” in 2012 IEEE Statistical Signal Processing Workshop (SSP), 2012, pp. 716–719. 36. A. S. Bandeira, E. Dobriban, D. G. Mixon, and W. F. Sawin, “Certifying the restricted isometry property is hard,” IEEE Trans. Inf. Theory, 59 (6), pp. 3448– 3450, (2013). 37. J. D. Blanchard, C. Cartis, and J. Tanner, “Compressed sensing: How sharp is the restricted isometry property?,” SIAM review, 53 (1), pp. 105–125, (2011). 38. I. Haviv and O. Regev, “The restricted isometry property of subsampled fourier matrices,” in Geometric Aspects of Functional Analysis: Israel Seminar (GAFA) 2014–2016, B. Klartag and E. Milman Eds.: Springer International Publishing, 2017, pp. 163–179. 39. R. G. Baraniuk, M. A. Davenport, R. A. DeVore, and M. B. Wakin, “A simple proof of the restricted isometry property for random matrices,” Constr. Approx., 28 pp. 253–263, (2007). 40. N. Diaz, H. Rueda, and H. Arguello, “High-dynamic range compressive spectral imaging by grayscale coded aperture adaptive filtering,” Ingeniería e Investigación, 35 (3), pp. 53–60, (2015).
3 Convex Optimization for Image Reconstruction

Henry Arguello and Miguel Marquez

H. Arguello, Universidad Industrial de Santander, Computer Science Department, Bucaramanga, Santander, Colombia, e-mail: [email protected]
M. Marquez, Universidad Industrial de Santander, Department of Physics, Bucaramanga, Santander, Colombia, e-mail: [email protected]

Abstract  Convex optimization is a particular class of mathematical tools used in decision sciences and in the analysis of physical systems to find a point that maximizes or minimizes an objective function (subject to equality and/or inequality constraints) through iterative computations. In this regard, recent developments in mathematics and computation have positioned convex optimization as one of the most powerful tools to formulate and solve image inverse problems. This chapter briefly introduces state-of-the-art convex optimization methods for such reconstruction problems and presents the most successful approaches and their interconnections. Section 3.1 summarizes the basic concepts of convex optimization, focusing on the convexity conditions; Section 3.2 introduces the principal mathematical optimization tools for image/signal processing, focusing on the role of convex optimization; Section 3.3 discusses a variety of conventional convex optimization-based reconstruction algorithms used in the image processing community. The scope of these mathematical concepts, algorithms, and their applications is extended as needed in the remainder of the book.

Keywords  Convex optimization · Convex functions · Inverse problems · Linear programming · Image processing · Gradient descent · Augmented Lagrangian method · Shrinkage thresholding · Image reconstruction algorithms
3.1 Convex Functions

The first and more fundamental concept to be studied in this chapter is the definition of a convex function. By definition, a function f : Rⁿ → R is convex if its domain (denoted D(f)) is a convex set and a line segment connecting any two points (e.g., (x, f(x)) and (y, f(y))) on the function's graph lies above the function, as seen in Fig. 3.1a. This convexity condition is given by the expression

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y),   (3.1)

where 0 ≤ λ ≤ 1 and for all x, y ∈ D(f). Some examples of such functions include x ↦ x, x ↦ x², and x ↦ eˣ. Convex functions satisfy several fascinating mathematical properties, such as continuity and the existence of left/right derivatives. In this regard, a function f is concave if −f is convex. In the following subsections, we introduce three conditions that allow us to explain some of the more remarkable properties of convex functions and convex optimization problems [1].
3.1.1 First-Order Condition

Let f : Rⁿ → R be differentiable over an open domain, where its gradient ∇f exists at all points x in D(f), which is open, and n represents the vector length. Then, f is convex if and only if D(f) is a convex set and

f(x) ≥ f(y) + ∇_y f(y)ᵀ(x − y)   (3.2)

holds for all x, y ∈ D(f), where T represents the transpose operator. It is strictly convex if and only if

f(x) > f(y) + ∇_y f(y)ᵀ(x − y).   (3.3)

From Eqs. (3.2)–(3.3), it can be derived that, for all x, y ∈ D(f) at which the gradient ∇f(y) = 0, f(x) ≥ f(y), i.e., y is a global minimum of the function f. This inequality is illustrated in Fig. 3.1b. Conversely, if the function is strictly convex (i.e., x, y ∈ D(f) and x ≠ y), the minimum is unique, f(x) > f(y). Note that the right side of the inequality in Eq. (3.3) is an affine function of y, which is the first-order Taylor approximation of f close to x. In this regard, if the first-order Taylor approximation of f is a global underestimator, then f is convex.

3.1.2 Second-Order Condition

The second-order condition states that the Hessian (or second derivative) of a convex function exists at each point in D(f), i.e., f is twice differentiable if D(f) is open and ∇²_y f(y) ∈ Rⁿˣⁿ. Particularly, this condition can be expressed by the inequality

∇²_y f(y) ≥ 0.   (3.4)

Equation (3.4), an inequality, can be interpreted geometrically as the requirement that the function's graph has nonnegative curvature at y. In this regard, f is strictly convex if ∇²_y f(y) > 0, ∀y ∈ D(f). The second-order condition is used to confirm whether a point with ∇_y f(y) = 0 is a local minimum, a local maximum, or neither.
Fig. 3.1 Graph of a convex function. (a) Convex function on the interval (x, y). (b) Illustration of the first-order condition for convexity, f(x) ≥ f(y) + ∇_y f(y)ᵀ(x − y)

3.1.3 Jensen's Inequality

Jensen's inequality states that the value of a convex function f at a point x that is a convex combination of finitely many points, x = Σᵢ λᵢyᵢ for λ₁, . . . , λ_k ≥ 0 with i = {1, . . . , k} and λ₁ + . . . + λ_k = 1, is less than or equal to the same convex combination of the values of the function at these points. This basic inequality can be expressed as

Σ_{i=1}^{k} λᵢ f(yᵢ) ≥ f( Σ_{i=1}^{k} λᵢ yᵢ ).   (3.5)

The inequality defined in Eq. (3.5), as in the case of convex sets, extends to integrals, infinite sums, and expected values. For the case when k → ∞, the inequality in Eq. (3.5) can be written as [2]:

∫ p(y) f(y) dy ≥ f( ∫ p(y) y dy ),   (3.6)

for ∫ p(y) dy = 1, p(y) ≥ 0 ∀y, where p(y) is commonly considered as a probability density.

3.1.4 Examples of Convex Functions

In several cases, it could be challenging to determine whether a function is convex by using the first-order and second-order conditions. A widespread alternative in the convex optimization field to guarantee the convexity of a function of interest is to express it in terms of more straightforward, well-established convex functions. Below are listed the most popular convex functions used in image reconstruction approaches [1]; a quick numerical check of Eq. (3.1) for two of them is sketched after the list.

• The quadratic function: The function f(x) = ½xᵀQx + cᵀx + b is convex, provided that Q > 0, where Q ∈ Rⁿˣⁿ is a real symmetric matrix and c ∈ Rⁿ is an n-dimensional vector. Problems based on quadratic functions can be viewed as a particular type of general problem implemented in many areas, including economics, applied science, engineering, and image processing. This area is called quadratic programming, and it arises as a principal computational component of many sequential quadratic programming methods [3–5].
• Log-sum-exp: The function f(x) = log( Σ_{i=1}^{n} exp(xᵢ) ) is convex on Rⁿ. This function can be regarded as a differentiable approximation of the max function, since max{x₁, . . . , xₙ} ≤ f(x) ≤ max{x₁, . . . , xₙ} + log n.
• lp norms: The function f : Rⁿ → R defined by f(x) = ||x||_p = ( Σ_{i=1}^{n} |xᵢ|^p )^{1/p} is convex for any p ≥ 1. Two important special cases are the l∞-norm, which considers the maximal entry of the vector (i.e., ||x||_∞ = max_{i=1,...,n} |xᵢ|), and the l1-norm, which carries out the sum of the absolute vector values (i.e., ||x||₁ = Σ_{i=1}^{n} |xᵢ|). In this regard, the compressive sensing community has shown a particular interest in l1 due to its capability to sparsify the solution [6]. Figure 3.2 presents the behavior of ||x||_p^p for five different values of p.
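The short sketch below, not taken from the text, numerically samples the convexity condition of Eq. (3.1) for two of the listed functions (a quadratic with an assumed positive-definite Q and the l1 norm); it is only an illustration, not a proof of convexity.

```python
# Random sampling of Eq. (3.1) for f(x) = 0.5*x^T Q x (Q > 0) and f(x) = ||x||_1.
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 5))
Q = Q @ Q.T + np.eye(5)            # symmetric positive definite by construction

funcs = {
    "quadratic": lambda x: 0.5 * x @ Q @ x,
    "l1 norm":   lambda x: np.sum(np.abs(x)),
}
for name, f in funcs.items():
    ok = True
    for _ in range(1000):
        x, y = rng.standard_normal(5), rng.standard_normal(5)
        lam = rng.uniform()
        # Eq. (3.1): f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
        ok &= f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y) + 1e-12
    print(name, "satisfies Eq. (3.1) on all samples:", bool(ok))
```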
3.2 Convex Optimization
The formulation of constrained optimization problems using a convex objective function f₀ : Rⁿ → R constrained by a set of convex functions f₁, . . . , f_k : Rⁿ → R (called inequality constraint functions) and h₁, . . . , h_p : Rⁿ → R (called equality constraint functions) can be defined in the standard form of a constrained optimization problem as

minimize   f₀(x)
subject to f_i(x) ≤ 0,  i = 1, . . . , k,   (3.7)
           h_j(x) = 0,  j = 1, . . . , p,

where x ∈ Rⁿ is the optimization variable, and h_j(x) must be affine with h_j(x) = a_jᵀx − b_j and a_jᵀx = b_j. Note that an unconstrained optimization problem can be derived from Eq. (3.7) by removing the constraints, i.e., k = p = 0. The feasible and constraint set of points of the convex optimization problem in Eq. (3.7) (i.e., F, which is a convex set) is defined as F = ∩_{i=0}^{k} D(f_i) ∩ ∩_{j=1}^{p} D(h_j). Here, the feasible set consists of all points x ∈ F satisfying the constraints. If at least one point x ∈ F exists, the problem in Eq. (3.7) is said to be feasible, and infeasible otherwise, i.e., x ∉ F.
Fig. 3.2 (a) Geometric interpretation of the lp-norm for a unit-length vector, i.e., ||x||_p^p = 1. (b) The behavior of ||x||_p^p for five values of p
3.2.1 Linear Programming

Linear programming (LP) addresses a class of programming problems where the objective function and the constraint functions are all affine. The LP approach has been successfully applied to computer vision tasks such as denoising, image reconstruction [7], object tracking [8], image detection/classification [9], and transportation systems [10]. In recent times, linear programming theory has also helped resolve and unify several outstanding computer vision applications. In this regard, a general linear program has the form

minimize   cᵀx + d
subject to Gx ⪯ h,   (3.8)
           Ax = b,
where A ∈ Rp × n , G ∈ Rm × n , c ∈ Rn , x ∈ Rn , h ∈ Rm , and d ∈ R. The feasible set of the optimization problem in Eq. (3.8) is a polyhedron.
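As a small illustration of Eq. (3.8), the hedged sketch below solves a toy linear program with SciPy's linprog (assuming SciPy is available); the data c, d, G, h, A, b are made up for the example, and the constant d is simply added back to the optimal value.

```python
# Toy LP in the form of Eq. (3.8): minimize c^T x + d s.t. G x <= h, A x = b.
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0]); d = 3.0
G = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])   # G x <= h (includes x >= 0)
h = np.array([0.0, 0.0, 4.0])
A = np.array([[1.0, -1.0]]); b = np.array([1.0])        # A x = b

res = linprog(c, A_ub=G, b_ub=h, A_eq=A, b_eq=b, bounds=[(None, None)] * 2)
print("minimizer:", res.x, "optimal value:", res.fun + d)
```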
3.2.2 Quadratic Programming

Quadratic programming (QP) addresses a class of programming problems where the objective function is (convex) quadratic, and the constraint functions are all affine. QP is a powerful mathematical programming tool that appears in several research fields, such as statistics, machine learning, finance, engineering, and control theory. A QP is written in standard form as

minimize   ½xᵀQx + cᵀx + b
subject to Gx ⪯ h,   (3.9)
           Ax = q,

where G ∈ R^{m×n}, A ∈ R^{p×n}, Q ∈ Sⁿ, x ∈ Rⁿ, q ∈ R^p, and h ∈ R^m. Note that Eq. (3.9) is convex if the Hessian matrix Q is positive semidefinite, and in this scenario, the problem is often similar in complexity to a linear program. Conversely, the QP problem in Eq. (3.9) is strictly convex in the case of Q ∈ S₊ⁿ being positive definite. QP problems also arise as subproblems in methods for general constrained optimization, which include linear programming as a special case (by taking Q = 0), sequential quadratic programming, the augmented Lagrangian method, and interior-point methods.
3.3 Convex Optimization Algorithms for Inverse Problems

Several real-world problems or physical phenomena in various domains can be modeled as convex optimization problems, such as deconvolution, graphical model inference, image denoising, signal processing, and many others. Particularly in the image processing field, the recent advances in convex optimization have sparked the development of numerical optimization algorithms for image reconstruction, which apply to a broader range of problems than was previously possible while being efficient, reliable, and backed by solid theoretical guarantees. Such algorithms have made convex optimization problems even more attractive. In linear inverse problems (LIP), the aim is to estimate an unknown image x from a measurement/observation y generated by a linear operator A applied to x. In this regard, one of the most basic discrete linear systems in image processing can be modeled as

y = Ax + e,   (3.10)

where y ∈ R^m and A ∈ R^{m×n} are known variables, x ∈ Rⁿ is the original image to be recovered, and e ∈ R^m is an unknown noise variable. Numerous publications in image processing model the sensing process as Eq. (3.10). The following sections summarize some state-of-the-art LIP algorithms that have enjoyed wide popularity in the image processing community by solving constrained and unconstrained optimization problems of the form

minimize   f(x) + τg(x) := φ(x)
subject to f_i(x) ≤ 0,  i = 1, . . . , k,   (3.11)
           h_j(x) = 0,  j = 1, . . . , p,

where φ : Rⁿ → R is the objective function, g : Rⁿ → R is the regularization function, f : Rⁿ → R is a smooth function, and τ ∈ R is the regularization parameter.

3.3.1 Gradient Descent
Gradient descent (GD) is one of the most popular iterative first-order optimization algorithms used to find a local minimum of a given function [11– 13] and by far the most used in machine learning (ML) and deep learning (DL) to minimize a cost/loss function [14–16]. The popularity of the GD algorithms has prevailed over the years since such methods for a broad class of largedimensional problems (with a not-too-complex implementation) require simpler iterations with fewer operations, which can sometimes even be parallelized. The mathematical formulation of the GD optimization problem is
minimize_x f₀(x),   (3.12)
where x ∈ Rn is a real vector with n ≥ 1 components and f0 : Rn → R is a convex and smooth function. To solve and minimize Eq. (3.12), the steepest descent direction − ∇ f (x(ι) ) is the most
obvious choice for the line search of a direction that moves downhill on the convex function. If the start point is set as x^(ι) and moves a positive distance τ in the direction of the negative gradient, the resulting new position x^(ι+1) can be obtained by

x^(ι+1) = x^(ι) − τ_ι ∇f₀(x^(ι)),   (3.13)

where the positive scalar τ_ι > 0 is called the step length. The success of the gradient methods highly depends on the effective choice and update of the step length τ_ι. Therefore, several line search conditions have been proposed to choose the step-length parameter τ_ι, such as the Wolfe condition [17, 18], the Goldstein condition, and the so-called backtracking approach.
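A minimal sketch of the update in Eq. (3.13) with a backtracking step-length rule is given below; it is not from the chapter, and the smooth objective f₀(x) = ½||Ax − y||₂², the shrink factor, and the sufficient-decrease constant are illustrative assumptions.

```python
# Gradient descent, Eq. (3.13), with backtracking (Armijo) selection of tau.
import numpy as np

def gradient_descent(A, y, n_iter=200, shrink=0.5, c=1e-4):
    """Minimize f0(x) = 0.5*||A x - y||_2^2."""
    f0 = lambda x: 0.5 * np.linalg.norm(A @ x - y) ** 2
    grad = lambda x: A.T @ (A @ x - y)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = grad(x)
        tau = 1.0
        # Shrink tau until a sufficient-decrease condition holds.
        while f0(x - tau * g) > f0(x) - c * tau * (g @ g):
            tau *= shrink
        x = x - tau * g                      # Eq. (3.13)
    return x

# Toy usage
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10)); y = A @ rng.standard_normal(10)
x_hat = gradient_descent(A, y)
```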
3.3.2 Augmented Lagrangian Methods

The augmented Lagrangian (AL) method belongs to a class of methods for constrained optimization that replace the original problem with a sequence of subproblems in which the constraints are represented by terms added to the objective function [19, 20]. The central principle in AL methods is to consider the constraints in Eq. (3.7) by augmenting the objective function f₀ with a weighted sum of the constraint functions f_i and h_j, which can be written as [1]:

L(x, λ, ν) = f₀(x) + Σ_{i=1}^{m} λ_i f_i(x) + Σ_{j=1}^{p} ν_j h_j(x),   (3.14)

where x ∈ Rⁿ, L : Rⁿ × R^m × R^p → R, D(L) : F × R^m × R^p with F as the feasible set, λ_i represents the Lagrange multiplier associated with the i-th inequality constraint f_i(x) ≤ 0, and ν_j represents the j-th equality constraint h_j(x) = 0. The vectors λ ∈ R^m and ν ∈ R^p are called the dual variables or Lagrange multiplier vectors associated with the problem described in Eq. (3.14). Based on the standard form of the Lagrangian function in Eq. (3.14), the optimization problem in Eq. (3.7) can be rewritten from a constrained problem to an unconstrained problem as follows [21]:

argmin_{x,λ,ν} L(x, λ, ν) = argmin_{x,λ,ν} ( f₀(x) + Σ_{i=1}^{m} λ_i f_i(x) + Σ_{j=1}^{p} ν_j h_j(x) ),   (3.15)

where L(x, λ, ν) is a convex function of x.

Example 3.1  Convex optimization problems based on AL methods arise in several fields and applications; an example of a typical and widely used optimization model in the computer vision field, the least-squares solution of a linear equation, is expressed by

minimize   xᵀx
subject to Ax = q,   (3.16)

where A ∈ R^{p×n}. Note that Eq. (3.16) is derived from Eq. (3.9) by setting c = 0, G = 0, b = 0, and Q = 2I with I ∈ Rⁿˣⁿ as an identity matrix. The Lagrangian of Eq. (3.16) is defined as L(x, ν) = xᵀx + νᵀ(Ax − q) with D(L(x, ν)) : Rⁿ × R^p, which is a convex quadratic function of x. Based on Eq. (3.15), the variable x that minimizes the optimization problem in Eq. (3.16) and its Lagrangian function can be found from the stationarity condition

∇_x L(x, ν) = 2x + Aᵀν = 0,   (3.17)

from which the value x = −(1/2)Aᵀν can be obtained.
(3.14) where x ∈ Rn , .L : Rn × Rm × Rp , .D (L) : F × Rm × Rp with .F as the feasible set, λi represents the Lagrange multiplier associated with the i-th inequality constraint fi (x) ≤ 0, and ν j represents the j-th equality constraint hj (x) = 0. The vectors λ ∈ Rm and ν ∈ Rp are called the dual variables or Lagrange multiplier vectors associated with the problem described in Eq. (3.14). Based on the standard form of the Lagrangian function in Eq.
3.3.3
Proximal Operator
Proximal algorithms have surged as a relevant tool in the convex optimization algorithm community due to their ability to decompose complex composite convex problems into simple subproblems by calculating the proximal operator. These subproblems have several interesting advantages, such as projecting a point onto a convex set,
3 Convex Optimization for Image Reconstruction
permitting closed-form solutions, or being solved with standard/simple methods. In particular, the proximal operator associated with a closed proper convex function (f0 : Rn → R ∪ {∞}) is defined by [22]: ) ( 1 ||x − u||22 , proxλf0 (u) = argminx f0 (x) + 2λ (3.18)
.
where λ > 0 is a scaled constant and the effective domain of f0 is .D (f0 ) = {x ∈ Rn |f0 (x) < +∞}. Following are listed some of the main properties of proximal operators. These are used to, for example, derive a method for evaluating the proximal operator of a given function. Separable Sum Let f0 be a fully separable ( ) function across K ∈ Z variables, so .f0 xj = ( ) EK j =1 fj xj , then, .proxλf j
(
) uj =argminxj
) ||2 ( ) 1 || fj xj + ||xj −uj ||2 , 2λ
where j = {1, . . . , K}. Equation (3.19) indicates that the proximal operator of a fully separable function can be reduced to evaluating the proximal operator for each separable part. Regularization Let .f0 (x) τ ||x − v||22 , then 2
=
f1 (x) +
( 1 proxλf0 (u) = argminx βf1 (x) + 2β || ( )||2 ) || || ||x − β u + τβv || , (3.20) || || λ 2
.
λ . 1+λτ
Affine Addition Let f0 (x) = f1 (x) + vT x + b, then .proxλf 0
where v ∈ Rn is an arbitrary vector and b ∈ R is an arbitrary constant. Example Let analysis one of the more basic image reconstruction algorithms, the least square algorithm with .f0 = ||y − Ax||22 . Based on Eq. (3.18), the proximal is defined as (
x
. k+1
( ) 1 ||x − (u − λv)||22 , (u) = argminx f1 (x) + 2λ
(3.21)
1 ||y − Ax||22 2 ) 1 + ||x − xk ||22 , 2λ (3.22)
= proxλf0 = arg min x
where y ∈ Rm represents the measurement, x ∈ Rn is the objective image, and A ∈ Rm × n is the sensing matrix. The closed-form solution of Eq. (3.22) is given by x
. k+1
(
(3.19)
where .β =
43
( )−1 ( T ) = AT A + λ−1 I A y + λ−1 xk , (3.23)
where I ∈ Rn × n represents an identity matrix.
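The hedged sketch below implements two proximal operators used throughout this chapter: the closed-form least-squares prox of Eq. (3.23) and the soft-thresholding prox of the l1 norm (which reappears as the shrinkage operator of Eq. (3.35)). Function and variable names are illustrative, not from the text.

```python
# Two proximal operators as NumPy functions.
import numpy as np

def prox_least_squares(u, A, y, lam):
    """prox of f0(x) = 0.5*||y - A x||_2^2 evaluated at u, cf. Eq. (3.23)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + np.eye(n) / lam, A.T @ y + u / lam)

def prox_l1(u, t):
    """prox of t*||x||_1: element-wise soft thresholding."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

# Toy usage
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 50)); y = rng.standard_normal(20)
u = rng.standard_normal(50)
x_data = prox_least_squares(u, A, y, lam=1.0)   # pulls u toward the data fit
x_sparse = prox_l1(u, t=0.3)                    # shrinks small entries to zero
```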
3.3.4 Basis Pursuit (BP)

BP is one of the first and most famous convex-constrained optimization problems in the signal and image processing field due to its ability to obtain sparse solutions. In this regard, BP has played a significant role in dimensionality reduction approaches. The BP method can be written as

minimize   ||x||₁
subject to Ax = b,   (3.24)
where the sparsity of the coefficient vector x ∈ Rⁿ is induced by the l1 norm (at least in the real setting). The unknown datacube x is expected to have a concise representation when expressed on an appropriate basis, e.g., the 2D discrete cosine or the 2D wavelet transform. Therefore, optimization problems based on the l1 regularizer aim to reconstruct the objective datacube projected in some appropriate domain
(e.g., Fourier) instead of the original object domain (e.g., spatial domain). Particularly, in Eq. (3.24), the l1 norm is the convex function closest to the l0 quasi-norm, so this substitution is referred to as convex relaxation [23]. In some applications, a trade-off between the exact congruence of Ax and b is desirable in exchange for a sparser x [24]. In these scenarios, a more appropriate formulation of Eq. (3.24) is the now-famous l2 − l1 basis pursuit denoising problem [25]:

minimize_x ½||b − Ax||₂² + τ||x||₁,   (3.25)
where τ is the parameter that sets the trade-off between error and sparsity. Iterative-shrinkage algorithms have surged as an effective and efficient alternative to solve Eq. (3.25) for high-dimensional problems [26–29], as frequently encountered in image and signal processing applications. This family of numerical algorithms surged as an extension to the classical Donoho–Johnstone shrinkage method for signal denoising [30, 31]. For example, in circular photoacoustic tomography (C-PAT) approaches [32, 33], the BP algorithm was used as an alternative to the back-projection algorithms by incorporating compressive sensing theory into the data acquisition process. CS-based C-PAT approaches state that the number of angles/measurements can be reduced dramatically by using the wavelet, Fourier, or curvelet basis, i.e., x = Ψf, where f ∈ Rⁿ represents the object in the spatial domain, Ψ ∈ Rⁿˣⁿ is the basis, and x ∈ Rⁿ the projection of f onto Ψ.

Iteratively Reweighted Least Squares (IRLS)  An alternative to solve Eq. (3.24) is via the IRLS algorithm [23, 34, 35] (a minimal sketch of its iteration is given at the end of this subsection). In particular, by setting ||x||₁ ≡ xᵀ(C^{(k+1)})⁻¹x with C^{(k+1)} = diag(|x^{(k+1)}|) and x^{(k+1)} as a current approximate solution, Eq. (3.25) can be rewritten as

minimize_x ½||b − Ax||₂² + τ xᵀ(C^{(k−1)})⁻¹x,   (3.26)

where φ(x) = ½||b − Ax||₂² + τ xᵀ(C^{(k−1)})⁻¹x. Note that Eq. (3.26) is a quadratic optimization problem solvable using standard linear algebra. To minimize Eq. (3.26), IRLS employs iterations of the form

x^{(k+1)} = ( 2τ(C^{(k−1)})⁻¹ + AᵀA )⁻¹ Aᵀb.   (3.27)

The matrix C^{(k−1)} is iteratively updated based on the new values of the found solution. Although IRLS is a simple and effective strategy for solving small-scale problems, its performance is poor for large-scale problems.

IRLS-Based Shrinkage Algorithm (IRLS-SA)  The IRLS-based shrinkage approach surged as a straightforward and intuitive solution to the IRLS's performance shortcomings. Let μx be added and subtracted from the gradient of the objective function φ(x), i.e., ∇_x φ(x) + μx − μx with μ ≥ 1. Thus,

−Aᵀb + (AᵀA − μI)x + ( τ(C^{(k−1)})⁻¹ + μI )x = 0,   (3.28)

where ∇_x φ(x) = −Aᵀ(b − Ax) + τ(C^{(k−1)})⁻¹x and I ∈ Rⁿˣⁿ is an identity matrix. Based on the fixed-point iteration methodology and with the assignment of (AᵀA − μI)x^{(k)} and (τ(C^{(k−1)})⁻¹ + μI)x^{(k+1)}, the solution to Eq. (3.28) can be written as [36]:

x^{(k+1)} = ( (τ/μ)(C^{(k−1)})⁻¹ + I )⁻¹ ( (1/μ)Aᵀb − (1/μ)(AᵀA − μI)x^{(k)} ),   (3.29)

where the diagonal matrix ( (τ/μ)(C^{(k−1)})⁻¹ + I )⁻¹ plays the role of shrinkage applied to the vector (1/μ)Aᵀ(b − Ax^{(k)}) + x^{(k)}. Note that the initialization must be different from zero (i.e., x^{(0)} ≠ 0), since this case is a stable solution of the IRLS-SA algorithm in Eq. (3.29).
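The sketch below implements the IRLS iteration of Eq. (3.27) for a small dense problem; it is illustrative only. The small constant eps, which keeps the reweighting matrix invertible when entries of x reach zero, is an assumption the text leaves implicit.

```python
# IRLS for the l2-l1 problem of Eq. (3.25), iterating Eq. (3.27).
import numpy as np

def irls_l1(A, b, tau=0.1, n_iter=50, eps=1e-8):
    x = A.T @ b                               # nonzero initialization
    AtA, Atb = A.T @ A, A.T @ b
    for _ in range(n_iter):
        w = 1.0 / (np.abs(x) + eps)           # (C^{(k-1)})^{-1} = diag(1/|x_i|)
        x = np.linalg.solve(2.0 * tau * np.diag(w) + AtA, Atb)   # Eq. (3.27)
    return x

# Toy usage
rng = np.random.default_rng(3)
A = rng.standard_normal((40, 80)); b = rng.standard_normal(40)
x_hat = irls_l1(A, b)
```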
3.3.5 Iterative Shrinkage Thresholding Algorithm (ISTA)

ISTA is an iterative algorithm comprised of two steps: a linear estimation process and a shrinkage/soft-threshold process. In particular, ISTA aims to solve the general nonsmooth convex optimization model

minimize_x f(x) + g(x) := φ(x),   (3.30)

where f : Rⁿ → R is a continuously differentiable smooth convex function with Lipschitz continuous gradient and g : Rⁿ → R is a continuous convex function. Then, the basic approximation model for the general model in Eq. (3.30) is achieved by establishing a quadratic approximation of φ(x) at a given point x^{(k−1)}, which can be expressed as [37]:

x^{(k+1)} = argmin_x { f(x^{(k−1)}) + ⟨x − x^{(k−1)}, ∇_x f(x^{(k−1)})⟩ + (λ/2)||x − x^{(k−1)}||₂² + g(x) },   (3.31)

where λ > 0. By expanding, ignoring the constant terms in x^{(k−1)}, and contracting, Eq. (3.31) can be rewritten as

x^{(k+1)} = argmin_x { g(x) + (λ/2) || x − ( x^{(k−1)} − (1/λ)∇_x f(x^{(k−1)}) ) ||₂² }.   (3.32)

Here, Eq. (3.32) represents the general convex optimization model of the ISTA methodology. ISTA is one of the best-known and most popular methods for solving the BP problem (see Eq. (3.25)) [38–40], i.e., by setting g(x) = ||x||₁ and f(x) = ||y − Ax||₂². For this special case (i.e., l1 − l2), Eq. (3.32) can be rewritten as

x^{(k+1)} = argmin_x { ||x||₁ + (λ/2) || x − ( x^{(k−1)} − (1/λ)∇_x f(x^{(k−1)}) ) ||₂² },   (3.33)

where ∇_x f(x^{(k−1)}) = Aᵀ(y − Ax^{(k−1)}). Since the l1 norm is separable, the computation of x^{(k+1)} can be expressed as [37]:

x^{(k+1)} = ℑ_{μ/λ}( x^{(k−1)} − (1/λ)∇_x f(x^{(k−1)}) ),   (3.34)

with

ℑ_{μ/λ}(x)_i = ( |x_i| − μ/λ )₊ sgn(x_i),   (3.35)

where ℑ_{μ/λ} : Rⁿ → Rⁿ is the shrinkage operator, μ/λ is an appropriate step size, and sgn : R → R represents the sign function. In this regard, ISTA is a methodology that allows solving BP problems in a simple form at the expense of computing time, i.e., convergence rate. The ISTA advantages sparked the interest of the image-processing community in developing ISTA-based algorithms that preserve the computational simplicity of ISTA but with a faster rate of convergence. ISTA-based algorithms are used in many medical applications (e.g., magnetic resonance imaging (MRI), positron emission tomography (PET), and computed tomography (CT)) to recover images from noisy measurements acquired using an ill-posed measurement scheme. ISTA's main assumption is that the image ought to have only a few nonzero coefficients in some transform domain, e.g., Fourier, wavelet, or discrete cosine transform.

3.3.6 Fast Iterative Shrinkage Thresholding Algorithm (FISTA)
FISTA is one of the most relevant ISTA-based algorithms; it has gained popularity because of its demonstrated convergence-rate improvement of several orders of magnitude over the ISTA algorithm [37]. FISTA is inspired by the Nesterov accelerated gradient method [41], which consists of a gradient descent step followed by something that looks a lot like a "momentum term." In particular, this methodology solves the general model [i.e., Eq. (3.30)] by the following iterative optimization problem:

x^{(k)} = argmin_x { ||x||₁ + (λ/2) || x − ( v^{(k)} − (1/λ)∇_x f(v^{(k)}) ) ||₂² } = ℑ_{μ/λ}( v^{(k)} − (1/λ)∇_x f(v^{(k)}) ),   (3.36)

with

v^{(k+1)} = x^{(k)} + ((t^{(k)} − 1)/t^{(k+1)}) ( x^{(k)} − x^{(k−1)} ),   (3.37)

and

t^{(k+1)} = ( 1 + √(1 + 4(t^{(k)})²) ) / 2.   (3.38)

Note that the computational effort in both FISTA and ISTA is approximately the same, since the additional computations requested by Eqs. (3.37)–(3.38) are marginal. For the l1 − l2 case (i.e., g(x) = ||x||₁ and f(x) = ||y − Ax||₂²), Eq. (3.36) can be rewritten as Eq. (3.33) and solved as in Eq. (3.34). Comparable to BP and ISTA, FISTA is a popular approach in medical applications where low reconstruction time and memory requirements are critical. For example, several ongoing FISTA algorithms for compressed sensing magnetic resonance imaging (CS-MRI) [42–44] are based on data-adaptive sparsifying transforms, e.g., singular value decomposition [45], patch-based adaptive kernel methods [46], and dictionary learning [47]. (A code sketch covering both ISTA and its FISTA acceleration is given at the end of Section 3.3.7 below.)

3.3.7 Two-Step Iterative Shrinkage/Thresholding (TwIST)

The TwIST algorithm is based on the iterative shrinkage-thresholding (IST) principle for solving the LIP, which can be considered an extension of the classical gradient algorithms. The TwIST algorithm gained much popularity in the early 2010s because of its simplicity and ability to solve large-scale problems in scenarios with either a sparse or a dense linear sensing matrix [48–50]. Additionally, this algorithm exhibited a much faster convergence rate for ill-conditioned and ill-posed problems than its predecessor, the IST algorithm. Particularly, the convex objective function established for the TwIST algorithm is given by

φ(x) = ½||y − Ax||₂² + τg(x),   (3.39)

where g : Rⁿ → R is a regularizer function assumed to be convex, lower semicontinuous, and proper. Formally, the cost function in Eq. (3.39) is introduced into Eq. (3.11), which leads to the following optimization problem

minimize_x ½||y − Ax||₂² + τ||x||₁,   (3.40)

where f(x) = ½||y − Ax||₂². Note that the optimization problem in Eq. (3.40) is, in fact, a quadratic programming problem, since ½||y − Ax||₂² = ½( xᵀAᵀAx + yᵀy − 2yᵀAx ), where P = AᵀA, qᵀ = 2(Ay)ᵀ, and r = yᵀy. To minimize f(x), TwIST employs iterations of the form

x₁ = Ψ_τ( x₀ + Aᵀ(y − Ax₀) ),
x_{t+1} = (1 − α)x_{t−1} + (α − β)x_t + β Ψ_τ( x_t + Aᵀ(y − Ax_t) ),   (3.41)

with

Ψ_τ(v) = argmin_x { ½||x − v||₂² + τ||x||₁ },   (3.42)

where Ψ_τ : Rⁿ → Rⁿ is known as the Moreau proximal operator of g; α, β, and β₀ are the parameters of the algorithm, x₀ is the initialization, and t ≥ 1. The TwIST algorithm has been used and adapted in several CS imaging applications, ranging from spectral to temporal imaging. For example, in compressed ultrafast photography [51, 52], the function g(x) is established with the total variation regularizer (i.e., g(x) = ||x||_TV) to exploit the sparsity in spatial gradients within each temporal channel.
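The hedged sketch below implements ISTA (Eqs. (3.34)–(3.35)) and its FISTA acceleration (Eqs. (3.36)–(3.38)) for the l1 − l2 problem, with f(x) = ½||y − Ax||₂² so that ∇f(x) = Aᵀ(Ax − y); the step 1/λ is set to 1/L with L the spectral norm of AᵀA, a common (assumed) choice for convergence. The names, regularization weight, and toy data are illustrative.

```python
# ISTA / FISTA sketch for minimize 0.5*||y - A x||^2 + mu*||x||_1.
import numpy as np

def soft(u, t):                              # shrinkage operator, Eq. (3.35)
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def fista(A, y, mu=0.1, n_iter=300, accelerate=True):
    L = np.linalg.norm(A.T @ A, 2)           # Lipschitz constant of grad f
    lam = L                                  # step size is 1/lam = 1/L
    x = np.zeros(A.shape[1]); v = x.copy(); t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ v - y)
        x_new = soft(v - grad / lam, mu / lam)        # Eqs. (3.34)/(3.36)
        if accelerate:                                # FISTA momentum
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))   # Eq. (3.38)
            v = x_new + ((t - 1.0) / t_new) * (x_new - x)      # Eq. (3.37)
            t = t_new
        else:                                         # plain ISTA
            v = x_new
        x = x_new
    return x

# Toy usage: recover a sparse vector from compressive measurements
rng = np.random.default_rng(4)
A = rng.standard_normal((60, 200)) / np.sqrt(60)
x_true = np.zeros(200); x_true[rng.choice(200, 8, replace=False)] = 1.0
y = A @ x_true
x_hat = fista(A, y, mu=0.02)
```

The TwIST iteration of Eq. (3.41) has the same building blocks (a gradient step on f followed by the proximal map Ψ_τ), adding a two-step combination of the previous iterates instead of the Nesterov momentum used here.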
3.3.8 Gradient Projection for Sparse Reconstruction (GPSR)

The GPSR is a constrained convex quadratic program obtained by splitting x into positive and negative parts and applying the gradient projection method with Barzilai–Borwein steps [53]. The GPSR algorithm aims to solve the l2 − l1 convex nonsmooth unconstrained optimization problem [39]

minimize_x ½||y − Ax||₂² + τ||x||₁,   (3.43)

where f(x) = ½||y − Ax||₂² and g(x) = ||x||₁. Then, to express Eq. (3.43) in a quadratic programming structure, the variable x can be split into its positive and negative parts as s_i = (x_i)₊ and t_i = (−x_i)₊, resulting in x = s − t, where s, t ∈ Rⁿ are both nonnegative and (·)₊ : R → R represents the positive-part operator with (x_i)₊ = max{0, x_i}. With the replacement τ||x||₁ = τ||s − t||₁ = τ1ₙᵀ(s + t), Eq. (3.43) was rewritten in [39] as the following bound-constrained quadratic program:

minimize_{s,t} ½||y − A(s − t)||₂² + τ1ₙᵀs + τ1ₙᵀt
subject to  s ≥ 0,  t ≥ 0,   (3.44)

where the solution of Eq. (3.44) must satisfy that the supports of s and t cannot overlap, i.e., sᵀt = 0. By denoting the concatenated vector u = [sᵀ, tᵀ]ᵀ with u ∈ R^{2n×1}, Eq. (3.44) can be rewritten as a more standard quadratic program

minimize_u  bᵀu + ½uᵀCu
subject to  u ≥ 0,   (3.45)

where b = τ1_{2n} + [−(Aᵀy)ᵀ, (Aᵀy)ᵀ]ᵀ is an auxiliary vector with b ∈ R^{2n×1}, 1_{2n} ∈ R^{2n×1} is a vector of ones, and C = [(AᵀA, −AᵀA)ᵀ, (−AᵀA, AᵀA)ᵀ]ᵀ with C ∈ R^{2n×2n}. The solution of Eq. (3.45) is obtained by using the well-known gradient projection approach as follows:

u^{(k+1)} = u^{(k)} + λ^{(k)}( z^{(k)} − u^{(k)} ),   (3.46)

with

z^{(k)} = ( u^{(k)} − α^{(k)} ∇_u G(u^{(k)}) )₊,   (3.47)

where G(u) ≡ bᵀu + ½uᵀCu represents the objective function in Eq. (3.45) with gradient ∇_u G(u) = b + Cu, α^{(k)} > 0 is the first scalar parameter, and λ^{(k)} ∈ [0, 1] is the second scalar parameter. The GPSR's authors describe two versions of their algorithm: the basic gradient projection (GPSR-Basic) and the Barzilai–Borwein gradient projection (GPSR-BB). The difference between GPSR-Basic and GPSR-BB lies in the process of updating α and λ. For the GPSR-BB version, α and λ are defined as

α^{(k+1)} = mid( α_min, ||−(η^{(k)}I)⁻¹∇_u G(u^{(k)})||₂² / [ (−(η^{(k)}I)⁻¹∇_u G(u^{(k)}))ᵀ C (−(η^{(k)}I)⁻¹∇_u G(u^{(k)})) ], α_max ),   (3.48)

λ^{(k+1)} = mid( 0, (−(η^{(k)}I)⁻¹∇_u G(u^{(k)}))ᵀ ∇_u G(u^{(k)}) / [ (−(η^{(k)}I)⁻¹∇_u G(u^{(k)}))ᵀ C (−(η^{(k)}I)⁻¹∇_u G(u^{(k)})) ], 1 ),   (3.49)

where mid(a, b, c) denotes the middle value of its three arguments, [α_min, α_max] represents the interval of α^{(k)} with 0 < α_min < α^{(k)} < α_max, and η^{(k)} is a parameter chosen such that η^{(k)}I approximates the Hessian of G(·) at u^{(k)}, i.e., for the most recent step, η^{(k)} must satisfy ∇_u G(u^{(k)}) − ∇_u G(u^{(k−1)}) ≈ η^{(k)}[u^{(k)} − u^{(k−1)}]. Finally, echoing [54], the GPSR's authors suggest initializing τ as 0.1||Aᵀy||_∞ for compressed sensing applications. Several works have validated the GPSR's performance in various imaging applications ranging from hyperspectral imaging to computed tomography. In particular, the GPSR algorithm has been used extensively in CS-based spectral imaging since it allows exploiting the sparse nature of the spectral data cube [55, 56]. The representation basis Ψ ∈ Rⁿˣⁿ is usually constructed by the 2D wavelet (for the spatial domain) and the DCT (for the spectral domain) transforms. The sparse version of the spectral data cube can be expressed as x = Ψf, where f ∈ Rⁿ represents the object in the spatial domain, and x ∈ Rⁿ represents the projection of f onto Ψ.
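A simplified, hedged sketch of the gradient-projection iteration of Eqs. (3.45)–(3.47) with a Barzilai–Borwein step is given below. For brevity it takes λ^{(k)} = 1 and replaces the mid(·) safeguards of Eqs. (3.48)–(3.49) by a plain clipping of α, so it is not the full GPSR-BB algorithm; names and data are illustrative, and C is formed explicitly only because the toy problem is small.

```python
# Projected gradient on the bound-constrained QP of Eq. (3.45) with a BB-type step.
import numpy as np

def gpsr_like(A, y, tau, n_iter=300, a_min=1e-8, a_max=1e8):
    n = A.shape[1]
    AtA, Aty = A.T @ A, A.T @ y
    C = np.block([[AtA, -AtA], [-AtA, AtA]])               # quadratic term of Eq. (3.45)
    b = tau * np.ones(2 * n) + np.concatenate([-Aty, Aty])  # linear term of Eq. (3.45)
    u = np.zeros(2 * n)
    grad = b + C @ u                                        # grad G(u) = b + C u
    alpha = 1.0
    for _ in range(n_iter):
        z = np.maximum(u - alpha * grad, 0.0)               # Eq. (3.47): project onto u >= 0
        s = z - u
        u = z                                               # Eq. (3.46) with lambda^{(k)} = 1
        grad = b + C @ u
        denom = s @ (C @ s)
        alpha = np.clip((s @ s) / denom, a_min, a_max) if denom > 0 else a_max
    return u[:n] - u[n:]                                    # x = s - t

# Toy usage
rng = np.random.default_rng(5)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100); x_true[:5] = 1.0
y = A @ x_true
x_hat = gpsr_like(A, y, tau=0.05)
```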
3.3.9 Sparse Reconstruction by Separable Approximation (SpaRSA)

The SpaRSA algorithm is a framework that aims to minimize the sum of smooth convex and nonconvex functions [38]. This algorithm can be viewed as an accelerated iterative shrinkage/thresholding method that seeks to alleviate the computational complexity of problems with the form of Eq. (3.11) by tailoring it into a sequence of optimization subproblems

x^{(k+1)} ∈ argmin_w ( w − x^{(k)} )ᵀ ∇_x f(x^{(k)}) + (α^{(k+1)}/2) ||w − x^{(k)}||₂² + τg(w),   (3.50)

where f : Rⁿ → R is an arbitrary convex function, g(w) = ||w||₁, and α ∈ R₊. Here, computing ∇f(x^{(k)}) and solving Eq. (3.50) is less expensive than solving Eq. (3.11) by other means, e.g., GPSR. An equivalent form of Eq. (3.50), which holds the identical structure of the TwIST/GPSR optimization problems, is given by

x^{(k+1)} ∈ argmin_w ½||w − z^{(k)}||₂² + (τ/α^{(k)}) ||w||₁,   (3.51)

where z^{(k)} = x^{(k)} − (1/α^{(k+1)})∇_x f(x^{(k+1)}) and the variable α^{(k)} is chosen using the Barzilai–Borwein spectral approach with α^{(k)} = ||AᵀA(x^{(k)} − x^{(k−1)})||₂² / ||A(x^{(k)} − x^{(k−1)})||₂². Note that, for the special case of g(w) ≡ 0, the solution of Eq. (3.51) is x^{(k+1)} = z^{(k)} = x^{(k)} − (1/α^{(k+1)})∇_x f(x^{(k)}). SpaRSA provides a competitive approach against the fastest state-of-the-art algorithms for solving the standard l1 − l2 case. In the scenario that f(x) = ||y − Ax||₂², the optimization problem in Eq. (3.51) is solved following the same splitting approach used for the GPSR algorithm (see Eq. (3.45)). This splitting is defined as w = s − t with u = [sᵀ, tᵀ]ᵀ, where s_i = (w_i)₊, t_i = (−w_i)₊, and (·)₊ : R → R represents the positive-part operator with (w_i)₊ = max{0, w_i}. Then, Eq. (3.51) can be rewritten as

minimize_u  bᵀu + ½uᵀCu
subject to  u ≥ 0,   (3.52)

where b = (τ/α^{(k)})1_{2n} + [−(z^{(k)})ᵀ, (z^{(k)})ᵀ]ᵀ is the auxiliary vector with b ∈ R^{2n×1}, 1_{2n} ∈ R^{2n×1} is a vector of ones, and C = [(I, −I)ᵀ; (−I, I)ᵀ]ᵀ with C ∈ R^{2n×2n} and I ∈ Rⁿˣⁿ as an identity matrix. Similar to its predecessor, the GPSR, the SpaRSA algorithm has been used in a variety of image-processing applications, for example, the compressed sensing of streaming data, where a sparse, time-varying signal is estimated from a set of compressed measurements by solving a weighted l1-norm minimization problem.
3.3.10 Alternating Direction Method of Multipliers (ADMM)

The ADMM is an algorithm developed for distributed convex optimization that follows a divide-and-conquer approach where a global problem is split into smaller subproblems [21, 57]. ADMM aims to combine the advantages of dual decomposition and of the AL methods for constrained optimization. Starting from Eq. (3.7), the ADMM approach solves problems of the following form

minimize   f(x) + g(z)
subject to Cx + Bz = b,   (3.53)

where {f, g} are convex functions, C ∈ R^{m×n}, B ∈ R^{m×l}, x ∈ Rⁿ, z ∈ R^l, and b ∈ R^m. The AL for Eq. (3.53) is defined as

L(x, z, ν) = f(x) + g(z) + νᵀ(Cx + Bz − b) + (ρ/2)||Cx + Bz − b||₂²,   (3.54)

which can be split into three optimization problems as follows:

x^{(k+1)} ← argmin_x  f(x) + (ν^{(k)})ᵀ(Cx + Bz^{(k)} − b) + (ρ/2)||Cx + Bz^{(k)} − b||₂²,   (3.55)

z^{(k+1)} ← argmin_z  g(z) + (ν^{(k)})ᵀ(Cx^{(k+1)} + Bz − b) + (ρ/2)||Cx^{(k+1)} + Bz − b||₂²,   (3.56)

and

ν^{(k+1)} ← ν^{(k)} + ρ( Cx^{(k+1)} + Bz^{(k+1)} − b ),   (3.57)

where ν ∈ R^{m×1} is the dual variable and ρ > 0 is the penalty parameter. Note that the variables x and z are updated alternately, accounting for the term alternating direction. Having defined the core structure of the ADMM algorithm, the following subsections discuss two popular alternatives of the ADMM in the image/signal processing field.

Scaled Form ADMM  A popular alternative in CS applications is obtained using the ADMM scaled form by rewriting Eq. (3.54) as

L(x, z, w) = f(x) + g(z) + (ρ/2)||Cx + Bz − b + w||₂² − (ρ/2)||w||₂²,   (3.58)

where w = (1/ρ)ν and using the equality νᵀr + (ρ/2)||r||₂² = (ρ/2)||r + w||₂² − (ρ/2)||w||₂² with r = Cx + Bz − b. Then, setting B = −I and b = 0 (i.e., z = Cx), Eqs. (3.55–3.57) can be written as

x^{(k+1)} ← argmin_x  f(x) + (ρ/2)||Cx − z^{(k)} + w^{(k)}||₂²,   (3.59)

z^{(k+1)} ← argmin_z  g(z) + (ρ/2)||Cx^{(k+1)} − z + w^{(k)}||₂²,   (3.60)

and

w^{(k+1)} ← w^{(k)} + ρ( Cx^{(k+1)} − z^{(k+1)} ).   (3.61)

ADMM algorithms offer a flexible framework for incorporating many types of convex objective functions and constraints.

Plug-and-Play ADMM  The ADMM's modular structure is one of its main features, which allows one to plug in any off-the-shelf image-denoising algorithm as a solver for the subproblems [58]. This methodology is named PnP-ADMM and has been reported for various imaging applications such as super-resolution [59], electron microscopy [60], single-photon imaging [61], diffraction tomography [62], Gaussian denoising [63], and hyperspectral sharpening [64]. The subproblem in Eq. (3.60) can be rewritten as a denoising problem by setting C = I and ρ = 1/σ², resulting in

z^{(k+1)} = argmin_z  g(z) + (1/(2σ²)) ||z − z̃^{(k)}||₂²,   (3.62)

where z̃^{(k)} = x^{(k)} + w^{(k)} represents the noisy image degraded by Gaussian noise with standard deviation σ [65]. The subproblem in Eq. (3.62) can be solved by

z = D_σ( z̃^{(k)} ) = D_σ( x^{(k)} + w^{(k)} ),   (3.63)

where D_σ is a denoiser. Note that PnP-ADMM algorithms support any denoiser that fulfills some restrictive conditions [66, 67], such as being nonexpansive and having a symmetric Jacobian. One of the more relevant works on the PnP-ADMM methodology for the image processing community is the work of Danielyan et al. [68], where the block-matching and 3D filtering (BM3D) [69] algorithm was used as a prior. Inspired by the approach mentioned above, many works have explored incorporating deep-learning models as denoisers in PnP-ADMM algorithms, i.e., solving Eq. (3.63) via a deep-learning algorithm. For example, the PnP-ADMM algorithm has been employed in synthetic aperture applications for the despeckling of Rayleigh- and Poisson-distributed speckles [70]. This application links the PnP-ADMM with either a total variation or a BM3D denoiser.
Conclusion
Convex optimization is a crucial building block for the imaging community that has sparked the emergence of many new research directions. This chapter is meant for the researcher, scientist, or engineer interested in venturing into the basic
concepts of computational mathematics for image recovery ranging from image denoising to compressive imaging. Some basic concepts of convex functions, along with some examples, were briefly studied. Then, the evolution of the convex optimization-based image reconstruction algorithms was shown by analyzing seven traditional algorithms ranging from the BP to the ADMM approach. Moreover, for each algorithm, an example of an application is included showing that if a problem can be formulated as a convex optimization problem, it can be solved efficiently. In most cases, these algorithms enjoy the advantages of theoretical analysis and strong convergence. Due to these advantages, their structures have synergistically incorporated into the deep learning methods to improve their interpretability, e.g., ISTA-net [71], FISTA-net [72], and ADMMbased net [73]. Acknowledgments This chapter was supported by the Sistema general de regalías Colombia under Grant BPIN 2020000100415, with UIS code 8933.
References 1. S. Boyd and L. Vandenberghe, Convex optimization, Cambridge: Cambridge university press, 2004. 2. A. Dempster, N. Laird and D. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–22, 1977. 3. D. Goldfarb and A. U. Idnani, “A numerically stable dual method for solving strictly convex quadratic programs,” Mathematical Programming, vol. 27, pp. 1–33, 1983. 4. M. P. Friedlander and D. Orban, “A primal-dual regularized interior-point method for convex quadratic programs,” Mathematical Programming Computation, vol. 4, pp. 71–107, 2012. 5. P. Gill and E. Wong, “Methods for convex and general quadratic programming,” Mathematical programming computation, vol. 7, no. 1, pp. 71–112, 2015. 6. E. Candès and M. Wakin, “An introduction to compressive sampling,” IEEE signal processing magazine, vol. 25, no. 2, pp. 21–30, 2008. 7. J. Bioucas-Dias and M. Figueiredo, “A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Transactions on Image processing, vol. 16, no. 12, pp. 2992–3004, 2007. 8. H. Jiang, S. Fels and J. Little, “A linear programming approach for multiple object tracking,” in IEEE Con-
ference on Computer Vision and Pattern Recognition, Minneapolis, 2007. 9. O. Mangasarian and E. Wild, “Multiple instance classification via successive linear programming,” Journal of optimization theory and applications, vol. 137, no. 3, pp. 555–568, 2008. 10. P. Luathep, A. Sumalee, W. Lam, Z. Li and H. Lo, “Global optimization method for mixed transportation network design problem: a mixed-integer linear programming approach,” Transportation Research Part B: Methodological, vol. 45, no. 5, pp. 808–827, 2011. 11. J. Barzilai and J. Borwein, “Two-point step size gradient methods,” IMA journal of numerical analysis, vol. 8, no. 1, pp. 141–148, 1988. 12. A. Beck, “First-Order Methods in Optimization,” SIAM, pp. 1–487, 2017. 13. A. Beck, “Introduction to nonlinear optimization: Theory, algorithms, and applications with MATLAB,” Society for Industrial and Applied Mathematics, pp. 1–294, 2014. 14. L. Bottou, “Large-scale machine learning with stochastic gradient descent,” Proceedings of COMPSTAT’2010, pp. 177–186, 2010. 15. Y. Chen, L. Su and J. Xu, “Distributed statistical machine learning in adversarial settings: Byzantine gradient descent,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 1, no. 2, pp. 1–25, 2017. 16. G. James, D. Witten, T. Hastie and R. Tibshirani, An introduction to statistical learning, New York: Springer, 2013. 17. P. Wolfe, “Convergence conditions for ascent methods,” SIAM review, vol. 11, no. 2, pp. 226–235, 1969. 18. Y. Dai, “On the nonmonotone line search,” Journal of Optimization Theory and Applications, vol. 112, no. 2, pp. 315–330, 2002. 19. D. Bertsekas, Constrained optimization and Lagrange multiplier methods, Academic press, 2014, pp. 1–46. 20. E. Birgin and J. Martínez, Practical augmented Lagrangian methods for constrained optimization, Society for Industrial and Applied Mathematics, 2014. 21. S. Boyd, N. Parikh, E. Chu, B. Peleato and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine learning, vol. 3, no. 1, pp. 1–122, 2011. 22. N. Parikh and S. Boyd, “Proximal algorithms,” Foundations and trends in Optimization, vol. 1, no. 3, pp. 127–239, 2014. 23. A. Bruckstein, D. Donoho and M. Elad, “From sparse solutions of systems of equations to sparse modeling of signals and images,” SIAM review, vol. 51, no. 1, pp. 34–81, 2009. 24. P. Gill, A. Wang and A. Molnar, “The in-crowd algorithm for fast basis pursuit denoising,” IEEE Transactions on Signal Processing, vol. 59, no. 10, pp. 4595–4605, 2011.
51 25. S. Chen, D. Donoho and M. Saunders, “Atomic decomposition by basis pursuit,” SIAM review, vol. 43, no. 1, pp. 129–159, 2001. 26. J. Bioucas-Dias, “Bayesian wavelet-based image deconvolution: A GEM algorithm exploiting a class of heavy-tailed priors,” IEEE Transactions on Image Processing, vol. 15, no. 4, pp. 937–951, 2006. 27. I. Daubechies, M. Defrise and C. De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, vol. 57, no. 11, pp. 1413–1457, 2004. 28. M. Elad, “Why simple shrinkage is still relevant for redundant representations?,” IEEE transactions on information theory, vol. 52, no. 12, pp. 5559–5569, 2006. 29. M. Figueiredo and R. Nowak, “An EM algorithm for wavelet-based image restoration,” IEEE Transactions on Image Processing, vol. 12, no. 8, pp. 906–916, 2003. 30. D. Donoho and J. Johnstone, “ Ideal spatial adaptation by wavelet shrinkage,” Biometrika, vol. 81, no. 3, pp. 425–455, 1994. 31. J. Starck, M. Elad and D. Donoho, “Image decomposition via the combination of sparse representations and a variational approach,” IEEE transactions on image processing, vol. 14, no. 10, pp. 1570–1582, 2005. 32. J. Provost and F. Lesage, “The application of compressed sensing for photo-acoustic tomography,” IEEE transactions on medical imaging, vol. 28, no. 4, pp. 585–594, 2008. 33. J. Prakash, A. Raju, C. Shaw, M. Pramanik and P. Yalavarthy, “ Basis pursuit deconvolution for improving model-based reconstructed images in photoacoustic tomography,” Biomedical optics express, vol. 5, no. 5, pp. 1363–1377, 2014. 34. L. Karlovitz, “Construction of nearest points in the l_p, p even and l_∞ norms,” Journal of Approximation Theory, vol. 3, no. 2, pp. 123–127, 1970. 35. B. Rao, K. Engan, S. Cotter, J. Palmer and K. KreutzDelgado, “Subset selection in noise based on diversity measure minimization,” IEEE transactions on Signal processing, vol. 51, no. 3, pp. 760–770, 2003. 36. M. Elad, Sparse and redundant representations: from theory to applications in signal and image processing (Vol. 2, No. 1, pp. 1094–1097), New York: Springer, 2010. 37. A. Beck and M. Teboulle, “A fast iterative shrinkagethresholding algorithm for linear inverse problems,” SIAM journal on imaging sciences, vol. 2, no. 1, pp. 183–202, 2009. 38. S. Wright, R. Nowak and M. Figueiredo, “Sparse reconstruction by separable approximation,” IEEE Transactions on signal processing, vol. 57, no. 7, pp. 2479–2493, 2009. 39. M. Figueiredo, R. Nowak and S. Wright, “Gradient projection for sparse reconstruction: Application
52
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
52.
53.
H. Arguello and M. Marquez to compressed sensing and other inverse problems,” IEEE Journal of selected topics in signal processing, vol. 1, no. 4, pp. 586–597, 2007. M. Figueiredo and R. Nowak, “An EM algorithm for wavelet-based image restoration,” IEEE Transactions on Image Processing, vol. 12, no. 8, pp. 906–916, 2003. Y. Nesterov, “A method for solving the convex programming problem with convergence rate $ Obigl (frac {1}{kˆ 2} bigr) $,” Dokl. Akad. Nauk SSSR, vol. 269, pp. 543–547, 1983. M. Zibetti, E. Helou, R. Regatte and G. Herman, “Monotone FISTA with variable acceleration for compressed sensing magnetic resonance imaging,” IEEE transactions on computational imaging, vol. 5, no. 1, pp. 109–119, 2018. S. Pejoski, V. Kafedziski and D. Gleich, “Compressed sensing MRI using discrete nonseparable shearlet transform and FISTA,” IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1566–1570, 2015. O. Jaspan, R. Fleysher and M. Lipton, “Compressed sensing MRI: a review of the clinical literature,” The British journal of radiology, vol. 88, no. 1056, p. 20150487, 2015. M. Hong, Y. Yu, H. Wang, F. Liu and S. Crozier, “Compressed sensing MRI with singular value decomposition-based sparsity basis,” Physics in Medicine & Biology, vol. 56, no. 19, p. 6311, 2011. J. Zhang, D. Zhao and W. Gao, “Group-based sparse representation for image restoration,” IEEE transactions on image processing, vol. 23, no. 8, pp. 3336– 3351, 2014. J. Huang, L. Guo, Q. Feng, W. Chen and Y. Feng, “Sparsity-promoting orthogonal dictionary updating for image reconstruction from highly undersampled magnetic resonance data,” Physics in Medicine & Biology, vol. 60, no. 14, p. 5359, 2015. R. Aster, B. Borchers and C. Thurber, “Parameter estimation and inverse problems,” Elsevier, pp. 1– 301, 2018. L. Bottou, F. Curtis and J. Nocedal, “Optimization methods for large-scale machine learning,” Siam Review, vol. 60, no. 2, pp. 223–311, 2018. J. Fan, F. Han and H. Liu, “Challenges of big data analysis,” National science review, vol. 1, no. 2, pp. 293–314, 2014. J. Liang, P. Wang, L. Zhu and L. Wang, “Single-shot stereo-polarimetric compressed ultrafast photography for light-speed observation of high-dimensional optical transients with picosecond resolution,” Nature communications, vol. 11, no. 1, pp. 1–10, 2020. Y. Lai, R. Shang, C. Côté, X. Liu, A. Laramée, F. Légaré, G. Luke and J. Liang, “Compressed ultrafast tomographic imaging by passive spatiotemporal projections,” Optics letters, vol. 46, no. 7, pp. 1788– 1791, 2021. J. Barzilai and J. Borwein, “Two-point step size gradient methods,” IMA journal of numerical analysis, vol. 8, no. 1, pp. 141–148, 1988.
54. S. Kim, K. Koh, M. Lustig, S. Boyd and D. Gorinevsky, “An interior-point method for large-scale $\ell_1 $-regularized least squares,” IEEE journal of selected topics in signal processing, vol. 1, no. 4, pp. 606–617, 2007. 55. H. Arguello, H. Rueda, Y. Wu, D. Prather and G. Arce, “Higher-order computational model for coded aperture spectral imaging,” Applied optics, vol. 52, no. 10, pp. 12–21, 2013. 56. H. Arguello and G. Arce, “Code aperture optimization for spectrally agile compressive imaging,” JOSA A, vol. 28, no. 11, pp. 2400–2413, 2011. 57. D. Han and X. Yuan, “A note on the alternating direction method of multipliers,” Journal of Optimization Theory and Applications, vol. 155, no. 1, pp. 227– 238, 2012. 58. S. Venkatakrishnan, C. Bouman and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” Proc. IEEE Global Conference on Signal and Information Processing, p. 945–948, 2013. 59. W. Dong, P. Wang, W. Yin, G. Shi, F. Wu and X. Lu, “Denoising prior driven deep neural network for image restoration,” IEEE transactions on pattern analysis and machine intelligence, vol. 41, no. 10, pp. 2305–2318, 2018. 60. S. V. S. V. Sreehari, K. L. Bouman, J. P. Simmons, L. F. Drummy and C. A. Bouman, “Multiresolution data fusion for super-resolution electron microscopy,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, 2017. 61. S. H. Chan, X. Wang and O. A. Elgendy, “Plugand-play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 84–98, 2017. 62. Y. Sun, B. Wohlberg and U. Kamilov, “An online plug-and-play algorithm for regularized image reconstruction,” IEEE Transactions on Computational Imaging, vol. 5, no. 3, pp. 395–408, 2019. 63. G. Buzzard, S. Chan, S. Sreehari and C. Bouman, “Plug-and-play unplugged: Optimization-free reconstruction using consensus equilibrium,” SIAM Journal on Imaging Sciences, vol. 11, no. 3, pp. 20012020, 2018. 64. A. Teodoro, J. Bioucas-Dias and M. Figueiredo, “Scene-adapted plug-and-play algorithm with convergence guarantees,” 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6. 65. S. Chan, X. Wang and O. Elgendy, “Plug-and-play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 84–98, 2016. 66. C. Metzler, A. Maleki and R. Baraniuk, “From denoising to compressed sensing,” IEEE Transactions on Information Theory, vol. 62, no. 9, pp. 5117–5144, 2016. 67. S. Sreehari, S. Venkatakrishnan, B. Wohlberg, G. Buzzard, L. Drummy, J. Simmons and C. Bouman,
3 Convex Optimization for Image Reconstruction “Plug-and-play priors for bright field electron tomography and sparse interpolation,” IEEE Transactions on Computational Imaging, vol. 2, no. 4, pp. 408–423, 2016. 68. A. Danielyan, V. Katkovnik and K. Egiazarian, “BM3D frames and variational image deblurring,” IEEE Trans. Image Process., vol. 21, no. 4, p. 1715– 1728, 2012. 69. K. Dabov, A. Foi, V. Katkovnik and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Transactions on image processing, vol. 16, no. 8, pp. 2080–2095, 2007. 70. S. Baraha and A. Sahoo, “SAR image despeckling using plug-and-play ADMM,” IET Radar, Sonar & Navigation, vol. 14, no. 9, pp. 1297–1309, 2020.
53 71. J. Zhang and B. Ghanem, “ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing,” Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1828–1837, 2018. 72. J. Xiang, Y. Dong and Y. Yang, “FISTA-net: Learning a fast iterative shrinkage thresholding network for inverse problems in imaging,” IEEE Transactions on Medical Imaging, vol. 40, no. 5, pp. 1329–1339, 2021. 73. M. Marquez, Y. Lai, X. Liu, C. Jiang, S. Zhang, H. Arguello and J. Liang, “Deep-Learning Supervised Snapshot Compressive Imaging Enabled by an Endto-End Adaptive Neural Network.,” IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 4, pp. 688–699, 2022.
4 Machine Learning in Coded Optical Imaging
Weihang Zhang and Jinli Suo
Abstract
In this chapter, we will go through the theory and application of machine learning in coded optical imaging, where the fundamental task is to solve the inverse problem that infers the desired visual data from coded measurement(s). With the rapid development of machine learning, different decoding approaches have been proposed to advance the progress of coded optical imaging. Taking snapshot compressive imaging as a representative, we briefly introduce the three widely used schemes: the conventional optimization framework, which imposes various natural visual priors to retrieve the target data in an iterative manner; the plug-and-play scheme, which incorporates deep image priors into an optimization framework; and end-to-end deep neural networks, which apply emerging network backbones for high-quality and fast decoding. All these approaches have their own pros and cons, and all keep developing for higher performance, as well as for addressing limitations in other aspects, such as efficiency, flexibility, scalability, robustness, etc. We hope this review can sketch the algorithm development on snapshot compressive imaging and inspire new research in other coded optical imaging approaches.
W. Zhang · J. Suo (o) Department of Automation, Tsinghua University, Beijing, China e-mail: [email protected]; [email protected]
Keywords
Machine learning · Coded aperture · Compressive sensing · Snapshot compressive imaging · Convex optimization · Deep learning · Plug-and-play · Deep unfolding · End-to-end · Hyper-spectral imaging · Computational imaging · Total variation · Sparse prior
4.1 Introduction
In coded optical imaging, the target visual data is recorded in an encoded manner and needs to be decoded algorithmically. Such an "optical encoding + computational decoding" scheme can achieve imaging capabilities beyond those of conventional cameras, but the decoding is often an ill-posed inversion problem, and the algorithm design is quite challenging. Machine learning theories and techniques play a great role in developing high-performance decoding algorithms. Since computational decoding is coupled with optical encoding, there is a wide variety of decoding algorithms serving diverse coded imaging setups. In spite of this large diversity, all these approaches share a common framework of decoding the target high-dimensional data from the low-dimensional sensor signal and are thus similar in multiple aspects:
• The inference should conform to the forward imaging model and be performed under physical constraints from the setup;
• The decoding takes a single or a few measurement(s) as input and is usually ill-posed, so one needs to impose priors for reliable reconstruction;
• Most natural visual data are redundant, and some general priors can be of interest to different imaging tasks;
• The efficiency and robustness are as important as quality in real applications;
• As machine learning has been developing rapidly, a close combination with the most recent machine learning techniques is beneficial for the progress of coded optical imaging.
In the past decades, researchers have been applying continuously progressive machine learning technologies for effective decoding, from conventional approaches to the emerging deep neural networks, and it is infeasible to exhaustively list them in this chapter. Considering that snapshot compressive imaging (SCI) is a representative of coded optical imaging and provides a benchmark for most algorithms in coded imaging, we take it as an example to show the application of machine learning technologies in decoding algorithms.
Background Information
Machine learning builds algorithms that learn from data or experience to improve performance on some tasks. In the field of coded optical imaging, visual data are captured in an implicit way and need to be inferred algorithmically. The application of machine learning in coded optical imaging has lasted for decades, including both conventional optimization and emerging deep learning, and keeps pushing forward the progress in this field.
4.2 Mathematical Model
SCI of high-dimensional visual information (e.g., a video or a hyper-spectral volume) shares a unified mathematical model, which encodes redundant visual information into a snapshot and solves the inverse problem to retrieve the target data. Taking 3D target data as an example, we can denote the coded measurement as Y ∈ R^{H×W}, which encodes a 3D data cube X ∈ R^{H×W×N} with a modulation mask M ∈ R^{H×W×N}

Y = \sum_{k=1}^{N} M_k \odot X_k + E,    (4.1)

where k indexes the slices of the target data and coding mask (or sensing matrix), E ∈ R^{H×W} is sensor noise, and \odot denotes the Hadamard (element-wise) product. Equation (4.1) can also be expressed in vector form as

y = Hx + e,    (4.2)

where the target information x ∈ R^{HWN} is written as

x = vec(X) = [vec(X_1)^T, \ldots, vec(X_N)^T]^T,    (4.3)

with H ∈ R^{HW×HWN} being the sensing matrix defined as the concatenation of calibrated coding masks

H = [diag(vec(M_1)), \ldots, diag(vec(M_N))].    (4.4)

After superimposing the sensor noise e = vec(E) ∈ R^{HW}, the final coded measurement is y = vec(Y) ∈ R^{HW}. Notably, despite its seemingly large scale, H is highly sparse since each column has only one non-zero element, which facilitates developing high-efficiency reconstruction algorithms.
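As a quick, hedged illustration of Equations (4.1)–(4.4), the following Python sketch simulates a coded snapshot from a small data cube and checks that the mask form and the vectorized form give the same measurement. The sizes, binary masks, and noise level are illustrative assumptions, not settings from any particular system.

```python
import numpy as np
from scipy.sparse import diags, hstack

# Illustrative sizes; real systems are much larger.
Hd, Wd, N = 32, 32, 8
rng = np.random.default_rng(0)

X = rng.random((Hd, Wd, N))                        # 3D target data cube
M = (rng.random((Hd, Wd, N)) > 0.5).astype(float)  # binary coding masks
E = 0.01 * rng.standard_normal((Hd, Wd))           # sensor noise

# Eq. (4.1): Y = sum_k M_k (element-wise) X_k + E
Y = (M * X).sum(axis=2) + E

# Eqs. (4.2)-(4.4): y = Hx + e with H = [diag(vec(M_1)), ..., diag(vec(M_N))]
def vec(A):
    return A.reshape(-1, order="F")                # column-major vectorization

Hmat = hstack([diags(vec(M[:, :, k])) for k in range(N)])
x = np.concatenate([vec(X[:, :, k]) for k in range(N)])
y = Hmat @ x + vec(E)

assert np.allclose(y, vec(Y))                      # both forms agree; H is extremely sparse
```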
Tips
Given an SCI setup, one should obtain the coding masks to fit the imaging model. In video SCI, masks can be obtained by capturing the whiteboard sequentially, while in spectral SCI, narrow-band spectral filters are applied to obtain the corresponding masks. Moreover, the masks should be calibrated to compensate for system imperfections and match the physical imaging process, including flat field correction, dark current subtraction, etc.
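As a minimal, hedged sketch of the calibration mentioned above (assuming the whiteboard, dark-frame, and mask captures are already registered; the function and variable names are hypothetical):

```python
import numpy as np

def calibrate_mask(mask_capture, white_capture, dark_capture):
    """Flat-field and dark-current correction of one captured coding mask.
    Returns the normalized transmission in [0, 1]; real pipelines typically
    add registration, denoising, and bad-pixel handling on top of this."""
    numerator = mask_capture - dark_capture
    denominator = np.clip(white_capture - dark_capture, 1e-6, None)
    return np.clip(numerator / denominator, 0.0, 1.0)
```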
4.3 Conventional Optimization for Coded Optical Imaging: Experience Is the Best Teacher

Since the imaging model in Equations (4.1) and (4.2) is under-determined, some additional constraints are required to retrieve the target signal. Optimization methods are commonly utilized to cope with such ill-posed inverse problems by introducing various priors

\hat{x} = \arg\min_{x} \frac{1}{2}\|y - Hx\|_2^2 + \lambda R(x),    (4.5)

where the first term is the data term fitting the imaging model and R(x) is a regularization term, with λ being a parameter balancing the two terms. Under such an optimization framework, researchers have proposed different solvers and defined various priors in the past decades.

4.3.1 Solvers

For the optimization problem defined in Equation (4.5), the data term is often differentiable, while the regularization term serving as a penalty is usually not (such as the l_1 norm). There had been no suitable algorithm for this optimization problem until 2011, when Boyd et al. proposed a general and intuitive algorithm named the alternating direction method of multipliers (ADMM) [1], which provides a general framework with linear equality constraints for solving the dual optimization. Specifically, ADMM introduces an auxiliary variable v to convert Equation (4.5) into

(\hat{x}, \hat{v}) = \arg\min_{x, v} \frac{1}{2}\|y - Hx\|_2^2 + \lambda R(v), \quad \text{subject to } x = v.    (4.6)

Then the problem can be decomposed into the following three sub-problems [2]:

x^{(k+1)} = \arg\min_{x} \frac{1}{2}\|y - Hx\|_2^2 + \frac{\rho}{2}\left\|x - \left(v^{(k)} - \frac{1}{\rho}u^{(k)}\right)\right\|_2^2,    (4.7)

v^{(k+1)} = \arg\min_{v} \lambda R(v) + \frac{\rho}{2}\left\|v - \left(x^{(k+1)} + \frac{1}{\rho}u^{(k)}\right)\right\|_2^2,    (4.8)

u^{(k+1)} = u^{(k)} + \rho\left(x^{(k+1)} - v^{(k+1)}\right),    (4.9)

in which Eqs. (4.7) and (4.8) update x and v, respectively, Equation (4.9) is the dual updating process with "learning rate" ρ, and the superscript k indexes the iteration. Here the quadratic optimization problem in Equation (4.7) can be solved in closed form (see [3] for details) by

x^{(k+1)} = \left(v^{(k)} - \frac{u^{(k)}}{\rho}\right) + H^T\left[\frac{y_1 - \left[H\left(v^{(k)} - \frac{u^{(k)}}{\rho}\right)\right]_1}{\rho + r_1}, \ldots, \frac{y_n - \left[H\left(v^{(k)} - \frac{u^{(k)}}{\rho}\right)\right]_n}{\rho + r_n}\right]^T,    (4.10)

where r_1, ..., r_n are defined in HH^T = diag{r_1, ..., r_n}, with n = H × W. Equation (4.8) is the denoising process of v, where the definition
of prior plays an important role and will be discussed in the next section.
Another algorithm named generalized alternating projection (GAP) [4] shares the same optimization objective (4.6) [5] with ADMM but is more efficient in the iteration process. The high efficiency is mainly attributed to the fact that GAP has only two steps, i.e., (v − u/ρ) in ADMM is substituted by v and ρ = 0 [3], so in GAP we have

x^{(k+1)} = v^{(k)} + H^T\left(HH^T\right)^{-1}\left(y - Hv^{(k)}\right).    (4.11)

Consequently, ADMM and GAP are equivalent in the ideal case without noise, and ADMM outperforms GAP when noise does exist [3, 6]. Besides, under the above frameworks, a lot of solvers have also been developed for incorporating different priors, including deep neural networks [7].
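Because H is a concatenation of diagonal masks, HH^T is diagonal, so the projections in Eqs. (4.10) and (4.11) reduce to element-wise operations. The following hedged sketch implements both updates with the operators kept in mask form; the array shapes and names (Phi for the mask cube, Phi_sum for diag(HH^T)) are assumptions for illustration.

```python
import numpy as np

def gap_x_update(v, y, Phi, Phi_sum):
    """GAP projection, Eq. (4.11): x = v + H^T (H H^T)^{-1} (y - H v).
    Phi: (H, W, N) mask cube; v: (H, W, N); y: (H, W); Phi_sum = sum_k Phi_k**2."""
    residual = (y - (Phi * v).sum(axis=2)) / (Phi_sum + 1e-8)
    return v + Phi * residual[:, :, None]

def admm_x_update(v, u, y, Phi, Phi_sum, rho):
    """Closed-form ADMM x-update, Eq. (4.10), using the same diagonal structure."""
    vb = v - u / rho
    residual = (y - (Phi * vb).sum(axis=2)) / (rho + Phi_sum)
    return vb + Phi * residual[:, :, None]
```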
4.3.2 Widely Used Priors
Natural visual data is statistically redundant, and a prior properly describing this redundancy can be utilized to regularize the ill-posed linear system in Equation (4.5) and to solve Equation (4.8). Here we briefly show several widely used ones.

4.3.2.1 Total Variation (TV)
The basic idea of total variation regularization is to reject incorrect solutions with excessive high-frequency components (such as spikes, noise, etc.) that violate the statistical distribution of natural visual data, by minimizing the sum of the gradient amplitudes (total variation) [8]. Empirically, TV minimization penalizes the smaller gradients and thus tends to achieve piecewise-smooth results. TV is generally calculated as the l_1 norm, i.e., \|TV(v)\| = \|Dv\|_1, where D is the differential operator. Incorporating the TV term into the GAP optimization framework, Yuan et al. [3] proposed a classical algorithm named GAP-TV and achieved decent performance on real data. After initializing z^{(0)} = 0, the algorithm alternately updates x, v, and z. The target signal x is updated following Equation (4.11), and v is calculated by solving

v^{(k)} = \arg\min_{v} \frac{1}{2}\|x^{(k)} - v\|_2^2 + \lambda\|Dv\|_1,    (4.12)

by iterative clipping [9, 10] as

v^{(k)} = x^{(k)} - D^T z^{(k)},    (4.13)

z^{(k+1)} = \mathrm{clip}\left(z^{(k)} + \frac{1}{\alpha} Dv^{(k)}, \frac{\lambda}{2}\right),    (4.14)

where the parameter α ≥ maxeig(DD^T), and the clipping function is defined as

\mathrm{clip}(b, T) := \begin{cases} b, & \text{if } |b| \le T, \\ T\,\mathrm{sign}(b), & \text{otherwise}. \end{cases}    (4.15)

Another well-known algorithm solving Equation (4.5) is the Two-step Iterative Shrinkage/Thresholding (TwIST) algorithm [11]. The iteration workflow is as follows:

x^{(1)} = r_\lambda(x^{(0)}),    (4.16)

x^{(k+1)} = (1-\alpha)x^{(k-1)} + (\alpha-\beta)x^{(k)} + \beta\, r_\lambda(x^{(k)}), \quad \text{for } k \ge 1,    (4.17)

and

r_\lambda(x) = w_\lambda\left(x + H^T(y - Hx)\right),    (4.18)

w_\lambda(x) = \arg\min_{u} \frac{1}{2}\|x - u\|_2^2 + \lambda\,\mathrm{TV}(u).    (4.19)

Recall that Equation (4.19) is similar to Equation (4.12), and there also exist other solutions, such as [12], for such a general optimization objective. For the detailed derivations and the parameter selection of α and β, please refer to [11].

Program Code
The code for GAP-TV algorithm can be found in https://github.com/liuyang12/ 3DCS/tree/master/algorithms. The code for TwIST algorithm provided by Bioucas et al. is in http://www.lx.it.pt/~bioucas/ code/TwIST_v2.zip.
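For concreteness, here is a compact, hedged GAP-TV-style loop in Python. It substitutes a Chambolle-type TV denoiser from scikit-image for the iterative clipping of Eqs. (4.13)–(4.15); the mask-form operators, iteration count, and TV weight are illustrative assumptions.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def gap_tv(y, Phi, iters=60, tv_weight=0.1):
    """Alternate the GAP projection of Eq. (4.11) with a TV denoising step
    that plays the role of Eq. (4.12). Phi is the (H, W, N) mask cube."""
    Phi_sum = (Phi ** 2).sum(axis=2)
    Phi_sum[Phi_sum == 0] = 1.0
    v = Phi * (y / Phi_sum)[:, :, None]                # x0 = H^T (H H^T)^{-1} y
    for _ in range(iters):
        residual = (y - (Phi * v).sum(axis=2)) / Phi_sum
        x = v + Phi * residual[:, :, None]             # projection onto {x : Hx = y}
        v = denoise_tv_chambolle(x, weight=tv_weight)  # TV regularization step
    return v
```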
4.3.2.2 Gaussian Mixture Model (GMM)
The GMM prior focuses on the redundancy of local blocks (or patches) and assumes that the 3D blocks x_m are drawn from a mixture of Gaussian distributions with K mixture components

x_m \sim \sum_{k=1}^{K} \lambda_k \mathcal{N}(x_m \mid \mu_k, \Sigma_k),    (4.20)

where μ_k, Σ_k, and λ_k are the mean, covariance matrix, and weight of the kth component, respectively [13]. Given a set of high-dimensional target visual data, these parameters can be estimated by the expectation-maximization algorithm [14]. Then, assuming the system noise is Gaussian, i.e., e_m ∼ N(e_m | 0, R), we have the coded measurement

y_m \mid x_m \sim \mathcal{N}(y_m \mid H_m x_m, R),    (4.21)

with H_m being the corresponding sensing matrix. Based on the GMM prior, the target data can be inferred from the Bayes rule as

p(x_m \mid y_m) = \sum_{k=1}^{K} \tilde{\lambda}_{mk}\, \mathcal{N}(x_m \mid \tilde{\mu}_{mk}, \tilde{\Sigma}_{mk}),    (4.22)

where

\tilde{\lambda}_{mk} = \frac{\lambda_k \mathcal{N}(y_m \mid H_m \mu_k, R + H_m \Sigma_k H_m^T)}{\sum_{l=1}^{K} \lambda_l \mathcal{N}(y_m \mid H_m \mu_l, R + H_m \Sigma_l H_m^T)},    (4.23)

\tilde{\Sigma}_{mk} = \left(H_m^T R^{-1} H_m + \Sigma_k^{-1}\right)^{-1}, \quad \tilde{\mu}_{mk} = \tilde{\Sigma}_{mk}\left(H_m^T R^{-1} y_m + \Sigma_k^{-1} \mu_k\right),    (4.24)

and N(y_m | H_m μ_k, R + H_m Σ_k H_m^T) in Equation (4.23) denotes the probability of y_m with mean H_m μ_k and covariance matrix R + H_m Σ_k H_m^T [13, 15, 16]. Most calculations in algorithms incorporating the GMM prior are efficient except for the matrix inversion [17]. Besides, utilizing the GMM prior requires a data set for parameter estimation and noise level calibration.

4.3.2.3 Sparsity
Typically, the sparsity prior imposes sparsity on the representation coefficients in a transform domain. Specifically, the multidimensional data x can be represented by

x = w\theta,    (4.25)

where w is the basis set in the transform domain and θ is the coefficient vector [18], most entries of which are close to zero. Mathematically, l_1 regularization leads to a sparse solution and is thus widely used for imposing sparse priors. The transform domain in which natural visual data has a sparse representation can be defined over local blocks, and one can minimize the l_1 norm of the coefficient vector corresponding to each block in parallel [19, 20]. Using i to index the local blocks, we can decode the data cube via optimizing their block-wise coefficients

\hat{\theta}_i = \arg\min_{\theta} \|y_i - H_i w \theta_i\|_2^2 + \lambda \sum_{i} \|\theta_i\|_1,    (4.26)

where y_i and H_i are the measurement and the sensing matrix [17]. The above reconstruction can be extended to nonlocal sparse representations [21], which jointly optimize all the blocks as

\hat{\theta} = \arg\min_{\theta} \|y - Hw\theta\|_2^2 + \lambda \sum_{i} \|\theta_i\|_1,    (4.27)

in which θ is the concatenation of the θ_i, and the reconstructed data is then derived by Equation (4.25). In each iteration, the reconstructed blocks are clustered to learn cluster-specific sub-dictionaries as components of w. As for the transform domain and the corresponding optimization, different variants have been proposed, including gradient projection [22], wavelet-based regularization [23, 24], group sparsity-based regularization [21], overcomplete dictionaries [19, 25], Shearlet [26, 27], etc.
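As a small, hedged illustration of how such l_1-regularized sub-problems are typically handled, the following sketch runs plain ISTA (soft thresholding) on one block; the matrix A (standing in for H_i w), the regularization weight, and the iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista_block(y, A, lam=0.01, iters=200):
    """Minimize 0.5 * ||y - A theta||_2^2 + lam * ||theta||_1 for one block,
    in the spirit of Eq. (4.26), with a fixed step 1/L (L = Lipschitz constant)."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)
    theta = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ theta - y)
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta
```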
4.3.2.4 Nonlocal Low-Rank Prior
Besides sparsity, another prior exploiting the redundancy among similar patches is to force the matrix composed of these nonlocal similar patches to be low rank [28], which can be utilized to regularize the ill-posed data retrieval problem [29]. As for low-rank matrix approximation, nuclear norm minimization (NNM) [30] is widely employed to decompress SCI. One typical example is DeSCI [5], which achieves state-of-the-art performance under the optimization-based framework in Sect. 4.3.1. Specifically, for each target patch z_i of size √d × √d, DeSCI searches the top M similar patches from spatial and temporal (spectral) neighbors within a surrounding window and concatenates their vectorized formats into a group

Z_i = [z_{i,1}, z_{i,2}, \ldots, z_{i,M}] \in \mathbb{R}^{d \times M},    (4.28)

where z_{i,j} denotes the jth similar patch of z_i. Then the regularization in Equation (4.5) is presented by the sum of the weighted nuclear norms of all groups, i.e.,

\hat{x} = \arg\min_{x} \frac{1}{2}\|y - Hx\|_2^2 + \lambda \sum_{i} \|Z_i\|_{w,*},    (4.29)

where \|Z_i\|_{w,*} denotes the weighted nuclear norm [29] as

\|Z_i\|_{w,*} = \sum_{j=1}^{\min\{d, M\}} w_j \sigma_j,    (4.30)

and σ_j is the jth singular value of Z_i. By using the ADMM solution in Sect. 4.3.1, one can introduce the auxiliary variable θ, and the problem in Equation (4.29) turns to

\hat{x} = \arg\min_{x} \frac{1}{2}\|y - H\theta\|_2^2 + \lambda \sum_{i} \|Z_i\|_{w,*}.    (4.31)

The derivation of the solution of this optimization is analogous to Eqs. (4.7), (4.8), and (4.9); please refer to the original paper for more details.

Program Code

The code for the DeSCI algorithm can be found in https://github.com/liuyang12/DeSCI.

Generally, the decoding based on conventional learning methods is of low efficiency. Most algorithms in this regime, using different priors and different solvers, generally contain two iterative steps [17, 31], i.e., updating by gradient descent and then projecting onto the signal domain. GMM performs shallow learning on the available training data and requires no iteration, but the matrix inversion still needs considerable computation. Although being time consuming, such methods are highly interpretable and widely used.
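Looking back at the weighted nuclear norm in Sect. 4.3.2.4, the core computational primitive is a weighted singular-value thresholding of each patch group; the following hedged sketch shows that step in isolation, with an illustrative (not DeSCI's) weighting rule.

```python
import numpy as np

def weighted_svt(Z, weights):
    """Weighted singular-value thresholding of a patch group Z (d x M):
    shrink each singular value by its weight and reconstruct, a sketch of
    the proximal step behind Eq. (4.30)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_shrunk = np.maximum(s - weights, 0.0)
    return (U * s_shrunk) @ Vt

# Example: larger shrinkage on smaller singular values (illustrative weights)
Z = np.random.default_rng(0).standard_normal((64, 20))
_, s, _ = np.linalg.svd(Z, full_matrices=False)
Z_lowrank = weighted_svt(Z, 1.0 / (s + 1e-3))
```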
4.4 Plug-and-Play (PnP) Reconstruction: Extending Handcrafted Priors to Deep Priors
It is reasonable to regard the reconstruction residue as a kind of noise, so we can use a denoiser to regularize the inferred visual data. As more and more deep learning methods have achieved excellent performance in image denoising, one solution is to use a denoising deep neural network in updating the latent visual data .v in Equation (4.8) of Sect. 4.3.1. To this end, ADMM-Net proposes to unfold the iterative optimization into sequential stages and use a learnable convolution layer to compensate for the reconstruction artifacts in each stage instead of a soft or hard thresholding function [32–34]. The parameters in each convolution layer are independent of the sensing matrix and not shared among layers, so the whole data flow graph can be trained in an end-to-end (E2E) manner shown in Fig. 4.1, which makes training difficult at a large number of stages. Later, researchers proposed similar networks incorporating deep spatial, spectral [34,35], and Gaussian scale mixture [36] prior into GAP framework.
Fig. 4.1 The framework and pattern for each patch in each iteration of deep tensor ADMM-Net [33]. (Reprinted by permission from IEEE: J. Ma, X.Y. Liu, Z. Shou, X. Yuan, Deep tensor ADMM-Net for snapshot compressive
imaging, in Proceedings of the IEEE/CVF International Conference on Computer Vision (2019), pp. 10223– 10232, ©2019 IEEE)
The deep unfolded networks [33–39] provide a feasible way of integrating deep learning with conventional optimization, but the network training is quite resource demanding. In contrast, the recently proposed plug-and-play (PnP) method uses a pre-trained denoising network to update the auxiliary variable of ADMM [40, 41] and soundly bridges the gap between conventional machine learning and deep learning. Specifically, PnP only employs a pre-trained network to solve Equation (4.8), and thus the sub-problem becomes

v^{(k+1)} = \mathcal{D}_\sigma\left(x^{(k+1)} + \frac{1}{\rho}u^{(k)}\right),    (4.32)

where D_σ denotes the adopted denoising network [6]. It can be proven that denoising networks can also be integrated in the GAP framework, which is actually a specific case of ADMM [6]. Theoretically, any deep image denoiser is applicable here. Among them, the fast and flexible denoising convolutional neural network (FFDNet) is a representative network widely used in PnP-based methods [42]. The fast deep video denoising network (FastDVDnet) is another model designed for video denoising, which makes full use of the temporal coherence between neighboring frames [43]. Furthermore, one can also jointly incorporate the deep image prior with handcrafted ones such as TV [2]. By using deep denoisers to contribute their higher performance within the ADMM/GAP framework, PnP methods combine speed, accuracy, and flexibility well, serving as an easy start-up approach for coded imaging and other imaging systems [44, 45]. In addition, after incorporating the deep network, the required number of iterations is significantly reduced.
Program Code
The code for deep tensor ADMM-Net can be found in https://github.com/Phoenix-V/ tensor-admm-net-sci. The code for PnP algorithm for SCI can be found in https:// github.com/liuyang12/PnP-SCI.
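A minimal, hedged PnP-GAP sketch below shows how a denoiser is plugged in as the update of Eq. (4.32) in place of a handcrafted proximal step. The denoiser(x, sigma) interface, mask-form operators, and iteration count are assumptions; in practice, a wrapper around a pre-trained model such as FFDNet or FastDVDnet would be passed in.

```python
import numpy as np

def pnp_gap(y, Phi, denoiser, iters=30, sigma=0.05):
    """Data-consistency projection (Eq. 4.11) alternated with a plugged-in denoiser."""
    Phi_sum = (Phi ** 2).sum(axis=2)
    Phi_sum[Phi_sum == 0] = 1.0
    v = Phi * (y / Phi_sum)[:, :, None]
    for _ in range(iters):
        residual = (y - (Phi * v).sum(axis=2)) / Phi_sum
        x = v + Phi * residual[:, :, None]       # enforce consistency with the measurement
        v = denoiser(x, sigma)                    # deep (or handcrafted) prior as denoising
    return v

# The plug-in interface only: an identity "denoiser" makes the loop run end to end.
# v_hat = pnp_gap(y, Phi, denoiser=lambda x, s: x)
```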
4.5 End-to-End Deep Neural Network: Bringing the Wings of Quality and Speed to Reconstruction
Under the framework of conventional optimization, the algorithms often have to trade off between time consumption and reconstruction accuracy, and the parameter setting is empirical. An E2E neural network can take both efficiency and performance into account [2, 6, 7, 46–48] and has been used in various inverse problems in coded optical imaging [49–51]. Specifically, an E2E network generally takes the coded measurement(s) (and optionally the encoding masks) as input and aims to retrieve the expected multidimensional target signal. Most E2E decoding networks require a large amount of training data, i.e., coded measurements and the true target visual data, for parameter learning. To address this issue, some untrained networks have also emerged. Here we list some examples using E2E networks for coded video and hyper-spectral data acquisition.
4.5.1 E2E on SCI Video Reconstruction
Various E2E models have been proposed for video SCI [2, 46, 47, 52–58]. A typical E2E network is composed of a series of fully connected layers and ReLU functions [46], and the loss function is

L(\theta) = \frac{1}{N}\sum_{i}^{N} \|f(y_i; \theta) - x_i\|_2^2,    (4.33)
where y_i and x_i are the patch-wise measurement and ground truth, respectively, N is the number of patches extracted from the input frame, θ is the set of learnable parameters, and f(y_i; θ) is the output of the network on the input y_i. Besides, fully connected layers have an enormous number of parameters. Therefore, the U-net structure is experimentally a more suitable backbone due to its natural encoder-decoder structure with skip connections [17]. A video-rate E2E convolutional neural network (E2E-CNN) has been proposed with the encoder–decoder architecture [2] (see Fig. 4.2), including five residual blocks [59] in both the encoder and the decoder as well as corresponding skip connections. Note that the input is H^T(HH^T)^{-1}y, sharing the same scale
with the desired signal. Since no iteration is required, the well-trained E2E-CNN achieves fast reconstruction with reliable quality. The above method does not significantly exploit the temporal correlation between consecutive video frames and is of limited performance. To this end, researchers introduce recurrent neural networks (RNNs) to model temporal redundancy in a recursive manner. Moreover, as traditional CNNs only have a local receptive field, self-attention mechanism can be used to utilize the nonlocal similarity of the natural scenes to improve the performance further via capturing long-range features [60]. Leveraging benefits from both RNN and the self-attention, the bidirectional recurrent neural network with adversarial training (BIRNAT), shown in Fig. 4.3 achieves better performance [47]. Adversarial training [61] is introduced to further improve the quality by regarding the ground truth as the “real” sample and the reconstructed frame the “false.” A similar idea is also employed in the large-scale time-lapse coded imaging on electron microscopy [53]. The E2E manner can also be used for optimizing the coding and decoding jointly, to achieve a decent coded imaging system. An example is to search for the best exposure pattern for reconstruction under the physical constraints of coded imaging setup [52, 54]. Program Code
The code for BIRNAT can be found in https://github.com/BoChenGroup/ BIRNAT.
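To make the E2E objective of Eq. (4.33) concrete, here is a hedged PyTorch sketch with a deliberately tiny stand-in decoder; the architecture, tensor shapes, and hyperparameters are illustrative assumptions and are not those of E2E-CNN or BIRNAT.

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """A toy decoder mapping a measurement plus its masks to N frames."""
    def __init__(self, n_frames):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + n_frames, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_frames, 3, padding=1),
        )

    def forward(self, y, masks):                 # y: (B,1,H,W), masks: (B,N,H,W)
        return self.net(torch.cat([y, masks], dim=1))

def train_step(model, optimizer, y, masks, x_gt):
    """One supervised step with the MSE objective of Eq. (4.33)."""
    optimizer.zero_grad()
    loss = ((model(y, masks) - x_gt) ** 2).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```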
4.5.2 E2E on Hyper-spectral Reconstruction
E2E deep learning is also widely used in decoding spectral information from coded measurements [62–73], which shares a similar spirit with the methods for video reconstruction.
Fig. 4.2 The structure of E2E-CNN [2]. (Reprinted from M. Qiao, Z. Meng, J. Ma, X. Yuan, Deep learning for video compressive sensing, APL Photonics 5(3), 030801
(2020), with the permission of AIP Publishing, ©2020 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license)
Fig. 4.3 The workflow and structure of BIRNAT [47]. (Reprinted by permission from Springer Nature: Cheng, R. Lu, Z. Wang, H. Zhang, B. Chen, Z. Meng, X. Yuan, BIRNAT: Bidirectional recurrent neural networks with ad-
versarial training for video snapshot compressive imaging, in European Conference on Computer Vision (Springer, 2020), pp. 258–275, ©2020 Springer Nature Switzerland AG)
For example, a network adapted from the very deep super-resolution (VDSR) network [74] is used to enhance the spectral resolution and is capable of recovering the hyper-spectral data cube from its coded snapshot, dubbed the hyper-spectral CNN (HS-CNN) [62]. For better performance, λ-Net integrates a GAN and self-attention in the U-net structure [64]. Utilizing the self-attention mechanism for better decoding, various extended methods have been proposed. Among them, the spatial-spectral self-attention network (TSA-Net) shown in Fig. 4.4 performs spatial and spectral attention, respectively, to model the region-based and spectral channel-based correlations [68].
Inspired by the self-attention mechanism on sampled tokens in a transformer, a transformerbased method named coarse-to-fine sparse transformer (CST) is proposed to perform self-attention on the regions with related spatial information, where the sparsity map of the scene is previously estimated to filter regions [73]. Hence, the network consists of two steps, the coarse patch selecting and the fine pixel clustering, as shown in Fig. 4.5. There are also networks conducting model learning in the frequency domain for better detail reconstruction [71].
Fig. 4.4 The spatial-spectral self-attention mechanism and the architecture of TSA-Net [68]. (Reprinted by permission from Springer Nature: Z. Meng, J. Ma, X. Yuan, End-to-end low cost compressive spectral imaging with
spatial-spectral self-attention, in European Conference on Computer Vision (Springer, 2020), pp. 187–204, ©2020 Springer Nature Switzerland AG)
Fig. 4.5 The structure of CST (a) and the proposed spectra-aware hashing attention block (SAHAB) (b) under the mechanism of SHA-MSA (c) [73]. (Reprinted by permission from Springer Nature: J. Lin, Y. Cai, X. Hu, H. Wang, X. Yuan, Y. Zhang, R. Timofte, L. Van
Gool, Coarse-to-fine sparse transformer for hyper-spectral image reconstruction, in European Conference on Computer Vision (Springer, 2022), pp. 686–704, ©2022 The Author(s), under exclusive license to Springer Nature Switzerland AG)
Program Code
The code for CST can be found in https:// github.com/caiyuanhao1998/MST, where the code for .λ-Net, ADMM-Net, TSA-Net, GAP-Net [35], DGSMP [36], BIRNAT, HDNet and MST [72] are also included.
4.5.3 Challenges in E2E Network Training and Solutions
Despite its high accuracy and fast inference, the E2E-based approach is limited in two aspects:
• A large amount of training data is crucial for the performance of the deep neural network. However, real high-dimensional data are difficult to capture, especially for dynamic non-repetitive/non-periodic scenes.
• The training process is quite time consuming for two reasons: on one hand, the E2E model has to be limited to a local region, since it cannot handle large pixel numbers due to limited memory and computing resources; on the other hand, one has to learn a bunch of local models that are spatially varying due to non-uniform coding masks caused by imperfect registration and optical aberrations.
To handle the problem of lacking training data, several untrained networks have been proposed. For example, the deep image prior (DIP) [75] is employed in hyper-spectral coded imaging by optimizing

(\hat{x}, \hat{o}) = \arg\min_{x, o} \frac{1}{2}\|y - Hx\|_2^2 + \lambda R(x) + \frac{\rho}{2}\|y - H T_o(e)\|_2^2, \quad \text{subject to } x = T_o(e).    (4.34)

Here, x, H, and y are, respectively, the target data, the coding matrix, and the measurement; T_o(e) is the output of a neural network with random input e and parameter set o [70]. By decomposing the
problem with the ADMM, the sub-problem of o aims to learn the network parameters according to the measurement, thus serving as the loss function of the self-supervised network. A similar idea is employed in high-speed microscopy imaging [58].
To decode large-scale coded measurements, a memory-efficient network for video SCI named RevSCI-net is proposed [56], which borrows the idea from the reversible neural network (Rev-Net) [76] to introduce the nonlinear mapping, in addition to the commonly used CNN-based feature extraction and reconstruction, as shown in Fig. 4.6a and b. The Rev-Net performs the back propagation in each layer using the activations derived from the next layer. Specifically, the coarse features extracted by the CNN are split into m parts, and the activations can be computed by a chain rule from the activations in the next layer, as shown in Fig. 4.6c and d. Therefore, only the last activation in the nonlinear mapping step needs to be stored, which significantly saves the required memory.
Another approach for processing large-scale data is to avoid training a large number of networks from scratch. Borrowing from the meta-learning strategy, the MetaSCI method [57] decomposes the large-scale data into blocks with varying coding patterns and learns their corresponding decoding networks in a coarse-to-fine manner. Specifically, a base model trained on a general dataset with various coding masks serves as the backbone, and then fast adaptation is conducted to fit the data of different blocks. For faster and better convergence, a spatially periodic mask is easier to adapt and thus more favored.
Program Code
The code for the self-supervised network on hyper-spectral imaging can be found in https://github.com/mengziyi64/CASSISelf-Supervised. The code for RevSCInet can be found in https://github.com/ BoChenGroup/RevSCI-net. The code for MetaSCI can be found in https://github. com/xyvirtualgroup/MetaSCI-CVPR2021.
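A hedged sketch of the self-supervised objective behind Eq. (4.34) is given below: the untrained network output is pushed through the forward model and compared with the single measurement, with an optional quadratic penalty standing in (as a simplification) for the ADMM coupling to x. The network interface and shapes are assumptions.

```python
import torch

def dip_loss(net, e, y, Phi, rho=1.0, x=None):
    """Self-supervised loss for an untrained (DIP-style) reconstruction.
    net(e) -> (N, H, W) estimate; Phi: (N, H, W) masks; y: (H, W) measurement."""
    out = net(e)                                  # T_o(e)
    y_hat = (Phi * out).sum(dim=0)                # forward model applied to the output
    loss = 0.5 * rho * ((y - y_hat) ** 2).sum()   # measurement-consistency term
    if x is not None:
        loss = loss + 0.5 * ((x - out) ** 2).sum()  # simplified coupling to the ADMM variable
    return loss
```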
Fig. 4.6 The forward (a) and the reverse (b) computations of the original Rev-Net [76], and the forward (c) and the reverse (d) processes of RevSCI-net [56]. F and G are arbitrary functions. (Reprinted by permission from IEEE: Z. Cheng, B. Chen, G. Liu, H. Zhang, R. Lu, Z. Wang, X. Yuan, Memory-efficient network for large-scale video compressive sensing, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), pp. 16246–16255, ©2021 IEEE)
4.6 Some Exemplary Applications
As a classic topic in coded optical imaging, the setup designs and reconstruction algorithms are both inspiring for other coded imaging setups. Here we show some exemplary applications sharing the same spirit with SCI. Reconstruction from multiple coded measurements Imaging systems taking multiple coded measurements with different coding patterns can generally decode finer information than from a single snapshot. In hyper-spectral imaging, tens of measurements by pseudo-randomly sampling the full transmission map of the system are proven to be capable of retrieving hundreds of spectral channels at full resolution by a U-Net structure [77]. The untrained network using DIP can also be used to learn parameters from the set of measurements with different sensing matrices [78]. Lensless coded imaging Lensless imaging provides an option to avoid constraints from lens design and fabrication, using diffractive elements or coded aperture to encode the target objects
located at a specific distance onto the sensor. Then, one can apply machine learning-based decoding methods in the subsequent decoding. Notably, a parallel lensless system with a coded aperture is proven to be beneficial for accelerating the reconstruction by feeding the image blocks into a deep CNN [79]. Quantitative phase retrieval is another important topic in lensless imaging, as the transformation from the phase φ to the measured intensity I can be directly expressed by a forward propagation operator H. Conversely, the decoding can be expressed as the following inverse problem:

\hat{\phi} = \arg\min_{\phi} \|I - H\phi\|_2^2 + \lambda R(\phi),    (4.35)
in which the previously introduced methods are applicable, e.g., an E2E U-Net-based network with residual blocks [80].
Transient photography Transient optical imaging provides a new perspective for understanding physical and biological phenomena. Although the currently available sensors are either of insufficient frame rate or limited to extremely low resolution, one can design optical imaging setups to record the transient dynamics in an encoded manner. Taking the compressed ultrafast photography
Fig. 4.7 The U-Net structure used in DeepCUP [88]. (Reprinted by permission from Optica Publishing Group: A. Zhang, J. Wu, J. Suo, L. Fang, H. Qiao, D.D.U. Li, S.
Zhang, J. Fan, D. Qi, Q. Dai, et al., Single-shot compressed ultrafast photography based on U-net network, Optics Express 28(26), 39299 (2020), ©The Optical Society)
(CUP) [81, 82] as an example, the measurement encodes the target scene by a digital micromirror device (DMD) and shears the transient event by a streak camera with a voltage ramp. Under the compressive sensing framework, the decoding takes a similar form to Equation (4.5) as

\hat{I} = \arg\min_{I} \frac{1}{2}\|E - OI\|_2^2 + \lambda R(I),    (4.36)

where E is the 2D measurement on the streak camera, O is the linear operator, and I is the desired 3D signal [81]. There are also some other variants for higher performance, e.g., lossless encoding (LLE) CUP introduces a supplementary measurement via a charge-coupled device (CCD) [83, 84], and compressed optical-streaking ultrahigh-speed photography (COSUP) performs optical streaking by integrating galvanometer scanning with the complementary metal-oxide-semiconductor (CMOS) camera [85]. The TwIST algorithm with TV regularization is employed on the very first version of CUP and LLE-CUP [86], achieving a rate of up to 10^{11} frames per second (fps), and on trillion-frame-per-second CUP (T-CUP) [87] with up to 10^{12} fps. Despite its universality in decoding compressed measurements, TwIST requires careful parameter selection for decent performance and is prone to system noise and aberration. To this end, deep compressive ultrafast photography (DeepCUP) proposes a U-Net-based E2E network in Fig. 4.7, with the loss function defined as

L = \mu_a \|\hat{I}(x, y, t) - I_{gt}(x, y, t)\| + \mu_b \|\mathrm{TSC}(\hat{I}) - \mathrm{TSC}(I_{gt})\| + \mu_c R(\hat{I}),    (4.37)

where \hat{I} is the prediction, I_{gt} is the ground truth, R(\hat{I}) is the regularization term, and TSC is the forward imaging model compressing the 3D signal to the 2D measurement [88]. DeepCUP serves as another example to demonstrate the better performance of deep learning methods on image recovery tasks over conventional machine learning ones.
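As a toy, hedged illustration of the linear operator O in Eq. (4.36) (spatial encoding by the DMD pattern, temporal shearing by the streak camera, and integration on the sensor), consider the sketch below; the shapes, unit shear step, and omission of the lossless CCD view are simplifying assumptions.

```python
import numpy as np

def cup_forward(I, code, shear_per_frame=1):
    """Toy CUP-style forward operator: encode, shear along one axis, integrate.
    I: (H, W, T) transient scene; code: (H, W) DMD pattern; returns the 2D streak image."""
    H, W, T = I.shape
    E = np.zeros((H + shear_per_frame * (T - 1), W))
    for t in range(T):
        coded = code * I[:, :, t]                 # spatial encoding
        shift = shear_per_frame * t               # temporal shearing
        E[shift:shift + H, :] += coded            # integration over time
    return E
```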
4.7 Summary and Discussions
The above sections introduce the mainstream algorithm designs serving the computational stage of the "optical encoding + computational decoding" scheme, the common architecture of coded optical imaging. The rich set of algorithms generally falls into three groups—conventional optimization, end-to-end deep neural networks, and a hybrid scheme that sits in between, which match well the progress of machine learning theories and technologies in the past decades.
Fig. 4.8 The frameworks of various categories of algorithms proposed for coded imaging [17]. (Reprinted by permission from IEEE: X. Yuan, D.J. Brady, A.K. Kat-
saggelos, Snapshot compressive imaging: Theory, algorithms, and applications, IEEE Signal Processing Magazine 38(2), 65 (2021), ©2021 IEEE)
A summary of the various categories of algorithms is shown in Fig. 4.8. All these algorithms target the same objective, imposing priors explicitly or implicitly to solve the ill-posed inversion problem, and each type of decoding strategy has its own advantages and disadvantages. Conventional optimization explicitly imposes priors and infers solutions following convex optimization theory, so it is of high interpretability and generally applies to different settings flexibly. The main shortcomings are the time-consuming inference and the dependence on empirical parameter settings. The E2E deep neural networks leave the prior definition and inference to the network design and training, which achieves high performance and fast inference, but at the expense of a high dependence on large training data and inflexibility to changes of system settings. The PnP methods can achieve a decent performance with a good balance between speed and flexibility, thanks to the combination of the two types of machine learning approaches. Overall, both conventional optimization and E2E networks can achieve state-of-the-art accuracy, but the former suffers from high complexity, while the latter lacks flexibility and requires large training data. One promising solution is the hybrid scheme, incorporating deep neural networks into a conventional optimization framework either with an iterative loop or with sequential unfolded stages, to leverage advantages from both regimes.
To address the issues in real applications, such as lacking training data, being sensitive to noise,
coping with large-scale data, etc., machine learning techniques can also be utilized, e.g., unsupervised machine learning, lightweight network design, meta-learning strategies, etc. One should choose algorithms based on the specific case. Although described based on SCI, the approaches in this chapter are generally applicable to other coded optical imaging setups. New algorithms also keep emerging with the development of the machine learning field. We expect and believe that, in the future, machine learning will continue to promote the development of coded optical imaging and witness its broader applications.
References 1. S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al., Foundations and Trends® in Machine learning 3(1), 1 (2011) 2. M. Qiao, Z. Meng, J. Ma, X. Yuan, Apl Photonics 5(3), 030801 (2020) 3. X. Yuan, in 2016 IEEE International Conference on Image Processing (ICIP) (IEEE, 2016), pp. 2539– 2543 4. X. Liao, H. Li, L. Carin, SIAM Journal on Imaging Sciences 7(2), 797 (2014) 5. Y. Liu, X. Yuan, J. Suo, D.J. Brady, Q. Dai, IEEE transactions on pattern analysis and machine intelligence 41(12), 2990 (2018) 6. X. Yuan, Y. Liu, J. Suo, Q. Dai, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), pp. 1447–1457 7. A. Lucas, M. Iliadis, R. Molina, A.K. Katsaggelos, IEEE Signal Processing Magazine 35(1), 20 (2018)
4 Machine Learning in Coded Optical Imaging 8. L.I. Rudin, S. Osher, E. Fatemi, Physica D: nonlinear phenomena 60(1–4), 259–268 (1992) 9. A. Beck, M. Teboulle, IEEE transactions on image processing 18(11), 2419 (2009) 10. M. Zhu, S.J. Wright, T.F. Chan, Computational Optimization and Applications 47(3), 377 (2010) 11. J.M. Bioucas-Dias, M.A. Figueiredo, IEEE Transactions on Image processing 16(12), 2992 (2007) 12. A. Chambolle, Journal of Mathematical imaging and vision 20(1), 89 (2004) 13. J. Yang, X. Yuan, X. Liao, P. Llull, D.J. Brady, G. Sapiro, L. Carin, IEEE Transactions on Image Processing 23(11), 4863 (2014) 14. A.P. Dempster, N.M. Laird, D.B. Rubin, Journal of the Royal Statistical Society: Series B (Methodological) 39(1), 1 (1977) 15. M. Chen, J. Silva, J. Paisley, C. Wang, D. Dunson, L. Carin, IEEE Transactions on Signal Processing 58(12), 6140 (2010) 16. J. Yang, X. Liao, X. Yuan, P. Llull, D.J. Brady, G. Sapiro, L. Carin, IEEE Transactions on Image Processing 24(1), 106 (2014) 17. X. Yuan, D.J. Brady, A.K. Katsaggelos, IEEE Signal Processing Magazine 38(2), 65 (2021) 18. W. Dong, L. Zhang, G. Shi, X. Li, IEEE transactions on Image Processing 22(4), 1620 (2012) 19. X. Yuan, T.H. Tsai, R. Zhu, P. Llull, D. Brady, L. Carin, IEEE Journal of selected topics in Signal Processing 9(6), 964 (2015) 20. F. Renna, L. Wang, X. Yuan, J. Yang, G. Reeves, R. Calderbank, L. Carin, M.R. Rodrigues, IEEE Transactions on Information Theory 62(11), 6459 (2016) 21. L. Wang, Z. Xiong, G. Shi, F. Wu, W. Zeng, IEEE transactions on pattern analysis and machine intelligence 39(10), 2104 (2016) 22. M.A. Figueiredo, R.D. Nowak, S.J. Wright, IEEE Journal of selected topics in signal processing 1(4), 586 (2007) 23. D. Reddy, A. Veeraraghavan, R. Chellappa, in CVPR 2011 (IEEE, 2011), pp. 329–336 24. X. Yuan, P. Llull, X. Liao, J. Yang, D.J. Brady, G. Sapiro, L. Carin, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014), pp. 3318–3325 25. Y. Hitomi, J. Gu, M. Gupta, T. Mitsunaga, S.K. Nayar, in 2011 International Conference on Computer Vision (IEEE, 2011), pp. 287–294 26. K. Guo, D. Labate, SIAM journal on mathematical analysis 39(1), 298 (2007) 27. P. Yang, L. Kong, X.Y. Liu, X. Yuan, G. Chen, IEEE Transactions on Image Processing 29, 6466 (2020) 28. Z. Zha, X. Yuan, B. Wen, J. Zhou, J. Zhang, C. Zhu, IEEE Transactions on Image Processing 29, 5094 (2020) 29. S. Gu, L. Zhang, W. Zuo, X. Feng, in Proceedings of the IEEE conference on computer vision and pattern recognition (2014), pp. 2862–2869
69 30. J.F. Cai, E.J. Candès, Z. Shen, SIAM Journal on optimization 20(4), 1956 (2010) 31. C.A. Metzler, A. Maleki, R.G. Baraniuk, IEEE Transactions on Information Theory 62(9), 5117 (2016) 32. J. Sun, H. Li, Z. Xu, et al., Advances in neural information processing systems 29 (2016) 33. J. Ma, X.Y. Liu, Z. Shou, X. Yuan, in Proceedings of the IEEE/CVF International Conference on Computer Vision (2019), pp. 10,223–10,232 34. L. Wang, C. Sun, Y. Fu, M.H. Kim, H. Huang, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019), pp. 8032– 8041 35. Z. Meng, S. Jalali, X. Yuan, arXiv preprint arXiv:2012.08364 (2020) 36. T. Huang, W. Dong, X. Yuan, J. Wu, G. Shi, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), pp. 16,216– 16,225 37. J.R. Hershey, J.L. Roux, F. Weninger, arXiv preprint arXiv:1409.2574 (2014) 38. Y. Fu, Z. Liang, S. You, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 2674 (2021) 39. Y. Cai, J. Lin, H. Wang, X. Yuan, H. Ding, Y. Zhang, R. Timofte, L. Van Gool, arXiv preprint arXiv:2205.10102 (2022) 40. S.H. Chan, X. Wang, O.A. Elgendy, IEEE Transactions on Computational Imaging 3(1), 84 (2016) 41. M. Qiao, X. Liu, X. Yuan, Optics letters 45(7), 1659 (2020) 42. K. Zhang, W. Zuo, L. Zhang, IEEE Transactions on Image Processing 27(9), 4608 (2018) 43. M. Tassano, J. Delon, T. Veit, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), pp. 1354–1363 44. U.S. Kamilov, H. Mansour, B. Wohlberg, IEEE Signal Processing Letters 24(12), 1872 (2017) 45. Z. Siming, L. Yang, M. Ziyi, Q. Mu, T. Zhishen, Y. Xiaoyu, H. Shensheng, Y. Xin, Photonics Research 9(2), B18 (2021) 46. M. Iliadis, L. Spinoulas, A.K. Katsaggelos, Digital Signal Processing 72, 9 (2018) 47. Z. Cheng, R. Lu, Z. Wang, H. Zhang, B. Chen, Z. Meng, X. Yuan, in European Conference on Computer Vision (Springer, 2020), pp. 258–275 48. G. Barbastathis, A. Ozcan, G. Situ, Optica 6(8), 921 (2019) 49. K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, A. Ashok, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 449–458 50. K.H. Jin, M.T. McCann, E. Froustey, M. Unser, IEEE Transactions on Image Processing 26(9), 4509 (2017) 51. A. Mousavi, R.G. Baraniuk, in 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP) (IEEE, 2017), pp. 2272–2276
70 52. M. Yoshida, A. Torii, M. Okutomi, K. Endo, Y. Sugiyama, R.i. Taniguchi, H. Nagahara, in Proceedings of the European Conference on Computer Vision (ECCV) (2018), pp. 634–649 53. S. Zheng, C. Wang, X. Yuan, H.L. Xin, Patterns 2(7), 100292 (2021) 54. M. Iliadis, L. Spinoulas, A.K. Katsaggelos, Digital Signal Processing 96, 102591 (2020) 55. K. Xu, F. Ren, in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (IEEE, 2018), pp. 1680–1688 56. Z. Cheng, B. Chen, G. Liu, H. Zhang, R. Lu, Z. Wang, X. Yuan, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), pp. 16,246–16,255 57. Z. Wang, H. Zhang, Z. Cheng, B. Chen, X. Yuan, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2021), pp. 2083–2092 58. M. Qiao, X. Liu, X. Yuan, Optics Letters 46(8), 1888 (2021) 59. K. He, X. Zhang, S. Ren, J. Sun, in Proceedings of the IEEE conference on computer vision and pattern recognition (2016), pp. 770–778 60. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Advances in neural information processing systems 30 (2017) 61. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Communications of the ACM 63(11), 139 (2020) 62. Z. Xiong, Z. Shi, H. Li, L. Wang, D. Liu, F. Wu, in Proceedings of the IEEE International Conference on Computer Vision Workshops (2017), pp. 518–525 63. D.S. Jeon, S.H. Baek, S. Yi, Q. Fu, X. Dun, W. Heidrich, M.H. Kim, ACM Trans. Graph. 38(4) (2019) 64. X. Miao, X. Yuan, Y. Pu, V. Athitsos, in Proceedings of the IEEE/CVF International Conference on Computer Vision (2019), pp. 4059–4069 65. X. Miao, X. Yuan, P. Wilford, in Digital Holography and Three-Dimensional Imaging (Optica Publishing Group, 2019), pp. M3B–3 66. L. Wang, C. Sun, M. Zhang, Y. Fu, H. Huang, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), pp. 1661– 1671 67. Z. Meng, M. Qiao, J. Ma, Z. Yu, K. Xu, X. Yuan, Optics Letters 45(14), 3897 (2020) 68. Z. Meng, J. Ma, X. Yuan, in European Conference on Computer Vision (Springer, 2020), pp. 187–204 69. Y. Fu, T. Zhang, L. Wang, H. Huang, Coded hyperspectral image reconstruction using deep external and internal learning. IEEE Transactions on Pattern
Analysis and Machine Intelligence 44(7), 3404–3420 (2021)
70. Z. Meng, Z. Yu, K. Xu, X. Yuan, in Proceedings of the IEEE/CVF International Conference on Computer Vision (2021), pp. 2622–2631
71. X. Hu, Y. Cai, J. Lin, H. Wang, X. Yuan, Y. Zhang, R. Timofte, L. Van Gool, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 17,542–17,551
72. Y. Cai, J. Lin, X. Hu, H. Wang, X. Yuan, Y. Zhang, R. Timofte, L. Van Gool, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 17,502–17,511
73. J. Lin, Y. Cai, X. Hu, H. Wang, X. Yuan, Y. Zhang, R. Timofte, L. Van Gool, arXiv preprint arXiv:2203.04845 (2022)
74. J. Kim, J.K. Lee, K.M. Lee, in Proceedings of the IEEE conference on computer vision and pattern recognition (2016), pp. 1646–1654
75. D. Ulyanov, A. Vedaldi, V. Lempitsky, in Proceedings of the IEEE conference on computer vision and pattern recognition (2018), pp. 9446–9454
76. A.N. Gomez, M. Ren, R. Urtasun, R.B. Grosse, Advances in neural information processing systems 30 (2017)
77. D. Gedalin, Y. Oiknine, A. Stern, Optics express 27(24), 35811 (2019)
78. D. Van Veen, A. Jalal, M. Soltanolkotabi, E. Price, S. Vishwanath, A.G. Dimakis, arXiv preprint arXiv:1806.06438 (2018)
79. X. Yuan, Y. Pu, Optics express 26(2), 1962 (2018)
80. A. Sinha, J. Lee, S. Li, G. Barbastathis, Optica 4(9), 1117 (2017)
81. L. Gao, J. Liang, C. Li, L.V. Wang, Nature 516(7529), 74 (2014)
82. J. Liang, in CLEO: Applications and Technology (Optical Society of America, 2020), pp. JTh1G–3
83. J. Liang, C. Ma, L. Zhu, Y. Chen, L. Gao, L.V. Wang, Science advances 3(1), e1601814 (2017)
84. J. Liang, C. Ma, L. Zhu, Y. Chen, L. Gao, L.V. Wang, in High-Speed Biomedical Imaging and Spectroscopy: Toward Big Data Instrumentation and Management II, vol. 10076 (SPIE, 2017), pp. 68–73
85. X. Liu, J. Liu, C. Jiang, F. Vetrone, J. Liang, Optics letters 44(6), 1387 (2019)
86. J. Liang, L.V. Wang, Optica 5(9), 1113 (2018)
87. J. Liang, L. Zhu, L.V. Wang, Light: Science & Applications 7(1), 1 (2018)
88. A. Zhang, J. Wu, J. Suo, L. Fang, H. Qiao, D.D.U. Li, S. Zhang, J. Fan, D. Qi, Q. Dai, et al., Optics Express 28(26), 39299 (2020)
Part II Coded Planar Imaging
5 Diffractive Optical Neural Networks
Minhan Lou and Weilu Gao
Abstract
Keywords
Replacing the conventional imaging optics in machine vision systems with diffractive optical neural networks (DONNs) that leverage spatial light modulation and optical diffraction has been promising for enabling new machine learning intelligence and functionality in the optical domain and for reducing computing energy and resource requirements in the electrical domain. In this chapter, the fundamental models to describe and design DONNs are first reviewed. Passive DONNs systems operating in the terahertz and short wavelength ranges are then introduced. Moreover, advanced architectures that are resilient to hardware imperfections, that demonstrate improved performance, and that are implemented in photonic integrated circuits are discussed. Furthermore, the implementations of system reconfigurability through hybrid optoelectronic approaches are described. In addition, the effects of physical model inaccuracy and how physics-aware training is used to correct deployment errors from both models and hardware are discussed. Finally, an all-optical reconfigurable DONNs system based on cascaded liquid-crystal spatial light modulators is demonstrated.
Forward and backward propagation models · Terahertz · Short wavelength · Multichannel · Multiplexing · Metamaterials · Photonic integrated circuits · Spatial light modulator · Hybrid optoelectronic reconfigurability · Model inaccuracy · Physics-aware training · All-optical reconfigurability
M. Lou · W. Gao (o) Department of Electrical and Computer Engineering, The University of Utah, Salt Lake City, UT, USA e-mail: [email protected]; [email protected]
5.1 Introduction
Machine learning (ML) algorithms have achieved unprecedented performance not only in imaging science and technology, including computer vision and autonomous driving [1–3], but also in broad scientific domains, such as the discovery of materials and molecules [4, 5] and chip and circuit design [6]. However, the execution of ML algorithms on hardware requires substantial computational and memory resources and consumes substantial energy. With the end of Dennard scaling and Moore's law, further reducing the energy consumption and increasing the integration density of electronic circuits for processing trillions of operations has hit a bottleneck [7, 8], which urgently calls for new high-throughput and energy-efficient hardware ML accelerators. Recently, optical architectures have been emerging as high-performance ML hardware accelerators by
leveraging fundamentally different particles, photons, to break the electronic bottleneck, thanks to the extreme parallelism afforded by the weak interaction and multiplexing of photons as well as their low static energy consumption [9]. In addition to demonstrations of two-dimensional (2D) integrated optical ML circuits [10–14], three-dimensional (3D) free-space optical systems exploit out-of-plane light routing and can host millions of compact active devices [15–28]. Among these demonstrations, diffractive optical neural networks (DONNs) can optically perform ML tasks through the spatial light modulation and optical diffraction of coherent light in multiple diffractive layers (Fig. 5.1a). Specifically, spatial light modulation on each diffractive layer regulates the amplitude and phase of the input electric field, and optical diffraction creates interconnects between diffractive layers, mimicking deep neural network architectures. In order to perform ML tasks, the amplitude and phase values in all diffractive layers are optimized so that coherent input images (e.g., Modified National Institute of Standards and Technology (MNIST) handwritten digit images) converge onto one of the predefined regions on a detector array that represent the labels in supervised learning (e.g., the digits). The classification criterion is thus to find the detector region with the largest signal. Figure 5.1b displays an illustration in which an input image of the handwritten digit 5 passes through multiple diffractive layers and is then focused onto the predefined region representing the number 5. Replacing the conventional imaging optics in machine vision systems with DONNs can reduce the computational complexity and resource requirements of back-end electronic processing units and can bring new functionality to low-resolution imaging systems [29]. This chapter reviews the fundamentals and current development of free-space DONNs and is organized as follows. Section 5.2 describes the mathematical models of DONNs and how the widely used backpropagation algorithm in ML is utilized for training and optimizing the parameters of diffractive layers. Section 5.3 discusses the pioneering hardware implementations of DONNs in the terahertz
(THz) wavelength range, which benefit from the easy fabrication of diffractive layers. Section 5.4 further describes DONNs in more accessible short wavelength ranges, where optical and optoelectronic components are mature. Section 5.5 summarizes a few advanced architectures with multiple hardware channels and optical multiplexing for high-performance and multifunctional DONNs. Section 5.6 discusses a few on-chip DONNs implemented in photonic integrated circuits in the telecommunication wavelength range. Section 5.7 describes reconfigurable hybrid DONNs with electronic systems. Section 5.8 discusses the discrepancies between physical models and hardware systems and how physics-aware training can mitigate them. Section 5.9 describes our latest demonstration of an all-optical reconfigurable DONNs system with physics-aware modeling and training and without any intermediate electrical-optical conversions. Section 5.10 summarizes this chapter.
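As noted in the introduction above, the classification criterion of a DONNs classifier is simply to pick the predefined detector region with the largest integrated signal. The short Python/NumPy sketch below makes this readout concrete; the region layout in the comment is a hypothetical example, not that of any cited demonstration.

```python
import numpy as np

def classify(intensity, regions):
    """intensity: (N, N) detector-plane image; regions: list of (row, col, size) boxes,
    one per class. Returns the index of the region with the largest integrated signal."""
    scores = [intensity[r:r + s, c:c + s].sum() for (r, c, s) in regions]
    return int(np.argmax(scores))

# Example: ten hypothetical 16x16 regions on a 200x200 detector plane
# regions = [(20 + 40 * (k // 5), 30 + 30 * (k % 5), 16) for k in range(10)]
```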
5.2 Fundamentals of DONNs
Figure 5.1c shows a detailed description of the building blocks in the DONNs system illustrated in Fig. 5.1a. Input images, which mathematically correspond to complex-valued tensors, are generated by shining coherent light. As input images propagate toward the detector array, which is called forward propagation, the input complex-valued tensors go through two fundamental operations. One is free-space diffractive propagation, and the other is the pixel-wise complex-valued multiplication when transmitting through diffractive layers. Both are included in the forward propagation model of a DONNs system as described in Sect. 5.2.1. The images obtained on the detector plane after forward propagation are generally different from the desired output images. A loss function, such as the mean squared error or cross entropy, is defined to quantify this difference. The spatial light modulation parameters on the diffractive layers are optimized to minimize the loss function, which is called backward propagation. Since the mathematical operations of
Fig. 5.1 Overview and the first demonstration of a THz DONNs system. (a) Schematic and (b) 3D illustration of a DONNs system. (c) Diagram of a DONNs model. (Adapted from Ref. [30]). (d) A THz DONNs system implemented with 3D printing
technology. ((b) and (d) are adapted by permission from Dr. Aydogan Ozcan group from University of California, Los Angeles, U.S.A.)
DONNs are naturally tensor operations, state-of-the-art ML frameworks and hardware, such as PyTorch and general-purpose graphics processing units (GPUs), can be leveraged to substantially accelerate the optimization process, which is described in Sect. 5.2.2.
5.2.1 Forward Propagation Model

The optical diffraction is described by a shift-invariant dyadic Green's function, which can be approximated by the Rayleigh-Sommerfeld diffraction equation

$$ w(\mathbf{r}) = \frac{z}{r^2}\left(\frac{1}{2\pi r} + \frac{n}{j\lambda}\right)\exp\!\left(\frac{j 2\pi n r}{\lambda}\right), \qquad (5.1) $$

with $\mathbf{r} = (x, y, z)$ and $r = \sqrt{x^2 + y^2 + z^2}$, where n and λ are the refractive index of the light propagation medium (n = 1 for air) and the wavelength, respectively. The diffraction field E_diff,m from the transmitted field E_m^t of the m-th diffractive layer on the surface A_m is

$$ E_{\mathrm{diff},m}(\mathbf{r}) = \iint_{A_m} w(\mathbf{r} - \mathbf{r}')\, E_m^t(\mathbf{r}')\, ds'. \qquad (5.2) $$

If the field is assumed to be uniform over each pixel with area S and the spatial sampling is adequate, the input field E_{i,m+1} of the i-th pixel on the (m + 1)-th layer from the diffraction of the transmitted field E_m^t on the m-th layer can be expressed as

$$ E_{i,m+1} = E_{\mathrm{diff},m}(\mathbf{r}_{i,m+1}) \approx \sum_k w(\mathbf{r}_{i,m+1} - \mathbf{r}_{k,m}) \iint_{A_{k,m}} E_m^t(\mathbf{r}'_{k,m})\, ds' = \sum_k S \cdot w(\mathbf{r}_{i,m+1} - \mathbf{r}_{k,m})\, E_{k,m}^t, \qquad (5.3) $$

where r_{i,m} is the position vector of the i-th pixel on the m-th layer. Regarding the spatial light modulation, the optical response of each pixel is assumed to be that from the periodic array of that pixel under uniform illumination, which is E_{i,m}^t = t_{i,m} E_{i,m}, with E_{i,m} = E_m(r_{i,m}), where t_{i,m} is the complex-valued transmission coefficient of the i-th pixel on the m-th layer. Combining the two operations, the forward propagation from layer m to layer m + 1 is

$$ E_{i,m+1} = \sum_k C_{i,k,m}\, E_{k,m}^t = \sum_k C_{i,k,m}\, t_{k,m}\, E_{k,m}, \quad \text{with } C_{i,k,m} = S \cdot w(\mathbf{r}_{i,m+1} - \mathbf{r}_{k,m}). \qquad (5.4) $$

The detectors on the layer D only capture the light intensity I_{i,D} = |E_{i,D}|². Since the diffraction equation (Eq. 5.2) is shift-invariant, the convolution theorem and the Fast Fourier Transform (FFT) can be utilized to substantially speed up the calculation of Eq. 5.4, which is the most computationally intensive part of DONNs models. The computational complexity of the direct summation in Eq. 5.4 for N × N pixels is O(N⁴), which is reduced to O(N² log N) through the FFT.

5.2.2 Backward Propagation Model

The training or optimization of diffractive layers for a fixed single pair of input and output images can be done with the Gerchberg-Saxton (GS) algorithm, which has been a standard approach for hologram design [31]. However, the iterative nature of the GS algorithm makes it difficult to modify for training with a large number of input images and optimization parameters, which is ubiquitous in ML tasks. The backpropagation algorithm, where the gradient propagates according to the chain rule and along the descent direction, has been a widely used and efficient ML training approach [32]. According to Eq. 5.4 and a loss function L, the gradients with respect to the field can be calculated recursively from the final layer to the first layer as

$$ \left.\frac{\partial \mathcal{L}}{\partial E_{k,m}}\right|_{m<D} = \sum_j \frac{\partial \mathcal{L}}{\partial E_{j,m+1}}\,\frac{\partial E_{j,m+1}}{\partial E_{k,m}}. \qquad (5.5) $$

The state-of-the-art PyTorch (version > 1.8) ML framework provides fast and efficient computation of the FFT and its gradient backpropagation and can substantially speed up the calculations of the forward and backward propagation models in DONNs.
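To make Eqs. (5.1)-(5.5) concrete, the sketch below implements the FFT-accelerated forward model and a gradient-based training step in PyTorch (the framework named above; a recent version with complex-tensor autograd is assumed). The numerical parameters, the phase-only layer parameterization, the unpadded circular convolution, and the helper region_sums() are illustrative assumptions, not the settings of any cited demonstration.

```python
import math
import torch

# Illustrative parameters: 200x200 pixels, 400-um pitch, 750-um wavelength,
# 3-cm layer spacing, n = 1 (free space).
N, pitch, lam, z = 200, 400e-6, 750e-6, 3e-2
S = pitch ** 2                                   # pixel area in Eq. (5.3)

# Rayleigh-Sommerfeld kernel w(r) of Eq. (5.1), sampled on the pixel grid
x = (torch.arange(N) - N // 2) * pitch
X, Y = torch.meshgrid(x, x, indexing="ij")
r = torch.sqrt(X ** 2 + Y ** 2 + z ** 2)
w = (z / r ** 2) * (1 / (2 * math.pi * r) + 1 / (1j * lam)) * torch.exp(1j * 2 * math.pi * r / lam)
H = torch.fft.fft2(torch.fft.ifftshift(S * w))   # transfer function of one free-space step

def propagate(E):
    """One diffraction step of Eq. (5.4), evaluated as an FFT-based (circular) convolution."""
    return torch.fft.ifft2(torch.fft.fft2(E) * H)

# Phase-only diffractive layers: t_{i,m} = exp(j * phi_{i,m})
phases = torch.nn.ParameterList([torch.nn.Parameter(torch.zeros(N, N)) for _ in range(5)])

def forward(E_in):
    E = propagate(E_in)                           # source plane -> first layer
    for phi in phases:
        E = propagate(E * torch.exp(1j * phi))    # modulation, then diffraction (Eq. 5.4)
    return E.abs() ** 2                           # detector intensity |E_{i,D}|^2

# Training sketch (Sect. 5.2.2): autograd supplies the recursion of Eq. (5.5).
opt = torch.optim.Adam(phases.parameters(), lr=1e-2)
# logits = region_sums(forward(E_batch))          # integrate over the detector regions
# loss = torch.nn.functional.cross_entropy(logits, labels)
# opt.zero_grad(); loss.backward(); opt.step()
```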
5.3 Terahertz-Wavelength DONNs
Mature 3D printing technologies have enabled precise and versatile manufacturing of diffractive optical components in long-wavelength ranges. With this capability, Lin et al. implemented the first hardware DONNs system in the terahertz (THz) wavelength range [15]. Figure 5.1d shows the fabricated five-layer THz DONNs system illuminated by a coherent THz source. Each diffractive layer consists of 200 × 200 pixels in an 8 × 8 cm² area. As described before, this system can perform image classification tasks on the MNIST and Fashion-MNIST datasets with good classification accuracies. Once the diffractive layers are fabricated, the inference of ML tasks using this system does not consume energy. In addition, such inference is performed in a single-clock-cycle forward propagation at the speed of light. Thus, the implemented THz DONNs system is an energy-efficient and high-throughput ML inference hardware accelerator. In each diffractive layer, each pixel is assumed to possess a phase-only transmission response, which is directly correlated with the pixel height through a simple relationship between the phase delay and the plane-wave propagation distance. By defining the loss function and employing the backpropagation algorithm described in Sect. 5.2, the optimized phase values of the pixels in the diffractive layers can
be obtained, and these diffractive layers can be fabricated with the corresponding pixel heights using 3D printers. The THz DONNs system is trained as an image classifier to perform automated classification of handwritten digits from 0 to 9 in the MNIST dataset. Input images are binary encoded by letting the input THz light pass through metal plates with the digits engraved as hollow apertures. The loss function is minimized to maximize the optical signal in the desired detector region for accurate classification. The calculated test classification accuracy is 91.75%, and the experimental accuracy on 50 selected images is 88%. Furthermore, deep architectures with more diffractive layers are shown to achieve better classification accuracies than those with fewer diffractive layers, suggesting that DONNs exhibit "depth" advantages despite the absence of nonlinear functions. However, the performance of DONNs is generally vulnerable to hardware imperfections. One strategy to construct imperfection-resilient DONNs is to incorporate the modeled imperfection into the training process. For example, the classification accuracies of THz DONNs are sensitive to the translation, scaling, and rotation of input images, which are inevitable in practical systems and machine vision applications. In order to construct DONNs systems that are resilient to these object transformations, Mengu et al. first quantify the sensitivity of DONNs to these uncertainties as shown in Fig. 5.2a and further develop a training strategy that incorporates the formulated input-object translation, scaling, and rotation as uniformly distributed random variables [33] (a training sketch is given at the end of this section). As a result, this training approach successfully guides the optimization of diffractive layers toward a shift-, scale-, and rotation-invariant solution. Furthermore, Mengu et al. model layer-to-layer misalignments as random variables and introduce them into the training process to train vaccinated DONNs that are robust against 3D misalignments, albeit at the cost of reduced classification accuracy [34]. Another strategy to improve the robustness of DONNs systems is to explore different architectures with reduced complexity. Li et al. propose a robust multiscale diffractive U-Net (MDUNet)
Fig. 5.2 Hardware imperfection-resilient DONNs. (a) Translation, scaling, and rotation of input images in DONNs. (Adapted by permission from American Chemical Society: ACS Photonics, Scale-, Shift-, and Rotation-Invariant Diffractive Optical Networks, Mengu, D. et al., © 2020). (b) Schematic of the MDUNet architecture and
downsampling and upsampling principles. (Adapted under the terms of the Optica Open Access Publishing Agreement from Optica Publishing Group: Optics Express, Multiscale diffractive U-Net: a robust all-optical deep learning framework modeled with sampling and skip connections, Li, Y. et al., © 2022)
DONNs framework by introducing sampling and skip connections [35]. Instead of having a high pixel resolution for each diffractive layer, the sampling processes, including downsampling and upsampling, change the pixel size and resolution as shown in Fig. 5.2b. The sampling module can greatly improve the system robustness by reducing the complexity of the diffractive layers. Furthermore, the skip connections can fuse the multiscale features from diffractive layers with various resolutions, which achieves performance similar to cascaded fully connected DONNs while using fewer model parameters.
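The "vaccinated" training mentioned above can be sketched as ordinary data augmentation applied before the optical forward model. The ranges below, and the helpers forward() and region_sums() (from the Sect. 5.2 sketch), are illustrative assumptions rather than the exact formulation of the cited works.

```python
# Training against input-object transformations (cf. Mengu et al. [33]):
# random translation, scaling, and rotation drawn from uniform ranges.
import torch
from torchvision.transforms import RandomAffine

augment = RandomAffine(degrees=10, translate=(0.1, 0.1), scale=(0.9, 1.1))

def vaccinated_step(imgs, labels, opt):
    """imgs: (B, 1, N, N) real input amplitudes; labels: (B,) class indices."""
    imgs = augment(imgs)                          # same random transform for the batch here
    E_in = imgs.squeeze(1).to(torch.complex64)    # coherent amplitude encoding
    logits = region_sums(forward(E_in))           # forward() as in the Sect. 5.2 sketch
    loss = torch.nn.functional.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```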
5.4 Short-Wavelength DONNs
The practical technologies for generating, manipulating, and detecting THz electromagnetic radiation are limited. This has been widely recognized as the THz gap, meaning that efficient and feasible THz components are challenging to implement using either electronic or optical approaches [36]. As a result, the implementation of such THz DONNs systems requires special, costly, and sophisticated equipment. Moreover, the vast majority of imaging applications lie at shorter wavelengths, including visible and near-infrared wavelengths. Thus, it is essential to extend the operation of DONNs to these wavelengths. In addition to more accessible optical hardware at these wavelengths, compact and advanced diffractive components, such as metasurfaces, can be incorporated [37]. Short visible wavelengths, on the order of hundreds of nanometers, require modern nanofabrication technologies to manufacture diffractive layers. As shown in Fig. 5.3a, Chen et al. employ a multi-step photolithography-etching process to produce five diffractive layers on SiO2 substrates and construct a visible-wavelength DONNs classifier at a single wavelength of 632.8 nm to recognize original images of handwritten digits in the MNIST dataset as well as modified images that are covered or altered [38]. The calculated classification accuracy is 91.57%, and the experimental
accuracy is 84%. Moreover, Goi et al. utilize galvo-dithered two-photon nanolithography with axial nanostepping of 10 nm to achieve a pixel density of >5 × 10⁶ mm⁻² in diffractive layers [39], which are further integrated with complementary metal-oxide-semiconductor (CMOS) chips as optical decryptors in the near-infrared range. Furthermore, Luo et al. employ metasurfaces consisting of subwavelength rectangular TiO2 nanopillars in diffractive layers and demonstrate a multiplexed metasurface-based DONNs system integrated with a CMOS imaging sensor for a chip-scale architecture, which can process information directly at the physical layer for energy-efficient and ultra-fast image processing in the visible range [40]; see Fig. 5.3b. Metasurfaces consisting of subwavelength resonators can manipulate the wavefront of light in nearly arbitrary manners and have enabled high-efficiency and broadband holograms [41] and miniaturized systems that perform mathematical operations [42], such as differentiation [43, 44] and convolution [45]. Owing to the compact and thin characteristics of metasurfaces, Luo et al. achieve a pixel areal density of up to 6.25 × 10⁶ mm⁻² multiplied by the number of channels, in striking contrast to the THz DONNs system with a low pixel density of 6.25 mm⁻² [15]. The multiplexing feature is discussed in Sect. 5.5.2.
5.5 Advanced DONNs Architectures
State-of-the-art convolutional neural networks in electronic systems can achieve an MNIST classification accuracy of >99.9% [46]. Despite the promising capabilities of demonstrated DONNs, there is a noticeable classification-accuracy gap between DONNs and their electronic counterparts. In addition, one DONNs system is typically designed for only one specific ML task. This section summarizes a few recent efforts in exploring advanced DONNs architectures with improved performance and expanded functionalities.
Fig. 5.3 DONNs in visible wavelengths. (a) Schematic diagram and experimental setup of the visible-light DONNs system for the classification of MNIST images. (Adapted under a Creative Commons license CC BY-NC-ND 4.0 from Elsevier: Engineering, Diffractive Deep Neural Networks at Visible Wavelengths, Chen, H. et al.,
©2021). (b) Experimental demonstration of multiplexed metasurface-enabled on-chip DONNs. (Adapted under a Creative Commons Attribution 4.0 International License from Springer Nature: Light: Science & Applications, Metasurface-enabled on-chip multiplexed diffractive neural networks in the visible, Luo, X. et al., © 2022)
5.5.1 Multichannel DONNs
One reason why DONNs underperform is the limitation that detectors can only capture non-negative, real-valued light intensity without phase information. To mitigate this constraint, Li et al. propose a differential measurement technique shown in Fig. 5.4a [47]. Each class is assigned to a pair of detectors, and the largest pair reading difference is the classification criterion and training target (a readout sketch is given at the end of this subsection). Moreover, the individual classes in a target dataset are divided and processed with two jointly trained DONNs in parallel, which can achieve calculated classification accuracies of 98.52%, 91.48%, and 50.82% for the MNIST, Fashion-MNIST, and grayscale CIFAR-10 datasets, respectively. These values are close to those obtained with some all-electronic deep neural networks, such as LeNet, which achieves 98.77%, 90.27%, and 55.21% for the same datasets, respectively. Furthermore, Rahman et al. propose using feature engineering and ensemble learning with multiple independently trained diffractive networks to substantially improve the classification performance of DONNs [48]. A set of input filters is first utilized to extract features from preprocessed input images, and each feature image passes through a DONNs system with a differential detection scheme, as shown in Fig. 5.4b. With a carefully selected ensemble of DONNs, this approach can achieve a classification accuracy of 62.13 ± 0.05% on CIFAR-10 images, an accuracy improvement of >16% compared to the average performance of the individual DONNs. The DONNs systems we have discussed so far are purely passive. Thus, deploying different ML tasks requires completely rebuilding the entire system, which substantially degrades the hardware utilization efficiency. Li et al. propose a multitask learning DONNs architecture in Fig. 5.4c, which can automatically recognize which task is being deployed in real time and perform the corresponding classification task [49]. Instead of constructing two separate DONNs systems for the image classifications in the MNIST and Fashion-MNIST datasets, a few
diffractive layers are shared, and two split branches of diffractive layers are tailored to these two different but related ML tasks. Loss function regularization is employed in the training process to balance the performance of each task and prevent overfitting. The numerical results demonstrate that the multitask DONNs system can achieve the same accuracy for both tasks as two separate DONNs while providing a >75% improvement in hardware utilization efficiency.
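Returning to the differential-detection scheme described at the start of this subsection, the readout can be sketched as follows. The normalization is one plausible choice shown for illustration, not necessarily the exact form used in the cited work.

```python
# Differential detection readout (cf. Li et al. [47]): each class is assigned a
# detector pair, and the class score is the (normalized) pair difference.
import numpy as np

def differential_scores(pos, neg, eps=1e-12):
    """pos, neg: arrays of per-class integrated intensities from the two detector sets."""
    return (pos - neg) / (pos + neg + eps)

# predicted_class = int(np.argmax(differential_scores(pos, neg)))
```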
5.5.2 Multiplexing DONNs
In addition to introducing multiple hardware channels, including detector channels and multiple optical paths, as summarized in Sect. 5.5.1, the intrinsic physical features of optical electromagnetic waves, such as polarization and wavelength, can enable parallel processing through multiplexing and thus boost the performance and functionalities of DONNs systems. For example, the metasurfaces developed by Luo et al. enable not only visible-wavelength compact integrated DONNs, as discussed in Sect. 5.4, but also a polarization-multiplexing capability. The rectangular cross section of the subwavelength TiO2 nanopillars produces different effective refractive indices along the two crossed axes, which is the fundamental mechanism for polarization multiplexing. These two orthogonal polarization channels are utilized to experimentally construct a multitask DONNs system for the simultaneous recognition of input images from the MNIST and Fashion-MNIST datasets, as shown in Fig. 5.5a. Furthermore, Li et al. propose a polarization-multiplexed diffractive processor that all-optically performs multiple arbitrarily selected linear transformations by inserting an array of linear polarizers with predetermined orientations in the middle of trainable isotropic diffractive layers [50]. The multiplexing technique can also lower the hardware resource requirements, which is particularly crucial for the technically challenging
Fig. 5.4 Multichannel DONNs. (a) Illustration of differential and class-specific differential DONNs systems. (Adapted under a Creative Commons Attribution 4.0 Unported License from SPIE: Advanced Photonics, Classspecific differential detection in diffractive optical neural networks improves inference accuracy, Li, J. et al., © 2019). (b) An ensemble of DONNs systems consisting of multiple different DONNs systems with the features of input images engineered by a set of input filters before each DONNs system that forms the ensemble. (Adapted
under a Creative Commons Attribution 4.0 International License from Springer Nature: Light: Science & Applications, Ensemble learning of diffractive optical networks, Rahman, M.S.S. et al., © 2021). (c) Illustration of a multitask DONNs architecture with two image classification tasks deployed. (Adapted under a Creative Commons Attribution 4.0 International License from Springer Nature: Scientific Reports, Real-time multi-task diffractive deep neural networks via hardware-software co-design, Li, Y. et al., © 2021)
Fig. 5.5 Multiplexing DONNs. (a) Metasurface-enabled polarization-multiplexed DONNs system for two classification tasks. (Adapted under a Creative Commons Attribution 4.0 International License from Springer Nature: Light: Science & Applications, Metasurface-enabled on-chip multiplexed diffractive neural networks in the visible, Luo, X. et al., © 2022). (b) Schematic of spectral encoding of spatial information for object classification, and (c) experimentally measured and the numerically calculated
optical power spectra and classification confusion matrices of MNIST images with a single-pixel detector. ((b) and (c) are adapted under a Creative Commons Attribution License 4.0 from American Association for the Advancement of Science: Science Advances, Spectrally encoded single-pixel machine vision using diffractive networks, Li, J. et al., © 2021)
THz wavelength range. For example, instead of assigning classes to regions of the detector array, Li et al. utilize broadband THz illumination to assign each digit/class in the MNIST dataset to one of 10 wavelengths [51]. Thus, only a single-pixel detector is needed at the output end, as shown in Fig. 5.5b. Furthermore, the calculated classification accuracy is >95%, and the experimentally obtained classification accuracy on 50 images is 88%, as shown in the confusion matrices in Fig. 5.5c.
5.6 DONNs in Photonic Integrated Circuits
Although DONNs are mainly implemented in free space, there have been recent efforts to create on-chip DONNs in 2D silicon photonic integrated circuits (PICs). The compact footprint, lithography-defined optical alignment, and CMOS-compatible mass manufacturing of silicon PICs make them a promising platform for miniaturized DONNs systems. Zarei et al. propose an integrated on-chip five-layer DONNs consisting of five one-dimensional (1D) metalines at a telecommunication wavelength of 1.55 μm [52]. Each element in a metaline is an etched rectangular slot with subwavelength dimensions on a silicon-on-insulator substrate, which diffracts the in-plane propagating optical waves. Thus, a metaline behaves like a 1D version of a diffractive layer in a free-space DONNs system. A 2D input image is flattened into a 1D array, and a 1D on-chip detector array is utilized to capture the output signals. Wang et al. later experimentally implemented such a system, as illustrated in Fig. 5.6a [53]. The chip contains ∼10³ nanoscale diffractive etched slot elements in a 0.135 mm² area. The test images are three binary letters "X," "Y," and "Z" with random amplitude and phase noise, which are encoded and coupled into the chip through an array of grating couplers. As light propagates through the multiple metalines or metasurfaces, three detector channels at the output end represent the class of the input image, and picking the largest detector signal among the three channels is the classification criterion, as shown in Fig. 5.6b.
Figure 5.6c displays the confusion matrices obtained from numerical finite-difference time-domain (FDTD) simulations, continuous-wave (CW) excitation, and pulsed excitation, which yield classification accuracies of 96%, 92%, and 89%, respectively. Furthermore, Zhu et al. experimentally demonstrate a more compact integrated DONNs, shown in Fig. 5.6d [54]. The ultracompact diffractive cell is a 10-input-10-output multimode interferometer. The input image encoding and nonlinear activation functions are implemented by a mesh of electrically controllable Mach-Zehnder interferometers (MZIs) with thermo-optic phase shifters, as well as detectors. The number of phase shifters, which determines the footprint and energy consumption of the whole chip, scales linearly with the input data dimension. The calculated and experimentally obtained classification accuracies for the MNIST dataset are 92.6% and 91.4%, respectively. The corresponding values for the Fashion-MNIST dataset are 81.4% and 80.4%, as shown in Fig. 5.6e.
5.7 Reconfigurable Hybrid DONNs
We have so far mainly focused on passive DONNs, in which the spatial light modulation responses of the diffractive layers are fixed once they are fabricated. Thus, a fabricated passive DONNs system can only be utilized for a specific ML task. Although the multitasking and multiplexing architectures described in Sect. 5.5 can handle a few, but still limited, tasks, it is more desirable to be able to dynamically adjust the response of the diffractive layers for reconfigurable DONNs systems. The central component for such an implementation is a spatial light modulator (SLM), which can regulate the amplitude and phase of a free-space propagating wavefront under the control of an external stimulus. Figure 5.7a summarizes a few free-space all-optical and hybrid optoelectronic computing paradigms that can potentially achieve reconfigurability [55]. Scheme (i) represents all-optical DONNs as we have discussed in
Fig. 5.6 DONNs in photonic integrated circuits. (a) Schematic and in-plane field distribution of the integrated DONNs system with on-chip diffractive metasurfaces. (b) Comparison of measured optical intensities (dots with error bars) and the simulated optical distribution (gray curve) on the output plane. (c) Confusion matrices obtained from FDTD simulations, as well as CW and pulse light excitation. ((a), (b), and (c) are adapted under a Creative Commons Attribution 4.0 International License from Springer Nature: Nature Communications, Integrated photonic metasystem for
image classifications at telecommunication wavelength, Wang Z. et al., © 2022). (d) Schematic of an experimental on-chip space-efficient DONNs device. (e) The numerical testing results of accuracy and loss versus epoch number for the Fashion-MNIST dataset and the experimentally obtained confusion matrix. ((d) and (e) are adapted under a Creative Commons Attribution 4.0 International License from Springer Nature: Nature Communications, Space-efficient optical computing with an integrated chip diffractive neural network, Zhu H.H. et al., © 2022)
Fig. 5.7 Reconfigurable hybrid DONNs. (a) The computing paradigms of all-optical and hybrid optoelectronic neural network. (Adapted under the terms of the Optica Open Access Publishing Agreement from Optica Publishing Group: Optics Express, Only-train-electrical-to-optical-conversion (OTEOC): simple diffractive neural networks with optical readout, Wu, L. et al., © 2022). (b) Schematic and photo of a hybrid optoelectronic Fourier neural network experimental setup. (Adapted under the terms of the Optica Open Access Publishing Agreement from Optica Publishing
Group: Optica, Massively parallel amplitude-only Fourier neural network, Miscuglio, M. et al., © 2020). (c) Schematic of a DPU including a DMD, a phase-only SLM, and a CMOS sensor. Two neural network architectures constructed using DPUs including (d) DONNs and (e) DRNNs. ((c), (d), and (e) are adapted by permission from Springer Nature: Nature Photonics, Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit, Zhou, T. et al., © 2021)
previous sections. Scheme (ii) represents a canonical 4f Fourier image processing system, where a diffractive layer spatially modifies the wavefront. The output image is further processed through electronic neural networks to perform ML tasks. Scheme (iii), which is similar to Scheme (ii), further removes any diffractive layer and relies only on free-space propagation for photonic extreme learning machines [56]. In Schemes (i) to (iii), the encoding method of the input images is fixed. Instead, Wu et al. propose Scheme (iv), in which electronic neural networks are trained to encode input images so that optical free-space propagation can perform ML tasks [55]. Schemes (ii) to (iv) are hybrid optoelectronic systems, and reconfigurability can be implemented by updating the electronic systems. However, strictly speaking, these are not reconfigurable optical systems. Miscuglio et al. experimentally demonstrate some level of optical reconfigurability in Scheme (ii), as shown in Fig. 5.7b [57]. Specifically, a digital micromirror device (DMD), which is one type of SLM for amplitude-only modulation, is used as a diffractive layer. The electronic system is trained to perform ML tasks by reading out images captured by the camera and sending the control signal to the DMD. The experimental results show classification accuracies of 98% for the MNIST dataset and 54% for the CIFAR-10 dataset. However, the computation is still heavily performed in the electrical domain, which significantly compromises the advantages of optical computing such as high energy efficiency and parallelism. Zhou et al. have demonstrated a major advance in implementing large-scale reconfigurable DONNs by replacing the diffractive layers shown in Scheme (i) of Fig. 5.7a with electrically controllable liquid-crystal SLMs [58]. Specifically, the authors construct a reconfigurable diffractive processing unit (DPU) consisting of a DMD, a phase-only SLM, and a CMOS detector, as shown in Fig. 5.7c. The DMD encodes input images; the SLM is a reconfigurable diffractive layer; and the CMOS reads out the output image to the electrical
domain. Multiple DPU blocks can be utilized to construct complex neural network architectures, such as the hybrid DONNs system shown in Fig. 5.7d. In contrast to all-optical DONNs, there are electrical-to-optical (E/O) and optical-to-electrical (O/E) conversions between layers. The CMOS readout from the DPU in the previous layer is electronically processed to control the DMD and SLM of the DPU in the next layer. In addition, nonlinear activation is applied during these E/O and O/E conversions. Nevertheless, the system reconfigurability is achieved by SLMs in the optical domain. Although the demonstrated system is still hybrid, or optoelectronic, an advantage is its versatility in constructing different architectures, such as the diffractive recurrent neural networks (DRNNs) shown in Fig. 5.7e. However, the disadvantages of E/O and O/E conversions include increased energy consumption and processing latency, as well as the discrepancy between models and physical systems, which limits the reconfigurability. Model inaccuracy and physics-aware training for its correction are discussed in Sect. 5.8.
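A schematic, deliberately hardware-agnostic sketch of one DPU stage described above follows; propagate() is the FFT diffraction step from the Sect. 5.2 sketch, and the stage boundaries stand in for the E/O and O/E conversions of the actual system.

```python
# One DPU stage (cf. Zhou et al. [58]): DMD amplitude encoding (E/O), a
# reconfigurable phase-only SLM as the diffractive layer, and CMOS intensity
# readout (O/E). The square-law detection plus the electronic re-encoding
# between stages supply the nonlinearity.
import torch

def dpu_stage(x, slm_phase):
    """x: non-negative real tensor from the previous electronic stage (DMD pattern)."""
    E = propagate(x.to(torch.complex64))             # DMD plane -> SLM plane
    E = propagate(E * torch.exp(1j * slm_phase))     # SLM modulation -> CMOS plane
    return E.abs() ** 2                              # CMOS intensity readout (O/E)

# A diffractive network is then a cascade of electronically linked DPU stages:
# x1 = dpu_stage(x0, phase0); x2 = dpu_stage(x1, phase1); ...
```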
5.8 Model Inaccuracy and Physics-Aware Training
In addition to hardware imperfections, there are always discrepancies between calculation models and hardware systems, which can also lead to deployment errors when trained models are transferred to experimental systems. This section first describes our recent discovery of some origins of this model inaccuracy, which stem from the over-simplification of the electromagnetic wave interaction and propagation in diffractive layers and free space in the analytical model described in Sect. 5.2.1. Although the hardware-imperfection training approaches described in Sect. 5.3 can provide some model resilience, they typically come at the sacrifice of system performance. This section further introduces two physics-aware training frameworks that minimize the deployment errors from both models and hardware.
5.8.1 Model Inaccuracy
The analytical model described in Sect. 5.2.1 does not consider any multiple reflection-diffraction effect circulating between diffractive layers, as shown in Fig. 5.8a. Lou et al. incorporate this effect through the transfer matrix method and iterative calculations as a modified trainable model. The DONNs system is trained on the MNIST dataset using the analytical model without reflection, and the accuracy drops when the trained model is evaluated using the modified model with reflection. Figure 5.8b displays the accuracy drop as a function of the round-trip reflected energy ratio over the total energy (αRT) for different trained diffractive masks with different material indices. The reflection effect is negligible for low-index materials such as 3D printing polymers with a refractive index of ∼1.7 in THz DONNs systems [15, 59]. However, it is substantial for high-index materials, which are generally involved in advanced DONNs. For example, metasurfaces made of high-index dielectric materials can enable compact and high-density DONNs systems, as described in Sect. 5.4. In addition, reconfigurable DONNs can be constructed with emerging chalcogenide phase-change materials, which have THz indices >10 [60] and visible-near-infrared indices >3 [61]. Thus, it is crucial to consider the reflection effect when developing next-generation compact and multifunctional DONNs. In addition, Lou et al. analyze the interpixel interaction effect by comparing the classification accuracies obtained from the analytical model and full-wave FDTD simulations, which can precisely describe experiments [62]. For a two-layer DONNs system, Fig. 5.8c demonstrates that a high accuracy needs a large pixel complexity, which represents a fast-varying spatial response of the trained diffractive layers. However, the large complexity leads to a low matching rate between the analytical model and FDTD simulations, because the pixel optical response is considerably affected by neighboring pixels and deviates from the periodic-pixel assumption in
the analytical model. Deep DONNs can help to break such a trade-off. As shown in Fig. 5.8d, the pixel complexity required for a one-layer DONNs system to achieve a similar high classification accuracy is much larger than that for a five-layer DONNs system, whereas the matching rate, and thus the experimental deployment, of the five-layer DONNs system is better, highlighting the "depth" advantage in achieving high accuracy and accurate deployment simultaneously.
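A back-of-the-envelope check of why high-index layers make interlayer reflection non-negligible is the normal-incidence Fresnel power reflectance at a material-air interface; the chapter's αRT metric is a round-trip, geometry-dependent, full-wave quantity, so the numbers below are only indicative.

```python
# Normal-incidence Fresnel power reflectance R = ((n - 1)/(n + 1))^2
def fresnel_reflectance(n):
    return ((n - 1) / (n + 1)) ** 2

for n in (1.7, 3.0, 10.0):   # 3D-printing polymer; visible/NIR and THz high-index materials
    print(f"n = {n:>4}: R = {fresnel_reflectance(n):.2f}")
# n = 1.7 -> R ~ 0.07;  n = 3 -> R = 0.25;  n = 10 -> R ~ 0.67
```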
5.8.2 Physics-Aware Training
As discussed before, both hardware imperfections and model inaccuracy can lead to deployment errors. In order to correct these errors, training processes that incorporate the hardware physics have emerged as effective approaches. As shown in the flowchart in Fig. 5.9a, Zhou et al. develop an adaptive training approach to circumvent system deployment errors in their reconfigurable hybrid neural networks discussed in Sect. 5.7 [58]. Specifically, the in silico trained model is deployed onto the practical system, and the in situ experimental outputs from each DPU CMOS sensor are utilized to update the trained model during the forward propagation. As shown in Fig. 5.9b, the direct deployment of the trained model leads to a substantial accuracy drop because of the accumulated errors between models and physical systems as light propagates. In contrast, the adaptive training process successfully recovers a high accuracy. Furthermore, as illustrated in Fig. 5.9c, Wright et al. introduce a generic hybrid in situ-in silico physics-aware training algorithm that takes experimentally measured intermediate physical quantities during forward propagation (fp) into the computation of backpropagation gradients (gθ) with the approximate model (fm) [63]. This approach allows training any controllable physical system, even when the physical layers lack any mathematical isomorphism to conventional neural network layers.
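A simplified, single-stage rendition of this hybrid in situ-in silico idea is sketched below: the forward value comes from the physical system (fp) while gradients flow through the approximate differentiable model (fm). physical_forward() and model_forward() are placeholders for the hardware interface and the simulation; the full algorithm of Wright et al. applies the substitution layer by layer using measured intermediate outputs.

```python
import torch

def pat_step(x, target, opt, physical_forward, model_forward, loss_fn):
    y_phys = physical_forward(x)                  # in situ measurement, no autograd graph
    y_model = model_forward(x)                    # in silico, differentiable in the parameters
    y = y_phys + (y_model - y_model.detach())     # value from hardware, gradient from the model
    loss = loss_fn(y, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```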
Fig. 5.8 Model inaccuracy. (a) Illustration of interlayer reflection effect. (b) Accuracy drop as a function of average round-trip power ratio αRT,avg for multiple trained diffractive masks with different refractive indices. (c) Classification accuracy obtained from analytical and FDTD approaches and their matching rate as a function of the pixel complexity of diffractive layers. (d) Classification accuracy obtained from the analytical model and FDTD simulations, as well as diffractive layer complexity as a function of depth. (Adapted by permission from Optica Publishing Group: Optics Letters, Effects of interlayer reflection and interpixel interaction in diffractive optical neural networks, Lou, M. et al., © 2022)

5.9 All-Optical Reconfigurable DONNs

This section describes our latest demonstration of an all-optical reconfigurable DONNs system based on cascaded liquid-crystal SLMs, as shown in Fig. 5.10a [64]. The system is exactly the fully reconfigurable version of Scheme (i) in Fig. 5.7a, without any electrical-optical conversions. The full-stack implementation of software and hardware considers the imperfections from both models and systems. Thus, the in silico trained model can be accurately deployed to the physical system without the need for further tuning. Specifically, the diffraction model is modified to be system-specific through a convolutional Fresnel method [65].
Furthermore, each liquid-crystal SLM has coupled amplitude and phase modulation. Figure 5.10b displays the experimentally measured non-monotonic and coupled modulation curves over 256 discrete grey levels, which break the standard backpropagation algorithm. In order to incorporate such a device response, the authors develop a device-specific physics-aware training approach through a differentiable discrete mapping based on the categorical reparameterization with Gumbel-Softmax, as illustrated in Fig. 5.10c. This approach can incorporate arbitrary device responses. In addition, the full reconfigurability of this system enables fast and precise pixel-by-pixel optical alignment. With the accurate diffraction calculation, device-specific physics-aware training, and
Fig. 5.9 Physics-aware training. (a) Flowchart of the proposed adaptive training approach. (b) Convergence plots of hybrid optoelectronic DONNs evaluated on the MNIST test dataset. The blue plot shows the pre-training process. Orange, brown, and yellow plots represent the adaptive training with full, 20%, and 2% training sets, respectively. ((a) and (b) are adapted by permission from Springer Nature: Nature Photonics, Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit, Zhou, T. et al., © 2021). (c) Schematic of the full training
loop for the physics-aware training algorithm applied to arbitrary physical neural networks. fp is the physical forward function, fm is the approximate forward function in the mode, and gθ is the gradient of the loss function with respect to parameter θ . (Adapted under a Creative Commons Attribution 4.0 International License from Springer Nature: Nature, Deep physical neural networks trained with backpropagation, Wright, L.G. et al., © 2022)
Fig. 5.10 All-optical reconfigurable DONNs. (a) A photo of the experimental system of all-optical reconfigurable DONNs. Reconfigurable diffractive layers (RDAs) are based on liquid-crystal SLMs. (b) The coupled amplitude and phase modulation responses for used SLMs. (c) Illustration of implementing differentiable discrete complex mapping via Gumbel-Softmax for device-specific physics-aware training. (d) An input image of a handwritten digit 1 from the MNIST dataset, experimentally measured
diffraction pattern and corresponding intensity distribution, and calculated diffraction pattern and corresponding intensity distribution. (e) Confusion matrices of the computer-trained model and experimental measurement. (Adapted by permission from Wiley: Laser & Photonics Reviews, Physics-Aware Machine Learning and Adversarial Attack in Complex-Valued Reconfigurable Diffractive All-Optical Neural Network, Chen, R. et al., © 2022)
precise hardware alignment, the trained gray levels of the SLMs can be rapidly and precisely deployed on the experimental setup. Input images of the three digits 1, 2, and 7 from the MNIST dataset are used for training. Figure 5.10d displays an excellent agreement between the experimentally measured and calculated output images and the corresponding optical intensity distributions. Furthermore, the confusion matrices and accuracies obtained from calculations and experiments in Fig. 5.10e also match well.
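The differentiable discrete mapping described above can be sketched with the Gumbel-Softmax function available in PyTorch; the shapes, the temperature, and the way the measured level table t_levels is used are illustrative assumptions rather than the exact implementation of the cited work.

```python
# Differentiable selection over 256 measured SLM grey levels via Gumbel-Softmax,
# so the coupled, non-monotonic amplitude-phase response can be trained with
# backpropagation. t_levels: (256,) measured complex responses; logits: trainable.
import torch
import torch.nn.functional as F

def pixel_transmission(logits, t_levels, tau=1.0):
    """logits: (N, N, 256) trainable per-pixel parameters."""
    onehot = F.gumbel_softmax(logits, tau=tau, hard=True)        # straight-through one-hot
    return (onehot.to(t_levels.dtype) * t_levels).sum(dim=-1)    # (N, N) complex transmission

# The result feeds the diffraction model as t_{i,m} in Eq. (5.4); after training,
# the per-pixel argmax grey level is written directly to the SLM.
```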
5.10 Summary
This chapter has described the fundamentals and current development of both passive and actively reconfigurable DONNs systems. Various implementations of diffractive layers at different wavelengths, such as THz and visible dielectric components, compact and integrated metamaterials, and active spatial light modulators, as well as versatile system architectures, have been proposed and experimentally demonstrated to enable high-performance and multifunctional DONNs systems. Furthermore, the deployment errors from hardware imperfections and models, as well as the training strategies to correct these errors, have been introduced. We believe that the future development of DONNs systems will focus on reducing the footprint of individual diffractive pixels and of the overall architecture and on enhancing the capability of handling sophisticated ML tasks, which will require not only fast, trainable, and accurate physical models but also new material platforms and the innovative devices they enable.

Acknowledgments W.G. acknowledges the support from the National Science Foundation through Grant No. 2316627.
References
1. Y. LeCun, Y. Bengio, G. Hinton, Nature 521(7553), 436 (2015). DOI https://doi.org/10.1038/nature14539
2. I. Goodfellow, Y. Bengio, A. Courville, Y. Bengio, Deep learning, vol. 1 (MIT press Cambridge, 2016) 3. S.P. Rodrigues, Z. Yu, P. Schmalenberg, J. Lee, H. Iizuka, E.M. Dede, Nat. Photonics 15(2), 66 (2021). DOI https://doi.org/10.1038/s41566-02000736-0 4. K.T. Butler, D.W. Davies, H. Cartwright, O. Isayev, A. Walsh, Nature 559(7715), 547 (2018). DOI https:// doi.org/10.1038/s41586-018-0337-2 5. A.W. Senior, R. Evans, J. Jumper, J. Kirkpatrick, L. Sifre, T. Green, C. Qin, A. Žídek, A.W. Nelson, A. Bridgland, et al., Nature 577(7792), 706 (2020). DOI https://doi.org/10.1038/s41586-019-1923-7 6. A. Mirhoseini, A. Goldie, M. Yazgan, J.W. Jiang, E. Songhori, S. Wang, Y.J. Lee, E. Johnson, O. Pathak, A. Nazi, et al., Nature 594(7862), 207 (2021). DOI https://doi.org/10.1038/s41586-02103544-w 7. T.N. Theis, H.S.P. Wong, Computing in Science & Engineering 19(2), 41 (2017). DOI https://doi.org/ 10.1109/MCSE.2017.29 8. C.E. Leiserson, N.C. Thompson, J.S. Emer, B.C. Kuszmaul, B.W. Lampson, D. Sanchez, T.B. Schardl, Science 368(6495), eaam9744 (2020). DOI https:// doi.org/10.1126/science.aam974 9. G. Wetzstein, A. Ozcan, S. Gigan, S. Fan, D. Englund, M. Soljaˇci´c, C. Denz, D.A. Miller, D. Psaltis, Nature 588(7836), 39 (2020). DOI https://doi.org/10.1038/ s41586-020-2973-6 10. Y. Shen, N.C. Harris, S. Skirlo, M. Prabhu, T. BaehrJones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, et al., Nat. Photonics 11(7), 441 (2017). DOI https://doi.org/10.1038/nphoton.2017.93 11. N.C. Harris, J. Carolan, D. Bunandar, M. Prabhu, M. Hochberg, T. Baehr-Jones, M.L. Fanto, A.M. Smith, C.C. Tison, P.M. Alsing, et al., Optica 5(12), 1623 (2018). DOI https://doi.org/10.1364/OPTICA. 5.001623 12. Z. Ying, C. Feng, Z. Zhao, S. Dhar, H. Dalir, J. Gu, Y. Cheng, R. Soref, D.Z. Pan, R.T. Chen, Nat. Commun. 11, 2154 (2020). DOI https://doi.org/10.1038/ s41467-020-16057-3 13. H. Zhang, M. Gu, X. Jiang, J. Thompson, H. Cai, S. Paesani, R. Santagati, A. Laing, Y. Zhang, M. Yung, et al., Nat. Commun. 12, 457 (2021). DOI https://doi.org/10.1038/s41467-020-20719-7 14. J. Feldmann, N. Youngblood, M. Karpov, H. Gehring, X. Li, M. Stappers, M. Le Gallo, X. Fu, A. Lukashchuk, A. Raja, et al., Nature 589(7840), 52 (2021). DOI https://doi.org/10.1038/ s41586-020-03070-1 15. X. Lin, Y. Rivenson, N.T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, A. Ozcan, Science 361(6406), 1004 (2018). DOI https://doi.org/10.1126/science.aat8084 16. Y. Luo, D. Mengu, N.T. Yardimci, Y. Rivenson, M. Veli, M. Jarrahi, A. Ozcan, Light: Science & Applications 8(1), 1 (2019). DOI https://doi.org/10. 1038/s41377-019-0223-1
5 Diffractive Optical Neural Networks 17. D. Mengu, Y. Luo, Y. Rivenson, A. Ozcan, IEEE Journal of Selected Topics in Quantum Electronics 26(1), 1 (2019). DOI https://doi.org/10.1109/JSTQE. 2019.2921376 18. S. Jiao, J. Feng, Y. Gao, T. Lei, Z. Xie, X. Yuan, Opt. Lett. 44(21), 5186 (2019). DOI https://doi.org/10. 1364/OL.44.005186 19. R. Hamerly, L. Bernstein, A. Sludds, M. Soljaˇci´c, D. Englund, Phys. Rev. X 9(2), 021032 (2019). DOI https://doi.org/10.1103/PhysRevX.9.021032 20. L. Mennel, J. Symonowicz, S. Wachter, D.K. Polyushkin, A.J. Molina-Mendoza, T. Mueller, Nature 579(7797), 62 (2020). DOI https://doi.org/10. 1038/s41586-020-2038-x 21. J. Spall, X. Guo, T.D. Barrett, A. Lvovsky, Opt. Lett. 45(20), 5752 (2020). DOI https://doi.org/10.1364/ OL.401675 22. L. Bernstein, A. Sludds, R. Hamerly, V. Sze, J. Emer, D. Englund, Sci. Rep. 11, 3144 (2021). DOI https:// doi.org/10.1038/s41598-021-82543-3 23. W. Gao, C. Yu, R. Chen, Advanced Photonics Research p. 2100048 (2021). DOI https://doi.org/10. 1002/adpr.202100048 24. F. Léonard, A.S. Backer, E.J. Fuller, C. Teeter, C.M. Vineyard, ACS Photonics 8(7), 2103 (2021). DOI https://doi.org/10.1021/acsphotonics.1c00526 25. F. Léonard, E.J. Fuller, C.M. Teeter, C.M. Vineyard, Opt. Express 30(8), 12510 (2022). DOI https://doi. org/10.1364/OE.455007 26. T. Wang, S.Y. Ma, L.G. Wright, T. Onodera, B.C. Richard, P.L. McMahon, Nat. Commun. 13, 123 (2022). DOI https://doi.org/10.1038/s41467-02127774-8 27. H. Zeng, J. Fan, Y. Zhang, Y. Su, C. Qiu, W. Gao, Opt. Express 30(8), 12712 (2022). DOI https://doi.org/10. 1364/OE.453363 28. Y. Tang, P.T. Zamani, R. Chen, J. Ma, M. Qi, C. Yu, W. Gao, Laser Photonics Rev. p. 2200381 (2022). DOI https://doi.org/10.1002/lpor.202200381 29. D. Mengu, M.S.S. Rahman, Y. Luo, J. Li, O. Kulce, A. Ozcan, Adv. Opt. Photonics 14(2), 209 (2022). DOI https://doi.org/10.1364/AOP.450345 30. T. Fu, Y. Zang, H. Huang, Z. Du, C. Hu, M. Chen, S. Yang, H. Chen, Opt. Express 29(20), 31924 (2021). DOI https://doi.org/10.1364/OE.435183 31. Q. Song, X. Liu, C.W. Qiu, P. Genevet, Appl. Phys. Rev. 9(1), 011311 (2022). DOI https://doi.org/10. 1063/5.0078610 32. R. Rojas, in Neural networks (Springer, 1996), pp. 149–182. DOI https://doi.org/10.1007/978-3-64261068-4_7 33. D. Mengu, Y. Rivenson, A. Ozcan, ACS Photonics 8(1), 324 (2020). DOI https://doi.org/10.1021/ acsphotonics.0c01583 34. D. Mengu, Y. Zhao, N.T. Yardimci, Y. Rivenson, M. Jarrahi, A. Ozcan, Nanophotonics 9(13), 4207 (2020). DOI https://doi.org/10.1515/nanoph-20200291
93 35. Y. Li, Z. Zheng, R. Li, Q. Chen, H. Luan, H. Yang, Q. Zhang, M. Gu, Opt. Express 30(20), 36700 (2022). DOI https://doi.org/10.1364/OE.468648 36. M. Tonouchi, Nat. Photonics 1(2), 97 (2007). DOI https://doi.org/10.1038/nphoton.2007.3 37. S. Jahani, Z. Jacob, Nat. Nanotechnol. 11(1), 23 (2016). DOI https://doi.org/10.1038/nnano.2015.304 38. H. Chen, J. Feng, M. Jiang, Y. Wang, J. Lin, J. Tan, P. Jin, Engineering 7(10), 1483 (2021). DOI https:// doi.org/10.1016/j.eng.2020.07.032 39. E. Goi, X. Chen, Q. Zhang, B.P. Cumming, S. Schoenhardt, H. Luan, M. Gu, Light Sci. Appl. 10, 40 (2021). DOI https://doi.org/10.1038/s41377-021-00483-z 40. X. Luo, Y. Hu, X. Ou, X. Li, J. Lai, N. Liu, X. Cheng, A. Pan, H. Duan, Light Sci. Appl. 11, 158 (2022). DOI https://doi.org/10.1038/s41377-022-00844-2 41. G. Zheng, H. Mühlenbernd, M. Kenney, G. Li, T. Zentgraf, S. Zhang, Nat. Nanotechnol. 10(4), 308 (2015). DOI https://doi.org/10.1038/nnano.2015.2 42. A. Silva, F. Monticone, G. Castaldi, V. Galdi, A. Alù, N. Engheta, Science 343(6167), 160 (2014). DOI https://doi.org/10.1126/science.1242818 43. L. Wan, D. Pan, T. Feng, W. Liu, A.A. Potapov, Front. Optoelectron. 14(2), 187 (2021). DOI https://doi.org/ 10.1007/s12200-021-1124-5 44. J. Sol, D.R. Smith, P. Del Hougne, Nat. Commun. 13, 1713 (2022). DOI https://doi.org/10.1038/s41467022-29354-w 45. W. Fu, D. Zhao, Z. Li, S. Liu, C. Tian, K. Huang, Light Sci. Appl. 11, 62 (2022). DOI https://doi.org/ 10.1038/s41377-022-00752-5 46. V. Mazzia, F. Salvetti, M. Chiaberge, Sci. Rep. 11, 14634 (2021). DOI https://doi.org/10.1038/s41598021-93977-0 47. J. Li, D. Mengu, Y. Luo, Y. Rivenson, A. Ozcan, Adv. Photon. 1(4), 046001 (2019). DOI https://doi.org/10. 1117/1.AP.1.4.046001 48. M.S.S. Rahman, J. Li, D. Mengu, Y. Rivenson, A. Ozcan, Light Sci. Appl. 10, 14 (2021). DOI https://doi. org/10.1038/s41377-020-00446-w 49. Y. Li, R. Chen, B. Sensale-Rodriguez, W. Gao, C. Yu, Sci. Rep. 11, 11013 (2021). DOI https://doi.org/10. 1038/s41598-021-90221-7 50. J. Li, Y.C. Hung, O. Kulce, D. Mengu, A. Ozcan, Light Sci. Appl. 11, 153 (2022). DOI https://doi.org/ 10.1038/s41377-022-00849-x 51. J. Li, D. Mengu, N.T. Yardimci, Y. Luo, X. Li, M. Veli, Y. Rivenson, M. Jarrahi, A. Ozcan, Sci. Adv. 7(13), eabd7690 (2021). DOI https://doi.org/10.1126/ sciadv.abd7690 52. S. Zarei, M.r. Marzban, A. Khavasi, Opt. Express 28(24), 36668 (2020). DOI https://doi.org/10.1364/ OE.404386 53. Z. Wang, L. Chang, F. Wang, T. Li, T. Gu, Nat. Commun. 13, 2131 (2022). DOI https://doi.org/10. 1038/s41467-022-29856-7 54. H. Zhu, J. Zou, H. Zhang, Y. Shi, S. Luo, N. Wang, H. Cai, L. Wan, B. Wang, X. Jiang, et al., Nat. Com-
mun. 13, 1044 (2022). DOI https://doi.org/10.1038/s41467-022-28702-0
55. L. Wu, Z. Zhang, Opt. Express 30(15), 28024 (2022). DOI https://doi.org/10.1364/OE.462370
56. D. Pierangeli, G. Marcucci, C. Conti, Photonics Res. 9(8), 1446 (2021). DOI https://doi.org/10.1364/PRJ.423531
57. M. Miscuglio, Z. Hu, S. Li, J.K. George, R. Capanna, H. Dalir, P.M. Bardet, P. Gupta, V.J. Sorger, Optica 7(12), 1812 (2020). DOI https://doi.org/10.1364/OPTICA.408659
58. T. Zhou, X. Lin, J. Wu, Y. Chen, H. Xie, Y. Li, J. Fan, H. Wu, L. Fang, Q. Dai, Nat. Photonics 15(5), 367 (2021). DOI https://doi.org/10.1038/s41566-021-00796-w
59. M. Lou, Y.L. Li, C. Yu, B. Sensale-Rodriguez, W. Gao, Opt. Lett. 48(2), 219 (2023). DOI https://doi.org/10.1364/OL.477605
60. K. Makino, K. Kato, Y. Saito, P. Fons, A.V. Kolobov, J. Tominaga, T. Nakano, M. Nakajima, J. Mater. Chem. C 7(27), 8209 (2019). DOI https://doi.org/10.1039/C9TC01456J
61. M. Wuttig, H. Bhaskaran, T. Taubner, Nat. Photonics 11(8), 465 (2017). DOI https://doi.org/10.1038/nphoton.2017.126
62. M. Mansouree, A. McClung, S. Samudrala, A. Arbabi, ACS Photonics 8(2), 455 (2021). DOI https://doi.org/10.1021/acsphotonics.0c01058
63. L.G. Wright, T. Onodera, M.M. Stein, T. Wang, D.T. Schachter, Z. Hu, P.L. McMahon, Nature 601(7894), 549 (2022). DOI https://doi.org/10.1038/s41586-021-04223-6
64. R. Chen, Y. Li, M. Lou, J. Fan, Y. Tang, B. Sensale-Rodriguez, C. Yu, W. Gao, Laser Photonics Rev. p. 2200348 (2022). DOI https://doi.org/10.1002/lpor.202200348
65. G. Vdovin, H. van Brug, F. van Goor, in Fifth International Topical Meeting on Education and Training in Optics, Delft, The Netherlands (1997), pp. 19–21. DOI https://doi.org/10.1117/12.294366
6 Zone Plate-Coded Imaging
Jiachen Wu and Liangcai Cao
Abstract
With the development of photoelectric imaging technology and the improvement of computing power, the core architecture of imaging systems can be shifted from front-end hardware to back-end computational reconstruction, forming the field of computational optical imaging. As a branch of computational optical imaging, zone plate-coded imaging uses zone plates instead of lenses to encode scene images, which can significantly reduce the volume and weight of the imaging system and has broad application prospects in embedded systems, wearable devices, distributed sensors, etc. At present, zone plate-coded imaging still faces challenges in imaging quality, and the development of image reconstruction algorithms is expected to break through this bottleneck. This chapter introduces the principle of zone plate-coded imaging and focuses on twin-image elimination methods to improve imaging quality. The compressive reconstruction approach is also illustrated, which paves the way for image fusion based on multiple sensors.
Keywords
Computational imaging · Coded aperture · Fresnel zone plate · Lensless imaging · Incoherent holography · Twin image · Compressive sensing · Total variation regularization
J. Wu · L. Cao (O) Department of Precision Instruments, Tsinghua University, Beijing, China e-mail: [email protected]; [email protected]
6.1 Overview
The Fresnel zone plate (FZP) is one of the earliest coded apertures for imaging incoherent radiation such as X-rays and gamma rays. In 1950, Rogers noted the similarity between FZPs and point-source holograms and considered holograms to be generalized zone plates with complex patterns [1]. In 1961, Mertz and Young proposed using the FZP as the coded aperture [2]. Under incoherent illumination, the images encoded by an FZP are similar to Gabor holograms, and the images can then be decoded through hologram reconstruction. This method, also known as zone plate-coded imaging (ZPCI), has been widely used in fields such as astronomy [3], nuclear medicine [4], laser inertial confinement fusion [5], and X-ray fluorescence microscopy [6]. Although ZPCI technology has been successfully applied in the field of radiation imaging, it cannot be directly extended to visible wavelengths. On the one hand, because the radiation source is usually sparse, the twin image generated by holographic reconstruction is not enough
to interfere with image recognition. On the other hand, the purpose of radiation imaging is mostly to obtain the location and size of the radiation source, so the demand for image resolution is not high. In order to improve the imaging resolution, it is necessary to reduce the feature size of the mask. However, reducing the feature size at visible wavelengths may cause a significant diffraction effect and render the image reconstruction method ineffective. Therefore, novel computational reconstruction methods are widely explored for ZPCI at visible wavelengths. Shimano et al. from Hitachi were the first to extend ZPCI to visible wavelengths [7–10]. In order to eliminate the inherent twin-image noise in the holographic reconstruction, their method draws on the fringe scanning technique in interferometry and needs to capture at least four FZP-coded images with different phases to realize a twin-image-free reconstruction. Wu et al. proposed a single-shot method that uses total variation (TV) regularization to eliminate the twin image [11, 12]. To improve the imaging resolution, Wu et al. incorporated the diffraction effect into the forward model and used deep learning to reconstruct high-quality images [13]. Nakamura et al. proposed a resolution improvement method based on color-channel synthesis [14]. As a kind of computational optical imaging, ZPCI can significantly reduce the volume and weight of the imaging system and has broad application prospects in surveillance systems, embedded systems, wearable devices, etc. This chapter introduces Fresnel ZPCI from multiple perspectives, such as imaging principles, reconstruction methods, and resolution analysis.
6.2 Principle of Recording and Reconstruction
According to the continuity of transmittance distribution, the zone plate can be divided into the Gabor zone plate (GZP) and the FZP. The GZP has continuously varying transmittance, as shown in Fig. 6.1a. Its transmittance function is
T(r) = 1/2 + (1/2) cos(π r²/r1²).   (6.1)
Here, r1 is the constant of the zone plate and represents the radius of the innermost zone, and r is the radial distance from the center of the zone plate. However, it is difficult to fabricate a mask with continuously varying transmittance. The FZP, which has binary transmittance, is therefore usually used instead of the GZP to realize the function of the coding aperture, as shown in Fig. 6.1b. The cost is that the high-order diffraction of the FZP forms defocused images on the main focal plane, degrading the reconstructed image quality. In 1992, Beynon et al. proposed a binarized GZP [15], which greatly reduced the difficulty of GZP fabrication while retaining the focusing characteristics of the GZP. Beynon et al. observed that the zone width of binary zone plates could vary not only in the radial direction but also with azimuth angle. The binarization of the GZP can be realized by adjusting the binary transmittance at different azimuths, so that the integral along each circumference is consistent with the GZP transmittance at the corresponding radius. Specifically, the zone plate is divided into several sectors, and the proportion of transparent arc length to the whole arc length at each radius within a sector is adjusted according to the GZP transmittance. The resulting zone plate is also called the Hampton Court zone plate, because it resembles the astronomical clock at Hampton Court Palace, as shown in Fig. 6.1c. This kind of zone plate has an obvious spoke-like structure, which can lead to artifacts in the reconstructed image. To solve this problem, a rotated binary GZP was designed, which rotates adjacent zones by half a sector and eliminates the spoke-like structure, as shown in Fig. 6.1d. The structure of ZPCI and the workflow are shown in Fig. 6.2a. The imaging workflow consists of two steps: recording and reconstruction. In the first step, a film or image sensor is used to record the incident rays coming from a radiation source. Each point in the radiation source casts a shadow of the zone plate on the recording plane. Suppose the distance between the object and the
Fig. 6.1 Different zone plate patterns. (a) Gabor zone plate. (b) Fresnel zone plate. (c) Binary Gabor zone plate. (d) Rotated binary Gabor zone plate
FZP is z1, and the distance between the zone plate and the recording plane is z2. The zone plate constant r1′ of the zone plate shadow is

r1′ = (1 + z2/z1) r1.   (6.2)
When z1 ≫ z2, i.e., the object is far enough from the zone plate and the recording plane, we have r1′ ≈ r1. The recorded pattern can be represented as

I(r) = (1/2) Σ_k^N I_k [1 + cos(π|r − r_k|²/r1²)],   (6.3)
where I(r) is the light intensity on the image sensor plane; I_k is the light intensity of the kth point light source; r represents the position vector on the image sensor plane; r_k represents the displacement vector of the shadow center under the illumination of the kth point light source; and N is the number of point light sources. All the shadows superimpose to form the coded image, as shown in Fig. 6.2b. From the perspective of holography, each zone plate shadow can be regarded as a point-source hologram encoding the spatial position and intensity of the point light source, and the coded image is therefore similar to an in-line hologram. Thus, the zone plate acts as a hologram encoder for an incoherent source. The only difference is that the in-line hologram is a complex-amplitude superposition of point-source holograms, whereas the ZPCI coded image is an intensity superposition. The second step is to illuminate the developed film by a collimated coherent source, or numer-
ically calculate the light field after diffraction propagation of the encoded image. The shadow of each FZP focuses the incident coherent source onto a spot to achieve the reproduction of the original image, as shown in Fig. 6.2c. In Fresnel diffraction, the diffracted light field can be expressed as

O_R(r_o) = [exp(i2πd/λ)/(iλd)] ∬ I(r) exp[iπ|r − r_o|²/(λd)] dS,   (6.4)

where λ and d represent the wavelength and distance used to reconstruct the original image, respectively; r_o is a position vector on the reconstructed plane; and dS is the differential area element on the coded image. To obtain the light field at the focal plane of the zone plate, λ and d should satisfy r1² = λd. Supposing the integral area is infinite, the light field can be calculated as
O_R(r_o) = (1/2) Σ_k^N I_k ∬ exp(iπ|r − r_o|²/r1²) dS
         + (1/4) Σ_k^N I_k ∬ exp[iπ(|r − r_o|² − |r − r_k|²)/r1²] dS
         + (1/4) Σ_k^N I_k ∬ exp[iπ(|r − r_o|² + |r − r_k|²)/r1²] dS
         = (i r1²/2) Σ_k^N I_k + (r1⁴/4) Σ_k^N I_k δ(r_o − r_k)
         + (i r1²/8) Σ_k^N I_k exp[iπ|r_o − r_k|²/(2r1²)].   (6.5)
Fig. 6.2 (a) ZPCI system structure and workflow. (b) Recording process. (c) Reconstruction process. (Adapted by permission from China Science Publishing & Media Ltd.: Acta Photonica Sinica, Fresnel zone aperture lensless imaging by compressive sensing (invited), Wu-Jiachen, et al., © 2022)
The first term on the right-hand side of the expression is constant and is proportional to the total intensity of the object. The second term is a set of image points appearing at the same locations as the geometric image points, which reproduce the image of the original object. The third term is the superposition of a series of spherical waves with a propagation distance of 2d, which can be regarded as the defocused image of the original object; this is also called the twin image in holographic reconstruction. The twin image is an inherent problem in holographic reconstruction: it obfuscates the reconstructed image and reduces the image quality. Twin images are usually removed by experimental means, such as employing an off-axis setup or using a phase-shifting approach, which requires changing the experimental optical path or introducing additional modulation
devices. Zhang et al. proposed a compressive sensing approach to eliminate the twin image at the algorithm level [16]. This method can be effectively transplanted to ZPCI, which will be introduced in the next section.
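As a concrete illustration of the recording step, the following minimal NumPy sketch generates the GZP transmittance of Eq. (6.1), binarizes it into an FZP, and superimposes shifted shadows according to Eq. (6.3). The grid size, pixel pitch, and point-source list are illustrative assumptions only, and mask diffraction is ignored, as in the geometric model above.

```python
import numpy as np

def gabor_zone_plate(n_pix, pitch, r1):
    """GZP transmittance, Eq. (6.1): T(r) = 1/2 + (1/2) cos(pi r^2 / r1^2)."""
    c = (np.arange(n_pix) - n_pix / 2) * pitch
    x, y = np.meshgrid(c, c)
    return 0.5 + 0.5 * np.cos(np.pi * (x**2 + y**2) / r1**2)

def fresnel_zone_plate(n_pix, pitch, r1):
    """Binary FZP obtained by thresholding the GZP transmittance at 1/2."""
    return (gabor_zone_plate(n_pix, pitch, r1) >= 0.5).astype(float)

def coded_image(sources, n_pix, pitch, r1):
    """Recorded intensity, Eq. (6.3): superposition of shifted zone plate shadows."""
    c = (np.arange(n_pix) - n_pix / 2) * pitch
    x, y = np.meshgrid(c, c)
    intensity = np.zeros((n_pix, n_pix))
    for xk, yk, ik in sources:          # (shadow-centre shift x, shift y, intensity I_k)
        r2 = (x - xk) ** 2 + (y - yk) ** 2
        intensity += 0.5 * ik * (1.0 + np.cos(np.pi * r2 / r1**2))
    return intensity

if __name__ == "__main__":
    # Illustrative values: 512 x 512 sensor, 10 um pitch, r1 = 0.32 mm (as in Sect. 6.3.1).
    sources = [(0.0, 0.0, 1.0), (5e-4, -3e-4, 0.5)]
    I = coded_image(sources, n_pix=512, pitch=10e-6, r1=0.32e-3)
```

The resulting array plays the role of the coded image I(r) on which the reconstruction methods of the next section operate.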
6.3 Twin Image Elimination
The twin image is essentially the defocused image of the original image, and the image reconstructed by the back propagation method is the superposition of the defocused image and the focused image. The edge of the focused image is clear and sharp, while the other regions have a smooth transition. The defocused image has diffraction fringes at the edge of the image, and the diffraction fringes spread to the whole image with the increase of the propagation distance. As reflected
Fig. 6.3 Sparsity comparison of focused and defocused images in the gradient domain. (Adapted by permission from China Science Publishing & Media Ltd.: Acta Photonica Sinica, Fresnel zone aperture lensless imaging by compressive sensing (invited), Wu-Jiachen, et al., © 2022)
in the gradient of the image: most gradient values of the focused image tend to zero, exhibiting sparsity, while the gradient of the defocused image is non-sparse. Therefore, the twin image in the reconstruction can be suppressed by applying a gradient-sparsity prior constraint, which is called TV regularization.
6.3.1 Total Variation Regularization

For a two-dimensional continuous function, TV is defined as the integral of the gradient amplitude:

TV(f, Ω) = ∫_Ω |∇f(x, y)| dx dy = ∫_Ω √(f_x² + f_y²) dx dy,   (6.6)

where f_x and f_y are the partial derivatives of the function along the horizontal and vertical directions, respectively, and Ω is the supporting domain of the image. For a digital image, TV has the following discrete form:

TV(f) = Σ_{m,n} ‖∇f_{m,n}‖₂ = Σ_{m,n} √(|f_{m+1,n} − f_{m,n}|² + |f_{m,n+1} − f_{m,n}|²),   (6.7)

where m and n represent the pixel indices of the two-dimensional discrete image. TV in the above definition is isotropic. Due to the existence of the square root, it is complicated to solve in practice. To facilitate calculation, the anisotropic total variation based on the l1 norm can be adopted, which is defined as

TV_aniso(f) = Σ_{m,n} [|f_{m+1,n} − f_{m,n}| + |f_{m,n+1} − f_{m,n}|].   (6.8)
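A direct NumPy transcription of Eqs. (6.7) and (6.8) may help make the discrete definitions concrete (a minimal sketch; the synthetic test arrays are arbitrary and only illustrate the gradient-sparsity contrast discussed in this section):

```python
import numpy as np

def tv_isotropic(f):
    """Isotropic TV, Eq. (6.7), summed over the valid interior pixels."""
    dx = f[1:, :-1] - f[:-1, :-1]   # f[m+1, n] - f[m, n]
    dy = f[:-1, 1:] - f[:-1, :-1]   # f[m, n+1] - f[m, n]
    return np.sum(np.sqrt(dx**2 + dy**2))

def tv_anisotropic(f):
    """Anisotropic (l1) TV, Eq. (6.8)."""
    dx = f[1:, :-1] - f[:-1, :-1]
    dy = f[:-1, 1:] - f[:-1, :-1]
    return np.sum(np.abs(dx) + np.abs(dy))

# A piecewise-constant (focused-like) image has a sparse gradient and a small TV;
# adding dense fluctuations (defocus fringes, noise) raises the TV sharply.
rng = np.random.default_rng(0)
sharp = np.zeros((64, 64)); sharp[16:48, 16:48] = 1.0
fringy = sharp + 0.1 * rng.standard_normal(sharp.shape)
print(tv_isotropic(sharp), tv_isotropic(fringy))
```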
Figure 6.3 shows the gradients of defocused and focused images and the corresponding histogram distributions. The focused image has obvious sparse characteristics in the gradient domain.
To reconstruct the image with TV regularization, the forward model of ZPCI should be built. The coded image can be represented as the convolution of the object image with the zone plate:

I(x, y) = O(x, y) ∗ T(x, y) + e(x, y),   (6.9)

where the symbol "∗" represents the convolution operator; O represents the object image to be restored; T is the transmittance function of the zone plate, as given in Eq. (6.1); and e(x, y) represents the noise, including photodetector noise, ambient light noise, and the error caused by the diffraction effect. If the cosine term of T(x, y) is expressed in complex exponential form, Eq. (6.9) can be rewritten as

I(x, y) = C + (1/4)[O(x, y) ∗ h(x, y) + O(x, y) ∗ h*(x, y)] + e(x, y)
        = C + (1/4)[U(x, y) + U*(x, y)] + e(x, y)
        = C + (1/2) Re{U(x, y)} + e(x, y),   (6.10)

where h(x, y) = exp[i(π/r1²)(x² + y²)] and C is a constant. When r1² = λd, h(x, y) has the same expression as the impulse response function of Fresnel diffraction. Then U(x, y) can be regarded as the diffracted light field of the object, and U*(x, y) is the conjugate wave of U(x, y). Ignoring the constant term and lexicographically sorting the image into a vector, Eq. (6.10) can be rewritten as a matrix-vector multiplication:

y = (1/2) Re{F⁻¹ H F x} + e = Kx + e,   (6.11)

where x and y represent the original image and the coded image, respectively; F and F⁻¹ are the two-dimensional Fourier transform matrix and its inverse; H is a diagonal matrix whose elements are the discrete sampling values of the Fresnel diffraction transfer function; Re{·} is the operation of taking the real part; and K is an operator combining the above operations, representing the forward model of the ZPCI system. According to the forward model, the following optimization objective function can be constructed to reconstruct the original image:

x̂ = arg min_x { (1/2)‖y − Kx‖₂² + τ TV(x) },   (6.12)

where τ is the regularization coefficient, which controls the weight of the regularization term in the whole objective function. By solving Eq. (6.12) with a proximal gradient descent algorithm [17, 18], a twin-image-free reconstruction can be achieved.
Figure 6.4 shows a Fresnel ZPCI system setup. The zone plate pattern is etched on a chrome-plated mask, the mask is attached to the bare image sensor, and the distance between the zone plate and the image sensor is z2 = 3 mm. A screen is used to display the test image, and the distance from the screen to the zone plate is z1 = 30 cm. The zone plate constant is r1 = 0.32 mm. The reconstructed images are shown in Fig. 6.5. A binary image and a grayscale image are used for comparison. The first column shows the normalized coded images; the coded images exhibit diffraction fringe structures similar to those of holograms. The second column shows the Fresnel propagation results of the coded images. The original image can be reconstructed correctly; however, the fringe noise caused by the twin image significantly degrades the reconstruction. The third column shows the images reconstructed by TV regularization, which illustrates that the twin image can be effectively suppressed.
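The forward operator K of Eq. (6.11) and a simple solver for Eq. (6.12) can be sketched as follows. For brevity, the TV term is smoothed and plain gradient descent is used in place of the TwIST/FISTA proximal algorithms cited as [17, 18]; the grid parameters, step size, and regularization weight are illustrative assumptions.

```python
import numpy as np

def fresnel_transfer(n_pix, pitch, r1):
    """Diagonal of H in Eq. (6.11): Fresnel transfer function on the sensor grid.

    With the reconstruction condition r1**2 = lambda * d used in the text,
    H(fx, fy) = exp(-1j * pi * r1**2 * (fx**2 + fy**2)) up to a constant phase.
    """
    f = np.fft.fftfreq(n_pix, d=pitch)
    fx, fy = np.meshgrid(f, f)
    return np.exp(-1j * np.pi * r1**2 * (fx**2 + fy**2))

def K(x, H):      # forward model, Eq. (6.11): K x = (1/2) Re{F^-1 H F x}
    return 0.5 * np.real(np.fft.ifft2(H * np.fft.fft2(x)))

def Kt(y, H):     # adjoint of K: conjugate transfer function in the Fourier domain
    return 0.5 * np.real(np.fft.ifft2(np.conj(H) * np.fft.fft2(y)))

def tv_gradient(u, eps=0.05):
    """Gradient of a smoothed isotropic TV (periodic boundaries)."""
    gx = np.roll(u, -1, axis=0) - u
    gy = np.roll(u, -1, axis=1) - u
    norm = np.sqrt(gx**2 + gy**2 + eps**2)
    px, py = gx / norm, gy / norm
    return -((px - np.roll(px, 1, axis=0)) + (py - np.roll(py, 1, axis=1)))

def reconstruct(y, H, tau=5e-3, step=0.5, n_iter=300):
    """Gradient descent on Eq. (6.12) with a smoothed TV term.

    The step size respects the Lipschitz bound of the data term (||K||^2 <= 1/4)
    plus tau times that of the smoothed TV (roughly 8/eps).
    """
    x = np.zeros_like(y)
    for _ in range(n_iter):
        x = x - step * (Kt(K(x, H) - y, H) + tau * tv_gradient(x))
    return x
```

For example, with an assumed 10 µm pixel pitch, H = fresnel_transfer(512, 10e-6, 0.32e-3) corresponds to the r1 = 0.32 mm zone plate used above, and reconstruct(y, H) can then be applied to a coded image y; replacing the plain gradient step with TwIST or FISTA [17, 18] typically converges faster to the same objective.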
6.3.2 Imaging Resolution
In the imaging model of Sect. 6.2, the sizes of the zone plate and image sensor are assumed to be infinite. However, in experiments, both the zone plate and the image sensor have finite sizes, which limits the spatial bandwidth product (SBP) of the imaging. The SBP determines the upper limit of the reso-
Fig. 6.4 (a) Experimental setup of Fresnel zone plate-coded imaging. (b) Bare image sensor. (c) Image sensor assembled with zone plate. (Adapted by permission from Optica: Optics Express, Explicit-restriction convolutional framework for lensless imaging, Ma-Yuchen, et al., © 2022)
Fig. 6.5 Comparison of reconstruction results. The twin image can be effectively eliminated by TV regularization. (Adapted by permission from Springer Nature: Light: Science & Applications, Single-shot lensless imaging with Fresnel zone aperture and incoherent illumination, Wu-Jiachen, et al., © 2020)
lution of the imaging system. Suppose the image sensor can completely record the projection of the zone plate and the pixel pitch satisfies the sampling theorem. According to the additivity of the linear system, the coded image is the superposition of the zone plate projections, and the
highest frequency of the coded image is limited by the highest frequency of the zone plate, while the highest frequency of the zone plate depends on the width of the outermost zone. The width of the outermost zone is related to the zone plate radius R and the zone plate constant r1. When R is fixed, decreasing r1 reduces the width of the outermost zone and improves the imaging resolution. To quantitatively analyze the imaging resolution, we substitute the aperture function A(r) = circ(r/R) into the integral of the second term of Eq. (6.5) and let r_k = 0, I_k = 1. Then, the impulse response function of the imaging system after eliminating the twin image is

I_IRF(r_o) = ∬ exp[iπ(|r − r_o|² − |r|²)/r1²] A(|r|) dS = exp(iπ|r_o|²/r1²) (R r1²/r_o) J_1(2πR r_o/r1²),   (6.13)

where J_1(·) is the first-order Bessel function of the first kind. According to the Rayleigh criterion, the minimum distance to resolve two points is defined as the distance from the center of the impulse response function to its first zero. The impulse response function of Eq. (6.13) has its first zero at 0.61(r1/R)r1, so the minimum resolved distance of ZPCI is

r_c = 0.61 r1²/R.   (6.14)

Furthermore, according to the relationship between the zone width and the zone plate radius, the minimum resolved distance can be approximated as

r_c = 1.22 Δr,   (6.15)
where Δr is the width of the outermost zone. This reveals that the resolution of ZPCI can be estimated from the width of the outermost zone. The ZPCI results for different zone plate constants are presented in Fig. 6.6. The first column shows the zone plates and close-ups of the outermost zones. The radius of the FZP is R = 5.12 mm, and the values of the zone plate constant r1 from top to bottom are 0.8 mm, 0.5 mm, and 0.3 mm; the widths of the outermost zone are 0.063 mm, 0.024 mm, and 0.009 mm, respectively. The second column shows the impulse response functions; the corresponding minimum resolved distances are 0.076 mm, 0.030 mm, and 0.011 mm. The last column shows the reconstructed images, which illustrate that the smaller r1 is, the higher the quality of the reconstruction. It is worth noting that the resolution of ZPCI cannot be improved without limit by reducing r1, because the diffraction effect becomes prominent as the zone density increases, which causes model errors and degrades the reconstructed image quality.
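The quoted values can be checked directly from Eqs. (6.14) and (6.15). The short sketch below uses the outermost-zone width Δr ≈ r1²/(2R), which follows from the standard FZP zone radii r_n = r1√n (a relation assumed here rather than derived in the chapter):

```python
R = 5.12                      # zone plate radius, mm
for r1 in (0.8, 0.5, 0.3):    # zone plate constants, mm
    dr = r1**2 / (2 * R)      # width of the outermost zone
    rc = 0.61 * r1**2 / R     # Eq. (6.14); equivalently 1.22 * dr, Eq. (6.15)
    print(f"r1 = {r1} mm: outermost zone {dr:.4f} mm, resolved distance {rc:.4f} mm")
# r1 = 0.8 mm gives an outermost zone of 0.0625 mm (~0.063 mm) and a resolved
# distance of ~0.076 mm, matching the values quoted above; the other rows match similarly.
```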
6.4 Compressive Reconstruction from Incomplete Measurements
According to the last section, the resolution of ZPCI is related to the zone plate constant and the radius of the zone plate. For the same zone plate, a larger sensor can receive the projected zone plate over a larger radius, so the resolution is higher. However, the high cost of large image sensors limits the popularization of this technology. In this section, a compressive sensing imaging model based on incomplete measurements is introduced, which can realize the reconstruction of high-quality images when part of the measurement data is missing. This method provides a theoretical basis for the image fusion of multiple small-size image sensors.
6.4.1 Compressive Sensing Theory
Compressive sensing theory holds that if the signal is sparse, the original signal can be recovered with nearly 100% probability by finding the sparse solution of an underdetermined linear system at a sampling rate far below the Nyquist frequency. There are two necessary conditions for high-quality signal recovery: sparsity and incoherence. Sparsity means that most of the values of the
Fig. 6.6 Image resolution contrast of the ZPCI with different zone plate constants r1. The left to right columns show the FZPs (the outermost zones are shown in the top left insets), the impulse response functions (the two-dimensional distributions are shown in the top left insets), and the reconstructed images. The values of r1 from top to bottom in rows are (a) 0.8 mm, (b) 0.5 mm, and (c) 0.3 mm. (Reprinted by permission from Springer Nature: Light: Science & Applications, Single-shot lensless imaging with Fresnel zone aperture and incoherent illumination, Wu-Jiachen, et al., © 2020)
signal or its transformation is zero. Natural images are generally sparse in the gradient domain as demonstrated in Sect. 6.3.1. The “incoherence” of compressive sensing requires that the coding process must be a one-to-many mapping relationship. To quantitatively describe the “incoherence” properties, Candes et al. proposed the concept of restricted isometry property (RIP) [19]. For any signal θ of length N and sparsity K, if the
following inequality can be established:

(1 − δ)‖θ‖₂² ≤ ‖Aθ‖₂² ≤ (1 + δ)‖θ‖₂²,   (6.16)
then the sensing matrix A meets the RIP condition, where δ ∈ (0, 1). The RIP condition provides a standard for judging whether the imaging process is suitable for compressive sensing. However, the verifica-
Fig. 6.7 Comparison of signal recovery ability between the GZP measurement matrix and the Gaussian random measurement matrix. (a) One-dimensional GZP distributions with different zone plate constants. (b) Reconstruction error (RMSE) versus measured signal length for the GZP measurement matrices (r1 = 0.8, 0.7, 0.6, and 0.5 mm) and the Gaussian random measurement matrix, with signal length N = 2048 and nonzero elements K = 200. (Reprinted by permission from China Science Publishing & Media Ltd.: Acta Photonica Sinica, Fresnel zone aperture lensless imaging by compressive sensing (invited), Wu-Jiachen, et al., © 2022)
tion of the RIP condition requires C(N, K) = N!/[K!(N − K)!] evaluations, which is impractical for most applications. An alternative method is to compare the signal recovery ability of the matrix with that of a Gaussian measurement matrix. Here, a convolution matrix corresponding to the one-dimensional radial function of the GZP is used as the measurement matrix to test the ability to reconstruct the signal. The signal has length N = 2048 and sparsity K = 200. The zone plate constants used for testing are 0.8 mm, 0.7 mm, 0.6 mm, and 0.5 mm. The results are shown in Fig. 6.7. It can be seen that the smaller r1 is, the smaller the reconstruction error is. When r1 = 0.5 mm, the reconstruction performance is almost the same as that of the Gaussian measurement matrix. This conclusion is consistent with the analysis of ZPCI resolution in Sect. 6.3.2.
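The kind of comparison shown in Fig. 6.7 can be explored with the short sketch below. The signal length, sparsity, number of retained measurements, solver, and regularization weight are illustrative assumptions (the figure uses N = 2048 and K = 200 with a different solver), and the constant 1/2 of Eq. (6.1) is dropped from the GZP profile since it only contributes a DC term:

```python
import numpy as np

def gzp_profile(n, r1, half_width):
    """Zero-mean 1D radial GZP profile, 0.5*cos(pi x^2 / r1^2), sampled on n points."""
    x = np.linspace(-half_width, half_width, n)
    return 0.5 * np.cos(np.pi * x**2 / r1**2)

def ista(A, y, lam=0.02, n_iter=800):
    """Plain ISTA for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of the data term
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

rng = np.random.default_rng(0)
n, k, m = 512, 20, 256
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

profile = gzp_profile(n, r1=0.5, half_width=2.0)
A_conv = np.stack([np.roll(profile, s) for s in range(n)])   # circulant convolution matrix
rows = rng.choice(n, m, replace=False)                       # keep m of the n measurements
A_gzp = A_conv[rows]
A_gauss = rng.standard_normal((m, n)) / np.sqrt(m)

for name, A in (("GZP", A_gzp), ("Gaussian", A_gauss)):
    rmse = np.sqrt(np.mean((ista(A, A @ x_true) - x_true) ** 2))
    print(name, "RMSE:", rmse)
```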
6.4.2 Convolution Model

In numerical calculation, the Fourier transform is usually used to compute the convolution in order to accelerate operations. However, this calculation actually treats the convolution as a circular convolution, which involves a periodic extension of the convolved signal. In the actual imaging process, the convolution between the point spread function and the original image is a linear convolution without periodic extension, so linear convolutional coding combined with circular convolutional reconstruction results in model errors. To eliminate this model error, an equivalent relation between linear convolution and circular convolution should be established. Consider zero-padding the P × P image to Q × Q. If the moving range of the convolution kernel boundary lies within the zero-padding region, the part of the periodic extension falls in the zero-padding region and does not affect the final convolution result. To sum up, the following conclusions can be drawn. First, zero-pad the P × P pixel image to Q × Q pixels. Second, circularly convolve it with the Q × Q pixel point spread function. Finally, take the central (Q − P) × (Q − P) region of the convolution result. This region is equivalent to the linear convolution between the P × P pixel image and the Q × Q pixel point spread function. In particular, when P = Q/2, the equivalent region is the same size as the original image, as shown in Fig. 6.8. In other words, the P × P linear convolution can be regarded as an incomplete measurement of the central P × P region of the 2P × 2P circular convolution.

Fig. 6.8 Equivalent relationship between linear convolution and circular convolution

Based on this conclusion, the objective function, i.e., Eq. (6.12), can be revised as

x̂ = arg min_x { (1/2)‖D(y − Kx)‖_Ω² + τ TV(x) },   (6.17)

where D represents the difference operator, which aims to eliminate the interference of the DC term in the measurement image, and ‖·‖_Ω indicates that the l2 norm is calculated only on the index set Ω.
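The equivalence exploited in Eq. (6.17) can be verified numerically. The sketch below (with illustrative sizes and random test data) circularly convolves a zero-padded P × P image with a centred Q × Q kernel via the FFT and checks that the central P × P block matches the true linear convolution, while the outer region is corrupted by wrap-around:

```python
import numpy as np
from numpy.fft import fft2, ifft2, ifftshift
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
P = 32
Q = 2 * P                                   # the P = Q/2 case discussed above

img = rng.random((P, P))
psf = rng.random((Q, Q))                    # Q x Q PSF, centre taken at index (P, P)

padded = np.zeros((Q, Q))                   # zero-pad the image into the Q x Q frame
padded[P // 2:P // 2 + P, P // 2:P // 2 + P] = img

# FFT multiplication implements *circular* convolution (kernel centred via ifftshift).
circ = np.real(ifft2(fft2(padded) * fft2(ifftshift(psf))))

# Reference *linear* convolution ('full' output, size 2Q-1 per axis).
full = fftconvolve(padded, psf)

# Central P x P block of the circular result vs. the matching block of the linear
# result (offset by P per axis, since the PSF centre sits at index P).
c_crop = circ[P // 2:P // 2 + P, P // 2:P // 2 + P]
l_crop = full[3 * P // 2:3 * P // 2 + P, 3 * P // 2:3 * P // 2 + P]
print(np.max(np.abs(c_crop - l_crop)))                 # essentially zero (round-off only)
print(np.max(np.abs(circ - full[P:P + Q, P:P + Q])))   # large: wrap-around outside the centre
```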
6.4.3 Sampling Pattern
The rectangular sampling area is first used to sample the coded image, and the sampling ratio is calculated by taking the coded image of 256 × 256 pixels as 100% measurement data. By reducing the data amount of the acquired image, the performance of the compressive reconstruction algorithm is tested. The correlation coefficient (CC) is used to evaluate the quality of the reconstructed image, because the distribution range of pixel values of the reconstructed image is different from that of the original image. The results are shown in Fig. 6.9. When the sampling ratio is only 56.3%, the compressive reconstruction can still reconstruct the rough outline of the original image. When the sampling ratio is reduced to 25%, the algorithm can only recover
the central part of the image information due to the loss of the edge information of the sampled image. However, rectangular sampling does not consider the energy distribution of the actual image in the spectral domain. The zone plate-coded image has the same form as a Fresnel diffraction image, which is characterized by low-frequency components in the center and high-frequency components at the edge. Rectangular sampling only collects the low-frequency components in the center of the image, which can lead to the loss of high-frequency information and a blurred reconstruction. Radial sampling ensures that both the central and the edge information of the image is collected. Compared with rectangular sampling, radial sampling can reconstruct high-quality images with less sampling data. As shown in Fig. 6.10, the image reconstructed with a 10.7% sampling ratio in radial sampling has nearly the same reconstruction quality as that reconstructed with 56.3% of the sampled data in rectangular sampling, and the CC of the reconstructed image is 0.89. In the experiment, three sampling patterns, that is, a full sampling pattern, a separated rectangular pattern, and a radial pattern, are used to test the reconstruction. The target image is a rect-
Fig. 6.9 Compressive reconstruction results based on the rectangular sampling pattern. (Adapted by permission from China Science Publishing & Media Ltd.: Acta Photonica Sinica, Fresnel zone aperture lensless imaging by compressive sensing (invited), Wu-Jiachen, et al., © 2022)
Fig. 6.10 Compressive reconstruction results based on the radial sampling pattern. (Adapted by permission from China Science Publishing & Media Ltd.: Acta Photonica Sinica, Fresnel zone aperture lensless imaging by compressive sensing (invited), Wu-Jiachen, et al., © 2022)
Fig. 6.11 Experimental results of compressive reconstruction. (a) Three different sampling modes and their corresponding image reconstruction results; the sampling ratios are 100%, 47.3%, and 7.3%, respectively. (b) The cross-sectional intensity distribution of the letter "O" in the image. (Adapted by permission from China Science Publishing & Media Ltd.: Acta Photonica Sinica, Fresnel zone aperture lensless imaging by compressive sensing (invited), Wu-Jiachen, et al., © 2022)
angular logo containing the word “HOLOLAB”. As shown in Fig. 6.11, under the three different sampling modes, the image contrast did not decrease significantly. In extreme situations where only 7.3% of the data is collected, the method can still identify letters effectively. The rectangular sampling mode can be realized by the stitching of several small image sensors, and the radial sampling mode can be realized by the stitching
of several linear array sensors, which verifies the feasibility of constructing multi-image sensor architecture based on ZPCI.
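A possible construction of the two sampling geometries is sketched below; the exact masks behind Figs. 6.9-6.11 are not specified beyond their rectangular/radial layout, so the sizes, spoke count, and spoke width are illustrative assumptions. The boolean mask defines the index set Ω used in Eq. (6.17):

```python
import numpy as np

def rectangular_mask(n, ratio):
    """Central square covering approximately `ratio` of the n x n coded image."""
    side = int(round(n * np.sqrt(ratio)))
    mask = np.zeros((n, n), dtype=bool)
    start = (n - side) // 2
    mask[start:start + side, start:start + side] = True
    return mask

def radial_mask(n, n_spokes=16, width=1.0):
    """Spokes through the image centre, sampling both low and high frequencies."""
    c = (n - 1) / 2
    y, x = np.mgrid[0:n, 0:n]
    r = np.hypot(x - c, y - c)
    theta = np.arctan2(y - c, x - c) % np.pi          # spokes are full diameters
    step = np.pi / n_spokes
    ang_dist = np.abs(theta - np.round(theta / step) * step)
    return ang_dist * r <= width                      # ~perpendicular distance to a spoke

n = 256
rect = rectangular_mask(n, 0.563)
rad = radial_mask(n)
print(rect.mean(), rad.mean())     # achieved sampling ratios
```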
6.5 Conclusion
ZPCI offers a unique imaging property: it encodes the incident light into a hologram-like pattern, which builds a bridge between holographic imaging and coded mask imaging. Holographic imaging methods, such as back propagation, can be applied to reconstruct the original image. To improve the imaging quality and robustness, the TV regularization-based compressive reconstruction method can suppress the twin image and achieve high-quality images from incomplete measurements. Driven by artificial intelligence technology, ZPCI shows great potential for building miniaturized, integrated, and intelligent image sensors and, with further investigation, would introduce attractive applications to various fields such as photography, machine vision, and biomedical imaging. Acknowledgments This work is partially supported by the National Natural Science Foundation of China (62235009) and the China Postdoctoral Innovative Talent Support Program (BX20220180).
References
1. G. L. Rogers, "Gabor Diffraction Microscopy: the Hologram as a Generalized Zone-Plate," Nature 166, 237 (1950).
2. L. Mertz and N. Young, "Fresnel transformation of images," Optical Instruments and Techniques, 305 (1962).
3. S. K. Chakrabarti, S. Palit, D. Debnath, A. Nandi, V. Yadav, and R. Sarkar, "Fresnel zone plate telescopes for X-ray imaging I: experiments with a quasi-parallel beam," Exp. Astron. 24, 109–126 (2009).
4. H. H. Barrett, "Fresnel zone plate imaging in nuclear medicine," J. Nucl. Med. 13, 382–385 (1972).
5. Z. Zhao, W. He, J. Wang, Y. Hao, L. Cao, Y. Gu, and B. Zhang, "An improved deconvolution method for X-ray coded imaging in inertial confinement fusion," Chin. Phys. B 22, 104202 (2013).
6. J. Soltau, P. Meyer, R. Hartmann, L. Strüder, H. Soltau, and T. Salditt, "Full-field x-ray fluorescence imaging using a Fresnel zone plate coded aperture," Optica 10, 127–133 (2023).
7. Y. Nakamura, T. Shimano, K. Tajima, M. Sao, and T. Hoshizawa, "Lensless Light-field Imaging with Fresnel Zone Aperture," ITE Technical Report, IST2016-51 (2016).
8. K. Tajima, T. Shimano, Y. Nakamura, M. Sao, and T. Hoshizawa, "Lensless light-field imaging with multi-phased Fresnel zone aperture," in 2017 IEEE International Conference on Computational Photography (ICCP), (IEEE, 2017), 1–7.
9. T. Shimano, Y. Nakamura, K. Tajima, M. Sao, and T. Hoshizawa, "Lensless light-field imaging with Fresnel zone aperture: quasi-coherent coding," Appl. Opt. 57, 2841–2850 (2018).
10. S. Mayu, N. Yusuke, T. Kazuyuki, and S. Takeshi, "Lensless close-up imaging with Fresnel zone aperture," Japanese Journal of Applied Physics 57, 09SB05 (2018).
11. J. Wu, H. Zhang, W. Zhang, G. Jin, L. Cao, and G. Barbastathis, "Single-shot lensless imaging with Fresnel zone aperture and incoherent illumination," Light: Science & Applications 9, 53 (2020).
12. Y. Ma, J. Wu, S. Chen, and L. Cao, "Explicit-restriction convolutional framework for lensless imaging," Opt. Express 30, 15266–15278 (2022).
13. J. Wu, L. Cao, and G. Barbastathis, "DNN-FZA camera: a deep learning approach toward broadband FZA lensless imaging," Opt. Lett. 46, 130–133 (2021).
14. T. Nakamura, T. Watanabe, S. Igarashi, X. Chen, K. Tajima, K. Yamaguchi, T. Shimano, and M. Yamaguchi, "Superresolved image reconstruction in FZA lensless camera by color-channel synthesis," Opt. Express 28, 39137–39155 (2020).
15. T. D. Beynon, I. Kirk, and T. R. Mathews, "Gabor zone plate with binary transmittance values," Opt. Lett. 17, 544–546 (1992).
16. W. Zhang, L. Cao, D. J. Brady, H. Zhang, J. Cang, H. Zhang, and G. Jin, "Twin-image-free holography: A compressive sensing approach," Phys. Rev. Lett. 121, 093902 (2018).
17. J. M. Bioucas-Dias and M. A. Figueiredo, "A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration," IEEE Trans. Image Process. 16, 2992–3004 (2007).
18. A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences 2, 183–202 (2009).
19. E. J. Candes, "The restricted isometry property and its implications for compressed sensing," Comptes Rendus Mathematique 346, 589–592 (2008).
7 Spatiotemporal Phase Aperture Coding for Motion Deblurring

Shay Elmalem and Raja Giryes
Abstract
This book chapter proposes a joint optical-digital processing method for motion deblurring in photography. The problem of motion-related image blur limits the exposure time for capturing moving objects, making it challenging to achieve proper exposure. To compensate for this issue, the proposed method utilizes dynamic phase-coding in the lens aperture during image acquisition, which encodes the motion trajectory in an intermediate optical image. The coding embeds cues for both motion direction and extent by coloring the spatial blur of each object. These color cues serve as guidance for a digital deblurring process, implemented using a convolutional neural network trained to utilize such coding for image restoration. The proposed approach encodes cues with no limitation on the motion direction and without sacrificing light efficiency. The advantage of the proposed approach is demonstrated over blind-deblurring methods with no optical coding, as well as over other solutions that use coded acquisition, both in
Shay Elmalem: this work was performed while affiliated with Tel Aviv University S. Elmalem Weizmann Institute of Science, Rehovot, Israel e-mail: [email protected] R. Giryes (✉) Tel Aviv University, Tel Aviv, Israel e-mail: [email protected]
simulation and real-world experiments. The joint optical-digital processing method presented in this chapter provides a promising solution for motion deblurring in photography, which can increase light throughput without motion artifacts and improve image quality. Keywords
Motion blur · Image deblurring · PSF engineering · Aperture coding · Diffractive optical element · Phase-mask · Focus sweep · Spatiotemporal coding · Color-coding · Deep learning · Neural network · Multiscale · U-net
7.1 Introduction
Achieving the proper exposure setting is a well-known challenge in photography, requiring a careful balancing act between aperture size, exposure time, and gain. While increasing exposure time allows more light to reach the sensor, it also introduces motion blur, and increasing aperture size can result in a shallow depth-of-field and optical aberrations. Various solutions have been proposed to automatically balance exposure parameters [35], but they are often specific to the scenario or provide unsatisfactory performance. Alternatively, methods have been developed to eliminate the artifacts introduced by a non-balanced exposure,
such as applying a large gain followed by denoising [9, 29, 32, 45], increasing the aperture followed by out-of-focus deblurring [55], or motion-related blur restoration [28], which is the focus of this work. In addition to post-processing methods, computational imaging solutions [37] aim to redesign the entire imaging system by manipulating the image acquisition process to encode information that can be utilized for post-processing. Deep learning has become a popular framework for image processing and end-to-end design for various applications [5], including extended depth-of-field [12, 13, 21, 30, 47, 54], hyper/multi-spectral imaging [15–17], lensless imaging [2, 4, 8], depth estimation [22, 30, 52, 56], computational microscopy [23, 27, 41], and motion deblurring [1, 6, 10, 31, 42, 48], to name a few. This chapter proposes a joint optical-digital processing method for motion deblurring in photography. The method uses dynamic phase-coding in the lens aperture during image acquisition to encode motion trajectory in an intermediate optical image. The encoded cues guide a digital deblurring process using a convolutional neural network trained to utilize such coding for image restoration. The proposed approach encodes cues with no limitation on the motion direction and without sacrificing light efficiency, demonstrating improved performance over existing solutions in simulation and real-world experiments. This joint optical-digital processing method provides a promising solution for motion deblurring in photography, increasing light throughput without motion artifacts, and improving image quality. The rest of the chapter is organized as follows: Sect. 7.2 reviews existing solutions for motion deblurring. Section 7.3 presents our proposed spatiotemporal aperture coding solution published in [14] with its corresponding post-processing model. Section 7.4 demonstrates the advantage of the proposed solution both in simulation and real-world experiments. Section 7.5 discusses the trade-offs and concludes the chapter.
7.2 Existing Computational Imaging Solutions for Motion Deblurring
Motion deblurring is an important problem in computational photography, and several approaches have been proposed to manipulate the imaging process during exposure to improve deblurring performance. Hybrid imaging [6], light-field cameras [48], and rolling shutter effects [3, 38] are some of the solutions proposed to achieve this goal. One computational imaging solution for motion deblurring is the temporal amplitude coding of the aperture, known as the “fluttered shutter,” developed by Raskar et al. [42]. The technique involves opening and closing the aperture during exposure in a predetermined temporal code to generate a wider frequency response, which leads to improved motion deblurring results. However, the technique requires prior knowledge of the motion direction and extent and suffers from reduced light efficiency due to the closure of the aperture during half of the exposure. To address this issue, a follow-up work proposed a fluttered shutter Point-Spread Function (PSF) estimation method as part of the deblurring process, avoiding the need for prior knowledge of the motion parameters [1]. However, this approach requires a compromise in the deblurring performance, and light efficiency is still reduced. A detailed analysis of the design and implementation of a fluttered shutter camera appears in [49, 50]. The flutter shutter method was extended by Jeon et al. to multi-image photography using complementary fluttering patterns [25]. Levin et al. explored how sensor motion may lead to a motion-invariant PSF [31]. Using such a PSF, one can perform non-blind deconvolution (using the known kernel) on the entire image without estimating each object’s trajectory separately. Using rigorous analysis, the authors demonstrate that motion-invariant PSFs are achieved by a parabolic motion of the image sensor during exposure. Assuming the object’s velocity is within a predefined range, this technique tracks every moving object
during exposure for a moment, during which the object moves at the same speed as the sensor. As each object is tracked by the camera for a brief moment, before moving relative to the camera for the remainder of the exposure, the blur of all objects is the same. Using this method, we can apply a conventional deblurring technique that assumes uniform blur. Despite its major advantage, this approach has one serious drawback: the PSF encoding is limited to the axis along which the parabolic motion occurred. It is important to note that if an object moves in a different direction, then the motion-invariant PSF assumption no longer holds, resulting in a degradation in performance. The ability to deblur the image is completely lost in the case of movement that is orthogonal to the perceived direction. Consequently, Levin et al. proposed an advanced method to resolve this problem by utilizing two orthogonal parabolic motions captured in two images [10] for additional information. This method allows for the deblurring of motion in all directions, but it is more complex to implement, as it requires capturing two images, either consecutively or by using additional lenses. It should be noted that the rigorous model presented in [49] for fluttered shutter cameras can also be applied to parabolic motion cameras. Overall, these approaches have shown promising results in addressing motion blur in imaging systems. However, they also have their own limitations and trade-offs, requiring careful consideration and evaluation when selecting the appropriate method for motion deblurring. In this chapter, we focus on spatiotemporal coding schemes that are described in the next section.
7.3 Spatiotemporal Aperture Coding-Based Deblurring
This chapter presents a novel computational imaging approach for multi-directional motion deblurring from a single image. Our method relies on an innovative encoding scheme that embeds dynamic cues for both the motion trajectory and its extent in the intermediate image (see Fig. 7.1), which serves as a strong prior for both shift-
variant Point-Spread Function (PSF) estimation and the deblurring operation. The encoding is achieved by performing spatiotemporal phase-coding in the lens aperture plane during the image acquisition, which induces a specific chromatic-temporal coupling in the PSF of the coded system. Unlike in a conventional camera, this results in a color-varying spatial blur that encodes the different motion trajectory of each object (see Fig. 7.2). We show in a Fourier analysis that the encoding is performed in the phase domain, and we train a convolutional neural network (CNN) to analyze the embedded cues and use them to reconstruct a deblurred image. The encoding design is performed for a general case, whereby objects move in different directions and at different velocities, allowing blind deblurring of the intermediate image. The required prior information for both PSF estimation and image restoration is embedded in the encoded cues, which the deblurring CNN can use to implicitly estimate the spatially varying PSF and reconstruct a sharp image. We also present an experimental setup of a conventional camera performing the designed spatiotemporal blur and demonstrate its motion deblurring performance both in the presence of uniform and non-uniform multi-directional motion. One of the main challenges in motion deblurring is that the blur kernel is shift-dependent, making linear shift-invariant deconvolution operations unsuitable. However, by encoding cues in the acquired image, we can mitigate some of the hurdles and achieve improved motion deblurring of a general scene. Our method encodes the intermediate image with enough information to allow both estimating and inverting the spatially varying PSF of the acquired image, thus enabling blind deblurring. Finally, we show that our method has good generalization ability. By tuning it in simulation using synthetic data, we can achieve good performance in a real-world experiment with our designed prototype. Overall, our computational imaging approach represents a significant advance in the field of motion deblurring, providing a powerful tool for improving image
Fig. 7.1 Motion deblurring using spatiotemporal phase aperture coding: A moving scene is captured using a camera with spatiotemporal phase aperture coding, which generates a motion-color coded PSF. Using the PSF coding, a CNN performs blind spatially varying motion deblurring. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
Fig. 7.2 Motion-blurred PSF simulation: (left) conventional camera, (middle) gradual focus variation in conventional camera, and (right) the proposed camera: gradual focus variation with phase aperture coding. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
quality in scenarios where objects move in different directions and velocities.
7.3.1 The Spatiotemporally Coded PSF Design

To encode the motion trajectory during image acquisition, a spatiotemporally coded Point-Spread Function (PSF) is designed. The objective is to create a PSF that varies in some way along the motion trajectory, providing cues for both the motion direction and extent. Spatial variations of the PSF along the motion trajectory may introduce spatial blur, which is not desired. Therefore, the PSF variation needs to take place in another dimension. Prior research has also investigated spatiotemporally coded cameras for other computational imaging applications. For example, a rolling shutter mechanism was proposed to assist with optical flow estimation and high-speed photography [19]. By modifying the rolling shutter, a video sequence can be extracted from a single coded exposure photograph [20, 33]. In addition, low-resolution and low-frame-rate video sequences can be converted to higher resolution and higher frame rate using a dynamic mask designed as a spatiotemporal shutter [43] or a fluttered shutter camera [24]. Compressed sensing techniques are used by Llull et al. [34] to extract more than ten frames from a single snapshot, where a moving coded aperture mask is used for the spatiotemporal coding.
To use spatiotemporal coding for the task of motion deblurring, the motion variations are projected onto the color space. By changing the PSF color during the motion, a motion-variant encoding with cues to the motion direction and velocity can be achieved. However, color filtering introduces light loss and requires a mechanism for filter replacement, which is not desired. The design of the spatiotemporally coded PSF is therefore crucial: it must encode the object trajectory during image acquisition, providing cues for both the motion direction and extent while avoiding spatial blur, so that the intermediate image carries enough information to allow both estimating and inverting the spatially varying PSF of the acquired image, improving motion deblurring of a general scene.
To overcome this issue, our proposed design uses a phase-mask for motion-color coding, which allows for motion-variant encoding with cues to the motion direction and velocity. Previous studies have used phase-masks located at the aperture plane for PSF engineering [11–13, 21, 22, 39, 54]. Phase-masks located at the aperture plane offer increased light throughput, making them more desirable than amplitude aperture coding in many applications. The increased light throughput of phase over amplitude aperture coding is significant, and many amplitude-coding based systems reduce the light throughput by about 50% [30, 42, 46]. For the purpose of PSF engineering, phase-masks comprised of several concentric rings are used, similar to those used in various previous works [13, 21, 22, 54]. In the case of a single phase ring and polar coordinates (the example below can be easily extended to a multiple-ring pattern), the phase-mask function PM(r, φ_ring) can be expressed as follows:

PM(r, φ_ring) = exp{jφ_ring},  r_1 < ρ < r_2,
                1,             otherwise,          (7.1)

where r = [r_1, r_2] denotes the normalized coordinates of the ring location and

φ_ring = 2π(n − 1) h_ring / λ   (7.2)
is the phase-shift introduced by a phase-ring. Note that λ is the illumination wavelength, h_ring is the ring height, and n is the refractive index of the mask substrate. As φ_ring ∝ 1/λ, incorporating such a mask in the aperture plane can introduce a predesigned and controlled axial chromatic aberration, which engineers the PSF to have a joint defocus-color dependency. As opposed to a conventional corrected lens (in which the response is designed to be the same for all colors), incorporating such a phase-mask in the lens aperture introduces a discrepancy in the lens response to the different colors. For example, a phase-mask can be designed to generate an in-focus PSF, which is narrow for the blue wavelength band, wider
for green, and even wider for red. To generate such joint defocus-color dependency, the phase-mask is designed in a way that defocus variations change the width of the PSFs at the different colors, such that in another focus plane the narrow PSF color is either green or red, and the "order" of the PSF widths is interchanged. Note that the joint dependency between colors generated by the phase-mask was utilized for both extended depth of focus (EDOF) [13] and depth estimation [22] by focusing the lens to a specific plane in a scene containing objects located at various depths. This causes each object to be blurred by a different blur kernel, according to its defocus condition ψ, defined as:

ψ = (πR²/λ)(1/z_o + 1/z_img − 1/f) = (πR²/λ)(1/z_img − 1/z_i) = (πR²/λ)(1/z_o − 1/z_n),   (7.3)
where z_img is the sensor plane location for an object at the nominal position (z_n); z_i is the ideal image plane for an object located at z_o; f and R are the imaging system focal length and exit pupil radius; and λ is the illumination wavelength. The ψ parameter indicates the maximum of the quadratic phase error (due to defocus) in the pupil function (see [13, 21, 22, 54] for details), so the pupil function P_PM,OOF of a lens with both defocus error and phase-mask is:

P_PM,OOF = P(ρ, θ) PM(r, φ_ring) exp{jψρ²},   (7.4)

where P(ρ, θ) is the in-focus pupil function. The PSF is calculated from the pupil function by the relation [18]:

PSF = |F{P_PM,OOF}|².   (7.5)
The color-depth encoding of the blur kernels makes it possible to achieve high-quality EDOF imaging (which, in the general case, requires blind shift-variant deconvolution) and monocular depth estimation.
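A standard Fourier-optics sketch of Eqs. (7.1), (7.4), and (7.5): a clear circular pupil is multiplied by the ring phase-mask and a quadratic defocus phase, and the incoherent PSF is the squared magnitude of its Fourier transform. The grid size, zero-padding, and example defocus values are illustrative assumptions; for concreteness, the ring parameters quoted later in this section are used, and per-colour PSFs would be obtained by scaling the ring phases by 455 nm/λ for each channel.

```python
import numpy as np

def ring_phase_mask(rho, rings, phases):
    """Concentric-ring phase mask, Eq. (7.1): exp(j*phi) inside each ring, 1 elsewhere."""
    pm = np.ones_like(rho, dtype=complex)
    for (r_in, r_out), phi in zip(rings, phases):
        pm[(rho > r_in) & (rho < r_out)] = np.exp(1j * phi)
    return pm

def coded_psf(psi, rings, phases, n=512):
    """Incoherent PSF of the coded pupil with defocus psi, Eqs. (7.4)-(7.5)."""
    c = np.linspace(-2.0, 2.0, n)                 # pupil plane; aperture radius normalised to 1
    x, y = np.meshgrid(c, c)
    rho = np.hypot(x, y)
    pupil = (rho <= 1.0).astype(complex)          # clear aperture P(rho, theta)
    pupil *= ring_phase_mask(rho, rings, phases)
    pupil *= np.exp(1j * psi * rho**2)            # quadratic defocus phase, Eq. (7.3)
    field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
    psf = np.abs(field) ** 2                      # Eq. (7.5)
    return psf / psf.sum()

# Two-ring mask parameters quoted in the text (phases in rad, given for 455 nm).
rings = [(0.55, 0.8), (0.8, 1.0)]
phases = [6.2, 12.3]
psf_in_focus = coded_psf(0.0, rings, phases)
psf_defocused = coded_psf(4.0, rings, phases)
```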
Previous works such as [13, 21, 54] have utilized chromatic aberration as a cue to estimate the correct PSF for deblurring. In these cases, deblurring is performed using the sharp color channel, which contains the image information, while the other color channels are used to estimate the PSF. This approach works well since natural images almost always have color content in all channels, and pure monochromatic objects are rare. However, in the proposed design, color cues are used to indicate the motion trajectory for shift-dependent PSF estimation. To achieve this, a phase-mask with two rings (similar to the one presented in [13, 21, 22, 54]) is used, where the normalized ring locations are r = [0.55, 0.8] and [0.8, 1] and the corresponding phase values are φ_ring = [6.2, 12.3] rad (measured for λ = 455 nm). The desired spatiotemporal dependency is achieved by modulating the PSF through different focus settings during the exposure, which results in the PSF being "colored" (i.e., narrow in a certain color band and wider in the other bands) rather than being filtered chromatically. The proposed system is different from previous works, such as [13, 21, 54], which use chromatic aberration to estimate the correct PSF for deblurring. Instead, the proposed system uses color cues to indicate the motion trajectory for shift-dependent PSF estimation. This enables the system to achieve a spatiotemporal encoding of the motion blur while avoiding the trade-off between motion blur and spatial blur. The strength of the defocus-color dependency is crucial in this design; therefore, a phase-mask with two rings is used to achieve a strong defocus-color dependency. The two rings are designed to have different phase values, which results in a different PSF color for each ring. The PSF color is defined by the ratio between the PSF widths in the different color channels. The described design assumes an infinite-conjugate imaging setting, which is common in various applications such as security cameras and smartphone cameras. By adding the color-diversity phase-mask, the PSF can be modulated to be "colored" as different focus settings are applied. As the focus setting changes gradually during the exposure, the PSF color varies,
and different motion trajectories are blurred differently in the chromatic dimension. The PSF modulation is performed according to Eq. (7.3), where ψ represents the focus variation. If the lens is focused properly, then ψ = 0. However, if a focus variation is introduced, ψ changes, and the PSF is modulated accordingly. The ψ variation domain is a design parameter, which needs to be traded off between motion extent and sensitivity. We concentrate on the domain of 0 < ψ < 8 (calculated for λ = 455 nm), as for it the mask provides the strongest chromatic separation. For an exposure time of T_exp, the ψ variation is set to ψ(t) = (8/T_exp) t, and the spatiotemporally coded pupil function is:

P_coded = P(ρ, θ) PM(r, φ_ring) exp{jψ(t)ρ²},   (7.6)

with a notation similar to Eq. (7.4). The proposed method utilizes a static phase-mask to introduce a color-defocus coupling, which is manifested in the image by both the dynamic focus setting ψ(t) and the objects' movements. The encoding process is illustrated in Fig. 7.2. In the left panel of Fig. 7.2, the blur of a horizontally moving point source captured by a conventional camera is shown, resulting in a blurred line. When a gradual focus variation ψ(t) is applied to a clear-aperture lens during exposure, as shown in the middle panel of Fig. 7.2, the PSF gets wider in all colors simultaneously, resulting in considerable spatial blur in the last parts of the motion. However, when the same focus variation is performed on a lens equipped with a ring phase-mask, shown in the right panel of Fig. 7.2, the PSF colors change along the motion line, from blue through green to red, thus encoding both the motion extent and velocity. To further demonstrate the motion encoding of our method, we performed simulations of moving point sources using our method and compared it to a conventional camera, the fluttered-shutter camera [42], and the parabolic motion camera [31]. The PSF encoding performed by the different methods is shown in Fig. 7.3, which extends a similar comparison shown in Fig. 3 of [31].
The original scene consists of two sets of point sources arranged in two orthogonal lines. While the dot at the intersection of the two lines stays in place, all the other dots are moving at different velocities, as illustrated by the arrows in panel (a) of Fig. 7.3. Imaging simulation of this scene is performed using the four methods. Using a conventional camera, the stationary dot remains unchanged, and all the other dots are blurred according to their motion trajectory. The fluttered shutter camera blocks parts of the dots' traces, and the code can be clearly seen. As suggested in [42], such a code generates an easy-to-invert PSF, assuming that the motion direction and extent are known. However, an inherent invertibility/estimation trade-off exists, as discussed in [1]. Additionally, the light throughput loss caused by the fluttered shutter is clearly visible (for the code proposed in [42] and used here, the loss is 50%).
The parabolic motion camera, which utilizes parabolic motion in the horizontal direction, shows that the PSF is nearly invariant to motion in the sensor motion direction, as evidenced by the horizontal dots. However, in any other direction, especially in the orthogonal direction (in this case, vertical), the motions of both the dots (linear) and the sensor (parabolic) are combined, resulting in highly motion-variant PSFs. In contrast, our proposed joint phase-mask and focus variation coding method colors each PSF according to the specific motion trajectory. The direction of motion is represented by the blue-green-red transition, and the width of the transition reflects the motion velocity.
Fig. 7.3 Simulation of the different coding methods: The imaging is performed on single-pixel dots to simulate point sources (for visualization purposes, dilation and gamma correction are applied). (a) First frame (arrows indicate dot paths and velocities), (b) last frame, (c) conventional static camera, (d) fluttered shutter camera [42], (e) parabolic motion camera [31], and (f) our proposed camera. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
Spatiotemporal spectra analysis To analyze the motion encoding ability of our scheme, a spatiotemporal spectra analysis of the PSF is
carried out using the spatiotemporal Fourier analysis model proposed in [31]. In this model, a single spatial dimension is examined vs. the temporal dimension, and a 2D Fourier transform (FT) is carried out on the (x, t) slice of the full (x, y, t) space. In such a setting, different velocities of a point source form lines at different angles in the (x, t) plane. The analysis in [31] included only the spectrum amplitude, but in our case we also include the phase, since our encoding is phase dependent. We start by comparing all methods on a static point, represented by a vertical line in the (x, t) plane (see Fig. 7.4). For the conventional static camera, the (x, t) slice of the PSF has a Sinc-function spectrum amplitude, which allows good reconstruction of an object at this velocity (represented by the angle of the (x, t) PSF). Since the conventional PSF is "gray" (i.e., has no chromatic shift along its trajectory), its spectrum phase is also gray; this "gray phase" feature is shared by the fluttered-shutter and parabolic motion cameras, so in all three reference methods the spectrum phase is gray. Our proposed PSF can be considered as an infinite sequence of smaller PSFs, each one of a different color. As all PSFs have a similar spatial shape, but each has a different color and a different location in the (x, t) plane, the spectrum amplitude is "white" and similar to the spectrum amplitude of the conventional PSF. Yet, the phase (which holds the shift information) is colored, according to the shift (i.e., spatiotemporal location) of each color. Our spatiotemporal chromatic coupling can be considered as a utilization of the spectrum phase as a degree of freedom for the coding. The color variations in the phase indicate the coupling between the color and the trajectory, as can be seen in Fig. 7.4 (note that vertical artifacts in the phase plots are due to errors in the phase unwrapping method used in the process). Additional comparisons presenting different velocities (which correspond to different angles in the (x, t) space) are presented in Figs. 7.5 and 7.6. In the conventional camera, no information is encoded in the phase (as can be seen, the three
phases are almost the same). The parabolic motion camera is designed to generate a motion-invariant PSF; therefore, its phase also holds little information (the minor differences are due to the fact that the PSF is not fully motion invariant, owing to the finite parabolic motion). Using our method, the different velocities of the source are coded in the different colored patterns of the spectrum phase. Note that the phase of the fluttered-shutter camera PSF indeed contains some motion estimation information (i.e., the temporal code generates phase variations), but this ability comes with an estimation-invertibility trade-off, as mentioned above and discussed in [1]. In our method, where the color space is utilized for encoding, the motion cues are much stronger and allow improved PSF estimation while preserving PSF invertibility, since in each part of the motion at least part of the spectrum is sharp and can serve as a guide to reconstruct the blurred colors.
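The spatiotemporal analysis above can be reproduced for a simple synthetic case: a point moving at constant velocity traces a tilted line in the (x, t) plane, and its 2D Fourier transform is split into amplitude and phase. A minimal sketch (grid size and velocity are arbitrary; a single "gray" channel is simulated, whereas in the proposed camera the trajectory-dependent chromatic coding makes this phase differ between the R, G, and B channels):

```python
import numpy as np

nx, nt, v = 128, 64, 1.5                      # spatial samples, frames, velocity (px/frame)
psf_xt = np.zeros((nt, nx))
for t in range(nt):                           # a delta shifting by v pixels per frame
    psf_xt[t, int(round(nx / 2 + v * (t - nt / 2))) % nx] = 1.0

spectrum = np.fft.fftshift(np.fft.fft2(psf_xt))
amplitude = np.abs(spectrum)                          # cf. the amplitude panels of Fig. 7.5
phase = np.unwrap(np.angle(spectrum), axis=1)         # the phase encodes the trajectory shifts
```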
7.3.2 Color-Coded Motion Deblurring Neural Network
The spatiotemporal blur kernel generated by the dynamic phase aperture coding produces chromatic variations that encode motion trajectories in all directions. These color cues serve as prior information for shift-variant PSF estimation, allowing effective non-homogeneous motion deblurring. Traditionally, spatially varying deblurring involves two stages: PSF estimation for different objects/segments, followed by deblurring of each segment. Recent work, such as [13, 40], shows that this can be accomplished with a single CNN trained on a dataset containing various shift-variant blur possibilities. The CNN extracts the cues required for PSF estimation and utilizes the information to perform image deblurring.
Training data To train the CNN for our motion deblurring process, images of moving objects blurred with our spatiotemporally varying blur kernel, and their corresponding sharp images, are required. However, experimentally acquiring such images is complex, even without the dynamic aperture coding. Therefore, an imaging simulation using the GoPro dataset [40] is utilized.
Fig. 7.4 PSF spatiotemporal spectra analysis. PSFs and the corresponding spectra for a static point source captured using (top to bottom) static camera, parabolic motion camera, fluttered-shutter camera, and our method. (a) (x, t) slice of the PSF and its (b) amplitude and (c) phase in the Fourier domain. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
The simulated images are generated by blurring consecutive frames using the coded kernel and then summing them up. A dataset of 2,500 images, containing sequences of 9 frames, is created, with 80% used for training and the rest for validation (the original test set is used for testing). Since our deblurring process relies on local cues encoded by our spatiotemporal kernel rather than on image statistics, a CNN trained on this synthetic data can generalize well to real-world images, as shown in the following section.

Fig. 7.5 PSF spatiotemporal spectra analysis. PSFs and the corresponding spectra for a moving point source (the velocity is indicated by the angle in the (x, t) plane) captured using (top to bottom) static camera, parabolic motion camera, fluttered-shutter camera, and our method. (a) (x, t) slice of the PSF and its (b) amplitude and (c) phase in the Fourier domain. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
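A minimal sketch of this blur-synthesis step is given below. It assumes the per-frame, color-dependent kernels of the coded PSF are already available (the coded_kernels list is a placeholder, not part of any released pipeline); the coded exposure is approximated by convolving each sharp frame with its time-indexed kernel and averaging over the sequence.

```python
import numpy as np
from scipy.signal import fftconvolve

def coded_blur(frames, kernels):
    """Simulate a single coded exposure: convolve each sharp frame with its
    time-indexed (color-dependent) kernel and average over the exposure.
    frames: list of HxWx3 float arrays; kernels: list of kxkx3 float arrays."""
    acc = np.zeros_like(frames[0], dtype=np.float64)
    for frame, kernel in zip(frames, kernels):
        for c in range(3):  # apply the per-channel kernel of this time step
            acc[..., c] += fftconvolve(frame[..., c], kernel[..., c], mode="same")
    return acc / len(frames)

# Usage sketch: 'sharp_frames' would be 9 consecutive GoPro frames and
# 'coded_kernels' the simulated spatiotemporal (color-varying) PSFs; the
# blurred result and a sharp reference frame form one training pair.
```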
Fig. 7.6 PSF spatiotemporal spectra analysis. PSFs and the corresponding spectra for a moving point source (the velocity is indicated by the angle in the (x, t) plane) captured using (top to bottom) static camera, parabolic motion camera, fluttered-shutter camera, and our method. (a) (x, t) slice of the PSF and its (b) amplitude and (c) phase in the Fourier domain. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
The deblurring network architecture In order to restore blurry images, a fully convolutional network (FCN) architecture is utilized. The U-Net structure is chosen as it is a popular multiscale FCN architecture and is effective at capturing the structure of motion-blurred objects.
Table 7.1 CNN layers: the number of filters in each convolution operation

Layer      N_in    N_out
CONV_in      3       64
DOWN_1      64      128
DOWN_2     128      256
DOWN_3     256      512
DOWN_4     512      512
UP_1      1024      256
UP_2       512      128
UP_3       256       64
UP_4       128       64
CONV_out    64        3
The U-Net model used contains four downsampling blocks and their corresponding four upsampling blocks, with additional convolutions at the input and output. Each convolution operation consists of 3 × 3 filters, followed by a BatchNorm layer and a Leaky-ReLU activation. Each CONV block contains a double Conv-BN-ReLU sequence. Downsampling blocks perform convolution and then a 2 × 2 max-pooling operation. Upsampling blocks perform a ×2 trainable upsampling (transpose-Conv layer), a CONV block, and a concatenation with the corresponding downsampling block's output. A skip-connection is added between the input and output, which allows the "U" structure to estimate a "residual" correction to the input blurred image. The number of filters in each layer is specified in Table 7.1. The U-Net architecture is trained using patches of size 128 × 128 taken from the dataset, and noise augmentation is used to simulate the noise of real images taken with the target camera. The network is trained using the Huber loss, and the average reconstruction results on the test set are PSNR = 29.5 and SSIM = 0.93. To quantify the benefit of the proposed PSF encoding, a version of the same dataset without the spatiotemporal coding was generated and the same architecture was trained on it. In this case, significant over-fitting occurred, resulting in poor results on the test set (PSNR = 24.6, SSIM = 0.84). Another network structure, similar to the one presented in [13, 22], was also evaluated. This model is much shallower, containing ten consecutive blocks, where each block consists of Conv-BN-ReLU layers (without any pooling). A skip connection is made from the input to the output (similar to our U-Net structure). The filter size in each block is 3 × 3, and each layer has 32 filters (besides the output layer, which has only three, to generate the final residual image). This architecture achieves only nominal performance (PSNR = 27.5, SSIM = 0.9), as multiscale information is important for this task. Nevertheless, with just 2% of the weights of the full U-Net model, it still achieves results comparable to the model of [40]. This comparison demonstrates the benefit of the aperture coding: the encoded cues provide strong guidance for the deblurring operation, enabling a very shallow model to achieve performance comparable to that of a much larger one.
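The following PyTorch sketch assembles a U-Net with the channel counts of Table 7.1 and the global residual skip described above. It is an illustrative reconstruction, not the authors' released model: the exact ordering of upsampling, concatenation, and convolution, the Leaky-ReLU slope, and the padding choices are assumptions made so that the channel arithmetic stays consistent with the table.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """Double Conv-BN-LeakyReLU sequence with 3x3 filters (the 'CONV block')."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2, inplace=True),
    )

class DeblurUNet(nn.Module):
    """U-Net following the channel counts of Table 7.1, with a global residual skip."""
    def __init__(self):
        super().__init__()
        self.conv_in = conv_block(3, 64)
        self.down1, self.down2 = conv_block(64, 128), conv_block(128, 256)
        self.down3, self.down4 = conv_block(256, 512), conv_block(512, 512)
        self.pool = nn.MaxPool2d(2)
        # Transpose-convolutions provide the x2 trainable upsampling.
        self.up1_t = nn.ConvTranspose2d(512, 512, 2, stride=2)
        self.up2_t = nn.ConvTranspose2d(256, 256, 2, stride=2)
        self.up3_t = nn.ConvTranspose2d(128, 128, 2, stride=2)
        self.up4_t = nn.ConvTranspose2d(64, 64, 2, stride=2)
        # CONV blocks act on the concatenation with the matching encoder output,
        # hence the 1024, 512, 256, 128 input channels of Table 7.1.
        self.up1, self.up2 = conv_block(1024, 256), conv_block(512, 128)
        self.up3, self.up4 = conv_block(256, 64), conv_block(128, 64)
        self.conv_out = nn.Conv2d(64, 3, 3, padding=1)

    def forward(self, x):
        e0 = self.conv_in(x)                   # 64 channels, full resolution
        e1 = self.down1(self.pool(e0))         # 128, 1/2
        e2 = self.down2(self.pool(e1))         # 256, 1/4
        e3 = self.down3(self.pool(e2))         # 512, 1/8
        e4 = self.down4(self.pool(e3))         # 512, 1/16
        d1 = self.up1(torch.cat([self.up1_t(e4), e3], dim=1))  # 1024 -> 256
        d2 = self.up2(torch.cat([self.up2_t(d1), e2], dim=1))  # 512 -> 128
        d3 = self.up3(torch.cat([self.up3_t(d2), e1], dim=1))  # 256 -> 64
        d4 = self.up4(torch.cat([self.up4_t(d3), e0], dim=1))  # 128 -> 64
        return x + self.conv_out(d4)           # global residual correction

# Sanity check on a 128x128 training patch.
out = DeblurUNet()(torch.randn(1, 3, 128, 128))
```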
7.4 Experiments
We begin by testing the proposed method in simulation and present two different comparisons. First, we compare our dynamic aperture phase coding approach to other computational imaging methods, namely the fluttered shutter camera [42] and the parabolic motion camera [31], demonstrating the superiority of our method. Second, we compare our approach to the deblurring CNN presented by Nah et al. [40], which is designed for conventional cameras, showcasing the advantages of the coded aperture. Real-world results, acquired using a prototype of the spatiotemporally coded camera, are presented subsequently.
7.4.1 Comparison to Other Coding Methods
To demonstrate our method's PSF estimation capability in motion deblurring, in comparison to the motion-direction sensitivity of other methods, we simulate a scene with a rotating spoke resolution target. The scene contains motion in all directions and velocities simultaneously, making it challenging. The synthetic scene serves as an input to the imaging simulation for the three different methods (fluttered shutter, parabolic motion, and our method). The fluttered-shutter code being used (both in the imaging and in the reconstruction) is for motion to the right, matching the extent of the linear motion of the outer parts of the spoke target. The parabolic motion takes place in the horizontal direction. Each imaging result is corrupted with additive white Gaussian noise (AWGN) with σ = 3 to simulate a real imaging scenario in good lighting conditions (since the fluttered-shutter coding blocks 50% of the light throughput, the noise level of its image is practically doubled). Figure 7.7 presents the intermediate images and the deblurring results of the three different techniques. The imaging simulations of the fluttered-shutter and parabolic motion cameras were implemented by us, following the descriptions in [31, 42]. The fluttered-shutter reconstruction is performed using the code released by the authors. In [31], the parabolic motion reconstruction is performed using an algorithm developed by the authors, whose implementation is not available. In Section 4.1 of [31], the authors suggest using the Lucy-Richardson deconvolution algorithm [36, 44], which achieves almost the same performance as the original algorithm used in [31]. However, both options are non-DL-based methods. For a fairer comparison, we chose to implement the deblurring operation using the IDBP non-blind deblurring method [51], which utilizes a denoising CNN in a back-projection process, thus employing the strengths of both model-based and learning-based deblurring methods (this method achieved state-of-the-art results for various tasks, as reported in [51]). This method indeed provided superior results. However, the main issue in the current comparison is the sensitivity of the other coding methods to the motion direction, which is not related to the reconstruction algorithm used. The fluttered-shutter-based reconstruction restores the general form of the area with the corresponding motion coding (outer lower part, moving right), and some of the opposite direction (outer upper part, moving left), and fails in all other directions/velocities. This can be partially solved using a different coding that allows both PSF estimation and inversion. Yet, such a scheme introduces an estimation-invertibility trade-off. Moreover, a rotating target is a challenging case for shift-variant PSF estimation, and if a restoration with an incorrect PSF is performed, it leads to poor results (as can be seen in Fig. 7.7). In addition, the increased noise sensitivity of this approach is apparent, as it blocks 50% of the light throughput. The method based on parabolic motion is effective in reconstructing the horizontal motion in both directions (left and right), which is evident from the clear images of the upper and lower parts of the spoke that move horizontally. However, it should be noted that this method does not perform equally well for both left and right directions due to the finite nature of practical parabolic motion, which makes it impossible to generate a truly motion-invariant PSF. Additionally, vertical motions are not encoded adequately by this method, leading to poor reconstructions. In contrast, our proposed method can estimate motion in all directions, enabling shift-variant blind deblurring of the scene.
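As a hedged illustration of the coded-exposure simulation described above, the snippet below synthesizes a fluttered-shutter-style exposure from a stack of sharp frames and adds AWGN on a 0–255 scale. The binary code used here is a generic placeholder rather than the optimized code of [42]; a conventional exposure corresponds to an all-ones code.

```python
import numpy as np

def coded_exposure(frames, shutter_code, sigma=3.0, rng=np.random.default_rng(0)):
    """Simulate a coded exposure: frames accumulate light only when the binary
    shutter code is 'open'; AWGN with std 'sigma' is then added (0-255 scale)."""
    frames = np.asarray(frames, dtype=np.float64)       # shape (T, H, W)
    code = np.asarray(shutter_code, dtype=np.float64)   # shape (T,) of 0/1 chops
    # Closed chops contribute no light, so a 50%-open code yields roughly half
    # the signal of a conventional exposure while the sensor noise stays fixed.
    blurred = (code[:, None, None] * frames).mean(axis=0)
    noisy = blurred + rng.normal(0.0, sigma, blurred.shape)
    return np.clip(noisy, 0, 255)

# Usage sketch: frames = T rendered views of the rotating spoke target;
# shutter_code = np.ones(T) for a static camera, or a pseudo-random 0/1
# sequence for the fluttered shutter.
```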
7.4.2 Comparison to CNN-Based Blind Deblurring
In order to assess the advantages of our motion-cues coding method, we compared it to the multiscale motion deblurring CNN presented by Nah et al. [40], using the test set of the GoPro dataset as input. Since Nah et al. trained their model on sequences of between 7 and 13 frames, we created similar scenes using both our coding method and
Fig. 7.7 Simulation results of rotating target: (a) rotating target and the intermediate images vs. reconstruction results for (b, c) fluttered-shutter, (d, e) parabolic motion camera and (f, g) our method. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
simple frame summation, as used in [40], with the proper gamma-related transformations. It is worth noting that in our case, a spatial blur related to diffraction is added on top of the motion blur, so our model has to handle a more challenging task. We compared the reconstruction results for several noise levels, with σ varying between 0 and 3 on a scale of 0–255. The measures for each motion length were averaged over the different noise levels, and the results are displayed in Table 7.2. As can be seen, our method outperforms the method of Nah et al. [40] in terms of recovery error, as demonstrated by higher PSNR and SSIM scores. Both methods provide visually pleasing restorations for small motion lengths, but our method is more accurate in terms of PSNR/SSIM. As the motion length increases, the advantage of our method becomes more significant, which can be attributed to the fact that the architecture used in [40] is trained using an adversarial loss, leading to data hallucination during reconstruction. As the motion length increases, this data hallucination becomes less accurate, resulting in a more significant reduction in PSNR/SSIM scores. In contrast, our method employs encoded motion cues for the reconstruction, resulting in more accurate results and a better perception-distortion trade-off [7]. Notably, our model is trained only on images generated using sequences of 9 frames, yet its deblurring performance for shorter and longer sequences is superior to that of [40], which is trained on sequences ranging from 7 to 13 frames. This indicates that our model has learned to extract color-motion cues and utilize them for the image deblurring task, beyond the specific extent present in the training data. Our advantage is even more prominent when the motion length is larger, although this comparison is not entirely fair to [40], as their model is trained on sequences of up to 13 frames. Additionally, in our dataset an additional diffraction-related spatial blur is added, and our advantage over [40] becomes even more pronounced when a similar spatial blur is added to the original GoPro dataset (without the motion-color cues). Furthermore, our method is more robust to the level of noise in the image. The results presented here are limited to the range
Table 7.2 Quantitative comparison to blind deblurring: PSNR/SSIM comparison between the method presented in [40] and our method, for various lengths of motion (N_frames). Results are the average measures for various scenes and different noise levels. See the supplementary material of [14] for per-noise-level statistics and additional information.

N_frames   Nah et al.   Ours
N = 7      28.1/0.93    30.9/0.95
N = 9      27.0/0.91    30.0/0.94
N = 11     25.9/0.89    28.9/0.92
N = 13     24.9/0.87    28.0/0.91
σ = [0, 3] to facilitate a fair comparison to [40], which is trained for a noise level of σ = 2 (see the supplementary material of [14] for additional results per noise level and results for higher noise levels).
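The PSNR/SSIM measures reported in Table 7.2 can be computed, for instance, with scikit-image, as in the sketch below. The channel_axis argument assumes a recent scikit-image version, and the data range of 255 matches the 0–255 scale used above; this is only one possible way to evaluate the metrics, not the authors' evaluation script.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(restored, ground_truth):
    """Average PSNR/SSIM over a set of image pairs given as [0, 255] arrays."""
    psnr_vals, ssim_vals = [], []
    for rec, gt in zip(restored, ground_truth):
        psnr_vals.append(peak_signal_noise_ratio(gt, rec, data_range=255))
        ssim_vals.append(structural_similarity(gt, rec, data_range=255, channel_axis=-1))
    return np.mean(psnr_vals), np.mean(ssim_vals)
```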
7.4.3 Table-Top Experiment
To validate the effectiveness of our proposed PSF spatiotemporal encoding, we conducted a real-world experiment using a custom-built setup (see Figs. 7.8 and 7.9), following the simulation results presented earlier. The setup includes an 18 MP camera with a C-mount lens of focal length f = 12 mm (Edmund Cx C-mount lens #33632) and a pixel size of 1.25 μm. We added a phase-mask similar to the one used in [13, 22] and a liquid focusing lens (Corning Varioptic A25H0 lens) in the aperture plane of the main lens. The liquid lens is calibrated to vary the focus during the exposure by an amount equivalent to ψ = 8t/T_exp, as described in Sect. 7.3. We trigger the liquid lens using the camera's flash activation signal to achieve the desired focus variation. In the first experiment, we used two white LEDs mounted on a spinning wheel as point sources, similar to the ones used in the simulations shown in Fig. 7.3. We captured a motion-blurred image of the spinning LEDs using the phase-mask and the proper focus variation during exposure. The resulting image, presented in Figs. 7.10 and 7.11, shows a clear and gradual change in color along the motion trajectory. Next, we performed a deblurring experiment on moving objects using a rotating photo as the
Fig. 7.8 The table-top experimental setup: The liquid lens and phase-mask are incorporated in the C-mount lens. The micro-controller synchronizes the focus variation to the frame exposure using the camera flash signal. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
Fig. 7.9 Block diagram of the experimental setup: Following the setup photo presented in Fig. 7.8, the block diagram describes the setup structure and interconnections. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
Fig. 7.10 PSF encoding validation experiment: two rotating white LEDs simulating point sources captured using our camera. The color-coded motion trace indicates both direction and velocity. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
Fig. 7.11 Experimental validation of PSF coding: A zoom-in on the moving white LED captured with our camera validates the required PSF encoding. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
Fig. 7.12 Rotating image experiment: reconstruction results of (top) rotating photo and (bottom) zoom-ins, using (a) our method and (b) Nah et al. reconstruction [40]. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
target. To examine various motion directions and velocities, we captured an image of the rotating object and processed it using the CNN described in Sect. 7.3. It is worth noting that the CNN was trained purely on simulated images, and no fine-tuning to the experimental PSF was performed. For comparison, we also captured an image of the same rotating object using a conventional camera without the phase-mask and applied multiscale motion deblurring using the CNN of Nah
et al. [40]. The results, presented in Figs. 7.12 and 7.13, demonstrate the superior performance of our camera over the conventional camera and the existing deblurring method. We also captured an image of a linearly moving toy train using the same experimental configuration as the previous example, and the results are presented in Fig. 7.14. As seen in the figure, our camera outperforms the conventional camera and the existing deblurring method in this scenario as well.
Fig. 7.13 Rotating image experiment: reconstruction results of (top) rotating photo and (bottom) zoom-ins, using (a) our method and (b) Nah et al. reconstruction [40]. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
7.5 Summary and Conclusion

In conclusion, we have presented a novel approach for blind motion deblurring based on spatiotemporal phase coding of the lens aperture. Our approach encodes a motion-variant PSF by utilizing both static and dynamic components of phase coding and then uses a CNN for joint PSF estimation and motion deblurring. It achieves better performance than other blind deblurring methods and computational-imaging-based strategies in various scenarios, without imposing a limitation on the motion direction. The shift-variant PSF estimation ability of our approach and its generalization potential to real-world scenes have been analyzed and discussed. The proposed spatiotemporal PSF color encoding also provides cues to the entire motion trajectory, making our approach potentially useful for video-from-motion and temporal super-resolution applications. In summary, our approach presents a significant contribution to the field of computational imaging and has the potential to enable new applications in various domains, including photography, videography, and medical imaging, similar to [20, 24, 26, 33, 34]. Recently, the approach proposed in this work has been utilized for the task of reconstructing video from a single blurred image [53]. In this follow-up work, a spatiotemporally coded camera for video reconstruction from motion blur has been proposed. This method utilizes the motion blur limitation to encode motion cues in a single coded image, which are then used to reconstruct a frame burst of the scene using a CNN trained for this purpose. By choosing a sequence of relative time parameters, a sharp frame burst can be reconstructed. Simulation and real-world results demonstrate improved performance compared to existing methods based on conventional imaging, both in reconstruction quality and in handling the inherent direction ambiguity. This method can also assist camera designers in balancing the various trade-offs between frame rate, sampling rate, and light efficiency. To summarize, the proposed spatiotemporal approach in this chapter shows great potential in extending existing photography capabilities with simple and minor hardware changes.

Fig. 7.14 Train experiment: Recovery results of a moving train using (a) our method and (b) Nah et al. reconstruction [40]. (Reproduced by permission from Optica: Optica, Motion deblurring using spatiotemporal phase aperture coding, Elmalem, S., et al., © 2020)
References

1. Agrawal, A.K., Xu, Y.: Coded exposure deblurring: Optimized codes for PSF estimation and invertibility. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2066–2073 (2009)
2. Antipa, N., Kuo, G., Heckel, R., Mildenhall, B., Bostan, E., Ng, R., Waller, L.: DiffuserCam: lensless single-exposure 3D imaging. Optica 5(1), 1–9 (2018). https://doi.org/10.1364/OPTICA.5.000001
3. Antipa, N., Oare, P., Bostan, E., Ng, R., Waller, L.: Video from stills: Lensless imaging with rolling shutter. In: 2019 IEEE International Conference on Computational Photography (ICCP), pp. 1–8 (2019)
4. Asif, M.S., Ayremlou, A., Sankaranarayanan, A., Veeraraghavan, A., Baraniuk, R.G.: FlatCam: Thin, lensless cameras using coded aperture and computation. IEEE Transactions on Computational Imaging 3(3), 384–397 (2017)
5. Barbastathis, G., Ozcan, A., Situ, G.: On the use of deep learning for computational imaging. Optica 6(8), 921–943 (2019). https://doi.org/10.1364/OPTICA.6.000921
6. Ben-Ezra, M., Nayar, S.K.: Motion deblurring using hybrid imaging. In: CVPR (2003)
7. Blau, Y., Michaeli, T.: The perception-distortion tradeoff. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
8. Boominathan, V., Adams, J., Robinson, J., Veeraraghavan, A.: PhlatCam: Designed phase-mask based thin lensless camera. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1 (2020)
9. Chen, C., Chen, Q., Xu, J., Koltun, V.: Learning to see in the dark. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
10. Cho, T.S., Levin, A., Durand, F., Freeman, W.T.: Motion blur removal with orthogonal parabolic exposures. In: 2010 IEEE International Conference on Computational Photography (ICCP), pp. 1–8 (2010). https://doi.org/10.1109/ICCPHOT.2010.5585100
11. Cossairt, O., Zhou, C., Nayar, S.: Diffusion coded photography for extended depth of field. ACM Trans. Graph. 29(4) (2010). https://doi.org/10.1145/1778765.1778768
12. Dowski, E.R., Cathey, W.T.: Extended depth of field through wave-front coding. Appl. Opt. 34(11), 1859–1866 (1995). https://doi.org/10.1364/AO.34.001859
13. Elmalem, S., Giryes, R., Marom, E.: Learned phase coded aperture for the benefit of depth of field extension. Opt. Express 26(12), 15316–15331 (2018). https://doi.org/10.1364/OE.26.015316
14. Elmalem, S., Giryes, R., Marom, E.: Motion deblurring using spatiotemporal phase aperture coding. Optica 7(10), 1332–1340 (2020). https://doi.org/10.1364/OPTICA.399533
15. Gedalin, D., Oiknine, Y., Stern, A.: DeepCubeNet: reconstruction of spectrally compressive sensed hyperspectral images with deep neural networks. Opt. Express 27(24), 35811–35822 (2019). https://doi.org/10.1364/OE.27.035811
16. Gehm, M.E., John, R., Brady, D.J., Willett, R.M., Schulz, T.J., et al.: Single-shot compressive spectral imaging with a dual-disperser architecture. Opt. Express 15(21), 14013–14027 (2007)
17. Golub, M.A., Averbuch, A., Nathan, M., Zheludev, V.A., Hauser, J., Gurevitch, S., Malinsky, R., Kagan, A.: Compressed sensing snapshot spectral imaging by a regular digital camera with an added optical diffuser. Appl. Opt. 55(3), 432–443 (2016). https://doi.org/10.1364/AO.55.000432
18. Goodman, J.: Introduction to Fourier Optics, 2nd edn. McGraw-Hill (1996)
19. Gu, J., Hitomi, Y., Mitsunaga, T., Nayar, S.K.: Coded rolling shutter photography: Flexible space-time sampling. In: 2010 IEEE International Conference on Computational Photography (ICCP), pp. 1–8 (2010)
20. Gupta, M., Mitsunaga, T., Hitomi, Y., Gu, J., Nayar, S.K.: Video from a single coded exposure photograph using a learned over-complete dictionary. In: 2011 IEEE International Conference on Computer Vision (ICCV 2011), pp. 287–294. IEEE Computer Society, Los Alamitos, CA, USA (2011). https://doi.org/10.1109/ICCV.2011.6126254
21. Haim, H., Bronstein, A., Marom, E.: Computational multi-focus imaging combining sparse model with color dependent phase mask. Opt. Express 23(19), 24547–24556 (2015). https://doi.org/10.1364/OE.23.024547
22. Haim, H., Elmalem, S., Giryes, R., Bronstein, A., Marom, E.: Depth estimation from a single image using deep learned phase coded mask. IEEE Transactions on Computational Imaging, pp. 298–310 (2018). https://doi.org/10.1109/TCI.2018.2849326
23. Hershko, E., Weiss, L.E., Michaeli, T., Shechtman, Y.: Multicolor localization microscopy and point-spread-function engineering by deep learning. Opt. Express 27(5), 6158–6183 (2019). https://doi.org/10.1364/OE.27.006158
24. Holloway, J., Sankaranarayanan, A.C., Veeraraghavan, A., Tambe, S.: Flutter shutter video camera for compressive sensing of videos. In: 2012 IEEE International Conference on Computational Photography (ICCP), pp. 1–9 (2012). https://doi.org/10.1109/ICCPhot.2012.6215211
25. Jeon, H., Lee, J., Han, Y., Kim, S.J., Kweon, I.S.: Multi-image deblurring using complementary sets of fluttering patterns. IEEE Transactions on Image Processing 26(5), 2311–2326 (2017). https://doi.org/10.1109/TIP.2017.2675202
26. Jin, M., Meishvili, G., Favaro, P.: Learning to extract a video sequence from a single motion-blurred image. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
27. Kellman, M., Bostan, E., Repina, N.A., Waller, L.: Physics-based learned design: Optimized coded-illumination for quantitative phase imaging. IEEE Transactions on Computational Imaging 5(3), 344–353 (2019)
28. Lai, W., Huang, J., Hu, Z., Ahuja, N., Yang, M.: A comparative study for single image blind deblurring. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1701–1709 (2016). https://doi.org/10.1109/CVPR.2016.188
29. Lefkimmiatis, S.: Non-local color image denoising with convolutional neural networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
30. Levin, A., Fergus, R., Durand, F., Freeman, W.T.: Image and depth from a conventional camera with a coded aperture. In: ACM SIGGRAPH 2007 Papers, SIGGRAPH '07. ACM, New York, NY, USA (2007). https://doi.org/10.1145/1275808.1276464
31. Levin, A., Sand, P., Cho, T.S., Durand, F., Freeman, W.T.: Motion-invariant photography. ACM Transactions on Graphics (SIGGRAPH) (2008)
32. Liba, O., Murthy, K., Tsai, Y.T., Brooks, T., Xue, T., Karnad, N., He, Q., Barron, J.T., Sharlet, D., Geiss, R., Hasinoff, S.W., Pritch, Y., Levoy, M.: Handheld mobile photography in very low light (2019)
33. Liu, D., Gu, J., Hitomi, Y., Gupta, M., Mitsunaga, T., Nayar, S.K.: Efficient space-time sampling with pixel-wise coded exposure for high-speed imaging. IEEE Transactions on Pattern Analysis and Machine Intelligence 36(2), 248–260 (2014). https://doi.org/10.1109/TPAMI.2013.129
34. Llull, P., Liao, X., Yuan, X., Yang, J., Kittle, D., Carin, L., Sapiro, G., Brady, D.J.: Coded aperture compressive temporal imaging. Opt. Express 21(9), 10526–10545 (2013). https://doi.org/10.1364/OE.21.010526
35. London, B., Upton, J., Stone, J.: Photography. Pearson (2013)
36. Lucy, L.B.: An iterative technique for the rectification of observed distributions. Astron. J. 79, 745–754 (1974). https://doi.org/10.1086/111605
37. Mait, J.N., Euliss, G.W., Athale, R.A.: Computational imaging. Adv. Opt. Photon. 10(2), 409–483 (2018). https://doi.org/10.1364/AOP.10.000409
38. Mohan, M.M.R., Rajagopalan, A.N., Seetharaman, G.: Going unconstrained with rolling shutter deblurring. In: The IEEE International Conference on Computer Vision (ICCV) (2017)
39. Nagahara, H., Kuthirummal, S., Zhou, C., Nayar, S.K.: Flexible depth of field photography. In: Computer Vision – ECCV 2008, pp. 60–73. Springer (2008)
40. Nah, S., Kim, T.H., Lee, K.M.: Deep multi-scale convolutional neural network for dynamic scene deblurring. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
41. Nehme, E., Weiss, L.E., Michaeli, T., Shechtman, Y.: Deep-STORM: super-resolution single-molecule microscopy by deep learning. Optica 5(4), 458–464 (2018). https://doi.org/10.1364/OPTICA.5.000458
42. Raskar, R., Agrawal, A., Tumblin, J.: Coded exposure photography: Motion deblurring using fluttered shutter. In: ACM SIGGRAPH 2006 Papers, SIGGRAPH '06, pp. 795–804. ACM, New York, NY, USA (2006). https://doi.org/10.1145/1179352.1141957
43. Reddy, D., Veeraraghavan, A., Chellappa, R.: P2C2: Programmable pixel compressive camera for high speed imaging. In: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '11, pp. 329–336. IEEE Computer Society, Washington, DC, USA (2011). https://doi.org/10.1109/CVPR.2011.5995542
44. Richardson, W.H.: Bayesian-based iterative method of image restoration. J. Opt. Soc. Am. 62(1), 55–59 (1972). https://doi.org/10.1364/JOSA.62.000055
45. Schwartz, E., Giryes, R., Bronstein, A.M.: DeepISP: Toward learning an end-to-end image processing pipeline. IEEE Transactions on Image Processing 28, 912–923 (2018)
46. Shedligeri, P.A., Mohan, S., Mitra, K.: Data driven coded aperture design for depth recovery. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 56–60 (2017). https://doi.org/10.1109/ICIP.2017.8296242
47. Sitzmann, V., Diamond, S., Peng, Y., Dun, X., Boyd, S., Heidrich, W., Heide, F., Wetzstein, G.: End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging. ACM Trans. Graph. 37(4) (2018). https://doi.org/10.1145/3197517.3201333
48. Srinivasan, P.P., Ng, R., Ramamoorthi, R.: Light field blind motion deblurring. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
49. Tendero, Y., Morel, J., Rougé, B.: The flutter shutter paradox. SIAM Journal on Imaging Sciences 6(2), 813–847 (2013). https://doi.org/10.1137/120880665
50. Tendero, Y., Osher, S.: On a mathematical theory of coded exposure. Research in the Mathematical Sciences 3(1), 4 (2016). https://doi.org/10.1186/s40687-015-0051-8
51. Tirer, T., Giryes, R.: Image restoration by iterative denoising and backward projections. IEEE Transactions on Image Processing 28(3), 1220–1234 (2019)
52. Wu, Y., Boominathan, V., Chen, H., Sankaranarayanan, A., Veeraraghavan, A.: PhaseCam3D – learning phase masks for passive single view depth estimation. In: 2019 IEEE International Conference on Computational Photography (ICCP), pp. 1–12 (2019)
53. Yosef, E., Elmalem, S., Giryes, R.: Video reconstruction from a single motion blurred image using learned dynamic phase coding (2023)
54. Zalevsky, Z., Shemer, A., Zlotnik, A., Eliezer, E.B., Marom, E.: All-optical axial super resolving imaging using a low-frequency binary-phase mask. Opt. Express 14(7), 2631–2643 (2006). https://doi.org/10.1364/OE.14.002631
55. Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep CNN denoiser prior for image restoration. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
56. Zhou, C., Lin, S., Nayar, S.K.: Coded aperture pairs for depth from defocus and defocus deblurring. International Journal of Computer Vision 93(1), 53–72 (2011). https://doi.org/10.1007/s11263-010-0409-8
8 Single-Pixel Imaging and Computational Ghost Imaging
Ming-Jie Sun
Abstract
Single-pixel imaging reconstructs an image by sampling a scene with a series of patterns and associating them with their corresponding measured intensities, whereas a modern digital camera captures an image using a pixelated detector array. It has been demonstrated that single-pixel imaging is suited for unconventional applications, such as wide-spectrum imaging, terahertz imaging, X-ray imaging, and so on. In this chapter, the investigation value of single-pixel imaging is discussed, a brief history of single-pixel imaging is given, its working principles are explained, and its performance in terms of its key elements is analyzed.

Keywords Single-pixel imaging · Ghost imaging · Compressive sensing · Intensity correlation · Spatial sampling · Single-pixel detector · Spatial light modulator · Quantum entanglement · Bi-photon correlation · Orthogonal basis
M.-J. Sun, School of Instrumentation Science and Opto-electronic Engineering, Beihang University, Beijing, China. e-mail: [email protected]
8.1 Introduction
Single-pixel imaging, a technique whose name is quite self-explanatory, refers to a category of imaging methods using only a single-pixel detector to retrieve images. The word "pixel", sometimes meaning the elementary point in an image, is used here to describe a single photoreceptor on a detector. Before the principles of single-pixel imaging can be explained, a more pressing question would be: WHY does single-pixel imaging deserve to be researched? In order to answer this question, we will take a quick look at the evolution of biological vision. The simplest, and maybe the earliest, biological visual system is a photoreceptor organelle in cyanobacteria, known as an eyespot [1]. Later, eye organelles with elaborate structures resembling a human eye emerged in certain dinoflagellates. These organelles have different structures, like cornea, lens, and retina, but are all assembled in a single cell [2, 3]. As cells differentiate, the cells with eye organelles evolve into photoreceptor cells and pigment cells [4], and then these cells cluster and evolve to form insect compound eyes and vertebrate lens eyes [5], as they are today. Interestingly, we can see that biological vision evolved from a single cell to a complex structure of many cells. Given this recap of biological vision evolution, we cannot help but notice that machine vision has a very similar trajectory of evolution. It seems extremely fast that human beings used less
than 150 years to develop the digital cameras that we have today since the discovery of the photoelectric effect and the invention of the first selenium photodetector, which is a mere instant compared to the three billion years that biological vision evolution took. The photoconductivity of selenium was discovered by Willoughby Smith in 1873 [6]. The discovery led to a design for converting images into electric signals by using a single selenium phototube and a spiral-perforated disk, conceived by Paul Nipkow in 1884 [7] and known as the "electric telescope". Using this design, John Logie Baird pioneered the first "televisor" in 1929 [8]. It was only after the invention of the charge-coupled device (CCD) by Willard S. Boyle and George E. Smith in 1969 [9] that the digital camera bore a resemblance to the vertebrate lens eye, which fundamentally consists of a lens and a cluster of photodetectors. Image sensors based on the complementary metal oxide semiconductor (CMOS) technique were invented by Peter Denyer et al. in 1989 [10], which made highly integrated digital cameras possible and finally led to the trillion-dollar global market of smartphones. The major reason that digital cameras have evolved so fast, in our opinion, is that a deterministic path (i.e., the evolution of biological vision) was given for their development. In comparison, being the result of countless random genetic mutations combined with Earth's environmental selection, biological vision evolution is much slower. While marveling at how fast and efficient our digital camera progress has been, we keep wondering how many unexplored possibilities we have missed during the progress of both biological and digital vision evolution. For example, animal vision evolved to detect not only the visible spectrum but also the ultraviolet and infrared spectra. Similarly, we are now able to detect electromagnetic radiation ranging from ultrahigh-energy cosmic rays (with a corresponding frequency of 10^35 Hz) to extremely long waves (with a corresponding frequency of 10^-2 Hz) with different detectors. Each of these detectable electromagnetic radiations can, in principle, be used for imaging. The provided information can assist mankind's social and scientific advancement. Single-pixel imaging, which
enables imaging of, in principle, anything detectable, can be an ideal playground for trying out the possibilities that an imaging system may achieve but that have so far been missed. Therefore, it is well worth investigating.
8.2 History
In the context of this book, single-pixel imaging is a technique to reconstruct an image by recording the total light intensities of overlapping a scene and a series of coded patterns using a single-element detector, and then combining the recorded signals with knowledge of the coded masks [11–14]. In a broader sense, however, single-pixel imaging includes any imaging method using a single-pixel detector, with the very first instance in human history being the "electric telescope" using a spiral-perforated disk conceived by Paul Nipkow in 1884 [7], and the second being the "televisor" pioneered by John Logie Baird in 1929 [8]. Both instances were mentioned in the previous section as early examples of digital imaging. The mathematical theory for the Nipkow-disk-based imaging method was developed in 1934 [15] and is now commonly known as "raster scan". Although raster-scan imaging is no longer the optimal choice for visible-spectrum imaging after the emergence of detector arrays, it is commonly used in non-visible spectra, where detector arrays are either unavailable or very expensive [16–18]. The single-pixel imaging we are talking about in this chapter originated from a quantum experimental design proposed in 1988 by Klyshko to measure the entanglement of biphotons emitted from a spontaneous parametric down-conversion (SPDC) light source with two detectors [19]. With a very similar experiment, a group led by Shih demonstrated in 1995 that the quantum entanglement of biphotons can be used to perform non-local imaging [11], which was considered quantum in nature and was soon named "ghost imaging" due to its non-local feature. In the following decade, researchers, including Boyd, demonstrated that thermal light sources can be used to perform ghost imaging as well [20–23]. On one hand, it
indicated the potential that ghost imaging could be applied practically. On the other hand, it led to a heated debate over whether or not ghost imaging should be exclusively explained in quantum theories [24–28]. In our opinion, the controversy should have reached a conclusion after the proposition of computational ghost imaging [12] by Shapiro, because computational ghost imaging contains no reference path yet yields the same results and is fundamentally the same as pseudo-thermal ghost imaging. Therefore, "That the semi-classical and quantum theories yield identical measurements statistics in this case means that pseudothermal ghost-imaging experiments cannot distinguish between these two interpretations, even though we know that light is intrinsically quantum mechanical" [17]. Parallel to the development and debate of ghost imaging, some interesting research progress emerged in the field of information theory around 2006, known as compressive sensing [29–31]. The conventional Nyquist/Shannon sampling theory tells us that in order to reconstruct a signal correctly, the sampling rate must be at least twice the highest frequency of the signal. However, compressive sensing demonstrated that such a common criterion might not always be needed. In a nutshell, compressive sensing can reconstruct a signal accurately with far fewer measurements/samples than what Nyquist theory requires by solving a convex optimization problem [32], given the presumption that the signal is compressible or sparse in the sense that it can be represented as a linear superposition of a small number of elements of a certain orthogonal basis. One thing worth mentioning is that modern compression algorithms, such as JPEG [33], compress an image after its formation, while compressive sensing performs compressive measurements before the reconstruction of an image. Compressive sensing is believed to have potential in many data-acquisition-related applications, and in 2006 it was demonstrated by Baraniuk that compressive sensing can be applied in optical imaging to build simpler, smaller, and less-expensive digital cameras by using only a single-pixel detector [13], where the name "single-pixel imaging" was first coined. It is
difficult to say when researchers from the ghost imaging community and the compressive sensing community started to realize the similarity they shared, but two consecutive works published in 2009 by Silberberg [14, 34] were definitely among the early inspirations. Such realization led to a blooming of single-pixel imaging and its applications in the following decade.
8.3 Principles

8.3.1 Intensity Correlation Theory for Ghost Imaging
An experiment performed by Hanbury-Brown and Twiss in 1956 to measure the angular size of cosmic stars, sometimes known as the HBT effect [35], was first explained as two-photon interference. The HBT interferometer [36] based on this method aimed to bring improvement beyond the Michelson interferometer for star-measuring applications. Later, it was proved that intensity correlation was equivalent to two-photon interference when interpreting the HBT effect [37, 38]. Therefore, like ghost imaging, the HBT effect is not necessarily quantum exclusive. In the HBT experiment, when two detectors are placed such that they are in the same light field, or in different light fields with the same statistical intensity distributions, a correlation can be found between the measured intensity signals of the two detectors. The intensity correlation is yielded as

G_{12}^{(2)} = \langle I_1 I_2 \rangle,   (8.1)

where G_{12}^{(2)} is the second-order (therefore, intensity) correlation of the light fields detected by Detector 1 and Detector 2, I_1 and I_2 refer to the intensities measured by each detector, and \langle \cdot \rangle represents the ensemble average, in other words, the average of many different measurements following the same distribution statistics. Usually, this intensity correlation is normalized as

g_{12}^{(2)} = \frac{\langle I_1 I_2 \rangle}{\langle I_1 \rangle \langle I_2 \rangle}.   (8.2)
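As a toy numerical check of Eq. 8.2 (not an experimental procedure from this chapter), the snippet below estimates the normalized second-order correlation from simulated intensity records; exponentially distributed intensities mimic thermal-light statistics.

```python
import numpy as np

def g2(i1, i2):
    """Normalized second-order correlation (Eq. 8.2) from two intensity records
    measured over many realizations of the light field."""
    i1, i2 = np.asarray(i1, float), np.asarray(i2, float)
    return np.mean(i1 * i2) / (np.mean(i1) * np.mean(i2))

# For two detectors seeing the same pseudo-thermal speckle, g2 approaches 2;
# for statistically independent intensities, g2 approaches 1.
rng = np.random.default_rng(0)
speckle = rng.exponential(1.0, 100_000)     # thermal-like intensity statistics
print(g2(speckle, speckle), g2(speckle, rng.exponential(1.0, 100_000)))
```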
Fig. 8.1 Ghost imaging experiment setup using a pseudo-thermal light source. (a) Ghost imaging with signal and reference paths. (b) Computational ghost imaging with only the signal path. CW continuous-wave, SLM spatial light modulator
In a ghost imaging experiment setup, as shown in Fig. 8.1a, a rotating ground glass is illuminated by a laser beam and generates time-varying speckle patterns along the signal path and the reference path. An image of the object can be yielded by performing the correlation between the measured signals of the two detectors. For simplicity, the mathematics will be in one dimension, along the x direction. In the signal path, the intensity measured is

s = \int I(x_1) O(x_1) \, dx_1,   (8.3)

where I(x_1) is the intensity distribution on x_1 and O(x_1) represents the transmitting function of the object. The intensity correlation between the two detectors is

G_I(x_2) = \langle s I(x_2) \rangle = \int \langle I(x_1) I(x_2) \rangle O(x_1) \, dx_1,   (8.4)

where I(x_2) is the intensity distribution on x_2. By substituting Eq. 8.2, Eq. 8.4 becomes

G_I(x_2) = \langle I_1 \rangle \langle I_2 \rangle \int g_{12}^{(2)} O(x_1) \, dx_1.   (8.5)
Equation 8.5 demonstrates that the yielded intensity correlation between the two detectors is proportional to the convolution between the transmitting function O(x_1) of the object and the normalized intensity correlation of x_1 and x_2. The signal path and the reference path are separated by a beam splitter, and the distances from x_1 and x_2 to the rotating ground glass are the same. Therefore, g_{12}^{(2)} is fundamentally a second-order self-correlation of the light field on x_1 or x_2, and G_I(x_2) is an image of the object O(x_1) convoluted by a Gaussian-like g_{12}^{(2)} whose full width at half maximum (FWHM) represents the spatial resolution of the image. Equation 8.5 becomes

G_I(x_2) = \langle I_1 \rangle \langle I_1 \rangle \int g_{11}^{(2)} O(x_1) \, dx_1,   (8.6)

where g_{11}^{(2)} is the second-order self-correlation of the light field on x_1. In Fig. 8.1a, the reference path is true to its name: its actual function is to obtain the intensity distribution on x_1. With this understanding, it is reasonable, maybe inevitable, that researchers managed to get rid of the reference path and obtain the intensity distributions in a computational manner, i.e., computational ghost imaging [12]. As shown in Fig. 8.1b, the rotating ground glass is replaced with a spatial light modulator (SLM), which generates certain amplitude and/or phase patterns when illuminated by a laser. Upon knowing the distance d_A, the intensity distribution I(x_1) can be calculated using free-space propagation, and the ghost image can be reconstructed by Eq. 8.6.
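A toy numerical sketch of computational ghost imaging in this spirit is given below. It uses the mean-subtracted (covariance) form of the intensity correlation, G(x) = ⟨sI(x)⟩ − ⟨s⟩⟨I(x)⟩, with synthetic random speckle standing in for the calculated intensity distributions; the object size, pattern statistics, and number of realizations are illustrative assumptions rather than values from the chapter.

```python
import numpy as np

def ghost_image(patterns, signals):
    """Computational ghost imaging by intensity correlation: correlate the
    bucket signal with the known illumination patterns,
    G(x) = <s I(x)> - <s><I(x)> (a background-subtracted form of Eq. 8.4)."""
    patterns = np.asarray(patterns, float)   # (M, H, W) known/calculated speckle
    signals = np.asarray(signals, float)     # (M,) bucket measurements
    return np.tensordot(signals - signals.mean(),
                        patterns - patterns.mean(axis=0), axes=1) / len(signals)

# Toy demonstration on a synthetic binary object with random speckle patterns.
rng = np.random.default_rng(1)
obj = np.zeros((32, 32)); obj[8:24, 12:20] = 1.0
patterns = rng.exponential(1.0, (5000, 32, 32))
signals = (patterns * obj).sum(axis=(1, 2))  # bucket detector (Eq. 8.3, discretized)
img = ghost_image(patterns, signals)         # proportional to obj, up to noise
```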
8.3.2 Spatial Sampling Theory for Single-Pixel Imaging
Theoretically, the intensity distribution is calculated using free-space propagation in computational ghost imaging. However, the computational burden is usually high, and the free-space presumption is not always true. Practically, a projection lens can be placed in front of the SLM, forming a conjugation between the SLM and the object, as shown in Fig. 8.2a. With such a setup, the intensity distribution I(x_1) on the object, which no longer needs to be calculated, is a magnified image of the pattern displayed on the SLM with a magnification of −d_B/d_A, given that the aberrations of the projection lens are ignored. Fundamentally, the object is illuminated in a structured manner by the patterns displayed on the SLM.
Now, let us view all of this in spatial sampling theory. An object can be digitized and discretized into a two-dimensional (2D) array, the values of whose elements represent the reflectivity of the object. By transforming the 2D array into one dimension (1D), the object becomes O = [o_1, o_2, ..., o_N]^T, where o_i is the average reflectivity of a pixelated spatial location, and N is the number of spatial locations pixelated. The projected patterns can also be digitized and discretized into a series of 2D arrays and transformed into 1D as P_i = [p_{i1}, p_{i2}, ..., p_{iN}], where i means it is the ith pattern displayed on the SLM. The collection lens and the single-pixel detector, combined as a bucket detector, measure the total light intensity s_i after the object as the inner product of P_i and O:

s_i = P_i \cdot O.   (8.7)
After the experiment system performs M measurements, a linear equation set can be formed as

S = P \times O,   (8.8)
where S = [s_1, s_2, ..., s_M]^T is the 1D array of M measurements, and P = [P_1, P_2, ..., P_M]^T is an M × N 2D array known as the measuring matrix. Equation 8.8 shows that obtaining the reflectivity of the object, i.e., the image, amounts to solving for N independent unknowns (o_1, o_2, ..., o_N) using M linear equations. Linear algebra tells us that to make Eq. 8.8 well-posed and solvable, there are two necessities, i.e., M = N and an orthogonal measuring matrix P. With these two necessities, an image can be reconstructed as

O = P^{-1} \times S.   (8.9)
With such an understanding, an identity matrix E_N is an obvious choice for the measuring matrix P, with its practical implementation being a raster scan [39–41]. The point-by-point measuring strategy, despite being easy to implement, seems inefficient for natural scenes, which are sparse or compressible in the sense that they can
Fig. 8.2 Comparison of computational ghost imaging and single-pixel imaging setups. (a) Computational ghost imaging with a projection lens, where the illumination of the object is modulated. (b) Single-pixel imaging, where the image is modulated
be represented with other proper bases, such as wavelet [42–45], Hadamard [46–48], and Fourier [49, 50]. Besides the orthodox solution of Eq. 8.8 when M = N, it is interesting to survey what happens when M ≠ N. Consider M > N first. For any orthogonal measuring matrix P, a larger M means that the measurements are redundant and unnecessary, with one exception: if a micro-scanning strategy is performed [47], the oversampling can improve the spatial resolution of the image or provide better low-light-level performance [51]. In case the measuring matrix P is partially correlated, a larger M is needed to make P full rank. This indicates that the reason pseudo-thermal ghost imaging yields poor images even with a large number of measurements is that the speckle patterns generated from the laser-illuminated rotating ground glass are highly cross-correlated. In other words, since the measuring matrix P in pseudo-thermal ghost
imaging is ‘very’ non-orthogonal, it takes M ≫ N measurements to reconstruct the image properly. Researchers are more intrigued by what happens when M < N. If P is a sub-set of an orthogonal matrix, then the scene is orthogonally sub-sampled, and the reconstructed image will be missing certain information, depending on which orthogonal basis and which sub-set are used. Different orthogonal sub-sampling strategies [42, 49, 52, 53] have been proposed to reconstruct images with a minimum loss in image quality. For example, Fig. 8.3 shows the degradation of the reconstructed image when the sampling number of a Hadamard basis decreases using an evolutionary sub-set [52]. If P is not orthogonal, then the most successful approach is compressive sensing [29–31]. Since the core idea of compressive sensing is to reconstruct images by solving a convex optimization problem [32], which is profoundly explained in Chap. 3, it will not be explained in this chapter.
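A minimal sketch of this sampling model with a full Hadamard basis (M = N) is shown below. The ±1 Hadamard rows serve as the measuring matrix P, for which the inverse in Eq. 8.9 reduces to a scaled transpose; the object and image size are illustrative, and physically realizing ±1 patterns is left aside here (in practice it is done with complementary binary patterns, as discussed under differential measurement in Sect. 8.4.1).

```python
import numpy as np
from scipy.linalg import hadamard

def single_pixel_measure(obj, patterns):
    """Eq. 8.7/8.8: each bucket value is the inner product of one pattern with
    the flattened object reflectivity, i.e., S = P @ O."""
    return patterns @ obj.ravel()

def reconstruct(signals, patterns):
    """Eq. 8.9 for an orthogonal (+/-1) Hadamard measuring matrix, whose
    inverse is its transpose divided by N."""
    n = patterns.shape[1]
    return (patterns.T @ signals) / n

# 32x32 scene sampled with the full Hadamard basis (M = N = 1024).
n_side = 32
H = hadamard(n_side * n_side).astype(float)   # rows are the sampling patterns
obj = np.zeros((n_side, n_side)); obj[8:24, 12:20] = 1.0
signals = single_pixel_measure(obj, H)
img = reconstruct(signals, H).reshape(n_side, n_side)
# Sub-sampling (M < N) amounts to keeping only a subset of the rows of H, at
# the cost of losing the information carried by the discarded patterns.
```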
Fig. 8.3 Comparison of different sample numbers using partial Hadamard basis. The original image has a resolution of 256 × 256 pixels. The sampling patterns have a resolution of 64 × 64 pixels. The sample number refers to the number of patterns contained in the sub-set
8.4 Performance and Applications
As an alternative imaging method, single-pixel imaging is always compared to a conventional digital camera in performance, and critics conclude that single-pixel imaging is an out-of-date technique not worth investigating, since its performance for conventional imaging is nowhere near that of the compact, cheap, powerful digital camera everyone can have on their smartphone. Indeed, single-pixel imaging is not as good as digital cameras in visible or even infrared imaging, but that is not because single-pixel imaging is out-of-date or without potential; rather, it is not specialized in these applications, whereas huge amounts of capital were invested and mature industrial systems were developed for conventional digital cameras to specialize in visible and infrared imaging applications. It is like comparing an average Joe to a professional marathoner in long-distance running and then concluding that Joe is not a worthy person. Instead, it may be more rational to analyze the merits and drawbacks of single-pixel imaging itself, and come to a conclusion about what potential it has in which scenarios. The analysis will be performed according to the features of single-pixel imaging.
8.4.1 Single-Pixel Detector
The applications of detector-array-based imaging are limited by the kinds of detector arrays available, which include visible detector arrays, infrared detector arrays, as well as newly emerging terahertz detector arrays and time-correlated single-photon detector arrays. By using a single-pixel detector, single-pixel imaging has a much wider range of choices for different applications than detector-array-based imaging has or ever would have, because any cutting-edge sensor becomes available in the form of a single-pixel detector long before it can be manufactured into an array. Therefore, single-pixel imaging always has the privilege of using newly developed sensors much earlier than detector-array-based imaging does. For example, it will not be possible to capture dark matter images with a detector array for a long time, but it is possible to obtain these images with the LUX-ZEPLIN dark matter detector [54]. The performance of single-pixel imaging is related to the following specifications of a single-pixel detector.
8.4.1.1 Spectrum Responsivity
Spectrum responsivity refers to how sensitive a detector is to the electromagnetic radiation energy it receives at a given wavelength. Spectrum responsivity is determined by the photoelectric materials used in the detectors. Typical materials include gallium phosphide (GaP, 150–550 nm), silicon (Si, 400–1000 nm), germanium (Ge, 900–1600 nm), indium gallium arsenide (InGaAs, 800–2600 nm), mercury cadmium telluride (HgCdTe, 3–5 μm), vanadium oxide (VOx, 8–14 μm), and cutting-edge graphene- or perovskite-based materials covering a wide spectrum from X-ray to terahertz. As a result, single-pixel imaging has been performed in X-ray imaging [55–57], multi-wavelength or hyperspectral imaging [58–62], and terahertz imaging [63, 64], where detector arrays are either unavailable or expensive.
8.4.1.2 Bandwidth Bandwidth describes how fast a detector gives an effective signal once it receives radiation energy. It represents the temporal response capacity of the detector and is usually related to the rise time or dead time of the detector. The rise time is the time a detector needs to reach its stable output voltage after the photocurrent is generated and applied to a load resistor. It is related to the property of photoelectric material and the gain of the detector. The dead time is the time a single photon detector needs to be ready for detecting the next coming photo after it outputs the previous photon signal. It is related to the design of electric amplification. Taking advantage of the high bandwidth of a single-pixel detector, timeof-flight based 3D imaging was performed with single-pixel imaging [65–68], while a detector array has difficulty performing time-of-flight 3D imaging since its temporal resolving capability is limited by its progressive scanning readout. 8.4.1.3 Dark Noise Dark noise refers to the output of a detector when it does not receive any incident light, i.e., radiation energy. A detector is designed to generate mobile photoelectrons when energy from incident light is absorbed. Thermal energy also creates mobile electrons that contribute to the output of a detector, though these electrons are not photoelectrons. Dark noise is related to the photoelectric material, the active area, and the temperature in which a detector works. Dark noise jeopardizes the accuracy of a measurement representing the intensity of incident radiation, and consequently the signal-to-noise ratio of the reconstructed image of single-pixel imaging. Dark noise and other noises were quantitatively investigated [51, 69, 70] and suppressed [71–74] for single-pixel imaging, among which differential measurement [71] is most commonly used. 8.4.1.4 Limitation While single-pixel imaging enjoys the privileges of wide spectrum range, high time resolving,
and cutting-edge capabilities such as single-photon detection, these privileges do not come without a price. With only one detecting element on a single-pixel detector, single-pixel imaging needs to perform a large number of consecutive measurements in order to reconstruct an image. Therefore, the time to take these measurements (also known as the acquisition time) is long, and the frame rate of single-pixel imaging is limited. An interesting observation can be made by comparing single-pixel imaging to detector array based imaging. To obtain an image with N pixels, single-pixel imaging performs N measurements in the temporal domain with the spatial extreme of a single detecting element, while a detector array carries out N measurements spatially with N detecting elements in only one temporal instance. However, the N measurements can be distributed in both the spatial and temporal domains, that is, performing A temporal measurements with B detecting elements, where A × B = N. By adopting this concept in single-pixel imaging [75–77], the acquisition time of the imaging system can be dramatically reduced by a factor of B, the number of detecting elements used, although "single-pixel imaging" may no longer be a strictly accurate name for the technique.
8.4.2
Spatial Light Modulator
In single-pixel imaging, a single-pixel detector records only the total light intensity of each measurement, leaving the spatial light modulator to provide the spatial information, which can be direct or indirect. Direct spatial light modulators include the ancient Nipkow disk and the galvanometer scanner, which send information from a certain spatial location directly into the detector. Indirect spatial light modulators include liquid crystal array modulators, digital micromirror devices, laser-illuminated rotating ground glass, and other structured illumination devices. Indirect spatial light modulators usually sample a scene or an image with a basis, such as the Fourier, Hadamard, or wavelet basis, and the recorded light intensities correspond to the coefficients of that basis. Flexibility is one merit of single-pixel imaging, and it
is rooted in the use of spatial light modulators. The performance of single-pixel imaging is related to the following specifications of a spatial light modulator.
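To make the basis-sampling idea concrete, the sketch below simulates single-pixel measurements of a small scene with a Hadamard basis and recovers the image from the recorded bucket intensities. It is an illustrative example only: the scene, the pattern size, and the differential (positive/negative pattern) measurement scheme are assumptions made for the demonstration, not a description of any particular system discussed in this chapter.

```python
import numpy as np
from scipy.linalg import hadamard

# Illustrative sketch: single-pixel sampling of an 8x8 scene with a Hadamard basis.
n = 8                      # scene is n x n pixels, so N = n*n measurements
N = n * n
scene = np.zeros((n, n))
scene[2:6, 3:5] = 1.0      # a simple hypothetical object
x = scene.flatten()

H = hadamard(N)            # +1/-1 Hadamard matrix; its rows are the sampling patterns

# A spatial light modulator can only display non-negative patterns, so each
# +1/-1 pattern is split into two binary patterns and the two detector readings
# are subtracted (a differential measurement).
P_pos = (H > 0).astype(float)
P_neg = (H < 0).astype(float)
y = P_pos @ x - P_neg @ x  # simulated single-pixel (bucket) intensities

# Reconstruction: Hadamard matrices are orthogonal (H @ H.T = N * I), so the
# image is recovered by a scaled transpose, i.e., the inverse Hadamard transform.
x_rec = (H.T @ y) / N
print(np.allclose(x_rec.reshape(n, n), scene))   # True
```

Because the Hadamard matrix is orthogonal, reconstruction here is a single matrix product; in practice, compressive or adaptive strategies such as those in [42–44] reconstruct from far fewer than N measurements.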
8.4.2.1 Spatial Resolution The spatial resolution of an image reconstructed by single-pixel imaging is determined by the spatial resolution of the patterns displayed on the spatial light modulator and its associated imaging lens. For example, if the patterns displayed on a digital micromirror device are 100 × 100 pixels in size, then the reconstructed image is 100 × 100 pixels, even if the digital micromirror device consists of 1024 × 768 micromirrors. Of course, if the displayed patterns are 1024 × 768, then the reconstructed image has the same pixel size. It is worth mentioning that the spatial resolution here refers to the pixel count of the image. The actual spatial resolving capability is the combined result of the pixel count, the pixel pitch, and the numerical aperture of the imaging lens. Therefore, in a sense, the spatial resolution of single-pixel imaging is very similar to that of a conventional digital camera, both being determined by the array and the lens together. An exception in single-pixel imaging is laser-illuminated rotating ground glass and other lensless single-pixel imaging schemes, where the modulated light field propagates freely onto the scene. Under coherent illumination, the generated speckle patterns are the result of high-order diffraction superposition, with speckle feature sizes smaller than the diffraction limit (i.e., the zero-order diffraction). Researchers have been investigating the possibility of resolution beyond the diffraction limit since ghost imaging was proposed [78–83]. More interestingly, with a programmable spatial light modulator, single-pixel imaging can perform imaging in adaptive ways that a detector array cannot. A typical example is to arrange the pixel geometry of the patterns in a non-Cartesian manner, mimicking the cell distribution on the human retina, and therefore reconstructing foveated images in order to capture detailed information of a dynamic scene [84]. Another example is to determine subsequent measurements based on the measured
ones, thereby improving the sampling efficiency [42–44]. This is smarter than sampling everything and then compressing away the useless information, as is done in detector array based imaging.
8.4.2.2 Modulation Rate Modulation rate refers to how many patterns a spatial light modulator can display per unit time. As mentioned in Sect. 8.3, single-pixel imaging requires M measurements to form an image. Therefore, the modulation rate is crucial to the frame rate of single-pixel imaging. Most spatial light modulators, with their low modulation rates, cannot meet the needs of single-pixel imaging for dynamic applications. Among existing commercially available devices, the digital micromirror device offers the highest modulation rate, displaying binary patterns at 22 kHz. However, this corresponds to a frame rate of only about 21 frames per second (22 kHz/1024) at 32 × 32 pixel resolution, which is not satisfactory. Developments in high-speed photonic components indicate that modulation rates beyond 1 MHz are possible for LED arrays [85, 86] and optical phased array devices [87, 88]. Besides increasing the modulation rate directly, interesting schemes, such as combining a spatial light modulator with another scanning device [89] and combining mask patterns with a periodically rotating object [90], have been proposed recently to dramatically increase the frame rate of single-pixel imaging.
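The arithmetic linking modulation rate, pixel count, and frame rate is worth making explicit. The short sketch below reproduces the 22 kHz example above and also includes the B-detecting-element speed-up discussed in Sect. 8.4.1.4; the function and its parameter names are illustrative and not drawn from any specific system.

```python
# Back-of-the-envelope frame-rate estimate for single-pixel imaging.
# Assumptions: one displayed pattern per measurement (complete sampling,
# no compression) and a DMD binary modulation rate of 22 kHz.

def frame_rate(pixels_x, pixels_y, modulation_rate_hz=22e3, detectors=1):
    """Frames per second when N = pixels_x * pixels_y measurements are split
    across `detectors` detecting elements (the A x B = N argument of Sect. 8.4.1.4)."""
    measurements_per_frame = pixels_x * pixels_y / detectors
    return modulation_rate_hz / measurements_per_frame

print(frame_rate(32, 32))                  # ~21 fps, matching the example in the text
print(frame_rate(128, 128))                # ~1.3 fps: complete sampling scales poorly
print(frame_rate(128, 128, detectors=4))   # a quadrant detector gives a 4x speed-up
```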
8.5
Summary
In this chapter, the relationship between single-pixel imaging and detector array based imaging, and why single-pixel imaging is worth investigating, are discussed. A brief history of single-pixel imaging is given. The principles of single-pixel imaging are explained from both intensity correlation and spatial sampling perspectives. The performance of single-pixel imaging is discussed in terms of single-pixel detectors and spatial light modulators. Their corresponding merits and limitations are given.
Although the current performance of single-pixel imaging is not comparable, particularly in the visible spectrum, to that of detector array based imaging, as mentioned earlier, single-pixel imaging provides a fascinating playground for exploring what an imaging system can be, especially in the following aspects. First, the use of single-pixel detectors makes single-pixel imaging a perfect scheme to test cutting-edge sensors, such as the LUX-ZEPLIN dark matter detector, for imaging purposes. Second, the use of programmable SLMs offers flexibility in tuning imaging parameters. For example, the image pixel arrangement can be of any geometry other than Cartesian. Spatial resolution, frame rate, and signal-to-noise ratio can be traded off according to application requirements. Third, various compressive sampling strategies can be applied in single-pixel imaging to decrease the amount of data that needs to be sampled and transferred, in light of the fact that data transfer has a much higher energy consumption than data computation does in integrated electronics.
References
1. G. Kreimer. The green algal eyespot apparatus: a primordial visual system and more? Current Genetics, 55, 19–43, 2009.
2. C. Greuet. Structure fine de l'ocelle d'Erythropsis pavillardi Hertwig, péridinien Warnowiidae Lindemann. Comptes rendus de l'Académie des sciences, 261, 1904–1907, 1965.
3. C. Greuet. Anatomie ultrastructurale des Péridiniens Warnowiidae en rapport avec la différenciation des organites cellulaires. éditeur non identifié, 1969.
4. H. Takeda, K. Nishimura, K. Agata. Planarians maintain a constant ratio of different cell types during changes in body size by using the stem cell system. Zoological Science, 26, 805–813, 2009.
5. W. J. Gehring. The evolution of vision. Wiley Interdisciplinary Reviews: Developmental Biology, 3, 1–40, 2014.
6. W. Smith. "Selenium", its electrical qualities and the effect of light thereon. Journal of the Society of Telegraph Engineers, 6, 423–441, 1877.
7. P. Nipkow. Optical disk. German Patent, 30150, 1884.
8. J. L. Baird. Apparatus for transmitting views or images to a distance. U.S. Patent, 1,699,270, 1929.
9. G. E. Smith. The invention and early history of the CCD. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 607, 1–6, 2009.
10. D. Renshaw, P. B. Denyer, G. Wang, M. Lu. ASIC image sensors. IEEE International Symposium on Circuits and Systems, 3038–3041, 1990.
11. T. B. Pittman, Y. H. Shih, D. V. Strekalov, A. V. Sergienko. Optical imaging by means of two-photon quantum entanglement. Physical Review A, 52, R3429–R3432, 1995.
12. J. H. Shapiro. Computational ghost imaging. Physical Review A, 78, 061802, 2008.
13. D. Takhar, J. N. Laska, M. B. Wakin, M. F. Duarte, D. Baron, S. Sarvotham, K. F. Kelly, R. G. Baraniuk. A New Compressive Imaging Camera Architecture using Optical-Domain Compression. Proc. IS&T/SPIE Computational Imaging IV, January 2006.
14. Y. Bromberg, O. Katz, Y. Silberberg. Ghost imaging with a single detector. Physical Review A, 79, 053840, 2009.
15. P. Mertz, F. Gray. Bell System Technical Journal, 13, 464, 1934.
16. T. J. Kane, W. J. Kozlovsky, R. L. Byer, C. E. Byvik. Coherent laser radar at 1.06 μm using Nd:YAG lasers. Optics Letters, 12, 239–241, 1987.
17. B.-B. Hu, M. C. Nuss. Imaging with terahertz waves. Optics Letters, 20, 1716, 1995.
18. P. Thibault, M. Dierolf, A. Menzel, O. Bunk, C. David, F. Pfeiffer. High-resolution scanning x-ray diffraction microscopy. Science, 321, 379–382, 2008.
19. D. N. Klyshko. A simple method of preparing pure states of an optical field, of implementing the Einstein–Podolsky–Rosen experiment, and of demonstrating the complementarity principle. Soviet Physics Uspekhi, 31, 74, 1988.
20. R. S. Bennink, S. J. Bentley, R. W. Boyd. "Two-photon" coincidence imaging with a classical source. Physical Review Letters, 89, 113601, 2002.
21. F. Ferri, D. Magatti, A. Gatti, M. Bache, E. Brambilla, L. A. Lugiato. High-resolution ghost image and ghost diffraction experiments with thermal light. Physical Review Letters, 94, 183602, 2005.
22. A. Valencia, G. Scarcelli, M. D'Angelo, Y.-H. Shih. Two-photon imaging with thermal light. Physical Review Letters, 94, 063601, 2005.
23. Y.-H. Zhai, X.-H. Chen, D. Zhang, L.-A. Wu. Two-photon interference with true thermal light. Physical Review A, 72, 2005.
24. Y.-H. Shih. The physics of ghost imaging. Classical, Semi-classical and Quantum Noise, Springer, New York, NY, 169–222, 2012.
25. B. I. Erkmen, J. H. Shapiro. A unified theory of ghost imaging with Gaussian state light. Physical Review A, 77, 043809, 2008.
26. Y. H. Shih. The physics of ghost imaging – nonlocal interference or local intensity fluctuation correlation? Quantum Information Processing, 11, 995–1001, 2012.
27. J. H. Shapiro, R. W. Boyd. Response to "The physics of ghost imaging – nonlocal interference or local intensity fluctuation correlation?". Quantum Information Processing, 11, 1003–1011, 2012.
28. J. H. Shapiro, R. W. Boyd. The physics of ghost imaging. Quantum Information Processing, 11, 949–993, 2012.
29. E. Candès, J. Romberg, T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52, 489–509, 2006.
30. D. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52, 1289–1306, 2006.
31. E. Candès, T. Tao. Near optimal signal recovery from random projections: Universal encoding strategies? IEEE Transactions on Information Theory, 52, 5406–5425, 2006.
32. S. Boyd, L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
33. R. Aravind, G. L. Cash, J. P. Worth. On implementing the JPEG still-picture compression algorithm. Visual Communications and Image Processing IV, SPIE, 1199, 799–809, 1989.
34. O. Katz, Y. Bromberg, Y. Silberberg. Compressive ghost imaging. Applied Physics Letters, 95, 131110, 2009.
35. R. H. Brown, R. Q. Twiss. Correlation between photons in two coherent beams of light. Nature, 177, 27–29, 1956.
36. R. H. Brown, R. Q. Twiss. A test of a new type of stellar interferometer on Sirius. Nature, 178, 1046–1048, 1956.
37. R. J. Glauber. The quantum theory of optical coherence. Physical Review, 130, 2529–2539, 1963.
38. E. C. G. Sudarshan. Equivalence of semiclassical and quantum mechanical descriptions of statistical light beams. Physical Review Letters, 10, 277–279, 1963.
39. J. S. Massa, A. M. Wallace, G. S. Buller, S. J. Fancey, A. C. Walker. Laser depth measurement based on time-correlated single photon counting. Optics Letters, 22, 543–545, 1997.
40. A. McCarthy, R. J. Collins, N. J. Krichel, V. Fernandez, A. M. Wallace, G. S. Buller. Long-range time-of-flight scanning sensor based on high-speed time-correlated single-photon counting. Applied Optics, 48, 6241–6251, 2009.
41. A. McCarthy, N. J. Krichel, N. R. Gemmell, X. Ren, M. G. Tanner, S. N. Dorenbos. Kilometer-range, high resolution depth imaging via 1560 nm wavelength single-photon detection. Optics Express, 21, 8904–8915, 2013.
42. M. Aßmann, M. Bayer. Compressive adaptive computational ghost imaging. Scientific Reports, 3, 1–5, 2013.
43. W.-K. Yu, M.-F. Li, X.-R. Yao, X.-F. Liu, L.-A. Wu, G.-J. Zhai. Adaptive compressive ghost imaging based on wavelet trees and sparse representation. Optics Express, 22, 7133–7144, 2014.
44. F. Rousset, N. Ducros, A. Farina, G. Valentini, C. D'Andrea, F. Peyrin. Adaptive basis scan by wavelet prediction for single-pixel imaging. IEEE Transactions on Computational Imaging, 3, 36–46, 2016.
45. K. M. Czajkowski, A. Pastuszczak, R. Kotyński. Single-pixel imaging with Morlet wavelet correlated random patterns. Scientific Reports, 8, 1–8, 2018.
46. B. Lochocki, A. Gambín, S. Manzanera, E. Irles, E. Tajahuerce, J. Lancis, P. Artal. Single pixel camera ophthalmoscope. Optica, 3, 1056–1059, 2016.
47. M.-J. Sun, M. P. Edgar, D. B. Phillips, G. M. Gibson, M. J. Padgett. Improving the signal-to-noise ratio of single-pixel imaging using digital microscanning. Optics Express, 24, 10476–10485, 2016.
48. L. Wang, S. Zhao. Fast reconstructed and high-quality ghost imaging with fast Walsh-Hadamard transform. Photonics Research, 4, 240–244, 2016.
49. Z. Zhang, X. Ma, J. Zhong. Single-pixel imaging by means of Fourier spectrum acquisition. Nature Communications, 6, 1–6, 2015.
50. K. M. Czajkowski, A. Pastuszczak, R. Kotyński. Real-time single-pixel video imaging with Fourier domain regularization. Optics Express, 26, 20009–20022, 2018.
51. D. Shin, J. H. Shapiro, V. K. Goyal. Performance Analysis of Low-Flux Least-Squares Single-Pixel Imaging. IEEE Signal Processing Letters, 23, 1756–1760, 2016.
52. N. Radwell, K. J. Mitchell, G. M. Gibson, M. P. Edgar, R. Bowman, M. J. Padgett. Single-pixel infrared and visible microscope. Optica, 1, 285–289, 2014.
53. M.-J. Sun, L.-T. Meng, M. P. Edgar, M. J. Padgett, N. Radwell. A Russian Dolls ordering of the Hadamard basis for compressive single-pixel imaging. Scientific Reports, 7, 3464, 2017.
54. J. Aalbers et al. Background Determination for the LUX-ZEPLIN (LZ) Dark Matter Experiment. arXiv:2211.17120, https://doi.org/10.48550/arXiv.2211.17120.
55. J. Cheng, S.-S. Han. Incoherent coincidence imaging and its applicability in X-ray diffraction. Physical Review Letters, 92, 093903, 2004.
56. J. Greenberg, K. Krishnamurthy, D. Brady. Compressive single-pixel snapshot x-ray diffraction imaging. Optics Letters, 39, 111–114, 2014.
57. A.-X. Zhang, Y.-H. He, L.-A. Wu, L.-M. Chen, B.-B. Wang. Tabletop x-ray ghost imaging with ultra-low radiation. Optica, 5, 374–377, 2018.
58. V. Studer, J. Bobin, M. Chahid, H. S. Mousavi, E. Candes, M. Dahan. Compressive fluorescence microscopy for biological and hyperspectral imaging. Proceedings of the National Academy of Sciences, 109, E1679–E1687, 2012.
59. S. S. Welsh, M. P. Edgar, R. Bowman, P. Jonathan, B.-Q. Sun, M. J. Padgett. Fast full-color computational imaging with single-pixel detectors. Optics Express, 21, 23068–23074, 2013.
60. M. P. Edgar, G. M. Gibson, R. W. Bowman, B.-Q. Sun, N. Radwell, K. J. Mitchell, S. S. Welsh, M. J. Padgett. Simultaneous real-time visible and infrared video with single-pixel detectors. Scientific Reports, 5, 10669, 2015.
61. L.-H. Bian, J.-L. Suo, G.-H. Situ, Z.-W. Li, J.-T. Fan, F. Chen, Q.-H. Dai. Multispectral imaging using a single bucket detector. Scientific Reports, 6, 24752, 2016.
62. W. Chen, M.-J. Sun, W.-J. Deng, H.-X. Hu, L.-J. Li, X.-J. Zhang. Hyperspectral imaging via a multiplexing digital micromirror device. Optics and Lasers in Engineering, 151, 106889, 2021.
63. C. M. Watts, D. Shrekenhamer, J. Montoya, G. Lipworth, J. Hunt, T. Sleasman. Terahertz compressive imaging with metamaterial spatial light modulators. Nature Photonics, 8, 605–609, 2014.
64. R. I. Stantchev, B. Sun, S. M. Hornett, P. A. Hobson, G. M. Gibson, M. J. Padgett. Noninvasive, near-field terahertz imaging of hidden objects using a single-pixel detector. Science Advances, 2, e1600190, 2016.
65. G. A. Howland, P. B. Dixon, J. C. Howell. Photon-counting compressive sensing laser radar for 3D imaging. Applied Optics, 50, 5917–5920, 2011.
66. C. Zhao, W. Gong, M. Chen, E. Li, H. Wang, W. Xu. Ghost imaging LIDAR via sparsity constraints. Applied Physics Letters, 101, 141123, 2012.
67. G. A. Howland, D. J. Lum, M. R. Ware, J. C. Howell. Photon counting compressive depth mapping. Optics Express, 21, 23822, 2013.
68. M.-J. Sun, M. P. Edgar, G. M. Gibson, B. Sun, N. Radwell, R. Lamb. Single-pixel three-dimensional imaging with time-based depth resolution. Nature Communications, 7, 12010, 2016.
69. M.-J. Sun, Z.-H. Xu, L.-A. Wu. Collective noise model for focal plane modulated single-pixel imaging. Optics and Lasers in Engineering, 100, 18–22, 2018.
70. R. Li, J.-Y. Hong, X. Zhou, C.-M. Wang, Z.-Y. Chen, B. He, Z.-W. Hu, N. Zhang, Q. Li, P. Xue, X. Zhang. SNR study on Fourier single-pixel imaging. New Journal of Physics, 23, 073025, 2021.
71. F. Ferri, D. Magatti, L. Lugiato, A. Gatti. Differential ghost imaging. Physical Review Letters, 104, 253603, 2010.
72. I. N. Agafonov, K. H. Luo, L. A. Wu, M. V. Chekhova, Q. Liu, R. Xian. High-visibility, high-order lensless ghost imaging with thermal light. Optics Letters, 35, 1166–1168, 2010.
73. B. Sun, S. Welsh, M. P. Edgar, J. H. Shapiro, M. J. Padgett. Normalized ghost imaging. Optics Express, 20, 16892–16901, 2012.
74. F.-Y. Sha, S. K. Sahoo, H. Q. Lam, B. K. Ng, C. Dang. Improving single-pixel imaging performance in high noise condition by under-sampling. Scientific Reports, 10, 1–8, 2020.
75. M. Herman, J. Tidman, D. Hewitt, T. Weston, L. McMackin, F. Ahmad. A higher-speed compressive sensing camera through multi-diode design. Compressive Sensing II, SPIE, 8717, 42–56, 2013.
76. M.-J. Sun, W. Chen, T.-F. Liu, L.-J. Li. Image retrieval in spatial and temporal domains with a quadrant detector. IEEE Photonics Journal, 9, 1–6, 2017.
77. M.-J. Sun, H.-Y. Wang, J.-Y. Huang. Improving the performance of computational ghost imaging by using a quadrant detector and digital micro-scanning. Scientific Reports, 9, 4105, 2019.
78. A. N. Boto, P. Kok, D. S. Abrams, S. L. Braunstein, C. P. Williams, J. P. Dowling. Quantum interferometric optical lithography: Exploiting entanglement to beat the diffraction limit. Physical Review Letters, 85, 2733, 2000.
79. M. D'Angelo, M. V. Chekhova, Y.-H. Shih. Two-photon diffraction and quantum lithography. Physical Review Letters, 87, 013602, 2001.
80. J. Xiong, D. Z. Cao, F. Huang, H. G. Li, X. J. Sun, K. G. Wang. Experimental observation of classical subwavelength interference with a pseudothermal light source. Physical Review Letters, 94, 173601, 2005.
81. W. Gong, S. Han. Super-resolution far-field ghost imaging via compressive sampling. arXiv:0911.4750, 2009.
82. M.-J. Sun, X. D. He, M. F. Li, L. A. Wu. Thermal light subwavelength diffraction using positive and negative correlations. Chinese Optics Letters, 14, 040301, 2016.
83. A. Kallepalli, L. Viani, D. Stellinga, E. Rotunno, R. Bowman, G. M. Gibson, M.-J. Sun, P. Rosi, S. Frabboni, R. Balboni, A. Migliori, V. Grillo, M. J. Padgett. Challenging Point Scanning across Electron Microscopy and Optical Imaging using Computational Imaging. Intelligent Computing, 2022, 0001, 2022.
84. D. B. Phillips, M.-J. Sun, J. M. Taylor, M. P. Edgar, S. M. Barnett, G. M. Gibson, M. J. Padgett. Adaptive foveated single-pixel imaging with dynamic supersampling. Science Advances, 3, e1601782, 2017.
85. Z.-H. Xu, W. Chen, J. Penulas, M. J. Padgett, M.-J. Sun. 1000 fps computational ghost imaging using LED-based structured illumination. Optics Express, 26, 2427–2434, 2018.
86. H.-X. Huang, L.-J. Li, Y.-X. Ma, M.-J. Sun. 25,000 fps Computational Ghost Imaging with Ultrafast Structured Illumination. Electronic Materials, 3, 93–100, 2022.
87. K. Komatsu, Y. Ozeki, Y. Nakano, T. Tanemura. Ghost imaging using integrated optical phased array. 2017 Optical Fiber Communications Conference and Exhibition (OFC), IEEE, 1–3, 2017.
88. L.-J. Li, W. Chen, X.-Y. Zhao, M.-J. Sun. Fast Optical Phased Array Calibration Technique for Random Phase Modulation LiDAR. IEEE Photonics Journal, 11, 1–10, 2019.
89. P. Kilcullen, T. Ozaki, J. Liang. Compressed ultrahigh-speed single-pixel imaging by swept aggregate patterns. Nature Communications, 13, 7879, 2022.
90. W.-J. Jiang, Y.-K. Yin, J.-P. Jiao, X. Zhao, B.-Q. Sun. 2,000,000 fps 2D and 3D imaging of periodic or reproducible scenes with single-pixel detectors. Photonics Research, 10, 2157–2164, 2022.
9
Spatial Frequency Domain Imaging Rolf B. Saager
Abstract
In this chapter, we will cover the principles of operation of spatial frequency domain imaging techniques and how this modality has been applied to the non-invasive quantification of tissue properties. This is a diffuse reflectance technique that can separate and quantify the optical properties of turbid media, such as tissue (i.e., absorption and reduced scattering coefficients), and then interpret these properties in terms of function through absorption (e.g., hemoglobin concentration and oxygenation, water fraction, etc.) and structure through scattering. Several design considerations and implementations will be discussed that employ SFDI techniques to target spatial characteristics, temporal dynamics, and spectral analysis of tissue, based on current hardware and technologies available. Keywords
Spatial frequency domain imaging · Diffuse reflectance · Tissue imaging · Tissue spectroscopy · Light transport modeling · Quantitative imaging · Calibration · Structured illumination · Absorption · Scattering · Sub-surface imaging · Layered models · Tomography R. B. Saager () Department of Biomedical Engineering, Linköping University, Linköping, Sweden e-mail: [email protected]
9.1
Introduction and Background
In this chapter, we will explore the principles and implementations of Spatial Frequency Domain Imaging (SFDI). This technique has been developed in the context of wide-field, quantitative imaging of turbid media, such as biological tissues. In order to understand the motivations and paths of development of this technique, one must also appreciate the specific sources of optical contrast in these turbid media it seeks to quantify. SFDI primarily measures what is commonly referred to as the “Diffuse Reflectance” from tissue. Described in other terms, this is the amount of light remitted from the medium at any given wavelength relative to the wavelength specific light delivered into it. The two dominant mechanisms that determine this relative difference between light delivered and light detected are light scattering and absorption.
9.1.1
Scattering and Absorption
Absorption is a mechanism of energy transfer where all of the photon’s energy is absorbed by a molecule, sending the molecule to a higher excited state. As energy levels of molecules are discrete, only specific wavelengths (energies) of light may be absorbed. This is advantageous for multiple reasons: (1) different molecule species will have different energy levels, resulting in
distinct spectral signatures for each, and (2) once a photon is absorbed, it will no longer propagate, and hence this mechanism can only occur once. These two features of absorption are quite powerful. They provide the basis on which multiple molecule species (also referred to as chromophores) can be differentiated from their optical absorption spectra, and on which multiple chromophores can be quantified: since photons can only be absorbed once during their propagation through tissue, a total absorption spectrum can be considered as a linear superposition of contributions from multiple chromophores. In tissue, typical chromophores detected via optical absorption are hemoglobin (in its oxygenated, deoxygenated, and other forms), melanin, water, lipids, and carotenoids, among others. The absorption spectra for these chromophores are shown in Fig. 9.1.
Fig. 9.1 Basis absorption spectra for chromophores that are typically encountered in skin. Note that these spectra are individually scaled by arbitrary units to highlight their spectral features
Scattering is a more nuanced and complex mechanism of light-medium interaction. Scattering occurs when photons encounter a boundary or object with a differing index of refraction or dielectric constant from that of the surrounding
medium. Here, the energy of the photon is preserved, but the trajectory of the photon is altered. In turbid media like tissue, where the distribution of these scattering objects is considered random over the spatial scales at which diffuse reflectance techniques such as spatial frequency domain imaging (SFDI) are typically deployed, there remain certain key features of these scattering properties that can be exploited and utilized to describe light transport in these media. Wavelength-dependent scattering is determined by the mean size and density of the scattering objects within a volume. When objects are smaller than the wavelength of light, the deflection of the photon trajectory is due to a dipole interaction between the wave nature of the photon and the object. This has been referred to as Rayleigh scattering and is described by a λ−4 wavelength dependence. When objects are on the order of, or larger than, the wavelength of light, the principles of geometric optics (refraction and reflection) can be employed to describe the spectral scattering dependence. This is referred to as Mie scattering [1] and exhibits a power-law-like dependence of:
λ−b, where the parameter b is correlated with the mean size distribution of the scattering objects that the light interacts with in the medium volume [2, 3].
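The two spectral models described above can be written down compactly: a total absorption spectrum as a linear superposition of chromophore contributions, and a reduced scattering spectrum as a Mie-type power law. The sketch below illustrates both; the extinction spectra, concentrations, and power-law parameters are made-up placeholders rather than tabulated tissue values.

```python
import numpy as np

# Illustrative sketch of the two spectral models described above.
# The chromophore extinction spectra below are hypothetical placeholders; in
# practice, tabulated spectra (e.g., for oxy-/deoxy-hemoglobin, water, melanin)
# would be used.

wavelengths = np.array([650.0, 700.0, 750.0, 800.0, 850.0])   # nm

extinction = {                      # hypothetical basis spectra [1/mm per unit concentration]
    "HbO2":  np.array([0.10, 0.08, 0.09, 0.11, 0.13]),
    "Hb":    np.array([0.35, 0.20, 0.14, 0.10, 0.09]),
    "water": np.array([0.001, 0.002, 0.007, 0.005, 0.010]),
}
concentrations = {"HbO2": 0.6, "Hb": 0.4, "water": 0.65}       # assumed values

# Absorption: linear superposition of chromophore contributions.
mu_a = sum(concentrations[c] * extinction[c] for c in extinction)

# Reduced scattering: Mie-type power law, mu_s'(lambda) = a * (lambda/lambda0)^-b.
a, b, lambda0 = 1.0, 1.2, 800.0      # assumed amplitude [1/mm], slope, reference wavelength
mu_s_prime = a * (wavelengths / lambda0) ** (-b)

print(mu_a)        # total absorption spectrum [1/mm]
print(mu_s_prime)  # reduced scattering spectrum [1/mm]
```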
9.1.2
Principle of Operation
SFDI is a technique that was initially developed to differentiate and quantify these two, highly coupled mechanisms of light interactions within turbid media. Put into very simple, high-level terms, the principle of operation of SFDI is as follows: when structured patterns of light are projected onto turbid media, (1) any absorption event will merely attenuate the number of photons that will continue to propagate throughout the medium volume, but not alter the direction of propagation, (2) any scattering event, however, will not alter the photon itself, but rather change its direction of propagation. Relative to the projected pattern delivered, the resulting remitted pattern exiting the medium will essentially be “weaker” due to absorption and “blurred” due to the scattering, as shown graphically in Fig. 9.2. Therefore, if one can measure multiple spatial frequency depen-
dent components from the diffuse reflectance, it is then possible to isolate and quantify the absorption and scattering properties of the medium, respectively. Sensitivity to absorption properties is greatest at low spatial frequencies, while scattering contrast persists through higher spatial frequencies, as Fig. 9.3 illustrates. In the context of tissue, this technique then holds the potential to non-invasively quantify multiple chromophores (e.g., oxy- and deoxy-hemoglobin, melanin, water, etc.) as well as gain insight into the bulk tissue structure (interpreted in terms of the mean density and size distribution of scattering objects) via the spectral dependence of scattering. In the subsequent sections of this chapter, we will go into greater detail about the underlying model of light transport that SFDI utilizes to describe tissue and other turbid media in terms of absorption and scattering. We will explore the practical hardware considerations, implementation methods, and common pitfalls encountered in the wide range of instrumentation designs and data acquisition sequences. Finally, we will review the specific applications and biological/medical challenges that have been addressed with this technique.
Fig. 9.2 Graphical representations of how absorption (top) and scattering (bottom) will independently alter the integrity of patterned illumination after being remitted from turbid media, such as tissue
Fig. 9.3 Differentiated sensitivity of absorption and scattering to spatial frequency dependent reflectance responses. On the left, each curve shows the response of different absorption values when scattering remains constant. On the right, each curve shows the response of different
scattering values when absorption remains constant. Here, one can see that changes in absorption most dramatically impact low spatial frequencies
The diffusion approximation is valid provided that we (1) operate in a regime where scattering dominates absorption (μs′ >> μa), (2) use spatial frequencies between 0 and 0.25/mm (probing volumes of tissue > 1/μs′), and (3) are measuring homogeneous tissue. As detailed in Cuccia et al. [7], the derivation of the RTE for the spatial frequency domain using the diffusion approximation results in the following analytic expression, where n is the index of refraction of tissue and is typically assumed to be 1.4 in general cases:
a′ = μs′ / (μa + μs′),
A = (1 − Reff) / (2(1 + Reff)), and
Reff ≈ 0.0636n + 0.668 + 0.710/n − 1.440/n².
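A minimal numerical sketch of this forward model is given below. It assumes that the full diffusion-approximation expression takes the form commonly quoted from Cuccia et al. [7], Rd(fx) = 3Aa′/[(μeff′/μtr + 1)(μeff′/μtr + 3A)] with μtr = μa + μs′ and μeff′ = (3μaμtr + (2πfx)²)^(1/2), together with the definitions of a′, A, and Reff above; the optical property values used in the example are arbitrary.

```python
import numpy as np

def diffuse_reflectance(fx, mu_a, mu_s_prime, n=1.4):
    """Spatial-frequency-dependent diffuse reflectance in the diffusion
    approximation, assuming the standard form from Cuccia et al. [7].
    fx in 1/mm, mu_a and mu_s_prime in 1/mm."""
    mu_tr = mu_a + mu_s_prime                 # transport coefficient
    a_prime = mu_s_prime / mu_tr              # reduced albedo
    R_eff = 0.0636 * n + 0.668 + 0.710 / n - 1.440 / n**2
    A = (1 - R_eff) / (2 * (1 + R_eff))
    mu_eff_prime = np.sqrt(3 * mu_a * mu_tr + (2 * np.pi * fx) ** 2)
    ratio = mu_eff_prime / mu_tr
    return 3 * A * a_prime / ((ratio + 1) * (ratio + 3 * A))

# Example: reflectance versus spatial frequency for arbitrary tissue-like values.
fx = np.linspace(0, 0.25, 6)                  # 1/mm
print(diffuse_reflectance(fx, mu_a=0.01, mu_s_prime=1.0))
```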
It is worth noting that solutions of the RTE in the diffusion approximation regime also depend on the definition of boundary conditions. An alternate diffusion approximation derivation has been published by Post et al. that utilizes a partial current boundary condition [11]. Higher-order diffusion-based approximations, such as the Delta-P1 approximation, have also been derived for SFDI. These solutions extend the range over which the model may be applied, loosening the restrictions with respect to scattering dominating absorption and the range of spatial frequencies that may be used. However, as all analytic solutions are constrained to be piecewise continuous, these solutions break down near boundaries, such as the surface of the tissue.
9.2.3
Monte Carlo
The diffusion approximation allows us to derive an analytical function for tissue reflectance, where the absorption and scattering coefficients are inputs. The Monte Carlo approach instead simulates incremental light-tissue interactions in a random medium defined by its absorption and scattering
values, resulting in a statistical distribution of remitted light through stochastic methods [12]. This approach allows for greater accuracy in modeling tissue reflectance when absorption approaches the scattering values and when the interrogated tissue volumes are on par with the reduced mean free path of light in tissue, 1/μs′. This advantage, though, comes at the price of computational power and processing time. Another challenge with using Monte Carlo methods to describe light transport in the spatial frequency domain is that Monte Carlo methods have been developed to describe light transport in space and time. To convert these modeling results into the spatial frequency domain, an additional transform must be applied. Since Monte Carlo simulations are computationally intensive, their methods of implementation have been developed to optimize the calculation. One such optimization is to adopt a cylindrical geometry relative to the source of illumination. This allows the detection area corresponding to each radial "distance" at which light is remitted relative to the source to increase, because the photon statistics are accumulated over annular rings of growing circumference. Given this cylindrical geometry, a Hankel transform is then required to convert the statistical accumulation of remitted photons as a function of radial distance to remitted photons as a function of spatial frequency.
When the Monte Carlo method is invoked in the context of SFDI data processing, it typically does not refer to running a full Monte Carlo simulation for every set of optical properties during the data processing. Rather, it refers to a specific implementation that uses a single, pre-calculated Monte Carlo simulation that is then scaled and manipulated to recreate the reflectance values for arbitrary sets of optical properties. This is referred to as a "white Monte Carlo" [3, 13]. This is a simulation that models the distribution of remitted photons (in terms of space and time) relative to a pencil beam source at the surface of the tissue, where only scattering is present, hence the term "white". This simulation establishes the distribution of scattered light, given a specific density and anisotropy of scattering objects (and a specific index of refraction of the medium). Here, the scattering coefficient can then be changed simply by changing the density of these objects in the simulation. This effect can be achieved by merely scaling the spatial extent of the resulting output after the simulation is completed, thereby eliminating the need to run multiple simulations (see Fig. 9.4). Once the pre-calculated simulation is scaled to match the desired scattering value, the resulting reflectance values from the scattering-only simulation can be extracted for each spatial distance relative to the source pencil beam. The time domain information, also stored in the simulation output, can then be used to rescale the mean photon path length traveled within the new tissue dimensions at each spatial location. The contribution from absorption is then added (weighted) to the simulated reflectance via Beer's law.
Fig. 9.4 Graphical representation of how a white Monte Carlo can model a continuous range of homogeneous scattering properties by simply scaling the physical dimensions of a single pre-computed simulation
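The rescaling-and-reweighting step of a white Monte Carlo can be sketched as follows. Note that this is not the photon-transport simulation itself: the per-photon exit radii and path lengths below are random placeholders standing in for the stored output of a pre-computed, scattering-only simulation, and the binning geometry is an assumption for illustration.

```python
import numpy as np

# Sketch of the "white Monte Carlo" rescaling step. The arrays below stand in
# for the stored output of a pre-computed, scattering-only simulation run at a
# reference reduced scattering coefficient; here they are random placeholders.
rng = np.random.default_rng(0)
n_photons = 100_000
exit_radius_ref = rng.exponential(1.0, n_photons)   # mm, exit distance from the pencil beam
path_length_ref = rng.exponential(5.0, n_photons)   # mm, total path length in the medium
mu_s_prime_ref = 1.0                                 # 1/mm, value used in the stored simulation

def radial_reflectance(mu_a, mu_s_prime, r_edges):
    """Rescale the stored scattering-only simulation to a new mu_s' and
    weight each photon by Beer's law for the requested mu_a."""
    scale = mu_s_prime_ref / mu_s_prime              # geometric rescaling factor
    r = exit_radius_ref * scale                      # rescaled exit radii
    L = path_length_ref * scale                      # rescaled path lengths
    weights = np.exp(-mu_a * L)                      # Beer's law absorption weighting
    counts, _ = np.histogram(r, bins=r_edges, weights=weights)
    ring_area = np.pi * (r_edges[1:] ** 2 - r_edges[:-1] ** 2)
    return counts / (n_photons * ring_area)          # reflectance per unit area vs. radius

r_edges = np.linspace(0, 10, 51)                     # mm, annular bins
R_of_r = radial_reflectance(mu_a=0.01, mu_s_prime=1.2, r_edges=r_edges)
# A Hankel transform of R_of_r would then give the spatial frequency response Rd(fx).
```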
9.2.4
Inverse Solvers
Both the analytic solutions and the stochastic models describe spatial frequency dependent reflectance in terms of input values for absorption and reduced scattering (i.e., forward model). However, what we wish to extract from our tissue measurements is the inverse of these models. It is the reflectance that we measure, and it is the optical properties that we want to solve for. To that end, a minimization strategy is employed. As these forward models can generate reflectance curves based on unique pairs of absorption and reduced scattering values, they can be compared to the curve collected from the measurement. Through iterative minimization, these pairs of absorption and reduced scattering can be updated through various search algorithms such that the error between model and measurement is reduced to a minimum threshold. At this point, the final values for the pair of optical properties are output as the estimated absorption and reduced scattering values. This minimization occurs at every pixel and at every wavelength resulting in multi-spectral images of estimated absorption and reduced scattering values.
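A per-pixel inversion of this kind can be sketched with a generic nonlinear least-squares solver, as below. The diffusion-approximation forward model from the earlier sketch is reused purely for illustration (any forward model could be substituted), and the bounds, initial guess, and synthetic "measurement" are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def diffuse_reflectance(fx, mu_a, mu_s_prime, n=1.4):
    # Same diffusion-approximation forward model as in the sketch of Sect. 9.2.2.
    mu_tr = mu_a + mu_s_prime
    R_eff = 0.0636 * n + 0.668 + 0.710 / n - 1.440 / n**2
    A = (1 - R_eff) / (2 * (1 + R_eff))
    ratio = np.sqrt(3 * mu_a * mu_tr + (2 * np.pi * fx) ** 2) / mu_tr
    return 3 * A * (mu_s_prime / mu_tr) / ((ratio + 1) * (ratio + 3 * A))

def invert_pixel(measured_Rd, fx, initial=(0.02, 1.0)):
    """Estimate (mu_a, mu_s') for one pixel by nonlinear least squares."""
    residuals = lambda p: diffuse_reflectance(fx, p[0], p[1]) - measured_Rd
    fit = least_squares(residuals, x0=initial,
                        bounds=([1e-4, 0.1], [1.0, 10.0]))   # assumed physical bounds
    return fit.x                                             # (mu_a, mu_s') in 1/mm

# Synthetic "measurement" with a known ground truth, then recovery:
fx = np.array([0.0, 0.1, 0.2])                 # measured spatial frequencies, 1/mm
measured = diffuse_reflectance(fx, 0.015, 1.4)
print(invert_pixel(measured, fx))              # approximately [0.015, 1.4]
```

In a full data set, this minimization would simply be repeated for every pixel and every wavelength.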
9.2.5
Alternative Methods to Extract Absorption and Scattering Properties
Rather than utilize either analytic or stochastic models to generate a spatial frequency dependent reflectance curve to iteratively compare to measured reflectance responses and minimize their difference, look-up-table approaches have been developed to provide a direct (and hence fast)
method to map combinations of absorption and scattering coefficient values to explicit pairs of spatial frequency dependent reflectance values. These tables are pre-computed to relate ranges of absorption and scattering to the resulting diffuse reflectance values at two discrete spatial frequencies. The data used to generate these tables can be derived from simulations (e.g., Monte Carlo) or empirically from matrices of tissue simulating phantoms with known or independently characterized optical properties [14]. So long as the absorption and scattering values of the unknown sample or tissue fall within the range of these pre-computed meshes, the optical properties can be extracted by locating the coordinates of the two measured reflectance values within the mesh and interpolating from the known values. This direct mapping approach greatly reduces the computational time required to calculate optical properties relative to model minimization. However, it is particularly sensitive to measurement noise in the spatial frequency dependent reflectance values. Any noise or error at either measured spatial frequency amplitude will map directly to errors in absorption and scattering on the overlying mesh. Therefore, it is critical when employing this technique that any system or measurement noise is well characterized and minimized. The second consideration with using this technique is the selection of the two spatial frequencies used to generate the look-up-table. As we know that the sensitivity to the effects of absorption and scattering differs as a function of spatial frequency, the size of the parameter space within the look-up-table will also vary. Figure 9.5 shows the parameter space meshes for several combinations of spatial frequencies used in SFDI. Similar to look-up-table approaches, machine learning techniques and neural networks have been developed to generate an approximate inverse function that can describe spatial frequency dependent reflectance curves in terms of unique absorption and scattering values. These have also been developed using either simulated data or direct measurements of known optical properties as the basis of their training sets [15–18].
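The direct mapping can be sketched with a scattered-data interpolator built over a pre-computed mesh, as below. The forward model, the grid ranges, and the chosen spatial frequency pair are all illustrative assumptions; in practice, the table could equally be built from Monte Carlo simulations or phantom measurements, as noted above.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def diffuse_reflectance(fx, mu_a, mu_s_prime, n=1.4):
    # Diffusion-approximation forward model as sketched in Sect. 9.2.2 (illustrative only).
    mu_tr = mu_a + mu_s_prime
    R_eff = 0.0636 * n + 0.668 + 0.710 / n - 1.440 / n**2
    A = (1 - R_eff) / (2 * (1 + R_eff))
    ratio = np.sqrt(3 * mu_a * mu_tr + (2 * np.pi * fx) ** 2) / mu_tr
    return 3 * A * (mu_s_prime / mu_tr) / ((ratio + 1) * (ratio + 3 * A))

# Pre-compute the mesh: reflectance at two spatial frequencies over a grid of
# optical properties, then build the inverse map (Rd1, Rd2) -> (mu_a, mu_s').
fx_pair = (0.0, 0.2)                                  # assumed spatial frequency pair, 1/mm
mu_a_grid = np.linspace(0.005, 0.1, 40)               # 1/mm
mu_s_grid = np.linspace(0.5, 3.0, 40)                 # 1/mm
MA, MS = np.meshgrid(mu_a_grid, mu_s_grid)

Rd1 = diffuse_reflectance(fx_pair[0], MA, MS)
Rd2 = diffuse_reflectance(fx_pair[1], MA, MS)
points = np.column_stack([Rd1.ravel(), Rd2.ravel()])  # mesh coordinates in reflectance space
lut = LinearNDInterpolator(points, np.column_stack([MA.ravel(), MS.ravel()]))

# Direct (non-iterative) lookup of optical properties from two measured amplitudes:
query = np.array([[diffuse_reflectance(fx_pair[0], 0.02, 1.5),
                   diffuse_reflectance(fx_pair[1], 0.02, 1.5)]])
print(lut(query))                                     # approximately [[0.02, 1.5]]
```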
Fig. 9.5 Examples of the parameter space mesh when mapping optical properties to combinations of spatial frequency reflectance values in the look-up-table approach. The top row shows the relation between spatial frequency dependent reflectance at fx = 0 (i.e., planar illumination) combined with spatial frequencies of 0.125, 0.2, and 0.3 mm−1, respectively (y-axis). The bottom row shows the combinations between fx = 0.125 mm−1 and the remaining two spatial frequencies. This illustrates how sensitive this approach is to measurement noise and bias in relation to selection of spatial frequencies employed (area of parameter space) and range of anticipated optical property values (density of mesh relative to reflectance values)

9.3

Demodulation

It is important to keep in mind that these models and inverse solver strategies are all based on the spatial frequency dependent response (i.e., AC amplitude) of the total reflectance signal measured, which contains both AC and DC components. This is the residual amplitude of the projected pattern that is remitted from the tissue after interacting with the optical properties within it. There have been many approaches in SFDI systems to extract this AC component, some of which will be introduced in later sections, but in general, the majority of SFDI techniques utilize a "classical" three-phase demodulation. This approach projects the same spatial frequency pattern three times, each shifted by 120° relative to the other. With these three phases, it is then possible to extract the sinusoidal amplitude present at each pixel location, independently from its neighboring locations. This is a very useful approach in heterogeneous tissues as the envelope function of the pattern can vary greatly. The raw AC component is extracted through this expression, where I1, I2, and I3 represent the pixel-specific intensity for a given wavelength, λ, and spatial frequency, fx, at the three phases:
AC(fx, λ) = (√2/3) · [(I1 − I2)² + (I2 − I3)² + (I3 − I1)²]^(1/2)    (9.3)

As accurate and robust demodulation of the detected remitted pattern is critical for accurate estimation of optical properties, it is useful to be aware of three common sources of error during the raw data acquisition that can be visualized as residual patterned structures within the demodulated images. The first of these potential sources of error stems from changes in ambient or stray light between the acquisition of the three phases. Here, a residual pattern at the projected spatial frequency will remain. The second comes from variation in light intensity between the three phases, resulting in a residual frequency twice that of the source pattern. Finally, if the projector is not linear in its response (i.e., does not project a pure sinusoid pattern), a residual frequency three times that of the illumination pattern will remain. Figure 9.6 illustrates these errors.
Fig. 9.6 Visualization of sources of demodulation error relative to a homogenous medium. Here, (left) background fluctuation is modeled as a DC off-set shift in one of the three phases, (middle) source fluctuation is modeled as a reduction in amplitude in one of the phases, (right) non-linear projection is modeled as a distortion in the sinusoidal intensity in all of the phases
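The demodulation of Eq. (9.3) amounts to a few array operations per pixel, as the sketch below shows on synthetic three-phase data; the frame size, spatial frequency, and envelope used to build the test images are arbitrary. The DC component, computed here as the mean of the three frames, is a standard by-product of the same three acquisitions.

```python
import numpy as np

# Pixel-wise three-phase demodulation, Eq. (9.3), applied to three images of the
# same spatial frequency pattern shifted by 0, 120, and 240 degrees.
# The synthetic frames below are placeholders for measured camera images.

ny, nx, fx = 128, 128, 0.1                    # image size and spatial frequency (cycles/pixel)
x = np.arange(nx)
envelope = 0.5 + 0.3 * np.exp(-((x - nx / 2) ** 2) / (2 * 30 ** 2))   # hypothetical AC envelope
dc = 0.6 * np.ones(nx)                        # hypothetical DC level

frames = []
for phase in (0.0, 2 * np.pi / 3, 4 * np.pi / 3):
    row = dc + envelope * np.cos(2 * np.pi * fx * x + phase)
    frames.append(np.tile(row, (ny, 1)))      # replicate the row to form an image
I1, I2, I3 = frames

# Demodulation: AC amplitude and DC offset at every pixel.
AC = (np.sqrt(2) / 3) * np.sqrt((I1 - I2) ** 2 + (I2 - I3) ** 2 + (I3 - I1) ** 2)
DC = (I1 + I2 + I3) / 3

print(np.allclose(AC[0], envelope))           # True: the envelope is recovered
print(np.allclose(DC[0], dc))                 # True
```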
in parallel with the sample measurement. In planar reflectance-based measurements, this is often achieved by measurements of reflectance standards with known reflectivity. Calibration for SFDI methods, however, is more complex as it is not just the light intensity distributions on the target sample that need to be accounted for, but also any volumetric imaging aberrations of the system as well. It is important to remember that SFDI is a co-planar imaging approach, where both the image quality and propagation of the projected structured illumination patterns into the sample and the image quality of the imaging arm play critical roles in the determination of the spatial frequency dependent reflectance. The fundamental models used to describe spatial frequency dependent reflectance presume an idealized imaging system where the projected patterns will propagate indefinitely, maintaining its image quality and intensity. It is only the interactions with absorbers and scatterers in the medium volume that will alter the diffusely reflected pattern remitted from the target.
9.4
Calibration
Many spectroscopic and photometric techniques require a calibration method to account for the light throughput of the instrumentation. This procedure is necessary to isolate and quantify the signals specific to the target of interest, from those that come as a consequence of the source, detector, and optical elements of the device themselves. In spectrophotometry of non-turbid samples, this calibration is often achieved through the use of a reference arm measurement acquired
In practical instrument designs, however, there are limitations to the volumes over which these projected patterns can be formed and propagated before their image quality degrades due to optical aberrations presented within the instrument itself. In the context of most SFDI instrument designs, the two most prevalent considerations are the depth of focus and chromatic aberrations. It is for this reason that a volumetric calibration method has been adopted for SFDI measurements.
9.4.1
Methods
Rather than using a reflectance standard to calibrate SFDI instruments, turbid media with known, well-characterized absorption and scattering properties are used to provide the reference measurement. These media, designed with absorption and scattering properties that approximate the mean values encountered in the target tissue, serve as a traceable reference of both the spectral intensity throughput and the accumulated aberrations of the instrument over the approximate volume that the spatial frequency dependent reflectance will interrogate. There are several media that can be used to fabricate these homogeneous turbid reference standards (also referred to as tissue simulating phantoms) [19, 20]. Of all these media options, solid phantoms using polymerized silicone (PDMS) or epoxy resin are the most commonly used as they have the longest shelf-life. Phantoms fabricated from these base media have been shown to be stable over years.
9.4.2
Procedure
One can define a raw measurement in SFDI as contributions from the light interaction with the target medium, R(fx, λ), and the contributions from the instrument itself, finst(fx, λ):

ACmeasured(fx, λ) = R(fx, λ) · finst(fx, λ)    (9.4)

By measuring a reference phantom before or after the target measurement, it is assumed that this instrument function remains the same and that only the optical properties of the medium have changed. The optical properties of the known reference phantom, however, can be related to the idealized reflectance response using the aforementioned models of light transport. As a result, the target specific reflectance R(fx, λ) can be extracted by the following equation:

R(λ, fx) = [Rsample(λ, fx) · finst(λ, fx)] / [Rphant(λ, fx) · finst(λ, fx)] · Rphant,ref(λ, fx)    (9.5)
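In code, the calibration of Eq. (9.5) is a pixel-wise ratio followed by a scaling with the model-predicted phantom reflectance, as sketched below. The image sizes, count levels, and the predicted phantom reflectance value are placeholders for illustration.

```python
import numpy as np

# Sketch of the calibration in Eq. (9.5): the instrument function cancels in the
# ratio of the two demodulated measurements, and the model-predicted reflectance
# of the reference phantom restores absolute units. All numbers are placeholders.

rng = np.random.default_rng(1)
AC_phantom = 900.0 + rng.normal(0, 5, (64, 64))   # demodulated phantom image, camera counts
AC_sample = 0.8 * AC_phantom                      # hypothetical demodulated sample image

# Model-predicted diffuse reflectance of the reference phantom at this wavelength
# and spatial frequency, e.g., from the forward model of Sect. 9.2.2 evaluated at
# the phantom's independently characterized mu_a and mu_s'.
Rd_phantom_predicted = 0.35

Rd_sample = (AC_sample / AC_phantom) * Rd_phantom_predicted   # Eq. (9.5)
print(Rd_sample.mean())                           # calibrated, unitless diffuse reflectance
```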
9.4.3
Calibration for Variations in Surface (and Subsurface) Topology
The influence of optical aberrations on SFDI measurements is most notable when imaging over large areas of tissue and/or reconstructing complex anatomical structures in depth. These are situations where the surface topology of tissue can vary greatly, over a range of millimeters to centimeters. As there is no single, flat plane for projected patterns to propagate through, the depth-specific alterations of these structured images will be interpreted as variations in absorption and scattering by the models of light transport. There have been two basic calibration strategies to address these sources of topological errors. In the first approach, calibration measurements are collected at multiple, known heights relative to the ideal image plane. Here, the image intensity and integrity can be characterized at discrete distances from the instrument and related to the known optical properties of the calibration reference phantom. Additional structured patterns (e.g., grids, checkerboards, etc.) can then be projected onto the surface of the target tissue, and the relative spatial distortion of these additional patterns can be used to generate a topological height map of the imaged target in relation to the instrument's ideal image plane. The multi-
9 Spatial Frequency Domain Imaging
height calibration measurements can then be interpolated as a function of depth to match the specific tissue topology on a pixel by pixel basis, compensating for the spatial illumination variation in both the lateral and depth dimensions [21]. This approach has been expanded even further to account for high-angle corrections and secondary surface reflections of the projected patterns [22], extending this surface correction approach toward extreme surface topologies due to anatomical features found on the face, feet, hands, subdermal tumor growths, etc. Another approach has also been proposed which aims to characterize the volumetric impact of optical aberrations independently from the spectral intensity throughput and to interpret these depth-specific distortions in terms of the resulting optical properties, empirically [23]. This approach assumes that the imaging optics of the particular SFDI instrument are fixed. Here, reference phantoms are also measured at multiple distances, but this is only performed once, not every time target tissue is acquired. This initial multi-height measurement is then used to determine the wavelength-specific errors in the calculated absorption and scattering values relative to the calibration measurement at the ideal image plane; thus, at the time of tissue measurement, only a single calibration measurement is required along with data to map the surface topology. This method has shown that aberration-induced errors are independent of the target's optical properties, but rather are a consequence of the depth. Through the initial characterization of the depth-dependent distortions induced by these aberrations in terms of absorption and scattering, corrections for surface topology can now be applied in post-processing as a wavelength-dependent scaling factor. Compensating for structural topology in post-processing has an added benefit, as this correction can also be applied to subsurface structures when layered or tomographic models are applied, thereby maintaining the spectral integrity of embedded structures within tissue volumes.
9.5
Implementation and System Design
Since its early inception [6, 7], there have been several implementation and hardware designs developed for SFDI, depending on the specific application needs. In this section, several principal instrument designs will be discussed that focus on three types of clinical and investigational interests: applications that examine spatial variations within tissue, temporal dynamics within tissue, or spectral characteristics and signatures in tissue. Though an ideal instrument would be one that can perform all three aspects, current hardware limitations have resulted in these three categories. Figure 9.7 shows a depiction of some early SFDI systems that were designed to perform in either “space”, “time”, or “energy” (spectrum).
9.5.1
Design and Implementation of SFDI in Spatially Resolved Applications
The most prevalent technology used to generate and project spatial frequency encoded patterns is the digital micro-mirror device (DMD). These programmable reflective elements are commonly found in most digital projectors and offer a high degree of flexibility and customization when it comes to designing and implementing SFDI acquisitions and sequences. DMDs are two-dimensional arrays of mirror elements that can individually deflect light such that it either continues through the projection optics or is directed to an internal beam dump. They can be manufactured in a variety of element sizes and resolutions and can be optimized for the visible or the near-infrared range. In applications where spatial variation in tissue optical properties is prioritized, the vast majority of these types of SFDI systems couple these DMD devices with individually addressable LED sources to provide the spectral bands and a monochrome camera to detect these sequential LED illumination patterns [24–26]. Other system designs have used broadband or simultaneous LED illumination with either sequential filtering at the detection side, using programmable filter wheels or liquid crystal tunable filters [27, 28], or several aligned monochrome cameras, each with a fixed bandpass filter in front of it [29–31]. A low-resource setting implementation of SFDI has also been developed that exploits the broad spectral bands of LEDs used in compact low-cost projectors together with the broad spectral bands of a typical Bayer-filter color camera. Here, it was demonstrated that two additional spectral bands can be measured in addition to the three primary bands, as the shorter wavelengths of the green LED band can be detected by the lower edge of the blue Bayer filter, and the longer wavelengths of the green LED can be detected by the red Bayer filter [32].
Fig. 9.7 Early examples of SFDI instrumentation designs and implementations. Given the technologies available, these different implementations focus on performance in either Space (spatial range, resolution) [24, 27], Time [72], or Energy (spectral range and resolution) [74]. The scale of the radar plot axes is in arbitrary units to emphasize the relative performance space of each instrument
While these systems may have just 4–20 spectral bands, they have been successfully deployed in a number of research investigations, providing greater insight into the early assessment of burn wound severity [33–35], the monitoring of chronic wounds [36–40], and wound healing progression [41, 42]. Figure 9.8 shows superficial scattering contrast at 530 nm obtained by the aforementioned low-resource setting SFDI system, which correlates directly with the epithelialization observed in histology under stem cell mitigated burn wounds vs. control wounds. SFDI has been applied as a tool for surgical guidance to monitor and assess tissue viability [29] over a wide range of conditions and interventions. Here, it has been specifically applied to skin transfer flap surgery [43–46], facial transplants [47], liver [29], bowel [29], and kidney [48]. Early cancer detection, drug delivery, and therapeutic monitoring have also been explored with SFDI techniques. Several studies have been conducted in preclinical models of breast cancers [49–52]. The area of skin cancers has been explored as well, not just in terms of lesion characterization but also in terms of monitoring and adapting treatment planning in light-based therapeutics, such as photodynamic therapy (PDT) [25, 53–55]. Given its wide-field imaging capabilities, the utilization of SFDI as a label-free cancer margin detection tool has been investigated in resected tissues [56–59]. It has also been applied in studies within neuroscience, where models of Alzheimer's disease were characterized in terms of scattering and hemodynamic changes as a means to detect or predict vascular impairment due to plaques or neuronal death [60, 61]. It was also applied to investigations of traumatic brain injury [62] and stroke [63], as well as providing a means to detect localized hemodynamic changes in functional activation studies [64, 65].
Fig. 9.8 Re-epithelization from a stem cell based therapeutic intervention in a second-degree burn wound (ex-vivo tissue model) relative to a control wound (no intervention). The top row shows the tomographic reconstruction of new cell growth (red) visualized through superficial light scattering contrast at 530 nm. The bottom row shows the representative histology, where 100 μm of new cell growth is visualized in the stem cell intervention and negligible growth in the control, agreeing with the SFDI reconstruction
9.5.2
Design and Implementation of SFDI in Temporally Resolved Applications
There are several factors that can limit the acquisition time for an SFDI data set. These limitations can be due to the implementation of the data set acquisition sequence and/or the performance characteristics of the technologies and components used in the instrument design.
9.5.2.1 Limitations in the Traditional Acquisition Sequence The advantage of utilizing a three-phase demodulation scheme is that it can determine the envelope function [i.e., AC(fx ,λ) in Eq. (9.4)] at individual pixel locations independent from adjacent
pixel locations. This is useful for maintaining the spatial resolution of this imaging component while also differentiating heterogeneous structures. However, it requires the sequential acquisition of three images for every spatial frequency. Additionally, in order to separate the effects of absorption from those of scattering, a minimum of two spatial frequency responses is required. The traditional approach would hence require a minimum of six patterns to be projected sequentially. Should this system be LED driven with a single monochrome camera, this sequence would need to be repeated, yet again, for each spectral band acquired.
9.5.2.2 Limitations of Traditional Components and Technologies DMDs may be the most prevalent core components in current projector systems, given the ease and flexibility to select any range or sequence of spatial frequencies. It is important to keep in mind, however, that the bit-depth of the images generated is related to the amount of time the individual mirror pixels are angled in the "on(-axis)" position relative to the "off(-axis)" position. In this case, typical 8-bit projections from standard DMD chip-sets will take 40 ms to complete the generation of a single image. Signal-to-noise ratios, both in terms of the source illumination intensity and the sensitivity of the camera sensor (and the optimization of the spectral throughput of the system optics), can play a profound role in the total acquisition time of the system. This is a particular challenge when water and/or lipid absorption are of interest. Both of these chromophores have relatively weak absorption bands in the 900–1000 nm range (lipids at 930 nm and water at 970 nm). In this spectral range, the quantum efficiency of silicon detectors is substantially lower (i.e., 10–5%) relative to that at 700 nm. LED sources in this range are also relatively weak when compared to visible and other near-infrared sources. As a result of these acquisition sequence and hardware limitations, many spatially resolved SFDI systems can take tens of seconds to minutes to acquire a complete data set.
9.5.2.3 Modifications to SFDI Acquisitions to Increase Acquisition Rates In order to increase the acquisition rates of SFDI data sets, several modified strategies have been explored. The goal of these systems is to be able to capture tissue dynamics on the order of 1–10 Hz. To reduce the number of projected patterns needed, a single image method has been developed: Single Snapshot of Optical Properties (SSOP) [66, 67]. This approach abandons the three phase demodulation approach, but rather utilizes a Fourier transform method to decompose a single spatial frequency encoded image into AC and DC components and then uses them as the minimal two spatial frequencies required to estimate the tissue optical properties. This approach does come at the sacrifice of some spatial resolution. Another approach is to speed up the frame rate of the DMD by reducing the bit-depth of the projected patterns. By going from 8-bit to 6-bit, it was noted early on that there was little measurable difference in the demodulation of the reduced projection set, while there was a 4x increase in the frame rate of the DMD. This concept was taken to a much further extreme by reducing the bit-depth to binary images and hence replacing sinusoids with square wave patterns. Not only did this increase the frame rate by a factor of 256, but square waves can also be interpreted as a superposition of a series of sinusoids at differing spatial frequencies. In this case, one could utilize these square waves as a means to multiplex the projection of several spatial frequencies at the same time [68]. In practice, as tissue acts as a low pass filter, only 2 spatial frequencies may be reliably detected with this approach. The last approach to improving the projection rate is to abandon DMDs altogether and utilize static transmission masks with sinusoidal functions already imprinted on them [69–71]. Here, there are no frame rate concerns on the projection side. Should multiple spatial frequencies and phases still be desired, transmission masks have been generated on spinning disks, where a full rotation of the disk would cycle through all imprinted patterns and phases (Fig. 9.9).
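To illustrate the single-image idea, the sketch below (an illustrative simplification, not the exact SSOP processing chain of Refs. [66, 67]) extracts DC and AC maps from one frame modulated along x by filtering its row-wise Fourier spectrum around zero frequency and around the projection frequency:

```python
import numpy as np

def single_image_demodulate(img, fx, dx):
    """Extract DC and AC amplitude maps from a single spatially modulated
    image (modulation along the second axis). fx is the projected spatial
    frequency and dx the pixel pitch, in consistent units (e.g., mm)."""
    ny, nx = img.shape
    F = np.fft.fft(img, axis=1)              # 1-D spectrum of each row
    freqs = np.fft.fftfreq(nx, d=dx)         # spatial-frequency axis
    bw = fx / 2.0                            # half-width of the filter bands
    dc_band = np.abs(freqs) < bw             # low-pass band around DC
    ac_band = np.abs(freqs - fx) < bw        # band around +fx only
    dc = np.abs(np.fft.ifft(F * dc_band, axis=1))
    ac = 2.0 * np.abs(np.fft.ifft(F * ac_band, axis=1))  # analytic-signal envelope
    return dc, ac
```

Because the filtering mixes neighboring pixels along the modulation direction, some lateral resolution is traded away, consistent with the trade-off noted above.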
Fig. 9.9 Analog transmission masks (left: static pattern for SSOP methods, right: spinning disk for multi-frequency, multi-phase methods)
Another consideration is the sensitivity of the camera, as it would mean little to develop a fast projection scheme if no sensor can acquire the remitted light at those rates with sufficient signal-to-noise ratio. To that end, many fast SFDI systems that use one of the projection schemes above tend to employ just a limited number of spectral bands (2–3) projected over a small field of view (1–2 cm) and a fast, high-sensitivity sensor (e.g., an sCMOS camera) in order to achieve an acquisition rate on the order of 10 Hz. These systems are targeted toward fast fluctuations in tissue, such as hemodynamics and plethysmography, as well as other potentially fast signals such as neuronal activation, which has been theorized to evoke a change in local scattering properties. Existing systems have been used to investigate the effects of delayed cardiopulmonary resuscitation, as these fast SFDI measurements have revealed a delay between when blood flow is restored and when function (and oxygen extraction) returns to the brain [72].
9.5.3
Design and Implementation of SFDI in Spectrally Resolved Applications
In the context of spectral measurements of turbid media, SFDI has an advantage over many other diffuse reflectance spectroscopic techniques
since it can differentiate the contributions from absorption and scattering without the use of spectral priors. It can decompose the changes in the spatial frequency dependent reflectance signal in terms of absorption and scattering at each wavelength independently. There are no spectral constraints to impose at the processing stage where reflectance spectra are modeled as a function of tissue optical properties. For example, scattering spectra are not required to fit a power-law like function, nor are absorption spectra confined to represent a linear combination of anticipated chromophores, such as melanin, hemoglobin species, lipids, and water. For this reason, high spectral resolution variants of SFDI make compelling investigative platforms to explore tissue optical properties, conditions, and diseases that may not be fully characterized or understood. Referred to as spatial frequency domain spectroscopy, SFDS (or spatially modulated quantitative spectroscopy, SMoQS), these techniques typically sacrifice spatial range and resolution for spectral range and resolution. These types of instrument designs mimic those previously used for SFDI with the only exceptions of utilizing a broadband source with the projector and a spectrometer to detect a single point from the illuminated tissue. As a point spectroscopy design, a three phase demodulation scheme is required in order to extract the AC component at a single pixel [73]. Line imaging spectrographs have also been
used in the instrument design to acquire at least some spatial information along a single lateral direction [74]. What makes these designs unique is that they are capable of determining the spectral properties of tissue over both the visible and near-infrared regimes at a resolution of just a few nanometers. This approach has been used to identify and characterize hemoglobin breakdown products present in burn wounds [75] as well as to detect often ignored or unanticipated chromophores in skin, such as carotenoids [76]. This investigational platform, though limited in acquisition speed and imaging capability, has been beneficial in identifying sources of contrast in tissue that, if not anticipated and accounted for by multi-spectral imaging methods, could result in significant errors in the interpretation of tissue optical properties. Given the high spectral resolution of SFDS, it is also possible to evaluate and design optimized multispectral imagers, ensuring that a minimal number and width of spectral bands are sufficient to accurately determine the tissue properties of interest [76]. This approach has also been used to develop and evaluate depth-sensitive methods that exploit spectrally dependent penetration variability in tissue [77]. This will be further expanded on in the next section.
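When a multispectral system does impose the spectral priors discussed above, the chromophore decomposition is usually a linear least-squares fit of the recovered absorption spectrum to tabulated extinction spectra. The sketch below illustrates this; the wavelengths and extinction values are placeholders for illustration only, and real analyses would use published tabulated data:

```python
import numpy as np

# Placeholder wavelengths (nm) and extinction matrix E (rows: wavelengths,
# columns: chromophores such as HbO2, Hb, and water).
wavelengths = np.array([659.0, 691.0, 731.0, 851.0, 971.0])
E = np.array([[0.08, 0.30, 0.0004],
              [0.07, 0.20, 0.0006],
              [0.06, 0.11, 0.0020],
              [0.10, 0.07, 0.0040],
              [0.12, 0.06, 0.0450]])   # assumed values, illustration only

def unmix_chromophores(mua):
    """Least-squares decomposition of a measured absorption spectrum
    (one mua value per wavelength) into chromophore concentrations."""
    conc, *_ = np.linalg.lstsq(E, mua, rcond=None)
    return conc

# Self-test: synthesize mua from known concentrations and recover them.
true_conc = np.array([0.5, 0.3, 10.0])
print(unmix_chromophores(E @ true_conc))   # approximately [0.5, 0.3, 10.0]
```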
9.6
Depth Sensitive Methods
With SFDI, there are two mechanisms that can be employed to probe tissue properties at differing depths. As SFDI can be used in both the visible and near-infrared ranges, spectrally dependent depth penetrance can be exploited to probe different volumes of tissue. Because the visible regime tends to exhibit 5–10 times higher absorption and 2–3 times higher scattering, diffuse reflectance imaging techniques in this spectral range will probe the first few hundred micrometers of tissue, whereas SFDI in the near-infrared range can probe millimeters into tissue. The precise volumes probed depend on the wavelength-specific optical properties encountered and can be
estimated from the fluence calculated by the models of light transport employed. A very simple example is referred to as the depth penetrance metric, which provides a rough estimate of the light's interaction length in tissue [78]:
\delta = \frac{1}{\sqrt{3\mu_a\left(\mu_a + \mu_s'\right)}} \quad (9.6)
This simple metric in particular represents the 1/e distance of a continuous fluence distribution based on the standard diffusion approximation solution of planar illumination in turbid media. By imposing a two-layer model of skin, i.e., describing it as an epidermal and dermal layer, this spectral depth dependence model will fit chromophores to the measured absorption spectrum over the visible and near infrared regimes, independently. In this case, differences in chromophore concentration estimation between these two spectral domains are presumed to be due to a partial volume effect where visible light absorption represents the more superficial volumes (i.e., biased toward epidermal absorption properties) and the near infrared absorption includes a larger depth penetrance, increasing the contributions from dermal tissue in the measurement. The relative differences in depths, calculated by the fluence or simplified depth penetrance metric, can then be leveraged to determine the layer thickness of the epidermal layer and compartmentalize depth specific contributions of chromophores. This approach, though semi-empirical in nature, was validated in a comparative study against melanin concentration and thickness estimation determined by multi-photon microscopy, where this wide field imaging technique showed agreement with microscopy within ±15 μm [79]. This technique was also used to examine the impact of melanin concentration on the estimation of dermal hemodynamics. Here, it identified that using homogeneous models of skin in the near infrared was susceptible to a systemic error in blood oxygenation estimation in the presence of increasing melanin concentration. In the case of reflectance-based imaging, blood
oxygenation would be underestimated in darker skin types unless a layered model was applied [80]. The second mechanism for depth sensitivity in SFDI measurements comes from the selection of the spatial frequencies themselves. Here, one can think of turbid media as a low-pass filter in which the propagation of the sinusoidal pattern in depth is limited by the spatial frequency used: the higher the frequency encoded in the pattern, the faster that pattern blurs out due to scattering, and hence the shallower the depth to which that spatial frequency will propagate in tissue. For this mechanism, the depth sensitivity can be described by the fluence squared (i.e., the photon-hitting density) derived from analytical solutions of the radiative transport equation (RTE) in the spatial frequency domain. Several tomographic reconstruction methods and strategies have been applied to SFDI to produce full 3D reconstructions of tissue volumes [64, 81–86].
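Both depth mechanisms can be summarized in a small numerical sketch. It evaluates the depth penetrance metric of Eq. (9.6) and a commonly used diffusion-based extension in which the spatial frequency adds a (2πfx)² term to the effective attenuation; the optical property values below are assumed for illustration only:

```python
import numpy as np

def penetration_depth(mua, musp, fx=0.0):
    """Approximate 1/e penetration depth in a turbid medium.
    mua, musp: absorption and reduced scattering coefficients (mm^-1).
    fx: spatial frequency of the projected pattern (mm^-1); fx = 0
    reduces to the planar-illumination metric of Eq. (9.6)."""
    mu_eff = np.sqrt(3.0 * mua * (mua + musp))
    return 1.0 / np.sqrt(mu_eff ** 2 + (2.0 * np.pi * fx) ** 2)

# Assumed near-infrared tissue values vs. a more absorbing visible-range case.
print(penetration_depth(0.02, 1.0))          # NIR, planar: roughly 4 mm
print(penetration_depth(0.30, 3.0))          # visible, planar: sub-millimeter
print(penetration_depth(0.02, 1.0, fx=0.2))  # NIR, high fx: much shallower
```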
9.7
Online Resources
There are a number of additional online resources, tutorials, and sample codes available for those interested in pursuing this approach further. As this technique remains under continual development and new technological advancements can enable SFDI to be implemented and utilized in currently unexplored or unanticipated directions, it is recommended that the following resources also be considered as a means to understand the evolution of this quantitative imaging approach beyond the scope of what has been discussed here. These resources may also provide tools and more detailed explorations of specific topics and concepts discussed in this chapter. In terms of modeling light transport within turbid media such as tissue in the spatial frequency domain and other diffuse optical approaches, the Virtual Photonics Technology Initiative provides a wide variety of tutorials, interactive tools and solvers, as well as open-source codes and executables that can be adapted to model light-tissue interactions in a variety of diffuse optical methods and geometries. This
resource can be found at https://virtualphotonics.org/ and has been supported by NIBIB and NIGMS, among others [87]. Of particular note, this resource provides a graphical user interface that offers forward and inverse solutions in the spatial frequency domain for homogeneous media, utilizing a variety of solution approaches to the radiative transport equation. They also provide open-source codes for Monte Carlo solutions, which can either be run as executables or be adapted for specific needs. The Open SFDI Initiative is another online resource specifically dedicated to instrumentation development and data processing of SFDI systems [26]. The goal of this online forum is to engage researchers interested in exploring SFDI methods by providing step-by-step tutorials on how to build a simple, low-cost SFDI system based on an LED-driven source and a monochrome camera. The specific design detailed there has been implemented at multiple institutions and provides an initial platform for standardization in SFDI data acquisition and analysis. Additionally, there is a tutorial on data processing with pseudo-code that provides a framework from which first-time researchers can develop their own processing codes, as well as a simple data processing executable with example data that can be used to test and validate any individual's SFDI processing code. This can be found at http://opensfdi.org/.
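As a small example of the kind of processing such tutorials walk through, the sketch below shows one common calibration step found in SFDI pipelines (not the openSFDI code itself): the demodulated sample data are ratioed against a reference phantom measured under identical conditions and scaled by the model-predicted reflectance of that phantom.

```python
def calibrated_diffuse_reflectance(ac_sample, ac_reference, rd_reference_model):
    """Calibrated spatial-frequency-dependent diffuse reflectance of the sample.
    ac_sample, ac_reference: demodulated AC maps of the sample and of a reference
    phantom at the same spatial frequency and wavelength.
    rd_reference_model: reflectance predicted by a forward model (e.g., diffusion
    or Monte Carlo) for the phantom's known optical properties."""
    return ac_sample / ac_reference * rd_reference_model
```

The calibrated reflectance at two or more spatial frequencies is then inverted, via a model fit or lookup table, to obtain the absorption and reduced scattering coefficients at each pixel.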
9.8
Conclusion
In this chapter, the fundamental principles of spatial frequency domain imaging are introduced. Described in its most generalized terms, SFDI is a quantitative imaging technique that can differentiate the effects of light absorption from those of light scattering in turbid media. When patterns of light (structured illumination) are projected, absorption attenuates the total signal of light remitted by the medium, while scattering blurs the pattern, resulting in a reduction of amplitude contrast in the remitted signal. Models of light transport have been developed to provide mathematical relations
between the detected signal (i.e., the spatial frequency dependent diffuse reflectance) and the absorption and scattering present within the interrogated tissue volumes. In the context of biologic tissues, the resulting absorption properties can be interpreted as a linear combination of chromophores (absorbing molecules in the tissue volume). The most common chromophores present in tissue are hemoglobin species (e.g., oxygenated and deoxygenated), melanin, water, lipids, and carotenoids. Scattering properties result from photon deflections from objects with sizes ranging from hundreds of nanometers to microns. In tissue, these objects are sub-cellular and extra-cellular structures. In many regards, however, SFDI is still an emerging technique, as there remain many opportunities for further development, exploitation, and application. While the foundation behind this imaging technique has been introduced here, many advancements are currently ongoing, for example, the development of models that include sub-diffusive scattering responses to improve the sensitivity and specificity of SFDI [88–90]. SFDI can also be combined with other imaging modalities, where its ability to model light transport can inform the volume and depth sensitivity of the other imaging approaches. SFDI has been combined with fluorescence to create a quantitative fluorescence technique [53] as well as to enable tomography [91–93]. It has also been combined with laser speckle imaging as a means to measure metabolic activity [94] rather than just changes in blood oxygenation, and with polarimetry [95, 96] and projection orientation [97] to better extract structural orientation and organization from light scattering. From these general principles, several implementation strategies and hardware designs were explored, motivated by applications that require spatial imaging and resolution, temporal dynamics, or spectral resolution and fidelity to address current unmet needs in clinical imaging and in basic medical sciences. As new technologies and hardware emerge, the hope is that these instrument designs can expand their performance (e.g., Fig. 9.7). As imaging technologies mature, com-
ponents may fall in price and size, promoting the introduction and use of this imaging approach in more clinical and/or low resource settings.
References 1. Wriedt, T., Mie theory: a review. The Mie theory: Basics and applications, 2012: p. 53–71. 2. Mourant, J.R., et al., Mechanisms of light scattering from biological cells relevant to noninvasive opticaltissue diagnostics. Applied Optics, 1998. 37(16): p. 3586–3593. 3. Kienle, A. and M.S. Patterson, Determination of the optical properties of turbid media from a single Monte Carlo simulation. Physics in Medicine and Biology, 1996. 41(10): p. 2221–2227. 4. Ishimaru, A., Wave propagation and scattering in random media. 1978, New York: Academic Press. 5. Kim, A.D., Transport theory for light propagation in biological tissue. J Opt Soc Am A Opt Image Sci Vis, 2004. 21(5): p. 820–7. 6. Dognitz, N. and G. Wagnieres, Determination of tissue optical properties by steady-state spatial frequency-domain reflectometry. Lasers in Medical Science, 1998. 13(1): p. 55–65. 7. Cuccia, D.J., et al., Quantitation and mapping of tissue optical properties using modulated imaging. J Biomed Opt, 2009. 14(2): p. 024012. 8. Li, X.D., et al., Diffraction tomography for biochemical imaging with diffuse-photon density waves. Optics Letters, 1997. 22(8): p. 573–575. 9. Kienle, A., et al., Noninvasive determination of the optical properties of two-layered turbid media. Applied Optics, 1998. 37(4): p. 779–791. 10. Markel, V.A. and J.C. Schotland, Inverse problem in optical diffusion tomography. I. Fourier-Laplace inversion formulas. Journal of the Optical Society of America a-Optics Image Science and Vision, 2001. 18(6): p. 1336–1347. 11. Post, A.L., D.J. Faber, and T.G. van Leeuwen, Model for the diffuse reflectance in spatial frequency domain imaging. J Biomed Opt, 2023. 28(4): p. 046002. 12. Wang, L.H., S.L. Jacques, and L.Q. Zheng, Mcml – Monte-Carlo Modeling of Light Transport in Multilayered Tissues. Computer Methods and Programs in Biomedicine, 1995. 47(2): p. 131–146. 13. Swartling, J., et al., Accelerated Monte Carlo models to simulate fluorescence spectra from layered tissues. Journal of the Optical Society of America a-Optics Image Science and Vision, 2003. 20(4): p. 714–727. 14. Rajaram, N., T.H. Nguyen, and J.W. Tunnell, Lookup table-based inverse model for determining optical properties of turbid media. Journal of Biomedical Optics, 2008. 13(5). 15. Sun, Z.Z., et al., An artificial neural network model for accurate and efficient optical property mapping from spatial-frequency domain images. Computers and Electronics in Agriculture, 2021. 188.
9 Spatial Frequency Domain Imaging 16. Zhao, Y.Y., et al., Deep learning model for ultrafast multifrequency optical property extractions for spatial frequency domain imaging. Optics Letters, 2018. 43(22): p. 5669–5672. 17. Yudovsky, D. and A.J. Durkin, Spatial frequency domain spectroscopy of two layer media. Journal of Biomedical Optics, 2011. 16(10). 18. Rowland, R., et al., Burn wound classification model using spatial frequency-domain imaging and machine learning. Journal of Biomedical Optics, 2019. 24(5). 19. Pogue, B.W. and M.S. Patterson, Review of tissue simulating phantoms for optical spectroscopy, imaging and dosimetry. Journal of Biomedical Optics, 2006. 11(4). 20. Ayers, F., et al., Fabrication and characterization of silicone-based tissue phantoms with tunable optical properties in the visible and near infrared domain. Design and Performance Validation of Phantoms Used in Conjunction with Optical Measurements of Tissue, 2008. 6870. 21. Gioux, S., et al., Three-dimensional surface profile intensity correction for spatially modulated imaging. Journal of Biomedical Optics, 2009. 14(3). 22. Zhao, Y., et al., Angle correction for small animal tumor imaging with spatial frequency domain imaging (SFDI). Biomedical Optics Express, 2016. 7(6): p. 2373–2384. 23. Majedy, M., et al., Influence of optical aberrations on depth-specific spatial frequency domain techniques. Journal of Biomedical Optics, 2022. 27(11). 24. Mazhar, A., et al., Implementation of an LED based Clinical Spatial Frequency Domain Imaging System. Emerging Digital Micromirror Device Based Systems and Applications Iv, 2012. 8254. 25. Saager, R.B., et al., A Light Emitting Diode (LED) Based Spatial Frequency Domain Imaging System for Optimization of Photodynamic Therapy of Nonmelanoma Skin Cancer: Quantitative Reflectance Imaging. Lasers in Surgery and Medicine, 2013. 45(4): p. 207–215. 26. Applegate, M.B., et al., OpenSFDI: an open-source guide for constructing a spatial frequency domain imaging system. Journal of Biomedical Optics, 2020. 25(1). 27. Ayers, F.R., et al., Wide-Field Spatial Mapping of In Vivo Tattoo Skin Optical Properties Using Modulated Imaging. Lasers in Surgery and Medicine, 2009. 41(6): p. 442–453. 28. Cuccia, D.J., et al., Quantitation and mapping of tissue optical properties using modulated imaging. Journal of Biomedical Optics, 2009. 14(2). 29. Gioux, S., et al., First-in-human pilot study of a spatial frequency domain oxygenation imaging system. Journal of Biomedical Optics, 2011. 16(8). 30. Stromberg, T., et al., Spatial frequency domain imaging using a snap-shot filter mosaic camera with multiwavelength sensitive pixels. Photonics in Dermatology and Plastic Surgery 2018, 2018. 10467.
161 31. Kennedy, G., et al., Spatial frequency domain imager based on a compact multiaperture camera: testing and feasibility for noninvasive burn severity assessment. Journal of Biomedical Optics, 2021. 26(8). 32. Belcastro, L., et al., Handheld multispectral imager for quantitative skin assessment in low-resource settings. Journal of Biomedical Optics, 2020. 25(8). 33. Ponticorvo, A., et al., Evaluating clinical observation versus Spatial Frequency Domain Imaging (SFDI), Laser Speckle Imaging (LSI) and thermal imaging for the assessment of burn depth. Burns, 2019. 45(2): p. 450–460. 34. Hosking, A.M., et al., Spatial Frequency Domain Imaging for Burn Wound Assessment: A Case Series. Lasers in Surgery and Medicine, 2018. 50: p. S13– S13. 35. Nguyen, J.Q., et al., Spatial frequency domain imaging of burn wounds in a preclinical model of graded burn severity. Journal of Biomedical Optics, 2013. 18(6). 36. Lee, S., et al., SFDI biomarkers provide a quantitative ulcer risk metric and can be used to predict diabetic foot ulcer onset. Journal of Diabetes and Its Complications, 2020. 34(9). 37. Murphy, G.A., et al., Quantifying dermal microcirculatory changes of neuropathic and neuroischemic diabetic foot ulcers using spatial frequency domain imaging: a shade of things to come? Bmj Open Diabetes Research & Care, 2020. 8(2). 38. Yafi, A., et al., Quantitative Skin Assessment Using Spatial Frequency Domain Imaging (SFDI) in Patients With or at High Risk for Pressure Ulcers. Lasers in Surgery and Medicine, 2017. 49(9): p. 827–834. 39. Saidian, M., et al., Characterisation of impaired wound healing in a preclinical model of induced diabetes using wide-field imaging and conventional immunohistochemistry assays. International Wound Journal, 2019. 16(1): p. 144–152. 40. Saidian, M., et al., Multimodality Optical Characterization of Impaired Wound Healing in a Principal Model of Diabetes Mellitus. Lasers in Surgery and Medicine, 2017. 49: p. 8–9. 41. Sayadi, L.R., et al., A Quantitative Assessment of Wound Healing With Oxygenated Micro/Nanobubbles in a Preclinical Burn Model. Annals of Plastic Surgery, 2021. 87(4): p. 421–426. 42. Kennedy, G.T., et al., Spatial frequency domain imaging: a quantitative, noninvasive tool for in vivo monitoring of burn wound and skin graft healing. Journal of Biomedical Optics, 2019. 24(7). 43. Yafi, A., et al., Postoperative quantitative assessment of reconstructive tissue status in a cutaneous flap model using spatial frequency domain imaging. Plast Reconstr Surg, 2011. 127(1): p. 117–130. 44. Pharaon, M.R., et al., Early detection of complete vascular occlusion in a pedicle flap model using quantitative [corrected] spectral imaging. Plast Reconstr Surg, 2010. 126(6): p. 1924–1935.
162 45. Ponticorvo, A., et al., Quantitative assessment of partial vascular occlusions in a swine pedicle flap model using spatial frequency domain imaging. Biomed Opt Express, 2013. 4(2): p. 298–306. 46. Nguyen, J.T., et al., A novel pilot study using spatial frequency domain imaging to assess oxygenation of perforator flaps during reconstructive breast surgery. Ann Plast Surg, 2013. 71(3): p. 308–15. 47. Vargas, C.R., et al., Intraoperative Hemifacial Composite Flap Perfusion Assessment Using Spatial Frequency Domain Imaging: A Pilot Study in Preparation for Facial Transplantation. Ann Plast Surg, 2016. 76(2): p. 249–55. 48. Nadeau, K.P., et al., Quantitative assessment of renal arterial occlusion in a porcine model using spatial frequency domain imaging. Opt Lett, 2013. 38(18): p. 3566–9. 49. Tabassum, S., et al., Two-layer inverse model for improved longitudinal preclinical tumor imaging in the spatial frequency domain. J Biomed Opt, 2018. 23(7): p. 1–12. 50. Tabassum, S., et al., Feasibility of spatial frequency domain imaging (SFDI) for optically characterizing a preclinical oncology model. Biomed Opt Express, 2016. 7(10): p. 4154–4170. 51. Zhao, Y., et al., Angle correction for small animal tumor imaging with spatial frequency domain imaging (SFDI). Biomed Opt Express, 2016. 7(6): p. 2373– 84. 52. Tank, A., et al., Spatial frequency domain imaging for monitoring immune-mediated chemotherapy treatment response and resistance in a murine breast cancer model. Sci Rep, 2022. 12(1): p. 5864. 53. Saager, R.B., et al., Quantitative fluorescence imaging of protoporphyrin IX through determination of tissue optical properties in the spatial frequency domain. J Biomed Opt, 2011. 16(12): p. 126013. 54. Rohrbach, D.J., et al., Characterization of nonmelanoma skin cancer for light therapy using spatial frequency domain imaging. Biomed Opt Express, 2015. 6(5): p. 1761–6. 55. Rohrbach, D.J., et al., Preoperative mapping of nonmelanoma skin cancer using spatial frequency domain and ultrasound imaging. Acad Radiol, 2014. 21(2): p. 263–70. 56. Laughney, A.M., et al., Spectral discrimination of breast pathologies in situ using spatial frequency domain imaging. Breast Cancer Res, 2013. 15(4): p. R61. 57. Laughney, A.M., et al., System analysis of spatial frequency domain imaging for quantitative mapping of surgically resected breast tissues. J Biomed Opt, 2013. 18(3): p. 036012. 58. Nandy, S., et al., Label-free quantitative optical assessment of human colon tissue using spatial frequency domain imaging. Tech Coloproctol, 2018. 22(8): p. 617–621. 59. Nandy, S., et al., Quantitative multispectral ex vivo optical evaluation of human ovarian tissue using spa-
tial frequency domain imaging. Biomed Opt Express, 2018. 9(5): p. 2451–2456. 60. Lin, A.J., et al., Optical imaging in an Alzheimer’s mouse model reveals amyloid-beta-dependent vascular impairment. Neurophotonics, 2014. 1(1). 61. Lin, A.J., et al., In Vivo Optical Signatures of Neuronal Death in a Mouse Model of Alzheimer’s Disease. Lasers in Surgery and Medicine, 2014. 46(1): p. 27–33. 62. Shaul, O., et al., Application of spatially modulated near-infrared structured light to study changes in optical properties of mouse brain tissue during heatstress. Appl Opt, 2017. 56(32): p. 8880–8886. 63. Abookasis, D., et al., Imaging cortical absorption, scattering, and hemodynamic response during ischemic stroke using spatially modulated nearinfrared illumination. J Biomed Opt, 2009. 14(2): p. 024033. 64. Konecky, S.D., et al., Hyperspectral optical tomography of intrinsic signals in the rat cortex. Neurophotonics, 2015. 2(4): p. 045003. 65. Reisman, M.D., et al., Structured illumination diffuse optical tomography for noninvasive functional neuroimaging in mice. Neurophotonics, 2017. 4(2): p. 021102. 66. Vervandier, J. and S. Gioux, Single snapshot imaging of optical properties. Biomed Opt Express, 2013. 4(12): p. 2938–44. 67. Aguenounon, E., et al., Single snapshot of optical properties image quality improvement using anisotropic two-dimensional windows filtering. J Biomed Opt, 2019. 24(7): p. 1–21. 68. Nadeau, K.P., et al., Multifrequency synthesis and extraction using square wave projection patterns for quantitative tissue imaging. J Biomed Opt, 2015. 20(11): p. 116005. 69. Ghijsen, M., et al., Real-time simultaneous single snapshot of optical properties and blood flow using coherent spatial frequency domain imaging (cSFDI). Biomed Opt Express, 2016. 7(3): p. 870–82. 70. Torabzadeh, M., et al., Hyperspectral imaging in the spatial frequency domain with a supercontinuum source. J Biomed Opt, 2019. 24(7): p. 1–9. 71. Torabzadeh, M., et al., Compressed single pixel imaging in the spatial frequency domain. J Biomed Opt, 2017. 22(3): p. 30501. 72. Wilson, R.H., et al., High-speed spatial frequency domain imaging of rat cortex detects dynamic optical and physiological properties following cardiac arrest and resuscitation. Neurophotonics, 2017. 4(4). 73. Saager, R.B., D.J. Cuccia, and A.J. Durkin, Determination of optical properties of turbid media spanning visible and near-infrared regimes via spatially modulated quantitative spectroscopy. Journal of Biomedical Optics, 2010. 15(1). 74. Saager, R.B., et al., Portable (handheld) clinical device for quantitative spectroscopy of skin, utilizing spatial frequency domain reflectance techniques. Review of Scientific Instruments, 2017. 88(9).
9 Spatial Frequency Domain Imaging 75. Saager, R.B., et al., Impact of hemoglobin breakdown products in the spectral analysis of burn wounds using spatial frequency domain spectroscopy. Journal of Biomedical Optics, 2019. 24(2). 76. Saager, R.B., et al., Method using in vivo quantitative spectroscopy to guide design and optimization of low-cost, compact clinical imaging devices: emulation and evaluation of multispectral imaging systems. Journal of Biomedical Optics, 2018. 23(4). 77. Saager, R.B., et al., Method for depth-resolved quantitation of optical properties in layered media using spatially modulated quantitative spectroscopy. Journal of Biomedical Optics, 2011. 16(7). 78. Jacques, S.L., Simple Optical Theory for Light Dosimetry during Pdt. Optical Methods for Tumor Treatment and Detection: Mechanisms and Techniques in Photodynamics Therapy, 1992. 1645: p. 155–165. 79. Saager, R.B., et al., In vivo measurements of cutaneous melanin across spatial scales: using multiphoton microscopy and spatial frequency domain spectroscopy. Journal of Biomedical Optics, 2015. 20(6). 80. Saager, R.B., et al., In vivo isolation of the effects of melanin from underlying hemodynamics across skin types using spatial frequency domain spectroscopy. Journal of Biomedical Optics, 2016. 21(5). 81. Konecky, S.D., et al., Quantitative optical tomography of sub-surface heterogeneities using spatially modulated structured light. Opt Express, 2009. 17(17): p. 14780–90. 82. Cuccia, D.J., et al., Modulated imaging: quantitative analysis and tomography of turbid media in the spatial-frequency domain. Opt Lett, 2005. 30(11): p. 1354–6. 83. Lukic, V., V.A. Markel, and J.C. Schotland, Optical tomography with structured illumination. Opt Lett, 2009. 34(7): p. 983–5. 84. Belanger, S., et al., Real-time diffuse optical tomography based on structured illumination. J Biomed Opt, 2010. 15(1): p. 016006. 85. D’Andrea, C., et al., Fast 3D optical reconstruction in turbid media using spatially modulated light. Biomed Opt Express, 2010. 1(2): p. 471–481.
163 86. Kristensson, E., E. Berrocal, and M. Alden, Quantitative 3D imaging of scattering media using structured illumination and computed tomography. Opt Express, 2012. 20(13): p. 14437–50. 87. Hayakawa, C.K., et al., MCCL: an open-source software application for Monte Carlo simulations of radiative transport. Journal of Biomedical Optics, 2022. 27(8). 88. Kanick, S.C., et al., Sub-diffusive scattering parameter maps recovered using wide-field high-frequency structured light imaging. Biomed Opt Express, 2014. 5(10): p. 3376–90. 89. Bodenschatz, N., et al., Detecting structural information of scatterers using spatial frequency domain imaging. J Biomed Opt, 2015. 20(11): p. 116006. 90. McClatchy, D.M., 3rd, et al., Wide-field quantitative imaging of tissue microstructure using sub-diffuse spatial frequency domain imaging. Optica, 2016. 3(6): p. 613–621. 91. Mazhar, A., et al., Structured illumination enhances resolution and contrast in thick tissue fluorescence imaging. J Biomed Opt, 2010. 15(1): p. 010506. 92. Konecky, S.D., et al., Spatial frequency domain tomography of protoporphyrin IX fluorescence in preclinical glioma models. J Biomed Opt, 2012. 17(5): p. 056008. 93. Ducros, N., et al., Full-wavelet approach for fluorescence diffuse optical tomography with structured illumination. Opt Lett, 2010. 35(21): p. 3676–8. 94. Ghijsen, M., et al., Quantitative real-time optical imaging of the tissue metabolic rate of oxygen consumption. J Biomed Opt, 2018. 23(3): p. 1–12. 95. Ghassemi, P., et al., A polarized multispectral imaging system for quantitative assessment of hypertrophic scars. Biomed Opt Express, 2014. 5(10): p. 3337–54. 96. Yang, B., et al., Polarized light spatial frequency domain imaging for non-destructive quantification of soft tissue fibrous structures. Biomed Opt Express, 2015. 6(4): p. 1520–33. 97. Konecky, S.D., et al., Imaging scattering orientation with spatial frequency domain imaging. J Biomed Opt, 2011. 16(12): p. 126001.
Imaging Through Scattering Media Using Wavefront Shaping
10
Yuecheng Shen
Abstract
Optical scattering has been a major obstacle that prevents optical imaging from seeing deeper. In this chapter, we briefly introduce the development and current achievements of a technique, namely wavefront shaping, which allows optical imaging through scattering media. Such a capability is enabled by modulating the incident light with a special wavefront. With this special wavefront, scattering-induced wavefront scrambling can be effectively compensated so that an arbitrary optical field distribution can be synthesized through the scattering medium. Keywords
Wavefront shaping · Scattering medium · Optical phase conjugation · Transmission matrix · Feedback-based optimization algorithm · Phase retrieval · Optical imaging · Guide star · Ultrasonic modulation
Y. Shen () School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China e-mail: [email protected]
10.1
Light Propagation in Scattering Media
In many imaging scenarios, the propagation of light is confined within transparent media such as glass and air. This almost lossless propagation allows precise control of light distribution in both spatial and temporal domains, endowing coded optical imaging with various functionalities. In practice, however, the presence of optical scattering can destroy the predictable nature of light propagation, thus significantly deteriorating and even completely ruining the imaging capability through optics. Optical scattering, which originates from refractive-index inhomogeneities of transmission media, such as biological tissue, has been the major obstacle that restricts the imaging depth in complex media. The scattering strength of complex media is characterized by the scattering mean free path, which is the average propagation distance between two adjacent scattering events. For visible light, the scattering mean free path in soft tissue is generally considered to be about 100 μm. Since conventional optical microscopy largely relies on the detection of unscattered light, also known as ballistic light, its imaging capability is usually limited at this depth. Many innovative techniques, including optical coherence tomography [1] and multiphoton microscopy [2] were developed to advance imaging depth by rejecting multiply scattered light through various gating approaches.
However, as the intensity of ballistic light decays exponentially with imaging depth (Beer's law), the penetration depths of these imaging modalities are still restricted to a few scattering mean free paths. Beyond this range, it is almost impossible to utilize ballistic light to image deeper. Multiply scattered light can be utilized for optical imaging at larger depths, but even with a heavy computational burden, the imaging resolution is still quite poor [3]. Currently, it remains challenging to achieve high-resolution optical imaging through thick scattering media.
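The depth limit quoted above follows directly from Beer's law: the ballistic fraction falls by a factor of e per scattering mean free path. A short illustration, assuming a mean free path of about 0.1 mm for visible light in soft tissue:

```python
import numpy as np

ls = 0.1  # assumed scattering mean free path in soft tissue (mm)
for depth_mm in (0.1, 0.5, 1.0, 2.0):
    # Surviving ballistic (unscattered) fraction at this depth.
    print(depth_mm, "mm:", np.exp(-depth_mm / ls))
```

At 1–2 mm the unscattered fraction is already below 10^-4, which is why ballistic gating strategies alone cannot reach centimeter-scale depths.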
10.2
Feedback-Based Wavefront Shaping
Fig. 10.1 The general principle of focusing light through a scattering medium. (a) A plane wave gets scattered after passing through a scattering sample, forming a speckle pattern. (b) A shaped wave with a special wavefront is refocused after passing through the same scattering medium. (Reprinted by permission from Optica Publishing Group: Optics Letters, Focusing Coherent Light Through Opaque Strongly Scattering Media, Vellekoop, I. M. and Mosk A. P., © 2007)
Optical scattering is seemingly random and complicated to deal with, and a variety of research works were devoted to understanding scattering behaviors through both theoretical and simulation approaches [4–14]. Gradually, researchers began to realize that scattering-induced information scrambling can be treated as a deterministic process within a certain time scale. Therefore, it is possible to compensate for scattering effects and recover the scrambled image by modulating the incident wavefront. In the 1980s, optical phase conjugation realized through nonlinear photorefractive crystals was first demonstrated with the capability to overcome mode scrambling inherent in multimode fibers, enabling clear images to be delivered through them [15, 16]. Thanks to the development of micro-electronic technology and material science, the emergence of spatial light modulators (SLMs) offers prospects for modulating wavefronts with great flexibility and high efficiency. This capability lays the foundation for a variety of optical computational imaging modalities through active illumination. In 2007, researchers from the University of Twente demonstrated that by shaping the incident light with a special wavefront, coherent light can be refocused even after passing through scattering media [17]. This work pioneered the field of wavefront shaping and inspired people to envision the possibility of imaging through scattering media with shaped wavefronts. The general principle of focusing scattered light is illustrated in Fig. 10.1. When the scattering medium is illuminated by a plane wave, as shown in Fig. 10.1a, a speckle pattern is formed. This observation can be understood by treating the optical scattering as a deterministic linear process and modeling the scattering sample as a linear transmission matrix. In this matrix, each element tmn is unknown and connects the n-th segment of the incident field and the m-th segment of the output field. In this condition, the
transmitted field at the target position, Em, can be mathematically described as
E_m = \sum_{n=1}^{N} t_{mn} A_n e^{i\phi_n} \quad (10.1)
where An and ϕn are, respectively, the amplitude and phase of the light contributed from the n-th segment of the incident field. For the plane wave, An = 1 and ϕn = 0 can be set for simplicity. Since the phase value of tmn is completely random, the incident plane wave leads to a random phasor sum for Em. To focus light on the m-th segment of the output field, the amplitude of Em needs to be maximized. In other words, the randomly oriented phasors that contribute to Em need to be aligned in the same direction. Mathematically, such a requirement can be fulfilled by setting ϕn as the conjugate of the argument of tmn. In practice, one can shape a special wavefront onto the illumination light in advance, which is physically equivalent to rotating and aligning those phasors. As a result, a shaped wavefront results in a bright focus at the target position, as shown in Fig. 10.1b. To shape the wavefront, a liquid crystal SLM was employed, enabling phase-only modulation of ϕn rather than An. Theoretically, it can be shown that for phase-only modulation, the maximum achievable intensity enhancement for focusing light on a single spot is [17]
\eta = \frac{\pi}{4}\left(N - 1\right) + 1 \quad (10.2)
where N is the total number of segments, i.e., the number of independent controls, of the incident field. Here, the factor π/4 is due to the phase-only modulation enforced by the operating scheme of the SLM. In the ideal case with both amplitude and phase modulation, this factor should be 1. Besides SLMs with phase-only modulation, SLMs with other modulation schemes, such as binary-phase modulation [18], binary-amplitude modulation [19], and polarization modulation [20], can also be utilized for wavefront shaping, leading to factors of 1/π, 1/π, and 0.394, respectively [18, 21–24]. The first experimental setup that demonstrated focusing light through scattering media is shown in Fig. 10.2 [17]. Red light output from a 632.8-nm laser was spatially modulated by an SLM and was subsequently focused onto a scattering medium. The number of independent controls used during experiments could be varied by grouping different numbers of pixels of the SLM. A camera monitored the intensity distribution of the scattered light in the target region and provided feedback for the optimization process. The first attempt was tested on a scattering medium made of TiO2 pigment. The authors determined the optimal phase for a
Fig. 10.2 Schematics of the experimental setup for feedback-based wavefront shaping. HeNe Helium-neon laser, M mirror, λ/2 half-wave plate, λ/4 quarter-wave plate, BS 50% nonpolarizing beam splitter, SLM spatial light modulator, P polarizer, S scattering medium, CCD
charge-coupled device. (Reprinted by permission from Optica Publishing Group: Optics Letters, Focusing Coherent Light Through Opaque Strongly Scattering Media, Vellekoop, I. M. and Mosk A. P., © 2007)
Fig. 10.3 Experimental demonstrations of focusing light through scattering media with a feedback scheme. (a) Transmitted intensity distribution with a plane wave illumination. (b) Focusing light to a single spot with a shaped wave illumination. The enhancement factor is about 1000. (c) Focusing light to five discrete spots with a shaped
wave illumination. (d) The phase map to produce the result in (c). (Reprinted by permission from Optica Publishing Group: Optics Letters, Focusing Coherent Light Through Opaque Strongly Scattering Media, Vellekoop, I. M. and Mosk A. P., © 2007)
single segment at a time by cycling its phase from 0 to 2π. For each segment, the phase that led to the highest target intensity was kept and stored. After traversing all segments, the optimal phase map could be obtained. This phase map enables the optical fields contributed from all segments to interfere constructively so that the intensity at the target position reaches its global maximum. When illuminating the scattering medium with a plane wave, a speckle pattern was formed, as shown in Fig. 10.3a. In contrast, with a shaped wavefront consisting of 3228 individually controlled segments, a bright focus was formed through the scattering medium, as shown in Fig. 10.3b. The enhancement factor for this focus was over 1000. By adjusting the target function, focusing light on multiple foci simultaneously could also be achieved. As shown in Fig. 10.3c, five spots with enhancement factors over 200 were obtained. The phase map that corresponds to Fig. 10.3c is also provided in Fig. 10.3d, showing
that the scattering medium fully scrambles the incident light, as neighboring segments are uncorrelated. This work demonstrates that a specially shaped wavefront can effectively overcome the optical scattering effect, opening the field of wavefront shaping, which enables optical focusing and imaging through scattering media. Since the way of finding the optimal phase map relies on feedback information about the target intensity, this technique has since been generally referred to as feedback-based wavefront shaping. This technique has a simple and reference-free design, allowing the signal-to-noise ratio (SNR) to be maintained at a relatively high level throughout the whole optimization process. It is worth mentioning that the core of feedback-based wavefront shaping is the greedy algorithm that searches for the optimized wavefront. Although the continuous sequential algorithm employed in Ref. [17] is the most efficient one mathematically, in practice it is susceptible to measurement noise, especially at
the beginning of iterations. Later on, various optimization algorithms with advantages in terms of robustness and convergence rate were proposed and demonstrated [19, 25–32]. Each optimization algorithm has its pros and cons and can be applied in various scenarios with great flexibility [33]. With this focus, imaging through scattering media can be realized by raster scanning the optical focus. For example, photoacoustic imaging of a sweat bee wing with a complex structure was demonstrated [34]. The biological sample was placed behind an optical scatterer so that neither uniform nor random speckle illumination could reveal its structure, as shown in Fig. 10.4a and b, respectively. After achieving an optical focus through feedback-based wavefront shaping, the bright focus could be raster scanned to form an image. Rich information, such as the intersection of the large vein on the leading edge of the wing and a branching vein, could be obtained, as shown in Fig. 10.4c. Given the relatively slow raster scanning speed, various types of memory effects [35, 36], including the angular memory effect [37, 38] and translational [39], rotational [40–42], and spectral memory effects [43–45], can be exploited to increase the scanning speed of the optical focus.
Fig. 10.4 Imaging results of sweat bee wing behind an optical scatterer. (a) Direct photoacoustic imaging with uniform illumination. (b) Direct photoacoustic imaging with random optical speckle illumination. (c) Photoacoustic imaging obtained by raster scanning the optimized focus. (Reprinted by permission from Springer Nature: Nature Communications, Super-resolution Photoacoustic Imaging through a Scattering Wall, Conkey, D. B. et al., © 2015)
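The continuous sequential procedure described above is easy to emulate numerically. In the hedged sketch below, one row of an assumed random transmission matrix stands in for the medium, the feedback signal is a noise-free intensity at the chosen output spot, and the final enhancement is compared with Eq. (10.2):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 256                                               # independently controlled segments
t = (rng.normal(size=N) + 1j * rng.normal(size=N)) / np.sqrt(2)   # one row t_mn

def target_intensity(phases):
    """Intensity at the target output spot for a phase-only input, per Eq. (10.1)."""
    return np.abs(np.sum(t * np.exp(1j * phases))) ** 2

phases = np.zeros(N)
test_phases = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
for n in range(N):                                    # continuous sequential algorithm
    best_p, best_I = 0.0, -1.0
    for p in test_phases:                             # cycle the phase of one segment
        phases[n] = p
        I = target_intensity(phases)
        if I > best_I:
            best_p, best_I = p, I
    phases[n] = best_p                                # keep the best phase, move on

enhancement = target_intensity(phases) / np.sum(np.abs(t) ** 2)
print("achieved enhancement ~", enhancement)
print("theoretical limit, Eq. (10.2):", np.pi / 4 * (N - 1) + 1)
```

With noise-free feedback the achieved value approaches the π/4(N − 1) + 1 limit; measurement noise, as noted above, mainly hurts the early iterations, when the target intensity is still close to the speckle background.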
10.3
Transmission Matrix Based Wavefront Shaping
The key principle of wavefront shaping is to model the propagation of light in scattering media as a linear transmission matrix. Thus, retrieving the complete knowledge of the transmission matrix also offers the capability of imaging and focusing light through scattering media. The first demonstration of transmission matrix based wavefront shaping targeted a microscope slide deposited with an 80 ± 25 μm-thick layer of ZnO [46]. In this work, the transmission matrix connected the surfaces of the SLM (input) and the camera (output). The sizes of the independent pixels at the input and output planes were chosen to match so that one pixel corresponds to one mode. To retrieve the optical transmission matrix, one has to measure the complex optical fields. A direct implementation is to adopt an interferometric design with phase stepping. To minimize coherent noise introduced by an additional reference beam, the authors employed common-path interferometry with the experimental setup shown in Fig. 10.5. In particular, for the entire wavefront that couples into the scattering
Fig. 10.5 Schematics of the experimental setup for transmission matrix based wavefront shaping. L lens, SLM spatial light modulator, D diaphragm, P polarizer, CCD charge-coupled device. (Reprinted by permission from
American Physical Society: Physical Review Letters, Measuring the Transmission Matrix in Optics: An Approach to the Study and Control of Light Propagation in Disordered Media, Popoff, S. M. et al., © 2010)
medium, only the central square part inside the pupil of the microscope objective (65% of the effective area) was modulated, while the rest part (35% of the effective area) served as a static reference. Note that although static, the reference light exhibited randomly distributed speckles across the surface of the camera, leading to an unknown complex factor for each row of the transmission matrix. Nonetheless, since constructive interference is formed for each row independently, this factor does not affect the performance of focusing light through scattering media with the retrieved transmission matrix. Experimentally, the dimension of the transmission matrix was set as N × N, where N = 256. Correspondingly, N Hadamard bases were chosen as the input bases to probe the scattering medium. A four-step phase-shifting method was employed so that a total number of 4 N phase maps were displayed, while the resultant 4 N intensity distributions were measured. A typical example of the measured speckle pattern is shown in Fig. 10.6a. By pairing the illumination patterns and their corresponding measurements, the transmission matrix could be constructed through basic theory in linear algebra. With the retrieved
transmission matrix, focusing light to an arbitrary position could simply be achieved through phase conjugation, as explained in Eq. (10.1). Therefore, by displaying the conjugate phase map as the input, a single bright focus could be observed, as shown in Fig. 10.6b. To further demonstrate focusing light to any spot in the output plane, Fig. 10.6c shows the normalized focusing operator. The strong diagonal terms prove the ability, while weaker sub-diagonal terms are due to the correlations between neighboring pixels. In addition, focusing light on three different spots simultaneously was also demonstrated, as shown in Fig. 10.6d. Instead of raster scanning the optical focus to form an image through scattering media, the knowledge of the transmission medium allows direct imaging of objects hidden behind. The authors further show that once the output field Eout is measured and the transmission matrix T is known, the initial input signal Ein , which represents the information of the hidden object, can be estimated as Ein = T† Eout
(10.3)
Fig. 10.6 Experimental demonstrations of focusing light through scattering media. (a) Transmitted intensity distribution with a plane wave illumination. (b) Focusing light to a single spot with a shaped wave illumination. (c) The normalized focusing operator. (d) Focusing light to three
discrete spots with a shaped wave illumination. (Reprinted by permission from American Physical Society: Physical Review Letters, Measuring the Transmission Matrix in Optics: An Approach to the Study and Control of Light Propagation in Disordered Media, Popoff, S. M. et al., © 2010)
where the operator † denotes the conjugate transpose, i.e., transposing the matrix and taking the complex conjugate of each element. Here, the assumption that TT† ≈ I is utilized, which is generally valid in the presence of strong scattering. In cases where the measurement noise is low and the transmission matrix is accurate, it is preferable to directly employ the pseudo-inverse of T in place of T† in Eq. (10.3) [47], leading to an image with better contrast [48]. As a proof of concept, Fig. 10.7 shows the reconstructed images through scattering media using Eq. (10.3), revealing a single object and two objects in Fig. 10.7a and b, respectively. This result demonstrates that images of objects placed behind the scattering medium can be obtained by inverting the transmission matrix, regardless of the randomness of the medium. This imaging process can be intuitively considered as
the parallel version of scanning the optical focus mentioned in the previous section. There are plenty of ways to measure the transmission matrix of the scattering medium. The most direct and common one is to employ holographic methods with additional plane reference beams. The advantage of this implementation is that the aforementioned factors can be eliminated, allowing the construction of the exact transmission matrix. This capability expedites a variety of applications through scattering media such as holographic imaging [49] and spatiotemporal imaging [50, 51]. Since the holographic method is sensitive to coherent noise, an alternative approach is to completely abandon the external reference beam. In this condition, retrieving optical fields from pure
Fig. 10.7 Experimental results of imaging objects hidden behind the scattering medium. (a) Reconstructed image for one object. (b) Reconstructed image for two objects. (Reprinted by permission from American Physical Soci-
ety: Physical Review Letters, Measuring the Transmission Matrix in Optics: An Approach to the Study and Control of Light Propagation in Disordered Media, Popoff, S. M. et al., © 2010)
intensity measurements essentially amounts to solving a set of nonlinear equations for the optimum solution. Such an implementation has the simplest experimental setup and the highest stability, but without sufficient constraints, this problem is known to admit multiple solutions and can easily become trapped in local maxima. A variety of phase retrieval algorithms have been proposed and demonstrated with great success in accurately retrieving the transmission matrix of the scattering medium [52–60]. In the field of phase retrieval, the Gerchberg–Saxton (GS) algorithm is one of the most famous approaches for retrieving phase values from a pair of related intensity measurements in an iterative manner. By generalizing this algorithm, the transmission matrix of the scattering medium can be retrieved in parallel [58]. Due to its high efficiency, this generalized GS algorithm was further modified to require significantly fewer intensity measurements and iterations [59, 60], serving as a powerful tool for the fast retrieval of a large optical transmission matrix for optical imaging at a large scale.
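A compact numerical sketch of the common-path, phase-stepping measurement described in this section is given below. A random complex matrix stands in for the medium, and a simulated static speckle field plays the role of the unmodulated reference; each input segment is probed with four phase steps, and the conjugate phase of one measured row is then displayed to focus light on the corresponding camera pixel. The canonical (pixel) input basis is used here instead of the Hadamard basis for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 64, 256                      # input segments, camera pixels
T = (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))) / np.sqrt(2)
R = rng.normal(size=M) + 1j * rng.normal(size=M)      # static reference speckle field

def camera(e_in):
    """Camera intensities for a given modulated input plus the static reference."""
    return np.abs(R + T @ e_in) ** 2

# Four-step phase shifting: recovers conj(R_m) * t_mn for every output pixel m.
T_obs = np.zeros((M, N), dtype=complex)
for n in range(N):
    e = np.zeros(N, dtype=complex)
    I = {}
    for a in (0.0, 0.5 * np.pi, np.pi, 1.5 * np.pi):
        e[n] = np.exp(1j * a)
        I[a] = camera(e)
    T_obs[:, n] = (I[0.0] - I[np.pi]) / 4.0 + 1j * (I[1.5 * np.pi] - I[0.5 * np.pi]) / 4.0

# The unknown reference factor is common to each row, so phase conjugation of a
# measured row still focuses light on the corresponding output pixel.
m0 = 100
focus_in = np.exp(-1j * np.angle(T_obs[m0]))
I_out = np.abs(T @ focus_in) ** 2
print("enhancement at pixel m0 ~", I_out[m0] / np.mean(np.delete(I_out, m0)))
```

Retrieving the exact matrix, and hence reconstructing hidden objects via Eq. (10.3), additionally requires removing the per-row reference factor, for example with an external plane reference or a phase-retrieval algorithm as discussed above.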
10.4
Optical Phase Conjugation Based Wavefront Shaping
In feedback-based wavefront shaping, the optimal phase map is obtained by iteratively running an optimization algorithm. In transmission matrix based wavefront shaping, one needs to traverse all possible inputs to construct one row of the transmission matrix. Given the above information, both techniques require a series of measurements, which is quite time-consuming. In contrast, optical phase conjugation based wavefront shaping can determine one row of the transmission matrix in a single measurement, making it the most efficient way to focus light through scattering media. The underlying physics of optical phase conjugation based wavefront shaping is the time-reversal symmetry of the wave equation. Suppose we have incident light propagating from left to right into the scattering medium. Due to scattering, the scattered light diverges, as shown in Fig. 10.8a. If time could be reversed, the diverged light would trace the original trajectory back to the original state. In real life, however, time reversal is impossible to realize. Fortunately, for monochromatic light with a single frequency, phase conjugation is equivalent to achieving the effect of time reversal. Therefore, as shown in Fig. 10.8b, by using a phase conjugate mirror, the reflected light, i.e., the phase-conjugated light, can still trace the original path back to the original state, as if time evolved backward. The equivalence between optical time reversal and optical phase conjugation for light with a single frequency can be understood mathematically
Fig. 10.8 The principle of optical phase conjugation based wavefront shaping. (a) Forward scattering process. (b) backward propagation of phase-conjugated light. (Reprinted by permission from SPIE – the international
society for optics and photonics: Journal of Biomedical Optics, Focusing Light through Biological Tissue and Tissue-mimicking Phantoms up to 9.6 cm in Thickness with Digital Optical Phase Conjugation, Shen, Y. et al., © 2016)
as follows. Taking a one-dimensional plane wave as an example, the optical field E0 along the x-axis can be written as
E_0 = A \exp\left(i\left(\omega t - kx + \phi_0\right)\right) \quad (10.4)
where A is the amplitude, ω is the angular frequency, k is the wave vector, and ϕ0 is the initial phase. This formula represents light propagating along the +x direction. Interestingly, its time-reversed companion also satisfies the wave equation, and its optical field ETR becomes
E_{TR} = A \exp\left(i\left(-\omega t - kx + \phi_0\right)\right) \quad (10.5)
which represents light propagating along the −x direction. By comparing Eqs. (10.4) and (10.5), we note that only the time is reversed. By performing phase conjugation on the optical field in Eq. (10.4), we get the phase-conjugated optical field EOPC as
E_{OPC} = A \exp\left(i\left(\omega t + kx - \phi_0\right)\right) \quad (10.6)
Notably, the time-reversed light in Eq. (10.5) and the phase-conjugated light in Eq. (10.6) differ from each other only by a global minus sign in phase. These two optical fields represent the same traveling wave propagating from right to left. Therefore, phase conjugation is mathematically equivalent to time reversal for monochromatic light.
When implementing optical phase conjugation based wavefront shaping, there are two kinds of phase conjugate mirrors that can be employed. The first one is based on photorefractive crystals, which were widely used in the 1980s to overcome mode scrambling in multimode fibers [15, 16]. In 2008, this approach was also used in optical phase conjugation based wavefront shaping to focus light through biological tissue [61]. This work presented the first demonstration of scattering compensation in biological tissue, showing that wavefront shaping has great prospects in deep-tissue biomedical imaging. Although fast and potentially able to handle a large number of independent modes [62–65], the efficiency of the conjugate wavefront generated by this method is very low even after amplification [66], making it undesirable for imaging applications. Moreover, phase conjugate mirrors realized through photorefractive crystals are sensitive to wavelength and power. For these reasons, recent developments in electronic devices have made the combination of high-performance cameras and SLMs the mainstream choice for phase conjugate mirrors. In this configuration, cameras are used to record the wavefront of the scattered light, while SLMs are used to generate the conjugate wavefront of the scattered light. The first demonstration of digital optical phase conjugation was in 2010, and the design of the system is illustrated in Fig. 10.9 [67]. The camera and the SLM were placed at the symmetric
174
Y. Shen
and requires a precise alignment between the camera and the SLM. After being invented for more than 10 years, various alignment methods have been proposed in aiding the one-to-one pixel matching between the camera and the SLM [68–72]. Nonetheless, the ability to shape the wavefront with a large amount of independent degree of freedom through a one-time measurement makes optical phase conjugation based wavefront shaping promising to overcome dynamic scattering processes in biomedical applications [18, 21, 73, 74]. Fig. 10.9 Schematics of the digital optical phase conjugation. (a) The recording step of the input signal. (b) The playback step of the phase conjugated light. EO electrooptic phase modulator, CCD charge-coupled device, SLM spatial light modulator, Ref reference beam. (Reprinted by permission from Optica Publishing Group: Optics Express, Implementation of a Digital Optical Phase Conjugation System and its Application to Study the Robustness of Turbidity Suppression by Phase Conjugation, Cui, M. and Yang, C., © 2010)
plane of a beam splitter. Such a composite system should work only if the two components are exactly aligned with respect to each other. In other words, the camera and the SLM need to be pixelby-pixel matched. The general process of optical phase conjugation based wavefront shaping can be divided into two steps. In the first recording step, the beam splitter directed the input signal toward the camera, as shown in Fig. 10.9a. A plane reference beam was provided to assist in the measurement of the signal. A computer processed the measured data and retrieved the wavefront of the input signal, i.e., the scattered light. In the second playback step, as shown in Fig. 10.9b, the conjugate phase map was displayed by the SLM. A reference beam acquired the phase map and counter-propagated with respect to the input signal. The conjugated wavefront then achieved an optical focus after passing through the scattering medium. These three different implementations of wavefront shaping find the same optimal phase map [23]. However, compared with the aforementioned two wavefront shaping techniques, the optical phase conjugation based wavefront shaping system is rather complicated
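As a quick numerical check of the equivalence stated in Eqs. (10.4)–(10.6), the short NumPy snippet below (an illustration added here, with arbitrary toy parameters) verifies that the phase-conjugated field is the complex conjugate of the time-reversed field, so both describe the same physical wave traveling along the −x direction.

```python
import numpy as np

# Toy parameters (assumed values, not tied to any experiment in this chapter)
A, omega, k, phi0 = 1.0, 2 * np.pi, 2 * np.pi / 0.5, 0.3
x = np.linspace(0.0, 5.0, 1000)
t = 0.37  # an arbitrary time instant

E0 = A * np.exp(1j * (omega * t - k * x + phi0))      # Eq. (10.4): wave traveling along +x
E_tr = A * np.exp(1j * (-omega * t - k * x + phi0))   # Eq. (10.5): time-reversed field
E_pc = A * np.exp(1j * (omega * t + k * x - phi0))    # Eq. (10.6): phase-conjugated field

# The two fields are complex conjugates of each other, so their real (physical)
# parts coincide: both represent the same wave traveling from right to left.
assert np.allclose(E_pc, np.conj(E_tr))
assert np.allclose(E_pc.real, E_tr.real)
```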
10.5
Guide Star Assisted Wavefront Shaping for Imaging inside Scattering Media
Fig. 10.10 The operating principle of focusing light inside scattering media with an ultrasonic guide star. (a) In the recording step, only the wavefront of the ultrasonically tagged light is measured through frequency selection. (b) In the playback step, optical phase conjugation is applied only to the ultrasonically tagged light, allowing it to trace its trajectory back to the original ultrasonic focus

All wavefront shaping techniques introduced above can focus or image through scattering media. However, this capability requires access to the other side of the scattering medium, for example to place a camera or to synthesize a bright spot in advance. In many imaging scenarios, especially deep-tissue biomedical imaging, one does not have this luxury. As a result, wavefront shaping needs to be combined with appropriate guide stars to focus or image inside scattering media. A guide star interacts with the scattered light locally inside the scattering medium, causing detectable changes in intensity, phase, or even frequency [75]. The wavefront shaped with respect to this change can then focus light onto the position of the guide star. Consequently, the properties of the guide star, such as its position, size, and contrast mechanism, play important roles in the quality of the formed optical focus. A variety of physical guide stars have been proposed and demonstrated, including fluorescent molecules [76–78], second-harmonic nanoparticles [79, 80], magnetic particles [81, 82], microbubbles [83], and genetically encoded fluorescent proteins [84]. These guide stars can be selectively detected by various methods, with the advantages of high contrast and small size. However, these guide
stars are generally not biologically friendly and cannot be moved once fixed. The movement of absorption change of endogenous scatterers inside scattering media can be used as guide stars as well [85, 86]. However, they usually appear in large amounts and are difficult to be manipulated and isolated. Compared to physical guide stars, focused ultrasound is a virtual guide star with the advantage of being freely adjustable in biological tissue. In 2011, this guide star was first adopted in optical phase conjugation based wavefront shaping, enabling optical focusing and imaging deep inside scattering media [64]. Figure 10.10 shows the operating principle of employing focused ultrasound as the guide star. Compared with light, ultrasound is relatively transparent in biological tissue and thus can be focused on a small volume inside. In the recording step shown in Fig. 10.10a, a portion of the light passing through the ultrasonic focus is frequency shifted with the same amount of ultrasound frequency, which is marked in red. In the playback step, the ultrasonically tagged light is separated from the untagged one through frequency selection. Then, as shown in Fig. 10.10b, performing optical phase conjugation to the tagged light allows them to trace their trajectory back to the ultrasonic focus where they come. In this way, an optical focus can be formed, despite scattering. By raster scanning the focus and measuring the amount of
ultrasonically tagged light, objects hidden behind the scattering medium can be reconstructed [64]. Later, two independent groups demonstrated that ultrasonically guided optical focusing enables fluorescent imaging deep inside scattering media [87, 88]. Figure 10.11 shows typical fluorescent imaging results by raster scanning the optical focus deep inside biological tissue. A schematic of the imaging setup is shown in Fig. 10.11a, in which the ultrasonic transducer guides the input light to focus inside the scattering medium. The fluorescent target was sandwiched by two pieces of 2.5-mm-thick chicken tissue. Figure 10.11b–d show the clear image before embedding, the direct image after embedding, and the raster-scanned image with wavefront shaping of patterned characters using quantum dots, respectively. In addition, Fig. 10.11e–g show corresponding images of tumor microtissues embedded in tissues, respectively. These results demonstrate the capability of ultrasound-guided wavefront shaping for fluorescence imaging in the diffusive regime. It is worth noting that, the imaging resolution is determined by the size of the ultrasonic focus, which is restricted by the diffraction limit of ultrasound. To overcome this issue, iterative optical phase conjugation has been proposed, which effectively reduces the focal spot size and enhances the peak intensity [89, 90]. Besides being
implemented from the perspective of hardware, iterative phase conjugation can also be achieved computationally [91]. Moreover, it has also been utilized to speed up the raster scanning in imaging [92]. Ideally, iterative optical phase conjugation leads to single-speckle focusing. However, possibly due to measurement noise and system error, the best improvement factors reported experimentally are 2–3 times in resolution and 20 times in peak intensity [89]. Speckle-scale focusing down to 5 μm in the diffusive regime can be realized through variance encoding of the scattered light, but this approach requires thousands of time-consuming measurements [93]. It is worth mentioning that, besides focused ultrasound, the photoacoustic effect can also be utilized as a special type of virtual guide star. In feedback based wavefront shaping, the photoacoustic signal can be treated as a feedback mechanism for the optimization algorithm [34, 94–97]. In transmission matrix based wavefront shaping, the photoacoustic transmission matrix can be directly measured, enabling photoacoustic imaging deep inside scattering media [98, 99]. Therefore, to realize imaging through scattering media when the imaging plane is inaccessible, developing non-invasive and freely adjustable guide stars with small size and high contrast is highly desirable.

Fig. 10.11 Fluorescence imaging through scattering media. (a) Schematic of the imaging setup. (b) The clear image of the quantum-dot patterned characters before embedding. (c) The direct image of the characters after embedding; the features are not resolved. (d) The raster-scanned image of the embedded characters through wavefront shaping. (e) The clear image of tumor microtissues before embedding. (f) The direct image of tumor microtissues after embedding. (g) The raster-scanned image of the embedded tumor microtissues through wavefront shaping. Blue dots indicate the locations of collected data points. Pixels between data points are interpolated for display using bicubic interpolation. Scale bars, 50 μm. (Reprinted by permission from Springer Nature: Nature Communications, Deep-tissue Focal Fluorescence Imaging with Digitally Time-reversed Ultrasound-encoded Light, Wang, Y. M. et al., © 2012)

10.6 Discussion and Conclusion
Recent developments in semiconductor manufacturing, computational resources, and information theory have bred a variety of new computational optical imaging modalities that deliver multidimensional information in amplitude, phase, polarization, and angular momentum. In practical applications, these imaging modalities generally suffer from the limitations imposed by optical scattering. Wavefront shaping techniques exploit optical scattering and control the scattered wavefront at a large scale with high precision, enabling active cancellation of the scattering-induced information scrambling. Research on wavefront shaping is therefore becoming critically important for imaging through scattering media. Nowadays, wavefront shaping has evolved into a multidisciplinary integration of many research areas that relies on state-of-the-art technologies and theories. However, limited by the bandwidth of existing apparatus and the efficiency of wavefront reconstruction algorithms, the speed of wavefront shaping is not yet fast enough to handle dynamic scattering processes in real time. Imaging through dynamic scattering media, such as live animals, turbid water, and air turbulence, therefore remains challenging. Moreover, theoretical descriptions of wavefront shaping in complex systems with nonlinearity remain largely unexplored. We anticipate that, with the continuous development of wavefront shaping, scattering media can eventually be made effectively transparent, bringing tremendous convenience to imaging applications in both scientific research and daily life.
References 1. D. Huang, E. A. Swanson, C. P. Lin, J. S. Schuman, W. G. Stinson, W. Chang, M. R. Hee, T. Flotte, K. Gregory, C. A. Puliafito, and et al, “Optical coherence tomography,” Science 254, 1178 (1991). 2. F. Helmchen and W. Denk, “Deep tissue two-photon microscopy,” Nature Methods 2, 932 (2005). 3. B. Chance, K. Kang, L. He, J. Weng, and E. Sevick, “Highly sensitive object location in tissue models with linear in-phase and anti-phase multi-element optical arrays in one and two dimensions,” Proceedings of the National Academy of Sciences 90, 3423–3427 (1993). 4. I. M. Vellekoop and A. P. Mosk, “Universal Optimal Transmission of Light Through Disordered Materials,” Physical Review Letters 101, 120601 (2008). 5. W. Choi, A. P. Mosk, Q. H. Park, and W. Choi, “Transmission eigenchannels in a disordered medium,” Physical Review B 83, 134207 (2011). 6. Y. Chong and A. D. Stone, “Hidden black: Coherent enhancement of absorption in strongly scattering media,” Physical Review Letters 107, 163901 (2011). 7. A. Goetschy and A. D. Stone, “Filtering Random Matrices: The Effect of Incomplete Channel Control in Multiple Scattering,” Physical Review Letters 111, 063901 (2013). 8. S. F. Liew, S. M. Popoff, A. P. Mosk, W. L. Vos, and H. Cao, “Transmission channels for light in absorbing random media: From diffusive to ballistic-like transport,” Physical Review B 89, 224202 (2014). 9. S. M. Popoff, A. Goetschy, S. F. Liew, A. D. Stone, and H. Cao, “Coherent Control of Total Transmission of Light through Disordered Media,” Physical Review Letters 112, 133903 (2014). 10. M. Kim, W. Choi, C. Yoon, G. H. Kim, S.-h. Kim, G.-R. Yi, Q. H. Park, and W. Choi, “Exploring antireflection modes in disordered media,” Opt. Express 23, 12740–12749 (2015). 11. S. F. Liew and H. Cao, “Modification of light transmission channels by inhomogeneous absorption in random media,” Opt. Express 23, 11043–11053 (2015). 12. C. W. Hsu, A. Goetschy, Y. Bromberg, A. D. Stone, and H. Cao, “Broadband Coherent Enhancement of Transmission and Absorption in Disordered Media,” Physical Review Letters 115, 223901 (2015). 13. A. Yamilov, S. Petrenko, R. Sarma, and H. Cao, “Shape dependence of transmission, reflection, and absorption eigenvalue densities in disordered waveguides with dissipation,” Physical Review B 93, 100201 (2016). 14. Y. He, D. Wu, R. Zhang, Z. Cao, Y. Huang, and Y. Shen, “Genetic-algorithm-assisted coherent enhancement absorption in scattering media by exploiting transmission and reflection matrices,” Opt. Express 29, 20353–20369 (2021). 15. P. H. Beckwith, I. McMichael, and P. Yeh, “Image distortion in multimode fibers and restoration
by polarization-preserving phase conjugation,” Opt. Lett. 12, 510–512 (1987).
16. I. McMichael, P. Yeh, and P. Beckwith, “Correction of polarization and modal scrambling in multimode fibers by phase conjugation,” Opt. Lett. 12, 507–509 (1987).
17. I. M. Vellekoop and A. P. Mosk, “Focusing coherent light through opaque strongly scattering media,” Opt. Lett. 32, 2309–2311 (2007).
18. Y. Liu, C. Ma, Y. Shen, J. Shi, and L. V. Wang, “Focusing light inside dynamic scattering media with millisecond digital optical phase conjugation,” Optica 4, 280–288 (2017).
19. D. B. Conkey, A. N. Brown, A. M. Caravaca-Aguirre, and R. Piestun, “Genetic algorithm optimization for focusing through turbid media in noisy environments,” Opt. Express 20, 4840–4849 (2012).
20. J. Park, J.-H. Park, H. Yu, and Y. Park, “Focusing through turbid media by polarization modulation,” Opt. Lett. 40, 1667–1670 (2015).
21. D. Wang, E. H. Zhou, J. Brake, H. Ruan, M. Jang, and C. Yang, “Focusing through dynamic tissue with millisecond digital optical phase conjugation,” Optica 2, 728–735 (2015).
22. Y. Shen, Y. Liu, C. Ma, and L. V. Wang, “Focusing light through scattering media by full-polarization digital optical phase conjugation,” Opt. Lett. 41, 1130–1133 (2016).
23. Y. Shen, Y. Liu, C. Ma, and L. V. Wang, “Sub-Nyquist sampling boosts targeted light transport through opaque scattering media,” Optica 4, 97–102 (2017).
24. J. Yang, Y. Shen, Y. Liu, A. S. Hemphill, and L. V. Wang, “Focusing light through scattering media by polarization modulation based generalized digital optical phase conjugation,” Applied Physics Letters 111, 201108 (2017).
25. I. M. Vellekoop and A. P. Mosk, “Phase control algorithms for focusing light through turbid media,” Optics Communications 281, 3071–3080 (2008).
26. H. Huang, Z. Chen, C. Sun, J. Liu, and J. Pu, “Light Focusing through Scattering Media by Particle Swarm Optimization,” Chinese Physics Letters 32, 104202 (2015).
27. L. Fang, H. Zuo, Z. Yang, X. Zhang, L. Pang, W. Li, Y. He, X. Yang, and Y. Wang, “Particle swarm optimization to focus coherent light through disordered media,” Applied Physics B 124, 155 (2018).
28. L. Fang, X. Zhang, H. Zuo, and L. Pang, “Focusing light through random scattering media by four-element division algorithm,” Optics Communications 407, 301–310 (2018).
29. Y. Wu, X. Zhang, and H. Yan, “Focusing light through scattering media using the harmony search algorithm for phase optimization of wavefront shaping,” Optik 158, 558–564 (2018).
30. Z. Wu, J. Luo, Y. Feng, X. Guo, Y. Shen, and Z. Li, “Controlling 1550-nm light through a multimode fiber using a Hadamard encoding algorithm,” Opt. Express 27, 5570–5580 (2019).
178 31. Y. Zhao, Q. He, S. Li, and J. Yang, “Gradient-assisted focusing light through scattering media,” Opt. Lett. 46, 1518–1521 (2021). 32. C. M. Woo, H. Li, Q. Zhao, and P. Lai, “Dynamic mutation enhanced particle swarm optimization for optical wavefront shaping,” Opt. Express 29, 18420– 18426 (2021). 33. I. M. Vellekoop, “Feedback-based wavefront shaping,” Opt. Express 23, 12189–12206 (2015). 34. D. B. Conkey, A. M. Caravaca-Aguirre, J. D. Dove, H. Ju, T. W. Murray, and R. Piestun, “Super-resolution photoacoustic imaging through a scattering wall,” Nature Communications 6, 7902 (2015). 35. G. Osnabrugge, R. Horstmeyer, I. N. Papadopoulos, B. Judkewitz, and I. M. Vellekoop, “Generalized optical memory effect,” Optica 4, 886–892 (2017). 36. H. Liu, Z. Liu, M. Chen, S. Han, and L. V. Wang, “Physical picture of the optical memory effect,” Photon. Res. 7, 1323–1330 (2019). 37. S. Feng, C. Kane, P. A. Lee, and A. D. Stone, “Correlations and Fluctuations of Coherent Wave Transmission through Disordered Media,” Physical Review Letters 61, 834–837 (1988). 38. S. Schott, J. Bertolotti, J. F. Leger, L. Bourdieu, and S. Gigan, “Characterization of the angular memory effect of scattered light in biological tissues,” Opt Express 23, 13505–13516 (2015). 39. B. Judkewitz, R. Horstmeyer, I. M. Vellekoop, I. N. Papadopoulos, and C. Yang, “Translation correlations in anisotropically scattering media,” Nature Physics 11, 684 (2015). 40. C. Wang and N. Ji, “Characterization and improvement of three-dimensional imaging performance of GRIN-lens-based two-photon fluorescence endomicroscopes with adaptive optics,” Opt. Express 21, 27142–27154 (2013). 41. L. V. Amitonova, A. P. Mosk, and P. W. H. Pinkse, “Rotational memory effect of a multimode fiber,” Opt. Express 23, 20569–20575 (2015). 42. C. Ma, J. Di, Y. Li, F. Xiao, J. Zhang, K. Liu, X. Bai, and J. Zhao, “Rotational scanning and multiple-spot focusing through a multimode fiber based on digital optical phase conjugation,” Applied Physics Express 11, 062501 (2018). 43. X. Wei, Y. Shen, J. C. Jing, A. S. Hemphill, C. Yang, S. Xu, Z. Yang, and L. V. Wang, “Real-time frequency-encoded spatiotemporal focusing through scattering media using a programmable 2D ultrafine optical frequency comb,” Science Advances 6, eaay1192 (2020). 44. L. Zhu, J. Boutet de Monvel, P. Berto, S. Brasselet, S. Gigan, and M. Guillon, “Chromato-axial memory effect through a forward-scattering slab,” Optica 7, 338–345 (2020). 45. R. Zhang, J. Du, Y. He, Z. C. Luo, and Y. Shen, “Characterization of the spectral memory effect of scattering media,” Opt. Express 29(2021). 46. S. M. Popoff, G. Lerosey, R. Carminati, M. Fink, A. C. Boccara, and S. Gigan, “Measuring the Transmis-
sion Matrix in Optics: An Approach to the Study and Control of Light Propagation in Disordered Media,” Physical Review Letters 104, 100601 (2010).
47. S. M. Popoff, G. Lerosey, M. Fink, A. C. Boccara, and S. Gigan, “Controlling light through optical disordered media: transmission matrix approach,” New Journal of Physics 13, 123021 (2011).
48. J. Xu, H. Ruan, Y. Liu, H. Zhou, and C. Yang, “Focusing light through scattering media by transmission matrix inversion,” Opt. Express 25, 27234–27246 (2017).
49. K. Lee and Y. Park, “Exploiting the speckle-correlation scattering matrix for a compact reference-free holographic image sensor,” Nature Communications 7, 1–7 (2016).
50. D. Andreoli, G. Volpe, S. Popoff, O. Katz, S. Grésillon, and S. Gigan, “Deterministic control of broadband light through a multiply scattering medium via the multispectral transmission matrix,” Scientific Reports 5, 10347 (2015).
51. M. Mounaix, D. Andreoli, H. Defienne, G. Volpe, O. Katz, S. Grésillon, and S. Gigan, “Spatiotemporal Coherent Control of Light through a Multiple Scattering Medium with the Multispectral Transmission Matrix,” Physical Review Letters 98, 727–734 (2016).
52. A. Drémeau, A. Liutkus, D. Martina, O. Katz, C. Schülke, F. Krzakala, S. Gigan, and L. Daudet, “Reference-less measurement of the transmission matrix of a highly scattering material using a DMD and phase retrieval techniques,” Opt. Express 23, 11898–11911 (2015).
53. M. N’Gom, M.-B. Lien, N. M. Estakhri, T. B. Norris, E. Michielssen, and R. R. Nadakuditi, “Controlling light transmission through highly scattering media using semi-definite programming as a phase retrieval computation method,” Scientific Reports 7, 1–9 (2017).
54. M. N’Gom, T. B. Norris, E. Michielssen, and R. R. Nadakuditi, “Mode control in a multimode fiber through acquiring its transmission matrix from a reference-less optical system,” Opt. Lett. 43, 419–422 (2018).
55. L. Deng, J. D. Yan, D. S. Elson, and L. Su, “Characterization of an imaging multimode optical fiber using a digital micro-mirror device based single-beam system,” Opt. Express 26, 18436–18447 (2018).
56. T. Zhao, L. Deng, W. Wang, D. S. Elson, and L. Su, “Bayes’ theorem-based binary algorithm for fast reference-less calibration of a multimode fiber,” Opt. Express 26, 20368–20378 (2018).
57. G. Huang, D. Wu, J. Luo, Y. Huang, and Y. Shen, “Retrieving the optical transmission matrix of a multimode fiber using the extended Kalman filter,” Opt. Express 28, 9487–9500 (2020).
58. G. Huang, D. Wu, J. Luo, L. Lu, F. Li, Y. Shen, and Z. Li, “Generalizing the Gerchberg-Saxton algorithm for retrieving complex optical transmission matrices,” Photon. Res. 9, 34–42 (2021).
10 Imaging Through Scattering Media Using Wavefront Shaping 59. Z. Wang, D. Wu, G. Huang, J. Luo, B. Ye, Z. Li, and Y. Shen, “Feedback-assisted transmission matrix measurement of a multimode fiber in a referenceless system,” Opt. Lett. 46, 5542–5545 (2021). 60. D. Ancora, L. Dominici, A. Gianfrate, P. Cazzato, M. De Giorgi, D. Ballarini, D. Sanvitto, and L. Leuzzi, “Speckle spatial correlations aiding optical transmission matrix retrieval: the smoothed Gerchberg– Saxton single-iteration algorithm,” Photon. Res. 10, 2349–2358 (2022). 61. Z. Yaqoob, D. Psaltis, M. S. Feld, and C. Yang, “Optical phase conjugation for turbidity suppression in biological samples,” Nature Photonics 2, 110 (2008). 62. E. McDowell, M. Cui, I. Vellekoop, V. Senekerimyan, V. Senekerimyan, Z. Yaqoob, and C. Yang, “Turbidity suppression from the ballistic to the diffusive regime in biological tissues using optical phase conjugation,” Journal of Biomedical Optics 15, 025004 (2010). 63. P. Lai, X. Xu, H. Liu, Y. Suzuki, and L. Wang, “Reflection-mode time-reversed ultrasonically encoded optical focusing into turbid media,” Journal of Biomedical Optics 16, 080505 (2011). 64. X. Xu, H. Liu, and L. V. Wang, “Time-reversed ultrasonically encoded optical focusing into scattering media,” Nature Photonics 5, 154 (2011). 65. Y. Liu, P. Lai, C. Ma, X. Xu, A. A. Grabar, and L. V. Wang, “Optical focusing deep inside dynamic scattering media with near-infrared time-reversed ultrasonically encoded (TRUE) light,” Nature Communications 6, 5904 (2015). 66. B. Jayet, J. P. Huignard, and F. Ramaz, “Optical phase conjugation in Nd:YVO4 for acousto-optic detection in scattering media,” Opt. Lett. 38, 1256–1258 (2013). 67. M. Cui and C. Yang, “Implementation of a digital optical phase conjugation system and its application to study the robustness of turbidity suppression by phase conjugation,” Opt. Express 18, 3444–3455 (2010). 68. M. Jang, H. Ruan, H. Zhou, B. Judkewitz, and C. Yang, “Method for auto-alignment of digital optical phase conjugation systems based on digital propagation,” Opt. Express 22, 14054–14071 (2014). 69. M. Azimipour, F. Atry, and R. Pashaie, “Calibration of digital optical phase conjugation setups based on orthonormal rectangular polynomials,” Appl. Opt. 55, 2873–2880 (2016). 70. A. Hemphill, Y. Shen, J. Hwang, and L. Wang, “High-speed alignment optimization of digital optical phase conjugation systems based on autocovariance analysis in conjunction with orthonormal rectangular polynomials,” Journal of Biomedical Optics 24, 11 (2018). 71. Y.-W. Yu, C.-C. Sun, X.-C. Liu, W.-H. Chen, S.-Y. Chen, Y.-H. Chen, C.-S. Ho, C.-C. Lin, T.-H. Yang, and P.-K. Hsieh, “Continuous amplified digital optical phase conjugator for focusing through thick, heavy scattering medium,” OSA Continuum 2, 703–714 (2019).
72. C. K. Mididoddi, R. A. Lennon, S. Li, and D. B. Phillips, “High-fidelity off-axis digital optical phase conjugation with transmission matrix assisted calibration,” Opt. Express 28, 34692–34705 (2020). 73. Y. Liu, C. Ma, Y. Shen, and L. V. Wang, “Bitefficient, sub-millisecond wavefront measurement using a lock-in camera for time-reversal based optical focusing inside scattering media,” Opt. Lett. (2016). 74. A. Hemphill, Y. Shen, Y. Liu, and L. Wang, “Highspeed single-shot optical focusing through dynamic scattering media with full-phase wavefront shaping,” Applied Physics Letters 111, 221109 (2017). 75. R. Horstmeyer, H. Ruan, and C. Yang, “Guidestarassisted wavefront-shaping methods for focusing light into biological tissue,” Nature Photonics 9, 563 (2015). 76. I. Vellekoop, E. Van Putten, A. Lagendijk, and A. Mosk, “Demixing light paths inside disordered metamaterials,” Opt. Express 16, 67-80 (2008). 77. I. M. Vellekoop and C. M. Aegerter, “Scattered light fluorescence microscopy: imaging through turbid layers,” Opt. Lett. 35, 1245–1247 (2010). 78. I. M. Vellekoop, M. Cui, and C. Yang, “Digital optical phase conjugation of fluorescence in turbid tissue,” Applied Physics Letters 101, 081108 (2012). 79. C.-L. Hsieh, Y. Pu, R. Grange, G. Laporte, and D. Psaltis, “Imaging through turbid layers by scanning the phase conjugated second harmonic radiation from a nanoparticle,” Opt. Express 18, 20723–20731 (2010). 80. C.-L. Hsieh, Y. Pu, R. Grange, and D. Psaltis, “Digital phase conjugation of second harmonic radiation emitted by nanoparticles in turbid media,” Opt. Express 18, 12283–12290 (2010). 81. H. Ruan, T. Haber, Y. Liu, J. Brake, J. Kim, J. Berlin, and C. Yang, “Focusing light inside scattering media with magnetic-particle-guided wavefront shaping,” Optica 4, 1337–1343 (2017). 82. Z. Yu, J. Huangfu, F. Zhao, M. Xia, X. Wu, X. Niu, D. Li, P. Lai, and D. Wang, “Time-reversed magnetically controlled perturbation (TRMCP) optical focusing inside scattering media,” Scientific Reports 8, 2927 (2018). 83. H. Ruan, M. Jang, and C. Yang, “Optical focusing inside scattering media with time-reversed ultrasound microbubble encoded light,” Nature Communications 6, 8968 (2015). 84. J. Yang, L. Li, A. A. Shemetov, S. Lee, Y. Zhao, Y. Liu, Y. Shen, J. Li, Y. Oka, V. V. Verkhusha, and L. V. Wang, “Focusing light inside live tissue using reversibly switchable bacterial phytochrome as a genetically encoded photochromic guide star,” 5, eaay1211 (2019). 85. C. Ma, X. Xu, Y. Liu, and L. V. Wang, “Timereversed adapted-perturbation (TRAP) optical focusing onto dynamic objects inside scattering media,” Nature Photonics 8, 931 (2014). 86. E. H. Zhou, H. Ruan, C. Yang, and B. Judkewitz, “Focusing on moving targets through scattering samples,” Optica 1, 227–232 (2014).
180 87. Y. M. Wang, B. Judkewitz, C. A. DiMarzio, and C. Yang, “Deep-tissue focal fluorescence imaging with digitally time-reversed ultrasound-encoded light,” Nature Communications 3, 928 (2012). 88. K. Si, R. Fiolka, and M. Cui, “Fluorescence imaging beyond the ballistic regime by ultrasound-pulseguided digital phase conjugation,” Nature Photonics 6, 657 (2012). 89. H. Ruan, M. Jang, B. Judkewitz, and C. Yang, “Iterative time-reversed ultrasonically encoded light focusing in backscattering mode,” Scientific Reports 4, 7156 (2014). 90. K. Si, R. Fiolka, and M. Cui, “Breaking the spatial resolution barrier via iterative sound-light interaction in deep tissue microscopy,” Scientific Reports 2, 748 (2012). 91. D. Aizik, I. Gkioulekas, and A. Levin, “Fluorescent wavefront shaping using incoherent iterative phase conjugation,” Optica 9, 746–754 (2022). 92. Y. Suzuki, J. W. Tay, Q. Yang, and L. V. Wang, “Continuous scanning of a time-reversed ultrasonically encoded optical focus by reflection-mode digital phase conjugation,” Opt. Lett. 39, 3441–3444 (2014). 93. B. Judkewitz, Y. M. Wang, R. Horstmeyer, A. Mathy, and C. Yang, “Speckle-scale focusing in the diffusive regime with time reversal of variance-encoded light (TROVE),” Nature Photonics 7, 300 (2013).
Y. Shen 94. F. Kong, R. H. Silverman, L. Liu, P. V. Chitnis, K. K. Lee, and Y. C. Chen, “Photoacoustic-guided convergence of light through optically diffusive media,” Opt. Lett. 36, 2053–2055 (2011). 95. A. M. Caravaca-Aguirre, D. B. Conkey, J. D. Dove, H. Ju, T. W. Murray, and R. Piestun, “High contrast three-dimensional photoacoustic imaging through scattering media by localized optical fluence enhancement,” Opt. Express 21, 26671–26676 (2013). 96. P. Lai, L. Wang, J. W. Tay, and L. V. Wang, “Photoacoustically guided wavefront shaping for enhanced optical focusing in scattering media,” Nature Photonics 9, 126 (2015). 97. M. A. Inzunza-Ibarra, E. Premillieu, C. Grünsteidl, R. Piestun, and T. W. Murray, “Sub-acoustic resolution optical focusing through scattering using photoacoustic fluctuation guided wavefront shaping,” Opt. Express 28, 9823–9832 (2020). 98. T. Chaigne, O. Katz, A. C. Boccara, M. Fink, E. Bossy, and S. Gigan, “Controlling light in scattering media non-invasively using the photoacoustic transmission matrix,” Nature Photonics 8, 58 (2013). 99. T. Zhao, S. Ourselin, T. Vercauteren, and W. Xia, “High-speed photoacoustic-guided wavefront shaping for focusing light in scattering media,” Opt. Lett. 46, 1165–1168 (2021).
11 Coded Ptychographic Imaging
Shaowei Jiang, Tianbo Wang, and Guoan Zheng
Abstract
Ptychography was originally proposed to solve the phase problem in electron crystallography. In the past decade, it has evolved into an enabling imaging technique for both fundamental and applied sciences. This chapter discusses the background and implementations relevant to high-resolution optical imaging using ptychography. It begins with a brief overview of the phase retrieval concept and the ptychography technique. Subsequently, it focuses on the information encoding process in ptychography and discusses two strategies for high-resolution optical imaging, namely coded illumination and coded detection. For coded illumination, we present the Fourier ptychography approach that uses angle-varied plane waves for widefield, high-resolution imaging. For coded detection, we present the lensless coded ptychography approach that uses a coded image sensor for high-throughput optical imaging. These two approaches can address the intrinsic trade-off between resolution and field of view of an optical system, thereby offering new opportunities for biomedical imaging in the visible light regime. For both approaches, we provide a detailed procedure for implementing ptychographic reconstruction and discuss how to avoid the most common pitfalls encountered in the experiments. The related biomedical applications and concept extensions are also discussed. Finally, we outline several perspectives for future development.

S. Jiang · T. Wang · G. Zheng ()
Department of Biomedical Engineering, University of Connecticut, Storrs, CT, USA
e-mail: [email protected]; [email protected]; [email protected]

Keywords
Phase retrieval · Ptychography · Coherent diffraction imaging · Coded illumination · Coded detection · Super-resolution imaging · Lensless imaging · Quantitative phase imaging · High-throughput optical imaging · Fourier ptychography · Coded ptychography · Diffraction tomography
11.1
Phase Retrieval and Ptychography
Ptychography is a computational imaging technique that relies on an information encoding and decoding process. The encoding process is performed by modulating the complexvalued object with a structured probe beam. The decoding process is performed by an inversion algorithm that recovers the complexvalued object from intensity-only measurements. In the past decade, ptychography has grown significantly and become an indispensable imaging tool in most X-ray synchrotrons and
national laboratories worldwide. In the visible light regime, new developments continue to emerge and offer unique solutions for highresolution, high-throughput optical imaging with minimum hardware modifications. The demonstrated imaging throughput can now be greater than that of a high-end whole slide scanner. In this section, we first provide a brief overview of the phase retrieval concept and discuss how to recover the phase information using a single diffraction measurement. We then introduce the ptychography technique and discuss its imaging model. Two information encoding strategies are highlighted, namely coded illumination using a structured probe beam and coded detection using a structured surface. Several ptychographic implementations are surveyed based on these two information encoding strategies.
11.1.1 Phase Retrieval from a Single Diffraction Measurement

Phase characterizes the optical delay during a wave propagation process. Conventional optical detectors can only measure intensity variations of the incoming light wave; the associated phase information is lost in the detection process. The loss of phase information is termed 'the phase problem'. It was first noted in the field of crystallography, where the materials under study are usually crystals with periodic structures. Figure 11.1a shows the layout of a typical crystallography experiment: a crystalline object O(x, y) is illuminated by a coherent X-ray beam and an image sensor is placed in the far field for diffraction data acquisition. Far-field propagation of light waves is equivalent to the operation of a Fourier transform. In Fig. 11.1a, we use 'FT' to denote the Fourier transform of light waves from real space (x, y) to reciprocal space (k_x, k_y). The captured diffraction image in the far field can be expressed as the squared magnitude of the object's Fourier spectrum:

I(k_x, k_y) = \left| \mathrm{FT}\{ O(x, y) \} \right|^2    (11.1)
Fig. 11.1 (a) A typical experimental setup for X-ray crystallography, where 'FT' denotes Fourier transform. (b) A typical diffraction pattern captured in the far field [2], where the locations of the Bragg peaks (the bright spots) can be used to infer the crystalline structure in real space. ((b) is reprinted by permission from https://wellcomecollection.org/works/jxhce79r/items: X-ray diffraction pattern of the enzyme glutamate, Baker, P., © CC BY 4.0)

Figure 11.1b shows a representative intensity image captured in a crystallography experiment, where the bright spots represent the Bragg peaks associated with the crystalline periodicity. In 1952, Sayre envisioned that the crystalline structure could be determined if the phase information of the diffraction pattern could be recovered in reciprocal space [1]. To address the phase problem in crystallography, Hauptman and Karle introduced the atomicity constraint that made reconstructions of small molecules a routine exercise [3]. They won the Nobel Prize in Chemistry in 1985 "for their outstanding achievements in the development of direct methods for the determination of crystal structure". To demonstrate the use of the atomicity constraint for phase retrieval, we consider the example in Fig. 11.1a, where the object is a 2D periodic crystal whose unit cell consists of five atoms. The following procedure can be used to recover the object O(x, y) from a single diffraction measurement [4]:

Step 1: Generate a random initial guess of the object electron density O(x, y) in real space.
Step 2: Perform a Fourier transform of O(x, y) and obtain its Fourier spectrum \hat{O}(k_x, k_y) = \mathrm{FT}\{O(x, y)\} in reciprocal space.
Step 3: Impose the Fourier magnitude constraint by replacing the magnitude of \hat{O}(k_x, k_y) with \sqrt{I(k_x, k_y)} while keeping the phase unchanged.
Step 4: Perform an inverse Fourier transform of the object spectrum and obtain the updated object O(x, y) in real space.
Step 5: Impose the atomicity constraint by locating the five non-overlapping pixels that have the maximum intensity values. The values of these five pixels are kept unchanged and the values of all other pixels are set to zero.
Step 6: Repeat Steps 2–5 until the object estimate converges.

While there are many factors to consider in a practical crystallography experiment, the simplified procedure discussed above captures the essence of the phase retrieval process: different constraints on the object are iteratively imposed in the reconstruction process. Expressed in more general terms, phase retrieval is a process of finding an object estimate that simultaneously satisfies constraints A, B, C, etc. Constraint A can be that the object has the correct Fourier modulus, matching the actual measurement. Constraint B can be, for example, the atomicity constraint that the object only contains a certain number of atoms in a unit cell. For non-crystalline structures, an analogy to the atomicity constraint is the object sparsity constraint, in which objects are assumed to be spatially isolated and located at different positions in real space. Each atom in the atomicity constraint becomes an isolated object (with a sharp boundary) in the object sparsity constraint. A specimen with this property has non-zero contrast only on a small fraction of its imaging field of view. The non-zero contrast area of isolated objects is also termed the 'object support', which can be determined from the autocorrelation map of the object (the inverse Fourier transform of the captured far-field diffraction pattern) [5]. The phase retrieval process can be implemented by iteratively imposing the Fourier modulus constraint and the object sparsity constraint. Similar to the case in crystallography, the Fourier modulus constraint can be imposed by replacing the modulus of the object spectrum with the measured modulus while keeping the phase unchanged. The object sparsity constraint can be imposed by first estimating the support area from the autocorrelation map. Signals outside the object support area can then be
set to zeros while signals within the support area are kept unchanged. In addition to the simple alternating projection strategy discussed above, other iterative algorithms have also been developed for phase retrieval. One prominent example is the hybrid input-output algorithm reported by Fienup in 1982 [5]. Unlike the strategy of alternatively imposing the Fourier and object constraint, the hybrid input-output algorithm skips the object domain step and replaces it with a negative feedback acting upon the estimate in the previous iteration [5]. We also note that, the hybrid input-output and its related algorithms [4, 6] are different instances of the Douglas-Rachford algorithm [7].
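To make the alternating-projection idea concrete, the following NumPy sketch (added for illustration; the object, its support, and all parameters are invented for this toy example) reconstructs a sparse object from a single simulated Fourier-modulus measurement by iterating between the Fourier modulus constraint and the object support constraint. For less favorable supports the simple error-reduction iteration below may stagnate or converge to a twin image, which is where algorithms such as hybrid input-output become useful.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sparse object: a few isolated bright pixels in an otherwise empty field
n = 64
obj = np.zeros((n, n))
obj[20, 14], obj[35, 40], obj[50, 22] = 1.0, 0.8, 0.6

measured_modulus = np.abs(np.fft.fft2(obj))  # far-field modulus (square root of the intensity)
support = obj > 0                            # assume the support is known in this toy example

estimate = rng.random((n, n))                # random initial guess
for _ in range(500):
    spectrum = np.fft.fft2(estimate)
    spectrum = measured_modulus * np.exp(1j * np.angle(spectrum))  # Fourier modulus constraint
    estimate = np.fft.ifft2(spectrum).real
    estimate[~support] = 0.0                                       # object support constraint
    estimate[estimate < 0] = 0.0                                   # optional non-negativity

print("relative error:", np.linalg.norm(estimate - obj) / np.linalg.norm(obj))
```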
11.1.2 Ptychography for Phase Retrieval For extended objects like most confluent biospecimens, the non-zero contrast of the object extends across the entire imaging field of view. Therefore, one cannot define a support area for imposing the object constraint discussed in the previous section. Ptychography is able to address this problem and provides a reliable solution for imaging extended objects without using any lens. The original concept was developed by Hoppe in a three-part paper series in 1969 [8]. By translating a confined coherent probe beam on a crys-
talline object, Hoppe aimed to extract the phase of the Bragg peaks in reciprocal space [8]. The name 'ptychography' is derived from the Greek 'ptycho' (to fold), which corresponds to 'Faltung', the German word for convolution. In the original concept, the interaction between the probe beam and the object can be approximated by a multiplication process in real space. Equivalently, it can be modeled by a reciprocal-space convolution between the probe beam's Fourier spectrum and the Bragg peaks. This convolution operation is a key aspect of the technique and justifies its name. In 2004, Faulkner and Rodenburg adopted the iterative phase retrieval framework for ptychography, thereby bringing the technique to its modern form [9]. Although the original concept was developed to solve the phase problem in crystallography, the modern form of this technique is equally applicable to non-crystalline structures.

Fig. 11.2 (a) Operating principle of modern ptychography. (b) The imaging model of a typical ptychographic implementation. (c) The ptychographic reconstruction process

Figure 11.2a shows the operation of a modern ptychographic implementation: an extended object is illuminated by a spatially confined probe beam and the image sensor is placed at the far field for diffraction data acquisition. The forward imaging model of this scheme is shown in Fig. 11.2b, where the complex object O(x, y) is modulated
by the confined probe beam Probe(x, y) and the resulting exit wave propagates to the far field via a Fourier transform. The detector placed at reciprocal space then measures the squared modulus of the Fourier spectrum. By translating the object to different lateral positions (xi , yi )s, a collection of diffraction patterns Ii (kx , ky ) (i = 1, 2, 3 . . . ), termed ptychogram, can be acquired for reconstruction. Figure 11.2c shows a typical iterative reconstruction process of ptychography, where two constraints are iteratively imposed for phase retrieval. The first constraint is the Fourier modulus constraint of the captured diffraction pattern in reciprocal space. Similar to the discussion in the previous section, it can be imposed by replacing the modulus of the estimated spectrum with the measurement while keeping the phase unchanged. The second constraint is the compact support constraint of the object exit wave. It can be imposed by setting the signals outside the probe beam area to zeros while keeping the signals inside unchanged. In ptychography, these two constraints can be imposed for each measurement in an inner loop. The inner loop can then be repeated multiple
times (outer loops) until the object estimate converges. To get a good quality reconstruction in ptychography, it is important to have overlapped areas of illumination during the object translation process. We highlight this point in the middle panel of Fig. 11.2c, where the illumination area is represented by a circular disc. The overlapped area in-between different circular discs in Fig. 11.2c allows the object to be visited multiple times in the acquisition process, thereby resolving the ambiguities of the phase retrieval process. If no overlap is imposed, the phase retrieval process will be carried out independently for each acquisition, in a way similar to the single-shot scheme discussed in the previous section. If the probe beam is relatively large and does not contain a sharp boundary (i.e., not a compact support), it will lead to the usual ambiguity inherent to the phase problem. The resulting reconstruction will be in low quality. Compared with single-shot phase retrieval, ptychography can routinely image contiguously connected samples over an extended area. It requires neither a compact support constraint of the object nor a reference wave. The rich information provided by a ptychogram also allows the correction of system imperfections in a ptychographic experiment. For example, the effect of limited spatial coherence of a light source can be corrected via state mixture modeling [10]. One can also jointly recover the object and the confined probe beam in the phase retrieval process [11–13]. By adopting the multi-slice model in ptychography, 3D volumetric information can be recovered from a set of 2D measurements [14–16]. In the past decade, ptychography has grown significantly and become an enabling computational imaging technique for different fields. In the regimes of X-ray and extreme ultraviolet wavelengths, the lensless nature of ptychography has made it an indispensable tool in most X-ray synchrotrons and national laboratories worldwide [17–19]. In the regime of visible light, new developments of Fourier ptychography [20] and lensless coded ptychography [21, 22] offer unique solutions for
high-resolution, high-throughput optical imaging with minimum hardware modifications. In the regime of transmission electron microscopy, recent development has pushed the achieved resolution to the record-breaking deep subangstrom limit [23].
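To make the inner-loop/outer-loop structure of the reconstruction concrete, here is a minimal NumPy sketch of a PIE-style ptychographic update (added for illustration; it is not the specific algorithm of any cited work). It assumes the probe is known, scan positions fall on integer pixels, and the measurements are far-field amplitudes; in practice the probe is usually refined jointly with the object.

```python
import numpy as np

def ptycho_reconstruct(diff_amps, probe, positions, obj_shape, n_iter=100, alpha=1.0):
    """Minimal PIE-style ptychographic reconstruction with a known probe.
    diff_amps: list of measured far-field amplitudes (square roots of the intensities).
    probe: complex probe array, same size as each measurement.
    positions: list of (row, col) top-left corners of the probe on the object grid."""
    obj = np.ones(obj_shape, dtype=complex)       # initial object guess
    h, w = probe.shape
    p2max = np.max(np.abs(probe) ** 2)
    for _ in range(n_iter):                                   # outer loops
        for amp, (r, c) in zip(diff_amps, positions):         # inner loop over measurements
            patch = obj[r:r + h, c:c + w]
            exit_wave = probe * patch                         # multiplicative interaction model
            spectrum = np.fft.fft2(exit_wave)
            spectrum = amp * np.exp(1j * np.angle(spectrum))  # Fourier modulus constraint
            exit_new = np.fft.ifft2(spectrum)
            # PIE object update within the illuminated (compact support) region
            obj[r:r + h, c:c + w] = patch + alpha * np.conj(probe) * (exit_new - exit_wave) / p2max
    return obj
```

The overlap requirement discussed above enters through the choice of positions: adjacent scan positions should overlap substantially so that the updates of neighboring patches constrain each other.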
11.1.3 Information Encoding Strategies in Ptychography Conventional ptychography uses a confined probe beam to encode object information for detection. It can be generalized into a codedillumination scheme shown in Fig. 11.3a, where the structured probe beam multiplies with the complex object and the resulting object exit waves pass through an optical system for intensity detection. In this scheme, the object information is encoded by the structured probe beam as highlighted by the red box in Fig. 11.3a. For conventional ptychography, the probe beam is a spatially-confined circular disc, and the optical system is just an empty space for far-field propagation. In this coded illumination scheme, the structured probe beam can also take the form of anglevaried plane waves or extended speckle patterns. Similarly, the optical system can take the form of a regular lens-based microscope or an empty space for near-field Fresnel propagation. Fourier ptychography [20] and near-field ptychography [24] are two other representative implementations of this coded-illumination scheme. In Fourier ptychography, angle-varied plane waves are used for object illumination and the optical system is a regular lens-based microscope. Different phase gradients of angle-varied plane waves provide nonuniform ‘structures’ to the illumination beam. On the other hand, near-field ptychography uses non-uniform speckle patterns for object illumination. The optical system is an empty space for near-field Fresnel propagation. This near-field ptychography approach can also be implemented with a lens-based microscope setup for imaging beyond the resolution limit of the employed objective lens [25, 26].
Fig. 11.3 Two information encoding strategies for ptychography: coded illumination, and coded detection. (a) With coded illumination, the structured illumination beam multiplies with the object and the resulting object exit wave passes through an optical system for data acquisition. The structured beam can take the form of a spatially confined beam, an extended speckle pattern, or angle-
varied plane waves. (b) With coded detection, the structured surface multiplies with the object wavefront and the resulting exit wave passes through an optical system for data acquisition. The structured surface can take the form of a confined aperture, a confined phase plate, a diffuser, a disorder-engineered surface, a blood-coated surface, or a programmable spatial light modulator
Figure 11.3b shows a different information encoding strategy for ptychography. Instead of illuminating the object with a structured probe beam, it illuminates the object with a normal-incident plane wave and adopts a structured surface downstream for coded detection. In this scheme, the transmission profile of the structured surface multiples with the object wavefront and the resulting object exit wave passes through an optical system for intensity detection. The structured surface in this scheme can be a spatially confined aperture [27–29], a diffuser [22, 30–32], a disorderengineered surface [21], a blood-coated surface [33, 34], or a spatial light modulator [35]. Among these different implementations, ptychographic structured modulation [30, 31] and coded ptychography [21, 22, 33, 36] are two examples for high-resolution optical imaging. With ptychographic structured modulation, the optical system is a regular lens-based microscope. A diffuser is placed in between the object and the objective lens, serving as a computational scattering lens for redirecting the large-angle diffracted waves into smaller angles for detection. Therefore, the otherwise inaccessible high-resolution object information can now be encoded into the system, in a way similar to structured illumination microscopy [37]. It has been shown that this ap-
proach can achieve a 4.5-fold resolution gain over the diffraction limit of the employed objective lens [30]. On the other hand, lensless coded ptychography directly attaches the structured surface on the image sensor’s coverglass. The optical system in this approach is an empty space between the coverglass and the pixel array, with a propagation distance typically less than 1 mm. A recent demonstration of this approach can resolve a linewidth of 308 nm in the visible light regime [21]. The demonstrated detection numerical aperture (NA) is 0.8, the highest among different lensless ptychographic implementations. The field of ptychography is currently experiencing an exponential growth. Covering many aspects of ptychography is beyond the scope of this chapter. In the following, we only focus on two ptychographic implementations for high-resolution, high-throughput imaging in the visible light regime. In Sect. 11.2, we discuss the Fourier ptychography approach as an example of the coded-illumination scheme. In Sect. 11.3, we discuss the lensless coded ptychography approach as an example of the coded-detection scheme. In Sect. 11.4, we summarize our discussions and outline several directions for future development.
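As a rough illustration of the coded-detection forward model described above, the sketch below multiplies an object exit wave by a random coded surface and propagates it over a sub-millimeter gap to the sensor plane with the angular spectrum method, recording intensity only. All parameters (pixel size, wavelength, propagation distance, and the random surface) are assumed values for this toy example and are not taken from the cited demonstrations.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, distance):
    """Free-space propagation of a sampled complex field using the angular spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    arg = 1.0 / wavelength ** 2 - FX ** 2 - FY ** 2     # squared longitudinal spatial frequency
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * distance) * (arg > 0)          # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)

rng = np.random.default_rng(1)
n, dx, wavelength, z = 256, 1.0e-6, 0.5e-6, 0.5e-3           # 1-um sampling, 0.5-mm gap (assumed)
object_wave = np.exp(1j * rng.uniform(0.0, 0.5, (n, n)))     # a weak phase object
coded_surface = np.exp(1j * 2 * np.pi * rng.random((n, n)))  # random phase-only coded surface
sensor_intensity = np.abs(angular_spectrum_propagate(object_wave * coded_surface,
                                                     wavelength, dx, z)) ** 2
```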
11.2
Fourier Ptychography via Coded Illumination
Fourier ptychography adopts the codedillumination strategy to address the intrinsic trade-off between imaging resolution and field of view [20]. In the past decade, it has evolved from a simple microscope tool to a general imaging technique for different research communities [38–40]. In addition to the original 2D microscopy demonstration, the concept has been applied to far-field synthetic aperture imaging [41, 42], high-resolution Xray nanoscopy [43], 3D Fourier ptychographic diffraction tomography [44, 45], among others. In this section, we first discuss the imaging model and the implementation procedures of Fourier ptychography. We then discuss how to properly set up a Fourier ptychographic imaging experiment. Lastly, we discuss its applications and concept extensions.
11.2.1 Imaging Model and Reconstruction Procedures

Figure 11.4a shows a typical implementation of Fourier ptychography, where the object is placed at the object plane (x, y), the pupil aperture is placed at the Fourier plane (k_x, k_y), and the detector is placed at the image plane (x, y). In this setup, a programmable LED array illuminates the object with different incident wavevectors (k_{xi}, k_{yi}) and the microscope system records the corresponding intensity images I_i(x, y). The objective lens of the microscope performs a Fourier transform (denoted as 'FT' in Fig. 11.4) to convert the object light waves from the spatial domain to the Fourier domain. The tube lens of the microscope performs another Fourier transform to convert the light waves back to the spatial domain for image acquisition. Figure 11.4b shows the forward imaging model of Fourier ptychography. The left panel of Fig. 11.4b shows a 2D thin object with both intensity and phase properties. Illuminating this object with a plane wave e^{i k_{xi} x} e^{i k_{yi} y} in the spatial domain is equivalent to shifting the object spectrum by (k_{xi}, k_{yi}) in the Fourier plane. The middle panel of Fig. 11.4b shows the shifted object spectrum filtered by the pupil aperture of the objective lens. The size of the circular pupil is determined by the NA of the objective lens. The right panel of Fig. 11.4b shows the corresponding captured low-resolution intensity image at the image plane. Figure 11.4c shows the reconstruction process of Fourier ptychography. It first initializes the object O(x, y) and the pupil aberration Pupil(k_x, k_y). The object spectrum \hat{O}(k_x, k_y) is then recovered following the iterative phase retrieval process. Finally, the object spectrum is transformed back to the spatial domain to obtain the real-space object profile O(x, y). The detailed recovery procedures are provided as follows:

Step 1: Initialize the object O(x, y) in the spatial domain by averaging the square root of the captured raw images I_i(x, y) (i = 1, 2, ..., T):
O (x, y) =
.
1 T Ii (x, y) i=1 T
(11.2)
Step 2: Up-sample the spatial-domain initializathe object Fourier spectrum tion, and generate ˆ kx , ky : .O .
Oˆ kx , ky = F T O(x, y)↑(M/m)
(11.3)
where ‘FT’ represents the Fourier transformation, ‘↑(M/m)’ represents the up-sampling process with a factor of ‘M/m’. We note that the captured raw images Ii (x, y) have a dimension of m × m pixels, while the Fourier spectrum ˆ kx , ky and the final recovered object O(x, y) .O have a dimension of M × M pixels. For example, if we use a 2×, 0.1 NA objective lens for image acquisition and aim to synthesize a NA of 0.5, the factor ‘M/m’ can be 5 or larger. Step 3: Initialize the pupil function Pupil(kx , ky ) as a lowpass filter:
188
S. Jiang et al.
Fig. 11.4 Imaging model and operating principle of Fourier ptychography. (a) A typical implementation of Fourier ptychography, where ‘FT’ denotes Fourier trans-
2π P upil kx , ky = circ NA · λ
.
= Oˆ
form inverse Fourier transform of the filtered f il Fourier spectrum .Oˆ i kx , ky to generate the low-resolution image in the spatial domain.
(11.4)
where ‘circ’ represents a circular mask, ‘NA’ represents the numerical aperture of the optical system, λ is the central wavelength of the employed LED light source. We assume no pupil aberration in the initialization process. It is also possible to measure pupil aberration using a calibration target [46]. crop Step 4: Crop a sub-region .Oˆ i kx , ky with m × m pixels from the object spectrum ˆ kx , ky : .O
.
form. (b) The forward imaging model of Fourier ptychography. (c) The iterative phase retrieval process
crop Oˆ i (1 : m, 1 : m)
M +m M −m + kxi − 1, + kxi : 2 2
M +m M −m + kyi − 1 + kyi : 2 2 (11.5)
.
crop f il k x , ky Oˆ i kx , ky = P upil kx , ky · Oˆ i (11.6) f il
Oi
.
(11.7)
where ‘·’ represents point-wise multiplication, ‘FT−1 ’ represents inverse Fourier transform, f il and .Oi (x, y) represents the generated lowresolution image in the spatial domain. Step 6: Replace the amplitude using the square root of the ith measurement while keeping the phase unchanged. Perform a Fourier transform of the updated low-resolution image to generate the updated filtered spectrum ˆ if il kx , ky . .O f il .Oi
where kxi and kyi represent the shift in the Fourier domain based on the ith illumination wavevector. Step 5: Multiply the pupil function Pupil(kx , ky ) crop kx , ky . Perwith the cropped spectrum .Oˆ i
f il (x, y) = F T −1 Oˆ i kx , ky ,
f il Oi (x, y)
(x, y) = Ii (x, y)·
f il
Oi (x, y)
(11.8) .
f il f il kx , ky = F T Oi (x, y) (11.9) Oˆ i
11 Coded Ptychographic Imaging
189
Step 7: Jointly update the sub-region $\hat{O}_i^{crop}(k_x, k_y)$ and the pupil function Pupil(k_x, k_y) using the regularized ptychographic iterative engine (rPIE) algorithm [47]:

$$\hat{O}_i^{crop}(k_x, k_y) = \hat{O}_i^{crop}(k_x, k_y) + \frac{\mathrm{conj}\!\left(\mathrm{Pupil}(k_x, k_y)\right)\cdot\left(\hat{O}_i^{fil\,\prime}(k_x, k_y) - \hat{O}_i^{fil}(k_x, k_y)\right)}{(1-\alpha_O)\left|\mathrm{Pupil}(k_x, k_y)\right|^2 + \alpha_O\left|\mathrm{Pupil}(k_x, k_y)\right|^2_{max}} \quad (11.10)$$

$$\mathrm{Pupil}(k_x, k_y) = \mathrm{Pupil}(k_x, k_y) + \frac{\mathrm{conj}\!\left(\hat{O}_i^{crop}(k_x, k_y)\right)\cdot\left(\hat{O}_i^{fil\,\prime}(k_x, k_y) - \hat{O}_i^{fil}(k_x, k_y)\right)}{(1-\alpha_P)\left|\hat{O}_i^{crop}(k_x, k_y)\right|^2 + \alpha_P\left|\hat{O}_i^{crop}(k_x, k_y)\right|^2_{max}} \quad (11.11)$$

where 'conj' represents taking the complex conjugate, and 'α_O' and 'α_P' are the parameters of the rPIE algorithm; we can set them to 1.

Step 8: Update the corresponding sub-region of the object spectrum:

$$\hat{O}\!\left(\frac{M-m}{2}+k_{xi} : \frac{M+m}{2}+k_{xi}-1,\;\; \frac{M-m}{2}+k_{yi} : \frac{M+m}{2}+k_{yi}-1\right) = \hat{O}_i^{crop}(k_x, k_y) \quad (11.12)$$

Repeat Steps 4–8 until a convergence condition is fulfilled, either with a pre-defined number of iterations or upon stagnation of an error metric.

Step 9: Perform an inverse Fourier transform of the object spectrum $\hat{O}(k_x, k_y)$ to obtain the recovered object O(x, y) in the spatial domain:

$$O(x, y) = \mathrm{FT}^{-1}\!\left\{\hat{O}(k_x, k_y)\right\} \quad (11.13)$$
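For readers who prefer code, the procedure above can be condensed into a short script. The following is a minimal NumPy sketch of Steps 1–9 under simplifying assumptions: square m × m raw frames, an integer up-sampling factor M/m, pre-computed integer spectrum-shift indices (kxi, kyi), and no noise handling or normalization. The helper names (circular_pupil, recover_fp) and parameter choices are illustrative, not the authors' implementation.

```python
import numpy as np

def circular_pupil(m, na, wavelength, dk):
    """Low-pass pupil of Eq. (11.4): a circular mask of radius NA*2*pi/lambda
    on an m x m Fourier grid with spacing dk (rad/m). Illustrative helper."""
    f = (np.arange(m) - m // 2) * dk
    KX, KY = np.meshgrid(f, f, indexing="ij")
    return (KX**2 + KY**2 <= (2 * np.pi * na / wavelength) ** 2).astype(complex)

def recover_fp(images, kx_list, ky_list, M, pupil, alpha_o=1.0, alpha_p=1.0, n_iters=20):
    """Iterative Fourier ptychography recovery (sketch of Steps 1-9)."""
    T, m, _ = images.shape
    # Steps 1-2: initialize the object as the up-sampled average image and take its FT
    avg = images.mean(axis=0)
    obj = np.kron(avg, np.ones((M // m, M // m)))          # nearest-neighbor up-sampling
    O_hat = np.fft.fftshift(np.fft.fft2(obj))
    for _ in range(n_iters):
        for i in range(T):
            # Step 4: crop an m x m sub-region at the shifted position (Eq. 11.5)
            cx = (M - m) // 2 + kx_list[i]
            cy = (M - m) // 2 + ky_list[i]
            crop = O_hat[cx:cx + m, cy:cy + m]
            # Step 5: apply the pupil and go to the spatial domain (Eqs. 11.6-11.7)
            filt = pupil * crop
            low = np.fft.ifft2(np.fft.ifftshift(filt))
            # Step 6: replace amplitude with the square root of the measurement (Eqs. 11.8-11.9)
            low_upd = np.sqrt(images[i]) * np.exp(1j * np.angle(low))
            filt_upd = np.fft.fftshift(np.fft.fft2(low_upd))
            # Step 7: rPIE updates of the cropped spectrum and the pupil (Eqs. 11.10-11.11)
            diff = filt_upd - filt
            crop_new = crop + np.conj(pupil) * diff / (
                (1 - alpha_o) * np.abs(pupil) ** 2 + alpha_o * np.abs(pupil).max() ** 2)
            pupil = pupil + np.conj(crop) * diff / (
                (1 - alpha_p) * np.abs(crop) ** 2 + alpha_p * np.abs(crop).max() ** 2)
            # Step 8: write the updated sub-region back into the object spectrum (Eq. 11.12)
            O_hat[cx:cx + m, cy:cy + m] = crop_new
    # Step 9: inverse FT to obtain the recovered complex object (Eq. 11.13)
    return np.fft.ifft2(np.fft.ifftshift(O_hat)), pupil
```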
11.2.2 Key Considerations for a Fourier Ptychography Experiment

Fig. 11.5 (a) A Fourier ptychographic microscopy platform built with a regular light microscope and a programmable LED array. (Modified by permission from Springer: Nature Photonics, Wide-field, high-resolution Fourier ptychographic microscopy, Zheng, G. et al. © 2013). (b) The captured raw image of a blood smear and its high-resolution Fourier ptychographic reconstruction. (Modified by permission from Optica: Optics Letters, Quantitative phase imaging via Fourier ptychographic microscopy, Ou, X. et al. © 2013)

Figure 11.5a shows a typical Fourier ptychographic imaging platform built using a regular upright microscope and a programmable LED array [20]. Figure 11.5b shows a large field-of-view image of a blood smear captured using a 2× objective lens [48]. The zoomed-in views of Fig. 11.5b show the captured raw data and the Fourier ptychographic reconstruction. To obtain a good-quality ptychographic reconstruction, we highlight several key considerations as follows.
1. Choice of microscope platform and objective lens. If one aims to push the resolution limit using Fourier ptychography, a high-resolution objective lens can be used for image acquisition [49–51]. A recent demonstration utilizes a 40×, 0.95-NA objective lens to synthesize a NA of 1.9, which is close to the maximum theoretical limit of 2 in free space [51]. If one aims to perform large field-of-view imaging, a low-resolution objective lens can be used for image acquisition. For example, we can use the 2×, 0.1-NA Nikon objective lens (Nikon, Plan APO) and the related Nikon microscope platform to set up a Fourier ptychographic imaging experiment. Other choices of the objective lens include the 4×, 0.2-NA Nikon objective lens (Nikon, Plan APO), 2×, 0.1-NA objective lens (Thorlabs, TL2X-SAP), and 4×, 0.2-NA objective lens (Thorlabs, TL4X-SAP). 2. Choice of image sensor and its gain setting. For Fourier ptychography, it is better to choose an image sensor with a small pixel size and large sensing area (i.e., a large number of pixels). One cost-effective option is the Sony IMX 183 sensor (DMK 33UX183, The Imaging Source), with 20 megapixels and a 2.4-μm pixel size. Other options include the
Sony IMX 540 sensor (24.5 megapixels, 2.75-μm pixel size, model no. BFS-U3-244S8M-C, Teledyne FLIR) and other full-frame cameras. One important setting for the image sensor is the digital gain in the acquisition process. A typical Fourier ptychographic experiment needs to acquire a large number of darkfield images, with the illumination angles larger than the maximum collection angle of the objective lens. One option to acquire these images is to set a long exposure time for image acquisition (typically 1 s). A better high-throughput alternative is to keep the exposure time short but set the digital gain to a maximum value during the image acquisition process. A common pitfall in a Fourier ptychography experiment is that the exposure time is too short or the digital gain setting is too low for darkfield image acquisition.
3. Choice of programmable LED array. Small and bright surface-mounted LED elements are recommended for building a Fourier ptychographic imaging platform. The small size of the LED element enables a larger Fourier overlapping ratio in-between adjacent acquisitions. It also allows the LED array to be placed closer to the object for increasing the light delivery efficiency. The high brightness
of the LED elements allows a shorter exposure time of the image sensor, thereby increasing the achievable imaging throughput. Off-the-shelf options include the LED matrices from Adafruit (product ID: 3444 and 607). One can also build an LED matrix with individual small and bright LED elements (e.g., Adafruit product ID: 3341) [39].
4. Incident angle calibration. Another common challenge for Fourier ptychography is to infer the incident wavevectors of the LED array. Ref. [39] discusses a simple solution for wavevector calibration based on the brightfield-to-darkfield transition features from the captured images. In this approach, an LED element from the array is selected as the central reference point. One can then choose 4 adjacent LED elements such that their incident angles are close to the maximum acceptance angle of the objective lens. By turning on these 4 LED elements, the captured image exhibits brightfield-to-darkfield transition features. The center of these features determines the location of the reference LED element. The orientation of these features determines the relative rotation angle between the LED array and the image sensor. If we know the pitch of the LED array, the incident wavevector of each
LED element can be calculated based on its relative location with respect to the reference LED element (a short sketch of this calculation follows this list).
5. Recovery of spatially varying pupil aberrations. For imaging with a low-NA objective lens, the pupil aberration varies at different regions of the field of view. These spatially varying aberrations need to be recovered or measured to obtain good-quality reconstructions. To this end, one can use a blood smear as a calibration target and divide the captured images into small tiles (e.g., 256 by 256 pixels) for reconstruction. In this way, the pupil aberration can be treated as spatially invariant for each tile and the LED illumination can be assumed to be planar. In Ref. [52], the aberration is first recovered at the center of the imaging field of view. This recovered aberration is then used as the initial guess for regions adjacent to the central region. The process goes on from the center to the edge, covering the entire field of view at the end. An alternative is to model the spatially varying aberration with full-field parameters that govern the field-dependent pupil over the entire field of view. These parameters can be jointly recovered with the object in the phase retrieval process [53].
6. Sample thickness. The Fourier ptychography approach usually assumes the sample to be a 2D thin section. Only under this assumption can the light interaction between the object and the plane wave be approximated by a point-wise multiplication process. For a thick object, one may need to model it as multiple slices [16, 54] or integrate Fourier ptychography with diffraction tomography to recover the 3D scattering potential of the object [44, 45].
7. Slow-varying phase features. The phase transfer function characterizes the transfer property of the phase information at different spatial frequencies. The phase transfer function is usually close to zero at low spatial frequencies [55]. Therefore, the slow-varying phase features of the object cannot be effectively converted into intensity variations for detection. To address this problem, one can illuminate the object with an incident angle close to the maximum acceptance angle of the objective lens. This annular illumination condition can better convert the slow-varying phase into intensity variations for detection [56, 57].
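As a companion to item 4 above, the sketch below converts LED positions on a flat array into incident wavevectors and the corresponding Fourier-domain pixel shifts (kxi, kyi) used in Eq. (11.5). The array geometry, wavelength, and reconstruction pixel size are illustrative assumptions, and the sign convention depends on the chosen coordinate system.

```python
import numpy as np

def led_wavevectors(led_xy, distance, wavelength):
    """Incident wavevectors (rad/m) for LEDs located at (x, y) on a flat array a
    given distance from the sample, relative to the reference LED on the axis."""
    x, y = led_xy[:, 0], led_xy[:, 1]
    r = np.sqrt(x**2 + y**2 + distance**2)
    kx = -2 * np.pi / wavelength * x / r   # direction cosines of the illumination
    ky = -2 * np.pi / wavelength * y / r
    return kx, ky

# Illustrative usage: a 15 x 15 array with 4-mm pitch placed 80 mm from the sample,
# 0.52-um central wavelength, and an M x M reconstruction grid.
pitch, dist, wl, M = 4e-3, 80e-3, 0.52e-6, 1280
ii, jj = np.meshgrid(np.arange(15) - 7, np.arange(15) - 7, indexing="ij")
led_xy = np.stack([ii.ravel() * pitch, jj.ravel() * pitch], axis=1)
kx, ky = led_wavevectors(led_xy, dist, wl)
p_recon = 0.24e-6                 # illustrative reconstruction pixel size in object space (m)
dk = 2 * np.pi / (M * p_recon)    # spacing of the M x M Fourier grid (rad/m)
kxi, kyi = np.round(kx / dk).astype(int), np.round(ky / dk).astype(int)
```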
11.2.3 Applications and Extensions

Fourier ptychography has several advantages for microscopy applications. Its large imaging field of view, high resolution, and capability of post-measurement refocusing make it an enabling technique for cell and tissue screening. Figure 11.6 shows the recovered gigapixel image using a Fourier ptychographic microscopy platform with a 2× objective lens. The final resolution is not determined by the NA of the employed objective lens. Instead, it is determined by the largest incident angle of the LED array. In this example, the synthetic NA is 0.5, similar to that of a regular 20× objective lens. Therefore, one can achieve an effective resolution of a 20× lens while retaining the large field of view of a 2× lens, having the best of both worlds. The recovered phase via Fourier ptychography further provides a complementary contrast mechanism to reveal quantitative topographic profiles of unlabeled specimens [48]. For example, it has been used to image live cell cultures using a high-speed Fourier ptychography platform, where the initial phase estimate was obtained from the differential phase contrast approach [58]. Similarly, annular illumination has been demonstrated for recovering phase images of HeLa cell culture at a frame rate of 25 Hz [57]. By integrating Fourier ptychography with the concept of diffraction tomography [44], one can recover the 3D refractive index distribution with a resolution beyond the limit of the employed objective lens [45]. The basic principle and some of the results are reproduced in Fig. 11.7. In this implementation, tilting the illumination angle effectively rotates the Ewald sphere around the origin in the Fourier space (Fig. 11.7a). Therefore, each captured image corresponds to a spherical cap of the Ewald sphere, as shown in Fig. 11.7b. These spherical caps do not span a single 2D plane as in conventional Fourier ptychography.
Fig. 11.6 Large field-of-view, high-resolution imaging using Fourier ptychography. (a) A high-resolution gigapixel image captured using a 2× objective lens. (b), (c1), (d), and (e): The zoomed-in views of the Fourier ptychographic recovery. (c2) and (c3): The reference images captured using a 20× and a 2× lens. (Modified by permission from Springer: Nature Photonics, Wide-field, high-resolution Fourier ptychographic microscopy, Zheng, G. et al. © 2013)
Synthesizing these caps in the 3D Fourier space allows the recovery of the entire 3D volume. Figures 11.7c–e show the recovered 3D refractive index distribution of a live HeLa cell culture over a large field of view. Using this approach, 3D high-resolution imaging can be performed without rotating the sample or capturing a through-focus series. Another important extension of Fourier ptychography is a camera-scanning scheme that deviates from microscopy and enables far-field super-resolution photographic imaging [41, 42]. In this scheme, the object is placed in the far field and a camera with a photographic lens is used to capture images. Light propagation
from the object to the camera corresponds to the operation of a Fourier transform. Therefore, the lens aperture of the camera serves as a confined aperture constraint in the Fourier space, similar to that of the objective lens in a microscope platform. By moving the entire camera to different lateral positions, one can capture multiple images corresponding to different circular apertures in the Fourier space. These images can then be synthesized into one super-resolution image of the object. The final resolution is no longer limited by the size of the camera aperture. Instead, it is determined by the translation distance of the camera (i.e., the synthetic aperture of the translation process).
Fig. 11.7 Fourier ptychographic diffraction tomography. (a) Tilting the illumination angle effectively rotates the Ewald sphere around the origin in the Fourier space. (b) Each captured image corresponds to a spherical cap in the Fourier space. Synthesizing these caps can cover the high-resolution 3D volume of the object. (c) The recovered 3D refractive index distribution over a large field of view. (d–e) The zoomed-in views of (c). (Modified by permission from Elsevier: Optics and Lasers in Engineering, Wide-field high-resolution 3D microscopy with Fourier ptychographic diffraction tomography, Zuo, C. et al. © CC BY 4.0)

11.3 Lensless Coded Ptychography via Coded Detection

Coded ptychography is a recently developed high-resolution, high-throughput lensless microscopy approach based on the coded-detection strategy [21, 33, 34, 36]. Similar to Fourier ptychography, it can also address the trade-off between resolution and field of view of an optical system. A recent demonstration shows that a coded ptychography platform can resolve the 308-nm linewidth on a resolution target under a normally incident plane wave [21]. This resolving power is achieved without performing aperture synthesis as in Fourier ptychography. The corresponding detection NA is 0.8, the highest among current lensless ptychographic implementations. Gigapixel images with a 240-mm² effective field of view can be acquired in 15 s, with an image acquisition throughput greater than that of a high-end whole slide scanner [21]. In this section, we first discuss the imaging model and implementation procedures of coded ptychography. We then discuss how to properly set up a coded ptychography platform using a blood-coated image sensor. Lastly, the related biomedical applications and concept extensions are discussed.
11.3.1 Imaging Model and Reconstruction Procedures

Figure 11.8a shows a typical implementation of coded ptychography, where the coded surface (the blood cells) is directly attached to the coverglass of the image sensor. In this setup, we can define three planes: the object plane (x, y), the coded surface plane (x′, y′), and the image plane (x″, y″). The exit waves from the object propagate for a distance d1 and reach the coded surface on the image sensor. This coded surface can be made by smearing a drop of blood on the sensor's coverglass. It can redirect the large-angle diffracted waves to smaller angles for detection, serving as an effective scattering lens. With this coded surface, the previously inaccessible high-resolution object information can now be acquired using the image sensor. Different from the conventional
optical lens, this coded surface can be made at an arbitrary size without introducing optical aberrations. Therefore, it can unlock an optical space with spatial extent (x, y) and spatial frequency content (kx, ky) that is inaccessible using conventional lens-based optics [21]. After the encoding process by the coded surface, the light waves further propagate for a distance d2 and reach the pixel array for intensity detection. Figure 11.8b shows the forward imaging model of coded ptychography. By translating the object (or the coded sensor) to different positions (xi, yi), the lensless system records a set of intensity images Ii(x″, y″) for reconstruction. The positional shift (xi, yi) can be recovered based on a cross-correlation analysis of the captured images [21, 22, 34]. Figure 11.8c shows the iterative recovery process. It first initializes the object wavefront W(x′, y′) and the coded surface CS(x′, y′). The object wavefront W(x′, y′) is then recovered following the iterative phase retrieval process. Finally, the wavefront is propagated for a distance −d1 to obtain the object profile O(x, y). The detailed recovery procedures are provided as follows:

Fig. 11.8 Imaging model and operating principle of coded ptychography. (a) A typical implementation of coded ptychography, where 'Prop' denotes free-space propagation. The coded sensor is made by smearing blood cells on the sensor's coverglass. (b) The forward imaging model of coded ptychography. (c) The iterative phase retrieval process

Step 1: Initialize the object wavefront W(x′, y′) on the coded surface plane:
$$W(x', y') = \mathrm{PSF}_{free}(-d_2) * \left[\frac{1}{T}\sum_{i=1}^{T} I_i\!\left(x'' + x_i,\, y'' + y_i\right)\right]_{\uparrow M} \quad (11.14)$$

where '∗' represents the convolution process, 'PSF_free(d)' represents the point spread function for free-space propagation over a distance d, I_i(x″, y″) represents the ith captured image corresponding to the positional shift (x_i, y_i), and '↑M' represents the M-fold nearest up-sampling.

Step 2: Initialize the coded surface profile CS(x′, y′) on the sensor's coverglass:

$$CS(x', y') = \mathrm{PSF}_{free}(-d_2) * \left[\frac{1}{T}\sum_{i=1}^{T} I_i(x'', y'')\right]_{\uparrow M} \quad (11.15)$$

This step only needs to be performed in a calibration experiment. Once characterized, the profile of the coded surface CS(x′, y′) remains unchanged for all subsequent experiments.

Step 3: Generate the shifted wavefront $W_i^{shift}(x', y')$ based on the estimated shifts:

$$W_i^{shift}(x', y') = W(x' - x_i,\, y' - y_i) \quad (11.16)$$

Step 4: The exit wave E_i(x′, y′) leaving the coded surface is given by the point-wise product between the coded surface CS(x′, y′) and the shifted wavefront $W_i^{shift}(x', y')$. The wavefront on the sensor plane S_i(x″, y″) can be obtained by propagating E_i(x′, y′) for a distance d2:

$$E_i(x', y') = CS(x', y') \cdot W_i^{shift}(x', y') \quad (11.17)$$

$$S_i(x'', y'') = E_i(x', y') * \mathrm{PSF}_{free}(d_2) \quad (11.18)$$

Step 5: Down-sample the intensity image on the sensor plane and update S_i(x″, y″) based on the measurement I_i(x″, y″):

$$U_i\!\left(\lceil x''/M\rceil,\, \lceil y''/M\rceil\right) = \left[\left|S_i(x'', y'') * \mathrm{PSF}_{pixel}\right|^2\right]_{\downarrow M} \quad (11.19)$$

$$S_i'(x'', y'') = S_i(x'', y'') \cdot \sqrt{\frac{I_i\!\left(\lceil x''/M\rceil,\, \lceil y''/M\rceil\right)}{U_i\!\left(\lceil x''/M\rceil,\, \lceil y''/M\rceil\right)}} \quad (11.20)$$

where '⌈·⌉' represents the ceiling function, '↓M' represents M-fold down-sampling, 'PSF_pixel' models the spatial response of the pixels (the sensitivity is often higher at the center of the pixel) [21], and the prime denotes the updated wavefront.

Step 6: Propagate the updated wavefront $S_i'(x'', y'')$ back to the coded surface plane:

$$E_i'(x', y') = S_i'(x'', y'') * \mathrm{PSF}_{free}(-d_2) \quad (11.21)$$

Step 7: Jointly update the shifted object wavefront $W_i^{shift}(x', y')$ and the coded surface profile CS(x′, y′) using the rPIE algorithm [47]. The coded surface profile only needs to be updated in the calibration experiment:

$$W_i^{shift}(x', y') = W_i^{shift}(x', y') + \frac{\mathrm{conj}\!\left(CS(x', y')\right)\cdot\left(E_i'(x', y') - E_i(x', y')\right)}{(1-\alpha_W)\left|CS(x', y')\right|^2 + \alpha_W\left|CS(x', y')\right|^2_{max}} \quad (11.22)$$

$$CS(x', y') = CS(x', y') + \frac{\mathrm{conj}\!\left(W_i^{shift}(x', y')\right)\cdot\left(E_i'(x', y') - E_i(x', y')\right)}{(1-\alpha_{CS})\left|W_i^{shift}(x', y')\right|^2 + \alpha_{CS}\left|W_i^{shift}(x', y')\right|^2_{max}} \quad (11.23)$$

where 'α_W' and 'α_CS' are the parameters of the rPIE algorithm, and we can set them to 1.

Step 8: Update W(x′, y′) based on $W_i^{shift}$:

$$W(x', y') = W_i^{shift}(x' + x_i,\, y' + y_i) \quad (11.24)$$

Step 9: Repeat Steps 3–8 until a convergence condition is fulfilled, either with a pre-defined number of iterations or upon stagnation of an error metric. Lastly, propagate the object wavefront W(x′, y′) back to the object plane and obtain the complex object profile:

$$O(x, y) = W(x', y') * \mathrm{PSF}_{free}(-d_1) \quad (11.25)$$
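As with Fourier ptychography, the recovery procedure maps naturally onto a short script. The following NumPy sketch implements Steps 1–9 under simplifying assumptions: a pre-characterized coded surface CS, integer pixel shifts recovered beforehand, an angular-spectrum propagator standing in for PSF_free, and simple block averaging standing in for PSF_pixel. The function names (propagate, recover_cp) and parameters are illustrative, not from the chapter.

```python
import numpy as np

def propagate(field, d, wavelength, pixel_size):
    """Free-space (angular spectrum) propagation over a distance d; a simple
    stand-in for PSF_free with evanescent components clamped to zero."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pixel_size)
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kernel = np.exp(2j * np.pi * d * np.sqrt(np.maximum(arg, 0.0)))
    return np.fft.ifft2(np.fft.fft2(field) * kernel)

def recover_cp(images, CS, xs, ys, M, d1, d2, wavelength, dx, alpha_w=1.0, n_iters=10):
    """Coded ptychography recovery (sketch of Steps 1-9); CS is pre-characterized."""
    T, n, _ = images.shape
    up = lambda img: np.kron(img, np.ones((M, M)))          # M-fold nearest up-sampling
    W = propagate(up(images.mean(axis=0)), -d2, wavelength, dx / M)     # Step 1 (Eq. 11.14)
    for _ in range(n_iters):
        for i in range(T):
            Ws = np.roll(W, (xs[i], ys[i]), axis=(0, 1))                # Step 3 (Eq. 11.16)
            E = CS * Ws                                                 # Step 4 (Eq. 11.17)
            S = propagate(E, d2, wavelength, dx / M)                    # Eq. 11.18
            # Step 5: block-average the intensity to sensor pixels, then amplitude update
            U = (np.abs(S) ** 2).reshape(n, M, n, M).mean(axis=(1, 3))  # Eq. 11.19 (approx.)
            S = S * up(np.sqrt(images[i] / (U + 1e-9)))                 # Eq. 11.20
            E_new = propagate(S, -d2, wavelength, dx / M)               # Step 6 (Eq. 11.21)
            # Step 7: rPIE update of the shifted wavefront (coded surface kept fixed)
            Ws = Ws + np.conj(CS) * (E_new - E) / (
                (1 - alpha_w) * np.abs(CS) ** 2 + alpha_w * np.abs(CS).max() ** 2)
            W = np.roll(Ws, (-xs[i], -ys[i]), axis=(0, 1))              # Step 8 (Eq. 11.24)
    return propagate(W, -d1, wavelength, dx / M)                        # Step 9 (Eq. 11.25)
```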
11.3.2 Key Considerations for a Coded Ptychography Experiment
Figure 11.9a1 shows a parallel coded ptychography platform built using an array of coded sensors. Figure 11.9a2 shows the zoomed-in view of the coded sensor, where a disorder-engineered surface is directly attached to the sensor's coverglass [21].
Fig. 11.9 (a1) A parallel coded ptychography platform built with an array of coded sensors. (a2) The coded image sensor with a disorder-engineered surface attached to its coverglass. (b1) The captured raw image using the coded sensor in (a2). (b2) The recovered high-resolution image of a resolution target. (Modified by permission from American Chemical Society: ACS Photonics, Resolution-Enhanced Parallel Coded Ptychography for High-Throughput Optical Imaging, Jiang, S. et al. © 2021). (c1) Obtain a drop of blood from a finger prick. (c2) Smear the blood on the top of the sensor's coverglass. (c3) Integrate the blood-coated sensor with a Blu-ray disc for coded ptychographic imaging. The laser diode of the Blu-ray player can be used to illuminate the object. (Modified by permission from American Chemical Society: ACS Sensors, Blood-Coated Sensor for High-Throughput Ptychographic Cytometry on a Blu-ray Disc, Jiang, S. et al. © 2022)
The imaging performance of this platform is shown in Fig. 11.9b. The captured raw image in Fig. 11.9b1 has a pixel size of 1.85 μm, and the recovered image in Fig. 11.9b2 can resolve the 308-nm linewidth on the resolution target. The disorder-engineered surface in Fig. 11.9a requires chemical etching followed by carbon nanoparticle printing. One alternative for preparing the coded surface is to smear a drop of blood on the sensor's coverglass followed by alcohol fixation. In Fig. 11.9c1, a drop of blood is obtained from a finger prick. One can then smear the blood on the sensor's coverglass, as shown in Fig. 11.9c2. Figure 11.9c3 shows a coded ptychography platform built using the blood-coated sensor and a Blu-ray disc. By spinning the disc on top of the blood-coated sensor, multiple diffraction images can be acquired for ptychographic reconstruction. The 405-nm laser diode from the Blu-ray player can also be used as the coherent laser source in this platform. In the following, we highlight several key considerations for a successful coded ptychography experiment.
1. Choice of image sensor. For coded ptychography, it is better to choose a monochromatic image sensor with a small pixel size. Current options include the Sony IMX 226 monochromatic sensor with a 1.85-μm pixel size and the ON Semiconductor MT9J003 monochromatic sensor with a 1.67-μm pixel size. One can also use image sensors developed for smartphones. Such sensors often have a small pixel size with a large field of view. However, the pixel crosstalk and the effect of the Bayer filter need to be considered and corrected in the ptychographic reconstruction.
2. Coded surface on the image sensor. The coded surface needs to be a thin layer placed close to the pixel array. Only under this condition can the light interaction with the coded surface be approximated by the point-wise multiplication process in Eq. (11.17). One straightforward approach to prepare the coded surface is to smear a drop of blood on the sensor's coverglass, as shown in Fig. 11.9c. A blood-cell layer has rich spatial structures to modulate both the intensity and phase of the incoming light waves. However, we note that the blood cells need to be distributed as a dense monolayer across the coverglass. Common pitfalls include multiple layers of cells on the coverglass or a sparse distribution of cells. We also note that goat blood often gives a better imaging performance compared with human blood [34]. The size of goat blood cells is 2–3 microns, the smallest among all animals in the world.
3. Choice of light source. Different from the LED sources used in Fourier ptychography, coded ptychography requires a monochromatic laser light source for sample illumination. The reason is the free-space propagation process in the forward imaging model: light at different wavelengths will be separated to different axial planes in this process. In Fourier ptychography, the lens in the microscope system can partially compensate for such a chromatic dispersion effect, and light at different wavelengths recombines to the same focal plane in the lens-based microscope system. One off-the-shelf laser option for coded ptychography is the cost-effective fiber-coupled laser diode by Thorlabs (LP405SF10). One can also repurpose the laser diode from a Blu-ray drive for coded ptychography [33]. We also note that it is possible to use an LED source with a laser line filter for coded ptychography. However, the resulting low optical flux requires the object to come to a complete stop during the scanning process, thereby limiting the imaging throughput.
4. Recovery of the positional shifts. A typical coded ptychography experiment continuously
acquires the diffraction patterns during the non-stop object translation process. It is challenging to obtain precise positional shifts from the motorized stage. A more robust and reliable approach is to recover the positional shifts from the captured diffraction patterns. For example, we can leave an empty space on the coded surface [21, 33, 34]. The positional shift (xi, yi) can then be recovered via a cross-correlation analysis of the captured data through this empty region. Deep sub-pixel accuracy can be achieved using a sub-pixel registration algorithm [59] (see the sketch after this list).
5. Recovery of the coded surface in a calibration experiment. Similar to the pupil aberration recovery in Fourier ptychography, the coded surface profile can be jointly recovered with the object in a calibration experiment. A good calibration object is a blood smear slide, which contains uniform and dense spatial features across the entire coded surface. A calibration experiment often requires more diffraction measurements, typically >1500 images. Once recovered, the coded surface profile remains unchanged for all subsequent experiments.
6. Sample thickness. In coded ptychography, the forward imaging process does not need to model the light interaction between the object and the incident beam. The recovered wavefront W(x′, y′) solely depends on how the light waves exit the sample and not how they enter. Therefore, this approach can image objects with arbitrary thickness. With the recovered object wavefront W(x′, y′), one can refocus it to any axial plane post-measurement.
7. Slow-varying phase features. The coded surface on the image sensor can convert the slow-varying phase information into distortions of the diffraction patterns. Therefore, coded ptychography can quantitatively recover slow-varying phase profiles with many 2π wraps. For example, it has been used to recover the phase profiles of optical prisms [21], bacterial colonies [36], various urine crystals [33], and thick cytology smears [34]. These phase profiles are challenging to obtain using other common lensless microscopy techniques such as the transport-of-intensity equation, support
constraint-based phase retrieval, digital inline holography, multi-height and multiwavelength phase retrieval.
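For item 4 above, a minimal sketch of sub-pixel shift recovery by upsampled cross-correlation, in the spirit of Ref. [59]. It uses scikit-image's phase_cross_correlation as one readily available implementation; the crop window over the empty (blood-free) sensor region and the upsampling factor are illustrative assumptions.

```python
import numpy as np
from skimage.registration import phase_cross_correlation

def recover_shift(frame_ref, frame_i, crop=(slice(0, 512), slice(0, 512)), upsample=100):
    """Return the (row, column) positional shift of frame_i relative to frame_ref,
    with roughly 1/upsample sub-pixel precision, using the empty sensor region."""
    shift, error, _ = phase_cross_correlation(
        frame_ref[crop], frame_i[crop], upsample_factor=upsample)
    return shift
```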
11.3.3 Applications and Extensions

Coded ptychography can perform large field-of-view and high-resolution microscopy imaging without using any lens. Figure 11.10a1 shows a recovered whole slide image of a blood smear, where one can clearly resolve the subcellular structures of the white blood cells in Fig. 11.10a2–a5. Compared with Fourier ptychography, coded ptychography can image thick specimens with slow-varying phase features. Figure 11.10b shows the recovered phase image of a urine sediment slide. The zoomed-in views in Fig. 11.10b2–b5 show the recovered phase of different crystals, which contain slow-varying phase features with many 2π wraps. These phase profiles are challenging to obtain using other common lensless imaging techniques, where slow-varying phase features cannot be effectively converted into intensity variations for detection. With coded ptychography, the phase profile can be effectively converted into spatial distortions of the diffraction patterns, thereby enabling the recovery of the true quantitative phase at all spatial frequencies. In addition to imaging fixed biospecimens, coded ptychography can also be used to quantify the growth of live cell cultures over a large area at high spatiotemporal resolution. In particular, the temporal correlation between adjacent time points can be used to reduce the number of acquisitions and accelerate the convergence of the reconstruction. For example, one can use the recovered image from the last time point as the initial guess of the current time point for ptychographic reconstruction [36]. Using this temporal correlation constraint, coded ptychography can achieve a 488-nm half-pitch spatial resolution and a 15-second temporal resolution over a centimeter-scale field of view [36]. Figure 11.11 shows the recovered phase images of a bacterial colony cultured on an uneven agar plate. The growth of the bacterial
colony can be clearly resolved by tracking the change of the phase wraps. One extension of coded ptychography is optofluidic ptychography, which integrates the concept of ptychography with optofluidic microscopy [60]. In this approach, the coded surface is fabricated on the bottom substrate of a microfluidic channel. Specimens are then delivered across the channel using microfluidic flow, and multiple diffraction patterns are captured for reconstruction. The experimental schematic of this platform is similar to that of a regular flow cytometer but is equipped with a 2D image sensor for capturing coded diffraction patterns. It complements the miniaturization provided by microfluidics and allows the integration of ptychography into various lab-on-a-chip devices [60]. Another notable extension of coded ptychography is the synthetic aperture ptychography approach shown in Fig. 11.12a [61]. In this approach, a coded sensor is placed at the far field for diffraction data acquisition. By translating the coded sensor to different lateral positions, multiple images are acquired for ptychographic reconstruction. Different from conventional ptychography, the resolution of this approach is no longer limited by the spanning angle of the detector. Instead, it is determined by the synthetic aperture size of the coded sensor translation process. Figure 11.12b1 shows a recovered wavefront of a resolution target using 3-by-3 scanning steps. The scanning positions of the coded sensor are highlighted by the red dots in Fig. 11.12b1. In this experiment, the size of the coded sensor is 5 mm by 5 mm while the synthetic aperture size is 5.5 mm by 5.5 mm. By propagating this recovered wavefront to the object plane, we obtain the object image in Fig. 11.12b2. Figure 11.12c1 shows the recovered wavefront of the same target using 20-by-20 scanning steps. The synthetic aperture size is 20 mm by 20 mm in this case, and the recovered object image is shown in Fig. 11.12c2. The effect of resolution enhancement can be clearly seen by comparing Fig. 11.12b2 with Fig. 11.12c2. Since this approach does not require any lens in the imaging process, it may offer new possibilities for coherent X-ray and extreme ultraviolet
Fig. 11.10 (a1) The recovered whole slide image of a blood smear using coded ptychography. (a2)–(a5) Zoomed-in views of white blood cells. (Modified by permission from American Chemical Society: ACS Photonics, Resolution-Enhanced Parallel Coded Ptychography for High-Throughput Optical Imaging, Jiang, S. et al. © 2021). (b1) The recovered whole slide phase image of a urine sediment slide using coded ptychography. (b2)–(b5) Zoomed-in views of different crystals on the urine sediment slide. (Modified by permission from American Chemical Society: ACS Sensors, Blood-Coated Sensor for High-Throughput Ptychographic Cytometry on a Blu-ray Disc, Jiang, S. et al. © 2022)
microscopy where high-quality lenses are challenging to make.
11.4 Summary and Outlook

This chapter briefly discusses the information encoding and decoding process of ptychography. The encoding process is performed by a point-wise multiplication between the object and the structured probe beam. The decoding process is performed based on an iterative phase retrieval algorithm that recovers the complex object from intensity-only measurements. Two encoding strategies are highlighted, namely coded illumination and coded detection. For coded illumination, we discuss the Fourier ptychography approach that uses angle-varied plane waves for object illumination. For coded detection, we discuss the lensless coded ptychography approach that uses a coded image sensor for data acquisition. For both approaches, we also discuss their applications and concept extensions. The focus of this chapter has been on high-resolution, high-throughput ptychographic imaging in the visible light regime. For other aspects and developments of ptychography, we direct interested readers to the following related resources: a comprehensive book chapter covering the history and different ptychographic implementations by Rodenburg and Maiden [62], an introductory article for non-experts by Guizar-Sicairos and Thibault [63], and recent review articles on X-ray ptychography [64], Fourier ptychography [38–40], and extreme-ultraviolet ptychography [65].
Fig. 11.11 Tracking the growth of a bacterial colony at high spatiotemporal resolution. (a1–a2) The recovered phase image of a bacterial colony cultured on an uneven agar plate. (b1–b6) Time-lapse tracking of the bacterial growth. The change of the phase wraps can be clearly resolved from the recovered time-lapse phase images. (Modified by permission from Elsevier: Biosensors and Bioelectronics, Ptychographic sensor for large-scale lensless microbial monitoring with high spatiotemporal resolution, Jiang, S. et al. © 2022)
Fig. 11.12 Synthetic aperture ptychography via coded sensor translation. (a) Schematic of the synthetic aperture ptychography approach, where the captured raw images are used to synthesize a large aperture at the far field. (b1) With 3 by 3 scanning steps (red dots), an object wavefront is recovered with a synthetic aperture size of 5.5 mm by 5.5 mm. (b2) The recovered object image by propagating the wavefront in (b1) to the object plane. (c1) With 20 by 20 scanning steps, an object wavefront is recovered with a synthetic aperture size of 20 mm by 20 mm. (c2) The recovered object image by propagating the wavefront in (c1) to the object plane. (Modified by permission from Optica: Photonics Research, Synthetic aperture ptychography: coded sensor translation for joint spatial-Fourier bandwidth expansion, Song, P. et al. © 2022)
For future development of the technology, we envision the integration of the coded-illumination and coded-detection schemes into new ptychographic implementations. For example, the angle-varied illumination of Fourier ptychography can be adopted in a lensless coded ptychography experiment for synthetic aperture imaging. The current best detection NA of coded ptychography is 0.8. By adopting angle-varied coded illumination, it is possible to double the current best NA to a value of 1.6 or even larger. Similarly, diffraction tomography can be implemented with coded ptychography for imaging 3D samples on a chip. The optofluidic ptychography platform can also be integrated with angle-varied illumination for imaging the 3D refractive index of cells flowing through the microfluidic channel. If successful, it can extract specific 3D intracellular structures in a lensless flow cytometry configuration. Lastly, many biomedical applications remain to be explored. To name a few, rapid quantification of bacterial growth over a large area can find clinical applications in antibiotic susceptibility testing. The post-acquisition refocusing capability of both Fourier ptychography and coded ptychography can find applications in digital pathology. The quantitative phase imaging capability can find applications in rapid on-site evaluation of label-free fine needle aspiration biopsies.
References
1. D. Sayre, "Some implications of a theorem due to Shannon," Acta Crystallographica 5, 843–843 (1952).
2. P. Baker, "https://wellcomecollection.org/works/jxhce79r/items", retrieved.
3. H. A. Hauptman, "The phase problem of x-ray crystallography," Reports on Progress in Physics 54, 1427 (1991).
4. V. Elser, "Solution of the crystallographic phase problem by iterated projections," Acta Crystallographica Section A: Foundations of Crystallography 59, 201–209 (2003).
5. J. R. Fienup, "Phase retrieval algorithms: a comparison," Applied Optics 21, 2758–2769 (1982).
6. D. R. Luke, "Relaxed averaged alternating reflections for diffraction imaging," Inverse Problems 21, 37 (2004).
7. H. H. Bauschke, P. L. Combettes, and D. R. Luke, "Phase retrieval, error reduction algorithm, and Fienup variants: a view from convex optimization," JOSA A 19, 1334–1345 (2002).
8. W. Hoppe and G. Strube, "Diffraction in inhomogeneous primary wave fields. 2. Optical experiments for phase determination of lattice interferences," Acta Crystallogr. A 25, 502–507 (1969).
9. H. M. L. Faulkner and J. Rodenburg, "Movable aperture lensless transmission microscopy: a novel phase retrieval algorithm," Physical Review Letters 93, 023903 (2004).
10. P. Thibault and A. Menzel, "Reconstructing state mixtures from diffraction measurements," Nature 494, 68–71 (2013).
11. M. Guizar-Sicairos and J. R. Fienup, "Phase retrieval with transverse translation diversity: a nonlinear optimization approach," Opt. Express 16, 7264–7278 (2008).
12. P. Thibault, M. Dierolf, O. Bunk, A. Menzel, and F. Pfeiffer, "Probe retrieval in ptychographic coherent diffractive imaging," Ultramicroscopy 109, 338–343 (2009).
13. A. M. Maiden and J. M. Rodenburg, "An improved ptychographical phase retrieval algorithm for diffractive imaging," Ultramicroscopy 109, 1256–1262 (2009).
14. A. M. Maiden, M. J. Humphry, and J. Rodenburg, "Ptychographic transmission microscopy in three dimensions using a multi-slice approach," JOSA A 29, 1606–1614 (2012).
15. T. Godden, R. Suman, M. Humphry, J. Rodenburg, and A. Maiden, "Ptychographic microscope for three-dimensional imaging," Opt. Express 22, 12513–12523 (2014).
16. L. Tian and L. Waller, "3D intensity and phase imaging from light field measurements in an LED array microscope," Optica 2, 104–111 (2015).
17. P. Thibault, M. Dierolf, A. Menzel, O. Bunk, C. David, and F. Pfeiffer, "High-resolution scanning x-ray diffraction microscopy," Science 321, 379–382 (2008).
18. M. D. Seaberg, B. Zhang, D. F. Gardner, E. R. Shanblatt, M. M. Murnane, H. C. Kapteyn, and D. E. Adams, "Tabletop nanometer extreme ultraviolet imaging in an extended reflection mode using coherent Fresnel ptychography," Optica 1, 39–44 (2014).
19. J. M. Rodenburg, A. Hurst, A. G. Cullis, B. R. Dobson, F. Pfeiffer, O. Bunk, C. David, K. Jefimovs, and I. Johnson, "Hard-x-ray lensless imaging of extended objects," Physical Review Letters 98, 034801 (2007).
20. G. Zheng, R. Horstmeyer, and C. Yang, "Wide-field, high-resolution Fourier ptychographic microscopy," Nature Photonics 7, 739–745 (2013).
21. S. Jiang, C. Guo, P. Song, N. Zhou, Z. Bian, J. Zhu, R. Wang, P. Dong, Z. Zhang, and J. Liao, "Resolution-enhanced parallel coded ptychography for high-throughput optical imaging," ACS Photonics 8, 3261–3271 (2021).
22. S. Jiang, J. Zhu, P. Song, C. Guo, Z. Bian, R. Wang, Y. Huang, S. Wang, H. Zhang, and G. Zheng, "Wide-field, high-resolution lensless on-chip microscopy via near-field blind ptychographic modulation," Lab on a Chip 20, 1058–1065 (2020).
23. Y. Jiang, Z. Chen, Y. Han, P. Deb, H. Gao, S. Xie, P. Purohit, M. W. Tate, J. Park, and S. M. Gruner, "Electron ptychography of 2D materials to deep sub-ångström resolution," Nature 559, 343–349 (2018).
24. M. Stockmar, P. Cloetens, I. Zanette, B. Enders, M. Dierolf, F. Pfeiffer, and P. Thibault, "Near-field ptychography: phase retrieval for inline holography using a structured illumination," Scientific Reports 3, 1–6 (2013).
25. H. Zhang, S. Jiang, J. Liao, J. Deng, J. Liu, Y. Zhang, and G. Zheng, "Near-field Fourier ptychography: super-resolution phase retrieval via speckle illumination," Opt. Express 27, 7498–7512 (2019).
26. L.-H. Yeh, S. Chowdhury, and L. Waller, "Computational structured illumination for high-content fluorescence and phase microscopy," Biomedical Optics Express 10, 1978–1998 (2019).
27. F. Zhang, G. Pedrini, and W. Osten, "Phase retrieval of arbitrary complex-valued fields through aperture-plane modulation," Physical Review A 75, 043805 (2007).
28. A. M. Maiden, J. M. Rodenburg, and M. J. Humphry, "Optical ptychography: a practical implementation with useful resolution," Optics Letters 35, 2585–2587 (2010).
29. J. Marrison, L. Räty, P. Marriott, and P. O'Toole, "Ptychography–a label free, high-contrast imaging technique for live cells using quantitative phase information," Scientific Reports 3, 1–7 (2013).
30. P. Song, S. Jiang, H. Zhang, Z. Bian, C. Guo, K. Hoshino, and G. Zheng, "Super-resolution microscopy via ptychographic structured modulation of a diffuser," Optics Letters 44, 3645–3648 (2019).
31. Z. Bian, S. Jiang, P. Song, H. Zhang, P. Hoveida, K. Hoshino, and G. Zheng, "Ptychographic modulation engine: a low-cost DIY microscope add-on for coherent super-resolution imaging," Journal of Physics D: Applied Physics 53, 014005 (2019).
32. Y. Zhang, Z. Zhang, and A. Maiden, "Ptycho-cam: a ptychographic phase imaging add-on for optical microscopy," Applied Optics 61, 2874–2880 (2022).
33. S. Jiang, C. Guo, T. Wang, J. Liu, P. Song, T. Zhang, R. Wang, B. Feng, and G. Zheng, "Blood-Coated Sensor for High-Throughput Ptychographic Cytometry on a Blu-ray Disc," ACS Sensors 7, 1058–1067 (2022).
34. S. Jiang, C. Guo, P. Song, T. Wang, R. Wang, T. Zhang, Q. Wu, R. Pandey, and G. Zheng, "High-throughput digital pathology via a handheld, multiplexed, and AI-powered ptychographic whole slide scanner," Lab on a Chip 22, DOI: https://doi.org/10.1039/D1032LC00084A (2022).
35. H. Sha, C. He, S. Jiang, P. Song, S. Liu, W. Zou, P. Qin, H. Wang, and Y. Zhang, "Lensless coherent diffraction imaging based on spatial light modulator with unknown modulation curve," arXiv preprint arXiv:2204.03947 (2022).
36. S. Jiang, C. Guo, Z. Bian, R. Wang, J. Zhu, P. Song, P. Hu, D. Hu, Z. Zhang, K. Hoshino, B. Feng, and G. Zheng, "Ptychographic sensor for large-scale lensless microbial monitoring with high spatiotemporal resolution," Biosensors and Bioelectronics 196, 113699 (2022).
37. M. G. Gustafsson, "Surpassing the lateral resolution limit by a factor of two using structured illumination microscopy," Journal of Microscopy 198, 82–87 (2000).
38. P. C. Konda, L. Loetgering, K. C. Zhou, S. Xu, A. R. Harvey, and R. Horstmeyer, "Fourier ptychography: current applications and future promises," Opt. Express 28, 9603–9630 (2020).
39. G. Zheng, C. Shen, S. Jiang, P. Song, and C. Yang, "Concept, implementations and applications of Fourier ptychography," Nature Reviews Physics 3, 207–223 (2021).
40. A. Pan, C. Zuo, and B. Yao, "High-resolution and large field-of-view Fourier ptychographic microscopy and its applications in biomedicine," Reports on Progress in Physics 83, 096101 (2020).
41. S. Dong, R. Horstmeyer, R. Shiradkar, K. Guo, X. Ou, Z. Bian, H. Xin, and G. Zheng, "Aperture-scanning Fourier ptychography for 3D refocusing and super-resolution macroscopic imaging," Opt. Express 22, 13586–13599 (2014).
42. J. Holloway, Y. Wu, M. K. Sharma, O. Cossairt, and A. Veeraraghavan, "SAVI: Synthetic apertures for long-range, subdiffraction-limited visible imaging using Fourier ptychography," Science Advances 3, e1602564 (2017).
43. K. Wakonig, A. Diaz, A. Bonnin, M. Stampanoni, A. Bergamaschi, J. Ihli, M. Guizar-Sicairos, and A. Menzel, "X-ray Fourier ptychography," Science Advances 5, eaav0282 (2019).
44. R. Horstmeyer, J. Chung, X. Ou, G. Zheng, and C. Yang, "Diffraction tomography with Fourier ptychography," Optica 3, 827–835 (2016).
45. C. Zuo, J. Sun, J. Li, A. Asundi, and Q. Chen, "Wide-field high-resolution 3D microscopy with Fourier ptychographic diffraction tomography," Optics and Lasers in Engineering 128, 106003 (2020).
46. G. Zheng, X. Ou, R. Horstmeyer, and C. Yang, "Characterization of spatially varying aberrations for wide field-of-view microscopy," Opt. Express 21, 15131–15143 (2013).
47. A. Maiden, D. Johnson, and P. Li, "Further improvements to the ptychographical iterative engine," Optica 4, 736–745 (2017).
48. X. Ou, R. Horstmeyer, C. Yang, and G. Zheng, "Quantitative phase imaging via Fourier ptychographic microscopy," Optics Letters 38, 4845–4848 (2013).
49. X. Ou, R. Horstmeyer, G. Zheng, and C. Yang, "High numerical aperture Fourier ptychography: principle, implementation and characterization," Opt. Express 23, 3472–3491 (2015).
50. J. Sun, C. Zuo, L. Zhang, and Q. Chen, "Resolution-enhanced Fourier ptychographic microscopy based on high-numerical-aperture illuminations," Scientific Reports 7, 1–11 (2017).
51. M. Liang and C. Yang, "Implementation of free-space Fourier Ptychography with near maximum system numerical aperture," Opt. Express 30, 20321–20332 (2022).
52. X. Ou, G. Zheng, and C. Yang, "Embedded pupil function recovery for Fourier ptychographic microscopy," Opt. Express 22, 4960–4972 (2014).
53. P. Song, S. Jiang, H. Zhang, X. Huang, Y. Zhang, and G. Zheng, "Full-field Fourier ptychography (FFP): Spatially varying pupil modeling and its application for rapid field-dependent aberration metrology," APL Photonics 4, 050802 (2019).
54. P. Li, D. J. Batey, T. B. Edo, and J. M. Rodenburg, "Separation of three-dimensional scattering effects in tilt-series Fourier ptychography," Ultramicroscopy 158, 1–7 (2015).
55. C. Zuo, J. Li, J. Sun, Y. Fan, J. Zhang, L. Lu, R. Zhang, B. Wang, L. Huang, and Q. Chen, "Transport of intensity equation: a tutorial," Optics and Lasers in Engineering, 106187 (2020).
56. J. Sun, Q. Chen, J. Zhang, Y. Fan, and C. Zuo, "Single-shot quantitative phase microscopy based on color-multiplexed Fourier ptychography," Optics Letters 43, 3365–3368 (2018).
57. J. Sun, C. Zuo, J. Zhang, Y. Fan, and Q. Chen, "High-speed Fourier ptychographic microscopy based on programmable annular illuminations," Scientific Reports 8, 1–12 (2018).
58. L. Tian, Z. Liu, L.-H. Yeh, M. Chen, J. Zhong, and L. Waller, "Computational illumination for high-speed in vitro Fourier ptychographic microscopy," Optica 2, 904–911 (2015).
59. M. Guizar-Sicairos, S. T. Thurman, and J. R. Fienup, "Efficient subpixel image registration algorithms," Optics Letters 33, 156–158 (2008).
60. P. Song, C. Guo, S. Jiang, T. Wang, P. Hu, D. Hu, Z. Zhang, B. Feng, and G. Zheng, "Optofluidic ptychography on a chip," Lab on a Chip 21, 4549–4556 (2021).
61. P. Song, S. Jiang, T. Wang, C. Guo, R. Wang, T. Zhang, and G. Zheng, "Synthetic aperture ptychography: coded sensor translation for joint spatial-Fourier bandwidth expansion," Photonics Research 10, 1624–1632 (2022).
62. J. Rodenburg and A. Maiden, "Ptychography," in Springer Handbook of Microscopy (Springer, 2019), pp. 819–904.
63. M. Guizar-Sicairos and P. Thibault, "Ptychography: A solution to the phase problem," Physics Today 74, 42–48 (2021).
64. F. Pfeiffer, "X-ray ptychography," Nature Photonics 12, 9–17 (2018).
65. L. Loetgering, S. Witte, and J. Rothhardt, "Advances in laboratory-scale ptychography using high harmonic sources," Opt. Express 30, 4133–4164 (2022).
Part III Coded Depth Imaging
12 Dipole-Spread Function Engineering for Six-Dimensional Super-Resolution Microscopy

Tingting Wu and Matthew D. Lew
Abstract
Fluorescent molecules are versatile nanoscale emitters that enable detailed observations of biophysical processes with nanoscale resolution. Because they are well-approximated as electric dipoles, imaging systems can be designed to visualize their 3D positions and 3D orientations, so-called dipole-spread function (DSF) engineering, for 6D super-resolution single-molecule orientation-localization microscopy (SMOLM). We review fundamental image-formation theory for fluorescent dipoles, as well as how phase and polarization modulation can be used to change the image of a dipole emitter produced by a microscope, called its DSF. We describe several methods for designing these modulations for optimum performance, as well as compare recently developed techniques, including the double-helix, tetrapod, crescent, and DeepSTORM3D learned point-spread functions (PSFs), in addition to the tri-spot, vortex, pixOL, raPol, CHIDO, and MVR DSFs. We also cover common imaging system designs and techniques for implementing engineered DSFs. Finally, we discuss recent biological applications of 6D SMOLM and future challenges for pushing the capabilities and utility of the technology.

Keywords

Single-molecule orientation-localization microscopy · Fluorescence · Molecular orientation · Dipole emission · Point-spread function · Phase mask · Polarization-sensitive imaging · Estimation theory · Optimal design · Lipid membranes · Amyloid aggregation · DNA structure · Actin filaments

T. Wu · M. D. Lew, Department of Electrical and Systems Engineering, Washington University, St. Louis, MO, USA. e-mail: [email protected]; [email protected]
12.1 Introduction
To break the Abbé diffraction limit, which prevents optical microscopy from resolving neighboring emitters closer than λ/2NA [1], where λ is the optical wavelength and NA represents the numerical aperture of the objective lens, singlemolecule localization microscopy (SMLM) utilizes fluorophores that blink over time. These molecules can be switched between dark and emissive states through transient binding of dyes to targets [2–4], photoactivation [5], photochemical switching [6], and a variety of other mechanisms [7, 8]. If the concentration of actively emitting fluorophores is sparse enough such that their images or point spread functions (PSFs) are well separated, then an appropriate SMLM image analysis algorithm or neural network can measure
each molecule's position with high precision [9]. These blinking events can be accumulated over time into a super-resolved reconstruction of the biological target of interest (Fig. 12.1). Utilizing the standard diffraction-limited PSF, the 2D positions of individual emitters can be measured with high precision (typically 15 nm with 1000 photons detected per localization and 10 background photons per pixel). It is well known that the standard PSF is poorly suited for localizing point-like emitters in 3D, and thus, many coded aperture methods have been developed for 3D SMLM [10]. Here, we consider fluorescent molecules as oscillating electric dipoles (Fig. 12.2a), and thus, their radiation patterns, which exhibit characteristic intensity and polarization distributions (Fig. 12.2b), contain information about their orientations with respect to the imaging system. These orientations can, for example, probe the chemical environment surrounding the fluorescent molecule and provide additional insights into various biochemical processes [11–14]. Perhaps unsurprisingly, the standard PSF is also suboptimal for measuring the orientations of fluorescent molecules [15, 16], so we review methods for measuring the 3D position and 3D orientation of fluorophores, thereby facilitating super-resolved single-molecule orientation-localization microscopy (SMOLM).

Fig. 12.1 Super-resolved fluorescence imaging via single-molecule localization microscopy (SMLM). (a) Diffraction-limited image of the target. (b) Single-molecule images where the PSFs of individual emitters are well-separated spatially over time. (c) Super-resolution image reconstructed by estimating the 2D positions of each emitter shown in (b). (Reprinted by permission from the American Chemical Society: Chemical Reviews, Three-Dimensional Localization of Single Molecules for Super-Resolution Imaging and Single-Particle Tracking, von Diezmann, L., et al., © 2017)
Examining the fluorescence collected from a single molecule at the back focal plane (BFP, and also called the pupil or Fourier plane) and the image plane of the microscope shows how imaging systems can be engineered for superior performance (Fig. 12.2d, e). Molecules with varying axial (z) positions (Fig. 12.2d) and orientations (Fig. 12.2e) clearly exhibit different intensity patterns at the BFP, but the intensity patterns in the image plane, called the molecule’s dipole spread function (DSF), are extremely similar. Thus, the standard DSF has poor sensitivity for measuring the 3D positions and 3D orientations of single fluorophores. To solve this problem, many techniques modulate the phase and/or polarization of the fluorescence distribution at the BFP to better encode this 6D information into the DSFs measured in the image plane. We note that many other techniques modulate the polarization and/or spatial distribution of the illumination laser to measure molecular orientation; these techniques are outside the scope of this chapter, but refer interested readers to other literature [13, 17–22].
12.2 Forming Images of Dipoles: The Dipole-Spread Function (DSF)

Fig. 12.2 Imaging dipole-like emitters. (a) Schematic of a standard microscope with an emitter located at a height h and an objective focused at a height −z. (b) The emission pattern (white lines) of an oscillating dipole. The dipole's emission probability (depicted as the distance separating the dipole and the purple surface) is proportional to sin²η, where η is the angle between the emission dipole moment and the direction of fluorescence emission. (c) Modeling the rotational "wobble" of a dipole within a cone oriented at [θ, φ] with solid angle Ω. (d) (i) The magnitude and (ii) phase of the x-polarized electric field at the back focal plane (BFP), and (iii) the unpolarized intensity (x and y polarizations summed together) at the image plane for dipoles oriented along μx. The three dipoles (left to right) are located at h = 0 nm, 200 nm, and 400 nm. (e) The magnitude of the (i) x-polarized and (ii) y-polarized electric field at the BFP, and (iii) the unpolarized intensity at the image plane for dipoles located at h = 200 nm. The three dipoles (left to right) have orientations [θ, φ, Ω] of [90°, 0°, 0 sr], [90°, 45°, 0 sr], and [0°, 0°, 0 sr]. For (d) and (e), the microscope is focused at z = −200 nm, and the refractive indices of the immersion medium of the objective and the target sample are equal (n = 1.518)
We model a fluorophore as a dipole-like emitter [23–25] with a mean orientation of [θ, φ] in
spherical coordinates, or equivalently a transition dipole moment μ = [μ_x, μ_y, μ_z] in Cartesian space, and a "wobble" solid angle Ω that characterizes its rotational diffusion [26] (Fig. 12.2c). Using vectorial Green's functions [27], the electric field distribution E_BFP at the BFP of the imaging system for a dipole with orientation μ can be calculated as
$$\mathbf{E}_{BFP}(\boldsymbol{\mu}) = \begin{bmatrix} \mathbf{E}^x_{BFP} \\ \mathbf{E}^y_{BFP} \end{bmatrix} = \begin{bmatrix} \mathbf{g}^x_x & \mathbf{g}^x_y & \mathbf{g}^x_z \\ \mathbf{g}^y_x & \mathbf{g}^y_y & \mathbf{g}^y_z \end{bmatrix}\begin{bmatrix} \mu_x \\ \mu_y \\ \mu_z \end{bmatrix} \in \mathbb{C}^{2\times U\times V}, \quad (12.1)$$

where the BFP is sampled using U × V discrete spatial grid points, $\mathbf{E}^q_{BFP} \in \mathbb{C}^{U\times V}$ are the x- and y-polarized electric fields, respectively, and $\mathbf{g}^q_i \in \mathbb{C}^{U\times V}$ is a so-called "basis field" observed at the BFP from a dipole with orientation μ_i. The superscript q represents the two orthogonal polarizations (x and y) of the electric field, which may be detected separately in a polarization-sensitive imaging system or simply summed incoherently in a standard epifluorescence microscope. Since the light travels mostly parallel to the z direction in the paraxial limit due to the magnification of the objective lens, the electric field does not have a z-polarized component. Note that the polarization components may be expressed equivalently on any basis, e.g., the radial and azimuthal directions [28]. For a microscope focused on a nominal focal plane (NFP) of −z (above the coverslip) and capturing fluorescence emitted by a dipole located at h (also above the coverslip) (Fig. 12.2a), the electric field at the BFP has an additional defocus phase modulation given by

$$\mathbf{E}_{BFP}(\boldsymbol{\mu}, z, h) = \begin{bmatrix} \mathbf{p}_f \odot \mathbf{E}^x_{BFP} \\ \mathbf{p}_f \odot \mathbf{E}^y_{BFP} \end{bmatrix} \in \mathbb{C}^{2\times U\times V}, \quad (12.2)$$

with

$$\mathbf{p}_f = \exp\!\left(j k_z z\sqrt{1-\left(u^2+v^2\right)}\right)\exp\!\left(j k_h h\sqrt{1-\frac{n_z^2}{n_h^2}\left(u^2+v^2\right)}\right) \in \mathbb{C}^{U\times V}, \quad (12.3)$$

where ⊙ is the element-wise multiplication operator, u, v ∈ ℝ^{U×V} are the locations of the U × V grid points within the BFP, $k_z = \frac{2\pi n_z}{\lambda}$ and $k_h = \frac{2\pi n_h}{\lambda}$ are the wavenumbers for light propagating in the immersion medium of the objective with a refractive index of n_z (typically n_z = 1.518) and in the sample with a refractive index of n_h (n_h = 1.33 for water), respectively, and $j = \sqrt{-1}$.

As shown in Eqs. 12.1 and 12.2, emitters with different orientations and axial positions have different electric field magnitude, phase, and polarization distributions at the BFP. Quantum estimation theory shows that measuring the 3D orientation and 3D position of a single molecule using this optical intensity at the BFP can achieve optimal measurement precision, namely, precision limited by the quantum Cramér-Rao bound [29–31]. However, the emission light from all fluorophores in the sample overlaps at the BFP; therefore, measuring molecular orientation at the BFP requires exciting and measuring one molecule at a time [32]. In contrast, well-separated PSFs at the image plane enable measuring the orientations and positions of multiple emitters simultaneously. The tube lens of the microscope performs a Fourier transform $\mathcal{F}$ on the electric field E_BFP at the BFP to produce the electric field E_img at the image plane as

$$\mathbf{E}_{img} = \begin{bmatrix} \mathbf{E}^x_{img} \\ \mathbf{E}^y_{img} \end{bmatrix} = \begin{bmatrix} \mathcal{F}\!\left(\mathbf{E}^x_{BFP}\right) \\ \mathcal{F}\!\left(\mathbf{E}^y_{BFP}\right) \end{bmatrix}. \quad (12.4)$$

The intensities I_BFP at the BFP and I_img at the image plane are the squared magnitudes of the electric fields and therefore are given by

$$\mathbf{I}_{BFP} = \mathbf{E}_{BFP}\,\mathbf{E}^*_{BFP} \quad \text{and} \quad (12.5)$$

$$\mathbf{I}_{img} = \mathbf{E}_{img}\,\mathbf{E}^*_{img}, \quad (12.6)$$

where * represents the complex conjugate operator. Extending Eqs. 12.5 and 12.6 using Eq. 12.1, the intensity distribution can be represented as a linear combination of six basis images weighted by the orientational second moments m as

$$\mathbf{I}_{BFP} = \left[\mathbf{B}^{BFP}_{xx}, \mathbf{B}^{BFP}_{yy}, \mathbf{B}^{BFP}_{zz}, \mathbf{B}^{BFP}_{xy}, \mathbf{B}^{BFP}_{xz}, \mathbf{B}^{BFP}_{yz}\right]\left[\left\langle\mu_x^2\right\rangle, \left\langle\mu_y^2\right\rangle, \left\langle\mu_z^2\right\rangle, \left\langle\mu_x\mu_y\right\rangle, \left\langle\mu_x\mu_z\right\rangle, \left\langle\mu_y\mu_z\right\rangle\right]^T = \mathbf{B}^{BFP}\left[m_{xx}, m_{yy}, m_{zz}, m_{xy}, m_{xz}, m_{yz}\right]^T = \mathbf{B}^{BFP}\mathbf{m}, \quad \text{and} \quad (12.7)$$
12 Dipole-Spread Function Engineering for Six-Dimensional Super-Resolution Microscopy
img img img img img I = B img xx , B yy , B zz , B xy , B xz , B yz 2 2 2 T μx , μy , μz , μx μy , 〈μx μz 〉 , μy μz
. img
= B BFP m, (12.8) where the basis matrices BBFP at the BFP and Bimg at the image plane are the system’s responses to each orientational moment (Fig. 12.3) and are defined as
x x ∗ p ⊙ g ⊙ g p BFP f f k l and (12.9) .B kl =
y
y ∗ pf ⊙ g k pf ⊙ g l
211
where Am ∈ RU × V and Pm ∈ RU × V represent amplitude and phase modulation at BFP coordinates (u, v), respectively. Notably, the modulation Am , and Pm may vary with position. Since the number of detected photons before photobleaching is extremely limited in SMLM, one may maximize the signal to noise ratio by constraining J to be a unitary transformation such that J =e
.
jγ
α −β ∗ , β α∗
(12.13)
where |α i |2 + |β i |2 = 1, γ i ∈ [0, 2π ], and the subscript i refers to the ith element of α, β ∈ CU × V ,
γ ∈ RU × V , and |·| represents the magnitude F p f ⊙ g xk F ∗ p f ⊙ g xl img . .B kl = of a complex number. We therefore obtain the y y F pf ⊙ g k F ∗ pf ⊙ g l constraints (12.10) 2 |A | . |A1 | = |A4 | = 1 − = 1 − |A3 |2 and 2 The orientational second moments mkl = 〈μk μl 〉 (12.14) are products of the first moments μ(t) timeaveraged over the exposure time e as .P 1 + P 4 = P 2 + P 3 = 2γ , (12.15) t=e .
〈μk μl 〉 =
μk (t)μl (t)dt,
(12.11)
t=0
where k and l represent the Cartesian components x, y, and z. Based on Eqs. 12.9 and 12.10, it is clear that any electric field modulation at the BFP will change system’s basis images BBFP at the BFP and Bimg at the image plane. Similar to the basis electric fields EBFP ∈ C2 × U × V , the basis matrices BBFP and Bimg contain both x- and ypolarized components (Fig. 12.3). Ding et al. and Zhang et al. have shown that directly imaging the two orthogonal polarization components separately and simultaneously enhance the precision of estimating molecular orientation [12, 28]. One may model phase and polarization modulation at the BFP using a modulation tensor J, given by J =
.
=
J1 J2 J3 J4
A1 ⊙ exp (j P 1 ) A2 ⊙ exp (j P 2 ) , A3 ⊙ exp (j P 3 ) A4 ⊙ exp (j P 4 ) (12.12)
where γ can be treated as a universal phase modulation that is independent of polarization. With modulation at the BFP, the x- and ypolarized electric fields and basis images become .
g xk,mod y g k,mod
y pf ⊙ J 1 ⊙ g xk + J 2 ⊙ g k = y pf ⊙ J 3 ⊙ g xk + J 4 ⊙ g k (12.16)
F g xk,mod F ∗ g xl,mod . = y y F g k,mod F ∗ g l,mod
img .B kl
(12.17)
Modulation at the BFP creates a shift-invariant DSF at the image plane. Recently, modulation outside of the Fourier plane has been explored. Due to the complex nature of shift-varying optical responses, such approaches require sophisticated algorithms to model light propagation and/or employ machine learning to design the appropriate modulation. Since these efforts are out of the scope of this review, we refer readers to reference [33].
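To make the forward model above concrete, the following sketch strings together the defocus phase (Eq. 12.3), a unitary phase-only modulation tensor (Eq. 12.13), the modulated basis fields and basis images (Eqs. 12.16–12.17), time-averaged second moments (Eq. 12.11), and the final image (Eq. 12.8). It is only a schematic illustration under assumed parameters: the basis fields g are random placeholders rather than the vectorial Green's-function fields of Eq. 12.1, the wavelength, numerical aperture, grid size, and vortex-style mask are arbitrary choices, and the cross-term (k ≠ l) basis images are simply combined through their real parts without tracking normalization conventions.

```python
import numpy as np

# Schematic forward model: defocus phase (Eq. 12.3), unitary phase-only modulation
# (Eq. 12.13), modulated basis fields (Eq. 12.16), basis images (Eq. 12.17), and a
# molecule's image as a moment-weighted sum of basis images (Eq. 12.8).

U = V = 128                                   # BFP grid (U x V); arbitrary choice
lam, n_z, n_h, NA = 0.61, 1.518, 1.33, 1.4    # assumed wavelength (um), indices, NA
k_z, k_h = 2*np.pi*n_z/lam, 2*np.pi*n_h/lam
u, v = np.meshgrid(np.linspace(-1, 1, U), np.linspace(-1, 1, V), indexing="ij")
rho2 = u**2 + v**2
pupil = rho2 <= (NA/n_z)**2                   # normalized pupil coords, u^2+v^2 = sin^2(theta)

def defocus_phase(z, h):
    """p_f of Eq. 12.3; evanescent regions are clipped to zero."""
    s_imm = np.sqrt(np.clip(1 - rho2, 0, None))
    s_sam = np.sqrt(np.clip(1 - (n_z**2/n_h**2)*rho2, 0, None))
    return pupil*np.exp(1j*k_z*z*s_imm)*np.exp(1j*k_h*h*s_sam)

# Placeholder basis fields g_i^q (q: x/y polarization, i: dipole axis). In practice
# these come from the vectorial Green's-function calculation behind Eq. 12.1.
rng = np.random.default_rng(1)
g = {q: {i: rng.standard_normal((U, V)) + 1j*rng.standard_normal((U, V)) for i in "xyz"}
     for q in "xy"}

# Unitary modulation tensor (Eq. 12.13) for a phase-only mask applied to both
# polarizations: alpha = 1, beta = 0, gamma = mask phase, so J1 = J4 = exp(j*gamma).
gamma = np.angle(u + 1j*v)                    # vortex-style phase, one possible mask
J1, J2, J3, J4 = np.exp(1j*gamma), 0.0, 0.0, np.exp(1j*gamma)

F = lambda E: np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(E)))   # tube-lens Fourier transform

def basis_images(z=-0.2, h=0.2):
    pf = defocus_phase(z, h)
    gmod = {i: (pf*(J1*g["x"][i] + J2*g["y"][i]),                  # Eq. 12.16, x channel
                pf*(J3*g["x"][i] + J4*g["y"][i])) for i in "xyz"}  # Eq. 12.16, y channel
    B = {}
    for k, l in [("x","x"), ("y","y"), ("z","z"), ("x","y"), ("x","z"), ("y","z")]:
        Bx = F(gmod[k][0])*np.conj(F(gmod[l][0]))                  # Eq. 12.17
        By = F(gmod[k][1])*np.conj(F(gmod[l][1]))
        B[k+l] = (Bx + By).real       # sum polarizations; real part for cross terms
    return B

# Orientational second moments by time-averaging a wobbling trajectory (Eq. 12.11):
# unit-vector samples mu(t) drawn around a mean direction (a crude wobble model).
mu_bar = np.array([1.0, 1.0, 1.0])/np.sqrt(3)
samples = mu_bar + 0.3*rng.standard_normal((10000, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
m = (samples[:, :, None]*samples[:, None, :]).mean(axis=0)         # 3x3 moment matrix

B = basis_images()
I_img = sum(m[i, j]*B["xyz"[i] + "xyz"[j]] for i, j in
            [(0, 0), (1, 1), (2, 2), (0, 1), (0, 2), (1, 2)])      # Eq. 12.8
```

In a real pipeline, the placeholder basis fields would be replaced by fields computed from the vectorial imaging model of Refs. [23–27], and the x- and y-polarized channels would be kept separate when a polarization-splitting detection path is used.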
Fig. 12.3 Basis images of x- and y-polarized microscopy. The (a) basis images at the back focal plane (BFP) are linked to the (b) basis images at the image plane through an optical Fourier transform $\mathcal{F}$ performed by the microscope tube lens on the corresponding electric fields. Top row (red): x-polarized light, bottom row (blue): y-polarized light. The electric field at the BFP can be modulated by phase, amplitude, and/or polarization masks; only the standard DSF (no modulation) is shown here. Scale bar: 500 nm

12.3 How to Engineer a DSF

For estimating the 3D positions and 3D orientations of dipole-like emitters, imaging systems need (1) to avoid degeneracy (Fig. 12.4a), i.e., molecules with different 3D orientations or 3D positions need to generate distinct images on the
camera; (2) to exhibit high precision, i.e., the dipole images must change dramatically if the molecule changes its orientation and/or position; and (3) have high detection sensitivity, i.e., images of dim dipoles must be easily detected above background noise. Early methods for measuring orientation focused on tackling the degeneracy problem; e.g.,
Fig. 12.4 Engineered PSFs for estimating the 3D positions of single molecules: (a) standard, (b) double-helix [34], (c) tetrapod [35, 36], (d) crescent [37], and (e) DeepSTORM3D [38] learned PSFs. (i) The phase ∠J1 of the modulation tensor J. For the five techniques, ∠J1 = ∠J4, |J1| = |J4| = 1, and |J2| = |J3| = 0. (ii) PSFs for isotropic
emitters (Ω = 2π) located at different axial positions h = {0,500,1000,1500,2000} nm. The microscope is focused at an NFP of z = − 1000 nm, and the emitters are immersed in water (n = 1.334). Images are normalized to the maximum intensity in each image. Scale bars: 500 nm
simply measuring linear dichroism, i.e., the intensity ratio between two orthogonally polarized detection channels, cannot distinguish between fixed dipoles oriented ±45◦ relative to the polarization axes. Defocusing the microscope to create ring-shaped DSFs enables sensitive measurements of molecular orientation [39], and recently, more advanced image models and algorithms enable orientation measurements from focused images of immobile molecules [40, 41]. Extending to 3D, the astigmatic [42] and double helix (Fig. 12.4b) [34] PSFs break the symmetry of imaging emitters below and above the focal plane to enable 3D
SMLM. On the other hand, emitters with different orientations show obvious differences in their intensity distributions at the BFP (Fig. 12.2e), and early designs divide the intensity at the BFP into multiple spots in the image plane for 3D orientation measurement, e.g., the bisected [43], quadrated pupil [44], and tri-spot (Fig. 12.5a) [45] DSFs. While the double-helix PSF was not explicitly designed for orientation measurements, it can be used to make accurate 6D SMOLM images (Fig. 12.4b) [46]. As improved iterative maximum likelihood estimators [47] and neural networks [38–49] for
Fig. 12.5 Engineered DSFs for measuring the 3D orientations of single molecules: phase modulation methods (a) tri-spot [45], (b) vortex [50], and (c) pixOL [51]; and combined polarization and phase modulation methods (d) radial and azimuthal polarization (raPol) [28], (e) CHIDO [52], and (f) multi-view reflector (MVR) [53]. (a–c) (i) Phase ∠J (rad) of the modulation tensor. For phase modulation, |J1| = |J4| = 1 and |J2| = |J3| = 0. (ii) Simulated DSFs for dipoles with different orientations. Four dipoles (left to right) have orientations [θ, φ, Ω] of [90◦, 0◦, 0 sr], [45◦, 45◦, 0 sr], [0◦, ∼, 0 sr], and [∼, ∼, 2π sr]. (d–f) (i) The magnitude |J| and (ii) phase ∠J of the modulation tensor J, and (iii) simulated DSFs as above. Red: x-polarized channel. Blue: y-polarized channel. Images are normalized to the maximum intensity of the first emitter ([θ, φ, Ω] = [90◦, 0◦, 0 sr]) for each method. Scale bars: 500 nm
estimating orientation became available, new PSF designs aimed for optimal precision, which typically results in PSFs with complex shapes. To
quantitatively compare how well these shapes encode orientation information, designers compute the Fisher information Kx of a parameter x
contained within an image I as

$$
K_x = \sum_{l=1}^{N} \frac{1}{I_l}\left(\frac{\partial I_l}{\partial x}\right)^{T} \frac{\partial I_l}{\partial x},
\tag{12.18}
$$

where l indexes the N pixels within an image of a single molecule corrupted by Poisson noise and neglecting camera readout noise [54]. The inverse of the Fisher information, called the Cramér-Rao (lower) bound (CRB) [54], quantifies the best-possible variance $\mathrm{Var}(\hat{x})$ of an unbiased estimator of parameter x, given by

$$
\mathrm{Var}\!\left(\hat{x}\right) \geq K_x^{-1}.
\tag{12.19}
$$
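The following minimal sketch evaluates Eqs. 12.18 and 12.19 numerically for a Poisson-noise image model. The toy Gaussian spot, pixel grid, background level, and finite-difference step are illustrative assumptions; in SMOLM one would instead differentiate the dipole image model of Eqs. 12.8 and 12.17 with respect to position and orientation (or use automatic differentiation).

```python
import numpy as np

# Sketch of Eqs. 12.18-12.19: Fisher information and CRB for an image corrupted by
# Poisson noise, using central finite differences of a forward model.

def fisher_information(model, theta, step=1e-4, background=2.0):
    theta = np.asarray(theta, dtype=float)
    I0 = model(theta).ravel() + background          # expected photons per pixel
    dI = np.zeros((theta.size, I0.size))
    for p in range(theta.size):                     # dI/dtheta_p by central differences
        tp, tm = theta.copy(), theta.copy()
        tp[p] += step; tm[p] -= step
        dI[p] = (model(tp) - model(tm)).ravel()/(2*step)
    return (dI/I0) @ dI.T                           # K_pq = sum_l (1/I_l) dI_l/dp dI_l/dq

def crb(model, theta, **kw):
    """Best-possible variances of an unbiased estimator (Eq. 12.19)."""
    return np.diag(np.linalg.inv(fisher_information(model, theta, **kw)))

# Toy example: 2D Gaussian spot with parameters (x0, y0, total signal photons).
yy, xx = np.mgrid[0:21, 0:21].astype(float)
def gaussian_spot(theta, sigma=1.5):
    x0, y0, s = theta
    return s*np.exp(-((xx - x0)**2 + (yy - y0)**2)/(2*sigma**2))/(2*np.pi*sigma**2)

print(np.sqrt(crb(gaussian_spot, [10.0, 10.0, 1000.0])))   # precision in pixels / photons
```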
By using the CRB as the loss function of an optimization algorithm, the tetrapod PSF (Fig. 12.4c) [35, 36] and pixOL DSF (Fig. 12.5c) [51] minimize the CRB for estimating emitters’ 3D positions and 3D orientations, respectively. One may also minimize the CRB of estimating distances between two emitters; this strategy produced the crescent PSF (Fig. 12.4d), a nearly optimal phase mask for estimating the 3D positions of closely spaced emitters [37]. A significant difficulty with this approach is optimizing degeneracy, precision, and detectability simultaneously, which often exhibit tradeoffs with one another. One natural solution is to design DSFs that directly maximize estimation performance on trial measurements using training data. In the context of 3D SMLM, Nehme et al. codesigned an estimation neural network and a phase mask to achieve high estimation precision and high detectability for images containing dense emitters [38] (Fig. 12.4e). Recently, a similar approach was used to design the arrowhead DSF [55]. For training purposes, the Jaccard index is typically used to quantify the success rate of the estimation algorithm. DSFs can also be optimized for specific imaging conditions, e.g., molecules lying in the xy-plane or out-of-plane, dense concentrations of blinking molecules, high fluorescence background, or samples that cause relatively large optical aberrations. The duo-spot DSF [11] and radially and azimuthally polarized (raPol)
DSF (Fig. 12.5d) [28] have been shown to have high performance for emitters tilted out of the coverslip plane. Many engineered PSFs have large footprints, especially those composed of multiple spots [44, 45]. For samples containing dense emitters, images of neighboring molecules may suffer from severe overlap on the camera and become difficult or impossible to resolve. The vortex [50, 56], pixOL [51], and CHIDO [52] DSFs all have small footprints, high precision, and high detectability (Fig. 12.5b, c, e). Other methods utilize polarization filters to accomplish similar goals, including 4polar-STORM [20] and POLCAM [22]. A key difficulty of measuring 3D positions and 3D orientations simultaneously is that the covariance between measurement parameters is often nonzero, leading to degraded estimation precision or, even worse, coupled biases in the estimates. To overcome this limitation, Zhang et al. designed a multi-view reflector microscope that separates fluorescence into radially and azimuthally polarized channels, and then further separates fluorescence at the pupil plane into four nonoverlapping regions on a camera [53] (Fig. 12.5f). In each channel, the DSFs are similar to that of a standard microscope, and each channel effectively views the sample from a slightly different direction and exhibits a smaller effective numerical aperture. A key feature of this design is that the position of a molecule, including its axial location, is encoded into the lateral position of the DSF in each channel. Orientation information, however, is contained within the relative brightness of the DSF in each channel, thereby decoupling position estimates from orientation estimates. The authors demonstrate robust, accurate, and precise 6D measurements of molecular position and orientation in the presence of refractive index mismatch.
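As a toy illustration of the CRB-as-loss design strategy described earlier in this section, the sketch below tunes a single astigmatism coefficient of a scalar, paraxial pupil model so as to minimize the CRB of the axial position z averaged over a few test depths. The pupil model, photon budget, and one-parameter mask are simplifications; actual designs such as the tetrapod and pixOL masks optimize full (vectorial) masks over ranges of 3D positions and orientations.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy CRB-driven PSF design: pick the astigmatism strength c that minimizes the
# CRB of the axial coordinate z (Eqs. 12.18-12.19) for a scalar pupil-plane model.

N = 64
x = np.linspace(-1, 1, N)
u, v = np.meshgrid(x, x, indexing="ij")
rho2 = u**2 + v**2
pupil = (rho2 <= 1.0).astype(float)
defocus = rho2              # paraxial defocus phase per unit z (arbitrary units)
astig = u**2 - v**2         # astigmatism term of the candidate phase mask

def psf(z, c, photons=2000.0, bg=5.0):
    field = pupil*np.exp(1j*(z*defocus + c*astig))
    I = np.abs(np.fft.fftshift(np.fft.fft2(field)))**2
    return photons*I/I.sum() + bg

def crb_z(c, z_test=(0.2, 0.6, 1.0), dz=1e-3):
    """Mean CRB of z over a few test depths for mask parameter c."""
    out = []
    for z in z_test:
        I = psf(z, c)
        dI = (psf(z + dz, c) - psf(z - dz, c))/(2*dz)
        out.append(1.0/np.sum(dI**2/I))      # scalar-parameter Eqs. 12.18-12.19
    return float(np.mean(out))

best = minimize_scalar(crb_z, bounds=(0.0, 6.0), method="bounded")
print("astigmatism strength:", best.x, "mean CRB_z:", best.fun)
```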
12.4 Implementing an Engineered DSF
DSF engineering is most commonly achieved by placing optical elements at the BFP, as this position provides convenient access to Fourier space
Fig. 12.6 Imaging systems for implementing DSF engineering. (a) Fluorescence is collected by an objective. An intermediate imaging plane (IIP) is formed after the tube lens. (b) To access the back focal plane (BFP), a pair of lenses (1 and 2) is used to form a 4f system and focus the fluorescence onto a camera. (c) Modulating light using a reflective liquid crystal (LC) spatial light modulator (SLM). The extended 4f system in (b) is reconfigured into (c) with the LC SLM placed at the BFP. A polarizing beam splitter (PBS) is added after the IIP to split the fluorescence into x- (red) and y-polarized (blue) channels. A pyramid mirror is used to reflect the light from two channels onto
the surface of the reflective SLM. The pyramid mirror rotates x- and y-polarized light such that they are approximately parallel at the SLM surface (insets (i) and (ii)). Two lenses (2 and 3) focus the light onto two non-overlapping regions of a single camera. (d) Modulating light using a transmissive mask. For modulating polarization, a variable wave plate (vaWP) can be used to correct retardance variations. A birefringent or phase mask is placed at the BFP to modulate the light. A PBS splits the fluorescence into x- and y-polarized channels and two lenses (2 and 3) focus the light onto two non-overlapping regions of a single camera
so that the light captured from each molecule can be manipulated identically and simultaneously. Single-molecule imaging typically uses an objective with a large numerical aperture (NA) to maximize the number of captured photons and improve imaging resolution. The physical pupil of many microscope objectives is typically located within the lens housing (Fig. 12.6a). Thus, to access the BFP, researchers use imaging relays [27], e.g., a pair of lenses forming a 4f system, to simultaneously form a conjugate BFP and a conjugate image plane (Fig. 12.6b). By adjusting the focal lengths of the two lenses, the physical size of the BFP can be adjusted to match the optical components used for DSF engineering. Spatial light modulators (SLMs) are widely employed to modulate the phase, intensity, and/or polarization of the optical field. They are electronically programmable to provide pixel-
level control, and two technologies are popular: micro-electromechanical systems (MEMS) deformable mirrors and liquid crystals (LCs). MEMS deformable mirrors use electromechanical actuators to deform a flexible mirror surface, thereby controlling phase locally near each actuator [57]. Alternatively, segmented MEMS mirrors feature multiple reflective surfaces whose tilts are controlled independently. Although MEMS deformable mirrors have fast modulation speeds (1–100 kHz) and are widely used in adaptive optics [58, 59], they have relatively few independent mirror segments (typically hundreds to thousands) that exhibit a moderate degree of crosstalk and are thus less suitable for creating large-magnitude or discontinuous optical phase masks. In contrast, LC SLMs generally feature many more independent pixels (10⁴–10⁶) but
suffer from smaller fill factors and slower response times than MEMS deformable mirrors. Above a certain threshold of the applied electric field, the long axes of the molecules within an LC pixel will align with the field. Since these molecules are optically birefringent, any systematic change in molecular alignment will change the effective refractive index and, thus, the phase of the LC pixel for a suitable incident polarization. Since LC materials are dispersive, the user must calibrate the optical phase response as a function of the applied electric field before using them for DSF engineering. In addition, most LC SLMs cannot modulate two orthogonal polarization states simultaneously, and one must filter out the unwanted polarization or build a specific optical system to rotate and align the optical polarizations with the SLM’s LCs (Fig. 12.6c). Other than using programmable phase modulators, one may manufacture static phase masks that have higher photon efficiencies than SLMs. These masks are often manufactured using photolithography [60], but recent developments in 3D printing [61, 62] have reduced the complexity of manufacturing high-quality masks. Doublehelix [34], tetrapod [35, 36], and vortex [56] phase masks are commercially available. Polarization optics, such as vortex half-waveplates, whose fast axes rotate systematically across the optic, have been useful for implementing radially and azimuthally polarized detection [28, 53] in SMOLM. More complex optical components, such as metasurface masks [63] and birefringent optical elements, can modulate the phases of the two orthogonal polarization states independently and simultaneously. The y-phi metasurface mask [64] uses a hexagonal array of elliptical nanoposts etched out of amorphous silicon to convert radially and azimuthally polarized light into x and y polarized light, respectively; Backlund et al. demonstrated this mask for removing dipoleinduced localization bias in SMLM. CHIDO [52] uses a stress-engineered BK7 glass window [65] to encode position and orientation information into left- and right-handed circularly polarized detection channels of a fluorescence microscope.
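As a small illustration of the SLM calibration step described above, the snippet below wraps a designed phase profile to [0, 2π) and converts it to 8-bit gray levels through a measured phase-versus-gray-level curve before display on an LC SLM. The calibration points and SLM resolution are invented placeholders rather than values for any particular device, and only the polarization aligned with the LC director is actually modulated unless a polarization-folding geometry such as that of Fig. 12.6c is used.

```python
import numpy as np

# Sketch: wrap a designed phase mask and map it through a (hypothetical) measured
# phase-vs-gray-level calibration curve for an LC SLM.

def wrap_phase(phi):
    return np.mod(phi, 2*np.pi)

# Hypothetical calibration: measured phase delay (rad) at a handful of 8-bit gray levels.
gray_cal = np.array([0, 32, 64, 96, 128, 160, 192, 224, 255])
phase_cal = np.array([0.0, 0.7, 1.5, 2.4, 3.2, 4.1, 5.0, 5.7, 6.28])   # placeholder values

def phase_to_gray(phi):
    """Invert the calibration curve (assumed monotonic) by interpolation."""
    return np.interp(wrap_phase(phi), phase_cal, gray_cal).astype(np.uint8)

# Example: a vortex phase profile (as used for the vortex DSF) sampled on the SLM grid.
H, W = 1152, 1920                       # assumed SLM resolution
y, x = np.mgrid[0:H, 0:W]
vortex = np.arctan2(y - H/2, x - W/2)   # azimuthal phase, -pi..pi
slm_image = phase_to_gray(vortex)       # 8-bit pattern to send to the SLM
```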
In fluorescence imaging, one must take care to compensate for polarization-dependent retardances stemming from any dichroic mirrors (DMs) in the optical path; a variable waveplate or birefringent polarization compensator can be used to restore the proper phase relationship between polarizations (Fig. 12.6d).
12.5 Applications of SMOLM Imaging
Three-dimensional SMLM reconstructs the 3D morphology of biological structures with nanoscale detail, but since it assumes fluorescent emitters are point sources, all vectorial information about molecular conformations and architecture is lost. DSF engineering inherently enhances imaging fidelity beyond standard 3D SMLM by enabling 6D imaging of 3D molecular orientations and 3D positions with nanoscale precision and accuracy. Here, we briefly review how the additional molecular orientation information provided by SMOLM yields rich biophysical insights beyond standard SMLM. Tracking a molecule’s position and orientation within soft matter is critical for understanding the intrinsically heterogeneous and complex interactions of its various components across length scales. Lu et al. measured the orientation of lipophilic probes within supported lipid bilayers (SLBs) with different chemical compositions [11]. SMOLM imaging showed that the orientational dynamics of the lipophilic probe Nile red (NR) are sensitive to cholesterol concentration; namely, as cholesterol increased, NR became increasingly parallel to the lipid molecules, and its rotational “wobble” decreased (Fig. 12.7a(i)). NR orientations could also be used to discern the degree of saturation of the lipid acyl chains (i.e., detecting the difference between DDPC, DOPC, and POPC) within the SLB (Fig. 12.7a(ii)). NR’s binding behavior within a spherical SLB has been used to experimentally demonstrate accurate and precise 3D orientation and 3D position imaging using various DSFs [22, 51, 53]. Zhang et al. track the
Fig. 12.7 Applications of SMOLM imaging. (a) Measuring the tilt θ and wobble Ω of the lipophilic probe Nile red (NR) transiently bound to supported lipid bilayers (SLBs) (i) with various concentrations of cholesterol and (ii) composed of lipids whose acyl chains exhibit different degrees of saturation. (b) The azimuthal orientations φ of NR transiently bound to amyloid fibrils. Inset: NR molecules are mostly aligned parallel to the long axis of each fibril. (i,ii,iii) All NR orientation measurements within the boxes shown in (b). The lines are oriented and color-coded according to the estimates of φ. (c) The orientations of NR transiently bound to spherical SLBs (350-nm radius) without amyloid beta (Aβ42) and after 7 days of incubation with Aβ42 monomers. (i) y-z and (ii) x-y views of the NR localizations at three axial slices through the SLB (z1 = 600, z2 = 400, z3 = 200 nm). (d) (i,ii) The orientation of SYTOX on λ-DNA strands, color-coded according to the mean azimuthal orientation φ of dye molecules measured within 30 nm voxels. (ii) inset: all SYTOX orientation measurements within the boxes shown in (ii). The lines are oriented according to the estimates of φ and color-coded with wobble angle Ω. (e) STORM images of actin filament organization in (i) control and (ii) blebbistatin-treated U2OS cells. (iii,iv) The orientation of Alexa Fluor 488 within the boxes
shown in (i)(ii). The lines are oriented and color-coded according to the estimates of φ. Only emitters with a small wobble angle are plotted here (refer to Ref. [20] for details). (Panel (a) is reprinted by permission from the John Wiley and Sons: Angewandte Chemie International Edition, Single-Molecule 3D Orientation Imaging Reveals Nanoscale Compositional Heterogeneity in Lipid Membranes, Lu, J., et al. [11], © 2020. Panel (b) is reprinted by permission from the Optica publication group: Optica, Single-molecule orientation localization microscopy for resolving structural heterogeneities between amyloid fibrils, Ding, T., et al. [12], © 2020. Panel (c) is reprinted by permission from the Springer Nature: Nature photonics, Six-dimensional single-molecule imaging with isotropic resolution using a multi-view reflector microscope, Zhang, O., et al. [53], © 2023. Panel (d) is reprinted by permission from the Optica publication group: Optica, Enhanced DNA imaging using super-resolution microscopy and simultaneous single-molecule orientation measurements, Backer, A. S., et al. [18], © 2016. Panel (e) is reprinted by permission from the Springer Nature: Nature communications, 4polar-STORM polarized superresolution imaging of actin filament organization in cells, Rimoli, C. V., et al. [20], © 2022)
positions and orientations of dyes in SLBs treated with cholesterol-loaded methyl-β-cyclodextrin (MβCD-chol) [28]. Before treatment, NR mostly exhibited small translational motions and large rotational movements. However, after MβCD-chol treatment, Nile red exhibited “jump diffusion” between regions of the intact membrane, i.e., translational motions were large but NR orientations were relatively fixed. Thus, SMOLM position and orientation information can complement one another to provide rich nanoscale detail about molecular interactions in biological systems. Amyloid aggregates are signatures of various neurodegenerative disorders. Shaban et al. [66] and Ding et al. [12] showed that amyloidophilic dyes bind to amyloid fibrils in specific configurations. Both thioflavin T and Nile red bind parallel to the grooves formed by crossed-beta sheets within amyloid-beta fibers, i.e., along their long axes (Fig. 12.7b). However, Ding and Lew [50] also observed remarkable heterogeneity in NR orientations for other amyloid aggregates, thus implying similar variations in their nanoscale organization. Both ordered, “fibril-like” oligomers and disordered, amorphous oligomers were measured, even though these particles were aggregated under identical conditions and imaged in the same field of view. Similarly, Zhang et al [53] observed the disruption of spherical SLBs by the aggregation of amyloid-beta. As the lipid membranes were infiltrated by amyloid-beta, NR exhibited increasingly varied orientations and wobble, which is in direct contrast to its well-ordered orientations and small wobble when in contact with amyloid fibers (Fig. 12.7c). Dye orientations have also been utilized to characterize DNA structure. Several groups [13, 18, 47, 56, 67] showed that the intercalating dyes SYTOX Orange and YOYO-1 bind perpendicular to the long axis of the DNA (Fig. 12.7d). In contrast, the minor-grove binding dye, SiRHoechst, shows a broad orientation distribution when bound to DNA. Backer et al. [13, 68] further used YOYO-1 to resolve the inclination of DNA base pairs within S-DNA as a function of DNA pulling geometry, torsional constraint, and negative supercoiling.
219
Furthermore, fluorophore orientations have been used to study the orientational order of actin stress fibers. Notably, the structure of each dye significantly affected how it was oriented relative to its parent fiber [67]; Alexa Fluor 488-phalloidin pointed along the fiber, Atto 633-phalloidin was oriented perpendicular to the fiber, and Alexa Fluor 647-phalloidin was relatively free to rotate and had no preferred direction. Rimoli et al. [20] thus used Alexa Fluor 488 to study actin stress fibers in U2OS and B16-F1 cells, noting that ventral fibers exhibit the highest alignment among the various types and that blebbistatin, a drug that inhibits myosin II activity, decreased this alignment slightly (Fig. 12.7e). Interestingly, lamellipodia SMOLM imaging suggests that actin filaments with preferred angular distributions and with more isotropic distributions in 3D coexist simultaneously.
12.6 Challenges and Future Opportunities
In this chapter, we described the theory, design, implementation, and applications of encoding information on the 3D positions and 3D orientations of single molecules into the images produced by a microscope, i.e., DSF engineering. These technologies have been demonstrated for in vitro studies of proteins and model biological systems, as well as cellular imaging. However, in vivo imaging of tissues and organisms requires additional development to design engineered DSFs that can cope with the scattering and autofluorescence within thick samples. One way to achieve this goal is to integrate engineered DSFs with advanced illumination strategies, e.g., light-sheet microscopy [69, 70]. Furthermore, since some engineered DSFs encode the 6D information into subtle changes in DSF shape (Fig. 12.5), optical aberrations will influence and likely degrade estimation precision and accuracy. In this context, some combination of adaptive optics [71–73] and reliable methods to calibrate both phase and polarization aberrations will be critical for robust SMOLM imaging [74, 75].
Decoding 6D information from DSFs in the presence of photon shot noise also requires further development in robust estimation algorithms that are unbiased, achieve precision close to the CRB, and are capable of processing data in realtime [22]. Machine learning will undoubtedly become more important in single-molecule image [38, 48, 49, 76] and data analysis [77, 78]. Further, most engineered DSFs require an additional 4f system and optical elements for modulating phase, polarization, or both. While these instruments can be built by trained optical engineers and microscopists, these designs are not suitable for the wider scientific community. Simpler, “plug and play” systems, such as using a polarization-sensitive camera [22] in place of a standard sensor, will aid in wider adoption of SMOLM techniques. The utility of SMOLM depends upon linking the positions and orientations of fluorescent molecules to the biophysical processes of interest. Thus, dye-biomolecule interactions must be carefully characterized [11] or engineered to ensure that 6D SMOLM imaging data are relevant. In transient-binding [2, 4, 79, 80] and DNA PAINT labeling schemes [81, 82], the rigidity and orientational configuration of the binding moiety determine how the dye’s orientational dynamics are coupled to the biomolecular target. For traditional covalent attachment of fluorophores to biomolecules, the structure and rigidity of the linker are critical [83, 84]. While multi-functional attachment, i.e., linking the dye and biomolecule at more than one location, has been extraordinarily useful in a variety of single-molecule studies [85, 86], more work is needed to make these schemes more facile and robust for a wider variety of biomolecular targets. This chapter reviews DSF engineering for the specific use of 6D super-resolution microscopy. However, one can imagine controlling more degrees of freedom for each photon in the imaging system. For example, the wavelength of each photon can be encoded into the PSF to enable multi-color imaging using a grayscale camera [87]; ambient photons may be manipulated for object detection and depth estimation in computer vision [88]; and the wavelength and phase of each
photon may be modulated such that an imaging system is optimized for measuring the separation between two objects, rather than their absolute positions [89]. We eagerly anticipate the new capabilities and discoveries that are enabled by higher dimensional control and imaging in nextgeneration coded optical imaging systems.
References 1. Lauterbach, M. A. Finding, defining and breaking the diffraction barrier in microscopy–a historical perspective. Optical Nanoscopy 1, 1–8 (2012). 2. Sharonov, A. & Hochstrasser, R. M. Wide-field subdiffraction imaging by accumulated binding of diffusing probes. Proceedings of the National Academy of Sciences 103, 18911–18916 (2006). 3. Jungmann, R. et al. Single-Molecule Kinetics and Super-Resolution Microscopy by Fluorescence Imaging of Transient Binding on DNA Origami. Nano Lett 10, 4756–4761 (2010). 4. Schoen, I., Ries, J., Klotzsch, E., Ewers, H. & Vogel, V. Binding-activated localization microscopy of DNA l. Nano Lett 11, 4008–4011 (2011). 5. Betzig, E. et al. Imaging intracellular fluorescent proteins at nanometer resolution. 313, 1642–1645 (2006). 6. Rust, M. J., Bates, M. & Zhuang, X. Sub-diffractionlimit imaging by stochastic optical reconstruction microscopy (STORM). Nat Methods 3, 793–796 (2006). 7. Li, H. & Vaughan, J. C. Switchable Fluorophores for Single-Molecule Localization Microscopy. Chem Rev 118, 9412–9454 (2018). 8. Jradi, F. M. & Lavis, L. D. Chemistry of Photosensitive Fluorophores for Single-Molecule Localization Microscopy. ACS Chem Biol 14, 1077–1090 (2019). 9. Sage, D. et al. Super-resolution fight club: assessment of 2D and 3D single-molecule localization microscopy software. Nat Methods 16, 387–395 (2019). 10. von Diezmann, L., Shechtman, Y. & Moerner, W. E. Three-Dimensional Localization of Single Molecules for Super-Resolution Imaging and Single-Particle Tracking. Chem Rev 117, 7244–7275 (2017). 11. Lu, J., Mazidi, H., Ding, T., Zhang, O. & Lew, M. D. Single-Molecule 3D Orientation Imaging Reveals Nanoscale Compositional Heterogeneity in Lipid Membranes. Angewandte Chemie International Edition 59, 17572–17579 (2020). 12. Ding, T., Wu, T., Mazidi, H., Zhang, O. & Lew, M. D. Single-molecule orientation localization microscopy for resolving structural heterogeneities between amyloid fibrils. Optica 7, 602 (2020). 13. Backer, A. S. et al. Single-molecule polarization microscopy of DNA intercalators sheds light on the structure of S-DNA. Sci Adv 5, eaav1083 (2019).
14. Beausang, J. F., Shroder, D. Y., Nelson, P. C. & Goldman, Y. E. Tilting and wobble of Myosin v by high-speed single-molecule polarized fluorescence microscopy. Biophys J 104, 1263–1273 (2013). 15. Agrawal, A., Quirin, S., Grover, G. & Piestun, R. Limits of 3D dipole localization and orientation estimation for single-molecule imaging: towards Green’s tensor engineering. Opt Express 20, 26667 (2012). 16. Zhang, O. & Lew, M. D. Single-molecule orientation localization microscopy II: a performance comparison. Journal of the Optical Society of America A 38, 288 (2021). 17. Backlund, M. P., Lew, M. D., Backer, A. S., Sahl, S. J. & Moerner, W. E. The role of molecular dipole orientation in single-molecule fluorescence microscopy and implications for super-resolution imaging. ChemPhysChem 15, 587–599 (2014). 18. Backer, A. S., Lee, M. Y. & Moerner, W. E. Enhanced DNA imaging using super-resolution microscopy and simultaneous single-molecule orientation measurements. Optica 3, 659 (2016). 19. Blanchard, A. T., Brockman, J. M., Salaita, K. & Mattheyses, A. L. Variable incidence angle linear dichroism (VALiD): a technique for unique 3D orientation measurement of fluorescent ensembles. Opt Express 28, 10039 (2020). 20. Rimoli, C. V., Valades-Cruz, C. A., Curcio, V., Mavrakis, M. & Brasselet, S. 4polar-STORM polarized super-resolution imaging of actin filament organization in cells. Nat Commun 13, (2022). 21. Thorsen, R. Ø., Hulleman, C. N., Rieger, B. & Stallinga, S. Photon efficient orientation estimation using polarization modulation in single-molecule localization microscopy. Biomed Opt Express 13, 2835 (2022). 22. Bruggeman, E. et al. POLCAM: Instant molecular orientation microscopy for the life sciences. bioRxiv (2023) https://doi.org/10.1101/2023.02.07.527479. 23. Novotny, L. & Hecht, B. Principles of Nano-Optics (Cambridge University Press, 2006) 24. Chandler, T., Shroff, H., Oldenbourg, R. & Rivière, P. L. Spatio-angular fluorescence microscopy I. Basic theory. J. Opt. Soc. Am. A 36, 1334–1345 (2019). 25. Chandler, T., Shroff, H., Oldenbourg, R. & La Rivière, P. Spatio-angular fluorescence microscopy II. Paraxial 4f imaging. J. Opt. Soc. Am. A 36, 1346–1360 (2019). 26. Stallinga, S. Effect of rotational diffusion in an orientational potential well on the point spread function of electric dipole emitters. Journal of the Optical Society of America A 32, 213 (2015). 27. Backer, A. S. & Moerner, W. E. Extending single-molecule microscopy using optical fourier processing. J Phys Chem B 118, 8313–8329 (2014). 28. Zhang, O., Zhou, W., Lu, J., Wu, T. & Lew, M. D. Resolving the three-dimensional rotational and translational dynamics of single molecules using radially and azimuthally polarized fluorescence. Nano Lett 22, 1024–1031 (2022).
29. Zhang, O. & Lew, M. D. Quantum limits for precisely estimating the orientation and wobble of dipole emitters. Phys Rev Res 2, 33114 (2020). 30. Zhang, O. & Lew, M. D. Single-molecule orientation localization microscopy I: fundamental limits. Journal of the Optical Society of America A 38, 277–287 (2021). 31. Backlund, M. P., Shechtman, Y. & Walsworth, R. L. Fundamental precision bounds for three-dimensional optical localization microscopy with poisson statistics. Phys Rev Lett 121, 023904 (2018). 32. Andreas Lieb, M., Zavislan, J. M., & Novotny, L. Single-molecule orientations determined by direct emission pattern imaging. J. Opt. Soc. Am. B, 21(6), 1210 (2004). 33. Ferdman, B., Saguy, A., Xiao, D. & Shechtman, Y. Diffractive optical system design by cascaded propagation. Opt Express 30, 27509 (2022). 34. Pavani, S. R. P. et al. Three-dimensional, singlemolecule fluorescence imaging beyond the diffraction limit by using a double-helix point spread function. Proceedings of the National Academy of Sciences 106, 2995–2999 (2009). 35. Shechtman, Y., Sahl, S. J., Backer, A. S. & Moerner, W. E. Optimal point spread function design for 3D imaging. Phys Rev Lett 113, 133902 (2014). 36. Shechtman, Y., Weiss, L. E., Backer, A. S., Sahl, S. J. & Moerner, W. E. Precise three-dimensional scanfree multiple-particle tracking over large axial ranges with tetrapod point spread functions. Nano Lett 15, 4194–4199 (2015). 37. Jusuf, J. M. & Lew, M. D. Towards optimal point spread function design for resolving closely spaced emitters in three dimensions. Opt Express 30, 37154 (2022). 38. Nehme, E. et al. DeepSTORM3D: dense 3D localization microscopy and PSF design by deep learning. Nat Methods 17, 734–740 (2020). 39. Böhmer, M. & Enderlein, J. Orientation imaging of single molecules by wide-field epifluorescence microscopy. Journal of the Optical Society of America B 20, 554 (2003). 40. Mortensen, K. I., Churchman, L. S., Spudich, J. A. & Flyvbjerg, H. Optimized localization analysis for single-molecule tracking and super-resolution microscopy. Nat Methods 7, 377–81 (2010). 41. Mortensen, K. I., Sung, J., Flyvbjerg, H. & Spudich, J. A. Optimized measurements of separations and angles between intra-molecular fluorescent markers. Nat Commun 6, 8621 (2015). 42. Huang, B., Wang, W., Bates, M. & Zhuang, X. Threedimensional super-resolution imaging by stochastic optical reconstruction microscopy. Science (1979) 319, 810–813 (2008). 43. Backer, A. S., Backlund, M. P., von Diezmann, L., Sahl, S. J. & Moerner, W. E. A bisected pupil for studying single-molecule orientational dynamics and its application to three-dimensional super-resolution microscopy. Appl Phys Lett 104, 193701 (2014).
44. Backer, A. S., Backlund, M. P., Lew, M. D. & Moerner, W. E. Single-molecule orientation measurements with a quadrated pupil. Opt Lett 38, 1521 (2013). 45. Zhang, O., Lu, J., Ding, T. & Lew, M. D. Imaging the three-dimensional orientation and rotational mobility of fluorescent emitters using the Tri-spot point spread function. Appl Phys Lett 113, 031103 (2018). 46. Backlund, M. P. et al. Simultaneous, accurate measurement of the 3D position and orientation of single molecules. Proceedings of the National Academy of Sciences 109, 19087–19092 (2012). 47. Mazidi, H., King, E. S., Zhang, O., Nehorai, A. & Lew, M. D. Dense super-resolution imaging of molecular orientation via joint sparse basis deconvolution and spatial pooling. in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019) vols 2019-April 325–329 (IEEE, 2019). 48. Wu, T., Lu, P., Rahman, M. A., Li, X. & Lew, M. D. Deep-SMOLM: deep learning resolves the 3D orientations and 2D positions of overlapping single molecules with optimal nanoscale resolution. Opt Express 30, 36761 (2022). 49. Nehme, E., Weiss, L. E., Michaeli, T. & Shechtman, Y. Deep-STORM: super-resolution single-molecule microscopy by deep learning. Optica 5, 458 (2018). 50. Ding, T. & Lew, M. D. Single-molecule localization microscopy of 3D orientation and anisotropic wobble using a polarized vortex point spread function. J Phys Chem B 125, 12718–12729 (2021). 51. Wu, T., Lu, J. & Lew, M. D. Dipole-spread-function engineering for simultaneously measuring the 3D orientations and 3D positions of fluorescent molecules. Optica 9, 505 (2022). 52. Curcio, V., Alemán-Castañeda, L. A., Brown, T. G., Brasselet, S. & Alonso, M. A. Birefringent Fourier filtering for single molecule coordinate and height super-resolution imaging with dithering and orientation. Nat Commun 11, 5307 (2020). 53. Zhang, O. et al. Six-dimensional single-molecule imaging with isotropic resolution using a multi-view reflector microscope. Nat Photonics 17, 179–186 (2023). 54. Chao, J., Sally Ward, E. & Ober, R. J. Fisher information theory for parameter estimation in single molecule microscopy: tutorial. Journal of the Optical Society of America A 33, B36 (2016). 55. Jouchet, P., Roy, A. R., & Moerner, W. E. Combining Deep Learning Approaches and Point Spread Function Engineering for Simultaneous 3D Position and 3D Orientation Measurements of Fluorescent Single Molecules. Optics Communications. 542, 129589 (2023). 56. Hulleman, C. N. et al. Simultaneous orientation and 3D localization microscopy with a Vortex point spread function. Nat Commun 12, 5934 (2021). 57. Bifano, T. MEMS deformable mirrors. Nat Photonics 5, 21–23 (2011).
58. Yoon, S. et al. Deep optical imaging within complex scattering media. Nature Reviews Physics 2, 141–158 (2020). 59. Wan, Y., McDole, K. & Keller, P. J. Light-sheet microscopy and its potential for understanding developmental processes. Annu Rev Cell Dev Biol 35, 655–681 (2019). 60. Gahlmann, A. et al. Quantitative multicolor subdiffraction imaging of bacterial protein ultrastructures in three dimensions. Nano Lett 13, 987–993 (2013). 61. Boominathan, V., Adams, J. K., Robinson, J. T. & Veeraraghavan, A. PhlatCam: designed phase-mask based thin lensless camera. IEEE Trans Pattern Anal Mach Intell 42, 1618–1629 (2020). 62. Orange-Kedem, R. et al. 3D printable diffractive optical elements by liquid immersion. Nat Commun 12, 3067 (2021). 63. Arbabi, A., Horie, Y., Bagheri, M. & Faraon, A. Dielectric metasurfaces for complete control of phase and polarization with subwavelength spatial resolution and high transmission. Nat Nanotechnol 10, 937–943 (2015). 64. Backlund, M. P. et al. Removing orientation-induced localization biases in single-molecule microscopy using a broadband metasurface mask. Nat Photonics 10, 459–462 (2016). 65. Ramkhalawon, R. D., Brown, T. G. & Alonso, M. A. Imaging the polarization of a light field. Opt Express 21, 4106 (2013). 66. Shaban, H. A., Valades-Cruz, C. A., Savatier, J. & Brasselet, S. Polarized super-resolution structural imaging inside amyloid fibrils using Thioflavine T. Sci Rep 7, 1–10 (2017). 67. Valades Cruz, C. A. et al. Quantitative nanoscale imaging of orientational order in biological filaments by polarized superresolution microscopy. Proceedings of the National Academy of Sciences 113, E820–E828 (2016). 68. Backer, A. S. et al. Elucidating the role of topological constraint on the structure of overstretched DNA using fluorescence polarization microscopy. J Phys Chem B 125, 8351–8361 (2021). 69. Gustavsson, A.-K., Petrov, P. N. & Moerner, W. E. Light sheet approaches for improved precision in 3D localization-based super-resolution imaging in mammalian cells [Invited]. Opt Express 26, 13122 (2018). 70. Wan, Y., Mcdole, K. & Keller, P. J. Light-sheet microscopy and its potential for understanding developmental processes. Annu Rev Cell Dev Biol. 35, 655–681 (2019). 71. Hampson, K. M. et al. Adaptive optics for highresolution imaging. Nature Reviews Methods Primers 1, 68 (2021). 72. Siemons, M. E., Hanemaaijer, N. A. K., Kole, M. H. P. & Kapitein, L. C. Robust adaptive optics for localization microscopy deep in complex tissue. Nat Commun 12, (2021).
73. Xu, F. et al. Three-dimensional nanoscopy of whole cells and tissues with in situ point spread function retrieval. Nat Methods 17, 531–540 (2020). 74. Ferdman, B. et al. VIPR: vectorial implementation of phase retrieval for fast and accurate microscopic pixel-wise pupil estimation. Opt Express 28, 10179 (2020). 75. Alemán-Castañeda, L. A. et al. Using fluorescent beads to emulate single fluorophores. Journal of the Optical Society of America A 39, C167 (2022). 76. Speiser, A. et al. Deep learning enables fast and dense single-molecule localization with high accuracy. Nat Methods 18, 1082–1090 (2021). 77. Granik, N. et al. Single-particle diffusion characterization by deep learning. Biophys J 117, 185–192 (2019). 78. Khater, I. M., Nabi, I. R. & Hamarneh, G. A review of super-resolution single-molecule localization microscopy cluster analysis and quantification methods. Patterns 1, 100038 (2020). 79. Ries, J. et al. Superresolution imaging of amyloid fibrils with binding-activated probes. ACS Chem Neurosci 4, 1057–61 (2013). 80. Spehar, K. et al. Super-resolution imaging of amyloid structures over extended times by using transient binding of single thioflavin T molecules. ChemBioChem 19, 1944–1948 (2018). 81. Chung, K. K. H. et al. Fluorogenic DNA-PAINT for faster, low-background super-resolution imaging. Nat Methods 19, 554–559 (2022).
82. van Wee, R., Filius, M. & Joo, C. Completing the canvas: advances and challenges for DNA-PAINT superresolution imaging. Trends Biochem Sci 46, 918–930 (2021). 83. Ham, T. R., Collins, K. L. & Hoffman, B. D. Molecular tension sensors: moving beyond force. Curr Opin Biomed Eng 12, 83–94 (2019). 84. Gräwe, A. & Stein, V. Linker engineering in the context of synthetic protein switches and sensors. Trends Biotechnol 39, 731–744 (2021). 85. Corrie, J. E. T., Craik, J. S. & Munasinghe, V. R. N. A homobifunctional rhodamine for labeling proteins with defined orientations of a fluorophore. Bioconjug Chem 9, 160–167 (1998). 86. Griffin, B. A., Adams, S. R. & Tsien, R. Y. Specific covalent labeling of recombinant protein molecules inside live cells. Science 281, 269–272 (1998). 87. Hershko, E., Weiss, L. E., Michaeli, T. & Shechtman, Y. Multicolor localization microscopy and pointspread-function engineering by deep learning. Opt Express 27, 6158 (2019). 88. Chang, J. & Wetzstein, G. Deep optics for monocular depth estimation and 3D object detection. Proceedings of the IEEE/CVF International Conference on Computer Vision 10193–10202 (2019). 89. Goldenberg, O., Ferdman, B., Nehme, E., Ezra, Y. S. & Shechtman, Y. Learning optimal multicolor PSF design for 3D pairwise distance estimation. Intelligent Computing 2022, (2022).
13 Three-Dimensional Imaging Using Coded Aperture Correlation Holography (COACH)
Joseph Rosen, Nathaniel Hai, and Angika Bulbul
Abstract
Digital holography has long been known for its beneficial capabilities in optical imaging, especially as a three-dimensional and phase imaging enabler. In the modern era, when electro-optical and computational resources are almost endless, introducing coded masks for practical beam modulation is an important milestone in the field. This chapter surveys the evolution of correlation holography from the early stage of Fresnel incoherent systems to mature, general, coded aperture systems. By surveying various application-driven designs of coded mask-aided imaging systems, we highlight the significance of modularity in this ongoing research. The ability to adapt the presented methodology to solve current challenges, a capacity granted by the coded masks, is shown to impact the broader field of optical imaging. As a case study, we examine the combination of two coded correlation holographic modalities to address a well-known bottleneck in these depth-resolving imaging systems. Violation of the Lagrange invariant in the early versions is successfully maintained in the later, chaotic mask implementation to provide an uncompromised super-resolution technique. Hopefully, this and other implementations
mentioned in a nutshell within this chapter provide concrete evidence that coded mask systems are a key player in the landscape of digital holography and essential for further development of the field.

Keywords
Holography · Digital holography · Incoherent holography · Imaging systems · Computer holography · Coded aperture · Diffraction and gratings · Diffractive optics · Optical microscopy · Fresnel incoherent correlation holography · Digital holographic microscopy · Phase-shifting interferometry · Computational imaging
13.1 Introduction
An appropriate introduction to the topic of holography by coded apertures is to put the methods described in the chapter in the right historical context. Coded aperture correlation holography (COACH), mentioned in the chapter title, was proposed as a new technique of incoherent digital holography [1], and hence, we open this chapter with a brief history of holography [2], digital holography [3], and incoherent holography [4] (Fig. 13.1). Because of space limitations, we briefly mention only the events in the history of holography that are relevant to the topic of COACH. Since many holograms in the past and
Fig. 13.1 Scheme of holography history as described in the text. The blue arrows indicate the flow and influence of the various ideas
today have been recorded as the result of interference between two light waves, wave interference is the natural starting point. The phenomenon of optical two-wave interference has been well
known since the first decade of the nineteenth century when Thomas Young published his famous double-slit experiment [5]. In Young’s experiment, interference occurs between two waves
227
atnieks in 1962 [8]. The recording configuration of this hologram is characterized by a nonzero angle between the image and reference beams, and consequently, the twin-image problem of the Gabor hologram has been solved. The twin-image problem is the inability to extract the desired component representing the required image out of four components recorded on the raw hologram [9]. Because the twin-image problem is no longer a problem, the image of the observed object can be reconstructed from the off-axis hologram by illuminating it with a reference beam, and this image can be viewed clearly without interruptions by other light waves. The off-axis hologram is not a self-reference one, and by that aspect, it also differs from the Gabor hologram. In the aspect of spatial coherence of the illumination, the off-axis hologram is similar to Gabor’s – they are both considered spatially coherent holograms. The transition from the Gabor hologram to the off-axis hologram was easier with the invention of the laser with its relatively high temporal coherence since the optical path difference between the object and reference beams is not restricted as it is in the Gabor hologram. Incoherent holograms have appeared since the mid-sixties, and all of them were based on different implementations of the self-interference principle [10–18]. The self-interference principle means that the light from each object point splits into two waves modulated differently before creating an interference pattern on the recording plane. According to this definition, a self-interference hologram is also a selfreference hologram because both interfering beams come from the same object. However, unlike self-interference, in a self-reference hologram, the reference beam does not contain image information. Under the self-interference principle, Bryngdahl and Lohmann suggested sorting interferometers for recording incoherent holograms into two types [15]. The first is radial shear, in which the observed image is replicated into two replications with two different scales. The other type is rotational shear, in which the observed image is also replicated into two versions, but in this case, one replication is rotated by some angle relative to the other
228
replication. The entire holograms recorded using the self-interference principle are the stage in the evolutionary chain of holography in which both interfering waves carry the object’s image. As discussed in the next section, this new stage has practical meaning; under certain conditions, the self-interference principle leads to the violation of the Lagrange invariant [5], leading to better image resolution. The next significant event in hologram history occurred in 1967 with the invention of the digital hologram by Goodman and Lawrence [19]. Digital holography is an indirect imaging technique where holograms are first acquired using a digital camera, and then the image is reconstructed digitally by a computational algorithm [1, 3]. Thus, digital holography is a two-step process that has some advantages over regular digital imaging. For example, a hologram can contain depth information of three-dimensional (3D) objects utilizing phase information encoded in the interference patterns between an object and the reference beams [1, 3]. Other useful information recorded on a hologram might be the wavefront shape of the wave passing through the object, enabling quantitative phase imaging (QPI) [20]. The first digital hologram was coherent and recorded on a digital camera by an off-axis setup [19]. Another notable difference between this new digital hologram and those mentioned above is the transformation between the complex amplitudes on the object and the hologram planes. The two-dimensional (2D) Fourier transform was the transformation from the object to the hologram planes in the case of the Goodman-Lawrence hologram, thus indicating the type of the hologram as a Fourier hologram. An optical (nondigital) Fourier hologram was proposed a few years before by Vander Lugt [21]. In 1997, Yamaguchi and Zhang recorded on-axis digital holograms in which the twinimage problem was solved by recording four different holograms of the coherently illuminated object and processing them in the computer in a procedure called phase shifting [22]. The transformation between the object and camera planes in the Yamaguchi-Zhang system follows Fresnel free-space propagation, and hence,
J. Rosen et al.
this digital hologram is considered a Fresnel hologram [6, 8, 23]. In the field of incoherent digital holography, technology evolved to unexpected solutions. The minimal number of camera shots, one in the Goodman-Lawrence hologram [19] and four in the Yamaguchi-Zhang technique [22], was replaced by scanning techniques that do not make use of the self-interference principle. Under scanning techniques, there are two main methods of recording incoherent digital holograms of a general 3D scene. The more well-known method has been optical scanning holography [24–26], in which the 3D object is scanned by an interference pattern between two spherical waves, and the reflected light is summed into a point detector. In optical scanning holography, the wave interference is between two spherical waves, neither of which carries any image. Moreover, the interference pattern is not recorded but is used as a detector of the object points’ depth. The other scanning technique was implemented without wave interference and has been a more computer-aided method in which the hologram is generated from multiple view projections of the 3D scene [27–29]. Both methods are based on different processes of time-consuming scanning of the observed scene to yield a 2D correlation between an object and a 2D quadratic phase function. The next landmark has shown that the required 2D correlation in Refs. [24–29] can be performed without scanning. Fresnel incoherent correlation holography (FINCH), published in 2007 [30], was a return to the principle of selfinterference and was proposed as an alternative to the scanning-based holography methods mentioned above. Following the first FINCH, many other incoherent digital holograms have been proposed, and most of them are based on the self-interference principle [31–59]. Fourier incoherent single-channel holography (FISCH) [35] is a typical example of using the selfinterference principle, but the obtained hologram, in this case, is a 2D cosine Fourier transform of the object. As mentioned above, in the entire holograms recorded using the self-interference principle, both interfering waves carry the
object’s image. However, the image information is never the same in both interferometer channels. In FINCH, the images are in-focus at different distances from the aperture, while an infinite distance is also legitimate. In FISCH, one image is rotated by 180◦ around the origin of the image plane relative to the other image. In terms of the Bryngdahl-Lohmann analysis, FINCH is radial shear, and FISCH is rotational shear. An exceptional example of an incoherent digital hologram based on the self-reference rather than the self-interference principle was proposed by Pedrini et al. [60], but the energetic inefficiency of this hologram recorder has probably prevented further developments in this direction. COACH [61–63] is a new evolutionary stage in which one of the two replicated objects’ images passes through a coded scattering mask, resulting in the camera plane being a convolution of the image with some chaotic function. The other image is in focus at an infinite distance from the aperture. According to the classifications mentioned above, COACH is radial shear and belongs to incoherent self-interference onaxis digital holography. A significant difference between COACH and Fresnel holograms is in the image reconstruction process. In the Fresnel case, the image at a distance of z is reconstructed by a correlation between the hologram and a quadratic phase function parameterized with z. On the other hand, in COACH, the 3D image is reconstructed by a correlation between the hologram and a library of point responses acquired in the system calibration. From the COACH stage, the technology surprisingly evolved to a system without twowave interference following the discovery that 3D holographic imaging could be achieved with a single beam configuration. The interferenceless COACH (I-COACH) [64] has been found to be simpler and more efficient than the COACH with two-wave interference. I-COACH is considered a digital hologram because the digital matrix obtained from the observed scene contains the scene’s 3D information, and the 3D image is reconstructed from the digital matrix in a similar way as done with interference-based digital holograms. Although there is no two-wave interference in I-COACH, it is classified as on-axis
digital holography because the recording setup contains components that are all arranged along a single longitudinal axis. Using the interferenceless version of COACH has enabled adapting concepts from X-ray coded aperture imaging [65, 66], in which the observed image is replicated over a finite number of randomly distributed points. In other words, the point response of the system has been modified from the continuous chaotic light distribution [61–64] to a chaotic ensemble of light dots [67]. Moreover, by integrating concepts from optical pattern recognition [68, 69], the process of correlation-based image reconstruction has been modified to what is known as nonlinear image reconstruction [70]. Because of these two modifications, namely the modified impulse response and the change in the reconstruction process, I-COACH's signal-to-noise ratio (SNR) has been improved significantly [71]. Other imaging properties, in addition to SNR, have also been treated in the framework of COACH research. The image resolutions of I-COACH have been improved by several different techniques [71–73]. Field-of-view (FOV) extension in I-COACH systems has been addressed in [74] by a special calibration procedure. Ideas adapted from axial beam shaping have enabled engineering the depth-of-field (DOF) of an I-COACH system [75]. Sectioning the imaging space, or in other words, removing the out-of-focus background from the resulting picture, was demonstrated by point spread functions of tilted pseudo-nondiffracting beams in I-COACH [76]. Color imaging has been treated in various I-COACH systems [77] and in a setup with a quasi-random lens [78]. COACH can implement several applications in addition to the initial and widely used application of 3D holographic imaging. For example, noninvasive imaging through scattering layers [79] can be more efficient if the light emitted from the scattering layer is modulated by a phase aperture, as demonstrated in [80]. Another application is imaging by telescopes with an annular aperture, which is a way to reduce the weight of space-based telescopes [81]. The images produced by such telescopes might be clearer and sharper using COACH [82]. Imaging with a synthetic aperture system is another example that enables better
image resolution without changing the physical size of the optical aperture [83]. COACH can image targets with an incoherent synthetic aperture with the advantage that the relatively small apertures move only along the perimeter of the relatively large synthetic aperture [84]. Although interferenceless imaging systems are simpler and more power-efficient than systems with wave interference, the latter systems still have an important role in the technology, and the annular synthetic aperture [84, 85] is an example of using two-wave interference between beams reflected from a pair of sub-apertures located along the aperture perimeter. More details about these and other advances of COACH and I-COACH can be found in two review articles [86, 87]. The scheme of Fig. 13.1 summarizes the holography history as described above, where the blue arrows indicate the flow and influence of the various ideas. The next natural step was to explore the new COACH concept in the area of coherent holography. In addition to 3D imaging, QPI is another main application for coherent holography. 3D imaging under coherent light using I-COACH was demonstrated in Ref. [88] but without phase imaging capability. QPI could not be performed by I-COACH, but various ways to implement QPI using phase apertures with [89] and without [90] two-wave interference and with [91] and without [92] using self-reference holography were found. Specifically, COACH's concepts have been integrated into a Mach–Zehnder interferometer [92] with the benefit of a broader FOV than a conventional QPI interferometer. A technique closely related to QPI is wavefront sensing, where a COACH-based Shack–Hartmann wavefront sensor was proposed recently [93] with the advantage of higher accuracy over the conventional Shack–Hartmann wavefront sensor. The development of holography has not ended, and new improvements are published from time to time, so this chapter is only an interim summary of the field. However, the rapid development of COACH and other methods of phase aperture digital holography in incoherent and coherent optics might make this review a useful source for the holography community. We review the recent developments of COACH in
two main sections. The section following this introduction is dedicated to a system recording incoherent self-interference holograms in general and the integration of FINCH and COACH in particular. The following section is divided into three parts, starting with FINCH, continuing to COACH, and finishing with a particular version of a hybrid system called CAFIR, which is the acronym for coded aperture with FINCH intensity responses.
13.2 Fresnel Incoherent Correlation Holography (FINCH)
COACH [61] was introduced in 2016, nearly a decade after FINCH [30], as an attempt to generalize the phase aperture function of the FINCH system. To further understand the transition from FINCH to COACH and because FINCH is a special case of the general COACH, we briefly summarize FINCH, while a more detailed analysis can be found in [33, 36, 58]. FINCH is a single-channel imaging technique to capture 2D digital holograms of incoherently illuminated 3D scenes based on the self-interference principle. In FINCH, light emitted from an object point splits into two spherical waves, each of which propagates through a different quadratic phase mask. The interference pattern between the two beams is recorded by a digital camera such that the sum of the interference patterns from all the object points is the Fresnel digital hologram of the object. Since its invention, FINCH has undergone several improvements and optimizations, which have led the technique to a point where it violates the Lagrange invariant property of imaging systems [94–96]. Thus, FINCH inherently has an improved imaging resolution compared with an equivalent standard system having a similar numerical aperture (NA). Because it violates the Lagrange invariant, reconstructed images in FINCH can exhibit approximately 2 and 1.5 times better resolving power than the equivalent coherent and incoherent imaging systems, respectively. However, FINCH exhibits lower axial resolution than direct imaging [63] and COACH.
The optimal scheme of FINCH [33] is shown in Fig. 13.2a, in which a wave from a point object is collimated by lens Lo and modulated by two different diffractive lenses multiplexed on the same physical component. Consequently, two spherical waves are generated such that one diverges from, and the other converges toward, two duplicated image points of the same object point. The two spherical waves interfere at the hologram plane, where we assume that they are as close as possible to a full overlap of their projected spots. This interference intensity is described as follows:

$$
I_H(\bar{r}) = \left| \sqrt{I_1}\,\exp\!\left\{ \frac{i\pi}{\lambda z_1}\left[ \left(x + \frac{(z_h - z_1)x_0}{f_o}\right)^{2} + \left(y + \frac{(z_h - z_1)y_0}{f_o}\right)^{2} \right] \right\}
+ \sqrt{I_2}\,\exp\!\left\{ \frac{-i\pi}{\lambda z_2}\left[ \left(x + \frac{(z_h + z_2)x_0}{f_o}\right)^{2} + \left(y + \frac{(z_h + z_2)y_0}{f_o}\right)^{2} \right] \right\} \right|^{2}
$$
$$
= I_1 + I_2 + \sqrt{I_1 I_2}\,\exp\!\left\{ \frac{i\pi}{\lambda}\left[ D|\bar{r}|^{2} + B\frac{|\bar{r}_0|^{2}}{f_o^{2}} + \frac{2D(\bar{r}_0 \cdot \bar{r})\,z_h}{f_o} \right] \right\} + \mathrm{C.C.},
\tag{13.1}
$$

where D = 1/z1 + 1/z2 and B = [(zh − z1)²/z1] + [(zh + z2)²/z2]; $\bar{r} = (x, y)$ is the coordinate vector of the hologram plane, and $\bar{r}_0 = (x_0, y_0)$ is the vector of the off-axis displacement in the object plane. fo is the focal length of the lens Lo, λ is the illumination central wavelength, zh is the gap between the phase mask and the camera, I1 and I2 are the intensities of the two duplicated image points, and z1 and z2 are the distances from the hologram plane to the two duplicated image points. C.C. stands for the complex conjugate term. Usually, the reconstruction process in FINCH is carried out by extracting one of the cross-terms (also known as an interference term)
from the intensity of Eq. (13.1) using a phase-shift procedure, followed by Fresnel backpropagation to the best in-focus plane. From the geometry of Fig. 13.2a, it follows that under the condition of zh = 2z1z2/(z2 − z1), called the overlap condition, there is a perfect overlap between the two interfering beams [33]. Based on Eq. (13.1), the reconstructed off-axis point is positioned at a distance zr = 1/D = z1z2/(z2 + z1) from the hologram at a height of $|\bar{r}_0| z_h/f_o$. Therefore, the transverse magnification of the complete system is MT = zh/fo. On the other hand, the size of any reconstructed point is dependent on the NA and the magnification of the imaging system. To calculate the magnification of a single reconstructed on-axis point, we take the product of the magnification of the first imaging system, (zh − z1)/fo, and the magnification of the hologram reconstruction zr/z1. Substituting the ratio z1/z2 obtained from the abovementioned overlap condition and the distance zr into the overall point magnification Mo = [zr/z1][(zh − z1)/fo] gives the result that the magnification of a single reconstructed on-axis point is Mo = zh/(2fo). Therefore, when two points are imaged by a direct imaging system and by FINCH with the same NA and magnification, the size of the points is the same in the two systems, but the separation between the points is doubled in FINCH in comparison to direct imaging. The ratio MT/Mo is 1 in direct imaging systems and in all the systems that satisfy the Lagrange invariant, but MT/Mo = 2 in FINCH with the overlap condition. In other words, the image resolution of FINCH is inherently better than any direct imaging system with the same NA. However, the inherent nonclassical superior transverse resolution capability of FINCH comes at the price of low axial resolution [63].
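To make the resolution argument easier to trace, the magnification bookkeeping quoted above can be restated as a short worked derivation; it uses only the overlap condition and the definitions already given in the text.

$$
M_T = \frac{z_h}{f_o}, \qquad M_o = \frac{z_r}{z_1}\cdot\frac{z_h - z_1}{f_o}, \qquad z_r = \frac{1}{D} = \frac{z_1 z_2}{z_1 + z_2}.
$$

With the overlap condition $z_h = 2 z_1 z_2/(z_2 - z_1)$, one obtains $z_h - z_1 = z_1(z_1 + z_2)/(z_2 - z_1)$ and $z_r/z_1 = z_2/(z_1 + z_2)$, so

$$
M_o = \frac{z_2}{z_1 + z_2}\cdot\frac{z_1(z_1 + z_2)}{f_o(z_2 - z_1)} = \frac{z_1 z_2}{f_o(z_2 - z_1)} = \frac{z_h}{2 f_o},
\qquad \frac{M_T}{M_o} = 2.
$$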
13.3 Coded Aperture Correlation Holography (COACH)
In 2016, the research question was whether a chaotic phase aperture could be used instead of the quadratic phase function of FINCH while still recording incoherent holograms of an arbitrary 3D scene. Of course, after recording
Fig. 13.2 Schemes of the optical configurations of (a) FINCH, (b) COACH, (c) sparse I-COACH and (d) CAFIR. Solid and dashed lines illustrate the optical path of the marginal rays emitted from a single object point.
Each coded phase mask in (a), (b) and (d) contains two diffractive optical elements, such that different waves are diffracted from each of them; one is denoted by a solid line and the other by a dashed line
holograms, one needs to find a way to reconstruct 3D images from these holograms with acceptable quality. The answer was COACH, in which the light emitted from every object point splits into two beams. One of the beams is modulated by a chaotic coded phase mask (CPM), and the other propagates without modulation. The two mutually coherent waves interfere on the camera plane, where the recorded hologram is obtained as the accumulation of the interference patterns contributed by all the object points. The recorded holograms are digitally processed to reconstruct a 3D image of the 3D object. The image reconstruction is enabled by a one-time guide-star calibration process, in which different point spread holograms (PSHs) of the system are recorded for a point object placed at different axial planes along the range of interest. In the reconstruction process, all the PSHs are cross-correlated with the recorded object hologram, and the axial index of the PSH that reconstructs the best in-focus image indicates the axial location of the object. Of course, a successful reconstruction is achieved if and only if the same CPMs are used in both the calibration stage and the object hologram recording.
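As an illustration of this calibration-based reconstruction, the following is a minimal sketch that cross-correlates an object hologram with a library of PSHs and selects the best-focused plane; it assumes the complex-valued hologram and PSHs are already available as NumPy arrays, and the function name, variable names, and focus metric are illustrative choices rather than the procedure of the original papers.

```python
import numpy as np

def reconstruct_from_psh_library(obj_hologram, psh_library):
    """Cross-correlate a complex object hologram with a library of point
    spread holograms (PSHs) recorded at known axial planes, and return the
    stack of reconstructions plus the index of the sharpest plane."""
    H = np.fft.fft2(obj_hologram)
    recon_stack, focus_scores = [], []
    for psh in psh_library:
        # Circular cross-correlation via the Fourier domain:
        # H (x) h = F^-1{ F{H} . conj(F{h}) }
        rec = np.abs(np.fft.fftshift(np.fft.ifft2(H * np.conj(np.fft.fft2(psh)))))
        recon_stack.append(rec)
        # Crude sharpness metric (normalized variance) to pick the in-focus plane
        focus_scores.append(rec.var() / (rec.mean() ** 2 + 1e-12))
    best_plane = int(np.argmax(focus_scores))
    return recon_stack, best_plane
```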
Since its invention, COACH has evolved in several directions [62–64, 67, 70–75]. In the first COACH configuration shown in Fig. 13.2b, one of the two apertures of the system is a chaotic phase function, and the other is an open nonmodulating aperture. In this COACH, the intensity distribution at the camera plane in response to a point in the origin of the input plane takes the form of interference between two types of beams: one, which is modulated by the CPM displayed on the spatial light modulator (SLM), and another, which is not affected by the SLM and remains collimated. Hence, the intensity on the camera plane is

$$
I_C(\bar{r}; \theta_k) = \left| A + H(\bar{r}) \exp(i\theta_k) \right|^{2},
\tag{13.2}
$$

where A is a constant representing the collimated wave and $H(\bar{r})$ is the wave arriving at the camera that is modulated by the CPM. To extract the interference term $A^{*}H(\bar{r})$ from the pattern of Eq. (13.2), the modulated signal is further multiplied by a phase constant three times (θk = 0, 2π/3, 4π/3) in a standard on-axis phase-shifting procedure [22, 30]. $H(\bar{r})$ is the system's PSH containing information on the lateral and axial
distribution of the point response. The intensity of an incoherently illuminated 2D object can be regarded as a collection of laterally shifted points as follows, $O(\bar{r}) = \sum_i a_i \delta(\bar{r} - \bar{r}_i)$, with different intensities ai. Since the light source is spatially incoherent, interference occurs only between waves emitted from the same object point. Based on the shift-invariance property of the system, the cross-term of the interference after the phase-shift procedure, in the case of the input object $O(\bar{r})$, is $\sum_i A_i^{*} H(\bar{r} - M_t \bar{r}_i)$. Ai is a constant of the i-th point response, and Mt = zh/fo is the lateral magnification. This sum contains the information of the 3D scene and therefore is termed the object hologram. To reconstruct the image of the object, the object hologram should be cross-correlated with the PSH, as follows:
$$
I_O(\bar{\rho}) = \int \sum_i A_i^{*} H\!\left(\bar{r} - M_t \bar{r}_i\right) H^{*}\!\left(\bar{r} - \bar{\rho}\right) d\bar{r}
\approx \sum_i A_i^{*}\, \Lambda\!\left(\bar{\rho} - M_t \bar{r}_i\right) \propto O\!\left(\bar{\rho}/M_t\right),
\tag{13.3}
$$
where Λ is a δ-like function, approximately equal to 1 around the origin and negligibly small elsewhere. Since the beam modulated by the CPM is diffracted to a chaotic pattern, the cross-correlation of Eq. (13.3) is sensitive to axial translations of the object point. Therefore, COACH has inherent 3D holographic imaging capability. All the PSHs are cross-correlated with the recorded object hologram, and the axial value of the PSH that reconstructs the best in-focus image indicates the axial location of the object. Importantly, the lateral and axial resolutions of a regular COACH system [61] are determined by the corresponding correlation distances. Since these distances are governed by the system aperture size, the resolving capability of COACH is equivalent to a direct imaging system with the same NA. Consequently, the COACH framework is expected to have a better axial and worse lateral resolution than FINCH. An important advantage of COACH systems is the capability to engineer the CPM according to the desired applications. This capability has been used to improve the features of imagers and to implement various types of COACH systems suitable for different imaging applications. Higher SNRs in I-COACH systems [67], improved spatial [71–73] and temporal [70, 97] resolutions, extended FOV [74], and partial aperture imaging with improved characteristics [82] are a few examples of CPM engineering to enhance specific imaging qualities. The common concept of some of these systems is the use of sparsely and randomly distributed focal points over the hologram plane as the response to a point in the system input, as shown in Fig. 13.2c. Synthesizing the CPM is carried out by a modified Gerchberg–Saxton algorithm [98], in which the intensity distribution on the camera is forced to some pattern of dots, and the phase distribution is used as the degree of freedom. On the CPM side, the constraint is dictated by the nature of the phase-only SLM on which the CPM is displayed. The iterative transfer back and forth between the CPM and the camera planes is usually done by Fourier and inverse Fourier transforms.
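The dot-pattern synthesis described above can be sketched as a short iterative loop. The example below is a simplified illustration that models the CPM-to-camera transfer as a single Fourier transform; the names, iteration count, and constraints are assumptions for the sketch, not the exact algorithm of Ref. [98] or of the cited COACH systems.

```python
import numpy as np

def synthesize_cpm(target_dots, n_iter=200, seed=0):
    """Illustrative modified Gerchberg-Saxton loop for designing a phase-only
    coded phase mask (CPM) whose Fourier-plane intensity approximates a
    prescribed sparse pattern of dots.

    target_dots : 2D float array, desired camera-plane intensity (sparse dots)
    Returns the CPM phase (radians) to display on a phase-only SLM.
    """
    rng = np.random.default_rng(seed)
    amp_target = np.sqrt(target_dots)                          # camera-plane amplitude constraint
    cpm_phase = rng.uniform(0, 2 * np.pi, target_dots.shape)   # random initial phase
    for _ in range(n_iter):
        # CPM plane -> camera plane (modeled here as one Fourier transform)
        camera_field = np.fft.fft2(np.exp(1j * cpm_phase))
        # Enforce the camera-plane constraint: keep phase, replace magnitude
        camera_field = amp_target * np.exp(1j * np.angle(camera_field))
        # Camera plane -> CPM plane
        cpm_field = np.fft.ifft2(camera_field)
        # Enforce the CPM constraint: phase-only (unit magnitude)
        cpm_phase = np.angle(cpm_field)
    return cpm_phase
```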
13.4 FINCH-COACH Hybrid System
The main benefit of sparse COACH is the ability to control the SNR and the visibility of the reconstructed image through the sparsity and complexity of the PSHs [67, 88]. However, in Ref. [99], we employ the sparse response of COACH to merge the imaging merits of FINCH and COACH into a single holographic system. A COACH-FINCH hybrid system was already demonstrated in the past [63]. Nonetheless, in Ref. [63], the combination of the two methods was achieved by a weighted sum of the two characteristic masks, a diffractive lens for FINCH and a CPM for COACH, into a single phase-only mask, so that the resolution values of the hybrid system were averaged. As discussed next, in the apparatus of Ref. [99] shown in Fig. 13.2d, the combination is done by granting the sparse COACH a response of a FINCH-type self-interference mechanism so that neither of the resolution types is compromised. In other words, we deal with a recently proposed
imaging method that integrates advantages from both FINCH and COACH techniques, such that this hybrid system has the improved lateral resolution of FINCH with the same axial resolution of COACH. The study in Ref. [99], which is reviewed in the following, presents an improved version of COACH termed coded aperture with FINCH intensity responses (CAFIR) with a better axial resolution than FINCH and better lateral resolution than the regular COACH. CAFIR is an integration of FINCH and COACH in the sense that the point response of CAFIR is a mix of the point responses of FINCH and COACH. On the one hand, the point response of COACH is an ensemble of randomly distributed dots [67, 88], and on the other hand, the point response of FINCH is the interference between two different spherical waves [37]. Hence, the integration of FINCH and COACH means that the impulse response of CAFIR is an ensemble of randomly distributed patterns of interference between two different spherical waves. This integration guarantees the violation of the Lagrange invariant on the one hand, while the randomness of the point response guarantees the same depth of focus as COACH on the other hand. Thus, CAFIR preserves the lateral resolution of FINCH and the axial resolution of COACH. One way to address the poor axial resolution of FINCH is to integrate the chaotic CPM of COACH such that it still guarantees the violation of the Lagrange invariant in a way that improves the lateral resolution. Since the PSH of sparse COACH is composed of randomly distributed, diffraction-limited focal points, one can think of the CPM as a random ensemble of spatially translated FINCH-type diffractive lenses having a focal length of zh − z1. Next, we update the constraint matrix to contain the same group of randomly distributed points, all of which are focused on a plane at a distance zh + z2 from the SLM. Effectively, another ensemble of translated FINCH-type diffractive lenses with different focal lengths is added, which is coaxial with the previous ensemble. To achieve FINCH-type interference, the image sensor should be placed at a distance of zh from the SLM such that a full overlap between
the multiple spots of the two series of interfering spherical waves occurs. This description is shown schematically in Fig. 13.2d, and the intensity response to a point source located at the front focal plane of the glassy lens, away from the optical axis at $\bar{r}_0 = (x_0, y_0)$, can be described as

$$
I_{CA}(\bar{r}) = \sum_{j=1}^{N} \left| \sqrt{I_{1,j}}\,\exp\!\left\{ \frac{i\pi}{\lambda z_1}\left[ \left(x - x_j + \frac{(z_h - z_1)x_0}{f_o}\right)^{2} + \left(y - y_j + \frac{(z_h - z_1)y_0}{f_o}\right)^{2} \right] \right\}
+ \sqrt{I_{2,j}}\,\exp\!\left\{ \frac{-i\pi}{\lambda z_2}\left[ \left(x - x_j + \frac{(z_h + z_2)x_0}{f_o}\right)^{2} + \left(y - y_j + \frac{(z_h + z_2)y_0}{f_o}\right)^{2} \right] \right\} \right|^{2},
\tag{13.4}
$$

where $\bar{r}_j = (x_j, y_j)$ is the translation of the j-th pair of focal points, one at a distance z1 in front of the camera plane and the other at a distance z2 beyond the camera plane. N is the number of randomly distributed focal points at each focal plane. In Eq. (13.4), we assume that only the pairs of dots from coaxial diffractive lenses create interference patterns, and the interferences between non-coaxial waves are negligible, an assumption that can be guaranteed by increasing the space between neighboring dots. By capturing three intensity patterns with different phase shifts between the two sets of dots, one can extract the cross-terms of Eq. (13.4) given by
$$
h(\bar{r}) = C_o \sum_{j=1}^{N} \exp\!\left\{ \frac{i\pi}{\lambda}\left[ D\left|\bar{r} - \bar{r}_j\right|^{2} + B\frac{|\bar{r}_0|^{2}}{f_o^{2}} + \frac{2D\left[\bar{r}_0 \cdot \left(\bar{r} - \bar{r}_j\right)\right] z_h}{f_o} \right] \right\},
\tag{13.5}
$$

where Co is a complex constant, and D and B are the same constants as given below Eq. (13.1). Note that Eq. (13.5) highlights that the inherent superior lateral resolution of FINCH is preserved in CAFIR as long as the full overlap condition
between the two interfering spherical beams is satisfied for all the N pairs at the image sensor plane. Incoherent imaging systems are linear with respect to the wave's intensity. Therefore, upon placing an object at the front focal plane of the glassy lens, the pattern captured by the image sensor is the 2D convolution of the object intensity function $O(\bar{r}) = \sum_i a_i \delta(\bar{r} - \bar{r}_i)$ with the system's point spread function of Eq. (13.5). Applying the phase-shift procedure on three recorded patterns yields the interference term given by the convolution $H(\bar{r}) = O(\bar{r}/M_t) * h(\bar{r})$, where the sign '*' stands for 2D convolution. $H(\bar{r})$ is regarded as the complex-valued object hologram. Reconstruction of the object's image is done by a cross-correlation between $H(\bar{r})$ and $h(\bar{r})$ as follows:
$$
I_{img} = \left[ O\!\left(\bar{r}/M_t\right) * h(\bar{r}) \right] \otimes_{\alpha} h(\bar{r})
= \mathcal{F}^{-1}\!\left\{ \mathcal{F}\{O(\bar{r}/M_t)\} \cdot \mathcal{F}\{h(\bar{r})\} \cdot \left|\mathcal{F}\{h(\bar{r})\}\right|^{\alpha} \exp\!\left[-i\,\arg\!\left(\mathcal{F}\{h(\bar{r})\}\right)\right] \right\}
$$
$$
= \mathcal{F}^{-1}\!\left\{ \mathcal{F}\{O(\bar{r}/M_t)\} \cdot \left|\mathcal{F}\{h(\bar{r})\}\right| \exp\!\left[i\,\arg\!\left(\mathcal{F}\{h(\bar{r})\}\right)\right] \cdot \left|\mathcal{F}\{h(\bar{r})\}\right|^{\alpha} \exp\!\left[-i\,\arg\!\left(\mathcal{F}\{h(\bar{r})\}\right)\right] \right\}
$$
$$
= \mathcal{F}^{-1}\!\left\{ \mathcal{F}\{O(\bar{r}/M_t)\} \cdot \left|\mathcal{F}\{h(\bar{r})\}\right|^{1+\alpha} \right\} \approx O\!\left(\bar{r}/M_t\right),
\tag{13.6}
$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ are the Fourier and inverse Fourier transforms, respectively. ⊗α denotes the 2D linear cross-correlation with the function $\mathcal{F}^{-1}\{|\mathcal{F}\{h(\bar{r})\}|^{\alpha} \exp[i\,\arg(\mathcal{F}\{h(\bar{r})\})]\}$. The regularization parameter α is a real number between −1 (inverse filter) and 1 (matched filter), which is chosen by an optimization process to suppress the noise in the reconstructed image. The approximation in the last line of Eq. (13.6) is more valid for $h(\bar{r})$ having an almost constant Fourier magnitude, which can be approximated more easily if the focal points of the PSH are randomly distributed. Similar to conventional COACH systems, the capability of 3D reconstruction is achieved by cross-correlating the object hologram with the reconstructing function $h(\bar{r})$ corresponding to the desired transverse plane. By using Eq. (13.6) for this task, the proposed CAFIR system is expected to have an axial resolution that is similar to that of the conventional direct imaging system, together with the superior lateral resolution of FINCH.

The experimental results are shown in Fig. 13.3. Raw CAFIR PSHs shown in Fig. 13.3a1–a3 and CAFIR object holograms shown in Fig. 13.3b1–b3 are captured with three phases of θ1,2,3 = 0, 2π/3, and 4π/3. The same set of six raw holograms of FINCH is shown in Fig. 13.3c1–c3 for the PSHs and in Fig. 13.3d1–d3 for the object holograms. The phase and magnitude of the processed complex-valued holograms resulting from the phase-shifting procedure are shown in Fig. 13.3a4, a5, b4, b5, c4, c5, d4, and d5 for the PSHs and object holograms of CAFIR and FINCH, respectively. For comparison, the direct image obtained from an equivalent system with the same NA is shown in Fig. 13.3e. The image of CAFIR was reconstructed by cross-correlating the complex-valued object hologram with the phase-only filtered (POF) version of the PSH (α = 0), as shown in Fig. 13.3f. The reconstructed FINCH image shown in Fig. 13.3g was obtained by back-propagating the complex-valued FINCH hologram to the in-focus plane. The image resolution can be evaluated by comparing the visibility along one of the grating cross-sections in the images of Fig. 13.3e–g. Figure 13.4 shows plots of such cross-sections for the direct image, FINCH, and CAFIR presented in Fig. 13.3e–g. We can observe that FINCH and CAFIR have better visibility than direct imaging. However, the visibility of FINCH is slightly better than that of CAFIR. The axial resolution of CAFIR in comparison to FINCH and direct imaging was studied next. The plots in Fig. 13.5 for a reconstructed point object are depicted for CAFIR and FINCH along the z-axis with a 1 mm
Fig. 13.3 (a1-a3), (b1-b3) and (c1-c3), (d1-d3): recorded intensities for the point object (h) and the object (H), respectively, with θ = 0, 2π/3, and 4π/3. (a4, a5), (b4, b5), (c4, c5), and (d4, d5): phase and magnitude of the superimposed PSH and object holograms for CAFIR and FINCH, respectively. (e) Direct imaging, (f) POF reconstructed image for CAFIR,
(g) FINCH reconstructed image. (Adapted under a Creative Commons Attribution 4.0 International License from Optica Publishing Group: Optics Express, Coded Aperture Correlation Holography (COACH) with a Superior Lateral Resolution of FINCH and Axial Resolution of Conventional Direct Imaging Systems, Bulbul, A., © 2021)
Fig. 13.4 Average cross-sections of gratings of the direct image and the reconstructed images with FINCH and CAFIR of Fig. 13.3e–g. (Adapted under a Creative Commons Attribution 4.0 International License from Optica Publishing Group: Optics Express, Coded Aperture Correlation Holography (COACH) with a Superior Lateral Resolution of FINCH and Axial Resolution of Conventional Direct Imaging Systems, Bulbul, A., © 2021)
step and up to 51 mm. The PSHs were recorded by placing a 25-micron pinhole at different axial distances and then cross-correlated with a PSH at the central position. The plot of direct imaging was obtained by moving the camera along the z-axis and measuring the direct image for each z
location. Based on Fig. 13.5, one can conclude that the depth of field of CAFIR is similar to that of direct imaging, whereas the axial resolution of FINCH is poorer than that of the other two imaging methods.
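For readers who prefer code, the α-regularized cross-correlation of Eq. (13.6) reduces to a few Fourier-domain operations. The sketch below is illustrative (the function and variable names are not from Ref. [99]) and assumes the complex object hologram H and the reconstructing point response h are given as NumPy arrays.

```python
import numpy as np

def nonlinear_reconstruction(H, h, alpha=0.0):
    """Reconstruct an image from a complex object hologram H by the
    alpha-regularized cross-correlation with the point response h
    (alpha = 0: phase-only filter, alpha = 1: matched filter,
    alpha = -1: inverse filter)."""
    H_f = np.fft.fft2(H)
    h_f = np.fft.fft2(h)
    # Regularized filter: |F{h}|^alpha * exp(-i * arg(F{h})), as in Eq. (13.6)
    filt = np.abs(h_f) ** alpha * np.exp(-1j * np.angle(h_f))
    return np.abs(np.fft.fftshift(np.fft.ifft2(H_f * filt)))
```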
Fig. 13.5 Maximum intensity of point-object reconstructions along the axial distance from the front focal plane of the glassy lens in mm. (Adapted under a Creative Commons Attribution 4.0 International License from Optica Publishing Group: Optics Express, Coded Aperture Correlation Holography (COACH) with a Superior Lateral Resolution of FINCH and Axial Resolution of Conventional Direct Imaging Systems, Bulbul, A., © 2021)

13.5 Summary

To conclude, this chapter presented CAFIR, a combination of FINCH and COACH that maximizes the advantages of both. CAFIR incorporates the superior axial resolution of COACH with the superior lateral resolution of FINCH. Like FINCH, CAFIR violates the Lagrange invariant in the sense that the transverse magnification of the gap between every two points is up to twice as large as the magnification of each point. This kind of violation is the reason for the enhanced lateral resolution of FINCH and CAFIR over regular COACH and direct imaging. In terms of axial resolution, the experimental investigation indicates that CAFIR preserves the axial resolution of direct imaging, which is higher than that of FINCH. Beyond the main advantage of enhanced lateral resolution combined with optimal axial resolution, the CAFIR system has additional benefits. The cross-correlation with the optimal power value of α helps to improve the reconstructed image significantly. This study shows that sometimes a combination of methods can yield a system with the best features of the combined methods instead of their mean features. CAFIR, created
under the inspiration of FINCH, COACH, and I-COACH, as seen in Fig. 13.2, is evidence that different independent ideas can be combined into one system with superior features. It is also an example of how a flow map of ideas, such as the one in Fig. 13.1, can be important in the search for the next efficient solution for a desired application.
References
1. T. Tahara, Y. Zhang, et al.: Roadmap of incoherent digital holography, Appl. Phys. B 128, 193 (2022)
2. J. T. Sheridan, R. K. Kostuk, et al.: Roadmap on holography, J. Opt. 22, 123002 (2020)
3. B. Javidi, A. Carnicer, et al.: Roadmap on digital holography, Opt. Express 29, 35078-35118 (2021)
4. J. Braat, S. Lowenthal: Short-exposure spatially incoherent holography with a plane-wave illumination, J. Opt. Soc. Am. 63, 388-390 (1973)
5. M. Born, E. Wolf: Principles of Optics, 7th ed. (Cambridge Univ. Press, Cambridge 2019)
6. D. Gabor: A new microscopic principle, Nature 161, 777-778 (1948)
7. L. Ma, Y. Wang: Self-reference hologram, Proc. SPIE 6837, 68371G (2008)
8. E. Leith, J. Upatnieks: Reconstructed wavefronts and communication theory, J. Opt. Soc. Am. 52, 1123-1130 (1962)
9. E. Stoykova, H. Kang, J. Park: Twin-image problem in digital holography-a survey (Invited Paper), Chin. Opt. Lett. 12, 060013 (2014)
10. A. W. Lohmann: Wavefront reconstruction for incoherent objects, J. Opt. Soc. Am. 55, 1555–1556 (1965)
11. G. W. Stroke, R. C. Restrick: Holography with spatially noncoherent light, Appl. Phys. Lett. 7, 229-231 (1965)
12. G. Cochran: New method of making Fresnel transforms, J. Opt. Soc. Am. 56, 1513–1517 (1966)
13. P. J. Peters: Incoherent holography with mercury light source, Appl. Phys. Lett. 8, 209–210 (1966)
14. H. R. Worthington: Production of holograms with incoherent illumination, J. Opt. Soc. Am. 56, 1397–1398 (1966)
15. O. Bryngdahl, A. Lohmann: Variable magnification in incoherent holography, Appl. Opt. 9, 231–232 (1970)
16. A. Marathay: Noncoherent-object hologram: its reconstruction and optical processing, J. Opt. Soc. Am. A 4, 1861-1868 (1987)
17. G. Sirat, D. Psaltis: Conoscopic holography, Opt. Lett. 10, 4-6 (1985)
18. J. Rosen, A. Vijayakumar, M. Kumar, M.R. Rai, R. Kelner, Y. Kashter, A. Bulbul, S. Mukherjee: Recent advances in self-interference incoherent digital holography, Adv. Opt. Photon. 11, 1-66 (2019)
19. J. W. Goodman, R.W. Lawrence: Digital image formation from electronically detected holograms, Appl. Phys. Lett. 11, 77–79 (1967)
20. V. Balasubramani, M. Kujawińska, et al.: Roadmap on digital holography-based quantitative phase imaging, J. Imaging 7, 252 (2021)
21. A. Vander Lugt: Signal detection by complex spatial filtering, IEEE Trans. Inf. Theory IT-10, 139–145 (1964)
22. I. Yamaguchi, T. Zhang: Phase-shifting digital holography, Opt. Lett. 22, 1268-1270 (1997)
23. S. Guel-Sandoval, J. Ojeda-Castañeda: Quasi-Fourier transform of an object from a Fresnel hologram, Appl. Opt. 18, 950-951 (1979)
24. T.-C. Poon: Scanning holography and two-dimensional image processing by acousto-optic two-pupil synthesis, J. Opt. Soc. Am. A 2, 521-527 (1985)
25. T.-C. Poon, K.B. Doh, B.W. Schilling, M.H. Wu, K.K. Shinoda, Y. Suzuki: Three-dimensional microscopy by optical scanning holography, Opt. Eng. 34, 1338-1344 (1995)
26. T.-C. Poon: Three-dimensional image processing and optical scanning holography, Adv. Imag. Electron Phys. 126, 329–350 (2003)
27. Y. Li, D. Abookasis, J. Rosen: Computer-generated holograms of three-dimensional realistic objects recorded without wave interference, Appl. Opt. 40, 2864-2870 (2001)
28. Y. Sando, M. Itoh, T. Yatagai: Color computer-generated holograms from projection images, Opt. Express 12, 2487-2493 (2004)
29. N. T. Shaked, B. Katz, J. Rosen: Review of three-dimensional imaging by multiple-viewpoint-projection based methods, Appl. Opt. 48, H120–H136 (2009)
30. J. Rosen, G. Brooker: Digital spatially incoherent Fresnel holography, Opt. Lett. 32, 912–914 (2007)
31. M. K. Kim: Adaptive optics by incoherent digital holography, Opt. Lett. 37, 2694-2696 (2012)
32. P. Bouchal, J. Kapitán, R. Chmelík, Z. Bouchal: Point spread function and two-point resolution in Fresnel incoherent correlation holography, Opt. Express 19, 15603–15620 (2011)
33. J. Rosen, N. Siegel, G. Brooker: Theoretical and experimental demonstration of resolution beyond the Rayleigh limit by FINCH fluorescence microscopic imaging, Opt. Express 19, 26249-26268 (2011)
34. G. Brooker, N. Siegel, V. Wang, J. Rosen: Optimal resolution in Fresnel incoherent correlation holographic fluorescence microscopy, Opt. Express 19, 5047-5062 (2011)
35. R. Kelner, J. Rosen: Spatially incoherent single channel digital Fourier holography, Opt. Lett. 37, 3723-3725 (2012)
36. J. Rosen, G. Brooker: Fresnel incoherent correlation holography (FINCH): a review of research, Adv. Opt. Technol. 1, 151-169 (2012)
37. G. Brooker, N. Siegel, J. Rosen, N. Hashimoto, M. Kurihara, A. Tanabe: In-line FINCH super resolution digital holographic fluorescence microscopy using a high efficiency transmission liquid crystal GRIN lens, Opt. Lett. 38, 5264-5267 (2013)
38. Y. Wan, T. Man, D. Wang: Incoherent off-axis Fourier triangular color holography, Opt. Express 22, 8565–8573 (2014)
39. T. Man, Y. Wan, F. Wu, D. Wang: Four-dimensional tracking of spatially incoherent illuminated samples using self-interference digital holography, Opt. Commun. 355, 109–113 (2015)
40. D. Muhammad, C.M. Nguyen, J. Lee, H.-S. Kwon: Spatially incoherent off-axis Fourier holography without using spatial light modulator (SLM), Opt. Express 24, 22097–22103 (2016)
41. Y. Kashter, A. Vijayakumar, Y. Miyamoto, J. Rosen: Enhanced super resolution using Fresnel incoherent correlation holography with structured illumination, Opt. Lett. 41, 1558-1561 (2016)
42. D. Muhammad, C.M. Nguyen, J. Lee, H.-S. Kwon: Incoherent off-axis Fourier holography for different colors using a curved mirror, Opt. Commun. 393, 25–28 (2017)
43. T. Tahara, T. Kanno, Y. Arai, T. Ozawa: Single-shot phase-shifting incoherent digital holography, J. Opt. 19, 065705 (2017)
44. T. Nobukawa, T. Muroi, Y. Katano, N. Kinoshita, N. Ishii: Single-shot phase-shifting incoherent digital holography with multiplexed checkerboard phase gratings, Opt. Lett. 43, 1698-1701 (2018)
45. C. M. Nguyen, H.-S. Kwon: Common-path off-axis incoherent Fourier holography with a maximum overlapping interference area, Opt. Lett. 44, 3406–3409 (2019)
46. D. Liang, Q. Zhang, J. Wang, J. Liu: Single-shot Fresnel incoherent digital holography based on geometric phase lens, J. Mod. Opt. 67, 92-98 (2020)
47. S. Sakamaki, N. Yoneda, T. Nomura: Single-shot in-line Fresnel incoherent holography using a dual-focus checkerboard lens, Appl. Opt. 59, 6612-6618 (2020)
48. A. Vijayakumar, T. Katkus, S. Lundgaard, D. Linklater, E.P. Ivanova, S.H. Ng, S. Juodkazis: Fresnel incoherent correlation holography with single camera shot, Opto-Electron. Adv. 3, 200004 (2020)
49. A. Marar, P. Kner: Three-dimensional nanoscale localization of point-like objects using self-interference digital holography, Opt. Lett. 45, 591-594 (2020)
50. M. Potcoava, C. Mann, J. Art, S. Alford: Spatiotemporal performance in an incoherent holography lattice light-sheet microscope (IHLLS), Opt. Express 29, 23888-23901 (2021)
51. T. Tahara, T. Koujin, A. Matsuda, A. Ishii, T. Ito, Y. Ichihashi, R. Oi: Incoherent color digital holography with computational coherent superposition for fluorescence imaging [Invited], Appl. Opt. 60, A260-A267 (2021)
52. V. Anand, T. Katkus, S.H. Ng, S. Juodkazis: Review of Fresnel incoherent correlation holography with linear and nonlinear correlations [Invited], Chin. Opt. Lett. 19, 020501 (2021)
53. N. Siegel, G. Brooker: Single shot holographic super-resolution microscopy, Opt. Express 29, 15953-15968 (2021)
54. M. Wu, M. Tang, Y. Zhang, Y. Du, F. Ma, E. Liang, Q. Gong: Single-shot Fresnel incoherent correlation holography microscopy with two-step phase-shifting, J. Mod. Opt. 68, 564-572 (2021)
55. T. Tahara, R. Oi: Palm-sized single-shot phase-shifting incoherent digital holography system, OSA Continuum 4, 2372-2380 (2021)
56. T. Tahara, T. Ito, Y. Ichihashi, R. Oi: Multiwavelength three-dimensional microscopy with spatially incoherent light, based on computational coherent superposition, Opt. Lett. 45, 2482–2485 (2020)
57. T. Hara, T. Tahara, Y. Ichihashi, R. Oi, T. Ito: Multiwavelength-multiplexed phase-shifting incoherent color digital holography, Opt. Express 28, 10078-10089 (2020)
58. J. Rosen, S. Alford, et al.: Roadmap on recent progress in FINCH technology, J. Imaging 7, 197 (2021)
59. G. Brooker, N. Siegel: Historical development of FINCH from the beginning to single-shot 3D confocal imaging beyond optical resolution [Invited], Appl. Opt. 61, B121-B131 (2022)
60. G. Pedrini, H. Li, A. Faridian, W. Osten: Digital holography of self-luminous objects by using a Mach–Zehnder setup, Opt. Lett. 37, 713-715 (2012)
61. A. Vijayakumar, Y. Kashter, R. Kelner, J. Rosen: Coded aperture correlation holography—a new type of incoherent digital holograms, Opt. Express 24, 12430–12441 (2016)
62. A. Vijayakumar, J. Rosen: Spectrum and space resolved 4D imaging by coded aperture correlation holography (COACH) with diffractive objective lens, Opt. Lett. 42, 947-950 (2017)
63. A. Vijayakumar, Y. Kashter, R. Kelner, J. Rosen: Coded aperture correlation holography system with improved performance [Invited], Appl. Opt. 56, F67-F77 (2017)
64. A. Vijayakumar, J. Rosen: Interferenceless coded aperture correlation holography—A new technique for recording incoherent digital holograms without two-wave interference, Opt. Express 25, 13883–13896 (2017)
65. J. G. Ables: Fourier transform photography: a new method for X-ray astronomy, Proc. Astron. Soc. Aust. 1, 172–173 (1968)
66. R. H. Dicke: Scatter-hole cameras for X-rays and gamma rays, Astrophys. J. 153, L101 (1968)
67. M. R. Rai, J. Rosen: Noise suppression by controlling the sparsity of the point spread function in interferenceless coded aperture correlation holography (I-COACH), Opt. Express 27, 24311-24323 (2019)
68. M. Fleisher, U. Mahlab, J. Shamir: Entropy optimized filter for pattern recognition, Appl. Opt. 29, 2091–2098 (1990)
69. T. Kotzer, J. Rosen, J. Shamir: Multiple-object input in nonlinear correlation, Appl. Opt. 32, 1919–1932 (1993)
70. M. R. Rai, A. Vijayakumar, J. Rosen: Nonlinear adaptive three-dimensional imaging with interferenceless coded aperture correlation holography (I-COACH), Opt. Express 26, 18143-18154 (2018)
71. M. R. Rai, J. Rosen: Resolution-enhanced imaging using interferenceless coded aperture correlation holography with sparse point response, Sci. Rep. 10, 5033 (2020)
72. M. R. Rai, A. Vijayakumar, Y. Ogura, J. Rosen: Resolution enhancement in nonlinear interferenceless COACH with point response of subdiffraction limit patterns, Opt. Express 27, 391-403 (2019)
73. M. R. Rai, A. Vijayakumar, J. Rosen: Superresolution beyond the diffraction limit using phase spatial light modulator between incoherently illuminated objects and the entrance of an imaging system, Opt. Lett. 44, 1572-1575 (2019)
74. M. R. Rai, A. Vijayakumar, J. Rosen: Extending the field of view by a scattering window in an I-COACH system, Opt. Lett. 43, 1043-1046 (2018)
75. M. R. Rai, J. Rosen: Depth-of-field engineering in coded aperture imaging, Opt. Express 29, 1634-1648 (2021)
76. N. Hai, J. Rosen: Single viewpoint tomography using point spread functions of tilted pseudo-nondiffracting beams in interferenceless coded aperture correlation holography with nonlinear reconstruction, Opt. Laser Technol. 167, 109788 (2023)
77. N. Dubey, R. Kumar, J. Rosen: Multi-wavelength imaging with extended depth of field using coded apertures and radial quartic phase functions, Opt. Lasers Eng. 169, 107729 (2023)
78. V. Anand, S. Ng, T. Katkus, S. Juodkazis: White light three-dimensional imaging using a quasi-random lens, Opt. Express 29, 15551-15563 (2021)
79. O. Katz, E. Small, Y. Silberberg: Looking around corners and through thin turbid layers in real time with scattered incoherent light, Nature Photon. 6, 549–553 (2012)
80. S. Mukherjee, A. Vijayakumar, J. Rosen: Spatial light modulator aided noninvasive imaging through scattering layers, Sci. Rep. 9, 17670 (2019)
81. J. J. Rey, et al.: A deployable, annular, 30m telescope, space-based observatory, Space Telescopes and Instrumentation 2014, Proc. of SPIE 9143, 18 (2014)
82. A. Bulbul, J. Rosen: Partial aperture imaging system based on sparse point spread holograms and nonlinear cross-correlations, Sci. Rep. 10, 21983 (2020)
83. F. Merkle: Synthetic-aperture imaging with the European Very Large Telescope, J. Opt. Soc. Am. A 5, 904-913 (1988)
84. A. Bulbul, A. Vijayakumar, J. Rosen: Superresolution far-field imaging by coded phase reflectors distributed only along the boundary of synthetic apertures, Optica 5, 1607-1616 (2018)
85. J. P. Desai, R. Kumar, J. Rosen: Optical incoherent imaging using annular synthetic aperture with superposition of phase-shifted optical transfer functions, Opt. Lett. 47, 4012-4015 (2022)
86. J. Rosen, V. Anand, M. R. Rai, S. Mukherjee, A. Bulbul: Review of 3D imaging by coded aperture correlation holography (COACH), Appl. Sci. 9, 605 (2019)
87. J. Rosen, N. Hai, M. R. Rai: Recent progress in digital holography with dynamic diffractive phase apertures [Invited], Appl. Opt. 61, B171-B180 (2022)
88. N. Hai, J. Rosen: Interferenceless and motionless method for recording digital holograms of coherently illuminated 3D objects by coded aperture correlation holography system, Opt. Express 27, 24324-24339 (2019)
89. N. Hai, J. Rosen: Phase contrast-based phase retrieval: a bridge between qualitative phase contrast and quantitative phase imaging by phase retrieval algorithms, Opt. Lett. 45, 5812-5815 (2020)
90. N. Hai, R. Kumar, J. Rosen: Single-shot TIE using polarization multiplexing (STIEP) for quantitative phase imaging, Opt. Lasers Eng. 151, 106912 (2022)
91. N. Hai, J. Rosen: Single-plane and multiplane quantitative phase imaging by self-reference on-axis holography with a phase-shifting method, Opt. Express 29, 24210-24225 (2021)
92. N. Hai, J. Rosen: Coded aperture correlation holographic microscope for single-shot quantitative phase and amplitude imaging with extended field of view, Opt. Express 28, 27372-27386 (2020)
93. N. Dubey, R. Kumar, J. Rosen: COACH-based Shack-Hartmann wavefront sensor with an array of phase coded masks, Opt. Express 29, 31859-31874 (2021)
94. X. Lai, S. Zeng, X. Lv, J. Yuan, L. Fu: Violation of the Lagrange invariant in an optical imaging system, Opt. Lett. 38, 1896–1898 (2013)
95. X. Lai, S. Xiao, Y. Guo, X. Lv, S. Zeng: Experimentally exploiting the violation of the Lagrange invariant for resolution improvement, Opt. Express 23, 31408-31418 (2015)
96. J. Rosen, R. Kelner: Modified Lagrange invariants and their role in determining transverse and axial imaging resolutions of self-interference incoherent holographic systems, Opt. Express 22, 29048-29066 (2014)
97. N. Hai, J. Rosen: Doubling the acquisition rate by spatial multiplexing of holograms in coherent sparse coded aperture correlation holography, Opt. Lett. 45, 3439-3442 (2020)
98. R. W. Gerchberg, W. O. Saxton: A practical algorithm for the determination of phase from image and diffraction plane pictures, Optik 35, 227–246 (1972)
99. A. Bulbul, N. Hai, J. Rosen: Coded aperture correlation holography (COACH) with a superior lateral resolution of FINCH and axial resolution of conventional direct imaging systems, Opt. Express 29, 42106-42118 (2021)
14 Fringe Projection Profilometry
Cheng Jiang, Yixuan Li, Shijie Feng, Yan Hu, Wei Yin, Jiaming Qian, Chao Zuo, and Jinyang Liang
Abstract
Driven by industrial needs, medical applications, and entertainment, three-dimensional (3D) surface measurement techniques have received extensive studies. As one of the most popular techniques for non-contact 3D surface measurement, fringe projection profilometry (FPP) has been growing rapidly over the past decades. By leveraging structured light illumination with triangulation, FPP has demonstrated its uniqueness with high measurement accuracy, fast speed, easy implementation, and robustness in imaging complex shapes of multiple objects. This chapter presents an overview of the mainstream methods for fringe generation and analysis. Typical error sources in FPP are discussed, and corresponding solutions are reviewed. In addition, representative applications of FPP in both industry and scientific studies are included.

C. Jiang · J. Liang: Laboratory of Applied Computational Imaging, Centre Énergie Matériaux Télécommunications, Institut National de la Recherche Scientifique, Université du Québec, Varennes, Québec, Canada
Y. Li · S. Feng · Y. Hu · W. Yin · J. Qian · C. Zuo: Smart Computational Imaging (SCI) Laboratory, Nanjing University of Science and Technology, Nanjing, Jiangsu Province, China; Jiangsu Key Laboratory of Spectral Imaging & Intelligent Sense, Nanjing University of Science and Technology, Nanjing, Jiangsu Province, China
Keywords
Three-dimensional surface measurement · Non-contact surface measurement · Structured light illumination · Triangulation · Calibration · Fringe projection · Fringe analysis · Wrapped phase · Phase unwrapping · Error analysis
14.1 Introduction
Perceiving depth through the binocular stereopsis formed by their eyes, human beings [1] acquire and process three-dimensional (3D) information to recognize and change the world. Inspired by this configuration, conventional cameras, which can only acquire 2D intensity information of a scene, can be used to quantitatively obtain 3D geometric information as a data basis for a clearer understanding of the state and function of real-world objects. 3D surface measurement is also an important topic in computer vision [2]. It has become the foundation of various applications, such as industrial inspection [3–5], reverse engineering [6–8], crime scene investigation [9–11], fluid-flag interactions
[12–14], and biomedical inspection of wounds [15–17]. 3D surface measurement techniques can be classified into two categories: contact and non-contact. Contact methods measure and reconstruct 3D geometry by probing the 3D surface through physical touch [18–20]. While achieving high accuracy, this type of measurement usually has low measurement efficiency. It is also undesirable for the measurement of delicate objects (e.g., cultural heritage artworks and biomedical tissue samples) due to the potential damage induced by physical contact [21]. To surmount these limitations, non-contact 3D surface measurement methods have been developed, including optical interferometry [22, 23], time-of-flight (TOF) techniques [24–28], stereo vision [29–33], shape from focus [34, 35], and structured illumination profilometry [36–39]. Among them, structured illumination profilometry has been proven to be one of the most potent techniques that provide advantages in terms of simple hardware configuration, high measurement accuracy, high point density, high speed, and low cost [40]. A typical structured illumination profilometry system consists of one projection unit and one or more cameras. During the measurement, patterns with known structures are projected sequentially onto the 3D object being measured [41]. Then, deformed structured images of the object under the projections are captured by one or more cameras. By using the triangulation method between the camera and the projector as well as the knowledge of the illumination patterns, the 3D shape of the object can be reconstructed from the captured images based on the pre-calibrated geometric parameters of the system [42]. The most commonly used type of pattern in structured illumination profilometry is fringe patterns [43, 44]. The techniques based on fringe pattern illumination are often referred to as fringe projection profilometry (FPP). The codification schemes used in FPP are mainly focused on fringe patterns with sinusoidal intensity distributions [45, 46]. The projected fringes are distorted by the depth variation on the object surfaces. In
this way, the depth information is encoded into the phase of the captured fringe images. Then, these images are processed by fringe analysis algorithms to extract the phase distribution, which is used to reconstruct the surface of interest in the 3D space based on geometrical relations of the triangulation optical arrangement [47–51]. Benefiting from the continuous and periodic nature of sinusoidal patterns, FPP provides 3D data with both high spatial resolution and high depth accuracy [52, 53]. This chapter aims to review the basic principle of the FPP techniques from the perspectives of fringe generation, fringe analysis, and error analysis. Then, representative applications of FPP-based 3D surface imaging are presented. Finally, the challenges and future developments are discussed.
14.2 Methods
14.2.1 Basic Principle

14.2.1.1 Geometry of the Triangulation Method

Figure 14.1 shows the schematic diagram of a typical 3D shape measurement system based on FPP. The projector illuminates the object with fringe patterns, which are deformed by the geometry of the object's surface. A camera captures the deformed structured images from another perspective. In such a system, the correspondence between the camera pixel C and the phase line coordinate A is established by analyzing the deformation of captured structured images with known features (e.g., a single phase line) projected by the projector. Once the correspondence is obtained by fringe analysis (detailed in Sect. 14.3), with the system calibration information, the world coordinates (x, y, z) of object point P can be reconstructed using triangulation.

14.2.1.2 System Modeling

The transformation from the world coordinate system to the camera/projector coordinate system can be mathematically described by the pinhole
Fig. 14.1 Schematic diagram of a typical fringe projection profilometry system. (Adapted by permission from Optica: Optics Express, Camera-free three-dimensional dual photography, Kilcullen, P., et al. © 2020)
model, which provides the solution for the triangulation method. For a 3D point (x, y, z) in the world coordinate system, in a perfect imaging condition, the pinhole model describes its corresponding pixel on the camera sensor, (uc, vc), to be

$$
s_c \begin{bmatrix} u_c \\ v_c \\ 1 \end{bmatrix}
= A_c \left[ R_c \;\; T_c \right] \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
= \begin{bmatrix} f_{cu} & 0 & u_{cp} \\ 0 & f_{cv} & v_{cp} \\ 0 & 0 & 1 \end{bmatrix} \left[ R_c \;\; T_c \right] \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}.
\tag{14.1}
$$
Overall, this operation can be expressed by (uc , vc ) = Projc (x, y, z). Akin to the pinhole camera model, the projective behavior of the projector can also be described based on the corresponding projector pixel (up , vp ) as
$$
s_p \begin{bmatrix} u_p \\ v_p \\ 1 \end{bmatrix}
= A_p \left[ R_p \;\; T_p \right] \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
= \begin{bmatrix} f_{pu} & 0 & u_{pp} \\ 0 & f_{pv} & v_{pp} \\ 0 & 0 & 1 \end{bmatrix} \left[ R_p \;\; T_p \right] \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}.
\tag{14.2}
$$
Similarly, the projection of the 3D point to the projector is modeled by (up , vp ) = Projp (x, y, z). In Eqs. (14.1) and (14.2), the matrices Ac and Ap contain the intrinsic parameters of the camera and projector: (fcu , fcv ) and (fpu , fpv ) describe the effective focal lengths along the axes of the camera and projector; (ucp , vcp ) and (upp , vpp ) are the coordinates of the principal points of the camera and projector, separately. Defined as the extrinsic parameters of the camera and projector, Rc and Rp are 3 × 3 matrices accounting for rotation. Tc and Tp are 3 × 1 matrices for translation. Finally, sc
and sp are scalar factors for numerical extraction of (uc, vc) and (up, vp). The intrinsic and extrinsic parameters can be extracted by system calibration, which will be described in detail in Sect. 14.2.1.3. Here, we define

$$
A_c \left[ R_c \;\; T_c \right] = \begin{bmatrix} a_{11} & \cdots & a_{14} \\ \vdots & \ddots & \vdots \\ a_{31} & \cdots & a_{34} \end{bmatrix}
\quad \text{and} \quad
A_p \left[ R_p \;\; T_p \right] = \begin{bmatrix} b_{11} & \cdots & b_{14} \\ \vdots & \ddots & \vdots \\ b_{31} & \cdots & b_{34} \end{bmatrix}.
$$

Once the correspondence between the camera pixel (uc, vc) and one of the projector phase line's coordinates (i.e., up or vp) is established, the world coordinates of the 3D point P (i.e., [x, y, z]^T) can be calculated as [54]

$$
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \mathrm{Tri}\left(u_c, v_c, u_p\right)
= \begin{bmatrix}
a_{11} - u_c a_{31} & a_{12} - u_c a_{32} & a_{13} - u_c a_{33} \\
a_{21} - v_c a_{31} & a_{22} - v_c a_{32} & a_{23} - v_c a_{33} \\
b_{11} - u_p b_{31} & b_{12} - u_p b_{32} & b_{13} - u_p b_{33}
\end{bmatrix}^{-1}
\begin{bmatrix} u_c a_{34} - a_{14} \\ v_c a_{34} - a_{24} \\ u_p b_{34} - b_{14} \end{bmatrix}.
\tag{14.3}
$$
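A minimal numerical sketch of Eq. (14.3) is given below; it assumes the calibrated 3 × 4 camera and projector projection matrices are available as NumPy arrays, and the function and variable names are illustrative rather than taken from the cited works.

```python
import numpy as np

def triangulate(uc, vc, up, A_cam, A_proj):
    """Solve Eq. (14.3) for the world coordinates (x, y, z).

    uc, vc : camera pixel coordinates of the point
    up     : corresponding projector phase-line coordinate
    A_cam  : 3x4 camera projection matrix A_c [R_c T_c] (entries a_ij)
    A_proj : 3x4 projector projection matrix A_p [R_p T_p] (entries b_ij)
    """
    a, b = A_cam, A_proj
    M = np.array([
        [a[0, 0] - uc * a[2, 0], a[0, 1] - uc * a[2, 1], a[0, 2] - uc * a[2, 2]],
        [a[1, 0] - vc * a[2, 0], a[1, 1] - vc * a[2, 1], a[1, 2] - vc * a[2, 2]],
        [b[0, 0] - up * b[2, 0], b[0, 1] - up * b[2, 1], b[0, 2] - up * b[2, 2]],
    ])
    rhs = np.array([
        uc * a[2, 3] - a[0, 3],
        vc * a[2, 3] - a[1, 3],
        up * b[2, 3] - b[0, 3],
    ])
    return np.linalg.solve(M, rhs)  # (x, y, z)
```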
14.2.1.3 System Calibration

Accurate system calibration, which calculates the intrinsic and extrinsic parameters of the camera and projector, is crucial to 3D shape measurements (i.e., by solving Eq. (14.3)). In a typical process, a flat calibration plane with known feature points (e.g., checkerboard) is used for camera calibration. The flat checkerboard positioned with different poses is imaged by the camera. All captured checkerboard images are processed using the open-source MATLAB toolbox to extract the grid corners (Fig. 14.2a, b) and calculate the camera's calibration parameters [55]. The calibration of the projector is conventionally difficult and complex because the projector cannot directly capture images like a camera. Zhang and Huang [56] developed a method that is now extensively adopted. This method enables the projector to capture images like a
camera so that the complicated projector calibration problem becomes a well-established camera calibration problem. In particular, a phase-shifting method (detailed in Sect. 14.2.3.2.1.2) is implemented to establish the correspondence between the camera pixels and the projector pixels, which is then used to transform the camera-captured checkerboard images into projector-captured images. As displayed in Fig. 14.2c, both vertical and horizontal sets of fringe patterns with their corresponding center line patterns are projected onto the checkerboard plane. Estimation of the absolute phase from these images is carried out using the four-step phase-shifting algorithm. Then, the absolute phase maps extracted for both the horizontal and vertical directions are used to determine a pixel-wise mapping from a camera-captured image of the checkerboard plane (Fig. 14.2d) to a correctly altered image representing the view of the checkerboard plane from the perspective of the projector (Fig. 14.2e). Finally, the same MATLAB toolbox used in camera calibration is implemented to compute the corresponding calibration parameters of the projector.
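As a concrete reference for the phase-shifting step mentioned above, the snippet below implements the standard four-step algorithm for the wrapped phase; it is a generic textbook version rather than the exact code of Ref. [56], and phase unwrapping (needed for the absolute phase) is omitted.

```python
import numpy as np

def four_step_wrapped_phase(I0, I1, I2, I3):
    """Wrapped phase from four fringe images with phase shifts of
    0, pi/2, pi, and 3*pi/2 (standard four-step phase-shifting algorithm).

    I_k = A + B*cos(phi + k*pi/2)  ->  phi = atan2(I3 - I1, I0 - I2)
    The result is wrapped to (-pi, pi]; unwrapping is required to obtain
    the absolute phase used for the camera-projector mapping."""
    return np.arctan2(I3 - I1, I0 - I2)
```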
14.2.2 Fringe Generation
Fringe generation is the first step of 3D surface measurement based on FPP. Many different types of fringe patterns (e.g., binary, grayscale, and color) have been successfully employed for 3D shape reconstruction. Conventionally, before digital video projectors became available, fringe patterns were generated by shining a laser beam through a mechanically rotating grating (shown in Fig. 14.3a). The physically made pattern is then projected by imaging optics [57–59]. In addition, interference is fundamentally suitable for fringe pattern projection [60–63]. Owing to the wave nature of a coherent laser source, superposing two laser beams with the same wavelength generates a sinusoidal pattern (Fig. 14.3b). By carefully designing the fiber-optic projection system, high-quality sinusoidal fringe patterns are projected onto the targeted object regions [64].
Fig. 14.2 System calibration. (a) Grid corners extraction of a selected camera-captured image. “O” represents the selected corner point; “X” and “Y” represent the 2D coordinates; red points represent the software-extracted grid corners. (b) Camera calibration results show all the positions of the checkerboard poses. (c) Images of a selected horizontal fringe pattern, the horizontal centerline,
a selected vertical fringe pattern, and the vertical centerline. (d) Camera image with selected grids. (e) "Projector image" with the corresponding grids. (Adapted by permission from SPIE: Emerging Digital Micromirror Device Based Systems and Applications XIII, High-speed three-dimensional surface measurement using band-limited illumination profilometry (BLIP), Jiang, C., et al. © 2021)
Digital video projectors have largely facilitated the production of fringe patterns (Fig. 14.3c). Since such fringe patterns can be digitally created using a computer [65–67], digital fringe projection techniques have more relaxed requirements on the coherence of illumination. They also offer more flexible control of the fringe period, the design of phase-shifting sequences, and the pattern refresh rate [68]. These advantages make digital projection the most commonly used projection scheme in modern FPP systems [42].
Among the research and development trends of FPP, high-speed and high-accuracy 3D imaging is a fast-growing demand. To achieve this goal, the key device in the digital video projector is the digital micromirror device (DMD), which is a programmable spatial light modulator [69–73]. The fringe patterns are commonly generated by the DMD. Each micromirror on the DMD can be independently tilted to either +12◦ or −12◦ from its surface normal. In this way, the DMD, as a binary amplitude spatial light modulator, is capable of generating binary fringes with different periods at up to tens of kilohertz (kHz) (Fig. 14.4).
Fig. 14.3 Representative methods of fringe pattern projection. (a) Schematic diagram of a fringe projection profilometry system based on a rotating mask. (Adapted by permission from MDPI: Applied Sciences, High Speed 3D Shape Measurement with Temporal Fourier Transform Profilometry, Zhang, H., et al. © 2019). (b) Geometry model of the interference-based fringe projection system. (Adapted by permission from Elsevier: Optics & Laser Technology,
Phase stabilizing method based on PTAC for fiber-optic interference fringe projection profilometry, Duan, X., et al. © 2013). (c) Measurement system based on a digital projector for fringe pattern projection. (Adapted by permission from Optica: Applied Optics, Self-correction of projector nonlinearity in phase-shifting fringe projection profilometry, Lü, F., et al. © 2017)
Fig. 14.4 Sequential binary fringe pattern generation for 3D imaging. (a) Encoding diagram of 5-bit binary fringe sequence. LSB: least significant bit, MSB: most significant bit. (b) Five corresponding binary fringe patterns.
(Adapted by permission from Optica: Advances in Optics and Photonics, Structured-light 3D surface imaging: a tutorial, Geng J. © 2011)
Though it is a binary amplitude spatial light modulator, the DMD can generate grayscale sinusoids in various ways [74]. The conventional dithering method forms a grayscale image by controlling the average reflectance of each micromirror over time, which clamps the projection rate at hundreds of hertz. To improve the fringe projection speed, binary defocusing techniques [75–77] and band-limited illumination profilometry (BLIP) [12, 54, 78–80] have been developed, both of which can produce a grayscale sinusoidal pattern from a single binary DMD mask [81, 82]. Their fringe pattern projection speeds can keep up with the DMD's refresh rate (i.e., up to tens of kHz), enabling high-speed 3D visualization. Figure 14.5a shows a typical system demonstration of binary defocusing techniques. Although the patterns remain binary at the projector's focal plane, they become blurred on the out-of-focus plane, such that quasi-sinusoidal fringes are created. The quality of these sinusoidal fringes can be improved by proper designs based on dithering methods [83–85] and carefully calculated defocusing operations [86, 87]. Although it overcomes the limitation in projection speed, binary defocusing can bring instability in pattern quality. In this method, quasi-sinusoidal fringe patterns are generated behind the system's image plane. Consequently, the DMD's uneven surface can deform the defocused binary DMD masks, which may decrease the 3D measurement accuracy, especially under coherent illumination. In addition, because the degree of defocusing needed to eliminate the high-order harmonics of the binary mask is frequency-dependent, this technique is less compatible with sinusoidal patterns of different periods. Furthermore, the defocusing operation reduces the contrast of the quasi-sinusoidal fringes, especially at the far
Fig. 14.5 Representative methods of high-speed grayscale fringe pattern generation based on digital projection. (a) Schematic of the binary defocusing technique. (Adapted by permission from Optica: Optics Express, Motion-induced error reduction for binary defocusing profilometry via additional temporal sampling, Wang, Y., et al. © 2019). (b) System schematic of band-limited illumination profilometry (BLIP). The binarization of fringe patterns based on the error-diffusion algorithm is indicated by the yellow box. The band-limited filtering in the Fourier plane is indicated by the blue box. The beam profile after filtering is indicated by the green box. (Adapted by permission from Optica: Optics Letters, Real-time high-speed three-dimensional surface imaging using band-limited illumination profilometry with a CoaXPress interface, Jiang, C., et al. © 2020)
end of the measurement volume. Finally, the binary defocusing technique compromises the depth-sensing range [88]. BLIP can overcome the limitations of binary defocusing techniques [79]. As displayed in Fig. 14.5b, after expansion and collimation, the laser
beam is directed to a DMD at an incident angle of 24◦ to its surface normal. The binary fringe patterns, generated by an error diffusion algorithm from their corresponding grayscale sinusoidal patterns, are loaded onto the DMD (shown in the yellow box of Fig. 14.5b). The generated
binary DMD patterns possess blue-noise characteristics in the spatial frequency domain [69, 81, 89]; within the system's bandwidth, their content precisely matches that of the corresponding grayscale patterns (shown in the blue box of Fig. 14.5b). A band-limited 4f imaging system, which consists of two lenses (L1 and L2 in Fig. 14.5b) and one pinhole that filters high-spatial-frequency noise on the Fourier plane, converts these binary patterns to grayscale fringes at the intermediate image plane. The resultant beam profiles are high-quality grayscale sinusoidal fringes (shown in the green box of Fig. 14.5b). As a result, in BLIP, regardless of their frequencies, sinusoidal fringe patterns are always generated on the image plane of the DMD. Compared to binary defocusing, this configuration eliminates the distortion brought by the uneven DMD surface and maintains the contrast of the projected fringe patterns. It is also inherently compatible with multi-frequency fringe projection, varying working distances, and different fields of view.
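To make the binarization step concrete, the sketch below applies standard Floyd–Steinberg error diffusion to a grayscale sinusoidal fringe, producing a binary mask whose in-band content approximates the sinusoid. The kernel choice and parameters are generic illustrations rather than the exact algorithm used in the cited BLIP systems.

```python
import numpy as np

def error_diffusion_binarize(gray):
    """Binarize a grayscale fringe (values in [0, 1]) by Floyd-Steinberg
    error diffusion, yielding a blue-noise-like binary DMD mask."""
    img = gray.astype(float).copy()
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            new = 1.0 if img[y, x] >= 0.5 else 0.0
            err = img[y, x] - new
            out[y, x] = new
            # Spread the quantization error to unprocessed neighbors
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return out

# Example: a sinusoidal fringe with a 36-pixel period
u = np.arange(1024)
fringe = 0.5 + 0.5 * np.cos(2 * np.pi * u / 36)
mask = error_diffusion_binarize(np.tile(fringe, (768, 1)))
```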
14.2.3 Fringe Analysis
After the fringe projection, the projected patterns are deformed by the geometry of the object's surface. The deformed structured patterns are then captured by the camera. Based on Eq. (14.3), the 3D reconstruction of an object's surface relies on the determination of (uc, vc, up). Thus, the following step is to extract the correspondence between the camera pixel (uc, vc) and the projector coordinate up. The fringe analysis can be categorized into two parts based on the fringe projection scheme: binary fringe and sinusoidal fringe.
14.2.3.1 Binary Fringe Analysis
The binary coding technique, as its name suggests, uses black and white fringes to form a sequence of projection patterns, so that each region of the object's surface, with a width set by the finest binary fringe, carries a unique binary code that corresponds to a unique intensity profile in the captured images [90, 91]. In general, N-bit binary encoding can generate 2^N combinations of stripes in the sequence. As an example, Fig. 14.6 shows an encoding diagram for a 3-bit binary fringe projection that generates eight (i.e., 2^3) areas, each encoded with a unique stripe sequence. In the camera-captured images, each pixel reports a 3-bit binary number, which establishes the correspondence between the camera pixel and the projection column. With the knowledge from calibration (described in Sect. 14.2), the 3D coordinates (x, y, z) can be computed based on triangulation for all eight points along each horizontal line on the object's surface. The binary coding technique is reliable and less sensitive to surface characteristics since only binary values exist in all pixels. However, the 3D imaging resolution is fundamentally limited by the number of binary fringes projected. To reach pixel-level spatial resolution by identifying a specific projection line, a large number of sequential patterns need to be projected, which falls short in high-speed 3D reconstruction [1].
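A minimal sketch of the decoding step for an N-bit binary sequence is shown below. It assumes that the N captured images have already been thresholded into 0/1 maps and are stacked with the most significant bit first; both assumptions concern preprocessing and are not prescribed by the technique itself.

```python
import numpy as np

def decode_binary_code(binary_images):
    """Convert a stack of N thresholded images (N, H, W), MSB first,
    into a per-pixel stripe index in [0, 2**N - 1]."""
    binary_images = np.asarray(binary_images, dtype=np.int64)
    n = binary_images.shape[0]
    weights = 2 ** np.arange(n - 1, -1, -1)               # MSB first
    return np.tensordot(weights, binary_images, axes=1)   # (H, W) stripe index
```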
14.2.3.2 Sinusoidal Fringe Analysis
To reach a high 3D imaging resolution while minimizing the number of projected patterns, sinusoidal fringes are applied with a pixel-level continuous intensity encoding on the projector. Fringe projection techniques extract the phase for 3D reconstruction. The fringe analysis of sinusoidal patterns involves two steps: wrapped phase extraction and phase unwrapping. In the following, we will detail the operating principle of both steps and discuss representative methods.
14.2.3.2.1 Wrapped Phase Extraction
There are two major techniques for wrapped phase extraction: Fourier transform profilometry (FTP) and phase-shifting fringe projection profilometry (PSFPP). Theoretically, FTP can use one fringe pattern to calculate the pixel-wise phase value with the assistance of a spatial carrier [57, 92]. The phase is extracted by applying a properly designed band-pass filter in the spatial frequency domain. The small number of projected patterns (typically fewer than three) makes FTP suitable for the 3D shape measurement of dynamic surfaces [93, 94].
Fig. 14.6 Principle of binary fringe encoding technique. (Adapted by permission from IEEE: IEEE/RSJ International Conference on Intelligent Robots and Systems,
High-speed 3D image acquisition using coded structured light projection, Ishii I., et al. © 2007)
In contrast, PSFPP projects sequences of phase-shifted fringe patterns to extract the phase information [39, 52, 53]. The multiple-shot nature of PSFPP makes it more robust and allows pixel-wise phase measurement with higher resolution and accuracy. Furthermore, PSFPP measurements are more resilient to non-uniform background intensity and fringe modulation.
14.2.3.2.1.1 Fourier Transform Profilometry
As shown in Fig. 14.7, in FTP, one fringe pattern is projected onto the object and then captured by the camera. The intensity profile of a pixel in the camera-captured image can be expressed as
\[
I(u_c, v_c) = I_b(u_c, v_c) + I_{va}(u_c, v_c) \cos\left[ 2\pi f_0 u_c + \phi(u_c, v_c) \right],
\tag{14.4}
\]
where I(uc, vc) represents the intensity in the deformed structured image, and Ib(uc, vc), Iva(uc, vc), f0, and ϕ(uc, vc) represent the background, the intensity variation, the carrier frequency of the captured image, and the depth-dependent phase, respectively. By rewriting Eq. (14.4) in a complex form, the intensity profile can be expressed as
\[
I(u_c, v_c) = I_b(u_c, v_c) + 0.5\, I_{va}(u_c, v_c) \left\{ e^{\,j\left[ 2\pi f_0 u_c + \phi(u_c, v_c) \right]} + e^{-j\left[ 2\pi f_0 u_c + \phi(u_c, v_c) \right]} \right\}.
\tag{14.5}
\]
The 2D Fourier transform of Eq. (14.5) gives
\[
\hat{I}\left(f_x, f_y\right) = \hat{I}_b\left(f_x, f_y\right) + \hat{I}_f\left(f_x - f_0, f_y\right) + \hat{I}_f^{*}\left(f_x + f_0, f_y\right),
\tag{14.6}
\]
where (fx, fy) represents the spatial frequencies, Îb(fx, fy) and Îf(fx − f0, fy) are the Fourier transforms of Ib(uc, vc) and 0.5 Iva(uc, vc) e^{j[2πf0 uc + ϕ(uc, vc)]}, respectively, and * denotes the complex conjugate. Then, a band-pass filter is applied in the spatial frequency domain to remove the background intensity and the conjugate component. After the inverse Fourier transform, the filtered image is expressed by
\[
\tilde{I}(u_c, v_c) = 0.5\, I_{va}(u_c, v_c)\, e^{\,j\left[ 2\pi f_0 u_c + \phi(u_c, v_c) \right]}.
\tag{14.7}
\]
[Fig. 14.7 illustrates the processing pipeline: deformed structured image → FT → Fourier spectrum → band-pass filter → filtered Fourier spectrum → IFT → phase extraction → wrapped phase map.]
Fig. 14.7 Image processing diagram of Fourier transform profilometry. FT: Fourier transform, IFT: inverse Fourier transform. (Adapted by permission from IntechOpen: Digital systems, Fourier transform profilometry in LabVIEW, Marrugo A., et al. © 2018)
Finally, the depth-dependent phase is extracted as
\[
\phi(u_c, v_c) = \tan^{-1} \left\{ \frac{ \mathrm{I}\!\left[ \tilde{I}(u_c, v_c) \right] }{ \mathrm{R}\!\left[ \tilde{I}(u_c, v_c) \right] } \right\},
\tag{14.8}
\]
where I[·] and R[·] respectively denote the imaginary and real parts of a complex variable. Since an arctangent function is used here, Eq. (14.8) calculates a phase value ranging in (−π, +π], which is referred to as the wrapped phase.
One major limitation of conventional FTP lies in the fact that, when the measured surface contains sharp edges, discontinuities, or large surface reflectivity variations, the DC component (or background intensity) may overlap with the two conjugate components in the spatial frequency domain [39]. This spectrum overlapping makes it difficult to operate the band-pass filtering, precluding high-accuracy phase reconstruction of complex objects. This problem can be addressed by projecting an additional pattern during data acquisition [95–97]. This pattern can be a π-shifted fringe pattern or a flat image. By subtracting the two captured images, the DC term and high-frequency noise are removed before the Fourier transform, which improves the accuracy and sensitivity of the phase extraction.
Though dual-shot FTP solves the spectrum overlapping, spectrum leakage is still inevitable. When a fast Fourier transform is applied to a sinusoidal fringe pattern with a non-integer number of periods, the spectrum in the spatial frequency domain spreads to all components instead of concentrating on the carrier frequency of the fringe pattern. This is known as spectrum leakage, which reduces the signal-to-noise ratio of the phase map [98]. An effective approach to reducing spectral leakage is to use windowed FTP [99–102]. In this method, a window (e.g., Hanning, Hamming, Blackman, or Kaiser) reduces the magnitude of the fringes at the edges of the images to such an extent that the non-periodicity is much mitigated, which suppresses the spectrum leakage and hence improves the accuracy of phase extraction.
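The sketch below chains the FTP steps of Eqs. (14.4)–(14.8) with NumPy for fringes oriented along the u axis. The Gaussian band-pass filter, the automatic carrier-frequency estimate, and the explicit carrier removal are illustrative choices made for this example, not the filter design of any specific cited work.

```python
import numpy as np

def ftp_wrapped_phase(image, f0=None, sigma=None):
    """Extract the wrapped phase of a single deformed fringe image by
    Fourier transform profilometry (fringes along the u axis)."""
    h, w = image.shape
    # Remove the per-row mean (a simple stand-in for DC suppression)
    spectrum = np.fft.fft(image - image.mean(axis=1, keepdims=True), axis=1)
    freqs = np.fft.fftfreq(w)                      # cycles per pixel
    if f0 is None:
        # Estimate the carrier from the strongest positive-frequency peak
        mag = np.abs(spectrum).mean(axis=0)
        pos = freqs > 0
        f0 = freqs[pos][np.argmax(mag[pos])]
    if sigma is None:
        sigma = f0 / 4
    # Gaussian band-pass keeping only the +f0 component (Eq. 14.7)
    bp = np.exp(-0.5 * ((freqs - f0) / sigma) ** 2)
    filtered = np.fft.ifft(spectrum * bp, axis=1)
    # Remove the carrier and take the argument (Eq. 14.8); in practice the
    # carrier is often removed by subtracting a reference-plane phase instead
    u = np.arange(w)
    return np.angle(filtered * np.exp(-2j * np.pi * f0 * u))   # (-pi, pi]
```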
14.2.3.2.1.2 Phase-Shifting Fringe Projection Profilometry
The FTP method works well for smooth object surface profiles and low fringe noise. The dual-shot FTP methods can also largely suppress the problems of spectrum overlapping and spectrum leakage. However, in general, FTP still struggles to handle complex surface reconstruction and high noise. To overcome these weaknesses, PSFPP has been developed, which captures more fringe patterns with phase shifting. In PSFPP, phase-shifted sinusoidal fringe patterns (see an example in Fig. 14.8a) are projected onto the object sequentially by a projector and then captured by a camera. As displayed in Fig. 14.8b, the intensity of the camera pixel (uc, vc) in the captured deformed structured images can be expressed as
\[
I_k(u_c, v_c) = I_b(u_c, v_c) + I_{va}(u_c, v_c) \cos\left[ \phi(u_c, v_c) - 2\pi k / K \right].
\tag{14.9}
\]
Here, Ik(uc, vc) represents the intensity in the kth deformed structured image, K is the number of phase-shifting steps, and k ∈ [0, K − 1]. Based on the phase-shifting algorithm, the relative phase value at each pixel can be calculated as
\[
\phi(u_c, v_c) = \tan^{-1} \left[ \frac{ \sum_{k=0}^{K-1} I_k(u_c, v_c) \sin\left( 2\pi k / K \right) }{ \sum_{k=0}^{K-1} I_k(u_c, v_c) \cos\left( 2\pi k / K \right) } \right].
\tag{14.10}
\]
Equation (14.10) extracts the wrapped phase map in the range of (−π, +π] (Fig. 14.8c). Since there are three unknowns, Ib, Iva, and ϕ, three phase-shifted fringe patterns are sufficient to calculate the wrapped phase value.
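As a minimal illustration of Eq. (14.10), the following sketch computes the wrapped phase from K phase-shifted images stacked along the first axis; the function name is illustrative.

```python
import numpy as np

def phase_shifting_wrapped_phase(images):
    """Wrapped phase from K phase-shifted fringe images (K, H, W),
    following the K-step algorithm of Eq. (14.10)."""
    images = np.asarray(images, dtype=float)
    K = images.shape[0]
    k = np.arange(K).reshape(-1, 1, 1)
    num = np.sum(images * np.sin(2 * np.pi * k / K), axis=0)
    den = np.sum(images * np.cos(2 * np.pi * k / K), axis=0)
    return np.arctan2(num, den)   # wrapped to (-pi, pi]
```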
14.2.3.2.1.3 Deep Learning-Based Wrapped Phase Extraction
The two above-described methods still have limitations. Although capable of extracting phase information with higher efficiency by projecting fewer than three patterns, FTP is susceptible to complex surfaces and high noise. In contrast, PSFPP can carry out pixel-wise phase measurements with high accuracy. However, because it requires at least three measurements, it is fragile to disturbances and vibrations. There is thus a trade-off between the reliability and the accuracy of fringe-pattern analysis.
To address this issue, deep learning has been introduced into fringe analysis and shows potential for realizing high-accuracy phase demodulation with a single fringe pattern [48]. To measure the wrapped phase, Eq. (14.8) can be further modified into
\[
\phi(u_c, v_c) = \tan^{-1} \left\{ \frac{ \mathrm{I}\!\left[ \tilde{I}(u_c, v_c) \right] }{ \mathrm{R}\!\left[ \tilde{I}(u_c, v_c) \right] } \right\}
= \tan^{-1} \left[ \frac{ M(u_c, v_c) }{ D(u_c, v_c) } \right],
\tag{14.11}
\]
where the numerator M(uc, vc) characterizes the imaginary part I[Ĩ(uc, vc)] and the denominator D(uc, vc) represents the real part R[Ĩ(uc, vc)]. To emulate this process, a deep neural network (DNN) can be trained to learn M(uc, vc) and D(uc, vc), which are then fed into the arctangent function for retrieving the wrapped phase.
As an example, wrapped phase extraction using a U-Net is illustrated in Fig. 14.9. The U-Net is a widely used DNN for medical image segmentation, and it also shows excellent performance for fringe pattern analysis. In the beginning, the input fringe images are processed by the encoder to obtain C-channel feature tensors with a 1/2 resolution reduction along both the u and v directions. Then, these feature tensors successively go through three convolutional blocks to capture multi-level feature information. Contrary to the encoder subnetwork, the decoder subnetwork then performs up-sampling operations to restore the results to the input image's original size. The up-sampling is implemented by bilinear interpolation and is followed by two convolution layers. In the U-Net, at every step of the decoder, a skip connection is used to concatenate the convolution layers' output with feature maps from the encoder at the same level. This structure helps obtain low-level and high-level information at the same time and weakens the gradient vanishing typical of deep convolutional networks, which is beneficial to achieving accurate results.
Fig. 14.8 Three-step phase shifting for wrapped phase extraction. (a) Three-step phase-shifting patterns. (b) Selected intensity cross-sections of I0, I1, and I2. (c) Extracted wrapped phase map. (Adapted by permission from Elsevier: Optics and Lasers in Engineering, Phase shifting algorithms for fringe projection profilometry: A review, Zuo, C., et al. © 2018)
The last layer of the network is a convolutional layer activated by a linear activation function; it outputs two-channel data consisting of the numerator and the denominator. The objective of the neural network is to minimize the loss function
\[
\mathrm{Loss}(\theta) = \frac{1}{HW} \sum_{(u_c, v_c)}^{(H, W)} \left[ Y_{(u_c, v_c)}(\theta) - G_{(u_c, v_c)} \right]^2,
\tag{14.12}
\]
where θ represents the set of parameters of the neural network, which is adjusted automatically during training. H and W are the camera pixel counts along the u and v axes, respectively. Y is a tensor that consists of the predicted numerator M(uc, vc) and denominator D(uc, vc) (see Eq. (14.11)). G is a tensor that includes the ground-truth numerator and denominator.
To train the U-Net, various scenes are selected to generate the training data. To obtain the ground-truth data, a phase-shifting algorithm with a large number of steps (e.g., a twelve-step phase-shifting algorithm) can be exploited. Of the data, 80% can be used for training and the rest for validation. Before being fed into the network,
Fig. 14.9 Architecture of U-Net for wrapped phase extraction. C: number of channels. More channels (i.e., from 2C to 16C) are used to extract further information for deep layers. The input is a fringe pattern and it learns to predict the numerator M(uc , vc ) and the denominator D(uc , vc ),
which are then fed into the arctangent function. (Adapted by permission from SPIE: Advanced Photonics, Fringe pattern analysis using deep learning, Feng, S., et al. © 2019)
the input fringe pattern is divided by 255 for normalization, which makes learning easier for the network. During training, adaptive moment estimation (ADAM) can be used to tune the parameters to find the minimum of the loss function [103]. In the implementation of ADAM, one can start the training with a learning rate of 10^−4, which is dropped by a factor of 2 if the validation loss has stopped improving for 10 epochs; this helps the loss function get out of local minima during training. All the data processing and the DNN's training and testing can be implemented in Python using any deep learning framework, e.g., Keras, TensorFlow, or PyTorch. A tutorial of the source codes and the trained data set can be found at https://doi.org/10.1038/s41377-022-00714-x [104].
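To make the training recipe concrete, the sketch below shows one way to wire these pieces together in PyTorch: a mean-squared-error loss over the predicted numerator/denominator pair (Eq. (14.12)), ADAM with a learning rate of 10^−4, and a halve-on-plateau schedule with a patience of 10 epochs. The model and data loaders are placeholders standing in for the U-Net and dataset described above; this is not the authors' released code.

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=200, device="cuda"):
    """Minimal training loop: the model maps a normalized fringe image
    (B, 1, H, W) to the two-channel numerator/denominator (B, 2, H, W)."""
    model = model.to(device)
    criterion = nn.MSELoss()                       # Eq. (14.12)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, factor=0.5, patience=10)        # halve LR on plateau
    for epoch in range(epochs):
        model.train()
        for fringe, target in train_loader:        # fringe already / 255
            fringe, target = fringe.to(device), target.to(device)
            optimizer.zero_grad()
            loss = criterion(model(fringe), target)
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(f.to(device)), t.to(device)).item()
                           for f, t in val_loader) / len(val_loader)
        scheduler.step(val_loss)

# The wrapped phase then follows from the two predicted channels:
# phi = torch.atan2(pred[:, 0], pred[:, 1])        # Eq. (14.11)
```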
Deep Learning-Based Color-Encoded Wrapped Phase Extraction
The established deep learning-based wrapped phase extraction method has been further developed and applied in different FPP systems. The flowchart of the deep learning-based color-encoded FPP [105] is shown in Fig. 14.10. The color-encoded fringe pattern, which encodes three fringe patterns of different wavelengths λR, λG, and λB into the red, green, and blue channels of one color image, is projected by a projector and is deformed by the object's surface. The color-encoded image captured by a color camera can be represented as I_RGB^c, and the gray images of its three channels can be expressed as I_R^c, I_G^c, and I_B^c, respectively. With the intensity profiles of the three channels as input, a convolutional neural network (CNN) has been developed to output the numerator M and the denominator D, which are then used to calculate the wrapped phase value following Eq. (14.11). The upper right part of Fig. 14.10 reveals the internal structure of the network, which is a five-path CNN. For each convolutional layer, the kernel size is 3 × 3 with a convolution stride of one, and the output is a 3D tensor of shape (H, W, C), where C represents the number of filters used in each convolutional layer (C = 64). In the first path of the CNN, the inputs are processed by a convolutional layer, followed by four residual blocks and another convolutional layer. Also, implementing shortcuts between residual blocks contributes to convolution stability. In the other four paths, the data are down-sampled by max-pooling layers by two times, four times, eight times, and sixteen times, respectively, for better feature extraction, and then up-sampled by up-sampling blocks to match the original size. The outputs of all paths are concatenated into a tensor with five channels. Finally, three channels are generated in the last convolution layer.
Fig. 14.10 Flowchart of deep learning-based color fringe projection profilometry. Step 1: input the three gray fringe images I_R^c, I_G^c, and I_B^c of the color image channels, and output the numerator and denominator terms M_G^c and D_G^c and the low-accuracy absolute phase ϕ̃_l in the green channel. Step 2: obtain the high-accuracy absolute phase ϕ̃_h by Eqs. (14.11) and (14.13). Step 3: reconstruct the 3D information by the calibration parameters. (Adapted by permission from Optica: Optics Letters, Single-shot absolute 3D shape measurement with deep-learning-based color fringe projection profilometry, Qian J., et al. © 2020)
During the training session, 600 groups of images are captured. Each group contains a color-encoded deformed structured image I_RGB^c, as well as the deformed structured images acquired with twelve-step phase-shifting fringe projection, which are used for the calculation of the ground-truth data (the frequencies of the latter are consistent with those of the three channels of the former). To avoid crosstalk and chromatic aberration problems at the source, when the phase-shifting images are collected, the green fringe patterns are projected, and only the green channels of the captured images are utilized for the labels. Therefore, I_R^c, I_G^c, and I_B^c are fed into the constructed neural network, and the outputs are the numerator M_G^c and the denominator D_G^c in the green channel. Since three fringe images with different wavelengths are used, a low-accuracy absolute phase map ϕ̃_l(uc, vc) can be predicted by the projection distance minimization approach [105], which obtains the pixel-wise qualified fringe order according to the wrapped phase distribution in the three fringe images. After the network predicts M_G^c, D_G^c, and ϕ̃_l(uc, vc), the wrapped phase ϕG(uc, vc) of wavelength λG can be obtained by Eq. (14.11). Then the high-quality absolute phase ϕ̃_h(uc, vc) can be acquired by
\[
\tilde{\phi}_h(u_c, v_c) = \phi_G(u_c, v_c) + 2\pi \cdot \mathrm{Round}\left[ \frac{ \tilde{\phi}_l(u_c, v_c) - \phi_G(u_c, v_c) }{ 2\pi } \right],
\tag{14.13}
\]
which will then be used for the 3D reconstruction (detailed in Sect. 14.2.3.2.2).
Deep Learning-Based Frequency-Multiplexed Wrapped Phase Extraction
Besides being applied in color-encoded FPP systems, deep learning-based wrapped phase extraction has also been utilized in frequency-multiplexed FPP systems for single-shot fringe analysis. As early as 1997, Takeda et al. [106] introduced frequency multiplexing, a long-established technique for sharing spectrum in telecommunications, into FPP to encode two fringe patterns with different spatial carriers into a snapshot. Considering that the traditional frequency-multiplexed method cannot guarantee single-shot high-accuracy 3D imaging, Li et al. [107] proposed a deep learning-based frequency-multiplexed FPP that uses DNNs to robustly recover the wrapped phase from a single fringe image containing spatially multiplexed fringe patterns of different frequencies and then performs phase unwrapping for 3D reconstruction. The flowchart of this work is shown in Fig. 14.11a, which includes generating the spatial-frequency-multiplexed composite fringe pattern (Fig. 14.11b), preparing the network training dataset (Fig. 14.11c), and building a phase retrieval network (Fig. 14.11d). This method constructs two parallel U-shaped networks with the same structure—U-Net1 and U-Net2. U-Net1 takes the dual-frequency composite fringe images with multiple steps of phase shifting as the network input to predict the numerator M and the denominator D of the wrapped phase function. U-Net2 is designed to predict a low-precision absolute phase map ϕ̃_coarse from the dual-frequency composite fringe image input. Although the absolute phase output by U-Net2 is relatively coarse, its accuracy is sufficient for determining the fringe order n of the high-quality wrapped phase map (detailed in Sect. 14.2.3.2.2), which is output by U-Net1. Finally, the high-precision absolute phase map ϕ̃ is obtained by using the wrapped phase map ϕ together with the fringe order n. Based on the pre-calibrated geometric parameters of the FPP system, a high-precision 3D point cloud can be reconstructed, and thus single-shot structured-light 3D imaging can be realized.
This work incorporates the concept of spatial frequency multiplexing into deep learning and designs an unambiguous composite fringe image input to ensure that the networks have robust phase unwrapping performance. Besides, to provide the DNNs with the capability to overcome the serious spectrum aliasing problem that traditional spectrum separation technology cannot deal with, fringe projection images without this problem are used to generate aliasing-free labels. After proper training, the neural networks can directly achieve high-quality phase extraction and robust phase unwrapping from a composite fringe input image for objects with discontinuous or complex surfaces.
14.2.3.2.2 Phase Unwrapping
From Eqs. (14.8) and (14.10), it can be seen that the wrapped phase map cannot be directly converted into a specific projector column coordinate pixel by pixel. Thus, the phase obtained from fringe analysis techniques needs to be unwrapped before the 3D coordinate conversion [49]. This process aims to determine the fringe order n(uc, vc)—an integer number for each pixel—to remove the 2π discontinuities. The unwrapped phase value can be calculated by
\[
\tilde{\phi}(u_c, v_c) = \phi(u_c, v_c) + 2\pi \, n(u_c, v_c).
\tag{14.14}
\]
The state-of-the-art phase unwrapping algorithms can be classified into four major categories: spatial phase unwrapping, temporal phase unwrapping, geometric-constraint-based phase unwrapping, and deep learning-based methods.
Fig. 14.11 Deep learning-based frequency-multiplexed FPP. (a) Flowchart of the fringe analysis. (b) Frequency-multiplexed FPP system and the cross-section intensity distribution of the composite multi-frequency fringe pattern. (c) Part of the network training data sets. (d) The architecture of the U-Net network (U-Net1/U-Net2). (Adapted by permission from Opto-Electronic Journals Group: Opto-Electronic Advances, Deep-learning-enabled dual-frequency composite fringe projection profilometry for single-shot absolute 3D shape measurement, Li Y., et al. © 2022)
14.2.3.2.2.1 Spatial Phase Unwrapping
A spatial phase unwrapping method unwraps the phase by referring to the phase values of neighboring points on a single wrapped phase map. It assumes that the surface geometry is smooth, which means that there are no more than 2π phase changes between two adjacent points along the path [47, 108]. Using this principle, spatial phase unwrapping is carried out from one pixel to its neighboring pixels and through the entire image. Supposing that (uc1, vc1) and (uc2, vc2) are two adjacent pixels on the path, the fringe order n(uc2, vc2) is calculated by
\[
n(u_{c2}, v_{c2}) =
\begin{cases}
n(u_{c1}, v_{c1}) - 1, & \text{if } \Delta\phi = 2\pi \\
n(u_{c1}, v_{c1}) + 1, & \text{if } \Delta\phi = -2\pi \\
n(u_{c1}, v_{c1}), & \text{otherwise}
\end{cases},
\tag{14.15}
\]
where Δϕ = ϕ(uc2, vc2) − ϕ(uc1, vc1). Due to noise and sharp depth changes, the wrapped phase difference Δϕ may not exhibit exact 2π steps, which limits the practical robustness of this approach. Several modified spatial phase unwrapping algorithms have been developed to enhance the unwrapping accuracy [109–111]. They identify and separate the object's surface into high-quality areas (with smooth phase variance) and low-quality areas (with more abrupt phase changes). Phase unwrapping is then carried out from higher-quality phase points to lower-quality phase points to reduce the probability of incorrectly unwrapped phase points. After obtaining the unwrapped phase map, an additional image, which is captured by illuminating the object with a center line on the projector, is used to provide a phase offset value ϕ0 corresponding to the projector's center [12, 56]. Then, the horizontal projector coordinate up can be obtained from the absolute phase value ϕ̃(uc, vc) as
∼ P up = ϕ (uc , vc ) − ϕ0 2πp + 12 (Nu − 1) , (14.16)
where Pp is the fringe period (in pixels) and Nu is the width of the projector (in pixels). Once up is determined with respect to a specific camera pixel (uc, vc), the 3D coordinate (x, y, z) can be extracted by Eq. (14.3).
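The sketch below illustrates Eqs. (14.14)–(14.16) for the simplest case of unwrapping each image row from left to right and converting the result to a projector coordinate. Practical implementations follow quality-guided paths as described above, so this is only a minimal, illustrative version.

```python
import numpy as np

def unwrap_rows(wrapped):
    """Row-wise spatial unwrapping: increment/decrement the fringe order n
    whenever the wrapped phase jumps by about +/- 2*pi (Eqs. 14.14-14.15)."""
    diff = np.diff(wrapped, axis=1)
    steps = np.zeros_like(wrapped)
    steps[:, 1:] = np.where(diff > np.pi, -1, np.where(diff < -np.pi, 1, 0))
    n = np.cumsum(steps, axis=1)                  # fringe order map
    return wrapped + 2 * np.pi * n

def projector_column(unwrapped, phi0, period, width):
    """Convert an absolute phase map to the projector coordinate u_p
    (Eq. 14.16). phi0 is the phase offset of the projector's center line."""
    return (unwrapped - phi0) * period / (2 * np.pi) + 0.5 * (width - 1)
```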
14.2.3.2.2.2 Temporal Phase Unwrapping
Temporal phase unwrapping employs more than one wrapped phase map to provide extra information about the fringe orders. Different from spatial phase unwrapping, this approach allows each spatial pixel of the measured data to be unwrapped independently of its neighbors [112–114]. The temporal phase unwrapping method can recover an unambiguous absolute phase even in the presence of large discontinuities or spatially isolated surfaces.
Temporal phase unwrapping is represented by multi-frequency (hierarchical), multi-wavelength (heterodyne), and number-theoretical approaches. A comparative review has analyzed and discussed the unwrapping success rate and anti-noise performance of these algorithms, revealing that the multi-frequency temporal phase unwrapping approach provides the highest unwrapping reliability and the best noise robustness among them [112, 115, 116]. This strategy projects fringe patterns with different fringe periods. The coarsest fringe pattern has only one fringe, so its phase contains no "wraps" [because its values do not exceed the (−π, π] range] and is used as the fundamental information for the ensuing phase unwrapping. The other phase maps are unwrapped based on their previously unwrapped phase maps, one by one, according to the relation of their frequencies or fringe numbers. Since the phases are unwrapped from the coarsest layer to the finer ones, this method is also well known as the "hierarchical" unwrapping approach [117]. In principle, the coarsest fringe pattern, with no more than one period, has its absolute phase [defined as ϕ̃_0(uc, vc)] equal to the wrapped phase map ϕ0(uc, vc). The unwrapped phase maps of two adjacent sets of fringe projections have the relationship ϕ̃_m(uc, vc) = (P_p^{m−1}/P_p^m) ϕ̃_{m−1}(uc, vc), where m denotes the sequence index of the projected fringe patterns and P_p^m indicates the fringe period of the projected patterns. Then, the absolute phase map for m > 0 can be calculated by
\[
\tilde{\phi}_m(u_c, v_c) = \phi_m(u_c, v_c) + 2\pi \cdot \mathrm{Round}\left[ \frac{ \left( P_p^{m-1} / P_p^{m} \right) \tilde{\phi}_{m-1}(u_c, v_c) - \phi_m(u_c, v_c) }{ 2\pi } \right].
\tag{14.17}
\]
Equation (14.17) shows that, unlike single-frequency phase unwrapping, the phase of each pixel is unwrapped independently, which avoids incorrect depth quantification when analyzing spatially isolated objects. Using this method, the unwrapped phase map of each set of fringe patterns is computed successively. ϕ̃_ma(uc, vc),
which represents the absolute phase map of the densest set of fringe patterns, is used to compute the horizontal coordinate of the projector up as
\[
u_p = \tilde{\phi}_{ma}(u_c, v_c)\, \frac{P_p^{ma}}{2\pi} + \frac{1}{2}\left( N_u - 1 \right).
\tag{14.18}
\]
For most algorithms to achieve high 3D reconstruction accuracy, the total number of relative phase maps is typically more than five. In applications where the measurement time needs to be shortened, the required number of maps can be reduced to two [115].
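A compact sketch of the hierarchical recursion of Eq. (14.17), followed by the conversion of Eq. (14.18), is given below; it assumes the wrapped phase maps are supplied from the coarsest (single-fringe) pattern to the densest one, with the first map already free of wraps.

```python
import numpy as np

def multifrequency_unwrap(wrapped_phases, periods):
    """Hierarchical temporal unwrapping (Eq. 14.17).

    wrapped_phases : list of (H, W) wrapped phase maps, coarsest first;
                     the first map is assumed to be already unwrapped.
    periods        : fringe periods (in projector pixels) in the same order.
    """
    absolute = wrapped_phases[0]
    for m in range(1, len(wrapped_phases)):
        scaled = absolute * (periods[m - 1] / periods[m])
        order = np.round((scaled - wrapped_phases[m]) / (2 * np.pi))
        absolute = wrapped_phases[m] + 2 * np.pi * order
    return absolute

def densest_projector_column(absolute, period, width):
    """Projector coordinate from the densest absolute phase map (Eq. 14.18)."""
    return absolute * period / (2 * np.pi) + 0.5 * (width - 1)
```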
14.2.3.2.2.3 Geometric-Constraint-Based Phase Unwrapping
All the aforementioned phase unwrapping methods require at least one additional image for absolute phase recovery, which can be challenging for high-speed applications since the object is assumed to be static during the acquisition of the desired number of images for 3D reconstruction. Thanks to reduced hardware costs and the flexibility of accurate system calibration, methods have been developed that use additional hardware components to unwrap the absolute phase without acquiring any additional images. The multi-view-geometry-based absolute phase unwrapping method uses the wrapped phase, the epipolar geometry, the measurement volume, and the phase monotonicity as constraints to determine the fringe order n(uc, vc) for each camera pixel and unwrap the phase without requiring additional images [78, 118–120]. Geometric-constraint-based approaches efficiently solve the phase ambiguity problem for measuring complex surfaces without requiring additional projection patterns. Figure 14.12 illustrates one example of the trio-geometric-constraint-based approach for fringe order determination, which is implemented in the framework of dual-view BLIP with temporally interlaced acquisition (TIA). In this approach, four deformed structured images are captured alternately by two high-speed CMOS cameras placed side by side (Fig. 14.12a). Depending on their roles in image reconstruction,
they are denoted as the main camera and the auxiliary camera. Synchronized by the DMD’s trigger signal, each camera captures half of the sequence (i.e., two images, see Fig. 14.12b). In data acquisition, four-step phase-shifting fringe patterns, whose phases are equally shifted
by π/2, illuminate a 3D object. For a pixel (u'ca, v'ca) in the images of the auxiliary camera that perfectly corresponds to (ucm, vcm), the rearrangement of Eq. (14.9) allows us to write
\[
I_0(u_{cm}, v_{cm}) - I_1(u_{cm}, v_{cm}) = I_3\left(u'_{ca}, v'_{ca}\right) - I_2\left(u'_{ca}, v'_{ca}\right).
\tag{14.19}
\]
Each side of Eq. (14.19) contains images captured by the same camera and computes an image with a residual fringe component. Retaining sinusoidal characteristics, this residual improves the efficiency of line-constrained searches by regularizing the encountered patterns of local maxima and minima and by including additional phase information. Moreover, by interpreting its right-hand side as a continuously varying function along the epipolar line, Eq. (14.19), together with bi-linear interpolation, allows for the selection of discrete candidates with sub-pixel accuracy. Thus, Eq. (14.19) is selected as the intensity-matching condition. Following this condition, an algorithm has been developed to recover the 3D image of the object pixel by pixel. In brief, for a selected pixel (ucm, vcm) of the main camera, the algorithm locates a matching point (u'ca, v'ca) in the images of the auxiliary camera. From knowledge of the camera calibration, this point then enables determining estimated 3D coordinates as well as recovering a wrapped phase. From knowledge of the projector calibration, this phase value is used to calculate a horizontal coordinate on the projector's plane. A final 3D point is then recovered using the coordinate-based method. A flowchart of this algorithm with illustrative data is provided in Fig. 14.12c.
In the first step, [I0(ucm, vcm) + I1(ucm, vcm)]/2 and [I2(u'ca, v'ca) + I3(u'ca, v'ca)]/2 are calculated and processed to eliminate pixels with low intensities. The thresholding results in a
Fig. 14.12 Operating principle of TIA-BLIP. (a) System schematic. (b) Timing diagram and acquisition sequence. te: camera's exposure time. (c) Flowchart of coordinate-based 3D point determination with illustrative data. (um, vm), coordinates of the point to be matched for the main camera; (u'e, v'e), coordinates of the estimated corresponding point for the auxiliary camera; (x, y, z), recovered 3D coordinates; ri, horizontal distance between the candidates and the estimated corresponding point; ωm, phase value of the selected point in the main camera calculated by the Fourier transform profilometry method; ω'ai, phase value of the candidate points in the auxiliary camera calculated by the Fourier transform profilometry method; ϕ'ai, phase value calculated by the phase-shifting method; ϕ̃pi, phase value determined on the projector's plane; Hi, 3D points determined by the candidates; Pm, principal point of the main camera; Pa, principal point of the auxiliary camera; ΔIm = I0(um, vm) − I1(um, vm); ΔIep, intensity profile of I3 − I2 along the epipolar line. (Adapted by permission from Optica: Photonics Research, High-speed dual-view band-limited illumination profilometry using temporally interlaced acquisition, Jiang, C., et al. © 2020)
binary quality map (see Step I in Fig. 14.12c). Subsequently, only pixels that fall within the quality map of the main camera are considered for 3D information recovery. In the second step, the selected pixel (ucm, vcm) determines an epipolar line containing the matching point within the auxiliary camera's images. Then, the algorithm extracts the candidates (u'cai, v'cai) that satisfy the intensity-matching condition (i.e., Eq. (14.19)) in addition to three constraints (see Step II in Fig. 14.12c). The subscript "i" denotes the ith candidate. The second constraint requires that candidates occur within a segment of the epipolar line determined by a fixed transformation that approximates the location of the matching point. This approximation is provided by a 2D projective transformation (or homography) that determines the estimated corresponding point (u'e, v'e). Once (u'e, v'e) is determined, the search along the epipolar line is confined to the segment occurring over the horizontal interval [u'e − r0, u'e + r0], where r0 is an experiment-dependent constant. In general, r0 should be as
small as possible while still covering the targeted depth range. The third constraint requires that the selected point and the candidates have the same sign of their wrapped phases. Estimates of the wrapped phases are obtained using the FTP technique. The constraint requires the wrapped phase estimate ωm of (ucm, vcm) and that of the candidate pixel, ω'ai, to have the same sign in the interval (−π, π].
In the third step, three criteria are used to calculate penalty scores for each candidate (see Step III in Fig. 14.12c). The first and primary criterion compares the phase values of the candidates obtained using two methods. First, the phase inferred from the intensities of a candidate and the pixel (ucm, vcm) is calculated by
\[
\phi'_{ai} = \tan^{-1} \left[ \frac{ I_1(u_{cm}, v_{cm}) - I_3\left(u'_{cai}, v'_{cai}\right) }{ I_0(u_{cm}, v_{cm}) - I_2\left(u'_{cai}, v'_{cai}\right) } \right].
\tag{14.20}
\]
Meanwhile, for each candidate (u'cai, v'cai), the coordinate triple (ucm, vcm, u'cai) and knowledge of the camera calibration allow determining an estimated 3D point Hi by using the stereo vision method. In addition, with the knowledge of the projector calibration, a point with coordinates (upi, vpi) on the projector's plane is determined for each candidate. Then, an unwrapped phase value ϕ̃pi is calculated by
\[
\tilde{\phi}_{pi} = \frac{2\pi}{P_p} \left( u_{pi} - u_{pd} \right),
\tag{14.21}
\]
where upd is a horizontal datum coordinate on the projector's plane associated with the zero phase. Since these independently inferred phase values must agree if the candidate correctly matches (ucm, vcm), a penalty score Ai, defined as a normalized difference of these two phase values, is calculated by
\[
A_i = \frac{ \left| R\!\left( \phi'_{ai} - \tilde{\phi}_{pi} \right) \right| }{ \pi },
\tag{14.22}
\]
where the rewrapping function R(•) computes the rewrapped difference between the wrapped and unwrapped phase values. To improve the robustness of the algorithm, two additional criteria are implemented using data available from the second step. Bi is a normalized distance score favoring candidates located closer to the estimated matching point (u'e, v'e), which is calculated by Bi = |u'e − u'ai| / r0. Moreover, Ci is a normalized difference of the wrapped phase values ωm and ω'ai, which is calculated by Ci = |R(ωm − ω'ai)| / π. A total penalty score Si for each candidate is then computed as a weighted linear combination of the three individual scores, where the normalized weights are empirically chosen to lead to the results that are most consistent with physical reality. Finally, the candidate with the minimum Si is chosen as the matching point (u'ca, v'ca). Its phase values, which are calculated by using Eqs. (14.20) and (14.21), are denoted as ϕ'a(u'ca, v'ca) and ϕ̃p(u'ca, v'ca), respectively.
In the final step, the algorithm determines the final 3D coordinates (see Step IV in Fig. 14.12c). First, ϕ'a(u'ca, v'ca) is unwrapped as ϕ'a(u'ca, v'ca) + 2πn(u'ca, v'ca). Then, the coordinate on the projector's plane, up, is recovered with sub-pixel resolution as
\[
u_p = u_{pd} + P_p \left[ \frac{ \phi'_a\left(u'_{ca}, v'_{ca}\right) }{ 2\pi } + n\left(u'_{ca}, v'_{ca}\right) \right],
\tag{14.23}
\]
from which the final 3D coordinates (x, y, z) are computed using the calibration information associated with the coordinate triple (ucm, vcm, up).
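To summarize the scoring logic in code form, the sketch below combines the three penalty terms for a set of candidates and selects the one with the minimum total score. The rewrapping helper and the weight values are illustrative stand-ins, since the cited work states only that the weights are chosen empirically.

```python
import numpy as np

def rewrap(delta):
    """Wrap a phase difference back into (-pi, pi]."""
    return (delta + np.pi) % (2 * np.pi) - np.pi

def select_candidate(phi_a, phi_p, omega_m, omega_a, u_a, u_e, r0,
                     weights=(0.6, 0.2, 0.2)):
    """Pick the matching candidate with the minimum total penalty score.

    phi_a   : phase of each candidate from the phase-shifting method (Eq. 14.20)
    phi_p   : unwrapped phase of each candidate on the projector plane (Eq. 14.21)
    omega_m : FTP wrapped phase of the selected main-camera pixel
    omega_a : FTP wrapped phases of the candidates
    u_a     : horizontal coordinates of the candidates on the epipolar line
    u_e     : estimated corresponding coordinate from the homography
    r0      : half-width of the search segment
    """
    A = np.abs(rewrap(phi_a - phi_p)) / np.pi       # phase-consistency term
    B = np.abs(u_e - u_a) / r0                      # distance term
    C = np.abs(rewrap(omega_m - omega_a)) / np.pi   # wrapped-phase term
    S = weights[0] * A + weights[1] * B + weights[2] * C
    return int(np.argmin(S)), S
```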
14.2.3.2.2.4 Deep Learning-Based Phase Unwrapping
Deep Learning-Based Temporal Phase Unwrapping
For high-speed FPP applications based on temporal phase unwrapping, to improve the measurement efficiency, it is necessary to make multi-frequency temporal phase unwrapping as reliable as possible while using a minimum number of projection patterns. In the simplest and most efficient case of multi-frequency temporal phase unwrapping, two groups of phase-shifting fringe patterns with different frequencies are used: the high-frequency group is applied for the 3D reconstruction of the tested object, and the unit-frequency group is used to assist the phase unwrapping of the high-frequency wrapped phase.
The final measurement precision or sensitivity is determined by the number of fringes used within the high-frequency pattern, under the precondition that its absolute phase can be successfully recovered without any fringe order errors. However, due to the non-negligible noise and other error sources in actual measurements, the period of the high-frequency fringes is generally restricted to about Round(Nu/16), limiting the measurement accuracy. On the other hand, the use of an additional intermediate set of fringe patterns (i.e., three sets of phase-shifting patterns in total) can unwrap the phase with a higher frequency or a higher success rate. As a result, the increased number of required patterns reduces the measurement efficiency of FPP, which is not suitable for measuring dynamic scenes.
To solve the above problem, a deep learning-based temporal phase unwrapping method has been presented to improve the performance of absolute phase extraction. As shown in Fig. 14.13, this learning-based framework uses two wrapped phases as input, which are extracted from a single-period fringe projection and a high-frequency fringe projection, respectively. The single-period fringe projection results in a wrapped phase value equal to the unwrapped phase value (i.e., ϕl = ϕ̃l), which has a lower accuracy compared to the wrapped phase value ϕh extracted using the higher-frequency fringe projection. This framework aims to output an unwrapped phase map with high reliability. To realize the highest unwrapping reliability, a residual network is used as the basic skeleton of the neural network, which can speed up the convergence of deep networks and improve network performance by adding layers with considerable depth. Then, multi-scale pooling layers are introduced to down-sample the input tensors, which can compress and extract the main features of the tensors, reducing the computational complexity and preventing over-fitting. After the pooling layers, however, the tensor sizes in the different paths are inconsistent. Therefore, upsampling blocks are used to make the sizes of
the tensors in the respective paths uniform. In summary, the whole network mainly consists of convolution layers, residual blocks, pooling layers, upsampling blocks, and concatenate layers. To maximize the efficiency of the model, after repeatedly adjusting the hyper-parameters of the network (i.e., the number of layers and nodes), the number of residual blocks for each path is set to 4, and the basic filter number of the convolution layers is set to 50. The tensor data of each path in the network are down-sampled to 1, 1/2, 1/4, and 1/8 of the original size by adopting pooling layers with different scales, and then different numbers of upsampling blocks are adopted to make the sizes of the tensors in the corresponding paths uniform. Besides, it has been found that implementing shortcuts between residual blocks contributes to making the convergence of the network more stable. Furthermore, to avoid over-fitting, a common problem of DNNs, l2 regularization is adopted in each convolution layer of the residual blocks and upsampling blocks instead of in all convolution layers of the network, which can enhance the generalization ability of the network.
Although the purpose of building the network is to achieve phase unwrapping and obtain the absolute phase, there is no need to directly set the absolute phase as the network's label. Since the absolute phase map ϕ̃h is simply the linear combination of the period order map nh and the wrapped phase map ϕh according to Eq. (14.14), ϕ̃h can be obtained immediately if nh is known. Once nh is set as the output data of the network, the purpose of the network is to implement semantic segmentation, which is a pixel-wise classification. The complexity of the network is then greatly reduced, so that the loss of the network converges faster and more stably, and the prediction accuracy of the network is effectively improved. Different from traditional spatial and temporal phase unwrapping, in which the phase unwrapping is performed by utilizing the phase information solely in the spatial or temporal domain, the DNN is able to learn feature extraction and data screening. Thus, it can exploit the phase information in the spatial and temporal domains simultaneously, providing more degrees of freedom and
Fig. 14.13 Diagram of the deep learning-based temporal phase unwrapping. The whole framework is composed of a data process, a deep neural network (DNN), and phase-to-height mapping. The data process is performed to extract phases and remove the background from fringe images. The DNN, consisting of convolutional layers,
pooling layers, residual blocks, upsampling blocks, and a concatenate layer, is used to predict the period order map nh from the input data (ϕ̃l and ϕh). (Adapted by permission from Springer Nature: Scientific Reports, Temporal phase unwrapping using deep learning, Yin, W., et al. © 2019)
possibilities to achieve significantly better unwrapping performance.
Deep Learning-Assisted Geometric Constraints-Based Phase Unwrapping
The stereo phase unwrapping technologies based on geometric constraints can eliminate the phase-ambiguity problem through the spatial relationships between multiple cameras and one projector without projecting any auxiliary patterns. Although stereo phase unwrapping maximizes the efficiency of FPP for 3D measurement in high-speed scenes, it still has some defects, such as a limited measurement volume, the inability to robustly unwrap high-frequency fringe images, a loss of measurement efficiency due to the reliance on multi-frame phase acquisition methods, and the complexity of algorithm implementation. Inspired by the successes of deep learning in FPP and the advances in geometric constraints, Qian et al. [121] combined CNNs and stereo phase unwrapping to develop a phase unwrapping method based on deep-learning-enabled geometric constraints. This approach, whose flowchart is shown in Fig. 14.14, constructs two four-path CNNs (shown as CNN1 and CNN2) with the same structure (except for different inputs and outputs) to learn to obtain high-quality phase information and unwrap the wrapped phase.
As discussed in Sect. 14.2.3.2.1.3, Step 1 aims to achieve high-quality wrapped phase retrieval. The physical model of the conventional phase-shifting algorithm is considered. The single-frame fringe images captured by camera 1 (i.e., I_k^C1) and camera 2 (i.e., I_k^C2) are separately input into CNN1, and the outputs are the numerators (M^C1 and M^C2) and denominators (D^C1 and D^C2) of the arctangent function corresponding to the two fringe patterns, instead of directly linked wrapped phases, since such a strategy bypasses the difficulties associated with reproducing abrupt 2π phase wraps and provides a high-quality phase estimate. In Step 2, after predicting the numerator and denominator terms, high-accuracy wrapped phase maps ϕ^C1 and ϕ^C2 can be obtained according to Eq. (14.11).
Fig. 14.14 Flowchart of the deep-learning-enabled geometric constraints and phase unwrapping method. (Adapted by permission from American Institute of
Physics: APL Photonics, Deep-learning-enabled geometric constraints and phase unwrapping for single-shot absolute 3D shape measurement, Qian, J., et al. © 2019)
Step 3 realizes the phase unwrapping. Inspired by geometry-constraint-based stereo phase unwrapping, which can remove phase ambiguity through the spatial relationships between multiple perspectives, the fringe patterns of the two perspectives are fed into CNN2. Meanwhile, the idea of assisting phase unwrapping with reference-plane information is integrated into the network, and the data of a reference plane are added to the inputs to allow CNN2 to acquire the fringe orders of the measured object more efficiently. Thus, the raw fringe patterns captured by the two cameras, as well as the reference information (containing two fringe images of the reference plane captured by the two cameras, and the fringe order map of the reference plane in the perspective of camera 1), are fed into CNN2. It is worth mentioning that the reference-plane information is obtained in advance, and subsequent experiments do not need to obtain it repeatedly, which means that just one extra set of reference information is necessary for the whole setup. The output of CNN2 is the fringe order map n^C1 of the measured object in camera 1.
In Step 4, through the wrapped phases and the fringe orders obtained in the previous steps, the high-quality unwrapped phase ϕ̃^C1 can be recovered according to Eq. (14.14). Finally, in Step 5, after acquiring the high-accuracy absolute phase, the 3D reconstruction can be carried out with the calibration parameters between the two cameras. With extensive data training, the network can learn to obtain the physically meaningful absolute phase from a single-frame projection without the conventional "step-by-step" calculation.
14.2.4 Error Analysis
14.2.4.1 Limited Dynamic Range
Besides exhibiting diffuse reflection, real objects' surfaces can have low-reflectivity (dark) areas and glare (specular reflection) [122]. FPP, as an image-based 3D shape measurement technique, needs a high dynamic range (HDR) to tolerate such reflectance differences. This requirement has long been considered a challenge. Multiple
methods have been developed to solve this problem. They can be classified into two groups: equipment-based techniques and algorithm-based techniques. For the equipment-based techniques [123–126], the emphasis is on finding the optimal parameters of the equipment used, e.g., the proper exposure time of the camera or the desired brightness of the projected light, to capture both the low- and high-reflectivity surfaces in the fringe images. They can be further divided into camera-based techniques, projector-based techniques, and additional-equipment-based techniques.
For camera-based techniques, the most common approach is to adjust the camera's exposure time [127, 128]. A reduced exposure time is suitable for high-reflectivity areas, while a long exposure is appropriate for low-reflectivity surfaces. The fringe images are captured several times with different exposures. Then, the captured images are fused into new structured images in which, for each part of the surface, the pixels with the highest but unsaturated intensity are retained. This exposure-adjustment approach is easy to implement but sensitive to ambient light. The selection of the exposure times is also empirical, and many exposure times are required. An additional camera view can be used to improve the camera-based approach [129]. Viewing the object from different angles, the two cameras are not likely to be saturated at the same time. In this way, the two views can be used jointly to avoid the saturation error.
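As a concrete illustration of the multi-exposure fusion rule described above, the following sketch keeps, for each pixel, the brightest value that remains below a saturation threshold; the threshold value assumes an 8-bit camera and is an assumption of this example.

```python
import numpy as np

def fuse_exposures(image_stack, saturation=254):
    """Fuse fringe images captured at several exposures (E, H, W), sorted
    from shortest to longest: each pixel keeps the brightest value that is
    still below the saturation threshold."""
    stack = np.asarray(image_stack, dtype=float)
    valid = stack < saturation
    # Mask out saturated samples, then take the per-pixel maximum
    masked = np.where(valid, stack, -np.inf)
    fused = masked.max(axis=0)
    # Pixels saturated at every exposure fall back to the shortest exposure
    all_saturated = ~valid.any(axis=0)
    fused[all_saturated] = stack[0][all_saturated]
    return fused
```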
of the measured object's surface, which is a limitation for geometrically complex objects in dynamic scenes.

In addition to adjusting the parameters of the camera or the projector, one can also resort to additional equipment to capture optimal fringes. One solution is to use cross-polarization detection [132]. Compared with diffuse reflection, the specular reflection from shiny surfaces is usually much more strongly polarized. Thus, the two components can be separated by exploiting their difference in polarization. The projector is equipped with a linear polarizer at a fixed orientation. The camera captures two images. The first image is filtered by a linear polarizer whose transmission axis is parallel to that of the projector's polarizer. The second image is filtered by a polarizer with a perpendicularly oriented transmission axis. The saturated intensity from specular reflection can be removed in cross-polarization detection. Moreover, Salahieh et al. [133] proposed using a polarization camera to capture the structured images. The projected fringes are linearly polarized before being projected onto the object and captured by a polarization camera with four polarization states. The system targets eliminating saturation and improving the fringe quality by finding the best polarization channel for each pixel across all fringe images. One of the polarized channels is used to retrieve shiny surfaces by replacing saturated pixels with those from the second most intense channel. The other channels enhance the fringe contrast for the remaining areas by identifying the maximum non-saturated channel for each pixel. Finally, Cai et al. [134] applied a structured light field camera for HDR measurements. The structured light field contains information about ray direction and phase-encoded depth, through which the scene depth can be estimated from different directions. The depth information can be calculated independently from the phase map of each ray in the structured light field, giving this method the ability of multidirectional depth reconstruction and selection for HDR 3D imaging.
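As a minimal sketch of the multi-exposure fusion used by the camera-based techniques above (not the exact procedure of any cited work), the snippet below keeps, for every pixel, the exposure with the highest intensity that remains unsaturated across all phase steps; the saturation level and array shapes are assumptions.

import numpy as np

def fuse_exposures(fringe_stacks, sat_level=250):
    # fringe_stacks: array of shape (E, K, H, W) -- E exposure times, K phase steps.
    stacks = np.asarray(fringe_stacks, dtype=np.float64)
    E, K, H, W = stacks.shape
    mean_int = stacks.mean(axis=1)                 # (E, H, W) brightness per exposure
    peak_int = stacks.max(axis=1)                  # (E, H, W) used to detect saturation
    # Score each exposure per pixel: prefer the brightest exposure that never saturates.
    score = np.where(peak_int < sat_level, mean_int, -np.inf)
    best = score.argmax(axis=0)                    # (H, W) chosen exposure index
    # (pixels saturated in every exposure fall back to the first exposure)
    fused = np.zeros((K, H, W))
    for e in range(E):                             # copy the chosen exposure pixel-wise
        mask = (best == e)
        fused[:, mask] = stacks[e][:, mask]
    return fused                                   # one fused set of K structured images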
The algorithm-based techniques [135–137] rely on software to extract the depth information from the raw fringe images. Different from the equipment-based methods mentioned above, the algorithm-based techniques calculate 3D reconstructions from saturated fringe images [138, 139]. They are especially useful when adjustment of the camera or the projector is not allowed, or when additional equipment is not available. Chen et al. [140] found that image saturation can be overcome if the number of phase-shifting steps is large enough. The projected fringe intensity is denoted by $I_k^p(u_c, v_c)$. In data processing, the intensity difference between the captured pixel intensity and the projected fringe can be expressed by
$$\Delta I_k(u_c, v_c) =
\begin{cases}
0, & \text{if } I_k^p(u_c, v_c) \le I_{\max} \\
I_k^p(u_c, v_c) - I_{\max}, & \text{if } I_k^p(u_c, v_c) > I_{\max}
\end{cases} \tag{14.24}$$

where $I_{\max}$ is the maximum allowed intensity of a sensor, e.g., 255 for an 8-bit sensor. The phase error due to saturation can be described as

$$\Delta\phi_s(u_c, v_c) = \frac{\partial \phi(u_c, v_c)}{\partial I_b(u_c, v_c)}\, \Delta I_k(u_c, v_c). \tag{14.25}$$

According to Eq. (14.10), $\partial \phi(u_c, v_c)/\partial I_b(u_c, v_c)$ can be written as

$$\frac{\partial \phi(u_c, v_c)}{\partial I_b(u_c, v_c)} =
\frac{\partial}{\partial I_k^p(u_c, v_c)}
\left\{ \tan^{-1}\!\left[ \frac{\sum_{k=0}^{K-1} I_k^p(u_c, v_c)\sin\!\left(\frac{2\pi k}{K}\right)}
{\sum_{k=0}^{K-1} I_k^p(u_c, v_c)\cos\!\left(\frac{2\pi k}{K}\right)} \right] \right\}
\cdot \frac{\partial I_k^p(u_c, v_c)}{\partial I_b(u_c, v_c)}. \tag{14.26}$$

By combining Eqs. (14.9), (14.25), and (14.26), the phase error can be expressed as

$$\Delta\phi_s(u_c, v_c) = \frac{2}{K I_{va}(u_c, v_c)} \sum_{k=0}^{K-1} \sin\!\left[ \phi(u_c, v_c) + \frac{2\pi k}{K} \right] \Delta I_k(u_c, v_c). \tag{14.27}$$

Simulation work has been conducted to show the performance of this method [122, 140]. Figure 14.15 shows that the phase error reduces sharply with the increase in the number of phase-shifting steps. Phase shifting with seven steps or more is practically effective to produce a good reconstruction result.

14.2.4.2 Nonlinearity in Projectors and Cameras
The nonlinear intensity response of FPP systems can cause the captured fringe patterns to have non-sinusoidal waveforms, which leads to an additional phase measurement error [141]. The photometric calibration technique can calibrate the overall intensity response curve of the FPP system and eliminate the non-sinusoidal phase error effectively. However, the whole calibration procedure is time- and labor-consuming. Instead, an iterative phase error compensation algorithm has been developed and used to compensate for the nonlinear phase error [142]. In nonlinear phase error analysis, the actual non-sinusoidal waveforms are approximated as an ideal sinusoidal function with high-order harmonics. Specifically, Fourier series expansion of the actual intensity of a captured image with a nonlinear response up to fifth-order harmonics (which is the commonly observed case) yields

$$I_k^{nl}(u_c, v_c) \cong a_0 + \sum_{l=1}^{5} a_l \cos\!\left\{ l\left[ \phi(u_c, v_c) + \frac{2\pi k}{K} \right] \right\}, \tag{14.28}$$

where $I_k^{nl}(u_c, v_c)$ represents the actual intensity of the captured image, $a_1, a_2, \ldots, a_5$ are constants, and $a_2, a_3, a_4, a_5 \ll a_1$. Then, the actual phase can be calculated by

$$\phi_a(u_c, v_c) = \tan^{-1}
\frac{\sum_{k=0}^{K-1} \left\{ a_0 + \sum_{l=1}^{5} a_l \cos\!\left[ l\left( \phi(u_c, v_c) + \frac{2\pi k}{K} \right) \right] \right\} \sin\frac{2\pi k}{K}}
{\sum_{k=0}^{K-1} \left\{ a_0 + \sum_{l=1}^{5} a_l \cos\!\left[ l\left( \phi(u_c, v_c) + \frac{2\pi k}{K} \right) \right] \right\} \cos\frac{2\pi k}{K}}. \tag{14.29}$$

For a three-step phase-shifting algorithm, displayed in Fig. 14.16a, b, the residual phase error $\Delta\phi_{nl}^3(u_c, v_c) = \phi_a^3(u_c, v_c) - \phi(u_c, v_c)$ can be derived as

$$\Delta\phi_{nl}^3(u_c, v_c) = \tan^{-1}
\frac{-(a_2 - a_4)\sin[3\phi(u_c, v_c)] - a_5 \sin[6\phi(u_c, v_c)]}
{a_1 + (a_2 + a_4)\cos[3\phi(u_c, v_c)] - a_5 \cos[6\phi(u_c, v_c)]}
\cong -c \sin[3\phi(u_c, v_c)]. \tag{14.30}$$

Here $c$ denotes the amplitude of the phase error, which can be estimated from the phase error distribution of the reference plane. Similarly, for the four-step and five-step phase-shifting algorithms, the corresponding phase errors are derived to be $\Delta\phi_{nl}^4(u_c, v_c) \cong -c \sin[4\phi(u_c, v_c)]$ and $\Delta\phi_{nl}^5(u_c, v_c) \cong -c \sin[5\phi(u_c, v_c)]$. Based on the analysis of the phase error, an iterative phase error compensation algorithm for the three-step phase-shifting method is proposed as

$$\phi^{i+1}(u_c, v_c) = \phi_a^3(u_c, v_c) + c \sin\!\left[ 3\phi^i(u_c, v_c) \right], \tag{14.31}$$

where $\phi^i(u_c, v_c)$ and $\phi^{i+1}(u_c, v_c)$ are the solutions of the $i$th and the $(i+1)$th rounds of iteration. The initialization of the iteration can be set to $\phi^0(u_c, v_c) = \phi_a^3(u_c, v_c)$. The convergence condition is set as $|\phi^{i+1}(u_c, v_c) - \phi^i(u_c, v_c)| < 0.001$ rad. This iterative algorithm can also be applied to the four- and five-step phase-shifting algorithms for phase error reduction. As indicated in Fig. 14.16b, c, the phase error after compensation is about eight times smaller, with the maximum phase error decreasing from 0.08 to 0.01 rad. Figure 14.16e shows an experimental demonstration of the phase error compensation algorithm by imaging a complex plaster model.
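The iterative compensation of Eq. (14.31) reduces to a short fixed-point loop, sketched below under the assumption that the error amplitude c has already been estimated from a reference plane; this is an illustrative implementation rather than the one used in [142].

import numpy as np

def compensate_nonlinear_phase(phi_a3, c, tol=1e-3, max_iter=50):
    # Fixed-point iteration of Eq. (14.31): phi_{i+1} = phi_a3 + c*sin(3*phi_i),
    # initialized with phi_0 = phi_a3 and stopped when the largest per-pixel
    # update falls below tol (0.001 rad in the text).
    phi = np.asarray(phi_a3, dtype=np.float64).copy()
    for _ in range(max_iter):
        phi_next = phi_a3 + c * np.sin(3.0 * phi)
        if np.max(np.abs(phi_next - phi)) < tol:
            return phi_next
        phi = phi_next
    return phi  # return the last iterate if the tolerance was not reached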
Fig. 14.15 Phase error correction by the N-step phase-shifting algorithm in simulation. (Adapted by permission from Institute of Physics: Measurement Science and Technology, High dynamic range 3D measurements with fringe projection profilometry: a review, Feng, S., et al. © 2018)
14.2.4.3 Motion-Induced Error
During the imaging of a dynamic scene using PSFPP, the object moves within the fringe projection period. Thus, the positions of the object among different fringe patterns will be mismatched. Consequently, for the same point on the object surface, the ideal sinusoidal intensity profile across the phase-shifted fringe patterns is violated, which introduces the motion-induced error [143, 144] (Fig. 14.17). Point C1 in the camera plane corresponds to a point D1 on the object's surface. This relationship holds when the object remains static, but if the object moves to a different location (the dashed curve) in the 3D space while the projected pattern remains the same, then the same camera point will correspond to a different point O1' on the object's surface. According to the projection model, the two points (D1, O1') will correspond to different lines on the projector plane (P1, P1'). This mismatch will lead to errors in the wrapped phase map, which is the motion-induced error.

A number of methods have been developed to address the problem of motion-induced errors in PSFPP systems. The most straightforward approach is to increase the projecting and capturing speeds of the hardware through bit-wise binary technologies [86, 145]. However, this approach leads to an increase in hardware costs. Another representative approach is to apply markers on the objects or to use automated object-tracking algorithms to follow the position of the object during the fringe projection period [146–148]. The errors caused by the motion in the phase map can be corrected by identifying the trajectory of the object. However, most of these methods were limited to tracking the 2D motion of the object. Furthermore, by combining FTP and PSFPP [149], the errors induced by the object's motion can be alleviated by FTP, which can be used to update the phase map obtained from PSFPP when objects move at high speed. However, the quality of such a phase map is affected by the inherent
[Figure 14.16, panels (a)–(e); the profiles in (b) and (d) plot phase error (rad) versus pixel position (pixel).]
Fig. 14.16 Nonlinear phase error compensation. (a) Phase error owing to non-sinusoidal waveforms. (b) Phase error profile with selected line section. The red curve shows the average phase error. (c) Compensated phase error. (d) As (b), but showing the phase error after compensation. (e) Phase distribution of a sculpture before (left) and after (right) compensation. (Adapted by permission from Optica: Optics Letters, Phase error analysis and compensation for non-sinusoidal waveforms in phase-shifting digital fringe projection profilometry, Pan, B., et al. © 2009)
limitations of FTP. Finally, Weise et al. [150] also developed motion error reduction methods by predicting the motion of the object in the fringe projection period to perform a pixel-wise compensation in the phase map.
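To illustrate the mechanism numerically, the sketch below (a generic simulation, not reproduced from the cited works) lets the object phase drift by a small, arbitrary amount between the K captures and compares the retrieved wrapped phase with the motion-free case.

import numpy as np

def wrapped_phase(images):
    # Standard K-step retrieval for fringes of the form A + B*cos(phi + 2*pi*k/K).
    K = len(images)
    k = np.arange(K).reshape(-1, 1, 1)
    num = np.sum(images * np.sin(2 * np.pi * k / K), axis=0)
    den = np.sum(images * np.cos(2 * np.pi * k / K), axis=0)
    return np.arctan2(-num, den)

K, H, W = 4, 64, 64
phi_true = np.tile(np.linspace(0, 4 * np.pi, W), (H, 1))  # ideal object phase
drift = 0.05  # assumed object-phase change per captured frame (rad)

static, moving = [], []
for k in range(K):
    shift = 2 * np.pi * k / K
    static.append(0.5 + 0.5 * np.cos(phi_true + shift))
    moving.append(0.5 + 0.5 * np.cos(phi_true + k * drift + shift))

error = np.angle(np.exp(1j * (wrapped_phase(np.array(moving))
                              - wrapped_phase(np.array(static)))))
print("maximum motion-induced phase error:", np.abs(error).max(), "rad")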
Fig. 14.17 Schematic of motion-induced error in phase-shifting profilometry. (Adapted by permission from Elsevier: Optics and Lasers in Engineering, Motion induced error reduction methods for phase shifting profilometry: A review, Lu, L., et al. © 2021)

14.2.4.4 System Distortion
Manufacturing imperfections in the lenses used in the optical imaging systems cause various image distortions in the projected images and the captured images, giving rise to errors in the 3D shape measurement of FPP [151]. The lens distortion from the detection side can be eliminated through various powerful camera calibration techniques [152]. However, the lens distortion from the pattern projection side is relatively hard to handle since the projector projects images rather than captures them [153]. The lens distortion problem will cause larger errors when testing large-size objects and becomes one of the major sources which hinder the measurement accuracy of FPP [54]. To understand the issue brought by system distortion, a distortion model is introduced as follows. The normalized camera coordinate without distortion is defined as

$$\begin{bmatrix} \hat{u}_{cn} \\ \hat{v}_{cn} \end{bmatrix} =
\begin{bmatrix} \left(\hat{u}_c - u_{pp}\right)/f_{cu} \\ \left(\hat{v}_c - v_{pp}\right)/f_{cv} \end{bmatrix}, \tag{14.32}$$
where $(\hat{u}_c, \hat{v}_c)$ is the undistorted pixel coordinate. Then, the distorted normalized camera coordinate $(u_{cn}, v_{cn})$ is determined by

$$\begin{bmatrix} u_{cn} \\ v_{cn} \end{bmatrix} =
\left(1 + d_{c1} r_{cn}^2 + d_{c2} r_{cn}^4 + d_{c3} r_{cn}^6\right)
\begin{bmatrix} \hat{u}_{cn} \\ \hat{v}_{cn} \end{bmatrix} +
\begin{bmatrix} 2 d_{c4} \hat{u}_{cn} \hat{v}_{cn} + d_{c5}\left(r_{cn}^2 + 2\hat{u}_{cn}^2\right) \\
2 d_{c5} \hat{u}_{cn} \hat{v}_{cn} + d_{c4}\left(r_{cn}^2 + 2\hat{v}_{cn}^2\right) \end{bmatrix}, \tag{14.33}$$

where $d_{c1}, \ldots, d_{c5}$ are the camera system distortion coefficients, and $r_{cn}^2 = \hat{u}_{cn}^2 + \hat{v}_{cn}^2$. Once $(u_{cn}, v_{cn})$ is computed, the distorted camera pixel coordinate $(u_c, v_c)$ is found by using Eq. (14.32). The distortion coefficients can be calculated by using the calibration toolbox described in Sect. 14.2.1.3 [55]. Overall, this operation is expressed as $(u_c, v_c) = \mathrm{Dist}_c(\hat{u}_c, \hat{v}_c)$. Similarly, the distortion for the projector is modeled by $(u_p, v_p) = \mathrm{Dist}_p(\hat{u}_p, \hat{v}_p)$.

To solve the system distortion issue, Jiang et al. [54] have used polynomial functions to establish the relationship between the absolute phase distribution and the real-world 3D coordinates. Distortion compensation contains two major steps. First, the undistorted camera pixel coordinate $(\hat{u}_c, \hat{v}_c)$ is extracted by performing the inverse operation of the distortion model, i.e., Eq. (14.33). However, the direct inversion of Eq. (14.33) takes the form of a 7th-degree polynomial in both $u_{cn}$ and $v_{cn}$, rendering direct recovery difficult. Thus, a fixed-point iteration method is adopted for computing the undistorted normalized camera coordinate $(\hat{u}_{cn}, \hat{v}_{cn})$. With the initial condition of

$$\begin{bmatrix} \hat{u}_{cn,0} \\ \hat{v}_{cn,0} \end{bmatrix} =
\begin{bmatrix} \left(u_c - u_{pp}\right)/f_{cu} \\ \left(v_c - v_{pp}\right)/f_{cv} \end{bmatrix},$$

the $i$th iteration is described by

$$\begin{bmatrix} \hat{u}_{cn,i+1} \\ \hat{v}_{cn,i+1} \end{bmatrix} =
\frac{\begin{bmatrix} \hat{u}_{cn,0} \\ \hat{v}_{cn,0} \end{bmatrix} -
\begin{bmatrix} 2 d_{c4} \hat{u}_{cn,i} \hat{v}_{cn,i} + d_{c5}\left(\hat{r}_{cn,i}^2 + 2\hat{u}_{cn,i}^2\right) \\
2 d_{c5} \hat{u}_{cn,i} \hat{v}_{cn,i} + d_{c4}\left(\hat{r}_{cn,i}^2 + 2\hat{v}_{cn,i}^2\right) \end{bmatrix}}
{1 + d_{c1} \hat{r}_{cn,i}^2 + d_{c2} \hat{r}_{cn,i}^4 + d_{c3} \hat{r}_{cn,i}^6}, \tag{14.34}$$

where $\hat{r}_{cn,i}^2 = \hat{u}_{cn,i}^2 + \hat{v}_{cn,i}^2$. The corresponding undistorted pixel coordinate $(\hat{u}_c, \hat{v}_c)$ on the camera plane is then calculated by using Eq. (14.32). Overall, this operation is expressed as $(\hat{u}_c, \hat{v}_c) = \mathrm{DistComp}_c(u_c, v_c)$.

The next step is to obtain the corresponding undistorted projector horizontal pixel coordinate $\hat{u}_p$. Ideally, $\hat{u}_p$ could be computed by $(\hat{u}_p, \hat{v}_p) = \mathrm{DistComp}_p(u_p, v_p)$. However, $v_p$ cannot be calculated from the projection of vertical fringe patterns. To overcome this limitation, an iterative method has been developed to recover distortion-compensated 3D information without prior knowledge of $v_p$ from fringe measurements. First, the coordinate of the 3D point is estimated by Eq. (14.3) by using the coordinate $(\hat{u}_c, \hat{v}_c, u_p)$. This 3D point is then projected to the projector plane to extract the initial estimate of $v_p$ with the function $\mathrm{Dist}_p$, which is input to an iterative algorithm. Each iteration starts with performing the function $\mathrm{DistComp}_p$ by using $u_p$ and the variable $v_p$ to compute the distortion-compensated projector coordinate $\hat{u}_p$, which is then used to extract the 3D coordinate by using Eq. (14.3). The method is presented as the following pseudo-code:
Algorithm to recover 3D information with distortion compensation
Input: $(\hat{u}_c, \hat{v}_c)$, $u_p$, maximum iterations N, vertical projector pixel error tolerance TOL; calibration parameters of the camera: $A_c$, $R_c$, $T_c$, $s_c$, $d_{c1}$, $d_{c2}$, $d_{c3}$, $d_{c4}$, $d_{c5}$; calibration parameters of the projector: $A_p$, $R_p$, $T_p$, $s_p$, $d_{p1}$, $d_{p2}$, $d_{p3}$, $d_{p4}$, $d_{p5}$
Output: (x, y, z)
Variables: q, $h_1$, $h_2$, $h_3$, $h_4$, $v_p^q$, $\hat{u}_p$
1  Set q = 1
2  Compute $(x, y, z) = \mathrm{Tri}(\hat{u}_c, \hat{v}_c, u_p)$
3  Compute $(h_1, h_2) = \mathrm{Proj}_p(x, y, z)$
4  Compute $(\sim, v_p^q) = \mathrm{Dist}_p(h_1, h_2)$
5  While (q ≤ N) do steps 6–11
6    Compute $(\hat{u}_p, \sim) = \mathrm{DistComp}_p(u_p, v_p^q)$
7    Compute $(x, y, z) = \mathrm{Tri}(\hat{u}_c, \hat{v}_c, \hat{u}_p)$
8    Compute $(h_3, h_4) = \mathrm{Proj}_p(x, y, z)$
9    Compute $(\sim, v_p^{q+1}) = \mathrm{Dist}_p(h_3, h_4)$
10   If $|v_p^{q+1} - v_p^q| \le \mathrm{TOL}$, go to step 13
11   Set q = q + 1
12 End
13 Output (x, y, z)
14 End

14.2.4.5 Speckle
In FPP systems, high measurement accuracy relies on high-quality fringes, which depend strongly on the light sources used for fringe generation. Both coherent and incoherent light sources have been used for generating fringes. Laser sources offer advantages over incoherent sources, such as higher illumination efficiency, larger depth of focus, and higher fringe contrast [154]. However, due to the high coherence of laser sources, speckle becomes an additional noise source. Because speckles are inevitable when the roughness of the measured surface is larger than the wavelength of the light source, measurements of most engineering surfaces, such as unpolished metallic surfaces, rough paper surfaces, and diffusive plastic surfaces, suffer from speckle-induced measurement noise. Liu et al. [155] have modeled the speckle-induced phase-measurement errors as additive white Gaussian noise. Numerical simulations show that phase errors depend on the average
speckle size as well as on the grating pitch in the fringe patterns. In practical systems, speckles cannot be well resolved by individual camera pixels. It has been shown that larger system apertures, shorter wavelengths, smaller magnifications, and speckle-suppression measures are beneficial to noise reduction. For high-accuracy measurements, post-processing algorithms are important tools for further improving accuracy.
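As a hedged illustration of the additive-white-Gaussian-noise picture described above (a generic simulation rather than the analysis in [155]), the snippet below adds zero-mean Gaussian noise to ideal K-step fringes and compares the resulting phase noise with the commonly quoted estimate sqrt(2/K)·σ/B; the noise level and modulation values are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

def wrapped_phase(images):
    # K-step phase-shifting retrieval for fringes A + B*cos(phi + 2*pi*k/K).
    K = len(images)
    k = np.arange(K).reshape(-1, 1)
    num = np.sum(images * np.sin(2 * np.pi * k / K), axis=0)
    den = np.sum(images * np.cos(2 * np.pi * k / K), axis=0)
    return np.arctan2(-num, den)

K, N = 8, 100000               # number of phase steps and of simulated pixels
A, B, sigma = 0.5, 0.4, 0.02   # background, modulation, and noise std (assumed)
phi = rng.uniform(-np.pi, np.pi, N)

frames = np.stack([A + B * np.cos(phi + 2 * np.pi * k / K) + rng.normal(0.0, sigma, N)
                   for k in range(K)])
err = np.angle(np.exp(1j * (wrapped_phase(frames) - phi)))
print("simulated phase noise std:", err.std(), "rad;",
      "estimate sqrt(2/K)*sigma/B =", np.sqrt(2 / K) * sigma / B, "rad")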
14.3 Applications
14.3.1 3D Imaging for Reverse Engineering Reverse engineering is a process to deduce design features from products with little or no additional knowledge about the procedures involved in their original production. The extraction of 3D information on the targeted products becomes an important topic to enable the step of reverse engineering, i.e., importing the shape data into a CAD
program to yield a full 3D digital model. Its high measurement accuracy makes the FPP method a good candidate for providing the 3D surface information needed in reverse engineering. Burke et al. [156] applied an FPP system to reverse engineering a plastic model of an SR-71B reconnaissance aircraft (Fig. 14.18a). As displayed in Fig. 14.18b, the 3D imaging of this object was carried out in a measurement volume of 1 × 1 × 0.5 m3 using a digital fringe projector with temporal hierarchical phase unwrapping. The 3D information of the airplane model was reconstructed from several positions and view perspectives. After the successful merging of 12 subviews and conversion of the 3D data into a CAD file (Fig. 14.18c), the measured size of the model was in excellent agreement with the calculated values.
14.3.2 3D Imaging for Forensics
Improved evidence recognition, collection, and visualization tools are urgently needed to meet the operational requirements of forensic science disciplines [9]. Liao et al. [157] have developed a portable FPP 3D imaging system (Fig. 14.19a) that allows users to easily and quickly capture high-resolution shoe and tire impressions. The system's hardware is designed to be portable and can be packed into a case box with dimensions of 371.3 × 258.6 × 152.4 mm3. By projecting and capturing fringe images at 60 Hz, the 3D information is reconstructed over a field of view of up to 356 × 222 mm2 at 2.4 Hz with a depth resolution of up to 0.027 mm. In addition, an auto-exposure time control is adopted to enable HDR 3D imaging. Many advantages—including the high 3D imaging accuracy, the enhanced dynamic range, the compact package, and the non-destructive 3D imaging—make this system highly suitable for complex forensics. With a graphical user interface (GUI), users can operate the prototype for crime scene documentation (Fig. 14.19b). Fully automated HDR algorithms are also implemented to acquire the high-contrast scene shown in Fig. 14.19c. In this example, the scene includes two shoes, whose level of
brightness varies drastically. As displayed in Fig. 14.19d, a combined structured image is produced by adjusting the exposure time. Figure 14.19e represents the corresponding 3D results, which show the ability of this prototype to capture critical evidence that is difficult for conventional casting.
14.3.3 3D Imaging for Robotics
The development level of the robot industry has become an important indicator of a country's capacity for science and technology innovation and high-end manufacturing. Robotics is also a strategic field for seizing opportunities in the development of an intelligent society. Real-time dynamic 3D vision helps robots function promptly in diverse applications [158–160]. For a mobile robot, accurate 3D inspection of the environment is always desired for efficient and precise navigation. FPP methods have been successfully implemented in diverse robotic applications, such as urban search and rescue. As shown in Fig. 14.20a, b, Zhang et al. [161] developed a PSFPP system to reconstruct high-resolution 3D images of an unknown/unstructured environment in real time (30 Hz), which allowed the robot to sense the landscape for obstacle avoidance. For autonomous robotic manipulators, 3D vision-assisted systems are used to control the end-effector of the robot to move to the desired position with a specific orientation. As displayed in Fig. 14.20c, d, a robotic drilling system, composed of a robot manipulator, a drilling device, and an FPP system, was used to accurately drill on a free-form surface [162].
14.3.4 3D Imaging for Analyzing Fluid-Flag Interaction
The mechanisms and behavior patterns of the flag flapping problem continue to generate significant scientific interest as a fundamental problem of fluid-structure interaction, with relevance to animal ethology, paper engineering, and hydroelectric
Fig. 14.18 3D vision assisted reverse engineering of a plastic model of an SR-71B reconnaissance aircraft. (a) Various positions of the SR-71B airplane model for measurement of the surface. (b) 3D coordinates from FPP measurement. From left: perspectives parallel to the x-, y-
, and z-axes and the 3D representation of the dataset. (c) Two rendered views of the final CAD file. (Adapted by permission from SPIE: Interferometry XI: Applications, Reverse engineering by fringe projection, Burke, J., et al. © 2002)
power generation [13]. Due to the flag's complex geometries and freely moving boundaries, studying the gravity-induced impact in its interaction with fluid remains a challenge [163]. Thus far, although many simulation works have been carried out to investigate the relation between gravity and the flag's stability under different airflow conditions, experimental visualization of real-time flag movement has not kept up. A limited number of previous experiments were conducted using low flow speeds to avoid the complication induced by the flag dragging its support [164]. In addition, rigid flags were used in these experiments, which,
however, restricted the analysis of fluid dynamics in 2D [165]. CoaXPress-interfaced (CI) BLIP [12] is implemented in this study. As described in Sect. 14.2.2, BLIP enables a grayscale sinusoidal fringe projection at 5 kHz. Meanwhile, CI highspeed image acquisition and GPU-based image reconstruction in BLIP overcome the limitation of data transmission of existing image acquisition devices. CI-BLIP has enabled, for the first time, real-time 3D position tracking of a single customer-selectable point at 1 kHz. CI-BLIP has empowered the 3D analysis of the wave propagation, gravity-induced phase
Fig. 14.19 Portable 3D imaging prototype for forensics. (a) Photo of the system. (b) Graphical user interface. (c) Photo of the scene taken by an ordinary camera. (d) Representative fringe pattern using the HDR technique. (e)
3D image by automatically optimizing the HDR exposure times. (Adapted by permission from Elsevier: Optics and Lasers in Engineering, Status, challenges, and future perspectives of fringe projection profilometry, Xu J., et al. © 2020)
mismatch, and asymmetric flapping motion of a non-rigid flag at 1 kHz [12]. In the experiment, an air blower generated strong wind interacting with the flag mounted on a pole. Figure 14.21a depicts four representative 3D images of the instantaneous poses of the flapping flag from two perspective views. The time histories of the streamwise (the x axis), spanwise (the y axis), and transverse (the z axis) displacements of three selected points are analyzed, as shown in Fig. 14.21b. The displacements show a less intense flapping motion in the part closer to the pole. Moreover, the streamwise and transverse displacements show an apparent phase difference, which is probably attributed to the gravity-induced sagging effect. Furthermore, the phase relation between the spanwise and transverse displacements of pC (Fig. 14.21c) shows an elliptical shape in both experimental and fitted results, demonstrating that the flapping motion is dominated by single-frequency sinusoidal waves.
Finally, the depth curves of the flag’s centerline in all reconstructed images (Fig. 14.21d) show asymmetric flapping motion toward the −z direction, which indicates the uneven forces to the flag surface and a relatively high degree of turbulent flow.
14.3.5 3D Imaging for Industrial Quality Characterization
Visual inspection is an important process in industry to recognize defective parts, assure quality conformity of a product, and fulfill customer demands. In assembly and manufacturing activities, product and process inspections are usually performed by human inspectors, but due to various practical limitations (e.g., human fatigue, limited vision to resolve details on small parts, hazardous inspection conditions, and process complexity), human inspection often yields inconsistent quality and
Fig. 14.20 3D vision-assisted robotic systems. (a, b) Sensory system on a mobile robot (a) and its experimental setup (b). (Adapted by permission from Springer: Intelligent Service Robotics, A novel 3D sensory system for robot-assisted mapping of cluttered urban search and rescue environments, Zhang, Z., et al. © 2011). (c, d)
Schematic of a robotic drilling system (c) and its experimental setup (d). (Adapted by permission from IEEE: IEEE/ASME Transactions on Mechatronics, Normal direction measurement and optimization with a dense threedimensional point cloud in robotic drilling, Rao G., et al. © 2017)
has difficulty pinpointing some types of product non-conformities [3]. TIA-BLIP has been applied to the dynamic characterization of glass interacting with external forces [78]. As shown in Fig. 14.22a, a glass cup was fixed on a table. A function generator drove a speaker to produce single-frequency sound signals (from 450 Hz to 550 Hz with a step of 10 Hz) through a sound channel placed close to the cup's wall. Figure 14.22b shows four representative 3D images of the instantaneous shapes of the glass cup driven by the 500-Hz sound signal, revealing the dynamic structural deformation of the glass cup. The evolution of depth changes was analyzed using five selected points (marked PA to PE in the first panel of Fig. 14.22b). As shown in Fig. 14.22c, the depth changes of the five points are in
agreement, which is attributed to the rigidity of the glass. The time histories of the averaged depth displacements under different sound frequencies were further analyzed. Figure 14.22d shows the results at driving frequencies of 490 Hz, 500 Hz, and 510 Hz. Each result was fitted by a sinusoidal function, with fitted frequencies of 490.0 Hz, 499.4 Hz, and 508.6 Hz, respectively. These results show that the rigid glass cup vibrated in compliance with the driving frequency. Moreover, the amplitudes of the fitted results, Δzfit, were used to determine the relationship between the depth displacement and the sound frequency (Fig. 14.22e). This relationship was fitted by a Lorentz function, which determined the resonant frequency of this glass cup to be 499.0 Hz.
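The last step above is essentially a Lorentzian peak fit of amplitude versus driving frequency; a minimal SciPy sketch with made-up data points (not the published measurements) is shown below.

import numpy as np
from scipy.optimize import curve_fit

def lorentzian(f, amp, f0, gamma, offset):
    # Lorentzian line shape centered at the resonant frequency f0.
    return amp * gamma**2 / ((f - f0)**2 + gamma**2) + offset

# Hypothetical amplitude-versus-frequency data in arbitrary units.
freqs = np.arange(450.0, 551.0, 10.0)
amps = lorentzian(freqs, 1.0, 499.0, 8.0, 0.05) \
       + np.random.default_rng(1).normal(0.0, 0.02, freqs.size)

popt, _ = curve_fit(lorentzian, freqs, amps, p0=[1.0, 500.0, 10.0, 0.0])
print("fitted resonant frequency:", round(popt[1], 1), "Hz")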
Fig. 14.21 CI-BLIP for 3D visualization of a fluid-flag interaction. (a) Two perspective views of reconstructed 3D images of a flapping flag at different times. (b) Evolution of 3D positions of the three selected points marked in the first panel in (a). (c) Phase relation of the y-z positions of pC and the fitted result. (d) Superimposed centerlines of
the flag over time. Note: The depth of the pole is defined as zero on the z axis. (Adapted by permission from Optica: Optics Letters, Real-time high-speed three-dimensional surface imaging using band-limited illumination profilometry with a CoaXPress interface, Jiang, C., et al. © 2020)
14.3.6 3D Imaging for Human Body Movements
3D geometric models of real-world objects, especially humans, are essential for many applications, such as postproduction of animated film design, virtual prototyping, quality assurance, gaming, and special effects [166]. A multi-scale (MS) BLIP system is used to capture and record 3D data of human body movement, which makes it possible for video game designers to scan and import entire characters into computer-generated imagery (CGI) platforms [54]. MS-BLIP can capture the human body in its entirety or specific parts, generating highly detailed 3D point clouds within a few seconds. A photo of the scene is shown in Fig. 14.23a. The volunteer had a height of 1.6 m and wore proper protective clothing and laser goggles. Figure 14.23b shows five time-lapse 3D images of the instantaneous poses of the volunteer. The detailed surface structures of the volunteer's lab coat and hands are illustrated by the close-up views. The edge of an optical table and optomechanical components can also be seen as a static background. As displayed in Fig. 14.23c, researchers tracked the 3D positions of two selected points over time: the tip of the ring finger of the volunteer's left hand and the edge point of a marker on the volunteer's coat.
14.3.7 3D Imaging for Biomedicine In biomedicine, 3D representation techniques make it possible to study complex structures such as bones, acetabular fractures, and craniofacial abnormalities [167]. For remote surgeries or minimally invasive surgeries, where the physicians cannot directly view the surgical sites, 3D profilometry obviates the need for the physician to mentally map the surgical volume
Fig. 14.22 TIA-BLIP of sound-induced vibration on glass. (a) Schematic of the experimental setup. The field of view is marked by the red dashed box. (b) Four reconstructed 3D images of the cup driven by a 500-Hz sound signal. (c) Evolution of the depth change of five points marked in the first panel of (b) with the fitted result. (d) Evolution of the averaged depth change with the fitted results under the driving frequencies of 490 Hz, 500 Hz,
and 510 Hz. Error bar: standard deviation of Δz calculated from the five selected pixels. (e) Response of the depth displacements to sound frequencies. The cyan curve is a fitted result of a Lorentz function. (Adapted by permission from Optica: Photonics Research, High-speed dual-view band-limited illumination profilometry using temporally interlaced acquisition, Jiang, C., et al. © 2020)
from multiple video streams on a flat-screen display [168]. Thus, the usage of the FPP systems, with the advantages of contact-free detection and a large 3D measurement range of biological tissues, has drawn increasing attention. Blair et al. [169] have integrated the FPP method into fluorescence imaging for multichannel near-infrared fluorescence imageguided surgery. As shown in Fig. 14.24, in nude athymic mice bearing human prostate tumors, the bio-inspired image sensor enables the simultaneous detection of two tumor-targeted fluorophores, distinguishing diseased from healthy tissue in an estimated 92% of cases. It also permitted extraction of near-infrared structured illumination enabling the mapping of the 3D topography of tumors and surgical sites to within 1.2-mm error.
14.3.8 3D Imaging for High-Speed Scenes
In many applications, it is desirable to make 3D measurements at a speed higher than 1 kHz. Micro Fourier transform profilometry (μFTP) [92] enables highly accurate, dense 3D shape reconstruction at 10 kHz without posing restrictions on surface texture, scene complexity, or object motion. In contrast to conventional FTP, which uses a single sinusoidal pattern with a fixed spatial frequency, μFTP introduces small temporal variations in the frequency of multiple spatial sinusoids, together with an additional white flat illumination, to eliminate phase ambiguity. This approach has several advantages over conventional high-speed 3D imaging methods. It can retrieve not only high-resolution, unambiguous depth information from two images but also high-quality 2D textures provided simultaneously with the 3D geometry. These advantages allow for filling the speed gap between high-speed 2D photography and fast 3D sensing, pushing the speed limits of unambiguous, motion-artifact-free 3D imaging to the range of tens of kHz. Zuo et al. [92] have applied μFTP to image a one-time transient event: a bullet fired from a toy gun and rebounding from a plaster wall (Fig. 14.25). Figures 14.25a, b show representative camera-captured images with white flat
Fig. 14.23 MS-BLIP of full human-body 3D movements. (a) Photo of the volunteer. (b) Five time-lapse reconstructed 3D images. Close-up views are shown in the dashed boxes. (c) Evolution of the 3D positions of the selected points [marked in the middle close-up view
of (b)]. (Adapted by permission from Optica: Optics Express, Multi-scale band-limited illumination profilometry for robust three-dimensional surface imaging at video rate, Jiang, C., et al. © 2022)
Fig. 14.24 3D imaging for tumor detection in nude mice. (a) Two color images of the same tumor-bearing mouse showing the estimated 3D profile with the tumors highlighted and the dominant targeted probes indicated. (b) Color image of the tumor-bearing mouse showing the tumors highlighted along with the dominant targeted
probes as indicated by normalized fluorescence. (c) 3D profile of the tumor-bearing mouse indicating the outof-plane height as extracted from structured illumination. (Adapted by permission from AAAS: Science Translational Medicine, Hexachromatic bioinspired camera for image-guided cancer surgery, Blair, S., et al. © 2021)
illumination and corresponding color-coded 3D reconstructions at different time points. Figure
14.25c shows the 3D reconstruction of the muzzle region (corresponding to the boxed region in Fig.
Fig. 14.25 3D measurement and tracking of a bullet fired from a toy gun. (a) Representative camera images at different time points. (b) Corresponding color-coded 3D reconstructions. (c) 3D reconstruction of the muzzle region [corresponding to the boxed region shown in (b)] as well as the bullet at three different points of time over the course of flight (i.e., 7.5 ms, 12.6 ms, and 17.7 ms). The insets show the horizontal (x–z) and vertical (y-z) profiles
crossing the body center of the flying bullet at 17.7 ms. (d) 3D point cloud of the scene at the last moment (135 ms), with the colored line showing the 130 ms long bullet trajectory. The inset plots the bullet velocity as a function of time. (Adapted by permission from Elsevier: Optics and Lasers in Engineering, Micro Fourier transform profilometry (μFTP): 3D shape measurement at 10,000 frames per second, Zuo, C., et al. © 2018)
14.25b) as well as the bullet at three different time points (i.e., 7.5 ms, 12.6 ms, and 17.7 ms). Figure 14.25d further shows the 3D point cloud of the scene at the last moment (i.e., 135 ms), with the colored line showing the 130 ms long bullet trajectory (the bullet velocity is encoded by the color).
14.3.9 3D Imaging for Micro-Size Samples
With the continuous development of industrial technology, the use of high-precision multifunctional parts is becoming increasingly extensive in industrial processing and intelligent assembly. Industrialization is pushing the development of functional parts toward miniaturization and precision, which correspondingly drives quality-inspection methods toward high precision and efficiency. As shown in Fig. 14.26a, the development of micro fringe projection profilometry (MFPP) [36], which combines the optical structure of microscopes with fringe projection to project fringe patterns into a small field of view, enables the 3D profiling of micro-size samples. The microscopic field of view allows the 3D measurement accuracy to be increased from tens of microns to microns or even sub-micron levels. However, the 3D reconstruction of microscopic samples still faces challenges in balancing the measurement accuracy and the measurable volume, as well as in measuring shiny surfaces. For the first challenge, Yin et al. [170] proposed using telecentric lenses with the Scheimpflug principle to improve the spatial coincidence of the object field of view between multiple views. The orientation and position of the sensor chip are modified so that the sensor plane, the principal imaging plane of the lens, and the object plane satisfy the Scheimpflug intersection. In this way, the focus area overlaps the object, and thus the focus areas of projection and imaging overlap, making full use of the limited depth of field. This method improves the field of view and depth of field. For the second challenge, as displayed in Fig. 14.26b–d, various methods have been proposed to suppress the measurement errors encountered in HDR imaging, such as using adaptive
Fig. 14.26 3D imaging of micro-size samples. (a) Absolute 3D profile measurement based on a Greenoughtype stereomicroscope for a thin fine-pitch ball grid array. (Adapted by permission from IOP Publishing: Measurement Science and Technology, Absolute three-dimensional micro surface profile measurement based on a Greenoughtype stereomicroscope, Hu, Y., et al. © 2017). (b–d) Mi-
croscopic 3D measurement of shiny surfaces based on a multi-frequency phase shift scheme for scratch detection, flatness detection, and deformation detection. (Adapted by permission by Elsevier: Optics and Lasers in Engineering, Microscopic 3D measurement of shiny surfaces based on a multi-frequency phase-shifting scheme, Hu, Y., et al. © 2019)
exposure and multi-frequency phase-shifting algorithms, based on which the 3D images of micro-size samples with shiny surfaces can be reconstructed completely and correctly [122].
14.4 Summary

This chapter reviews the basic principles of FPP techniques. With detailed descriptions from the perspectives of fringe generation, fringe analysis, and error analysis, we present the mainstream operating principles used in FPP. We also summarize representative applications. Despite drastic advancements in FPP methods in the 3D shape measurement field, including tremendous commercial successes, challenges remain in making advanced 3D surface measurement techniques accessible and available to solve challenging problems in science, engineering, industry, and our daily lives. First, high-resolution and high-quality 3D measurement produces a huge amount of data for storage and display. Existing 3D rendering formats, including OBJ, PLY, and STL, have large file sizes (typically one order of magnitude larger than their 2D counterparts) to represent 3D geometry and texture data, which poses challenges to the host server for data manipulation. Second, for mobile-device-embedded face recognition and virtual reality applications, efforts on miniaturizing FPP systems are highly needed to expand their application scope. Third, at the current stage, real-time FPP methods cannot obtain accurate and complete geometry information for objects with challenging optical properties (e.g., dark, translucent, and specularly reflective surfaces). The automation of online feedback and adaptive adjustment, such as proactive viewpoint adjustment, field-of-view adaptation, and self-recalibration to accommodate various targeted scenes, is expected to be extensively researched in the future.

The field of 3D imaging technology keeps a fast pace of development. FPP-based 3D imaging technologies will certainly draw more attention from both theoretical studies and practical applications. For methodology development, modifications of the fringe projection as well as the reconstruction algorithms remain the foci for pushing the limits of FPP systems to better resolve targeted 3D objects in complex scenarios. For example, more robust HDR FPP techniques will enable accurate and fast measurement of complex objects. In addition, data-driven deep learning methods will be advantageous for many problems in fringe analysis, especially when
the acquired information is limited (e.g., enhancing the accuracy of phase extraction from a single fringe pattern). Finally, FPP will open new avenues in diverse applications, including minimally invasive surgeries, unpredictable or non-repeatable flow dynamics, and the 3D imaging of objects at non-visible spectra.
References 1. J. Geng, “Structured-light 3D surface imaging: a tutorial,” Advances in Optics and Photonics 3, 128– 160 (2011). 2. X. Su and Q. Zhang, “Dynamic 3-D shape measurement method: a review,” Optics and Lasers in Engineering 48, 191–204 (2010). 3. J. Molleda, R. Usamentiaga, D. F. García, F. G. Bulnes, A. Espina, B. Dieye, and L. N. Smith, “An improved 3D imaging system for dimensional quality inspection of rolled products in the metal industry,” Computers in Industry 64, 1186–1200 (2013). 4. G. Sansoni, S. Corini, S. Lazzari, R. Rodella, and F. Docchio, “Three-dimensional imaging based on Gray-code light projection: characterization of the measuring algorithm and development of a measuring system for industrial applications,” Applied Optics 36, 4463–4472 (1997). 5. G. Sansoni, M. Trebeschi, and F. Docchio, “Stateof-the-art and applications of 3D imaging sensors in industry, cultural heritage, medicine, and criminal investigation,” Sensors 9, 568–601 (2009). 6. G. Sansoni and F. Docchio, “Three-dimensional optical measurements and reverse engineering for automotive applications,” Robotics and ComputerIntegrated Manufacturing 20, 359–367 (2004). 7. L. Li, N. Schemenauer, X. Peng, Y. Zeng, and P. Gu, “A reverse engineering system for rapid manufacturing of complex objects,” Robotics and ComputerIntegrated Manufacturing 18, 53–67 (2002). 8. S. Motavalli, “Review of reverse engineering approaches,” Flexible Automation and Integrated Manufacturing 1999 (1999). 9. U. Buck, S. Naether, B. Räss, C. Jackowski, and M. J. Thali, “Accident or homicide–virtual crime scene reconstruction using 3D methods,” Forensic Science International 225, 75–84 (2013). 10. D. A. Komar, S. Davy-Jow, and S. J. Decker, “The use of a 3-D laser scanner to document ephemeral evidence at crime scenes and postmortem examinations,” Journal of Forensic Sciences 57, 188–191 (2012). 11. D. Raneri, “Enhancing forensic investigation through the use of modern three-dimensional (3D) imaging technologies for crime scene reconstruction,” Australian Journal of Forensic Sciences 50, 697–707 (2018).
C. Jiang et al. 12. C. Jiang, P. Kilcullen, X. Liu, J. Gribben, A. Boate, T. Ozaki, and J. Liang, “Real-time highspeed three-dimensional surface imaging using band-limited illumination profilometry with a CoaXPress interface,” Optics Letters 45, 964–967 (2020). 13. S. Banerjee, B. S. Connell, and D. K. Yue, “Three-dimensional effects on flag flapping dynamics,” Journal of Fluid Mechanics 783, 103–136 (2015). 14. M. J. Shelley and J. Zhang, “Flapping and bending bodies interacting with fluid flows,” Annual Review of Fluid Mechanics 43, 449–465 (2011). 15. B. Albouy, Y. Lucas, and S. Treuillet, “3D modeling from uncalibrated color images for a complete wound assessment tool,” in 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, (IEEE, 2007), 3323– 3326. 16. C. Ozturk, S. Dubin, M. E. Schafer, W.-Y. Shi, and M.-C. Chou, “A new structured light method for 3D wound measurement,” in Proceedings of the IEEE 22nd Annual Northeast Bioengineering Conference, (IEEE, 1996), 70–71. 17. E. Sirazitdinova and T. M. Deserno, “System design for 3D wound imaging using low-cost mobile devices,” in Medical Imaging 2017: Imaging Informatics for Healthcare, Research, and Applications, (SPIE, 2017), 258–264. 18. J. D. Claverley and R. K. Leach, “Development of a three-dimensional vibrating tactile probe for miniature CMMs,” Precision Engineering 37, 491– 499 (2013). 19. A. W. Knoll, D. Pires, O. Coulembier, P. Dubois, J. L. Hedrick, J. Frommer, and U. Duerig, “Probebased 3-D nanolithography using self-amplified depolymerization polymers,” Advanced Materials 22, 3361–3365 (2010). 20. J. RenéMayer, A. Ghazzar, and O. Rossy, “3D characterisation, modelling and compensation of the pretravel of a kinematic touch trigger probe,” Measurement 19, 83–94 (1996). 21. M. A.-B. Ebrahim, “3D laser scanners’ techniques overview,” International Journal of Science and Research 4, 323–331 (2015). 22. E. Lally, J. Gong, and A. Wang, “Method of multiple references for 3D imaging with Fourier transform interferometry,” Optics Express 18, 17591–17596 (2010). 23. A. Schutz, A. Ferrari, D. Mary, É. Thiébaut, and F. Soulez, “Large scale 3D image reconstruction in optical interferometry,” in 2015 23rd European Signal Processing Conference (EUSIPCO), (IEEE, 2015), 474–478. 24. Y. Cui, S. Schuon, D. Chan, S. Thrun, and C. Theobalt, “3D shape scanning with a time-of-flight camera,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (IEEE, 2010), 1173–1180.
14 Fringe Projection Profilometry 25. M. Hansard, S. Lee, O. Choi, and R. P. Horaud, Time-of-flight cameras: principles, methods and applications (Springer Science & Business Media, 2012). 26. L. Li, “Time-of-flight camera–an introduction,” Technical White Paper (2014). 27. S. May, D. Droeschel, S. Fuchs, D. Holz, and A. Nüchter, “Robust 3D-mapping with time-of-flight cameras,” in 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, (IEEE, 2009), 1673–1678. 28. S. May, B. Werner, H. Surmann, and K. Pervolz, “3D time-of-flight cameras for mobile robotics,” in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, (Ieee, 2006), 790–795. 29. J.-J. Aguilar, F. Torres, and M. Lope, “Stereo vision for 3D measurement: accuracy analysis, calibration and industrial applications,” Measurement 18, 193– 200 (1996). 30. S.-Y. Park and M. Subbarao, “A multiview 3D modeling system based on stereo vision techniques,” Machine Vision and Applications 16, 148–156 (2005). 31. S. Sengupta, E. Greveson, A. Shahrokni, and P. H. Torr, “Urban 3d semantic modelling using stereo vision,” in 2013 IEEE International Conference on robotics and Automation, (IEEE, 2013), 580–585. 32. Y. Sumi, Y. Kawai, T. Yoshimi, and F. Tomita, “3D object recognition in cluttered environments by segment-based stereo vision,” International Journal of Computer Vision 46, 5–23 (2002). 33. N. Uchida, T. Shibahara, T. Aoki, H. Nakajima, and K. Kobayashi, “3D face recognition using passive stereo vision,” in IEEE International Conference on Image Processing 2005, (IEEE, 2005), II-950. 34. B. Billiot, F. Cointault, L. Journaux, J.-C. Simon, and P. Gouton, “3D image acquisition system based on shape from focus technique,” Sensors 13, 5040– 5053 (2013). 35. R. Minhas, A. A. Mohammed, Q. Wu, and M. A. Sid-Ahmed, “3D shape from focus and depth map computation using steerable filters,” in International conference image analysis and recognition, (Springer, 2009), 573–583. 36. Y. Hu, Q. Chen, S. Feng, and C. Zuo, “Microscopic fringe projection profilometry: A review,” Optics and Lasers in Engineering 135, 106192 (2020). 37. N. Tornero-Martínez, G. Trujillo-Schiaffino, M. Anguiano-Morales, P. G. Mendoza-Villegas, D. P. Salas-Peimbert, and L. Corral-Martínez, “Color profilometry techniques: A review,” Measurement 45, 0136021 (2006). 38. S. Zhang, “High-speed 3D shape measurement with structured light methods: A review,” Optics and Lasers in Engineering 106, 119–131 (2018). 39. C. Zuo, S. Feng, L. Huang, T. Tao, W. Yin, and Q. Chen, “Phase shifting algorithms for fringe projection profilometry: A review,” Optics and Lasers in Engineering 109, 23–59 (2018).
281 40. J. Xu and S. Zhang, “Status, challenges, and future perspectives of fringe projection profilometry,” Optics and Lasers in Engineering 135, 106193 (2020). 41. S. Van der Jeught and J. J. Dirckx, “Real-time structured light profilometry: a review,” Optics and Lasers in Engineering 87, 18–31 (2016). 42. S. Zhang, High-speed 3D imaging with digital fringe projection techniques (CRC Press, 2018). 43. S. S. Gorthi and P. Rastogi, “Fringe projection techniques: whither we are?,” Optics and Lasers in Engineering 48, 133–140 (2010). 44. Z. Wang, D. A. Nguyen, and J. C. Barnes, “Some practical considerations in fringe projection profilometry,” Optics and Lasers in Engineering 48, 218–225 (2010). 45. B. Li, Y. Wang, J. Dai, W. Lohry, and S. Zhang, “Some recent advances on superfast 3D shape measurement with digital binary defocusing techniques,” Optics and Lasers in Engineering 54, 236– 246 (2014). 46. E. Li, X. Peng, J. Xi, J. F. Chicharo, J. Yao, and D. Zhang, “Multi-frequency and multiple phase-shift sinusoidal fringe projection for 3D profilometry,” Optics Express 13, 1561–1569 (2005). 47. D. J. Bone, “Fourier fringe analysis: the twodimensional phase unwrapping problem,” Applied Optics 30, 3627–3632 (1991). 48. S. Feng, Q. Chen, G. Gu, T. Tao, L. Zhang, Y. Hu, W. Yin, and C. Zuo, “Fringe pattern analysis using deep learning,” Advanced Photonics 1, 025001 (2019). 49. T. R. Judge and P. Bryanston-Cross, “A review of phase unwrapping techniques in fringe analysis,” Optics and Lasers in Engineering 21, 199–239 (1994). 50. W. W. Macy, “Two-dimensional fringe-pattern analysis,” Applied Optics 22, 3898–3901 (1983). 51. M. Takeda, “Fourier fringe analysis and its application to metrology of extreme physical phenomena: a review,” Applied Optics 52, 20–29 (2013). 52. P. S. Huang and S. Zhang, “Fast three-step phaseshifting algorithm,” Applied Optics 45, 5086–5091 (2006). 53. S. Zhang and S.-T. Yau, “High-resolution, real-time 3D absolute coordinate measurement based on a phase-shifting method,” Optics Express 14, 2644– 2649 (2006). 54. C. Jiang, P. Kilcullen, Y. Lai, S. Wang, T. Ozaki, and J. Liang, “Multi-scale band-limited illumination profilometry for robust three-dimensional surface imaging at video rate,” Optics Express 30, 19824– 19838 (2022). 55. J.-Y. Bouguet, “Camera calibration toolbox for matlab,” http://www. vision. caltech. edu/bouguetj/calib_doc/index. html (2004). 56. S. Zhang and P. S. Huang, “Novel method for structured light system calibration,” Optical Engineering 45, 083601 (2006).
282 57. H. Zhang, Q. Zhang, Y. Li, and Y. Liu, “High speed 3D shape measurement with temporal Fourier transform profilometry,” Applied Sciences 9, 4123 (2019). 58. M. Lu, X. Su, Y. Cao, Z. You, and M. Zhong, “Modulation measuring profilometry with cross grating projection and single shot for dynamic 3D shape measurement,” Optics and Lasers in Engineering 87, 103–110 (2016). 59. J. Xu, J. Xu, and X. Yu, “Design to phase measurement profilometry on grating projection system,” in 2012 IEEE Fifth International Conference on Advanced Computational Intelligence (ICACI), (IEEE, 2012), 1069–1071. 60. M. Neil, R. Juškaitis, and T. Wilson, “Real time 3D fluorescence microscopy by two beam interference illumination,” Optics Communications 153, 1– 4 (1998). 61. C. Chu, L. Wang, H. Yang, X. Tang, and Q. Chen, “An optimized fringe generator of 3D pavement profilometry based on laser interference fringe,” Optics and Lasers in Engineering 136, 106142 (2021). 62. B. Li, P. Ou, and S. Zhang, “High-speed 3D shape measurement with fiber interference,” in Interferometry XVII: Techniques and Analysis, (SPIE, 2014), 270–278. 63. M. Schaffer, M. Große, B. Harendt, and R. Kowarschik, “Coherent two-beam interference fringe projection for highspeed three-dimensional shape measurements,” Applied Optics 52, 2306– 2311 (2013). 64. D. Xiao-jie, D. Fa-jie, and L. Chang-rong, “Phase stabilizing method based on PTAC for fiber-optic interference fringe projection profilometry,” Optics & Laser Technology 47, 137–143 (2013). 65. L.-C. Chen and C.-C. Huang, “Miniaturized 3D surface profilometer using digital fringe projection,” Measurement Science and Technology 16, 1061 (2005). 66. S. Zhang, “Recent progresses on real-time 3D shape measurement using digital fringe projection techniques,” Optics and Lasers in Engineering 48, 149– 158 (2010). 67. H. Zhao, X. Liang, X. Diao, and H. Jiang, “Rapid in-situ 3D measurement of shiny object based on fast and high dynamic range digital fringe projector,” Optics and Lasers in Engineering 54, 170–174 (2014). 68. F. Lü, S. Xing, and H. Guo, “Self-correction of projector nonlinearity in phase-shifting fringe projection profilometry,” Applied Optics 56, 7204–7216 (2017). 69. J. Liang, R. N. Kohn Jr, M. F. Becker, and D. J. Heinzen, “High-precision laser beam shaping using a binary-amplitude spatial light modulator,” Applied Optics 49, 1323–1330 (2010). 70. D. Dudley, W. M. Duncan, and J. Slaughter, “Emerging digital micromirror device (DMD) applications,” in MOEMS display and imaging systems, (SPIE, 2003), 14–25.
15 Grid-Index-Based Three-Dimensional Profilometry
Elahi Ahsan, QiDan Zhu, Jun Lu, Yong Li, and Muhammad Bilal
Abstract
Grid-index-based three-dimensional (3D) surface profilometry is a technique based on a spatially encoded structured light system (SLS). The competitive advantage of this technique is its ability to measure 3D surfaces in a single shot, which makes it useful for real-time applications. This chapter briefly reviews the existing fast, single-shot 3D measurement methods based on grid-index-based structured light spatial projection. Covering all possible grid-index-based 3D surface imaging techniques is not feasible here; we have therefore selected and explained essential methods to help readers gain insight into grid-index-based surface profilometry.
Keywords
Structured light · Single-shot · 3D measurement · Triangulation · Stripe indexing · 1D grid pattern · Grid indexing · 2D grid pattern
E. Ahsan (✉) · Q. Zhu · J. Lu · Y. Li · M. Bilal
College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin City, Heilongjiang Province, China
e-mail: [email protected]; [email protected]
15.1 Introduction
15.1.1 Purpose and Significance
Structured light is a general concept utilized for non-contact three-dimensional (3D) measurement at close range, and it is one of the cheapest and most reliable methods. Non-contact 3D measurement is an important research topic in computer vision since it is broadly applied in industrial inspection and manufacturing [1], reverse engineering of mechanical parts and assemblies [2], virtual reality [3], object recognition [4], 3D map building [5], perception systems of autonomous mobile robots (self-driving cars [6]), and medical diagnosis (tomography) [7]. Therefore, fast, single-shot 3D measurement has become a vital and challenging task, given its employment in real-time and dynamic-scene application scenarios [8]. A high-speed and real-time 3D surface profilometry technique can be implemented through a structured light system (SLS) [9]. Optical metrology researchers emphasize using fringe-pattern-based SLSs for real-time applications [10, 11]. However, techniques based on fringe patterns have the limitations of lower accuracy and resolution and require sequential projections, while real-time application scenarios demand 3D measurement through single-shot techniques. As a result, grid-index-based spatial neighborhood schemes emerged,
which are single-shot, fast, and used to develop real-time and dynamic scene applications.
15.1.2 Background and History
Woodham [12] introduced photometric stereoscopy in 1980 as a variant approach for measuring the shape of an object through multiple images and shading. Stereo-vision techniques were thus employed for 3D measurement before the development of SLSs. In a stereo vision method, more than one camera is utilized to solve the correspondence problem between two or more views of an object [13, 14]. The cameras are placed parallel to each other and are separated by a baseline distance assumed to be known accurately. The two cameras simultaneously capture two images. The images are analyzed to note the differences, and the same pixels are accurately identified; hence, the correspondence problem between the two images is resolved [15]. The major drawbacks are unstable 3D measurement results and poor performance when applied to textureless surfaces [16], so many researchers have proposed various stereo-matching algorithms to address the correspondence problem [17–19]. In the structured light projection technique, one of the stereo vision cameras is replaced with a light-emitting projector [20], so the SLS can also be called an active stereo vision technique [21]. It employs the triangulation principle to obtain the 3D measurement. Thus, the correspondence between two images transforms into a search for corresponding points between the projected pattern and the captured image [22, 23]. Structured light techniques can be classified into three main types: temporal coding (or time multiplexing and sequential projections), grid-index-based spatial neighborhood techniques, and hybrid methods [24]. Time multiplexing coding schemes can give good 3D measurement results but require multiple patterns, which is only suitable for static objects [25]. Sequential coding schemes can be classified into binary patterns, gray coding, phase shift, photometric, and hybrid techniques.
In contrast, the spatial coding schemes can acquire 3D information with only a single pattern projection and are thus suitable for dynamic scenes and targets [26]. These single-shot techniques can be further classified into two categories: stripe-based grid indexing [i.e., one-dimensional (1D) encoding schemes] and two-dimensional (2D) shape-primitive or symbol-based encoding methods (i.e., 2D grid indexing). The classification hierarchy of structured light techniques is depicted in Fig. 15.1. Will and Pennington introduced the concept of grid coding in 1972 for automatic extraction of range data [27, 28]. LeMoigne and Waxman introduced a structured light approach that involves projecting a rectangular grid of high-contrast points and lines onto the surface for short-range 3D applications [29, 30]. Wang et al. [31] applied the grid coding technique through structured light to find object surface orientation and structure. The grid pattern combines the advantages of both the simple point and the line pattern, as sharp discontinuities may indicate abrupt changes at several points on the object's surface. But grid coding imposes weak constraints on the physical objects [32], since labeling the intersecting points of the grid is time-consuming, especially if some parts of the object are occluded [33]. The 2D spatial neighborhood techniques evolved from the grid encoding schemes. In the grid-index-based spatial codification techniques, the codeword of a specific location is extracted from the surrounding points. The key idea is to ensure the distinctiveness of the codeword at any location in the whole range of the pattern. As highlighted in Fig. 15.1, the grid-index-based SLS can be classified into 1D and 2D spatial grid patterns. The 1D spatial grid pattern can be classified into continuously varying color coding schemes [34–37], De Bruijn sequences [38–42], color-coded stripes [43], segmented stripes [44], and grayscale-coded stripes [45]. On the other hand, 2D spatial grid patterns can be classified as spatially varying color patterns and 2D color-coded grids [46–49], M-arrays [7, 33, 50–57] or robust pseudo-random sequences [58], and non-formal coding schemes [59, 60]. We will discuss a few of these techniques in subsequent paragraphs.
Fig. 15.1 Structured light techniques: time multiplexing/temporal coding/sequential projection (binary coding, gray coding, phase shift, photometric, and hybrid gray code + phase shift); grid-index-based spatial neighborhood (single-shot) methods, comprising stripe indexing (1D grid patterns: continuously varying color patterns, De Bruijn sequences, color-coded stripes, segmented stripes, and grayscale-coded stripes) and grid indexing (2D grid patterns: spatially varying color patterns and 2D color-coded grids, M-arrays/pseudo-random sequences, and non-formal coding); and hybrid methods
15.2 1D Spatial Grid Pattern
15.2.1 Continuously Varying Color Patterns
Tajima et al. [34] proposed a single monochromatic illumination pattern with gradually increasing wavelength, i.e., a rainbow pattern, in 1990. An extensive set of vertical slits is encoded using different wavelengths such that an excellent spectrum sampling from red to blue is obtained. The two cameras are jointly used to observe the
scene. The correspondence between the two cameras is obtained by calculating the ratio between the two images. Geng proposed the rainbow 3D camera in 1996 using a continuously varying color pattern implemented through spatially varying wavelength illumination, which can be projected onto the object's surface to calculate the depth [35]. The fixed geometry of the rainbow light projector establishes a one-to-one correspondence between the projection angle of a plane of light and a particular spectral wavelength, thus providing easy-to-identify landmarks on each surface point.
Fig. 15.2 Continuously varying color patterns. (a) Rainbow 3D camera similar to the pattern proposed in [35]. (b) Continuous rainbow color pattern similar to the pattern proposed in [36], in which the red, green, and blue channel intensity variations are combined into a composite three-channel intensity variation. (c) Color-coded sinusoidal fringe pattern [37]. ((a) and (b) are reprinted from the open access paper of the Optica Publishing Group, Advances in Optics and Photonics, Geng J (2011) Structured-light 3D surface imaging: a tutorial. https://doi.org/10.1364/aop.3.000128. (c) is reprinted with the permission of the Department of Mechanical Engineering at Stony Brook University. Song Zhang (2005), Ph.D. Thesis, High-resolution, Real-time 3-D Shape Measurement)
With a known baseline and an available viewing angle, the 3D range values corresponding to each pixel can be computed through a straightforward triangulation principle. This concept was further extended in 2004 by proposing a method and apparatus for 3D imaging using light patterns having multiple sub-patterns to encode the spatial location information [36]. An intensity variation pattern was constructed for each color (red, green, and blue) channel. When all these channels were added together, they formed a continuously varying color scheme, the same as a rainbow-like color projection pattern. Mostly these patterns do not necessarily follow a linear variation relationship in the color spectrum (wavelength). Still, since the ratios among the contributions from each color channel are known, the decoding scheme is easy to derive. A similar approach was adopted by Zhang [37, 61] in 2005 to form a color-coded fringe pattern for high-speed, realtime 3D acquisition. Instead of the saw-tooth waveform, sinusoidal functions generated three
color channels. Three fringe patterns were projected based on the phase shifting technique, and a parallel processing phase unwrapping algorithm was developed. Figure 15.2 shows the continuously varying color patterns as proposed by different authors.
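To make the color-coded fringe idea concrete, the following Python sketch generates a three-channel pattern whose red, green, and blue channels carry phase-shifted sinusoidal fringes, in the spirit of the color-coded patterns described above. The image size, fringe period, and 120° phase shifts are illustrative assumptions, not parameters taken from the cited works.

import numpy as np

def color_coded_fringe(width=1024, height=768, period=32):
    """Build an RGB pattern whose channels are 120-degree phase-shifted sinusoids."""
    x = np.arange(width)
    phase = 2 * np.pi * x / period
    pattern = np.zeros((height, width, 3), dtype=np.uint8)
    for c, shift in enumerate((-2 * np.pi / 3, 0.0, 2 * np.pi / 3)):
        # Map the sinusoid from [-1, 1] to [0, 255] and repeat it on every row.
        channel = 0.5 * (1.0 + np.cos(phase + shift))
        pattern[:, :, c] = np.tile((255 * channel).astype(np.uint8), (height, 1))
    return pattern

pattern = color_coded_fringe()
print(pattern.shape)  # (768, 1024, 3)

In a three-step phase-shifting scheme, the three channels of the captured image would then be demodulated separately to recover the wrapped phase from a single projection.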
15.2.2 De Bruijn Sequence
De Bruijn sequences are a well-defined class of sequences used in the encoding of spatial neighborhood structured light projection patterns. A De Bruijn sequence of order 'k' over an alphabet of 'm' symbols is a circular string of length m^k that contains each substring of length 'k' exactly once. A De Bruijn sequence has a flat autocorrelation function with a unique peak at zero shift. So, it is the best autocorrelation function that can be achieved, and this property makes it suitable for encoding structured light projection patterns. In 1987, Boyer and Kak [38]
proposed a technique that used a pattern formed by vertical slits coded with the three primary colors (i.e., red, green, and blue) and separated by black bands. The sequence of colored slits was designed so that if the pattern is divided into subpatterns of a certain length, nothing is repeated. In the decoding stage, the morphology of the measuring surface acted as a perturbation of a signal applied to the projected pattern. So, the received pattern can contain disorders or even deletions of the slits. A four-step algorithm called the stripe indexing process was designed for decoding. In 1989, Hugli and Maitre [40] improved the pattern proposed by Boyer and Kak by using a pseudo-random sequence for projecting the colored stripes, and the same color in two consecutive bars was not allowed. The advantage occurs in decoding. So, any subsequence position becomes independent of location. In 1990, Vuylsteke and Oosterlinck [39] proposed a single-shot binary encoding pattern using De Bruijn sequences derived from the pseudorandom noise. A total of 63 columns were encoded in a unique design with a checkerboard structure where the column of every grid point is encoded. The encoding system was based on two binary pseudo-random sequences of order six and length 63. This approach was used for extensive feature extraction but at the cost of more time consumption. In 2002, Zhang [41] proposed a rapid shape acquisition method using colored structure light and dynamic programming. The technique works by projecting a pattern of alternating color stripes and matching the projected color transitions with observed edges in the image. The correspondence problem is solved using a multi-pass dynamic programming algorithm that eliminates global smoothness assumptions and strict ordering constraints in previous formulations. Many techniques that use the De Bruijn sequence employ colored stripe patterns, multi-slit, and shape primitive-based patterns to obtain dense reconstructions with single-shot measurement. The colored patterns are suitable for locating intensity peaks in the image, while the multi-slit shape primitive-based designs aim to find edges. In 2005, Pages et al.
[42] proposed optimized De Bruijn patterns for one-shot shape acquisition, designing colored stripe patterns so that both intensity peaks and edges can be located without loss of accuracy while reducing the number of hue levels included in the pattern. Figure 15.3 shows pattern illumination using De Bruijn sequences.
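As a concrete illustration of how such a sequence can drive a stripe code, the Python sketch below builds a De Bruijn sequence with the standard recursive construction and maps each symbol to a stripe color. The order-3 sequence over three symbols and the R/G/B assignment are illustrative assumptions rather than the exact parameters of the works cited above.

def de_bruijn(m, k):
    """Return a cyclic De Bruijn sequence over m symbols (0..m-1) in which every
    length-k subword occurs exactly once (standard recursive construction)."""
    a = [0] * m * k
    seq = []

    def db(t, p):
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, m):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq

# Example: 27-symbol sequence over {R, G, B}; read cyclically, every triplet of
# neighboring stripes is unique, so observing three stripes localizes the position.
colors = "RGB"
stripes = "".join(colors[s] for s in de_bruijn(3, 3))
print(len(stripes), stripes)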
15.2.3 Color-Coded Stripes Most color-coded stripe-based methods suggest the design of color patterns that can be uniquely identified in the illumination space. Still, little explicit attention has been paid to the selection of colors. Je et al. [43] proposed a high-contrast color stripe pattern for real-time range sensing. It is used to select colors for illumination in stripe patterns that maximize the range resolution. They also proposed a two-shot range imaging method insensitive to system noise, nonlinearity, and object color reflectance. The selection of stripe colors and the design of multiple-stripe patterns for the single-shot measurements increase the color contrast between the stripes, reducing the ambiguities resulting from colored object surfaces and the limitations in sensor/projector resolutions. The two-shot imaging adds an extra video frame and increases the color contrast between the first and second video frames to diminish the ambiguities further. Figure 15.4 shows a colorcoded strip pattern to form a 1D grid.
15.2.4 Segmented Stripes Maruyama et al. [44] proposed a unique method in 1993 to encode a single-shot structured light projection using multiple slits with random cuts. Each slit is identified by random dots given as randomly distributed cuts on each slit. Thus, each slit is divided into many small line segments. The segment matching is performed based on the correspondence of endpoints along epipolar lines. Depth information is obtained by triangulation from the matched pairs of segments. The method has the advantage of adding a unique segmented
coding scheme to distinguish one stripe from the others. So, when decoding, the unique segmented pattern of each stripe is distinguishable. This segmented indexing method is intriguing but can only be applied to a 3D object with a smooth surface, where the pattern distortion due to shape is not critical. Otherwise, it may be difficult to recover the unique segmented pattern, owing to deformation or discontinuity of the surface. The segmented stripe pattern is shown in Fig. 15.5.
Fig. 15.3 Pattern illumination using the De Bruijn sequence. (a) Pattern proposed by Zhang [41]* (binary R, G, and B channels). (b) Optimized De Bruijn patterns proposed by Pages et al. [42]: (i) test surface, (ii) illumination of the surface using simple and hybrid methods, (iii) 3D measurement results using simple and hybrid methods. ((a) is reprinted with the permission of the IEEE Computer Society. Zhang L, Curless B, Seitz SM (2002) Rapid shape acquisition using color structured light and multi-pass dynamic programming. Proc – 1st Int Symp 3D Data Process Vis Transm 3DPVT 2002 24–36. https://doi.org/10.1109/TDPVT.2002.1024035. (b) is reprinted with the permission of the Elsevier journal Image and Vision Computing. Pagès J, Salvi J, Collewet C, Forest J (2005) Optimised de Bruijn patterns for one-shot shape acquisition. Image Vis Comput 23:707–720. https://doi.org/10.1016/j.imavis.2005.05.007. *Note: The binary R, G, and B color intensities are selected using a De Bruijn sequence for m = 5 and k = 3. The intensity variation of each binary channel combines to form a unique pattern in which no three consecutive transitions are repeated)
15.2.5 Grayscale-Coded Stripes
Posdamer et al. [62] introduced the grayscale encoding technique in 1982 by projecting a sequence of 'm' patterns to encode 2^m stripes. The N-ary reflected gray code was proposed in 1984, which used multi-level gray codes for pattern illumination [63]. Carrihill and Hummel developed the intensity ratio sensor [64], in which many grey levels from black to white were used for light-plane labeling. However, the labeling was analog and hence susceptible to noise.
Fig. 15.4 Color-coded stripe pattern [43]. (a) RGB color-coded stripe pattern and its zoomed view. (b) Single-shot imaging applied to a human face, with its range result. (c) Two-shot imaging of a Rubik's cube, with its range result. ((a), (b), and (c) are reprinted with the permission of Springer Nature, Je C, Lee SW, Park RH (2004) High-contrast color-stripe pattern for rapid structured-light range imaging. In: Eighth European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science. pp. 95–107)
Although all of these technologies were time-multiplexing, a new concept of encoding using grayscale illumination evolved. The advantage of grayscale encoding in binary is that the effect of a detection error due to a transition is limited to an error between the two adjacent light planes [65]. So, in 1998, Durdle et al. [45] proposed a pattern
for reconstructing the human trunk to diagnose scoliosis (abnormal curvature of the spine) by presenting multi-grayscale coded stripes. Three intensity levels (i.e., black, gray, and white) generate a pattern. When three intensity levels (i.e., black, white, and gray) are used, the intensity levels of stripes can be arranged such that any
group of bars has a unique intensity pattern within a period of a certain length. The pattern is generated in the following manner: BWG, WBG, GWB, GBW, GWB, GBW, WGB, BWG, WBW, and GBG, as shown in Fig. 15.6.
Fig. 15.5 A pattern similar to the segmented stripe pattern proposed by Maruyama et al. [44]. (This figure is reprinted with the permission of IEEE. Maruyama M, Abe S (1993) Range Sensing by Projecting Multiple Slits with Random Cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence 15:647–651. https://doi.org/10.1109/34.216735. Note that this picture was created by the author to enhance the reader's understanding, and it may differ from the pattern proposed by Maruyama)
Fig. 15.6 A pattern similar to the grayscale-coded stripes proposed by Durdle et al. [45]. (This figure is reprinted with the permission of IEEE. Durdle NG, Thayyoor J, Raso VJ (1998) An improved structured light technique for surface reconstruction of the human trunk. In: IEEE Canadian Conference on Electrical and Computer Engineering. IEEE, Waterloo, Ontario, pp. 874–877. Note that this picture was created by the author to enhance the reader's understanding, and it may differ from the pattern proposed by Durdle)
15.3.1 Spatially Varying Color Patterns and 2D Color-Coded Grid In 1991, Griffin et al. [46] proposed a matrix of colored circles or dots projected onto the scene for range data acquisition. The circles have red, green, and blue colors used as shape primitive and spread through a pseudorandom sequence of size 11 × 29 generated through a brute force algorithm. The robustness of the sequence is ensured so that no two codewords in the pattern are the same, and a codeword is identified from its four adjacent neighbors. So each defined location in the pattern is unique, and hence correspondence can establish. Since the generated pseudorandom sequence was small, the measured resolution may be low and give fewer feature points. In 2007, Desjardins and Payeur [67, 68] used a bi-dimensional pseudo-random sequence of length 53 × 38 color codes with a window property of 3 × 3 to form a projection pattern. Instead of circles, they used a rectangular grid of colors. The scene was captured by two cameras with different positions. The pseudo-random color codes were used to create artificial textures on a location. The decoding was performed through a confidence map in a group to ensure reliable feature matching.
15 Grid-Index-Based Three-Dimensional Profilometry
Petriu et al. [47] introduced a pseudo-random sequence of length 15 × 15 with color-encoded grid patterns composed of the rows and columns of grid lines and applied this sequence on a simple cubic surface. The pattern was based on small pseudo-random sequences, and only 59% of feature points were detected successfully. A similar approach was adopted by Salvi et al. [69] in 1998. However, their pattern was 29 × 29 grid lines of six colors. The horizontal slits were red, green, and blue, while the vertical slits were magenta, yellow, and cyan. They also neglected the effects and segmentation errors caused by highly saturated color objects. The six colors of the pattern were decoded with six different gray levels in a single image obtained after the segmentation of the captured image. The cross points of horizontal and vertical slits were detected by searching the local maxima in the intensity image. Since both pattern axes were coded, cross points of the grid were redundantly coded, which led to a more accurate triangulation [70]. Furukawa et al. [71] and Kawasaki et al. [72] proposed a similar approach with horizontal grid lines in blue and vertical grid lines in red. Their method does not rely on the image processing operation and is robust against undetected grid points as it does not assume relative ordering. Their solution includes performing singular value decomposition on a large and very sparse matrix, which is expensive and may be numerically inaccurate. Moreover, the algorithm may sometimes fail to converge to the correct reconstruction due to instabilities. Ulusoy et al. [73, 74] proposed a single-shot technique using De Bruijn spaced horizontal and vertical grid lines pattern. Their grid pattern is similar to the one used by Kawasaki et al. [71, 72]. However, to simplify the search for correspondences between the projected pattern and the captured images, they proposed De Bruijn spaced grids lines, grid patterns with spacings that follow a De Bruijn sequence. The decoding stage is further improved using a probabilistic graphical model and solved efficiently using loopy belief propagation [74]. Song et al. [48, 49] proposed a 2D pseudo-random rhombic or diamond-shaped color element pattern. The grid points between the pattern elements are adopted as the feature points instead of the
295
centroid. Two possible types of grid-point are described with a scheme that allows each grid point to be uniquely distinguished by a codeword and the type it belongs to. In addition, this procedure is utilized to determine an object’s 3D orientation and 3D position [49]. Wijenayake et al. [55] introduced a method that encodes two pseudo-random sequences of feature size 45 × 45 on top of other to form a single pattern image. One pseudo-random sequence was a binary sequence with a window property of 7 × 7 implemented with monochromatic light of red and white, while the other pseudo-random sequence having a 3 × 3 window property used colors of magenta, cyan and yellow. Since the red color has the longest wavelength, whenever red is mapped with any other color on the second sequence, the second sequence color may undermine, and the red color may appear dominantly. The 2D color-coded grid patterns are shown in Fig. 15.7. All these patterns are employed through color-coded shape primitives that may be affected by surface colors, albedos, and reflections.
15.3.2 M-array or Robust Pseudo Random Sequence The pseudo-random sequences or M-arraybased methods employed in 2D color-coded grid patterns have been discussed in the previous section. However, single-shot structured light encodes through color and geometrical features [75]. In this part, we will briefly describe a few methods employing encoding of the patterns using grayscale geometrical symbols while using a robust pseudo-random sequence or the Marray. The pseudo-random sequences are widely used in many applications [76, 77] before their employment in pattern encoding of SLSs. The M-array was first used in structured light by Morita et al. [53] in 1988. Since then, various researchers have encoded many patterns based on this methodology. M-arrays are formed using the theory of perfect maps, formulated by Etzion [78, 79] in 1988. In perfect maps, no codewords repeat themselves. The codewords and their locations are unique in the sequence. M-arrays (perfect
maps) are pseudo-random number sequences or arrays (over an alphabet of a specific size, e.g., 3, 4, 5, ...) having dimensions w × v (e.g., 45 × 72) in which any sub-matrix or sub-window of size n × m (e.g., 2 × 2 or 3 × 3) appears only once. M-arrays are 2D sequences, unlike De Bruijn sequences, which are only 1D. So, coarse correspondence can easily be established using perfect maps in structured light. Griffin et al. [46] used color dots or circles, which we discussed previously. Still, in their second approach, Yee et al. [80] proposed to generate an array of 18 × 66 features using an alphabet of four words (1, 2, 3, 4) implemented with grayscale geometric symbols. Compared with the color-based approach, better results were obtained with the grayscale geometrical feature-based technique when applying it to colored objects. In 2008, Chen et al. [51, 52] proposed a 3D imaging system that modifies Griffin and Yee's methods; they added a connectivity condition and eliminated the gaps between two symbols to achieve dense reconstruction. Morano et al. [33] introduced grayscale geometrical symbols to be used as shape primitives to encode the structured light pattern in 1998. Albitar et al. [7] proposed a pattern using a pseudo-random sequence of size 29 × 27 with a window property of 3 × 3 for measuring a small surface, using monochromatic light and three symbols (i.e., disk, circle, and stripe) to represent the codeword. Later, researchers who used grayscale or geometrical primitives tried to increase the number of feature points. For example, Lu et al. [50] used an M-array of a larger feature size of 48 × 52 with the same three alphabets as those used by Albitar [7]. Similarly, Jia et al. [56, 81] proposed a similarly shaped ten-alphabet M-array with a feature size of 79 × 59 points. Ahsan et al. [58] proposed a pixel-encoded, digitally controllable pattern with five sets of twenty-five geometrical symbols, along with a method to generate robust pseudo-random sequences of any required size according to the projector dimensions and surface area. Many pseudo-random-based methods have been proposed and continue to be proposed [57, 82–84]. Figure 15.8 shows the geometrical symbol-based patterns proposed by various researchers. It is important to highlight that statistically random coding schemes have also been successfully employed in consumer products such as the Microsoft Kinect V1, Intel RealSense R200, and iPhone X [9]. The statistically random patterns used by the Kinect V1, RealSense, and iPhone X are shown in Fig. 15.9.
Fig. 15.7 2D color-coded patterns. (a) Pattern similar to the colored circles or dots of Griffin et al. [46]. (b) Pattern similar to the colored rectangles of Desjardins and Payeur [67, 68]. (c) Pattern similar to the grid lines of Petriu [47]. (d) 2D diamond-shaped color grid pattern of Song Zhan and Ronald Chung [48, 49]. ((a) is printed with permission of the Pattern Recognition Society, Griffin PM, Narasimhan LS, Yee SR (1992) Generation of uniquely encoded light patterns for range data acquisition. Pattern Recognit 25:609–616. https://doi.org/10.1016/0031-3203(92)90078-W. (b) is printed with the permission of Springer Nature, Payeur P, Desjardins D (2009) Structured light stereoscopic imaging with dynamic pseudo-random patterns. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Bioinformatics). Springer-Verlag, Berlin Heidelberg, pp. 687–696. (c) is printed with the permission of IEEE, Petriu EM, Sakr Z, Spoelder HJW, Moica A (2000) Object recognition using pseudo-random color encoded structured light, in 17th IEEE Instrumentation and Measurement Technology Conference. IEEE, Baltimore, MD, USA, pp. 1237–1241. (d) is reprinted with the permission of IEEE, IEEE Transactions on Pattern Analysis and Machine Intelligence. Song Z, Chung R (2010) Determining both surface position and orientation in structured-light-based sensing. 32:1770–1780. https://doi.org/10.1109/TPAMI.2009.192. Note that (a) and (c) were created by the author to enhance the reader's understanding, and they may differ from the patterns proposed by Griffin and Petriu)
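The defining window property of an M-array can be reproduced on a small scale with a brute-force search, in the spirit of the brute-force generation mentioned above for Griffin's pattern; it is only an illustration, not Etzion's perfect-map construction. The array size, three-symbol alphabet, window size, and retry limit in the Python sketch below are illustrative assumptions.

import random

def random_window_unique_array(rows=8, cols=8, symbols=3, win=3, max_tries=1000, seed=0):
    """Brute-force search for a small pseudo-random array in which every win x win sub-window is unique."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        grid = [[rng.randrange(symbols) for _ in range(cols)] for _ in range(rows)]
        seen = set()
        ok = True
        for r in range(rows - win + 1):
            for c in range(cols - win + 1):
                codeword = tuple(grid[r + i][c + j] for i in range(win) for j in range(win))
                if codeword in seen:      # this codeword already appears elsewhere: reject the draw
                    ok = False
                    break
                seen.add(codeword)
            if not ok:
                break
        if ok:
            return grid                   # every win x win codeword now identifies a unique location
    raise RuntimeError("no valid array found; enlarge the alphabet or reduce the array size")

grid = random_window_unique_array()
print(*grid, sep="\n")

In a real pattern, each symbol of the array would then be rendered as a color or a geometrical shape primitive, as in the methods surveyed above.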
15.3.3 Non-formal Coding In non-formal coding techniques, researchers create a specific pattern to fulfill certain requirements without following procedural methods. These include both 1D and 2D grid indexing methods. An example of a 1D grid
pattern is already discussed in the previous section as segmented stripes [44]. 1D procedures are generally based on the De Bruijn sequence. Researchers such as Forster [85] and Fechteler et al. [86] used similar color-coded stripes or lines. However, they ensured that at the receiving device, two adjacent colors must differ in at least two color channels. This employed a multi-slit pattern. The 2D non-formal methods are also discussed in the 2D color-coded grid section (i.e., Sect. 15.3.1). The examples are Kawasaki et al. [71, 72] and the technique employed by Ulusoy et al. [73, 74]. In these techniques, the uniqueness of a specific location has been encoded in the spacing between horizontal and vertical lines. Ito et al. [59] used a set of square cells (e.g., a checkerboard) having one out of three possible intensity values. Every node (intersection between four compartments of the checkerboard) was associated with the intensity values of the forming cells. To differentiate nodes having the same subcode, epipolar restrictions between the camera and the projector were employed. The idea of using epipolar conditions was also applied in work inferred by Koninckx et al. [87], who proposed an adaptive system of green diagonal lines (named coding lines) superimposed to a grid of vertical black lines (called base pattern). If a coding line is not coincident with an epipolar line, intersections created with the base pattern would all have laid on different epipolar lines. This determines a unique point in the projected pattern, being able to perform the matching and the triangulation. A greater inclination of diagonal lines gave the reconstruction a higher density but a lower noise resistance. Figure 15.10 shows a few of the non-formal coded patterns.
15.4 3D Measurement Model
This section will explain the mathematical model for obtaining 3D measurements. The linear camera model with the 3D measurement mechanism and apparatus is shown in Fig. 15.11.
Fig. 15.8 2D geometrical symbol-based patterns. (a) Shape primitives proposed by Yee et al. [80]. (b) Codewords of the Yee et al. [80] pattern. (c) Five sets of 25 geometrical shape primitives and the generated patterns proposed by Ahsan et al. [58]: (i) 8×8-pixel symbols, (ii) 10×10-pixel symbols, (iii) 12×12-pixel symbols, (iv) 14×14-pixel symbols, (v) 16×16-pixel symbols, (vi) pattern using an M-array of size 51×81, 4 alphabets, 2 spaces, 14×14-pixel symbols, (vii) pattern using an M-array of size 45×72, 3 alphabets, 2 spaces, 16×16-pixel symbols, (viii) pattern applied to the surface and its 3D result. ((a) and (b) are printed with the permission of the SPIE Journal of Optical Engineering, Yee SR, Griffin PM (1994) Three-dimensional imaging system. Opt Eng 33:2070–2075. https://doi.org/10.1117/12.169713. (c) was first published in an IEEE open access journal and is reproduced with the permission of the author himself, Elahi A, Lu J, Zhu QD, Yong L (2020) A Single-Shot, Pixel Encoded 3D Measurement Technique for Structure Light. IEEE Access 8:127254–127271. https://doi.org/10.1109/ACCESS.2020.3009025)
Fig. 15.9 Pseudo-random patterns employed in commercial devices [9]. (a) Microsoft Kinect V1. (b) Intel RealSense R200. (c) iPhone X. (This picture is reprinted with the permission of the Elsevier journal Optics and Lasers in Engineering. Zhang S (2018) High-speed 3D shape measurement with structured light methods: A review. Opt Lasers Eng 106:119–131. https://doi.org/10.1016/j.optlaseng.2018.02.017)
Fig. 15.10 Non-formal coded patterns proposed by Ito et al. [59] (checkerboard pattern). (This figure is reprinted with the permission of the Pergamon journal Pattern Recognition. Ito M, Ishii A (1995) A three-level checkerboard pattern (TCP) projection method for curved surface measurement. Pattern Recognit 28:27–40. https://doi.org/10.1016/0031-3203(94)E0047-O. Note: This figure was created by the author to enhance readers' understanding)
⎡
⎤ Xw ⎢ Yw ⎥ ⎢ ⎥ ⎣ Zw ⎦ 1 (15.1)
.
un vn
=
XC ZC YC ZC
(15.2)
,
where (un , vn ) are X–Y coordinates on the imagecapturing plane. The camera lens distortion contains radial distortion and tangential distortion and is defined by the relationship
ud .md = vd
= 1 + k1 r + k2 r 2
4
un + dx, vn (15.3)
where (ud , vd ) are the coordinates of the image plane in the camera after including lens distortion effects. The first term represents radial distortion, where the coefficients k1 and k2 are the radial distortion parameters. The second term means tangential distortion, and it can define as 2 p1 un vn + p2 r 2 + 2 un 2 .dx= , (15.4) p1 r 2 + 2 vn 2 + 2 p2 un vn
300
E. Ahsan et al.
World Coordinates
3D Objects
Yw Zw
Xw
O P (x, y, z)
Zc
Yi
O’
xd
Yc
xp
v
Camera Imag e P lane
Camera Coordinates
md
Xi
Oi
C
u
m o (u0, v0)
Digital Image
Xc
Projector
Camera
Computer
Fig. 15.11 Linear camera model with 3D measurement mechanism
where p1 and p2 are tangential distortion parameters, and the coefficient ‘r’ can be defined as r 2 = un 2 + vn 2 .
.
(15.5)
Similarly, the coordinate transformation from the camera image plane (ud , vd ) (including
distortion effects) to the digital image coordinates (u, v) stored in the computer can be expressed as ⎡
⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ u α γβ u0 ud ud . ⎣ v ⎦ = ⎣ 0 β v0 ⎦ ⎣ vd ⎦ = K ⎣ vd ⎦ , 1 001 1 1 (15.6)
15 Grid-Index-Based Three-Dimensional Profilometry
where (u0 , v0 ) are the position coordinates of the center point ‘o’ in the computer image, α is the scale factor for the u-axis and β represents the scale factor for the v-axis, respectively, in the digital image, and K is the inside or intrinsic camera parameter. The 3D model of the measuring surface can be obtained by establishing correspondence between the projected image pattern and captured image pattern from the measuring surface. So, the grid points of the captured image pixel coordinates can be represented as (u1 , v1 , 1)T , and similarly, the projected image pixel coordinates can be defined as (u2 , v2 , 1)T . The corresponding relationship between these two coordinates, the pixel coordinates of the world coordinate system (Xw , Yw , Zw , 1), can be obtained. So we can rewrite Eq. (15.2) in the form of matrix multiplication as follows
ZC1
.
⎤
⎡
⎤
Finally, Eqs. (15.1), (15.6)–(15.8) define the relationship between the coordinate points of the world coordinate in space and the pixel coordinates of the captured image as
ZC1
.
⎤ un1
⎣ vn1 ⎦ = 1 + k1 r 2 + k2 r 4 K 1 ⎡ ⎤ ⎡ ⎤ XW 1000 ⎢ ⎥ ⎣ 0 1 0 0 ⎦ R T ⎢ YW ⎥ . (15.9) 0 1 ⎣ ZW ⎦ 0010 1
⎡
⎤ XW u1 ⎢ YW ⎥ ⎥ .ZC1 ⎣ v1 ⎦ = M ⎢ ⎣ ZW ⎦ 1 1 ⎤ ⎡ ⎡ ⎤ X m11 m12 m13 m14 ⎢ W ⎥ YW ⎥ = ⎣ m21 m22 m23 m24 ⎦ ⎢ ⎣ ZW ⎦ . (15.10) m31 m32 m33 m34 1 ⎡
⎤
Similarly, we can find the relationship between the world coordinate system and the coordinates system of the projected pattern with a point-topoint transformation by ⎡
⎤ XW u2 ⎢ YW ⎥ ⎥ .ZC2 ⎣ v2 ⎦ = N ⎢ ⎣ ZW ⎦ 1 1 ⎡ ⎤ ⎤ XW ⎡ n11 n12 n13 n14 ⎢ YW ⎥ ⎥ = ⎣ n21 n22 n23 n24 ⎦ ⎢ ⎣ ZW ⎦ , n31 n32 n33 n34 1 ⎡
(15.7)
Usually, we do not need to consider the tangential lens distortion since, nowadays, the camera lens distortion effects are overcome and accommodated during camera manufacturing. Therefore, Eqs. (15.3) and (15.4) will be simplified as follows: ⎤ ⎤ ⎡ ⎡ un1 ud1
2 4 ⎣ . ⎣ vd1 ⎦ = 1 + k1 r + k2 r vn1 ⎦ (15.8) 1 1
⎡
Eventually, it can be written as
⎡
⎤ XC1 1000 ⎢ un1 ⎥ ⎣ vn1 ⎦ = ⎣ 0 1 0 0 ⎦ ⎢ YC1 ⎥ ⎣ ZC1 ⎦ 0010 1 1 ⎡
301
⎤
(15.11)
where Z_{C1} and Z_{C2} are the coordinates of point P along the optical axes of the camera coordinate system and the projector coordinate system, respectively, m_{ij} is the element in the ith row and jth column of the matrix M, and n_{ij} is the element in the ith row and jth column of the matrix N. From Eqs. (15.10) and (15.11), with the elimination of Z_{C1} and Z_{C2}, we can obtain the world coordinates [X_w, Y_w, Z_w]^T. The resulting system of equations in the three elements of the world coordinates is shown in matrix form as
$$\begin{bmatrix} u_1 m_{31} - m_{11} & u_1 m_{32} - m_{12} & u_1 m_{33} - m_{13} \\ v_1 m_{31} - m_{21} & v_1 m_{32} - m_{22} & v_1 m_{33} - m_{23} \\ u_2 n_{31} - n_{11} & u_2 n_{32} - n_{12} & u_2 n_{33} - n_{13} \\ v_2 n_{31} - n_{21} & v_2 n_{32} - n_{22} & v_2 n_{33} - n_{23} \end{bmatrix} \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix} = \begin{bmatrix} m_{14} - u_1 m_{34} \\ m_{24} - v_1 m_{34} \\ n_{14} - u_2 n_{34} \\ n_{24} - v_2 n_{34} \end{bmatrix}. \tag{15.12}$$

In algebraic form, it can be represented as

$$\left\{ \begin{aligned} (u_1 m_{31} - m_{11}) X_W + (u_1 m_{32} - m_{12}) Y_W + (u_1 m_{33} - m_{13}) Z_W &= m_{14} - u_1 m_{34} \\ (v_1 m_{31} - m_{21}) X_W + (v_1 m_{32} - m_{22}) Y_W + (v_1 m_{33} - m_{23}) Z_W &= m_{24} - v_1 m_{34} \\ (u_2 n_{31} - n_{11}) X_W + (u_2 n_{32} - n_{12}) Y_W + (u_2 n_{33} - n_{13}) Z_W &= n_{14} - u_2 n_{34} \\ (v_2 n_{31} - n_{21}) X_W + (v_2 n_{32} - n_{22}) Y_W + (v_2 n_{33} - n_{23}) Z_W &= n_{24} - v_2 n_{34} \end{aligned} \right. \tag{15.13}$$

Thus, Eq. (15.12) or (15.13) is an overdetermined linear system composed of four equations with three unknowns, and a unique theoretical solution can be obtained directly. Practically, however, the extracted data may contain noise. Therefore, solving for the world coordinates of point P with the least-squares method is equivalent to minimizing the sum of squared distances to the two rays emitted by the camera and the projector. Eq. (15.13) can be rewritten as

$$A P = b, \tag{15.14}$$

where

$$A = \begin{bmatrix} u_1 m_{31} - m_{11} & u_1 m_{32} - m_{12} & u_1 m_{33} - m_{13} \\ v_1 m_{31} - m_{21} & v_1 m_{32} - m_{22} & v_1 m_{33} - m_{23} \\ u_2 n_{31} - n_{11} & u_2 n_{32} - n_{12} & u_2 n_{33} - n_{13} \\ v_2 n_{31} - n_{21} & v_2 n_{32} - n_{22} & v_2 n_{33} - n_{23} \end{bmatrix}, \quad P = \begin{bmatrix} X_W \\ Y_W \\ Z_W \end{bmatrix}, \quad b = \begin{bmatrix} m_{14} - u_1 m_{34} \\ m_{24} - v_1 m_{34} \\ n_{14} - u_2 n_{34} \\ n_{24} - v_2 n_{34} \end{bmatrix}.$$

Then the point P in the world coordinate system can be determined by

$$P = \left(A^{T} A\right)^{-1} A^{T} b. \tag{15.15}$$
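The least-squares solution of Eqs. (15.12)–(15.15) is straightforward to implement once the 3 × 4 projection matrices M and N of the calibrated camera and projector are known. The Python sketch below is a minimal illustration under that assumption; the numeric matrices and pixel coordinates are made-up values used only to exercise the code.

import numpy as np

def triangulate(M, N, u1, v1, u2, v2):
    """Recover [Xw, Yw, Zw] from a camera pixel (u1, v1) and its matched projector pixel (u2, v2),
    following the least-squares solution P = (A^T A)^(-1) A^T b of Eqs. (15.12)-(15.15)."""
    A = np.array([
        [u1 * M[2, 0] - M[0, 0], u1 * M[2, 1] - M[0, 1], u1 * M[2, 2] - M[0, 2]],
        [v1 * M[2, 0] - M[1, 0], v1 * M[2, 1] - M[1, 1], v1 * M[2, 2] - M[1, 2]],
        [u2 * N[2, 0] - N[0, 0], u2 * N[2, 1] - N[0, 1], u2 * N[2, 2] - N[0, 2]],
        [v2 * N[2, 0] - N[1, 0], v2 * N[2, 1] - N[1, 1], v2 * N[2, 2] - N[1, 2]],
    ])
    b = np.array([
        M[0, 3] - u1 * M[2, 3],
        M[1, 3] - v1 * M[2, 3],
        N[0, 3] - u2 * N[2, 3],
        N[1, 3] - v2 * N[2, 3],
    ])
    # lstsq returns the same least-squares solution as (A^T A)^(-1) A^T b, computed more stably.
    P, *_ = np.linalg.lstsq(A, b, rcond=None)
    return P

# Illustrative check: project a known world point with made-up M and N, then recover it.
M = np.array([[800., 0., 320., 0.], [0., 800., 240., 0.], [0., 0., 1., 0.]])
N = np.array([[700., 0., 512., -70.], [0., 700., 384., 0.], [0., 0., 1., 0.]])
P_true = np.array([0.1, -0.2, 2.0])
ph = np.append(P_true, 1.0)
u1, v1 = (M @ ph)[:2] / (M @ ph)[2]
u2, v2 = (N @ ph)[:2] / (N @ ph)[2]
print(triangulate(M, N, u1, v1, u2, v2))  # approximately [0.1, -0.2, 2.0]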
15.5 Conclusion
Grid-index-based 3D surface profilometry is a technique based on a spatially encoded structured light system. In this chapter, grid-index-based surface profilometric techniques were explained in a tutorial fashion to enhance readers' capability and learning. The first section introduced the concept of this technique with the necessary background and history. Section 15.2 discussed the 1D grid-index-based methods, and Sect. 15.3 described the 2D grid-index methods. In Sect. 15.4, the mathematical model for the triangulation-based process was explained. Figures are included where necessary to reinforce understanding; some pictures are self-created following the specific method cited.
References
1. Kersen S (2017) 3D Measurement: a Next Generation Tool for Manufacturing. Quality 56:24,26–27
2. Buonamici F, Carfagni M, Volpe Y (2017) Recent strategies for 3D reconstruction using Reverse Engineering: a bird's eye view. Adv Mech Des Eng Manuf Lect Notes Mech Eng 841–850. https://doi.org/10.1007/978-3-319-45781-9_84
3. Huang J, Chen Z, Ceylan D, Jin H (2017) 6DOF VR videos with a single 360 camera. Proc – IEEE Virtual Real 37–44. https://doi.org/10.1109/VR.2017.7892229
4. Stein F, Medioni G (1992) Structural Indexing: Efficient 3-D Object Recognition. IEEE Trans Pattern Anal Mach Intell 125–145:125. https://doi.org/10.1109/34.121785
5. May S, Fuchs S, Droeschel D, et al. (2009) Robust 3D-mapping with Time-of-Flight cameras. 2009 IEEE/RSJ Int Conf Intell Robot Syst IROS 2009 1673–1678. https://doi.org/10.1109/IROS.2009.5354684
6. Häne C, Heng L, Lee GH, et al. (2017) 3D visual perception for self-driving cars using a multi-camera system: Calibration, mapping, localization, and obstacle detection. Image Vis Comput 68:14–27. https://doi.org/10.1016/j.imavis.2017.07.003
7. Albitar C, Graebling P, Doignon C (2007) Robust structured light coding for 3D reconstruction. In: Proceedings of the 11th IEEE International Conference on Computer Vision. IEEE, Rio De Janeiro, Brazil, pp 7–12
8. Criminisi A (2002) Single-View Metrology: Algorithms and Applications. Int J Comput Vis 224–239
9. Zhang S (2018) High-speed 3D shape measurement with structured light methods: A review. Opt Lasers Eng 106:119–131. https://doi.org/10.1016/j.optlaseng.2018.02.017
10. Zuo C, Feng S, Huang L, et al. (2018) Phase-shifting algorithms for fringe projection profilometry: A review. Opt Lasers Eng 109:23–59. https://doi.org/10.1016/j.optlaseng.2018.04.019
11. Zhang S (2010) Recent progresses on real-time 3D shape measurement using digital fringe projection techniques. Opt Lasers Eng 48:149–158. https://doi.org/10.1016/j.optlaseng.2009.03.008
12. Woodham R (1980) Photometric method for determining surface orientation from multiple images. Opt Eng 19:134–140
13. Moons T, Van Gool L, Vergauwen M (2009) 3D reconstruction from multiple images part 1: Principles. Found Trends Comput Graph Vis 4:287–404. https://doi.org/10.1561/0600000007
14. Higo T, Matsushita Y, Joshi N, Ikeuchi K (2009) A handheld photometric stereo camera for 3-D modeling. In: IEEE 12th International Conference on Computer Vision. pp 1234–1241
15. Davies ER (2012) Computer and Machine Vision: Theory, Algorithms, Practicalities, Fourth Edition. Academic Press, London
16. Lin SS, Bajcsy R (2003) High resolution catadioptric omnidirectional stereo sensor for robot vision. In: IEEE International Conference on Robotics. pp 1694–1699
17. Dhond UR, Aggarwal K (1989) Structure from Stereo-A Review. IEEE Trans Syst Man Cybern 19
18. Tippetts B, Lee DJ, Lillywhite K, Archibald J (2016) Review of stereo vision algorithms and their suitability for resource-limited systems. J Real-Time Image Process 11:5–25
19. Brown MZ, Burschka D, Hager GD (2003) Advances in Computational Stereo. IEEE Trans Pattern Anal Mach Intell 25:993–1008
20. Wei Z, Zhou F, Zhang G (2005) 3D coordinates measurement based on structured light sensor. Sensors Actuators A Phys 120:527–535. https://doi.org/10.1016/j.sna.2004.12.007
21. Hata K, Savarese S (2019) CS231A Course Notes 5: Active and Volumetric Stereo. Stanford-CS231A 12
303 22. BATLLE J, MOUADDIB E, SALVI J (1998) Recent progress in coded structured light as a technique to solve the correspondence problem: A survey. Pattern Recognit 31:963–982. https://doi.org/10.1016/ S0031-3203(97)00074-5 23. Geng J (2011) Structured-light 3D surface imaging: a tutorial. Adv Opt Photonics 3:128. https://doi.org/ 10.1364/aop.3.000128 24. Salvi J, Pagès J, Batlle J (2004) Pattern codification strategies in structured light systems. Pattern Recognit 37:827–849. https://doi.org/10.1016/ j.patcog.2003.10.002 25. Ishii I, Yamamoto K, Doi K, Tsuji T (2007) Highspeed 3D image acquisition using coded structured light projection. IEEE Int Conf Intell Robot Syst 925– 930. https://doi.org/10.1109/IROS.2007.4399180 26. Salvi J, Fernandez S, Pribanic T, Llado X (2010) A state of the art in structured light patterns for surface profilometry. Pattern Recognit 43:2666–2680. https:/ /doi.org/10.1016/j.patcog.2010.03.004 27. Will PM, Pennington KS (1971) Grid coding: A preprocessing technique for robot and machine vision. Artif Intell 2:319–329. https://doi.org/10.1016/00043702(71)90015-4 28. Pennington KS, Will PM (1972) Grid Coding: Novel Technique for Image Processing. Proc IEEE 60:669– 680 29. Le Moigne J, Waxman AM (1984) Projected Light Grids for Short Range Navigation of Autonomous Robots., in Proceedings of 7th IEEE International Conference on Pattern Recognition. IEEE, Montereal, Canada, pp 203–206 30. LeMoigne J, Waxman AM (1988) Structured light patterns for robot mobility. IEEE J Robot Autom 4:541–548 31. Y.F. W, A. M, J.K. A (1987) Computation of Surface Orientation and Structure of Objects Using Grid Coding. IEEE Trans Pattern Anal Mach Intell PAMI9:129–137 32. Hu G, Stockman G (1989) 3-D Surface Solution Using Structured Light and Constraint Propagation. IEEE Trans Pattern Anal Mach Intell I:390–402 33. Morano RA, Ozturk C, Conn R, et al (1998) Structured light using pseudorandom codes. IEEE Trans Pattern Anal Mach Intell 20:322–327 34. Tajima J, IWAKAWA M (1990) 3D Data Acquisition by Rainbow Range Finder. In: 10th International Conference on Pattern Recognition. IEEE Computer Society Press, Atlantic City, NJ, USA, pp 309–313 35. Geng J (1996) Rainbow three-dimensional camera new concept of high-speed three-dimensional vision system. Opt Engneering 35:376–383 36. Geng J (2004) Method and apparatus for 3D imaging using light pattern having multiple sub-patterns 37. Zhang S (2005) HHigh-resolution Real-time 3-D Shape Measurement. Stony Brook University 38. Boyer KL, A.C. Kak (1987) Color Encoded Structured Light For Rapid Active Ranging. IEEE Trans Pattern Anal Mach Intell PAMI-9:14–28
304 39. Vuylsteke P, Oosterlinck A (1990) Range Image Acquisition with a Single Binary-Encoded Light Pattern. IEEE Trans Pattern Anal Mach Intell 12:148–164. https://doi.org/10.1109/34.44402 40. Hugli H, Maitre G (2012) Generation And Use Of Color Pseudo Random Sequences For Coding Structured Light In Active Ranging. Ind Insp 1010:75. https://doi.org/10.1117/12.949215 41. Zhang L, Curless B, Seitz SM (2002) Rapid shape acquisition using color structured light and multipass dynamic programming. Proc – 1st Int Symp 3D Data Process Vis Transm 3DPVT 2002 24–36. https:/ /doi.org/10.1109/TDPVT.2002.1024035 42. Pagès J, Salvi J, Collewet C, Forest J (2005) Optimised de Bruijn patterns for one-shot shape acquisition. Image Vis Comput 23:707–720. https://doi.org/ 10.1016/j.imavis.2005.05.007 43. Je C, Lee SW, Park RH (2004) High-contrast colorstripe pattern for rapid structured-light range imaging. In: Eighth European conference on computer vision (ECCV), Lecture Notes in Computer Science. pp 95– 107 44. Maruyama M, Abe S (1993) Range Sensing by Projecting Multiple Slits with Random Cuts. IEEE Trans Pattern Anal Mach Intell 15:647–651. https://doi.org/ 10.1109/34.216735 45. Durdle NG, Thayyoor J, Raso VJ (2002) An improved structured light technique for surface reconstruction of the human trunk. 874–877. https://doi.org/10.1109/ ccece.1998.685637 46. Griffin PM, Narasimhan LS, Yee SR (1992) Generation of uniquely encoded light patterns for range data acquisition. Pattern Recognit 25:609–616. https:/ /doi.org/10.1016/0031-3203(92)90078-W 47. Petriu EM, Sakr Z, Spoelder HJW, Moica A (2002) Object recognition using pseudo-random color encoded structured light. 1237–1241. https://doi.org/ 10.1109/imtc.2000.848675 48. Song Z, Chung R (2008) Grid Point Extraction Exploiting Point Symmetry in a Pseudo-Random Color Pattern. In: 15th IEEE International Conference of Image Processing. IEEE, San Diego, CA, USA, pp 1956–1959 49. Song Z, Chung R (2010) Determining both surface position and orientation in structuredlight-based sensing. IEEE Trans Pattern Anal Mach Intell 32:1770–1780. https://doi.org/10.1109/ TPAMI.2009.192 50. Lu J, Han J, Ahsan E, et al. (2016) A structured light vision measurement with large size M-array for dynamic scenes. Chinese Control Conf CCC 2016-Augus: 3834–3839. https://doi.org/10.1109/ ChiCC.2016.7553951 51. Chen SY, Li YF, Zhang J (2007) Realtime structured light vision with the principle of unique color codes. Proc – IEEE Int Conf Robot Autom 429–434. https:/ /doi.org/10.1109/ROBOT.2007.363824 52. Chen SY, Li YF, Zhang J (2008) Vision Processing for Realtime 3-D Data Acquisition Based on Coded
E. Ahsan et al.
53.
54.
55.
56.
57.
58.
59.
60.
61.
62.
63. 64.
65.
66.
67.
Structured Light. IEEE Trans Image Process 17:167– 176 Morita H, Yajima K, Sakata S (1988) Reconstruction of surfaces of 3-d objects by M-array pattern projection method. In: IEEE Second International Conference on Computer Vision. IEEE, Tampa, FL, USA, pp 468–473 Petriu EM, Bieseman T, Trif N, et al. (1992) Visual Object Recognition Using Pseudo Random Grid Encoding. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. pp 1617–1624 Wijenayake U, Choi SI, Park SY (2012) Combination of color and binary pattern codification for an error correcting m-array technique. Proc 2012 9th Conf Comput Robot Vision, CRV 2012 139–146. https:// doi.org/10.1109/CRV.2012.26 Xiao-jun JIA, Zhi-Jiang Z, Qing-cang Y (2011) Construction for M-arrays and application in structured light. © Shanghai Univ Springer-Verlag Berlin Heidelb 15:63–68 Li F, Shang X, Tao Q, et al. (2021) Single-Shot Depth Sensing with Pseudo Two-Dimensional Sequence Coded Discrete Binary Pattern. IEEE Sens J 21:11075–11083. https://doi.org/10.1109/ JSEN.2021.3061146 Elahi A, Lu J, Zhu QD, Yong L (2020) A Single-Shot, Pixel Encoded 3D Measurement Technique for Structure Light. IEEE Access 8:127254–127271. https:// doi.org/10.1109/ACCESS.2020.3009025 Ito M, Ishii A (1995) A three-level checkerboard pattern (TCP) projection method for curved surface measurement. Pattern Recognit 28:27–40. https://doi.org/ 10.1016/0031-3203(94)E0047-O Wang Z, Zhou Q, Shuang Y (2020) Threedimensional reconstruction with single-shot structured light dot pattern and analytic solutions. Measurement 151:107114. https://doi.org/10.1016/ j.measurement.2019.107114 Zhang S, Huang PS (2006) High-resolution, realtime three-dimensional shape measurement. Opt Eng 45:123601. https://doi.org/10.1117/1.2402128 Posdamer, J.L., Altschuler MD (1982) Surface measurement by space-encoded projected beam systems. Comput Graph Image Process 18:1–17 M.C. Er (1984) On Generating the N-Ary Reflected Gray Codes. IEEE Trans Comput 33:739–741 Carrihill B, Hummel R (1985) Experiments with the Intensity Ratio Depth Sensor. Comput Vis Graph Image Process 32:337–358 Sato K, Inokuchi S (1985) Three-Dimensional Surface Measurement by Space Encoding Range Imaging. Robot Syst 2:27–39 Keizer RL, Dunn SM (1989) Marked grid labeling. In: IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Press, San Diego, CA, USA, pp 612–617 Desjardins D, Payeur P (2007) Dense stereo range sensing with marching pseudo-random patterns. In: Proceedings – Fourth Canadian Conference on Com-
15 Grid-Index-Based Three-Dimensional Profilometry
68.
69.
70.
71.
72.
73.
74.
75.
76.
77.
puter and Robot Vision, CRV 2007. IEEE, Montereal, Canada, pp 216–223 Payeur P, Desjardins D (2009) Structured light stereoscopic imaging with dynamic pseudo-random patterns. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). SpringerVerlag, Berlin Heidelberg, pp 687–696 Salvi J, Batlle J, Mouaddib E (1998) A robust-coded pattern projection for dynamic 3D scene measurement. Pattern Recognit Lett 19:1055–1065. https:// doi.org/10.1016/S0167-8655(98)00085-3 Pagès J, Salvi J, Matabosch C (2003) Robust segmentation and decoding of a grid pattern for structured light. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). pp 689–696 Furukawa R, Viet HQH, Kawasaki H, et al. (2008) One-shot range scanner using coplanarity constraints. In: 15th IEEE International Conference on Image Processing, ICIP. pp 1524–1527 Kawasaki H, Furukawa R, Sagawa R, Yagi Y (2008) Dynamic scene shape reconstruction using a single structured light pattern. In: 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR. pp 1–8 Ulusoy AO, Calakli F, Taubin G (2009) Oneshot scanning using de Bruijn spaced grids. 2009 IEEE 12th Int Conf Comput Vis Work ICCV Work 2009 1786–1792. https://doi.org/10.1109/ ICCVW.2009.5457499 Ali OU, Calakli F, Taubin G (2010) Robust one-shot 3D scanning using loopy belief propagation. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition – Workshops, CVPRW 2010. pp 15–22 Lin H, Nie L, Song Z (2016) A single-shot structured light means by encoding both color and geometrical features. Pattern Recognit 54:178–189. https:// doi.org/10.1016/j.patcog.2015.12.013 Rivat J, Sarkozy A (2005) On pseudorandom sequences and their application. Electron Notes Discret Math 21:369–370. https://doi.org/10.1016/ j.endm.2005.07.060 Rivat J, Sárkozy A (2006) On pseudorandom sequences and their application. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect
305
78.
79.
80.
81.
82.
83.
84.
85.
86.
87.
88. 89.
Notes Bioinformatics) 4123 LNCS:343–361. https:/ /doi.org/10.1007/11889342_19 Etzion T (1988) Constructions for Perfect Maps and Pseudorandom Arrays. IEEE Trans Inf Theory 34:1308–1316. https://doi.org/10.1109/18.21260 Macwilliams FJ, Sloane NJA (1976) Pseudo Random Sequences and Arrays. Proc IEEE 64:1715–1729. https://doi.org/10.1109/PROC.1976.10411 Yee SR, Griffin PM (1994) Three-dimensional imaging system. Opt Eng 33:2070–2075. https://doi.org/ 10.1117/12.169713 Jia X, Yue G, Mei F (2009) The mathematical model and applications of coded structured light system for object detecting. J Comput 4:53–60. https://doi.org/ 10.4304/jcp.4.1.53-60 Maurice X, Graebling P, Doignon C (2011) Epipolarbased structured light pattern design for 3-D reconstruction of moving surfaces. In: Proceedings – IEEE International Conference on Robotics and Automation. pp 5301–5308 Fang M, Shen W, Zeng D, et al. (2015) Oneshot monochromatic symbol pattern for 3D reconstruction using perfect submap coding. Optik (Stuttg) 126:3771–3780. https://doi.org/10.1016/ j.ijleo.2015.07.140 Song L, Tang S, Song Z (2017) A robust structured light pattern decoding method for single-shot 3D reconstruction. In: IEEE International Conference on Real-Time Computing and Robotics, RCAR. IEEE, Okinawa, Japan, pp 668–672 Forster F (2006) A high-resolution and high accuracy real-time 3D sensor based on structured light. Proc – Third Int Symp 3D Data Process Vis Transm 3DPVT 2006 208–215. https://doi.org/ 10.1109/3DPVT.2006.13 Fechteler P, Eisert P (2008) Adaptive color classification for structured light systems. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition and Workshop. IEEE Computer Society Press, Anchorage, AK, USA, pp 49–59 Koninckx TP, Van Gool L (2006) Real-time range acquisition by adaptive structured light. IEEE Trans Pattern Anal Mach Intell 28:432–445. https://doi.org/ 10.1109/TPAMI.2006.62 Hata K, Savarese S (2015) CS231A Course Notes 1 : Camera Models. 16 Savarese S (2015) Lecture 2: Camera Models. 18
16 Depth from Time-of-Flight Imaging

Mohit Gupta
Abstract
In this chapter, we discuss techniques that use temporally coded light sources for measuring scene depths. The light source is modeled as a point source whose intensity can be modulated over time at high speeds. Examples of such sources are lasers that can emit light pulses of short duration (a few picoseconds to a few nanoseconds) and light-emitting diodes (LEDs) whose intensity can be modulated at high temporal frequencies (tens of MHz to GHz). In both cases, the objective is to measure the total time it takes for light to travel from the source to a scene point and from the scene point back to a sensor, which is typically co-located with the source. This time duration is referred to as the time-of-flight (ToF) and is proportional to the distance between the scene point and the sensor.
16.1 Time-of-Flight: Basic Principle
Consider two points A and B in space such that the distance between them is d. Suppose an entity (a physical object or a wave) traveling at a constant speed c takes time t to travel from A to B. Then, d, c, and t are related by the simple equation:

d = c × t.  (16.1)

If the speed and the travel time are known, the distance can be computed. ToF as a distance measurement technique is used extensively in nature by animals such as bats and whales for navigation and hunting. These natural ToF systems are based on the propagation of sound waves. Our interest here is in light-based, or optical, ToF systems, which are based on measuring the propagation time of light.
Keywords
Time-of-flight · Depth-sensing · Temporal light coding · 3D cameras · Range imaging · Pulsed time-of-flight cameras · Coded time-of-flight cameras · Homodyne coding · Heterodyne coding
M. Gupta () Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA e-mail: [email protected]
16.2 Optical ToF-Based Depth Measurement Systems
Conceptually, all optical ToF systems have a similar configuration, which is illustrated in Fig. 16.1. They consist of a light source that emits temporally coded light, i.e., the intensity of the emitted light is temporally modulated. The light travels to the scene, is reflected, and then travels back to a sensor that is approximately co-located with the source. The total time taken by light to travel back and forth is measured by comparing the emitted light and the light received at the sensor. Since the speed of light is known, Eq. 16.1 is used to estimate the scene depths.

Fig. 16.1 Basic operating principle of optical ToF-based depth recovery. Optical ToF systems consist of a light source whose intensity is temporally modulated at high speeds. The light emitted by the source travels to the scene of interest, is reflected, and is then received by a sensor. Although for illustration purposes the source and the sensor are shown to be separate from each other, in practice, their distance is small as compared to the distances of scene points. Hence, we can assume the source and sensor to be co-located. The total time taken by light to travel back and forth is measured by comparing the emitted light and the received light. The travel time is then used to estimate the scene distances.
Classification of ToF Techniques Optical ToF techniques can be broadly classified into two main categories based on the way the emitted light is modulated: (a) impulse ToF and (b) continuous-wave ToF. In impulse ToF techniques, the light source emits very short pulses of light (a typical duration of less than 1 nanosecond), whereas in continuous-wave (CW) ToF, the source emits light continuously with the intensity of light being modulated over time.

16.3 Impulse Time-of-Flight

Conceptually, the simplest ToF-based depth recovery method is impulse (or pulsed) ToF. In such a system, the light source emits a short temporal pulse of light towards the scene whose depth needs to be measured. The pulse travels to the scene, and a fraction of it gets reflected in the direction of the sensor (detector), as shown in Fig. 16.2. Let tstart be the time instant when the source emits the pulse and tstop be the time instant when the sensor receives the pulse. The source and the sensor are temporally synchronized so that they share the timing information. A high precision “stop-watch” measures the total travel time τ = tstop − tstart. The distance d between the sensor and the scene (scene depth) is then found as:

d = cτ/2,  (16.2)
where c is the speed of light. The factor of 2 on the right-hand side accounts for the fact that the travel time .τ is for the round-trip distance. Impulse ToF is the basis of the first LIDAR (light-detection-and-ranging) systems developed nearly 50 years ago [15, 27]. Since then, these systems have been widely used for measuring the shape and position of objects, especially in applications involving large-scale scenes such as architectural surveying, urban mapping, autonomous navigation, and inspection of aeroplanes and ships. LIDAR systems have also been used to capture large-scale terrain data [33]. Example topographic and bathymetric data for the United States acquired using LIDAR is available from the United States Interagency Elevation Inventory [38]. Today, many commercial range estimation systems are based on the impulse ToF method [32, 55]. Hardware Implementation An impulse ToF system has three basic components [25]: the source (transmitter), the sensor (receiver), and the timing unit (stop-watch).
Fig. 16.2 Impulse time-of-flight imaging. (a) Impulse ToF systems consist of a light source which emits short temporal pulses of light. The pulse travels to the scene, and a fraction of it gets reflected in the direction of the sensor. (b) A high precision “stop-watch” measures the duration between the instant when the pulse is emitted by the source and the instant when the returning pulse is received by the sensor.

Source In order to achieve high depth resolution and accuracy, the light source in an impulse ToF system must emit pulses of very short durations (typically less than 1 nanosecond) and high peak optical power [28]. Such pulses can be generated by using a pulsed laser or a laser diode. For instance, optical pulses with widths of less than 100 picoseconds and peak power in the range of 100 mW to 100 W can be generated by either driving a laser with an external electronic circuit or by using mode-locked or Q-switched lasers. With such sources, it is possible to achieve a depth resolution of approximately 1 centimeter.

Sensor The sensor needs to measure the reflected light at a fast rate in order to detect the short pulses. Due to their small size and high measurement speed (bandwidth), photodiodes are typically used as sensors in impulse ToF systems [36]. Single photon avalanche photodiodes (SPADs) are used in applications where higher sensitivity and precision are required [11, 34]. The bandwidths achieved by such SPADs can be in excess of GHz, which makes it possible to achieve high depth resolution.
Timing Unit The timing unit or the stop-watch is implemented by using a time-to-amplitude converter (TAC). The source and the sensor send electrical pulses (called the start and stop pulse, respectively) to the timing unit at the instants when they emit and receive optical pulses. The TAC produces an analog voltage that is proportional to the time delay between the start and the stop pulses, which is then measured after conversion into a digital voltage value. Another way to implement the timing unit is by using a time-to-digital converter (TDC). Using a digital system clock which oscillates at a known frequency, a TDC measures the number of system clock periods between the start and the stop pulses. The higher the clock frequency, the higher the temporal resolution of the TDC, and thus, the higher the depth resolution achieved by the ToF system.

Fig. 16.3 Different hardware setups for impulse ToF imaging. (a) Point scanning systems emit a single beam of light, and the detector is a single photodiode. These systems measure the depth of one scene point at a time. In order to measure the depth of an entire scene, the source and the sensor are scanned along two dimensions. (b) Stripe scanning systems emit a 1D light sheet, and the sensor consists of a 1D array of photodiodes. In this case, scanning is required only along one dimension. (c) Full-frame systems consist of a 2D array of photodiodes and a light source that emits a cone of light. Such systems do not require any mechanical scanning.
Resolution and Noise Analysis There are several factors that determine the depth resolution of an impulse ToF system, including the laser pulse width and peak optical power and the sensor bandwidth. Specifically, the resolution is proportional to the square root of the optical power and the sensor bandwidth and inversely proportional to the square root of the pulse width [28]. The resolution can be increased by sending multiple light pulses periodically and averaging the distances estimated by using individual pulses. The resolution of the average estimate is higher than the individual estimates by a multiplicative factor equal to the square root of the number of pulses used [28].
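To make these scaling relations concrete, here is a minimal Python sketch (not taken from the chapter; the jitter level, pulse counts, and variable names are assumptions) that converts simulated round-trip times into depths via Eq. 16.2 and compares the spread of single-pulse estimates with that of averages over many pulses.

```python
import numpy as np

C = 3e8                      # speed of light (m/s)
true_depth = 5.0             # assumed scene depth (m)
jitter_std = 1e-9            # assumed 1 ns timing jitter per pulse

rng = np.random.default_rng(0)
tau_true = 2 * true_depth / C          # true round-trip travel time

def depth_estimates(num_pulses):
    # Each pulse gives a noisy travel-time measurement; Eq. 16.2 converts it to depth.
    tau_meas = tau_true + rng.normal(0.0, jitter_std, num_pulses)
    return C * tau_meas / 2

single = depth_estimates(10000)                                    # single-pulse estimates
averaged = depth_estimates(10000).reshape(100, 100).mean(axis=1)   # 100-pulse averages

print("single-pulse depth std : %.3f m" % single.std())    # about 0.15 m for 1 ns jitter
print("100-pulse average std  : %.3f m" % averaged.std())  # roughly 10x smaller (sqrt(100))
```

The printed values illustrate both the 15 cm error per nanosecond of timing error and the square-root improvement from averaging multiple pulses.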
Scanning Vs. Full-Frame Systems In early impulse ToF systems, the source emitted a single beam of light, and the detector was a single photodiode. These systems were singlepoint range systems, as they measured the depth of one scene point at a time. In order to measure the depth of an entire scene, the source and the sensor were scanned along two dimensions, requiring a large acquisition time. This is illustrated in Fig. 16.3. Faster acquisition can be achieved by a source emitting a 1D light sheet and a sensor with a 1D array of photodiodes. Such a system requires scanning along just one dimension [55]. A full-frame system can be achieved by using a 2D array of photodiodes and a light source that emits a diverging light beam (e.g., a light cone) [37]. Such a system does not require any mechanical scanning.
Depth resolution is also affected by several sources of measurement noise, including photon noise, sensor read noise, laser speckle, and the timing jitter in the timing unit. For example, photon noise in the reflected light (due to both the source and ambient illumination) may cause spurious pulses in the received signal, which results in an incorrect measurement of the time delay τ. Let Δτ be the error in the time delay measurement. The resulting distance error Δd is given by:

Δd = cΔτ/2.  (16.3)

For instance, a timing error of 1 nanosecond will result in a distance error of 15 centimeters. For a detailed discussion on the noise sources in an impulse ToF system and resolution analysis, see [25, 28, 36].

Impulse ToF systems have been used for measuring scene characteristics (3D shape, motion, and appearance) of non-line-of-sight scenes (“around-the-corner”) [26, 40, 56]. These systems require very high timing resolution. Thus, they use mode-locked lasers with femtosecond pulse widths and streak cameras with picosecond temporal resolution.

16.4 Continuous-Wave Time-of-Flight

The major drawback of impulse ToF systems is the expensive hardware requirements. As discussed in the previous section, the light source must be a laser with high peak optical power, the sensor needs to have high bandwidth, and the timing unit must also have a high temporal resolution. For instance, in order to have a depth resolution of 1 centimeter, the timing of the received pulse must be resolved to within 66 picoseconds. These requirements significantly increase the cost of impulse ToF systems, often making them unsuitable for consumer applications. In order to address these limitations, continuous-wave time-of-flight (CW ToF) systems were developed [43, 51]. A CW ToF system consists of a light source whose intensity is continuously modulated over time, according to a periodic function M(t), called the modulation function. Such continuous modulation can be achieved on a wide range of light sources including light-emitting diodes (LEDs), which are significantly cheaper as compared to lasers. Also, the sensor does not need to sample the reflected light at high speeds, thus lowering the bandwidth requirements and cost of the sensor. The most widely used modulation function in CW ToF systems is a temporal sinusoid, where
the radiant intensity emitted by the light source along a given direction can be written as:

M(t) = om + am cos(ωt),  (16.4)

where om and am are the offset and the amplitude of the sinusoid. It is assumed that the initial phase (at t = 0) of the sinusoid is zero. Since the light intensity is always positive, om ≥ am. As the emitted light travels to the scene and is reflected back to the sensor, the sinusoid undergoes a temporal shift. Consider a scene point S, which is imaged at sensor pixel p. The radiance L(p, t) received at p is given by:

L(p, t) = α(p) M(t − τ(p)) + A(p),  (16.5)

where τ(p) = 2d(p)/c is the temporal shift; it is the time taken by the light to travel from the source to scene point S and then back to the sensor. d(p) is the distance between the sensor and the scene point S. The constant α(p) encapsulates the reflectance properties and orientation of S, the camera’s gain, the light source’s brightness, and the intensity fall-off. A(p) is the ambient illumination term, which is assumed to be constant over time. By substituting Eq. 16.4 into Eq. 16.5, and simplifying, we get:

L(p, t) = ol(p) + al(p) cos(ωt − φ(p)),  (16.6)

where ol(p) = α(p)om + A(p) and al(p) = α(p)am. Note that L(p, t) is also a temporal sinusoid with offset ol, amplitude al, and phase φ(p) = 2ωd(p)/c. Essentially, L(p, t) is a phase-shifted, scaled, and offset version of the emitted sinusoid M(t). The amount of phase-shift φ(p) is directly proportional to the scene depth, as shown in Fig. 16.4. The process of estimating phase φ(p) (and hence depth d(p)) from the received radiance L(p, t) is called decoding or demodulation. Decoding is performed by measuring the temporal correlation of L(p, t) with a function R(t), which is called the decoding or the demodulation function.

Fig. 16.4 Basic principle of continuous wave ToF method. In continuous-wave ToF systems, the intensity of the light emitted from the source is temporally modulated according to a continuous function. The intensity of light received at the sensor is a scaled, offset, and temporally shifted version of the emitted light. The amount of shift is proportional to the travel time, and hence, the scene distance. Distances are computed by comparing the emitted and received light and estimating the shift between them.

If the sensor’s exposure is temporally modulated during image capture according to R(t) (this is achieved either by modulating the brightness gain of the sensor on-chip, e.g., photonic mixer devices [29, 30, 48], or by using fast optical shutters, e.g., Pockels cells or image intensifier tubes, in front of the sensor [3, 7, 49]), the brightness B(p) measured at a sensor pixel p is given by the correlation between R(t) and L(p, t):

B(p) = R(t) ⋆ L(p, t) = ∫_{T − Tint/2}^{T + Tint/2} R(t) L(p, t) dt,  (16.7)

where ⋆ is the correlation operator, T is the time instant at the middle of the sensor integration period, and Tint is the length of the integration period, as shown in Fig. 16.5. Equation 16.7 represents the image formation model of CW ToF imaging. As we will show shortly, phases can be recovered from the measured image brightness values.

Fig. 16.5 Decoding by taking correlation measurements. Decoding (estimating phase) is performed by measuring the temporal correlation of light received at the sensor and a demodulation function. This is achieved by temporally modulating the sensor’s exposure according to the demodulation function during the sensor integration period.

Decoding Approaches For sinusoidal modulation, there are two main decoding approaches depending on the choice of the demodulation function R(t):

• Homodyne decoding, where modulation and demodulation functions, M(t) and R(t), are both sinusoids of the same temporal frequency,
• Heterodyne decoding, where M(t) and R(t) are sinusoids of different frequency.

16.4.1 Homodyne Decoding

In homodyne decoding [29], the demodulation function R(t) is also a sinusoid of the same frequency ω as the modulation function M(t):

R(ψ, t) = or + ar cos(ωt − ψ),  (16.8)

where or and ar are the offset and the amplitude of R(t), and ψ is the relative phase between R(t) and M(t). Substituting Eqs. 16.6 and 16.8 in Eq. 16.7, and simplifying, we get:

B(p, ψ) = ob(p) + ab(p) cos(ψ − φ(p)),  (16.9)

where ob(p) = or ol(p) Tint and ab(p) = ar al(p) Tint/2 (strictly speaking, Eq. 16.9 is an approximation; it holds exactly only in the limit as the ratio Tint/Tmod approaches infinity, where Tmod = 2π/ω is the period of the modulation sinusoid; in practice, the approximation is valid since typically Tmod is of the order of a few nanoseconds, and Tint is of the order of a few milliseconds). Equation 16.9 states that the measured brightness B(p, ψ) is also a sinusoidal function with three unknowns ob(p), ab(p), and φ(p).

These unknowns can be estimated by taking three image brightness measurements B(p, ψ) for different values of ψ, for instance, ψ = 0, ψ = 2π/3, and ψ = 4π/3:

B(p, 0) = ob(p) + ab(p) cos(0 − φ(p)),  (16.10)
B(p, 2π/3) = ob(p) + ab(p) cos(2π/3 − φ(p)),  (16.11)
B(p, 4π/3) = ob(p) + ab(p) cos(4π/3 − φ(p)).  (16.12)

In order to take a measurement B(p, ψ), the demodulation function R(ψ, t) is shifted by phase ψ relative to the modulation function M(t), and then correlated with the received radiance, as shown in Fig. 16.6. This process is called phase-shifting. Whereas the active triangulation method uses sinusoidal patterns that are spatially shifted, the time-of-flight approach discussed here uses temporally coded sinusoidal illumination. The above set of equations can be written compactly as a linear system of three equations:

B = CX,  (16.13)

where B = [B(0), B(2π/3), B(4π/3)]ᵀ, X = [ob, ab cos(φ), ab sin(φ)]ᵀ, and

C = [ 1  cos(0)     sin(0)
      1  cos(2π/3)  sin(2π/3)
      1  cos(4π/3)  sin(4π/3) ].  (16.14)

For brevity, we have dropped the argument p. B is the 3 × 1 vector of the measured intensities at pixel p. C is the measurement matrix of size 3 × 3. X is the 3 × 1 unknown vector. X can be estimated by simple linear inversion: X = C⁻¹B. Note that the linear inversion is performed for each camera pixel individually. Once X is recovered, the phase φ is computed as:
φ = arccos( X(2) / √(X(2)² + X(3)²) ),  (16.15)

where X(j), j = [1, 2, 3], is the jth element of the estimated vector X, and arccos(.) is the inverse cosine function. (Footnote: The function arccos(.) returns a phase value φ in the range of [0, π]. The true phase value could be either φ or 2π − φ. This is because cos(θ) = cos(2π − θ). The two-way ambiguity can be resolved by considering the sign of X(3). If X(3) = ab sin(φ) > 0, then 0 ≤ φ ≤ π, else, π ≤ φ ≤ 2π.) Finally, the depth is computed as:

d = cφ/(2ω).  (16.16)

Fig. 16.6 Homodyne decoding. In homodyne decoding, both modulation and demodulation functions are sinusoids of the same frequency. Phase of the received light can be estimated by taking three correlation measurements. For each measurement, the demodulation function is phase-shifted and then correlated with the received radiance.

Relationship Between Sensor Frame Rate and Modulation Frequency Recall the image formation model of CW ToF imaging as given in Eq. 16.7. Notice that the integration time Tint (and the frame rate) of the sensor could be chosen independently of the frequency of the modulation function. For instance, even if the modulation function has a high frequency of ω = 100 MHz (period of 10 nanoseconds), the sensor could have a low frame rate of 10 Hz (integration time Tint = 100 milliseconds). Because of the low frame rate and bandwidth requirement of the sensor, and hence low cost, CW ToF techniques are widely used in consumer ToF devices [22, 35, 45].

16.4.2 The 4-Bucket Method

While three images are theoretically sufficient for estimating the phase, more measurements may be taken for increasing accuracy in the presence of image noise. If the number of measurements is N > 3, the linear system of equations (Eq. 16.13) becomes overdetermined because the number of equations is more than the number of unknowns. Such a system can be solved by using linear least squares methods. One special case worth discussing is N = 4, i.e., taking four measurements at phase-shifts of π/2:

B(0) = ob + ab cos(0 − φ) = ob + ab cos(φ),  (16.17)
B(π/2) = ob + ab cos(π/2 − φ) = ob + ab sin(φ),  (16.18)
B(π) = ob + ab cos(π − φ) = ob − ab cos(φ),  (16.19)
B(3π/2) = ob + ab cos(3π/2 − φ) = ob − ab sin(φ).  (16.20)

As before, we have dropped the argument p for brevity. From these measurements, the phase φ can be recovered as:
φ = arctan( (B(π/2) − B(3π/2)) / (B(0) − B(π)) ).  (16.21)
This method is used in several commercially available CW ToF sensors. The four measurements are taken simultaneously using the 4-bucket (or 4-tap) lock-in pixel architecture [31], thus enabling single-shot depth estimation.
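As an illustration of the 4-bucket computation, the following Python sketch decodes per-pixel phase and depth from four simulated correlation images using Eqs. 16.17–16.21 and 16.16; the modulation frequency, the synthetic depth values, and all variable names are assumptions made for this example rather than parameters of any particular sensor.

```python
import numpy as np

c = 3e8
omega = 2 * np.pi * 30e6          # assumed 30 MHz modulation (angular frequency)

# Simulate a small depth map and the four phase-shifted brightness images (Eqs. 16.17-16.20).
depth = np.array([[1.0, 2.0], [3.0, 4.0]])     # meters, within the unambiguous range pi*c/omega = 5 m
phi = 2 * omega * depth / c                    # true phase, phi = 2*omega*d/c
o_b, a_b = 1.0, 0.5                            # offset and amplitude of the demodulated brightness
B = {psi: o_b + a_b * np.cos(psi - phi) for psi in (0, np.pi/2, np.pi, 3*np.pi/2)}

# 4-bucket decoding (Eq. 16.21); arctan2 resolves the quadrant of the phase automatically.
phi_hat = np.arctan2(B[np.pi/2] - B[3*np.pi/2], B[0] - B[np.pi]) % (2 * np.pi)

# Depth from phase (Eq. 16.16). Depths beyond pi*c/omega would wrap (see Sect. 16.4.3).
depth_hat = c * phi_hat / (2 * omega)
print(np.round(depth_hat, 3))                  # recovers [[1, 2], [3, 4]] up to numerical precision
```

Because the same arithmetic is applied to every pixel independently, the whole depth map is decoded with a few vectorized operations.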
16.4.3 Multi-Frequency Phase Shifting

The depth error due to image noise in a CW ToF system is inversely proportional to the modulation frequency ω [30]:

Δd ∝ 1/ω.  (16.22)

Thus, higher accuracy can be achieved by using higher modulation frequencies. On the other hand, recall from Eq. 16.15 that the phase φ is computed by using inverse trigonometric functions (e.g., arctan, arcsine, arccosine), which have a range of 2π. Since the phase and the depth are related as φ = 2ωd/c, all scene depths of the form d + nπc/ω for any integer n will have the same recovered phase, leading to depth ambiguities. This is called the wrapped phase problem. It follows that the maximum depth range Rmax in which depths can be measured unambiguously is given by Rmax = πc/ω [29]. For example, for ω = 2π × 100 MHz (ω is the angular modulation frequency, which is 2π times the modulation frequency), Rmax is 1.5 meters. This presents a trade-off between achieving a large depth range and achieving high depth accuracy. Higher modulation frequencies can achieve high depth accuracy, albeit in a small unambiguous depth range. How can we measure accurate scene depths in a large range?

It is possible to achieve both high accuracy and a large depth range by using a multi-frequency approach which involves capturing images and estimating the phases for two or more frequencies [23, 24, 42, 51] (if F is the number of frequencies used, the total number of images required for the multi-frequency approach is at least 3F). For example, two frequencies could be used such that one is sufficiently low to achieve the desired unambiguous depth range, and the other is sufficiently high to achieve the desired level of accuracy. The high frequency phase gives accurate but ambiguous depth information, which can be disambiguated by using the noisy but unambiguous low frequency phase. Alternatively, it is also possible to use multiple high frequencies [17, 23]. For a detailed discussion on phase disambiguation (phase-unwrapping) methods in CW ToF imaging, see [18].
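A minimal Python sketch of such two-frequency disambiguation is shown below; the chosen frequencies, the noise level, and the rounding-based unwrapping rule are illustrative assumptions, not a prescription from the chapter.

```python
import numpy as np

c = 3e8
om_lo = 2 * np.pi * 10e6     # low frequency: unambiguous range pi*c/om_lo = 15 m
om_hi = 2 * np.pi * 100e6    # high frequency: accurate, but wraps every 1.5 m

def wrapped_phase(depth, omega, noise_std):
    # Phase measured by a CW ToF camera: phi = 2*omega*d/c, wrapped to [0, 2*pi).
    return (2 * omega * depth / c + np.random.normal(0, noise_std)) % (2 * np.pi)

true_depth = 7.3
phi_lo = wrapped_phase(true_depth, om_lo, noise_std=0.05)   # noisy but unambiguous
phi_hi = wrapped_phase(true_depth, om_hi, noise_std=0.05)   # precise but wrapped

d_lo = c * phi_lo / (2 * om_lo)          # coarse depth estimate
d_hi = c * phi_hi / (2 * om_hi)          # fine estimate, valid only modulo pi*c/om_hi
period_hi = np.pi * c / om_hi            # 1.5 m unambiguous range of the high frequency

# Use the coarse estimate to pick the integer number of wraps for the fine estimate.
n_wraps = np.round((d_lo - d_hi) / period_hi)
depth_unwrapped = d_hi + n_wraps * period_hi
print(depth_unwrapped)                   # close to 7.3 m, with the precision of the high frequency
```

The rounding step succeeds as long as the low-frequency depth error stays well below half of the high-frequency period, which is the usual design condition for choosing the frequency pair.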
16.4.4 Heterodyne Decoding

In heterodyne decoding [6, 13, 43, 51], the frequencies ω and ω′ of the source modulation and sensor demodulation functions are slightly different, as shown in Fig. 16.7a. Let M(t) and R(t) be given as:

M(t) = om + am cos(ωt),  (16.23)
R(ψ, t) = or + ar cos(ω′t − ψ),  (16.24)

where om and am are the offset and amplitude of M(t), respectively. or, ar and ψ are the offset, amplitude, and phase of R(t). The frequencies ω and ω′ are chosen so that ωbeat = |ω − ω′| ⪡ ω and ωbeat ⪡ ω′. For instance, typically, ω and ω′ are of the order of 10–100 MHz, whereas ωbeat = 0.1–1 Hz. As described earlier in Eq. 16.6, the radiance L(p, t) incident at a pixel p is a scaled and phase-shifted version of M(t). Similar to homodyne decoding, in order to measure the phase of L(p, t), it is correlated with the demodulation function R(t). Substituting Eqs. 16.6 and 16.24 in the image formation equation (Eq. 16.7), we get an expression for the measured brightness at pixel p:

B(p, ψ, T) = ob(p) + ab(p) cos(ωbeat T + ψ − φ(p)),  (16.25)

where ob(p) and ab(p) are as defined after Eq. 16.9, and we assume that the period of the beat sinusoid Tbeat = 2π/ωbeat is significantly larger than the sensor integration period, i.e., Tbeat ⪢ Tint; in practice, this approximation is valid because typically, Tbeat = 0.1–1 seconds, and Tint is of the order of a few milliseconds. This equation is similar to the image brightness equation (Eq. 16.9) of homodyne decoding, with one important difference. The image brightness in homodyne decoding is independent of the time instant T when the sensor takes the measurement (T is defined with respect to a reference time, e.g., the time instant when the light source starts emitting light). In contrast, the image brightness in heterodyne decoding varies sinusoidally as a function of T with frequency equal to the beat frequency ωbeat = ω′ − ω, as shown in Fig. 16.7b. Substituting ψ′ = ψ + ωbeat T in Eq. 16.25, we get:

B(p, ψ′) = ob + ab cos(ψ′ − φ(p)).  (16.26)

This equation has the same form as Eq. 16.9. The phase φ(p) can be recovered by taking three correlation measurements B(p, ψ′) for different values of ψ′, which can be achieved by either varying the phase ψ of the demodulation function or by taking measurements at different time instants T.

Fig. 16.7 Heterodyne decoding. (a) In heterodyne decoding, the modulation and demodulation functions are sinusoids with different frequencies. The difference in the frequencies is called the beat frequency. (b) The captured image brightness varies sinusoidally (at the beat frequency) as a function of the time instant when the measurement is taken. The phase can be recovered by taking three image measurements for different phases of the demodulation function or by taking measurements at different time instants.
Noise Analysis The depth accuracy of a CW ToF system is limited by several sources of image noise including shot noise, sensor read noise, and quantization noise. While read noise and quantization noise can be mitigated by using higher quality or cooled sensors, shot noise is due to the quantum nature of arriving photons and cannot be avoided. The standard deviation of photon noise is proportional to the square-root of the number of photons due to both the signal (light emitted from the source) and any background ambient illumination [19, 29, 46, 47]. The standard deviation of depth error due to shot noise of a CW ToF system that uses sinusoidal modulation is given by:

Δd = c √ob / (ω κ ab),  (16.27)

where ω is the modulation frequency [30]. ob and ab are the offset and amplitude of the demodulated brightness value received at a pixel (see Eqs. 16.9 and 16.26). κ is a constant which is proportional to the number of measurements. The offset ob is proportional to the amount of background illumination. Thus, stronger ambient illumination reduces the depth accuracy. ab is proportional to the light source strength, its modulation contrast (ratio of the amplitude and the offset of the emitted sinusoid), and the reflectance of the scene. Thus, higher accuracy is achieved for brighter light sources and scenes.

Errors due to Imperfect Modulation Sinusoidal CW ToF systems assume that both the modulation and demodulation functions are perfectly sinusoidal. However, in practice, the source and the sensor may be modulated by digitally generated square wave signals. In that case, the functions may have higher-order harmonics, in addition to the base frequency. This results in systematic errors in the recovered depth. It is possible to mitigate these errors by reducing the duty cycle of the modulation waveforms [41]. If the modulation and demodulation signals are not perfectly sinusoidal, heterodyne systems achieve lower error rates as compared to homodyne systems [10, 14].
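The sketch below simply evaluates Eq. 16.27 to show the stated trends, namely that stronger ambient light (larger ob) degrades accuracy while a higher modulation frequency improves it; the numerical values and the constant κ used here are placeholders for illustration, not values from the chapter.

```python
import numpy as np

def depth_std(omega, o_b, a_b, kappa=4.0, c=3e8):
    # Shot-noise depth standard deviation of a sinusoidal CW ToF system (Eq. 16.27).
    # kappa is a system constant proportional to the number of measurements;
    # 4.0 is an arbitrary placeholder value assumed for this example.
    return c * np.sqrt(o_b) / (omega * kappa * a_b)

omega = 2 * np.pi * 20e6                               # assumed 20 MHz modulation
print(depth_std(omega, o_b=1000.0, a_b=200.0))         # baseline error
print(depth_std(omega, o_b=4000.0, a_b=200.0))         # 4x ambient offset -> 2x larger error
print(depth_std(2 * omega, o_b=1000.0, a_b=200.0))     # doubling the frequency halves the error
```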
16.5 Chirp Coding
Chirp coding techniques were originally designed as low cost alternatives to expensive impulse-based RADAR systems [16, 21, 53] and were later adapted in LIDAR systems [4, 8, 9, 52]. In chirp coding, the source modulation signal M(t) is a sinusoidal function whose frequency increases linearly with time. The instantaneous frequency ω(t) is given by:

ω(t) = ωo + (Δω/T) t,  (16.28)

where ωo is the initial frequency, Δω is the difference between the final and the initial frequency, and T is the duration of the frequency sweep. The instantaneous phase of M(t) is found by integrating ω(t) over time:

φ(t) = ∫_0^t ω(t′) dt′.  (16.29)

Substituting Eq. 16.28 into Eq. 16.29, and integrating, we get:

φ(t) = ωo t + Δω t²/(2T).  (16.30)

The resulting function M(t) is called a chirp signal and is given by:

M(t) = om + am cos(φ(t))  (16.31)
     = om + am cos(ωo t + Δω t²/(2T)),  (16.32)

where om and am are the constant offset and amplitude. An example chirp signal and its instantaneous frequency transform are illustrated in Fig. 16.8a–b (adapted from [1]). Note that we are considering intensity modulated chirp coding, where the intensity of the light source is modulated as a chirp function. Chirp coding can also be performed by modulating the wavelength of the emitted light (e.g., using tunable lasers), as discussed in the next subsection. The received radiance L(t) is a scaled and time-shifted version of M(t):

L(t) = ol + al cos(ωo(t − τ) + Δω(t − τ)²/(2T)),  (16.33)

where ol and al are the offset and the amplitude of the received radiance, τ = 2d/c is the time-shift (travel time), and d is the scene distance. Since L(t) is a time-shifted version of M(t), the instantaneous frequency transform of L(t) is also a time-shifted version of the instantaneous frequency transform of M(t) (this is true only for instantaneous frequency transforms). The amount of shift is equal to the travel time τ, as shown in Fig. 16.8c. As a result, the instantaneous frequencies of M(t) and L(t) differ by a constant amount. The difference, ωb, is proportional to the travel time τ, as shown in Fig. 16.8d:

ωb = (Δω/T) τ.  (16.34)

Fig. 16.8 Chirp coding and decoding. In chirp coding, the modulation signal is a sinusoid whose frequency increases linearly with time. (a) An example chirp modulation signal. (b) Instantaneous frequency transform of a chirp signal is a linear ramp. (c) The frequency transform of the radiance received at the sensor is a temporally shifted version of the frequency transform of the emitted modulation signal. (d) Due to this time-shift, the instantaneous frequencies of the modulation signal and the received light differ by a constant amount. The difference is proportional to the travel time (and the scene depth) and is measured by determining the beat frequency of the modulation signal and the received light signal [1].

Decoding Chirp For Computing Depth Note that ωb is the beat frequency of signals M(t) and L(t), and can be estimated by measuring their product. The multiplication is performed electronically by using a photodiode to convert L(t) into an electrical signal Le(t), and then multiplying Le(t) with a copy of the original modulation signal M(t) [52]. The product P(t) is given as:

P(t) = L(t)M(t)  (16.35)
     = op + ap cos((Δωτ/T) t + ωo τ − Δωτ²/(2T)) + Ph(t),  (16.36)

where op = om ol and ap = am al/2. The function Ph(t) denotes the sum of all the terms of the product except the DC term (op) and the temporal cosine term of frequency Δωτ/T. Chirp coding systems are designed so that the frequency of all the terms in Ph(t) is greater than Δωτ/T. Hence, Ph(t) can be removed by low-pass filtering of P(t). The filtered version, Pf(t), is given by:

Pf(t) = op + ap cos((Δωτ/T) t + ωo τ),  (16.37)

where the τ² term is also ignored, since typically, τ ⪡ T. Pf(t) is a temporal cosine function with frequency ωb = Δωτ/T, which can be estimated by computing the Fourier transform of Pf(t). Scene depth d can then be determined as:

d = ωb T c / (2Δω).  (16.38)
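The following Python sketch mimics this decoding step on a synthetic low-pass-filtered beat signal: it locates the beat frequency with an FFT and converts it to depth via Eq. 16.38 (written with ordinary frequencies in Hz, so the 2π factors cancel). The sweep parameters, sampling rate, and variable names are assumptions chosen for illustration only.

```python
import numpy as np

c = 3e8
f0, delta_f = 50e6, 100e6    # assumed start frequency and sweep bandwidth (Hz)
T = 1e-3                     # assumed sweep duration (s)
fs = 1e6                     # sampling rate of the filtered beat signal P_f(t)
true_depth = 12.0            # meters
tau = 2 * true_depth / c     # round-trip travel time

t = np.arange(0, T, 1/fs)
beat_freq = (delta_f / T) * tau                        # Eq. 16.34, expressed in Hz
p_f = 1.0 + 0.5 * np.cos(2*np.pi*beat_freq*t + 2*np.pi*f0*tau)   # Eq. 16.37

# Estimate the beat frequency from the Fourier transform of P_f(t).
sig = p_f - p_f.mean()                                 # remove the DC term (o_p)
spectrum = np.abs(np.fft.rfft(sig))
freqs = np.fft.rfftfreq(len(sig), d=1/fs)
fb_hat = freqs[np.argmax(spectrum)]

# Depth from the beat frequency (Eq. 16.38 with frequencies in Hz).
depth_hat = fb_hat * T * c / (2 * delta_f)
print(round(depth_hat, 2))                             # close to 12.0 m for these parameters
```

The attainable accuracy of such an estimate is governed by the frequency resolution of the Fourier analysis and, ultimately, by the sweep bandwidth, consistent with Eq. 16.39 below.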
Wavelength-Modulated Chirp LIDAR In the approach described above, chirp coding is performed by modulating the intensity of the emitted light. Chirp coding can also be performed by modulating the wavelength of the emitted light [2, 5, 12, 39, 50, 54, 57]. This can be achieved by using tunable lasers whose optical frequency (wavelength) is modulated over time. The beat frequency between the emitted and received signal is measured by coherent detection. The depth resolution Δd achieved by chirp coding-based ToF systems is inversely proportional to the range of frequencies used [52]:

Δd = c / (2Δω).  (16.39)
In wavelength-modulated chirp coding, the range of frequencies is significantly higher (hundreds of GHz) as compared to what can be achieved by intensity-modulated chirp coding (.∼100 MHz), thus resulting in significantly higher range resolution. For a detailed analysis of wavelengthmodulated chirp coding, see [57].
Velocimetry Using Chirped LIDAR Chirp-based LIDARs have also been used to measure displacement [5] and velocity [20, 39, 44]. While it is possible to measure velocity by estimating the gradient of distance with respect to time, the velocity can be measured directly and more accurately by utilizing the Doppler effect (Doppler frequency shift) of moving objects. Merits and Limitations of Chirp Coding Since chirp coding is a CW modulation method, it can be implemented with low-cost LEDs and laser diodes. On the other hand, for demodulation, Fourier analysis needs to be performed on the received signal. This requires sampling the signal at very high frequencies and capturing a relatively large number of samples as compared to the homodyne CW ToF method. Hence, chirp codingbased systems need a high bandwidth sensor such
as a high-quality photodiode and are mostly limited to single-point ranging systems.
16.6 Conclusion and Discussion
Time-of-flight (ToF) cameras have quickly emerged as the preferred 3D imaging technique in several scientific and consumer applications, including robot navigation, motion capture, human-computer interfaces, and 3D mapping. In this chapter, we discussed ToF-based depth-sensing techniques that use temporally modulated light sources. We broadly classified ToF cameras into impulse-based and continuous-wave methods, based on the light waveform emitted from the source: impulse ToF is based on short light pulses emitted from the source, whereas continuous-wave ToF methods rely on measuring time-shifts of continuously modulated waveforms (e.g., sinusoids). For both families of methods, we performed sensitivity and noise analysis to aid performance evaluation under various real-world imaging conditions.
References 1. Adany, P., Allen, C., Hui, R.: Chirped lidar using simplified homodyne detection. Journal of Lightwave Technology 27(16), 3351–3357 (2009) 2. Amann, M.C., Bosch, T., Lescure, M., Myllyla, R., Rioux, M.: Laser Ranging: A critical review of usual techniques for distance measurement. Optical Engineering 40(1) (2001) 3. Anthes, J.P., Garcia, P., Pierce, J.T., Dressendorfer, P.V.: Nonscanned ladar imaging and applications. vol. 1936, pp. 11–22 (1993) 4. Bazin, G., Journet, B.: A new laser range-finder based on fmcw-like method. In: IEEE Instrumentation and Measurement Technology Conference. vol. 1, pp. 90– 93 vol.1 (1996) 5. Beheim, G., Fritsch, K.: Remote displacement measurements using a laser diode. Electronics Letters 21(3), 93–94 (1985) 6. Carnegie, D.A., Cree, M.J., Dorrington, A.A.: A highresolution full-field range imaging system. Review of Scientific Instruments 76(8), 083702–7 (2005) 7. Carnegie, D.A., McClymont, J.R.K., Jongenelen, A.P.P., B. Drayto and, A.A.D., Payne, A.D.: Design and construction of a configurable full-field range imaging system for mobile robotic applications. Lecture Notes in Electrical Engineering 83 (2011)
320 8. Collins, S.F., Huang, W.X., Murphy, M.M., Grattan, K.T.V., Palmer, A.W.: A simple laser diode ranging scheme using an intensity modulated fmcw approach. Measurement Science and Technology 4(12) (1993) 9. Collins, S.F., Huang, W.X., Murphy, M.M., Grattan, K.T.V., Palmer, A.W.: Ranging measurements over a 20 metre path using an intensity-chirped laser diode. Measurement Science and Technology 5(6) (1994) 10. Conroy, R.M., Dorrington, A.A., Künnemeyer, R., Cree, M.J.: Range imager performance comparison in homodyne and heterodyne operating modes. vol. 7239, pp. 723905–10 (2009) 11. Cova, S., Longoni, A., Andreoni, A.: Towards picosecond resolution with single-photon avalanche diodes. Review of Scientific Instruments 52(3), 408– 412 (1981) 12. Dieckmann, A.: FMCW-LIDAR with tunable twinguide laser diode. Electronics Letters 30(4), 308–309 (1994) 13. Dorrington, A.A., Cree, M.J., Payne, A.D., Conroy, R.M., Carnegie, D.A.: Achieving sub-millimetre precision with a solid-state full-field heterodyning range imaging camera. Measurement Science and Technology 18(9) (2007) 14. Dorrington, A.A., Cree, M.J., Carnegie, D.A., Payne, A.D., Conroy, R.M., Godbaz, J.P., Jongenelen, A.P.P.: Video-rate or high-precision: A flexible range imaging camera. In: Proc. SPIE (6813) (2008) 15. Goldstein, B.S., Dalrymple, G.F.: Gallium arsenide injection laser radar. Proc. of the IEEE 55(2), 181– 188 (1967) 16. Griffiths, H.D.: New ideas in FM radar. Electronics Communication Engineering Journal 2(5), 185–194 (1990) 17. Gupta, M., Nayar, S.K., Hullin, M., Martin, J.: Phasor Imaging: A Generalization of Correlation Based Time-of-Flight Imaging. ACM Transactions on Graphics (2015) 18. Hansard, M., Lee, S., Choi, O., Horaud, R.: Disambiguation of time-of-flight data. In: Time-of-Flight Cameras, pp. 29–43. Springer Briefs in Computer Science (2013) 19. Hasinoff, S., Durand, F., Freeman, W.: Noise-optimal capture for high dynamic range photography. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 553–560 (2010) 20. Hulme, K.F., Collins, B.S., Constant, G.D., Pinson, J.T.: A co2 laser rangefinder using heterodyne detection and chirp pulse compression. Optical and Quantum Electronics 13(1), 35–45 (1981) 21. Hymans, A., Lait, J.: Analysis of a frequencymodulated continuous-wave ranging system. Proceedings of the IEE—Part B: Electronic and Communication Engineering 107(34), 365–372 (1960) 22. Intel-SoftKinectic: (2013), http://www.engadget. com/2013/06/04/-intel-announces-creative-depthvision-camera-at-computex-2013/
M. Gupta 23. Jongenelen, A.P.P., Carnegie, D., Payne, A., Dorrington, A.: Maximizing precision over extended unambiguous range for ToF range imaging systems. In: IEEE Instrumentation and Measurement Technology Conference (I2MTC) (2010). DOI https://doi.org/10. 1109/IMTC.2010.5488178 24. Jongenelen, A.P.P., Bailey, D.G., Payne, A.D., Dorrington, A.A., Carnegie, D.A.: Analysis of errors in ToF range imaging with dual-frequency modulation. IEEE Transactions on Instrumentation and Measurement 60(5) (2011) 25. Kilpela, A.: Pulsed time-of-flight laser range finder techniques for fast, high precision measurement applications. PhD Thesis, Dept. of Electronics, University of Oulu (2004) 26. Kirmani, A., Hutchison, T., Davis, J., Raskar, R.: Looking around the corner using transient imaging. In: IEEE ICCV (2009) 27. Koechner, W.: Optical ranging system employing a high power injection laser diode. IEEE Trans. aerospace and electronic systems 4(1) (1968) 28. Koskinen, M., Kostamovaara, J.T., Myllylae, R.A.: Comparison of continuous-wave and pulsed time-offlight laser range-finding techniques. vol. 1614, pp. 296–305 (1992) 29. Lange, R.: 3d time-of-flight distance measurement with custom solid-state image sensors in CMOSCCD-technology. PhD Thesis (2000) 30. Lange, R., Seitz, P.: Solid-state time-of-flight range camera. IEEE J. Quantum Electronics 37(3) (2001) 31. Lange, R., Seitz, P., Biber, A., Lauxtermann, S.: Demodulation pixels in CCD and CMOS technologies for time-of-flight ranging. In: IST/SPIE International Symposium on Electronic Imaging (2000) 32. Leica-Geosystems: Pulsed LIDAR Sensor. http:// www.leica-geosystems.us/en/index.htm 33. Mamon, G., Youmans, D.G., Sztankay, Z.G., Mongan, C.E.: Pulsed GaAs laser terrain profiler. Appl. Opt. 17(6), 868–877 (1978) 34. Massa, J.S., Buller, G.S., Walker, A.C., Cova, S., Umasuthan, M., Wallace, A.M.: Time-of-flight optical ranging system based on time-correlated singlephoton counting. Appl. Opt. 37(31), 7298–7304 (1998) 35. Microsoft-Kinect: (2014), http://news.xbox.com/ 2014/04/xbox-one-march-npd 36. Moring, I., Heikkinen, T., Myllyla, R., Kilpela, A.: Acquisition of three-dimensional image data by a scanning laser range finder. Optical Engineering 28(8) (1989) 37. Niclass, C., Rochas, A., Besse, P.A., Charbon, E.: Design and characterization of a CMOS 3-D image sensor based on single photon avalanche diodes. IEEE Journal of Solid-State Circuits 40(9), 1847– 1854 (2005) 38. NOAA, N.O.A.A.: US Interagency Elevation Inventory. http://coast.noaa.gov/inventory/ 39. Nordin, D.: Optical Frequency Modulated Continuous Wave (FMCW) Range and Velocity Measure-
ments. PhD Thesis, Dept. of Computer Science and Electrical Engineering, Lulea University of Technology (2004) 40. Pandharkar, R., Velten, A., Bardagjy, A., Lawson, E., Bawendi, M., Raskar, R.: Estimating motion and size of moving non-line-of-sight objects in cluttered environments. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 265–272 (2011) 41. Payne, A.D., Dorrington, A.A., Cree, M.J.: Illumination waveform optimization for time-of-flight range imaging cameras. In: Proc. SPIE 8085 (2010) 42. Payne, A.D., Jongenelen, A.P., Dorrington, A.A., Cree, M.J., Carnegie, D.A.: Multiple frequency range imaging to remove measurement ambiguity. In: Proc. of Conference on Optical 3-D Measurement Techniques (2009). DOI https://doi.org/10.1109/ICME.2013.6607553 43. Payne, J.M.: An optical distance measuring instrument. Review of Scientific Instruments 44(3), 304–306 (1973) 44. Piracha, M.U., Nguyen, D., Ozdur, I., Delfyett, P.J.: Simultaneous ranging and velocimetry of fast moving targets using oppositely chirped pulses from a mode-locked laser. Opt. Express 19(12), 11213–11219 (2011) 45. PMD-Technologies-gmbH: Photonic mixer devices, http://www.pmdtec.com/ 46. Ratner, N., Schechner, Y.: Illumination multiplexing within fundamental limits. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 1–8 (2007) 47. Schechner, Y.Y., Nayar, S.K., Belhumeur, P.N.: Multiplexing for optimal lighting. IEEE Trans. Pattern Anal. Mach. Intell. 29(8), 1339–1354 (2007)
321 48. Schwarte, R., Xu, Z., Heinol, H., Olk, J., Klein, R., Buxbaum, B., Fischer, H., Schulte, J.: New electrooptical mixing and correlating sensor: Facilities and applications of the photonic mixer device. In: Proc. SPIE (3100) (1997) 49. Schwarte, R., Heinol, H.G., Xu, Z., Hartmann, K.: New active 3d vision system based on rf-modulation interferometry of incoherent light. vol. 2588, pp. 126– 134 (1995) 50. Slotwinski, A.R., Goodwin, F.E., Simonson, D.L.: Utilizing GaAlAs Laser Diodes As A Source For Frequency Modulated Continuous Wave (FMCW) Coherent Laser Radars. vol. 1043, pp. 245–251 (1989) 51. Smith, D.E.: Electronic distance measurement for industrial and scientific applications. Hewlett-Packard Journal 31(6) (1980) 52. Stann, B.L., Ruff, W.C., Sztankay, Z.G.: Intensitymodulated diode laser radar using frequencymodulation/continuous-wave ranging techniques. Optical Engineering 35(11), 3270–3278 (1996) 53. Stove, A.G.: Linear fmcw radar techniques. IEEE Proceedings of Radar and Signal Processing 139(5), 343–350 (1992) 54. Strzelecki, E.M., Cohen, D.A., Coldren, L.A.: Investigation of tunable single frequency diode lasers for sensor applications. Journal of Lightwave Technology 6(10), 1610–1618 (1988) 55. Velodyne: Pulsed LIDAR Sensor. http://www. velodynelidar.com/lidar/lidar.aspx 56. Velten, A., Willwacher, T., Gupta, O., Veeraraghavan, A., Bawendi, M.G., Raskar, R.: Recovering threedimensional shape around a corner using ultrafast time-of-flight imaging. Nature 3 (745) (2012) 57. Zheng, J.: Analysis of optical frequency-modulated continuous-wave interference. Appl. Opt. 43(21), 4189–4198 (2004)
17 Illumination-Coded Optical Diffraction Tomography
Andreas Zheng, Hui Xie, Yanping He, Shiyuan Wei, Tong Ling, and Renjie Zhou
Abstract
Optical diffraction tomography (ODT) is a label-free 3D light microscopy technique that utilizes refractive index (RI) distributions as a source of image contrast. In ODT, a sequence of 2D holograms is acquired to reconstruct a 3D RI map of a semi-transparent object by using an inverse scattering model. In recent developments, by using spatial light modulators (SLMs), digital micromirror devices (DMDs), or light-emitting diode (LED) arrays, illumination-coded ODT methods have been realized to allow for faster and more stable 3D imaging, opening up many new avenues in both bioimaging and material metrology. In this chapter, we present the principles of ODT, including the reconstruction models and experimental apparatus, review the latest developments in illumination-coded ODT, and highlight several emerging applications of ODT.
Keywords
Label-free imaging · Quantitative phase imaging · 3D imaging · Optical diffraction tomography · Live cell imaging · Illumination coding · Material metrology · Tomographic phase microscopy · Interferometric imaging · Refractive index mapping
A. Zheng · Y. He · S. Wei · R. Zhou () Department of Biomedical Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China e-mail: [email protected]; [email protected]; [email protected]; [email protected] H. Xie · T. Ling School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University, Singapore, Singapore e-mail: [email protected]; [email protected]
17.1 Introduction
Many living biological specimens, e.g., thin tissues and cells, are optically transparent and thus render poor intensity contrast. Instead, they create a considerable amount of phase shift, which is the basis of phase-contrast microscopy [1, 2]. Since the early 2000s, the field of quantitative phase imaging (QPI) or quantitative phase-contrast microscopy, also referred to as quantitative phase microscopy (QPM), has become increasingly popular for bio-imaging and material metrology [3–8], especially in recent years due to the increased need for label-free and non-invasive imaging solutions [5, 9]. In QPI, the optical path difference (OPD) or phase delay, which is the product of the mean refractive index (RI) contrast and the thickness, is retrieved at each point over the specimen. Multiple quantitative parameters about the specimen can be extracted from the OPD map to facilitate a wide range of applications, including profiling fabricated
material structures [10, 11], weighing single-cell dry mass (i.e., the non-aqueous contents, mostly constituted of protein) for studying cell growth [12–14], quantifying cell mechanical properties (e.g., red blood cell mechanics [15–17]), classifying cell cycle and viability [9, 18], and so on. Despite these exciting advances, the 3D nature of many material or biological specimens has made it difficult to obtain accurate analyses from two-dimensional (2D) holographic measurements using QPI. Optical diffraction tomography (ODT), first proposed in theory by E. Wolf in the 1960s [19], provides an inverse scattering model to determine the 3D structure of a semi-transparent object from a sequence of 2D holograms measured at different illumination angles [20, 21] and/or wavelengths [22], sample rotations [7], focal depths [23], etc. Using ODT, one can reconstruct the 3D distribution of RI, which is an inherent property of substances; moreover, RI is proportional to the concentration of organic molecules [24–26]. Therefore, 3D RI maps can be used to profile the biochemical information in cells or tissues [27–29] or to reveal the internal structures of specimens for non-invasive material testing [30, 31]. As ODT has become more and more popular, it has enabled a multitude of biological and diagnostic studies, including but not limited to cell biology [20, 32–34], hematology [15, 35], and histopathology [36]. Since its inception, significant efforts have been made to solve the inverse scattering problems in ODT. In the early days, ODT models based on the first Born approximation or the first Rytov approximation were implemented [20, 37], and K. Iwata and R. Nagata performed one of the first comparisons between the two models around 1975 [38]. The first known experimental demonstrations of ODT were conducted in the late 1970s [39]. In the recent two decades, significant improvements in both reconstruction models and instrumentation have been made, especially since 2006 when the potential of ODT for bioimaging was demonstrated [7, 20, 21, 40]. In recent years, several new 3D RI reconstruction models have been developed, including adaptive regularization over the first Rytov approximation [41], learning tomography based on the split-step
non-paraxial (LT-SSNP) method [42], and the multi-layer Born multiple-scattering model [43]. In earlier ODT apparatuses, the specimens were rotated with a micropipette [7] or the illumination beam angle was scanned with one or two orthogonal galvanometer mirrors [20, 21, 44], which may potentially perturb the specimen or cause instabilities in the system when scaling up the scanning rate. To achieve more stable operations and simplify the ODT apparatuses, over the past decade programmable/reconfigurable devices have been used to modulate the illumination beam, such as spatial light modulators (SLMs) [45, 46], digital micromirror devices (DMDs) [47, 48], light-emitting diode (LED) arrays [49–53], the diffractive beam shaper and other elements as configured in six-pack holography [54], etc. More importantly, the use of SLMs, DMDs, or LED arrays has allowed for coding the illumination patterns [55–57] (e.g., turning on a few LED elements [50] or color-multiplexing [58]), and subsequently multiplexing the displayed holograms in time or space. At the same time, the development of illumination-coded ODT has given rise to the design of new image acquisition schemes and reconstruction algorithms, leading to drastic enhancements in 3D imaging speed and throughput, while reducing the imaging system complexity [87]. In this chapter, we introduce the recent developments in illumination-coded ODT. In Sect. 17.2, we provide the fundamentals of ODT, including the working principle, reconstruction models, and experimental implementations. In Sect. 17.3, we present the recent developments in illumination-coded ODT. In Sect. 17.4, a few representative applications of ODT are highlighted. In Sect. 17.5, we conclude and discuss future research directions in ODT.
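To make the idea of LED-array illumination coding concrete, the short sketch below computes the set of plane-wave illumination wave vectors produced by switching on a coded subset of LEDs; this is the standard geometric relation used in LED-array microscopy rather than the specific design of any system cited above. The array size, pitch, distance to the sample, wavelength, and the binary code are illustrative assumptions. Note also that different LEDs are mutually incoherent, so each "on" LED contributes an independent tilted illumination rather than a coherent sum.

```python
import numpy as np

# Illustrative (assumed) parameters: 8x8 LED array, 4-mm pitch,
# 60-mm array-to-sample distance, 532-nm illumination.
wavelength = 532e-9          # m
pitch = 4e-3                 # m, LED spacing
z = 60e-3                    # m, distance from LED array to sample
n_side = 8                   # LEDs per side
k0 = 2 * np.pi / wavelength  # free-space wavenumber

# Lateral LED positions, centered on the optical axis.
coords = (np.arange(n_side) - (n_side - 1) / 2) * pitch
X, Y = np.meshgrid(coords, coords)

# A pseudorandom binary illumination code: each pattern switches on a
# small subset of LEDs (~10% here), multiplexing several angles per shot.
rng = np.random.default_rng(0)
code = rng.random((n_side, n_side)) < 0.1

# Transverse illumination wave vectors of the LEDs that are on; each one
# acts as an independent tilted plane wave at the sample.
r = np.sqrt(X**2 + Y**2 + z**2)
kx = k0 * X / r
ky = k0 * Y / r
on_kx, on_ky = kx[code], ky[code]

# Corresponding illumination NA of each "on" LED.
na = np.sqrt(on_kx**2 + on_ky**2) / k0
print(f"{code.sum()} LEDs on; illumination NA range: "
      f"{na.min():.3f} to {na.max():.3f}")
```

In an actual coded-illumination acquisition, each such pattern would be paired with one captured hologram, and a sequence of patterns would be chosen to cover the desired range of illumination angles.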
17.2 Fundamentals of ODT
17.2.1 Working Principles
In ODT, a 3D object with an unknown RI distribution (or scattering potential χ(r)) is treated as a black box. As illustrated in Fig. 17.1, under
normal incidence, a scattered wave Us(r) is generated, and it encodes the sample's structural information. To probe the 3D structural information, up to hundreds of intensity or field measurements are conducted, for example at different illumination angles or wavelengths, or at different sample focal depths. The complex scattered fields for all measurements are retrieved with a holographic recording system, using either interferometric or non-interferometric means. By applying an inverse scattering model, the 3D RI map is reconstructed from all the holographic measurements. For biological specimens, such as cells that have features on the order of the wavelength, implementing an ODT model is essential to ensure that high-resolution structures are correctly reconstructed [19, 37, 59–61]. In Sect. 17.2.2, we introduce the physical models for 3D RI reconstruction.

Fig. 17.1 Schematic illustration of ODT principles
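The chapter notes that the complex scattered fields are retrieved holographically, by interferometric or non-interferometric means. As one common interferometric route, the sketch below simulates an off-axis hologram and recovers the complex object field by demodulating with the known tilted reference wave and low-pass filtering. It is a minimal illustration with arbitrary parameters and a synthetic phase object, not the specific recording scheme of any system reviewed in this chapter.

```python
import numpy as np

N = 256
x = np.arange(N)
X, Y = np.meshgrid(x, x)

# Synthetic object field: unit amplitude with a smooth phase "cell".
phase = 2.0 * np.exp(-(((X - 128) ** 2 + (Y - 110) ** 2) / (2 * 30 ** 2)))
obj = np.exp(1j * phase)

# Off-axis reference: tilted plane wave with carrier (fx, fy) in cycles/pixel.
fx, fy = 0.25, 0.15
ref = np.exp(2j * np.pi * (fx * X + fy * Y))

# Recorded intensity (the hologram); only this and the reference are "known".
hologram = np.abs(obj + ref) ** 2

# Demodulate: multiplying by the reference shifts the object term to baseband,
# while the dc and twin-image terms stay at or beyond the carrier frequency.
demod = hologram * ref
F = np.fft.fft2(demod)
f = np.fft.fftfreq(N)
FX, FY = np.meshgrid(f, f)
lowpass = (FX ** 2 + FY ** 2) < 0.12 ** 2   # cutoff below the carrier
obj_rec = np.fft.ifft2(F * lowpass)

# The recovered phase should match the ground truth up to filtering error.
err = np.angle(obj_rec * np.conj(obj))
print("rms phase error (rad):", np.sqrt(np.mean(err ** 2)))
```

The carrier frequency simply needs to exceed the object bandwidth plus the low-pass cutoff so that the baseband object term is isolated; the same demodulation idea underlies single-shot off-axis field retrieval in many QPI and ODT systems.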
17.2.2 3D Reconstruction Models
In 2009, the first Rytov approximation was first implemented for 3D RI reconstruction of living cells [20], showing significantly improved spatial resolution compared with filtered back-projection algorithms that neglect diffraction [7, 21]. To improve the reconstruction quality, especially in the axial domain, and to alleviate the missing-cone issue due to the limited angular scanning range, the beam propagation method (BPM) [62] and the multi-layer Born model [43] have recently been developed, as illustrated in Fig. 17.2. The first Rytov approximation considers single scattering events, while the BPM and multi-layer Born models consider multiple scattering events. In the multi-layer Born model, the first Born approximation is applied to each layer to achieve a higher reconstruction resolution than the paraxial wave approximation used in the BPM model. As understanding the first Born and the first Rytov approximation models is the beginning of understanding more complex ODT models, we will focus on describing these models in the following. The physical quantity that relates to the 3D RI distribution is the scattering potential, which is defined as $\chi(\mathbf{r}) = \chi(x, y, z) = \beta_0^2 \left[ n^2(x, y, z) - n_m^2 \right]$, where $\beta_0 = 2\pi/\lambda_0$ is the wavenumber in free space, $\lambda_0$ is the wavelength of the incident field, $n(x, y, z)$ is the RI distribution of the object, and $n_m$ is the mean RI of the surrounding medium. Assuming a plane wave, $U_i(\mathbf{r})$, incident on the object, the wave scattered by the sample, $U_s(\mathbf{r})$, in the forward or the backward direction follows the inhomogeneous wave equation [19, 32],

$$\nabla^2 U_s(\mathbf{r}) + \beta^2 U_s(\mathbf{r}) = -\chi(\mathbf{r})\, U(\mathbf{r}), \tag{17.1}$$
where $\beta = n_m \beta_0$ is the wavenumber inside the medium, and $U(\mathbf{r}) = U_i(\mathbf{r}) + U_s(\mathbf{r})$ is the total field. The first Born approximation assumes that the scattered field is much weaker than the incident field, i.e., $|U_s(\mathbf{r})| \ll |U_i(\mathbf{r})|$.

This allows for increasing the number of spectral features per pulse by one to two orders of magnitude. In this time-lens approach, spectral patterning is achieved as follows [24, 25]. An ultrahigh-speed serial optical waveform is generated by modulating approximately 1-ps optical pulses from an approximately 10-GHz repetition-rate mode-locked laser with a synchronized pulse pattern generator driving an electro-optic amplitude modulator. Binary on-off modulation is used, yielding a pseudorandom ultrafast pulse train with a 10-GHz bit rate. Subsequently, this pulse train is split and temporally multiplexed up to a rate of >300 GHz. This ultrahigh-speed pseudorandom serial optical waveform is then encoded onto the spectra of laser pulses through time-to-frequency mapping with a temporal Fourier processor. Specifically, the ultrahigh-rate pseudorandom optical waveform, termed the “signal”, is injected into a length of dispersive fiber corresponding to the focal length of a subsequent time lens. The output of this fiber is time-lensed using four-wave mixing (FWM) with a suitably chirped ultrafast laser pulse train, termed the “pump”. The FWM process generates a new lightwave, termed the “idler”, that corresponds to the time-lensed version of the signal. The idler lightwave is spectrally isolated with an optical filter and then sent through a length of dispersive fiber corresponding to the focal length of the time lens. At this point, the idler lightwave is precisely the Fourier transform of the input signal, and the ultrahigh-speed binary patterns of the signal are encoded on the spectra of the idler pulses through the time-to-frequency conversion of this temporal Fourier processor. The pulse repetition rate of the idler pulse train is set by the pump pulse rate, typically 100 MHz. An example of spectral patterning using this approach is shown in Fig. 22.5b [25]. Patterned spectra formed in this manner have been generated with over 3000 features per pulse bandwidth, corresponding to a spectral feature size of 730 MHz, which exceeds the pulse repetition rate of 90 MHz as dictated by the constraints of time-lens processing. Importantly, this corresponds to more than an order of magnitude increase in spectral features per pulse over what is possible with chirp processing.
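To illustrate the time-to-frequency mapping described above, the sketch below numerically propagates a pseudorandom pulse-train "signal" through an idealized temporal 2f Fourier processor: dispersion with group-delay dispersion D, an ideal time lens with matching focal GDD, and a second dispersion D. It treats the time lens as a pure quadratic temporal phase, so FWM conversion details, pump bandwidth, and loss are ignored, and all quantities are in arbitrary normalized units rather than the experimental values quoted in the text.

```python
import numpy as np

# Normalized grid (arbitrary units); the numbers are chosen only so that
# the chirps and the dispersed waveforms are adequately sampled.
N, dt = 8192, 1.0
t = (np.arange(N) - N // 2) * dt
omega = 2 * np.pi * np.fft.fftfreq(N, dt)

# Pseudorandom "signal": short Gaussian pulses (present/absent) in bit slots,
# mimicking an on-off-modulated mode-locked pulse train.
rng = np.random.default_rng(1)
bits = rng.integers(0, 2, 32)
slot, sigma = 20, 4.0
centers = (np.arange(32) - 15.5) * slot
signal = np.zeros(N)
for b, c in zip(bits, centers):
    signal += b * np.exp(-((t - c) ** 2) / (2 * sigma ** 2))

D = 1500.0  # group-delay dispersion of each fiber and focal GDD of the lens

def disperse(field, gdd):
    """Apply a quadratic spectral phase (dispersive propagation)."""
    return np.fft.ifft(np.fft.fft(field) * np.exp(0.5j * gdd * omega ** 2))

# Temporal 2f system: dispersion D -> time lens (focal GDD D) -> dispersion D.
field = disperse(signal.astype(complex), D)
field *= np.exp(0.5j * t ** 2 / D)       # ideal time lens
field = disperse(field, D)

# The output *spectrum* should replicate the input *temporal* pattern,
# scaled by the mapping t = D * omega (time-to-frequency conversion).
spectrum = np.abs(np.fft.fftshift(np.fft.fft(field))) ** 2
omega_s = np.fft.fftshift(omega)
expected = np.interp(D * omega_s, t, signal) ** 2

corr = np.corrcoef(spectrum, expected)[0, 1]
print(f"correlation between output spectrum and input pattern: {corr:.3f}")
```

The printed correlation should be close to one when the sampling margins in the comments are respected; the mapping t = D·ω is the discrete analogue of the time-to-frequency conversion that writes the pseudorandom bit pattern onto each pulse spectrum.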
22.5 Example Applications of CHiRP-CS
The CHiRP-CS architecture has been applied to wide-bandwidth microwave waveform sensing and measurement. Applications include wideband spectrum sensing for spectrum sharing as well as electromagnetic spectral awareness. Photonic systems are desirable because they offer extremely large instantaneous bandwidths for signal processing; however, Nyquist-rate sampling systems require a large number of parallel digitizers to cover wide bandwidths (>10 GHz), which leads to large, expensive, and power-hungry systems. Leveraging compressed sensing in such applications can dramatically reduce the resource requirements, and CHiRP-CS provides an effective architecture. In microwave sensing applications, the spectrally patterned laser pulses are only partially compressed following the patterning stage. This partial compression increases the effective sampling rate of the pseudorandom pattern beyond the bandwidth of the pattern generator [20, 21, 23]. For example, effective sampling rates of nearly 120 GSample/s have been generated with this approach using a 12.8-Gbit/s pattern generator [23]. Following partial compression, the pulses are modulated in time by the waveform of interest. Because the pulses still possess frequency-to-time mapping, this temporal modulation imprints the microwave waveform onto the spectra of the pulses, mixing the signal of interest with the ultrahigh-rate pseudorandom pattern. Finally, the pulses are fully compressed and detected as impulses on a photodetector, and the waveform of interest is reconstructed from the pulse energy sequence using compressed sensing algorithms. Algorithms such as gradient projection for sparse reconstruction (GPSR) [56] and various versions of orthogonal matching pursuit (OMP) [57] have been successfully applied. Particular care must be taken to use algorithms that are robust against basis mismatch in these systems [21, 58]. Example results from a 12-GHz bandwidth CHiRP-CS system are shown in Fig. 22.6 [21]. In this system, frequency identification of multiple tones can be achieved to an accuracy of 1 part in 25,000 (480 kHz accuracy) with
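As a toy illustration of the sparse-recovery step mentioned above, the sketch below recovers a frequency-sparse waveform from far fewer pseudorandom binary projections than Nyquist samples, using a basic orthogonal matching pursuit routine. It assumes an idealized linear model y = Φx with on-grid tones and no noise, and it omits the CHiRP-CS-specific details (partial compression, photodetection, and the basis-mismatch handling noted above); all sizes and rates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, K = 256, 64, 3          # signal length, measurements, number of tones

# Frequency-sparse test waveform: K on-grid tones (sparse in the DFT basis).
tones = rng.choice(np.arange(5, N // 2 - 5), size=K, replace=False)
n = np.arange(N)
x = sum(np.cos(2 * np.pi * f * n / N + rng.uniform(0, 2 * np.pi)) for f in tones)

# Pseudorandom binary (on-off) measurement patterns, one per energy sample.
Phi = rng.integers(0, 2, (M, N)).astype(float)
y = Phi @ x

# Sparsifying dictionary: x = Psi @ s, with s the DFT coefficients of x.
F = np.exp(-2j * np.pi * np.outer(n, n) / N)   # DFT matrix
Psi = F.conj().T / N                           # inverse DFT
A = Phi @ Psi                                  # effective sensing matrix

def omp(A, y, n_nonzero):
    """Basic orthogonal matching pursuit with column-normalized correlations."""
    col_norms = np.linalg.norm(A, axis=0)
    residual = y.astype(complex)
    support, coef = [], np.zeros(0, dtype=complex)
    for _ in range(n_nonzero):
        corr = np.abs(A.conj().T @ residual) / col_norms
        corr[support] = 0.0                    # do not reselect chosen atoms
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, support], y.astype(complex), rcond=None)
        residual = y - A[:, support] @ coef
    s = np.zeros(A.shape[1], dtype=complex)
    s[support] = coef
    return s

s_hat = omp(A, y, 2 * K)                       # each real tone fills two DFT bins
x_hat = np.real(Psi @ s_hat)

found = {int(k) if k <= N // 2 else N - int(k)
         for k in np.flatnonzero(np.abs(s_hat) > 1e-6)}
print("true tone bins:     ", sorted(int(f) for f in tones))
print("recovered tone bins:", sorted(found))
print("relative waveform error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```

In practice, real tones rarely fall exactly on DFT grid points, which makes the dictionary mismatched; that is precisely the basis-mismatch issue that the robust reconstruction algorithms cited above are designed to address.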