Understanding Optical Systems through Theory and Case Studies

Sijiong Zhang, Changwei Li, and Shun Li

SPIE Press, P.O. Box 10, Bellingham, WA 98227-0010
ISBN: 9781510608351
SPIE Vol. No.: PM276

This book explains how to understand and analyze the working principles of optical systems by means of optical theories and case studies. Part I focuses mainly on the theory of classical optics, providing an introduction to geometrical and wave optics, and some concepts of quantum and statistical optics. Part II presents case studies of three practical optical systems that comprise important and commonly used optical elements: confocal microscopes, online co-phasing optical systems for segmented mirrors, and adaptive optics systems. With the theoretical background gained in Part I, readers can apply their understanding of the optical systems presented in Part II to the conception of their own novel optical systems. The book can be used as a text or reference guide for students majoring in optics or physics. It can also be used as a reference for any scientist, engineer, or researcher whose work involves optical systems.

Library of Congress Cataloging-in-Publication Data
Names: Zhang, Sijiong (Scientist), author. | Li, Changwei (Dr.), author. | Li, Shun (Scientist), author.
Title: Understanding optical systems through theory and case studies / Sijiong Zhang, Changwei Li, and Shun Li.
Description: Bellingham, Washington : SPIE Press, [2017] | Includes bibliographical references and index.
Identifiers: LCCN 2016053158 | ISBN 9781510608351 (softcover ; alk. paper) | ISBN 1510608354 (softcover ; alk. paper) | ISBN 9781510608368 (pdf) | ISBN 1510608362 (pdf) | ISBN 9781510608382 (Kindle) | ISBN 1510608389 (Kindle) | ISBN 9781510608375 (ePub) | ISBN 1510608370 (ePub)
Subjects: LCSH: Optics. | Optical instruments. | Imaging systems.
Classification: LCC QC355.3 .Z43 2017 | DDC 535–dc23
LC record available at https://lccn.loc.gov/2016053158

Published by
SPIE
P.O. Box 10
Bellingham, Washington 98227-0010 USA
Phone: +1 360.676.3290
Fax: +1 360.647.1445
Email: [email protected]
Web: http://spie.org

Copyright © 2017 Society of Photo-Optical Instrumentation Engineers (SPIE)

All rights reserved. No part of this publication may be reproduced or distributed in any form or by any means without written permission of the publisher.

The content of this book reflects the work and thought of the authors. Every effort has been made to publish reliable and accurate information herein, but the publisher is not responsible for the validity of the information or for any outcomes resulting from reliance thereon.

Cover image credit: iStock, Maxiphoto

Printed in the United States of America.
First Printing.

For updates to this book, visit http://spie.org and type "PM276" in the search field.

Contents

Preface

I THEORY

1 Introduction to Light and Optical Systems
  1.1 What Is Light?
    1.1.1 Light as electromagnetic waves
    1.1.2 Light as particles: photons
    1.1.3 Wave–particle duality of light
  1.2 How Do Light Sources Produce Light?
    1.2.1 Explanation by electromagnetic wave theory
    1.2.2 Explanation by quantum theory
  1.3 Theories of Light: An Overview
    1.3.1 Geometrical optics
    1.3.2 Wave optics
    1.3.3 Quantum optics
  1.4 Overview of Optical Systems
    1.4.1 What are optical systems?
    1.4.2 Main types of optical systems
      1.4.2.1 Optical imaging systems
      1.4.2.2 Optical systems for energy collection
  References

2 Geometrical Optics
  2.1 Definition of the Index of Refraction
  2.2 Origin of the Index of Refraction
  2.3 Reflection and Refraction of Light
    2.3.1 Sign conventions
    2.3.2 Laws of reflection and refraction
    2.3.3 Total internal reflection
    2.3.4 Dispersion of light
  2.4 Perfect Optical Imaging Systems
    2.4.1 Imaging concept
    2.4.2 Cardinal points and planes in imaging systems
    2.4.3 Stops and pupils in imaging systems
    2.4.4 Some useful formulas
      2.4.4.1 Object–image relationship
      2.4.4.2 Magnifications
      2.4.4.3 Lagrange invariant
        2.4.4.3.1 Lagrange invariant: an incarnation of the uncertainty principle in geometrical optics
  2.5 Raytracing
    2.5.1 Paraxial raytracing
      2.5.1.1 Matrix approach to paraxial raytracing
      2.5.1.2 Examples
        2.5.1.2.1 Single lenses
        2.5.1.2.2 Compound lenses
    2.5.2 Diffraction raytracing
  2.6 Geometrical Aberrations
    2.6.1 Primary aberrations
      2.6.1.1 Spherical aberration
      2.6.1.2 Coma
      2.6.1.3 Astigmatism
      2.6.1.4 Curvature of field
      2.6.1.5 Distortion
    2.6.2 High-order aberrations
    2.6.3 Chromatic aberrations
      2.6.3.1 Example A: The principle of the positive achromatic doublet
      2.6.3.2 Example B: Binary optics used for chromatic aberration correction in the IR spectrum
  2.7 General Procedure for Designing Optical Imaging Systems
    2.7.1 First-order design of optical imaging systems
    2.7.2 Detailed design of optical imaging systems
      2.7.2.1 Configuration optimization
      2.7.2.2 Tolerance analysis
    2.7.3 Design of an achromatic doublet
  References

3 Wave Optics
  3.1 Electromagnetic Theory of Optics
    3.1.1 Maxwell's equations
    3.1.2 Wave equations
      3.1.2.1 Vector wave equation
      3.1.2.2 Scalar wave equation
        3.1.2.2.1 The Helmholtz equation
    3.1.3 Light waves and their characteristics
      3.1.3.1 Plane waves
      3.1.3.2 Spherical waves
      3.1.3.3 Characteristics of light waves
  3.2 Diffraction
    3.2.1 Rayleigh–Sommerfeld diffraction formula
    3.2.2 Fresnel approximation
    3.2.3 Fraunhofer approximation
    3.2.4 Examples
      3.2.4.1 Circular aperture
      3.2.4.2 Rectangular aperture
      3.2.4.3 Other aperture shapes
  3.3 Interference
    3.3.1 Coherence
      3.3.1.1 Temporal coherence
      3.3.1.2 Spatial coherence
    3.3.2 Examples
      3.3.2.1 Two-beam interference
      3.3.2.2 Multibeam interference
      3.3.2.3 Fourier transform spectrometer
      3.3.2.4 Stellar interferometer
  3.4 Fourier Optics: An Introduction
    3.4.1 Fourier transform
    3.4.2 Angular spectrum expansion
    3.4.3 Fourier transform in optics
      3.4.3.1 Phase transformation of a positive lens
      3.4.3.2 Fourier transform by a lens
    3.4.4 Examples of optical Fourier spectra
      3.4.4.1 Point sources
      3.4.4.2 Plane waves
      3.4.4.3 Slits
      3.4.4.4 Circular apertures
      3.4.4.5 Sinusoidal amplitude gratings
      3.4.4.6 Phase-contrast microscopes
    3.4.5 Formulas governing image formation in Fourier optics
      3.4.5.1 Point spread functions
      3.4.5.2 Image formation with coherent illumination
      3.4.5.3 Image formation with incoherent illumination
  3.5 Wavefront Aberrations
    3.5.1 Optical path difference
      3.5.1.1 Example: Light traveling in different media of the same thickness
    3.5.2 Peak-to-valley and root-mean-square values of a wavefront aberration
    3.5.3 Zernike representation of wavefront aberrations
  3.6 Resolution Limits of Optical Imaging Systems
  References

II COMPONENTS AND CASE STUDIES

4 General Optical Components in Optical Systems
  4.1 Light Sources
    4.1.1 Incoherent sources
    4.1.2 Coherent sources
  4.2 Lenses
    4.2.1 Spherical lenses
    4.2.2 Spherical ball lenses
    4.2.3 Cylindrical lenses
    4.2.4 Axicons
    4.2.5 Aspheric lenses
    4.2.6 Plane-parallel plates
    4.2.7 Optical wedges
  4.3 Mirrors and Prisms
    4.3.1 Mirrors
      4.3.1.1 Plane mirrors
      4.3.1.2 Curved mirrors
    4.3.2 Prisms
      4.3.2.1 Dispersing prisms
      4.3.2.2 Reflecting prisms
      4.3.2.3 Right-angle prism
      4.3.2.4 Dove prism
      4.3.2.5 Pentaprism
      4.3.2.6 Beam-splitting prisms
  4.4 Diffractive Optical Elements
    4.4.1 Principle of a grating and diffraction order
    4.4.2 Grating equation
    4.4.3 Dispersion
    4.4.4 Resolution of a grating
    4.4.5 Free spectral range
    4.4.6 Blazing
  4.5 Optical Filters
    4.5.1 Absorptive and interference filters
      4.5.1.1 Absorptive filters
      4.5.1.2 Interference filters
    4.5.2 Optical filters with different functions
      4.5.2.1 Longpass filters
      4.5.2.2 Shortpass filters
      4.5.2.3 Bandpass filters
      4.5.2.4 Neutral-density filters
  4.6 Optical Fibers
    4.6.1 Multimode and single-mode fibers
    4.6.2 Attenuation in fibers
    4.6.3 Dispersion of fibers
  4.7 Optical Detectors
    4.7.1 Types of optical detectors
    4.7.2 Thermal detectors
    4.7.3 Photon detectors
      4.7.3.1 Photoemissive detectors
      4.7.3.2 Photoconductive detectors
      4.7.3.3 Photovoltaic detectors
      4.7.3.4 Detector arrays
    4.7.4 Performance characteristics
  References

5 Case Study 1: Confocal Microscopes
  5.1 Fundamentals of Standard Optical Microscopes
    5.1.1 Configuration and characteristics of the standard microscope
      5.1.1.1 Field
      5.1.1.2 Resolution
    5.1.2 Main elements of standard optical microscopes
      5.1.2.1 Illumination system
      5.1.2.2 Objective
      5.1.2.3 Eyepiece
  5.2 Confocal Microscopes
    5.2.1 Principles of confocal microscopes and their configurations
    5.2.2 Main components of confocal microscopes
      5.2.2.1 Light sources
      5.2.2.2 Objectives
      5.2.2.3 Pinholes
      5.2.2.4 Detectors
      5.2.2.5 Scanning systems
  5.3 Types of Confocal Microscopes
    5.3.1 Nipkow-disk scanning confocal microscopes
    5.3.2 Scanning-slit confocal microscopes
  References

6 Case Study 2: Online Cophasing Optical Systems for Segmented Mirrors
  6.1 Principles of Dual-Wavelength Digital Holography for Phase Measurement
    6.1.1 Single-wavelength digital holography for phase measurement
    6.1.2 Dual-wavelength digital holography for phase measurement
  6.2 Design of the Holographic Recorder: A Point Diffraction Mach–Zehnder Interferometer
  6.3 Algorithm for Numerical Processing of Interferograms
  6.4 Performance
    6.4.1 Online cophasing of S1 by dual-wavelength digital holography
    6.4.2 Online cophasing of S2 by dual-wavelength digital holography
  References

7 Case Study 3: Adaptive Optics Systems
  7.1 Principles of Adaptive Optics
    7.1.1 Imaging through atmospheric turbulence
      7.1.1.1 Structure function of the refractive index and its power spectrum
      7.1.1.2 Phase structure function and its power spectrum
      7.1.1.3 Image formation through atmospheric turbulence
        7.1.1.3.1 Long-exposure imaging
        7.1.1.3.2 Short-exposure imaging
    7.1.2 Wavefront sensing
      7.1.2.1 Shack–Hartmann wavefront sensor
      7.1.2.2 Lateral shearing interferometer
      7.1.2.3 Curvature wavefront sensor
    7.1.3 Wavefront correction
      7.1.3.1 Types of wavefront correctors
      7.1.3.2 Membrane deformable mirrors
      7.1.3.3 Piezoelectric deformable mirrors
      7.1.3.4 Technical parameters of the deformable mirror
    7.1.4 Control system
      7.1.4.1 Closed-loop control of an AO system
      7.1.4.2 Description of modal control
  7.2 Astronomical Telescopes and Atmospheric Seeing
    7.2.1 Astronomical telescopes
    7.2.2 Atmospheric seeing
  7.3 Optical Design of the AO System
    7.3.1 First-order design of the AO system
    7.3.2 Detailed design of the AO system
  7.4 Core Components of the AO System and Related Algorithms
    7.4.1 Shack–Hartmann wavefront sensor
      7.4.1.1 Technical parameters
      7.4.1.2 Wavefront reconstruction algorithm
    7.4.2 Piezoelectric deformable mirrors
      7.4.2.1 Technical parameters
      7.4.2.2 Influence function matrix
      7.4.2.3 Modal control method
    7.4.3 Piezoelectric tip/tilt mirror
  7.5 Order Estimation in Modal Wavefront Reconstruction
  7.6 Matching Problem between the SH Sensor and the DM in an AO System
  7.7 Implementation of the AO System Controller
  7.8 Performance of the AO System
  References

Appendices
  Appendix A: Dirac δ Function
    A.1 Definition
    A.2 Properties
      A.2.1 Scaling property
      A.2.2 Shifting property
    A.3 δ Function as a limit
    A.4 A useful formula
  Appendix B: Convolution
    B.1 Definition
    B.2 Description
    B.3 Properties
      B.3.1 Algebraic properties
      B.3.2 Convolution with the δ function
      B.3.3 Translation invariance
  Appendix C: Correlation
    C.1 Definition
    C.2 Description
    C.3 Properties
  Appendix D: Statistical Correlation
  Appendix E: 2D Fourier Transform
    E.1 Definition
    E.2 Description
    E.3 Properties
      E.3.1 Linearity theorem
      E.3.2 Similarity theorem
      E.3.3 Shift theorem
      E.3.4 Parseval's theorem
      E.3.5 Convolution theorem
      E.3.6 Autocorrelation theorem
      E.3.7 Fourier integral theorem
  Appendix F: Power Spectrum
  Appendix G: Linear Systems
    G.1 Impulse response and superposition integral
    G.2 Invariant linear systems
  References

Index

Preface

Optical systems have such broad applications that they can be found in countless scientific disciplines, industry, and everyday life. Many scientists and engineers whose work involves optical systems can use commercial off-the-shelf systems or build an optical system from the level of optical components. However, scientists and engineers who need to build customized optical systems must be able to understand the working principles of these systems and the components they comprise. With this requirement in mind, the goal of this book is to guide readers in acquiring an understanding of how and why optical systems and their related optical components work.

The prerequisite for understanding optical systems is for readers to understand the optical theories involved in the systems. Then, armed with this understanding of the theory, readers can learn to analyze and understand these systems by studying some practical optical systems. An understanding of optical theories together with an examination of some practical optical systems will boost the reader to a higher level of expertise in building optical systems.

Having worked in the field of optics for many years, we believe that a clear, global picture of optical theory is important for understanding optical systems, especially for conceiving new optical systems, which is our ultimate goal for readers. We also believe that the most effective and quickest way for readers to acquire the ability to analyze and understand optical systems is by studying examples of some typical optical systems. Based on the above tenets, this book consists of two parts: optical theory, involving mainly classical optics; and case studies of optical systems, involving mainly imaging systems. Three practical optical systems are provided to show readers how to analyze and understand the working principles of optical systems by means of optical theories. We expect readers to not only master the basic methods used in building the optical systems in the examples presented in the book, but also to be able to apply these methods in new situations and to conceive their own systems. This is where real understanding is demonstrated.

The book is divided into two parts. Part I on Theory (Chapters 1–3) gives an introduction to geometrical optics and wave optics, and some concepts of quantum optics and statistical optics.


Chapter 1 presents an overview of the properties and generation of light, a brief summary of optical theories, and some optical systems. Chapter 2 focuses on geometrical optics. Included in this chapter are the origin of the index of refraction, laws of reflection and refraction, perfect optical imaging systems, raytracing, geometrical aberrations, and design of an achromatic doublet. Chapter 3 presents descriptions of wave optics. Topics covered are Maxwell's equations, wave equations, light waves and their characteristics, diffraction, interference, Fourier optics, wavefront aberrations, and resolution limits of optical imaging systems. Part II on case studies (Chapters 4–7) describes some important and commonly used optical elements and presents three examples of practical optical systems. Commonly used optical components are introduced in Chapter 4. Chapter 5 covers confocal microscopes, whose principle can be explained mainly using geometrical optics. Chapter 6 describes an online co-phasing optical system for segmented mirrors. The principle of the co-phasing optical system is expounded mainly by wave optics. Finally, in Chapter 7, a comprehensive example concerning an adaptive optics system designed and implemented by the adaptive optics group led by Sijiong Zhang is explained using both geometrical and wave optics.

The advanced mathematics needed for readers of this book are calculus, Fourier transforms, and matrix operations. The book can be used as a textbook or reference for students majoring in optics or physics. It can also serve as a reference for scientists, engineers, and researchers whose work involves optical systems.

We would like to express our sincere thanks to the numerous people who have contributed to this book. We are very grateful to many colleagues of the Nanjing Institute of Astronomical Optics & Technology (NIAOT), Chinese Academy of Sciences, especially the director of NIAOT, Prof. Yongtian Zhu, who supported us in writing this book. We thank Prof. Dong Xiao for closely reading the manuscript and giving many useful suggestions during the revision of the manuscript. We thank Dr. Yanting Lu for participating in the revision of the manuscript and writing its appendices. We also thank Dr. Bangming Li for writing part of Chapter 7. Additionally, we would like to express our sincere appreciation to SPIE, especially to Senior Editor Dara Burrows, and to the anonymous reviewers for their helpful and constructive suggestions to improve this book. Finally, the first author, Sijiong Zhang, would like to express his thanks to Prof. Alan Greenaway at Heriot-Watt University, Edinburgh for guiding him into the field of adaptive optics.

Sijiong Zhang
Changwei Li
Shun Li
July 2017

Part I THEORY

Chapter 1

Introduction to Light and Optical Systems

Optical systems, which provide much convenience to our lives and industries, manipulate light to satisfy the particular requirements of end users. For end users and optical engineers, understanding optical systems is of fundamental importance for using, designing, or manufacturing optical systems. In order to better understand optical systems, it is essential to be familiar with the behavior and properties of light. Throughout history, the behavior and properties of light have been gradually discovered in the process of explaining optical phenomena and validated by many experiments. However, the nature of light is still difficult for many people to understand, especially for beginner students of optics. In this chapter, we give readers a very brief description of light and optical systems. The question "What is light?" is addressed first. The generation of light and the three basic theories of light—geometrical optics, wave optics, and quantum optics—are then briefly discussed, after which the concept of statistical optics is mentioned. Finally, an overview of optical systems is presented, and some examples are also provided to give readers a sense of what is entailed in such systems.

1.1 What Is Light?

Light is familiar, but also mysterious, to human beings. We are accustomed to experiencing sunlight and manmade light every day. However, the nature of light is extremely foreign to us. What is light exactly? For several centuries, people have been devoting much effort to answering this question. At first, people thought that light consists of massive corpuscles that obey Newtonian mechanics, which can easily explain the phenomena of rectilinear propagation, reflection, and refraction of light. However, this theory cannot explain the phenomena of diffraction, interference, and polarization of light. For these phenomena, the wave nature of light, which was proposed later on, is more pertinent.


Finally, human beings found that within a certain wavelength region, light is nothing but electromagnetic waves obeying Maxwellian electromagnetic theory. When people found the photoelectric effect, i.e., that many metals emit electrons when light shines on them, neither of the two theories above could explain this phenomenon any longer. Einstein proposed that a beam of light is not a wave, but rather a collection of photons; thus, the quantum theory of light emerged. Currently, people think that light has the attributes of a wave–particle duality.

1.1.1 Light as electromagnetic waves

On the basis of previous research on electricity and magnetism, Maxwell summarized a set of equations, i.e., Maxwell's equations, in rigorous mathematical form. This set of equations predicted the existence of electromagnetic waves propagating in vacuum with a speed of 3 × 10⁸ m/s, the same as the speed of light. This led Maxwell to surmise that light is a kind of electromagnetic wave. Gradually, light as electromagnetic waves was verified and accepted. The wave attribute of light can be observed in many experiments, such as the single-slit diffraction and Young's double-slit interference experiments. A detailed discussion of the wave characteristics of light will be presented in Chapter 3. However, although light is a kind of electromagnetic wave with all of the characteristics of waves, it is different from mechanical waves, such as sound and water waves. For example, light can propagate in both vacuum and media, while mechanical waves can propagate only in media.

1.1.2 Light as particles: photons

The theory that deals with light as a kind of electromagnetic wave, namely, wave optics, successfully explains the optical phenomena of diffraction, interference, and polarization of light. However, wave optics is unable to explain phenomena involving the interaction of light and matter, such as the photoelectric effect.

The photoelectric effect was first observed by Hertz in 1887. As shown in Fig. 1.1, when blue light is incident on the surface of caesium, electrons are emitted from that surface. These emitted electrons are also called photoelectrons. According to classical electromagnetic radiation theory, the photoelectric effect is due to the transfer of energy from light incident on caesium to electrons in it. From this perspective, two predictions can be easily obtained: (1) the intensity of the incoming light completely determines whether photoelectrons are released or not, as well as the speed of the photoelectrons; (2) the brighter the incoming light, the more photoelectrons are emitted, and the brighter the incoming light, the faster the photoelectrons are emitted.

Figure 1.1 Diagram of the photoelectric effect.

However, experimental results contradict these predictions on two points: (1) blue light can always release electrons from caesium, no matter how dim it is, and not even one photoelectron is emitted from caesium by exposing it to red light, regardless of how bright the red light is (assuming that nonlinear optical effects do not occur); (2) the speed of an emitted photoelectron is completely determined by the frequency of the light that releases the photoelectron. These contradictions between experimental results and the predictions of the electromagnetic theory indicate that a new theory should be invoked to explain this effect.

In 1905, Einstein explained the photoelectric effect using the concept of photons. In Einstein's theory, light is a collection of photons. Each photon has a fixed amount of energy, which depends only on the frequency of light. When photons fall on the surface of a metal, the energy of a photon is split into two parts if light of this frequency can release electrons in this metal. One part is used for releasing an electron from the metal, and the other part transforms into the kinetic energy of the released electron. According to the law of conservation of energy, the process of energy conversion in the photoelectric effect can be expressed as

hν = (1/2)mv² + w,    (1.1)

where h is Planck's constant, ν is the frequency of light, m is the mass of the electron, v is the speed of the photoelectron, and w is the work function of the metal, which represents the smallest energy needed for electrons to escape from the constraints of the metal. According to Eq. (1.1), both the release of the photoelectron from the metal by light and the speed of the photoelectron are determined by the frequency of the incoming light, irrespective of the intensity of light incident on the metal.
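As a numerical illustration of Eq. (1.1), the short Python sketch below estimates the photoelectron speed for blue light striking caesium. The 450-nm wavelength and the approximate 2.1-eV work function are illustrative assumptions, not values taken from the text.

    import math

    h = 6.626e-34        # Planck's constant (J*s)
    c = 2.998e8          # speed of light in vacuum (m/s)
    m_e = 9.109e-31      # electron mass (kg)
    eV = 1.602e-19       # joules per electronvolt

    wavelength = 450e-9           # assumed blue light (m)
    work_function = 2.1 * eV      # assumed approximate work function of caesium (J)

    nu = c / wavelength           # frequency of the incoming light
    photon_energy = h * nu        # energy of one photon, E = h*nu

    # Eq. (1.1): h*nu = (1/2)*m*v^2 + w  ->  v = sqrt(2*(h*nu - w)/m)
    kinetic_energy = photon_energy - work_function
    if kinetic_energy > 0:
        v = math.sqrt(2 * kinetic_energy / m_e)
        print(f"photon energy: {photon_energy / eV:.2f} eV")
        print(f"photoelectron speed: {v:.2e} m/s")
    else:
        # below the threshold frequency no photoelectron is emitted,
        # no matter how intense the light is
        print("no photoelectron emitted")

Running the sketch with a red wavelength (e.g., 650 nm) instead reproduces the experimental observation that no photoelectron is released, regardless of intensity.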


The concept of light as photons successfully explains the photoelectric effect and is confirmed by more and more experimental results. This reveals that the particle nature is another aspect of light.

1.1.3 Wave–particle duality of light

In the quantum theory of light, the wave nature and the particle nature of light are unified by invoking the wave–particle duality of photons. For example, the connection between the wave and particle natures of light is furnished by Planck's constant h.¹ In view of the particle nature of light, a photon has a packet of energy E,

E = hν = h/T,    (1.2)

and a momentum p,

p = h/λ.    (1.3)

Obviously, both the energy and the momentum of a photon are expressed by the wave attribute of light in terms of T (the period of the light wave) or λ (the wavelength of the light wave), respectively. By rearranging Eqs. (1.2) and (1.3), Planck's constant can be expressed as

h = ET = pλ.    (1.4)

Note that E and p express the attribute of a particle of light, while T and λ characterize the attribute of a wave of light. This means that Planck's constant can be expressed as the product of the particle and wave attributes of light, i.e., ET or pλ. If light exhibits more wave nature, the particle nature of light will manifest less, and vice versa. Furthermore, it should be pointed out that in the quantum theory of light, the wave nature of photons involves a probability wave, not a classical wave. In practical issues, when light interacts with matter—such as in the generation of light and the detection of light with photoelectric detectors, which will be discussed in Chapter 4—it displays the particle characteristics. On the other hand, light exhibits more wave characteristics in situations involving diffraction of light and the interaction between light waves, such as the interference of light.
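The hedged Python sketch below evaluates Eqs. (1.2)–(1.4) for an assumed 550-nm wavelength and checks that the product of the particle attribute and the wave attribute recovers Planck's constant.

    h = 6.626e-34      # Planck's constant (J*s)
    c = 2.998e8        # speed of light in vacuum (m/s)

    wavelength = 550e-9            # assumed wavelength (m)
    T = wavelength / c             # period of the light wave (s)
    nu = 1.0 / T                   # frequency (Hz)

    E = h * nu                     # Eq. (1.2): photon energy
    p = h / wavelength             # Eq. (1.3): photon momentum

    # Eq. (1.4): Planck's constant as the product of particle and wave attributes
    print(f"E*T      = {E * T:.3e} J*s")
    print(f"p*lambda = {p * wavelength:.3e} J*s")
    print(f"h        = {h:.3e} J*s")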

1.2 How Do Light Sources Produce Light?


The generation of light from light sources can be roughly explained by the classical electromagnetic radiation theory or, more precisely, by the quantum theory of light. These two explanations are illustrated in the following subsections.

1.2.1 Explanation by electromagnetic wave theory

Matter is made of atoms, and each atom is composed of a nucleus and some electrons around the nucleus. When an amount of energy is continuously injected into matter, the temperature of the matter will gradually increase, and electron movement around nuclei will be gradually accelerated. However, according to the law of conservation of energy, the temperature of matter and the speed of electrons cannot increase infinitely. The injected energy must be consumed in some way, and radiation is one of the ways it can be consumed. Moreover, in classical electromagnetic radiation theory, all accelerated particles with charges will radiate energy, so the accelerated electrons will radiate energy in the form of electromagnetic waves, i.e., light.

According to blackbody radiation theory, a material in thermal equilibrium that absorbs more energy than an otherwise identical material (leading to a higher temperature) also radiates more light, and the radiated light has more constituents with shorter wavelengths. Conversely, a material that absorbs less energy than an otherwise identical material (leading to a lower temperature) also radiates less light, and the radiated light has more constituents with longer wavelengths. For example, when the temperature of heated steel is low, the steel appears to be red because its radiation contains more light with longer wavelengths. When the temperature of the steel is high enough, the steel appears to be white because its radiation contains more light with shorter wavelengths. The sun, having a high temperature due to the thermonuclear reactions occurring in it, radiates electromagnetic waves across most of the electromagnetic spectrum, including γ rays, x rays, ultraviolet, visible light, infrared, and even radio waves. Visible light, defined as light with wavelengths that are visible to normal human eyes, falls in the range of the electromagnetic spectrum between ultraviolet (UV) and infrared (IR). It has wavelengths of about 380 nm to 740 nm.

1.2.2 Explanation by quantum theory

In this section, we explain the generation of light using quantum theory. We first introduce the basic concept of energy levels in matter. In quantum theory, all matter particles, e.g., electrons, also have wave properties (matter waves) and possess the wave–particle duality. The relationship between energy and wavelength or frequency for a matter wave is the same as that for light described in Section 1.1.3. In order to understand the formation of energy levels in matter, we will suppose that a particle, e.g., an electron, is trapped inside a potential well, shown as a one-dimensional (1D) well in Fig. 1.2(a). In this case, the form of the electron wave is very much like a sound wave from a guitar string.


Figure 1.2 Diagram of (a) electron waves in a 1D potential well and (b) the corresponding energy levels.

We can understand the formation of energy levels in matter using an analogy to a bounded sound wave. Let us assume that electron waves are constrained by two parallel barriers (a potential well) separated by a distance L. Only waves that can form standing waves after interfering with their corresponding reflected waves from the two parallel barriers can exist in the potential well. For standing waves to exist, the length of the round trip between the two parallel barriers must be an integral multiple of the corresponding wavelength. As shown in Fig. 1.2(a), the wavelengths of the electron waves should be 2L, 2L/2, 2L/3, . . . , 2L/m, . . . , where m = 1, 2, 3, . . . . So the constraint of the potential well can be seen as a wavelength selector that only retains waves with discrete wavelengths equal to 2L/m. In other words, only some discrete frequencies exist for electron waves in a bounded region. Because the energy of an electron wave is proportional to its frequency (see Section 1.1.3),² the values of the energies of the existing waves are discrete, as shown in Fig. 1.2(b). The lowest energy level E₀ shown in Fig. 1.2(b) is called the ground state, at which electrons are the most stable. The other energy levels above the ground state are called excited states, where electrons have limited lifetimes and can transit to the ground state or to other low energy levels at any time.

Generally, according to the Boltzmann distribution, most of the electrons in matter are in the state with the lowest energy, i.e., the ground state. Once matter is heated or injected with an amount of energy, electrons in the ground state will absorb a portion of the energy and transit to excited states. An example of this is the transition in the two-level system shown in Fig. 1.3(a). Because electrons in excited states have higher energy and limited lifetime, they can quickly transit from excited states to the ground state or to other low energy levels. During this transition process, a portion of the energy, in the form of light, will be radiated.

Figure 1.3 Diagram of transitions between E₀ and E₁ in a two-level system.

As shown in Fig. 1.3(b), the energy hν of the radiated light equals the energy gap between the two energy levels. The larger the energy gap, the larger the energy of the radiated light, and the shorter its wavelength. Furthermore, energy levels and energy gaps between those energy levels for different types of matter are completely different due to the different configurations (or different constraints on electrons) of atoms. This is the reason that each type of matter has its own characteristic absorptions and emissions. For example, the wavelengths of light emitted from any type of sodium lamp are all at 589.0 or 589.6 nm. Note that if transitions between states with many different energy gaps occur simultaneously, light at multiple wavelengths or even white light can be produced. In fact, the above just explained the generation of light from incoherent sources. Light generated by coherent sources, e.g., a laser, will be briefly introduced in Chapter 4.
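The sketch below makes the two-level picture concrete. It combines the standing-wave condition λₘ = 2L/m with the standard non-relativistic relations p = h/λ and E = p²/(2mₑ) for the electron (assumptions introduced here for illustration, not derived in the text), then converts the resulting energy gap into the wavelength of the emitted light through hν = ΔE, i.e., λ = hc/ΔE. The 1-nm well width is an arbitrary illustrative value.

    h = 6.626e-34       # Planck's constant (J*s)
    c = 2.998e8         # speed of light in vacuum (m/s)
    m_e = 9.109e-31     # electron mass (kg)
    eV = 1.602e-19      # joules per electronvolt

    L = 1e-9            # assumed width of the 1D potential well (m)

    def level_energy(m):
        """Kinetic energy of the m-th standing electron wave in the well."""
        wavelength = 2 * L / m      # standing-wave condition: lambda_m = 2L/m
        p = h / wavelength          # de Broglie relation, analogous to Eq. (1.3)
        return p**2 / (2 * m_e)     # assumed non-relativistic kinetic energy

    E1, E2 = level_energy(1), level_energy(2)
    gap = E2 - E1                        # energy gap between the two levels
    emitted_wavelength = h * c / gap     # h*nu = gap  ->  lambda = h*c/gap

    print(f"E1 = {E1 / eV:.2f} eV, E2 = {E2 / eV:.2f} eV")
    print(f"energy gap = {gap / eV:.2f} eV")
    print(f"emitted wavelength = {emitted_wavelength * 1e9:.0f} nm")

A larger gap (for example, a narrower well) yields a shorter emitted wavelength, in line with the statement above.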

1.3 Theories of Light: An Overview

Theories of light can currently be divided into three main branches: geometrical optics, wave optics, and quantum optics. In the following subsections, brief introductions to these three branches are presented.

1.3.1 Geometrical optics

Geometrical optics, which deals with light as rays that travel in straight lines in a homogeneous medium, formulates optical laws in the language of geometry. In geometrical optics, the laws of reflection and refraction can provide very good explanations for many optical phenomena, such as specular reflection, the apparently shallower depth of water, the dispersion of light, etc. Furthermore, with the help of these two laws and the rectilinear propagation of light rays in homogeneous media, the path of a light ray can be traced throughout an optical system, revealing the main characteristics of that system. In addition, by introducing the diffraction phenomenon in terms of the laws of reflection and refraction, the diffraction of light by edges, corners, or vertices of boundary surfaces can be predicted using geometrical optics.³ The diffraction ray can also be traced based on geometrical optics.⁴ The details of diffraction raytracing will be presented in Section 2.5.2.


This geometrical ray-trace methodology for the diffraction ray is commonly used in many optical design software packages. In order to simplify the calculation of the paths of rays and give a quick evaluation of an optical system, the input rays to an optical system are limited to the paraxial region, and this is called paraxial optics. If paraxial optics is extended to the whole space of the optical system, it is called Gaussian optics. In Gaussian optics, optical imaging systems can produce perfect images, and the characteristics of an optical imaging system, including the object–image relationship, the magnifications, and the field of view, can be easily calculated in the framework of this theory. However, as perfect imaging systems do not exist in practice due to the nonlinearity of the law of refraction and dispersion, the images formed by optical imaging systems are blurred. The differences between the blurred image and the corresponding perfect image are known as geometrical aberrations. Although the aberrations of an optical imaging system are impossible to completely remove over the entire field of view, the performance of the optical imaging system can be greatly improved by correcting most of the aberrations. Hence, correcting geometrical aberrations is one of the most important steps in optical system design.

Usually, geometrical optics can give reasonable explanations for most optical phenomena. However, because geometrical optics is obtained as the approximation of wave optics when the wavelength of light approaches zero (λ → 0), the wave nature of light is neglected. Therefore, it is impossible to explain the physical reasons for the diffraction and interference of light using geometrical optics.

1.3.2 Wave optics

Wave optics, which deals with light as waves, studies optical phenomena involved in the wave nature of light, such as diffraction, interference, and polarization. As light is a form of electromagnetic wave, all wave characteristics of light can be deduced from Maxwell's equations. For example, starting with Maxwell's equations, both the wave equations and the Rayleigh–Sommerfeld diffraction formula for light can be determined. The wave equations of light clearly reveal the wave nature of light; the Rayleigh–Sommerfeld diffraction formula describes an optical field at a reference plane as a superposition of spherical waves. In addition, it is highly convenient to describe the propagation of light in free space with the Rayleigh–Sommerfeld diffraction formula. According to the Rayleigh–Sommerfeld diffraction formula, the propagation path of light is not along a straight line due to the diffraction of light. In order to simplify the Rayleigh–Sommerfeld diffraction formula, the Fresnel and Fraunhofer approximations are obtained under the near- and far-field approximations, respectively.


Similar to the interference of water waves, a light wave can interfere with another light wave, in what we call the interference of light waves. The study of wave optics using the Fourier transform is called Fourier optics. In Fourier optics, a wave is regarded as the superposition of a set of plane waves with the same wavelength, and the direction of propagation of each plane wave stands for one spatial frequency of the Fourier spectrum of the wave. The Fourier spectrum, which expresses the wave in the spatial frequency domain, is also known as the angular spectrum. In view of the angular spectrum expansion, the propagation of light can be seen as the propagation of the angular spectrum. The angular spectrum expansion can describe the propagation of light with more accuracy than the Fresnel and Fraunhofer approximations, but it is not more accurate than the Rayleigh–Sommerfeld diffraction formula.

Wavefront aberrations are the deviations of the actual wavefront from the ideal wavefront. Note that rays are locally normal (perpendicular) to the corresponding wavefront. For the sake of convenience, wavefront aberrations are usually expanded in a series of orthonormal polynomials, e.g., the Zernike polynomials. In wave optics, the resolution of an optical imaging system is ultimately limited by the diffraction of light, which is different from the case of geometrical optics. Generally, the resolution of an optical imaging system is characterized using the following: the resolution limits of the optical imaging system, the Sparrow criterion and Rayleigh criterion, and a measure of image quality, the Strehl ratio.

Wave optics not only explains optical phenomena such as the diffraction and interference of light, but is also commonly used in optical system design and optical metrology. However, as wave optics ignores the particle aspect of light, it cannot be used in scenarios involving the interaction between light and matter, such as the photoelectric effect.

1.3.3 Quantum optics

Optical phenomena concerning the interaction between light and matter, such as characteristic emission, the photoelectric effect, etc., are in the realm of quantum optics. The concept of the photon, proposed by Einstein in 1905, is fundamental to quantum optics, and the interaction between light and matter can be considered as interactions between photons and atoms of matter. For example, the invention of the laser is the most famous application of quantum optics. This new type of optical source provided an important experimental tool for the development of modern optics. Furthermore, commonly used photoelectric detectors, such as charge-coupled devices (CCDs), photodiodes, and photomultiplier tubes, which will be introduced in Chapter 4, are all successful applications of quantum optics.


Quantum optics describes the wave–particle duality of light well. So far, it is the most accurate theory of optics. Here, we also briefly mention the concept of statistical optics, or the theory of optical coherence. This theory involves the study of the properties of random light fluctuations in terms of statistics. Randomness of light fluctuations is caused by unpredictable fluctuations of light sources, e.g., a hot object, or by a medium, e.g., the atmosphere through which light propagates. Furthermore, the interaction between light and matter is a random or stochastic process that is demonstrated by quantum theory. As a consequence, any detection of light will be accompanied by random fluctuations. For dealing with these situations, a statistical approach must be invoked. This branch of optics is called statistical optics. In this book, the theories behind geometrical optics and wave optics are predominantly explained, while some of the concepts behind quantum and statistical optics are occasionally depicted as well.

1.4 Overview of Optical Systems

1.4.1 What are optical systems?

An optical system is usually composed of a number of individual optical elements, such as lenses, mirrors, gratings, detectors, etc. However, as the goal of an optical system is to achieve certain functions by manipulating light, an optical system is not simply a combination of optical elements. The optical system must be carefully designed to constrain the propagation of light in it. From this point of view, an optical system either (1) processes light to produce an image for viewing or to collect energy for detection or (2) analyzes light to determine a characteristic of the light or to reveal properties of the surroundings that are interacting with light. In the next subsection, some main types of optical systems are taken as examples to give readers a general idea of what optical systems are. These types of optical systems are classified based on their main functions. Moreover, as the functions of optical systems are achieved by manipulating light, it should be borne in mind that all optical systems must transfer optical energy.

1.4.2 Main types of optical systems

1.4.2.1 Optical imaging systems

Optical imaging systems are one of the most important and widely used optical systems. Generally, optical imaging systems map an object of interest in object space into a corresponding image in image space. These systems allow objects to be seen more clearly either by improving the resolution of an optical system (e.g., an adaptive optics system for a telescope) or by magnifying the image of an object.

By correcting the aberrations of a particular optical system via another optical system, the blurred image of an object of interest becomes clear. For example, eyeglasses for correcting vision are optical imaging systems commonly used in everyday life. With the help of eyeglasses, wearers can see the world more clearly. An adaptive optics system, widely used to improve the image quality of an astronomical telescope, is another optical imaging system that pursues high-resolution images of astronomical targets by correcting aberrations introduced by atmospheric turbulence. Different from eyeglasses, adaptive optics systems need to actively correct dynamic aberrations caused by atmospheric turbulence at a very high frequency. The details of adaptive optics systems will be discussed in Chapter 7.

By magnifying images of objects using optical imaging systems, targets of interest that are too far away or too small to be seen can be observed. Telescopes and microscopes belong to this kind of optical imaging system. For example, by magnifying the angle subtended at the observer, telescopes can display more details of objects at a long distance compared to those visible with the naked eye. As shown in Fig. 1.4, the original angle u subtended at the observer is magnified to become a larger angle u′. The principle of a microscope for magnifying the image of an object will be presented in Chapter 5.

1.4.2.2 Optical systems for energy collection

Energy-collecting systems are designed to collect optical energy, regardless of their imaging functions. Solar energy collectors are the most widely used energy-collecting systems. Solar energy is renewable and clean. However, as the solar energy density on the surface of the earth is relatively low, it is difficult to use solar energy directly. By using solar energy collectors, the solar energy distributed over a large area can be concentrated onto a small area to increase the energy density, and the gathered energy can be used for heating or to provide power.

Figure 1.4 Diagram of the magnification of the subtended angle by a telescope.

An astronomical telescope can also be considered a type of energy collector, e.g., for astronomical spectroscopy, which aims to gather as much light as possible to detect the spectra of dim objects in space. Since the intensity of light collected by a telescope increases with the square of the aperture size of the telescope, telescopes with large apertures can detect dimmer objects in space compared to telescopes with smaller apertures. This is one important reason that the apertures of astronomical telescopes are being designed to be increasingly larger.

Another example of an energy-collecting system is the optical probe shown in Fig. 1.5. This optical probe is designed to measure the lifetime of fluorescence or phosphorescence emitted from a moving target. The purpose of this system is not imaging but collecting light from a moving fluorescent source over a relatively wide field of view, without the loss of intensity caused by a conventional probe with a small field of view.⁵ To achieve this, an optical conjugate relationship (to be explained in Chapter 2) exists between the first surface of the first lens of the probe and the first surface of the fiber bundle, as shown in Fig. 1.5. This conjugate relationship makes the system very useful for measuring the intensity of light emitted from a moving object. The normalized energy transfer of the optical probe as a function of the incident angle of light (the normalized energy-transfer function) is shown in Fig. 1.6. When the incident angle changes in the range of –6 deg to +6 deg, the energy-transfer function of the optical probe is close to 1.0 and substantially flat for practical applications. This property of the probe ensures that the measurement of the lifetime of fluorescence from a moving target is almost unaffected by the small field of view of a conventional probe. It should also be noted that for this system the acceptance angle of the fiber bundle determines the field of view of the system.

The illumination system is a very important type of optical system that is used in almost every corner of our world. In our daily lives, both street lamps on the sides of roads and fluorescent lights in buildings and homes are illumination systems that give us a bright world. In addition to normally used lights, illumination systems can also play an important role in industry and academic research. For example, by applying a structured illumination system in a microscope, observers can distinguish objects in great detail, or even achieve subdiffraction-limited resolution.⁶
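As a quick check of the statement above that the collected light increases with the square of the aperture size, the minimal sketch below compares two hypothetical telescope apertures; the 8-m and 2-m diameters are arbitrary illustrative values.

    def light_gathering_ratio(d1, d2):
        """Ratio of light collected by two circular apertures of diameters d1 and d2."""
        return (d1 / d2) ** 2

    # an 8-m telescope collects 16 times more light than a 2-m telescope
    print(light_gathering_ratio(8.0, 2.0))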

Figure 1.5 Diagram of an optical probe with a photomultiplier tube (PMT) as a detector.

Figure 1.6 Normalized energy-transfer function of the optical probe.

References

1. F. A. Jenkins and H. E. White, Fundamentals of Optics, Fourth Edition, McGraw-Hill Education, New York (1976).
2. R. P. Feynman, R. B. Leighton, and M. Sands, The Feynman Lectures on Physics, Volume III, Commemorative Issue Edition, Addison-Wesley, Boston (1989).
3. J. B. Keller, "Geometrical theory of diffraction," Journal of the Optical Society of America 52(2), 116–130 (1962).
4. Y. G. Soskind, Field Guide to Diffractive Optics, SPIE Press, Bellingham, Washington (2011) [doi: 10.1117/3.895041].
5. J. Feist and S. Zhang, "Optical Probe and Apparatus," UK Patent 2484482 (2012).
6. L. Schermelleh, P. M. Carlton, S. Haase, L. Shao, L. Winoto, P. Kner, B. Burke, M. C. Cardoso, D. A. Agard, and M. G. Gustafsson, "Subdiffraction multicolor imaging of the nuclear periphery with 3D structured illumination microscopy," Science 320(5881), 1332–1336 (2008).

Chapter 2

Geometrical Optics

Geometrical optics, an old branch of optics, formulates optical laws in the language of geometry under the approximation that the scale of the wavelength of light, compared with that of the optical system on which light is incident, is close to zero. In the regime of geometrical optics, a fundamental concept is the optical ray. The feasibility of the optical rays introduced in geometrical optics can be explained by quantum theory. According to the uncertainty principle in quantum theory, we cannot simultaneously measure the position (x, representing a position in the vertical direction) and the momentum (pₓ, representing a momentum in the vertical direction) of a photon with arbitrarily high precision. The uncertainties of its position and momentum are in compliance with the uncertainty inequality,

Δx Δpₓ ≥ h/(4π),

where h is Planck's constant, Δx is the uncertainty of the position of the photon, and Δpₓ is the uncertainty of the momentum of the photon.

As shown in Fig. 2.1, photons are emitted from a source at an infinite distance, so all of these photons have a momentum of p = h/λ (λ is the wavelength of the photons) only in the horizontal direction before passing through the slit or an optical element. After passing through the slit, the photons spread out at an angle of Δu, as shown in Fig. 2.1(a). So the uncertainty of the momentum in the vertical direction can be calculated as Δpₓ = pΔu = (h/λ)Δu, as expressed in Ref. 1. On substituting this expression into the uncertainty inequality above, the angle of spread satisfies Δu ≥ λ/(4πΔx). Here, Δx is on the order of magnitude of the width of the slit. When λ/Δx → 0, as shown in Fig. 2.1(b), the angle of spread is close to zero. This means that if the size of the slit (or apertures, lenses, etc.) shown in Fig. 2.1 is much larger than the scale of the wavelength, photons will not spread after passing through the optical element; i.e., the propagation paths of photons can be approximated by straight lines. Therefore, it can be concluded that rays traveling in straight lines (in geometrical optics) is an inevitable result of quantum theory when the size of the slit (or apertures, lenses, etc.) is much larger than the scale of the wavelength.


Figure 2.1 Illustration of photons traveling through slits with different scales, where the angle of photon spread is (a) Δu and (b) close to 0.
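The inequality Δu ≥ λ/(4πΔx) can be evaluated numerically. The sketch below, which assumes a 550-nm wavelength and a few illustrative aperture widths, shows that the minimum spread angle becomes negligible once the aperture is much larger than the wavelength, which is why straight-line rays are a good approximation.

    import math

    wavelength = 550e-9                      # assumed visible wavelength (m)

    for slit_width in (1e-6, 1e-3, 25e-3):   # 1-um, 1-mm, and 25-mm apertures
        # minimum spread angle from Delta_u >= lambda / (4*pi*Delta_x)
        min_spread = wavelength / (4 * math.pi * slit_width)
        print(f"aperture {slit_width:.0e} m -> minimum spread {min_spread:.1e} rad")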

Although geometrical optics is simple, it can be used to explain many optical phenomena, such as the propagation, refraction, and reflection of light. Furthermore, geometrical optics is the essential theory first employed for understanding and designing an optical system. For example, almost all of the initial configurations of optical imaging systems are designed by the laws of geometrical optics.

For the sake of convenience, some common concepts in geometrical optics are presented here. In geometrical optics, light emitted from sources is usually abstracted as geometrical lines, each line carrying an amount of energy. These lines are called light rays, and their directions denote the propagation directions of light. As introduced in Chapter 1, light is a kind of electromagnetic wave. In this wave regime for light, a wavefront is the surface on which all points that light rays pass through have the same phase. Rays and their wavefronts are always locally perpendicular to each other. Therefore, the propagation of light is equivalent to the propagation of its wavefront. According to the surface types of wavefronts, there are plane waves, spherical waves, cylindrical waves, and others. The beam corresponding to a plane wave is called a parallel beam, while that corresponding to a spherical wave is called a concentric beam.

Figure 2.2 shows the wavefronts of a light beam emitted from a point source in vacuum.


Figure 2.2 Rays emitted from a point source and their wavefronts.

Obviously, the shape of the wavefront is spherical at any time, and the curvature of the wavefront (the reciprocal of its radius) decreases along the direction of propagation of the light. When the wavefront is sufficiently far from the point source, the radius of curvature of the wavefront can be regarded as infinite, and a small portion of the wavefront can be approximated as a plane; this small portion of the beam can be considered a parallel beam.

This chapter is mainly about the basic knowledge of geometrical optics required for the analysis and design of an optical system. First, the definition of the refractive index and its origin are given and explained. Then, the laws of refraction and reflection are presented. Next, perfect optical imaging systems and geometrical aberrations are illustrated. Finally, a brief description of a general procedure for designing an optical imaging system and a concrete design example for an achromat are given.

2.1 Definition of the Index of Refraction

The index of refraction n (or absolute index of refraction), a very important parameter characterizing the optical properties of a medium, is defined as the ratio of the speed of light in vacuum to that in a given medium:

n = c/v,    (2.1)

where c, about 3 × 10⁸ m/s, denotes the propagation speed of light in vacuum, and v is the propagation speed of light in the medium. As the speed v in a medium is smaller than the speed c in vacuum, the index of refraction of the medium is larger than 1.0. In general, the index of refraction of air is about 1.000293 (at 0 °C and 1 atmospheric pressure for yellow light). Since most optical systems work in air, the index of refraction of a medium is often expressed as a relative refractive index, which is the ratio of the speed of light in air to that in the medium.


Thus, the relative refractive index of air is 1. Generally, using the relative refractive index is more convenient than using the absolute index of refraction.

When light travels in a medium, the optical path length (OPL), or optical path, is used to characterize the time required for light to travel between two points. The OPL is defined as the product of the geometrical length of the pathway that a light ray travels in a medium and the corresponding index of refraction of the medium through which the light ray propagates. As shown in Fig. 2.3, the OPL between point A and point B in a medium can be expressed as

OPL = ∫_A^B n(s) ds,

where n(s) is the index of refraction of the medium on the line element ds. In a homogeneous medium, the OPL can be simplified as OPL = nd, where n is the index of refraction of the medium, and d is the geometrical path length that the light ray propagates. The difference between OPLs is called the optical path difference (OPD), which will be discussed in detail in Chapter 3.

It must be noted that the above definition of the index of refraction is phenomenological. It is an empirical formula and cannot explain why the speed of light in a medium is slower than that in vacuum. Next, we will try to explain the origin of the index of refraction and to answer the question of why the propagation speed of light is slow in a medium.
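A minimal sketch of Eq. (2.1) and of the optical path length in a homogeneous medium; the glass index of 1.5 and the 10-mm thickness are illustrative assumptions.

    c = 2.998e8          # speed of light in vacuum (m/s)

    n_glass = 1.5        # assumed refractive index of a glass plate
    d = 10e-3            # assumed geometrical path length in the glass (m)

    v = c / n_glass      # Eq. (2.1): speed of light inside the medium
    opl = n_glass * d    # optical path length in the glass, OPL = n*d

    opl_air = 1.000293 * d     # the same geometrical path in air
    opd = opl - opl_air        # optical path difference between the two paths

    print(f"speed in glass: {v:.3e} m/s")
    print(f"OPL in glass:   {opl * 1e3:.3f} mm")
    print(f"OPD vs. air:    {opd * 1e3:.4f} mm")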

2.2 Origin of the Index of Refraction

Here, we explain why the speed of light in a medium is slow. The explanation is based on the interaction between light and matter, which is the origin of the index of refraction. Atoms are basic constituent units of matter, and each atom is composed of a nucleus and one or more electrons around the nucleus.

Figure 2.3 Schematic of the optical path length from A to B in a medium.

Figure 2.4 Diagram of the interaction between light and matter.

As shown in Fig. 2.4, when light enters a medium, atoms (concentric circles) in the medium will be polarized and become dipoles (ellipses with positive and negative signs) due to the interaction between the electric field of the light wave and the atoms. Harmonic oscillations of a dipole will radiate a new light wave according to the classical electromagnetic radiation theory briefly presented in Chapter 1. The new light wave can be considered as the sum of two waves, one of which completely cancels out the incident light wave. Meanwhile, the other light wave successively polarizes other atoms. Because the polarization of the medium by light waves takes a small amount of time, the speed of the propagation of light waves is slowed down, and the index of refraction characterizes the retardation influence of a medium on the propagation speed of light. A detailed discussion of this process can be found in Ref. 2. Because the speed of light is slowed down by a medium, in some sense, the index of refraction can be thought of as a "medium delay parameter" for the propagation speed of the electromagnetic field in this medium.

2.3 Reflection and Refraction of Light

In this section, first we define a unified convention of algebraic signs for distances and angles; then we describe the behavior of optical rays when they intersect the interface separating two homogeneous media with different refractive indices.

2.3.1 Sign conventions

It is necessary to adopt a unified convention of algebraic signs for distances and angles. The following conventions are not unique, and many optical workers have their own conventions, but the same convention should be


followed consistently for any question regarding optical imaging systems. In this book, the following sign conventions are adopted:
1. Segments perpendicular to the optical axis are positive above the optical axis, and negative below the axis;
2. Segments parallel to the optical axis are positive from left to right, and negative from right to left;
3. Distances from the vertex of the front surface to that of the rear surface are positive from left to right, and negative from right to left;
4. Rotating the optical axis by an acute angle to the ray, the angle between the ray and the optical axis is positive for counterclockwise rotation, and negative for clockwise rotation;
5. Rotating the normal by an acute angle to the ray, the angle between the ray and the normal is positive for counterclockwise rotation, and negative for clockwise rotation.
After defining the unified conventions of algebraic signs for distances and angles in geometrical optics, we can study an optical imaging system based on these conventions.

2.3.2 Laws of reflection and refraction

Suppose that a given surface is flat and that the refractive index n1 of medium 1 is smaller than the refractive index n2 of medium 2. As shown in Fig. 2.5, a plane wave travels into medium 2 from medium 1 at an oblique angle. Rays in medium 1, which are perpendicular to the incident wavefront, are called incident rays. The angle between the incident ray and the normal of the interface is called the incident angle, I1 in Fig. 2.5.

Figure 2.5 Diagram of a plane wave passing through the interface between two different media.


The plane specified by the incident ray and the normal of the interface is called the plane of incidence. Rays in medium 2, which are perpendicular to the outgoing wavefront, are called refracted rays. The angle between the refracted ray and the normal of the interface is called the refracted angle, I2 in Fig. 2.5. Because both media are homogeneous, the configuration is symmetric about the plane of incidence. From this symmetry we can infer that the refracted rays must lie in the plane of incidence; in other words, the incident, refracted, and reflected rays and the normal are coplanar. Suppose that at time t0 the incident wavefront reaches the interface at point A, and at time t1 = t0 + Δt it reaches the interface at C. During the time interval Δt, the outgoing wavefront propagates from AB to DC. According to the definition of the refractive index, the speed of light in medium 1 is v1 = c/n1, while that in medium 2 is v2 = c/n2. Thus, the speed of light in medium 1 is n2/n1 times that in medium 2, and the distance d1 that the wavefront travels during Δt in medium 1 is n2/n1 times the distance d2 traveled in medium 2; that is, d1/d2 = n2/n1. Since rays are perpendicular to the wavefront and the normal is perpendicular to the interface between the two media, angle CAB = I1 and angle ACD = I2. From trigonometry, d1 = AC · sin I1 and d2 = AC · sin I2. Substituting these two relations into d1/d2 = n2/n1, it is easy to obtain

n1 sin I1 = n2 sin I2.     (2.2)

This relationship, together with the statement that the refracted ray lies in the plane of incidence, is the law of refraction (or Snell's law). Although the law of refraction is deduced here for a plane interface between two media, it is fully applicable to any surface shape, not just a plane. Furthermore, when a ray is incident on the interface, not only a refracted ray but also a reflected ray emerges; the refracted ray enters the second medium, and the reflected ray propagates back into the first medium. Using an approach similar to the derivation of the law of refraction, it can be shown that the reflected ray and the incident ray are located on different sides of the normal of the interface, and that the reflected angle, which is the angle between the reflected ray and the normal of the interface, equals the incident angle in magnitude but has the opposite sign according to the sign conventions presented above. Therefore, the mathematical form of the law of reflection is

I1 = −I3,     (2.3)

where I1 is the incident angle, and I3 is the reflected angle, as shown in Fig. 2.6.
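To make Eqs. (2.2) and (2.3) concrete, the short Python sketch below applies them numerically; the refractive indices and the 30-deg incident angle are arbitrary illustrative values, not data from the text.

```python
import math

def refract_angle(n1, n2, incident_deg):
    """Return the refracted angle (deg) from Snell's law, Eq. (2.2)."""
    s = n1 * math.sin(math.radians(incident_deg)) / n2
    if abs(s) > 1.0:            # no real solution: total internal reflection
        return None
    return math.degrees(math.asin(s))

def reflect_angle(incident_deg):
    """Law of reflection, Eq. (2.3): equal magnitude, opposite sign."""
    return -incident_deg

# Example: a ray in air (n = 1.0) entering glass (n = 1.5) at 30 deg
I1 = 30.0
print(refract_angle(1.0, 1.5, I1))   # about 19.47 deg
print(reflect_angle(I1))             # -30.0 deg
```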


Figure 2.6 Relationships between the incident, reflected, and refracted rays.

Figure 2.6 represents the relationship between the incident, refracted, and reflected rays when a ray is incident on the interface of two media. It should be emphasized that the incident, refracted, and reflected rays and the normal lie in the same plane, i.e., the plane of incidence. The laws of refraction and reflection can be summarized as follows:
1. The incident, refracted, and reflected rays and the normal are in the same plane.
2. The sines of the incident and refracted angles and the indices of refraction of the two media satisfy Eq. (2.2).
3. The reflected and incident rays are located on different sides of the normal, and the reflected angle equals the incident angle but with opposite sign.
In addition to the laws of refraction and reflection, there is another law in geometrical optics: the rectilinear propagation of light rays in a homogeneous medium. Many optical phenomena can be explained by the laws of refraction and reflection. One popular example of the refraction of light is that a fish in water appears to be at a shallower depth than where it actually is. Because the index of refraction of water is larger than that of air, according to Eq. (2.2) the refracted angle is larger than the incident angle for rays propagating from water into air. As shown in Fig. 2.7(a), a light ray from the fish leaves the water at an angle to the normal of the water–air interface that is larger than the angle in the water, so the apparent position of the fish is noticeably shallower than its actual position.


Figure 2.7 Examples of (a) refraction and (b) reflection.

This phenomenon is caused by the refraction of light between water and air, and it explains why, for example, someone who is spearfishing needs to aim the spear below the location where the fish appears to be. An example of reflection is specular reflection, shown in Fig. 2.7(b). This diagram depicts the imaging of an object by a flat mirror: the object appears to be located behind the mirror due to the reflection of light.

2.3.3 Total internal reflection

When a ray is incident on the interface of two media, it usually splits into a refracted ray and a reflected ray. However, under certain conditions the refracted ray disappears. This phenomenon is called total internal reflection. Here, the criteria for total internal reflection are presented. For two media, the medium with the larger refractive index is called the higher-index medium, while the medium with the smaller refractive index is called the lower-index medium. When a ray travels from a higher-index medium into a lower-index medium, the refracted angle is larger than the incident angle according to Snell's law, and the refracted angle increases as the incident angle increases. When the incident angle reaches a critical value, the refracted angle becomes 90 deg, and the light is totally reflected back into the first medium. According to Snell's law, the critical angle Im satisfies

sin Im = n2/n1.

When the incident angle is larger than this critical angle, the refracted ray disappears, and all of the light is reflected back into the first medium. Total internal reflection is widely exploited in optical instruments. For example, optical fibers, which will be presented in Chapter 4, work on the principle of total internal reflection.
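As a quick numerical illustration of the critical-angle condition above, the following sketch evaluates sin Im = n2/n1 for a glass–air boundary; the index values are illustrative, not taken from the text.

```python
import math

def critical_angle_deg(n_high, n_low):
    """Critical angle for a ray going from a higher-index medium
    into a lower-index one; returns None if no TIR is possible."""
    if n_high <= n_low:
        return None
    return math.degrees(math.asin(n_low / n_high))

# Example: glass (n = 1.5) to air (n = 1.0)
print(critical_angle_deg(1.5, 1.0))      # about 41.8 deg
```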


In summary, the criteria for total internal reflection are as follows:
1. The ray must travel from a higher-index medium into a lower-index medium.
2. The incident angle must be larger than the critical angle.

2.3.4 Dispersion of light

The refractive index of any optical medium is different for light of different wavelengths. In general, the refractive index of a medium for light with a shorter wavelength, e.g., violet light, is larger than that for light with a longer wavelength, e.g., red light. This behavior is called normal dispersion; the opposite situation is called anomalous dispersion. Figure 2.8 shows an example of dispersion by a triangular prism. When white light is incident on one surface of the prism, the light passing through the prism disperses into different colors with different refracted angles. As shown in the figure, red light, having a smaller index of refraction than orange light, bends less than orange light, which in turn bends less than yellow light, and so on, creating a rainbow. In general, the refractive index of an optical material is measured and reported at specific wavelengths of elemental spectral lines. For visible applications, the hydrogen F line (486.1 nm), the helium d line (587.6 nm), and the hydrogen C line (656.3 nm) are usually used. The refractivity of an optical material is defined as the refractive index at the d line minus unity, or nd − 1. The principal dispersion of an optical material is defined as the difference between the refractive indices at the F line and the C line, or nF − nC. The dispersion is commonly specified by another value, the reciprocal relative dispersion, also called the Abbe number V, which is defined as the ratio of the refractivity to the principal dispersion:

V = (nd − 1) / (nF − nC).

Figure 2.8 Diagram of a triangular prism dispersing white light into a rainbow.


The Abbe number characterizes the dispersion of a material, and its typical values range from 25 to 65. An optical material with a low Abbe number has high dispersion. The use of optical glasses with different Abbe numbers can correct chromatic aberrations, as will be presented in Sections 2.6 and 2.7. The relative partial dispersion ratio is defined as

Pd,C = (nd − nC) / (nF − nC).

The relative partial dispersion ratio is used when correcting the secondary spectrum in an optical system (which will be discussed in Section 2.6).
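The two dispersion measures just defined are easy to evaluate once the three catalog indices are known. The sketch below does so for index values typical of a borosilicate crown glass; the numbers are illustrative assumptions, not catalog data quoted by the text.

```python
def abbe_number(nd, nF, nC):
    """Reciprocal relative dispersion V = (nd - 1)/(nF - nC)."""
    return (nd - 1.0) / (nF - nC)

def partial_dispersion(nd, nF, nC):
    """Relative partial dispersion ratio P_dC = (nd - nC)/(nF - nC)."""
    return (nd - nC) / (nF - nC)

# Illustrative crown-glass indices at the F, d, and C lines
nF, nd, nC = 1.52238, 1.51680, 1.51432
print(abbe_number(nd, nF, nC))        # roughly 64 (low dispersion)
print(partial_dispersion(nd, nF, nC)) # roughly 0.31
```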

2.4 Perfect Optical Imaging Systems

In previous sections, the basic laws of geometrical optics have been demonstrated. Now we turn to a discussion of optical systems. Since most optical systems are optical imaging systems, it is essential to introduce some basic knowledge of optical imaging systems, which are the main class of optical systems discussed in this book.

2.4.1 Imaging concept

Before introducing the imaging concept, it should be noted that there are two optical spaces (object space and image space) for each optical system. The object is in object space, and the image is in image space. Each optical space has real and virtual segments, and extends to infinity; thus, the two spaces completely overlap. The concept of imaging can be roughly defined as the point-to-point mapping between an object in object space and its image in image space. This concept of imaging is based on the basic assumption that any object consists of a collection of independent point sources. Each point source in object space emits a divergent spherical beam into the optical system, which turns each divergent beam into a convergent beam concentrated to a small point in image space; this point is considered the image of the corresponding object point. The collection of such "point images" is called the image of the whole object. The object and its image in this relationship are conjugate to each other. For a point object P, if all rays coming from point P in object space converge to a corresponding point P′ in image space after passing through an optical system, and the coordinates of P′ are proportional to those of P, then P′ is called a perfect image of P. The optical system is referred to as a perfect imaging system that maps points to points, lines to lines, and planes to planes.3 In general, an optical system is composed of a series of refractive or reflective surfaces whose shapes can be flat, spherical, or aspherical. However, even if all optical components in an optical system are ideal, it is


still impossible for this optical system to produce a perfect image. One of the main reasons for this is that the relationship between the refracted angle and the incident angle is nonlinear, as expressed by the law of refraction in Eq. (2.2). This nonlinearity becomes more severe as the incident angle becomes larger. Therefore, actual optical systems cannot produce perfect images. According to the small-angle approximation, the sine of a small angle can be approximated by the angle itself in radians, which is the first term of the Taylor expansion of the sine function. In this case, the law of refraction becomes

n1 I1 = n2 I2.     (2.4)

Thus, the nonlinear relationship between the incident and refracted angles becomes a linear one. If refraction at each surface of an optical system satisfies Eq. (2.4), this optical system can be considered a perfect imaging system, or an optical system without aberrations. The image of an object formed by a perfect imaging system is completely similar to the object and is identical to the object if their sizes are the same. In practice, small incident and refracted angles occur only when rays lie close to the optical axis—a line that passes through the centers of curvature of all surfaces of a rotationally symmetric optical system—throughout the optical system. The resulting theory studies the imaging laws of optical systems in the paraxial region using Eq. (2.4) and is called paraxial optics. Paraxial optics can also be extended to the entire space of an optical system instead of being limited to the vicinity of the optical axis; this approach for describing the behavior of light rays in an optical system is called Gaussian optics. Paraxial optics and Gaussian optics are also called first-order, or primary, optics. It should be noted that any actual optical system with a certain aperture size and field of view (these concepts will be explained in subsection 2.4.3) cannot satisfy the laws of Gaussian optics and consequently cannot produce a perfect image. However, it has been proven that the imaging properties of a well-corrected optical system can comply nearly with the laws of Gaussian optics over a certain field of view. Furthermore, the position and size of the image obtained by Gaussian optics serve as a very convenient reference for measuring the deviation of the actual image from the perfect image, i.e., the reference for the aberrations of an optical system (aberrations will be presented in Section 2.6). Thus, it is essential to study optical systems using Gaussian optics as a starting point.

2.4.2 Cardinal points and planes in imaging systems

Mathematician Carl Friedrich Gauss discovered that a well-corrected optical system can be treated as a "black box" whose characteristics are determined


by its cardinal points and planes, including the focal points and planes, the principal points and planes, and the nodal points.4 The focal point of an optical system is defined as a common point on the system’s optical axis to which all optical rays parallel to its optical axis converge. As shown in Fig. 2.9, rays that are parallel to the optical axis of an optical system pass through this optical system from object space to image space and intersect the optical axis at point F 0 , and n (n 0 ) denotes the refractive index of the object space (image space). Point F 0 is called the second or rear focal point of the optical system. The plane that is perpendicular to the optical axis and passing through point F 0 is called the second or rear focal plane. Similarly, point F, to which rays parallel to the optical axis from image space converge, is called the first or front focal point. The plane that is perpendicular to the optical axis and passing through point F is called the first or front focal plane. It should be noted that paths of light rays are completely reversible. This means that rays passing through F (F 0 ) in object space (image space) will exit the optical system parallel to its optical axis. As shown in Fig. 2.9, if we extend a ray parallel to the optical axis from the object space in the forward direction and the corresponding convergent ray in the backward direction as well, these two extended lines will meet at point H 0 . The plane that passes through point H 0 and is perpendicular to the optical axis is called the second or rear principal plane, which crosses the optical axis at point P 0 , namely, the second or rear principal point. In a similar manner, we can define the first or front principal point, and the first or front principal plane. With the definitions of the principal and focal points, the first and second focal lengths of an optical system can be given. The distance from the second principal point to the second focal point is the second or rear focal length, denoted as f 0 in Fig. 2.9. Similarly, the distance from the first principal point to the first focal point is the first or front focal length, denoted as f, which is negative according to the sign conventions. The back focal distance/length (bfd/bfl) is the

Figure 2.9 Cardinal points and planes of an optical system.


distance from the vertex of the last surface of the optical system to the second focal point. Similarly, the front focal distance (ffd) is the distance from the vertex of the first surface of the optical system to the first focal point. In general, if the refractive indices of the media in object space and image space are not equal, the absolute values of the first and second focal lengths are different, and their relationship can be derived following the method used by W. T. Welford.3 For the optical system shown in Fig. 2.10, ray r1, coming from its first focal point F, meets its second focal plane at Q′ when it passes through the optical system. If a point source is put at the first focal point F, the beam passing through this optical system will be parallel to the system's optical axis according to the definition of the focal point. Therefore, line segments P′H′ and F′Q′ can be considered as wavefronts of the beam. According to the definition of the OPL, light rays from the same source to the same wavefront should have the same OPLs. This means that [FHH′Q′] = [FPP′F′], where the square brackets denote the OPL. The OPL equality can be written explicitly as

n·FH + [HH′] + n′f′ = −nf + [PP′] + n′f′,

where n and n′ denote the refractive indices of object space and image space, respectively. If the distance PH is denoted by h, then FH = √(h² + f²). Generally speaking, since −f ≫ h,

FH = √(h² + f²) = −f √(1 + h²/f²) ≈ −f − h²/(2f).

Furthermore, using this approximation for FH, we obtain

−nh²/(2f) = [PP′] − [HH′].

Figure 2.10 Relationship between first and second focal lengths.

Similarly, by considering another ray r2 that passes through the second focal point F′ and exits the optical system parallel to its optical axis, we have another expression,

n′h²/(2f′) = [PP′] − [HH′].

Comparing the last two equations for [PP′] − [HH′], we obtain the relationship

n′/f′ = −n/f.     (2.5)

Usually, the optical power of an optical system describes the capability of that system to bend rays and is defined as

φ = n′/f′ = −n/f.     (2.6)

If the meter is used as the unit of focal length, the unit of optical power is the diopter (m⁻¹). The reciprocal of the optical power, which can be seen as the second (first) focal length reduced by the refractive index of image space (object space), is defined as the effective (or equivalent) focal length (EFL), given by

fe = 1/φ = f′/n′ = −f/n.     (2.7)

The relationship in Eq. (2.7) also means that the EFL is always referenced to an index of refraction of 1.0, even if the image or object space index is not unity. In practice, the refractive index of air can be approximately thought of as 1. It can be seen that if both sides of an optical system are in air, the second and the first focal lengths are equal in magnitude and opposite in sign, and the EFL equals the second focal length. The relationships between principal points and planes are shown in Fig. 2.11. A ray that is parallel to the optical axis of an optical system and crosses the first principal plane at point H will intersect the second principal plane at point H 0 and reach the second focal point. Note that the heights of PH and P 0 H 0 must be the same according to the definition of principal planes. Conversely, a ray that passes through the first focal point F and point H will pass through point H 0 , and the outgoing ray will be parallel to the optical axis. Therefore, points H and H 0 are a pair of conjugate points. Similarly, the first and the second principal planes are a pair of conjugate planes. The transverse magnification (this concept will be explained in subsection 2.4.4) of the pair of the principal planes is 1. Nodal points are two axial points. As shown in Fig. 2.12, if the incident ray enters an optical system through one of the two nodal points, the outgoing


Figure 2.11 Relationships between principal points and principal planes.

Figure 2.12 Illustration of the nodal points of an optical system.

ray will exit from the other nodal point of the optical system and be parallel to the incident ray. When the media on both sides of the optical system are the same (e.g., air), the first and second nodal points coincide with the first and second principal points, respectively. Figure 2.13 is a schematic diagram of the positions of the focal and principal points of some typical spherical lenses. As shown in the figure, the second focal point of a positive lens is on the right side of the lens, and the first focal point is on the left side of the lens. However, the second focal point of a negative lens is on the left side of the lens, and its first focal point is on the right side of the lens. Additionally, the positions of the principal points of each lens type vary with the curvatures of the two surfaces of the lens. The rule for determining the positions of the principal points for different types of lenses can be summarized as follows. If the curvatures of the front surface and the rear surface of a lens are the same, the principal points are inside the lens and


Figure 2.13 Positions of the focal and principal points of different types of single lenses.

symmetrical about the center of the lens. If the curvatures of the two surfaces of a lens are different, it is an asymmetric lens, which can be considered as a lens obtained by bending a symmetric lens. As the lens is bent, the principal points of the asymmetric lens move toward the surface whose absolute value of curvature becomes larger during the bending. When the curvature of one surface of the lens becomes zero, one of its principal points lies exactly on the other surface of the lens. If the lens is bent further, the principal points keep moving away from the lens.

2.4.3 Stops and pupils in imaging systems

An optical imaging system typically has several stops (or apertures), which limit the amount of light passing through the optical system. Stops can be the rims of lenses or mirrors, or diaphragms placed in the path of the light rays. One of these stops limits the diameter of the beam of light accepted by the optical system from an on-axis point on an object. This stop is called the aperture stop, and its size determines the illumination of the image. Therefore, the aperture stop plays a fundamental role in an optical imaging system. The aperture stop of an optical system is not necessarily immediately evident, because the optical system is usually composed of many lenses and other optical elements, and its aperture stop can be located at any position in the optical system, depending on the system's configuration. To find the aperture stop of an optical system, we can trace a pencil of


rays from an axial point on an object through every element of the optical system to find the aperture that confines the diameter of the pencil of rays reaching the image plane. This aperture is the aperture stop. The entrance and exit pupils are also introduced to conveniently describe an optical system. The image of the aperture stop as seen through all optical elements in front of it is called the entrance pupil, and the image of the aperture stop as seen through all optical elements behind it is called the exit pupil. The entrance pupil and the exit pupil are conjugate to each other, as they are the image of the aperture stop in object space and image space, respectively. According to the definition of a pupil, only rays passing a pupil can reach the image plane and be involved in imaging. Therefore, the radiation received by an optical system depends on the sizes and locations of pupils. The positions of the entrance pupil and the exit pupil can be simply determined by the initial and final intersections of the chief ray—the oblique ray from the maximum field angle and through the center of the aperture stop—with the optical axis. In current optical imaging systems, the locations and sizes of pupils and stops can be easily determined by automatic raytracing software. Optical imaging systems have an extra stop that limits the size or angular extent of the image. This stop is called the field stop. Similar to the relationship between the aperture stop and the entrance (exit) pupil, the image of the field stop as seen through all optical elements in front of the field stop is called the entrance window, and the image of the field stop as seen through all optical elements behind the field stop is called the exit window. In addition, the size of the field stop determines the angular field of view in object (or image) space. The angular field of view in object (or image) space is defined as the angle subtended by the entrance (or exit) window to the center of the entrance (or exit) pupil. Next, the example of a simple landscape lens is used to show stops and pupils in an optical imaging system. Figure 2.14 is a diagram of a simple landscape lens consisting of three parts: the objective lens L, the aperture A, and the photographic film B. The objective lens L can form the image of the object onto the photographic film. There are two stops in this system. One is the rim of lens L, and the other is aperture A. To find the aperture stop of the system, a pencil of rays parallel to the optical axis is traced, as shown in Fig. 2.14. Then it can be determined that the size of aperture A defines the diameter of the beam reaching the film. Therefore, aperture A is the aperture stop in this imaging system. In this figure, A 0 is the image of aperture A seen through the lens L. So A 0 is the entrance pupil of this imaging system. Since there is no lens after aperture A in this imaging system, the exit pupil of this imaging system is aperture A, itself. In addition, the size of the photographic film determines the size of the image. Thus, the edge of the photographic film is the field stop of this system. According to the definitions of the entrance and


Figure 2.14 Stops and pupils of a simple landscape lens.

exit windows, the entrance window of the system is at infinity, and the exit window is on the image plane. As presented above, the aperture of an optical system is of fundamental importance. For example, it determines the amount of light collected from an object and limits the resolution of the optical system (the resolution of an optical system is defined in Chapter 3). There are three ways to specify the aperture of an optical system: the entrance pupil diameter (EPD), the F-number (F/#), and the numerical aperture (NA). A circular aperture is commonly used, so the three specifications presented here are all based on a circular aperture.
• EPD is defined as the diameter of the beam on the plane of the entrance pupil in object space.
• F/#, or the relative aperture, is the ratio of the EFL of an optical system to its EPD for an object at infinity [as shown in Fig. 2.15(a)]:

F/# = fe / EPD,

where fe is the EFL.
• NA is specified in object space and image space, respectively, as follows:

NAobject = n sin u,
NAimage = n′ sin u′,

where n and n′ are the refractive indices of the media in object space and image space, respectively; u is the half-angle subtended by the entrance pupil at the on-axis object point; and u′ is the half-angle subtended by the exit pupil at the on-axis image point. All of these quantities are shown in Fig. 2.15(b).
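The three aperture specifications are related through simple trigonometry. The sketch below computes F/# and an image-space NA for a hypothetical lens in air; the 100-mm focal length and 25-mm pupil are made-up values for illustration only.

```python
import math

def f_number(efl_mm, epd_mm):
    """F/# = EFL / EPD (object at infinity)."""
    return efl_mm / epd_mm

def numerical_aperture(n, half_angle_deg):
    """NA = n sin(u) for the given half-angle."""
    return n * math.sin(math.radians(half_angle_deg))

efl, epd = 100.0, 25.0                   # assumed lens in air
print(f_number(efl, epd))                # F/4
# Image-space half-angle of the marginal ray, tan(u') = (EPD/2)/EFL
u_img = math.degrees(math.atan(0.5 * epd / efl))
print(numerical_aperture(1.0, u_img))    # about 0.124, close to 1/(2 F/#)
```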


Figure 2.15 Schematics showing definitions of (a) EPD and EFL, from which the F-number can be derived, and (b) numerical aperture.

Generally, the EPD is most commonly used for specifying the aperture of an optical system, especially in professional optical design software. F/# is typically employed to specify an optical system with a distant object, such as an astronomical telescope. Finally, NA is used to specify a finite-conjugate system, e.g., a microscope objective (to be discussed in Chapter 5).

2.4.4 Some useful formulas

In this subsection, some useful formulas based on the sign conventions presented in Section 2.3 are introduced to express the inherent properties of an optical imaging system. Figure 2.16 is a diagram showing image formation through a perfect, positive lens, where n and n′ are the refractive indices of the media in object and image space, respectively, and h and h′ are the heights of the object and image, respectively. The first and second focal lengths are labeled f and f′, respectively. The object and image distances, measured from the first and second principal planes to the corresponding object and image planes, are l and l′, and the distances from the first and second focal points to the corresponding object and image planes are x and x′. The primed symbols

Figure 2.16 Image formation through a perfect, positive lens.


refer to quantities associated with the image, and unprimed symbols refer to those associated with the object. According to the sign conventions, h, f′, x′, and l′ are positive, and h′, f, x, and l are negative.

2.4.4.1 Object–image relationship

The relationship between the object and image distances can be determined from the geometrical relationships shown in Fig. 2.16. From two pairs of similar triangles (ABF and PQF; A′B′F′ and P′H′F′) and the relations h = P′H′ and −h′ = PQ shown in Fig. 2.16, it is easy to obtain

h/h′ = −x/f  and  h/h′ = −f′/x′;

thus, the following equation can be obtained:

xx′ = ff′.     (2.8)

This is the Newtonian form of the imaging equation. Using Eq. (2.7), Eq. (2.8) can be written in another form as

xx′ = −nn′fe².

The imaging equation has another form in terms of the object and image distances. Substituting x = l − f and x′ = l′ − f′ into Eq. (2.8), we have

(l − f)(l′ − f′) = ff′.

Expanding the left side, canceling the ff′ term, and dividing both sides by ll′, we obtain

f′/l′ + f/l = 1.     (2.9)

This is the Gaussian form of the imaging equation. Using Eq. (2.7), another form of Eq. (2.9) can be written in terms of the EFL:

n′/l′ − n/l = 1/fe = φ.

If the optical system is in air, the Gaussian equation becomes

1/l′ − 1/l = 1/fe.
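As a numerical check of the in-air Gaussian equation above, the following sketch solves 1/l′ − 1/l = 1/fe for the image distance, using the sign conventions of Section 2.3 (object distance negative to the left). The 100-mm focal length and 300-mm object distance are arbitrary illustrative numbers.

```python
def image_distance(l_obj, f_e):
    """Solve the in-air Gaussian equation 1/l' - 1/l = 1/f_e for l'.
    l_obj is negative for an object to the left of the lens."""
    return 1.0 / (1.0 / f_e + 1.0 / l_obj)

f_e = 100.0        # assumed EFL in mm
l = -300.0         # assumed object distance in mm (negative: object at left)
l_prime = image_distance(l, f_e)
print(l_prime)     # 150.0 mm: a real image to the right of the lens
```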

2.4.4.2 Magnifications

The ratio of the size of the image to that of the object is called the transverse (or lateral) magnification of an optical imaging system. According to


the expressions relating h and h′ presented above in the derivation of the object–image relationship, the transverse magnification of an optical imaging system can be expressed as

m = h′/h = −f/x = −x′/f′.

Using Eq. (2.9), we have

l′ = lf′/(l − f).

Then, substituting x = l − f into the equation above, we obtain

x = lf′/l′.

Substituting this expression into m = −f/x, the transverse magnification can be rewritten as

m = −(f/f′)(l′/l).

Using Eq. (2.7), we have

m = nl′/(n′l).     (2.10)

If the medium in object space is the same as that in image space, i.e., n′ = n, the magnification simplifies to

m = l′/l.

From Eq. (2.10), it can be concluded that the transverse magnification is determined by the object distance for a particular optical system. Once the position of the object is given, the unique image distance can be determined by Eq. (2.9). For an optical system in which the object and image planes are fixed, its transverse magnification is constant. Considering the case where the object is placed on the first principal plane, the image appears on the second principal plane; then, the transverse magnification is 1, as mentioned in subsection 2.4.2. Quoting Warren Smith, “Longitudinal magnification is the magnification along the optical axis, i.e., the magnification of the longitudinal thickness of the object or the magnification of a longitudinal motion along the axis.”4 For a perfect optical imaging system, when the thickness of the object or the


longitudinal motion is very small, the longitudinal magnification can be determined from the differential form of the Newtonian equation,

x dx′ + x′ dx = 0.

Then we obtain

m̄ = dx′/dx = −x′/x,

where m̄ is the longitudinal magnification. Using m = −f/x = −x′/f′, we obtain x = −f/m and x′ = −mf′. Substituting these two expressions into the equation that defines m̄, the longitudinal magnification can be rewritten as

m̄ = −m² f′/f = (n′/n) m².     (2.11)

The Lagrange invariant (or Helmholtz invariant) measures the throughput (for a discussion of the concept of throughput, see Ref. 5) of an optical system and also represents the information capability (the space–bandwidth product) of the optical system. The Lagrange invariant is very useful and can be used to calculate optical parameters without any intermediate operations or raytracing. The Lagrange invariant can be determined by tracing two main rays—one is the marginal ray, and the other one is the chief ray, also called the principal ray. As shown in the optical system depicted in Fig. 2.17, the marginal ray emerges from the on-axis point of the object and passes through the edge of the aperture stop, while the chief ray emits from the edge of the object and passes through the center of the aperture stop. The height of the chief ray on

Figure 2.17 Diagram of the Lagrange invariant of an optical system.

40

Chapter 2

the image plane determines the size of the image; the height of the marginal ray on the pupil plane determines the radius of the pupil. With the help of these two rays, at any given location of the optical system, the Lagrange invariant can be defined as H ¼ nup y  nuyp , where y is the height, u is the angle, and n is the refractive index at the location concerned; variables with subscript p denote quantities of the chief ray; and variables without subscript denote those of the marginal ray. For a given optical system, its Lagrange invariant is a constant at all surfaces of its optical elements and spaces between these surfaces. In particular, the Lagrange invariant at the object plane and the image plane can be expressed as H ¼ nuh ¼ n0 u0 h0 , where h is the height of the object, n is the refractive index, and u is the angle between the marginal ray and the optical axis. Primed symbols refer to variables in image space, and unprimed symbols refer to those in object space. 2.4.4.3.1 Lagrange invariant: an incarnation of the uncertainty principle in geometrical optics

In an alternative way to consider a light beam propagating through an optical system, h can be taken to be the diameter of the beam and u the divergence angle of the beam. In this view, the Lagrange invariant indicates that the product of the divergence angle and the diameter of the beam is a constant throughout an optical system. This relationship is, in essence, the uncertainty principle in quantum theory. The beam diameter can be seen as equivalent to the uncertainty of a photon’s coordinates, and the divergence angle of the beam can be seen as equivalent to the uncertainty of the momentum of a photon. The larger the divergence angle the larger the uncertainty of the photon’s momentum. According to the uncertainty principle, with a large degree of uncertainty in the momentum of a photon, the uncertainty of the coordinate of the photon will be small, which means that the diameter of the beam will be small, and vice versa. We should always keep the Lagrange invariant or the uncertainty principle in mind when we study or design an optical system. Especially when determining some core parameters of an optical system, we should validate these parameters by the Lagrange invariant to avoid fatal mistakes. For example, the diameter of a beam in an adaptive optics system for a telescope cannot be overly compressed, or the curvature of the wavefront—due to a large divergent angle based on the Lagrange invariant—will become too large

Geometrical Optics

41

to be corrected. For light coupled into a fiber, the Lagrange invariant is fundamental to the selection of the acceptance angle of the fiber.

2.5 Raytracing In geometrical optics, raytracing is the process of finding propagation paths for light rays passing through an optical system. Generally, the path for a light ray is described in terms of incident height and angle of inclination at each surface of the optical system. The results obtained by raytracing can be used by an optical designer to calculate Gaussian properties and aberrations of an optical system. Thus, raytracing is of fundamental significance in optical design. Due to the development of computer technologies, current raytracing for the design of optical systems can be accomplished by running professional optical design software on computers. However, in order to better understand and design an optical system, it is very helpful and in fact essential to be able to manually trace light rays through the optical system. In this section, both paraxial raytracing for optical systems and raytracing for diffractive optical elements (DOEs) are briefly introduced. 2.5.1 Paraxial raytracing Since this book is not intended specifically for optical system design but rather for helping readers understand the working principle of an optical system, we do not introduce raytracing in detail. Instead, we simply introduce two methods for paraxial raytracing that can be easily used to determine the Gaussian properties of an optical system. For a more detailed discussion on raytracing, readers are referred to the reference Lens Design Fundamentals.6 Raytracing is based on the concept of light rays, which are assumed to propagate along straight lines in any homogeneous medium and follow Snell’s law at an interface of two media having different refractive indices. For paraxial raytracing, as heights and angles of inclination for paraxial rays are infinitesimal, values of the sine function of angles are approximated by angles in radians, and values of the cosine function of angles are approximated as 1.0. Moreover, another implicit hypothesis for paraxial rays is that the surface sag is ignored during paraxial raytracing. Because all heights and angles of inclination used during paraxial raytracing are infinitesimal, we can determine their relative magnitudes using paraxial raytracing equations. Since infinitesimals have finite relative magnitudes, we can use any finite numbers to represent paraxial quantities, but we must remember that all of these finite numbers should be multiplied by a very small factor such as 10–50 if the paraxial quantities are to have any meaning.6 However, longitudinal paraxial measurements, such as the focal length, are not infinitesimal. Thus, paraxial raytracing is generally used for finding the Gaussian properties of an optical system.

42

Chapter 2

Figure 2.18 Diagram of a paraxial ray refracted by a spherical surface.

Figure 2.18 is a schematic of a paraxial ray passing through point A refracted by a spherical surface at point B. Let the height of the ray at the surface be y and the angle between the ray and the horizontal line be u. When the ray is refracted by the surface, the height of the ray is still y, and the angle between the refracted ray and the horizontal line becomes u 0 . The value of angle u 0 can be determined by the following series of equations. Suppose that point O is the center of sphere of the spherical surface. The angle AOB can be determined by ∠AOB ¼ yC, where C is the curvature of the spherical surface. Then, according to the geometrical relationships shown in Fig. 2.18, the incident angle can be determined by i ¼ yC þ u: Using n 0 i 0 ¼ ni (Snell’s law under paraxial approximation), we obtain i0 ¼

n ðyC þ uÞ, n0

where n and n 0 are refractive indices of media just in front of and behind the spherical surface (in object space and image space), respectively. Since i 0  u 0 ¼ yC, we obtain u0 ¼

n ðyC þ uÞ  yC: n0

By multiplying by n 0 on both sides of the equation above, we obtain n0 u0 ¼ nu  yf,

(2.12)

where f is the optical power of the spherical surface, and f ¼ (n0 – n)C. As shown in Fig. 2.18, the distance in object space from the vertex of the surface to point A is denoted as l, and the distance in image space from the same vertex to point A 0 is denoted as l 0 . It is easy to obtain

Geometrical Optics

43

y l ¼ , u y 0 l ¼ : u0 As shown in Fig. 2.19, when a paraxial ray transfers from surface 1 to surface 2 in a homogeneous medium, the angle of inclination remains constant. The ray height can be determined by y2 ¼ y1 þ du01 ,

(2.13)

where d is the distance between the vertices of surfaces 1 and 2. Now, we can trace the paraxial rays through an optical system using Eq. (2.12) for refraction or reflection (just by setting n 0 ¼ –n) at each surface of an optical element of an optical system, and using Eq. (2.13) for transferring between the two surfaces. This method is known as the ynu raytracing method. Generally, a worksheet can be used to trace rays for an optical system. Table 2.1 gives an example of tracing a paraxial ray through an achromatic doublet using the ynu method. The lens data used in the table are the same as those presented in Table 2.4 (Section 2.7). The Gaussian properties of an optical system can be determined by tracing two paraxial rays parallel to the optical axis: one from object space to image space and the other from image space to object space. First, we trace a paraxial ray parallel to the optical axis from object space to image space in order to find the cardinal points of an optical system in image space. An optical system with k surfaces is simply illustrated by the first surface and the final surface, as shown in Fig. 2.20. A paraxial ray parallel to the optical axis is traced from object space to image space. Since the ray is parallel to the optical axis in object space, it must go through the second focal point F 0 of the system after being refracted by the final surface of the system.

Figure 2.19 Transfer of a paraxial ray from surface 1 to surface 2.

44

Chapter 2 Table 2.1 Tracing a paraxial ray through an achromatic doublet.

surface C d n –f d/n y nu u

1

2 –0.014890

0.010863 4.203 1.51680

1 –0.005614

1 –0.003416

0.002322

1 0 0

–0.005079 2 1.67271

2.770965 1

3

1.195665 0.984443

–0.005614 –0.003701

0.980463 –0.003329 –0.001990

–0.006679 –0.006679

Figure 2.20 Finding the cardinal points in image space by paraxial raytracing (adapted from Ref. 10).

From the geometrical relationships shown in Fig. 2.20, the second focal length of the optical system can be given by f0 ¼

y1 : u0k

The optical power and EFL are expressed by n0 u0k n0 ¼  , f0 y1 f0 y f e ¼ 0 ¼  0 10 : n n uk f¼

The back focal length of the optical system can be determined by bf l ¼

yk : u0k

Secondly, the cardinal points in object space of the same optical system can be presented by tracing a paraxial ray parallel to the optical axis from image space to object space, as shown in Fig. 2.21. Similar to finding the

Geometrical Optics

45

Figure 2.21 Finding cardinal points in object space by paraxial raytracing (adapted from Ref. 10).

second focal length, from geometrical relationships shown Fig. 2.21, the first focal length can be expressed as f ¼

yk : u1

The optical power and EFL can also be determined by n nu f¼ ¼ 1, yk f 1 y fe ¼ ¼ k : f nu1 Finally, the front focal distance is given by ffd ¼ 

y1 : u1

2.5.1.1 Matrix approach to paraxial raytracing

The matrix method, an alternative approach to paraxial raytracing, has a simple formulation and is easy to use. Here we briefly introduce the matrix method. We first discuss paraxial raytracing by the matrix method for the case where (1) the ray is refracted on the first surface and (2) the refracted ray travels a distance d1 from the first surface to the second surface in a homogeneous medium. Using Eqs. (2.12) and (2.13), for the first surface we find that y1 ¼ y1 , n01 u01

¼ n1 u1  y1 f1 :

These two equations can be rewritten in a matrix form, as

46

Chapter 2



y1 n01 u01





1 ¼ f1

0 1



 y1 : n1 u1

(2.14)

The square matrix in Eq. (2.14) is known as the refraction matrix for the first surface. Then, the paraxial ray propagates to the second surface. The transfer can be performed by y2 ¼ y1 þ n01 u01

d1 , n01

n2 u2 ¼ n01 u01 , where d1 is the distance between the vertices of surfaces 1 and 2; n2 equals n01 , the refractive index of object space of surface 2; and u2 equals u01 , the angle between the incident ray and optical axis at surface 2. Rewriting these equations in matrix form, we obtain      y1 y2 1 d 1 ∕n01 ¼ : (2.15) n01 u01 n2 u2 0 1 The square matrix in Eq. (2.15) is known as the transfer matrix from surface 1 to surface 2. Substituting Eq. (2.14) into Eq. (2.15), paraxial raytracing using the matrix approach for this case can be expressed as       1 d 1 ∕n01 1 0 y1 y2 ¼ : n2 u2 0 1 f1 1 n1 u1 This matrix can, furthermore, be extended to an optical system containing k surfaces, giving         yk 1 0 1 d 1 ∕n01 1 0 1 d k1 ∕n0k1 · · · ¼ n0k u0k 1 fk1 1 0 1 fk 1 0    y1 1 0 , f1 1 n1 u1 where yk is the height of the ray on the kth surface, n0k is the refractive index of image space of the kth surface, u0k is the angle between the refracted ray (the ray refracted by the kth surface) and the optical axis, fk is the optical power of the kth surface, and dk–1 is the distance between the vertices of the (k – 1)th and kth surfaces. Obviously, any paraxial ray can be traced by the equation above. If we multiply all of the square matrices in the above equation, another square matrix that formulates the property of an optical system can be obtained and written as

Geometrical Optics



A C

47

    B 1 0 1 d 0k1 ∕n0k1 1 ¼ D fk 1 0 1 fk1   1 0 : f1 1

0 1



 · · ·

1 d 01 ∕n01 0 1



(2.16) Since the norm of each matrix on the right side of Eq. (2.16) is 1, the norm of the matrix on the left side of the equation is also 1; that is, AD – BC ¼ 1. With the left side of Eq. (2.16), we can immediately find the height and angle of inclination for a paraxial ray though an optical system by      yk A B y1 ¼ : (2.17) n0k u0k C D n1 u1 We can also find the Gaussian properties by the four elements of the left side of Eq. (2.16). To do this, we first trace a parallel ray with height h and an of angle 0 deg from the object space to image space to find the second cardinal points. Substituting y1 ¼ h and u1 ¼ 0 into Eq. (2.17), we obtain yk ¼ Ah, n0k u0k

¼ Ch:

Then, the second focal length of this optical system can be given by f0 ¼

n0k y1 : ¼  u0k C

(2.18)

The optical power and EFL are expressed, respectively, by n0k ¼ C, f0 f0 1 fe ¼ 0 ¼  : nk C f¼

(2.19)

The back focal length of the optical system can be determined by bf l ¼

n0k A yk : ¼  C u0k

(2.20)

To find the second cardinal points, a parallel ray with height h and an angle of 0 deg from the image space to object space is traced. By setting yk ¼ h and uk ¼ 0, Eq. (2.17) can be written as

48

Chapter 2

   h A ¼ 0 C

B D



 y1 : n1 u1

It is easy to obtain 

h ¼ Ay1 þ Bn1 u1 : 0 ¼ Cy1 þ Dn1 u1

Furthermore, the following sets of equations can be obtained by multiplying the elements A, B, C, and D of the matrix on both sides of each equation above as follows: 

Dh ¼ ADy1 þ BDn1 u1 , 0 ¼ BCy1 þ BDn1 u1



Ch ¼ ACy1 þ BCn1 u1 : 0 ¼ ACy1 þ ADn1 u1

Subtracting one equation from the other in each set of equations, and using the equation AD – BC ¼ 1, we can obtain y1 ¼ Dh, n1 u1 ¼ Ch: Then, the first focal length and the front focal distance can be determined, respectively, by yk n1 ¼ , u1 C y nD ffd ¼  1 ¼ 1 : u1 C f ¼

(2.21)

Thus, if the value of each element of the matrix of an optical system is determined, the Gaussian properties of the optical system can be determined by Eqs. (2.18) through (2.21). 2.5.1.2 Examples 2.5.1.2.1 Single lenses

A single lens can be considered as a combination of two refractive surfaces with a thickness t, as shown in Fig. 2.22. If the curvatures of the two surfaces are known, we can find the optical power of the single lens by the matrix approach of paraxial raytracing. Using Eq. (2.16), the matrix of the single lens can be given by

Geometrical Optics

49

Figure 2.22 Example of a single lens.



A C

  1 0 1 t∕n f1 1 0 1 3 t t 1  f1 n n 5 ¼4 t t : 1  f2 f1  f2 þ f1 f2 n n

  1 B ¼ f2 D 2

0 1



Using Eq. (2.19), the optical power of the single lens can be determined by t f ¼ C ¼ f1 þ f2  f1 f2 : n

(2.22)

The matrix for a thick lens can be written as 2 Mthick ¼ 4

1  f1 f

t n

3 t n 5 t : 1  f2 n

For a frequently used thin lens, the thickness is usually negligible. In this case, the matrix for a thin lens can be written as   1 0 Mthin ¼ , fthin 1 where fthin is the optical power of a thin lens, which can be simplified as fthin ¼ f1 þ f2 ¼ ðn  1ÞðC 1  C 2 Þ,

(2.23)

where f1 ¼ (n  1)C1, f2 ¼ (1  n)C2, C1 and C2 are curvatures of the two surfaces of the singlet, respectively, and the curvature of a lens surface is the reciprocal of its radius. Equation (2.23) is also known as the Lensmaker’s

50

Chapter 2

equation, which describes the relationship between the curvatures of the two surfaces of a lens and its optical power. 2.5.1.2.2 Compound lenses

Frequently, two or more lenses are combined to form a compound lens, and that compound lens can be used as a basic unit in an optical system. Here we present how to deal with compound lenses using the matrix approach by tracing an example combination of two lenses. As shown in Fig. 2.23, two thin lenses are grouped together as a compound lens. For a thin lens, the thickness is ignored. Suppose that these two thin lenses are in air and the distance between them is d. Their optical powers are f1 and f2. Then, the matrix of the compound lens can be determined by 

A C

     B 1 0 1 d 1 0 ¼ D f2 1 0 1 f1 1   1  df1 d ¼ : f1  f2 þ df1 f2 1  df2

Similar to the single lens, the Gaussian properties of this compound lens can be determined by Eqs. (2.18) through (2.21). The optical power of the compound lens can be determined immediately by f ¼ C ¼ f1 þ f2  df1 f2 :

(2.24)

The EFL (the reciprocal of its optical power) of a compound lens can be calculated; then the first and second focal lengths of the compound lens can be determined by Eq. (2.7).

Figure 2.23 Example of a compound lens.

Geometrical Optics

51

2.5.2 Diffraction raytracing In subsection 2.5.1, raytracing for rays encountering a refractive/reflective surface and propagating a distance in a homogeneous medium were discussed. In this subsection, diffraction raytracing, i.e., the diffraction of a single ray by a diffractive surface, is discussed. The basic idea of the following presentation is from the appendix of the Field Guide by Y. G. Soskind.7 The refraction/reflection of a ray by a refractive/reflective surface can be described by Snell’s law in vector form by n2 ðS0  rÞ ¼ n1 ðS  rÞ,

(2.25)

where S0 , S, and r, are unit vectors; n1 and n2 are refractive indices of media before and after rays encountering the surface, respectively; S0 and S define the ray propagation directions before and after encountering the surface, respectively; and r defines the local normal to the surface. These three unit vectors can be described in Cartesian coordinates by 8 0 < S ¼ L0S i þ M 0S j þ N 0S k S ¼ LS i þ M S j þ N S k , : r ¼ Lr i þ M r j þ N r k where i, j, and k stand for the unit vectors for the x, y, and z coordinates, respectively, and all of the coefficients directly preceding them are their corresponding direction cosines. If the refractive/reflective surface is changed to a grating surface, the effect of diffraction by the surface should be considered according to Snell’s law written in vector form. Using the rearranged form of the grating equation described by Eq. (4.13) in Chapter 4, the diffraction of a single ray by a grating surface can be described by adding an additional term, Lq, to Eq. (2.25) as follows: n2 ðS0  rÞ ¼ n1 ðS  rÞ þ Lq,

(2.26)

where L ¼ ml0/dg, m is the working diffraction order, l0 is the working wavelength, dg is the local groove spacing, and q is a unit vector parallel to the grating grooves at the ray intersection point. The geometrical relations of all of these quantities are shown in Fig. 2.24, and the other characteristics of the diffraction of a grating are discussed in detail in Section 4.4 of Chapter 4. By introducing a unit vector p ¼ ui þ vj þ wk, which is parallel to the grating surface and normal to the grating grooves at the intersection point, and where q ¼ –p  r, Eq. (2.26) can be rearranged as ðn2 S0  n1 S þ LpÞ  r ¼ 0:

(2.27)

52

Chapter 2

Figure 2.24 Diagram of diffraction raytracing through a diffractive surface.

According to the calculation rule of a cross product, the direction of the vector in parentheses in Eq. (2.27) should be parallel to the direction of the unit vector r, i.e., n2 S0  n1 S þ Lp ¼ GD r,

(2.28)

where GD is a constant. Rearranging Eq. (2.28), it is easy to obtain S0 ¼

1 ðn S  Lp þ GD rÞ: n2 1

(2.29)

Dot multiplying S0 on both sides of Eq. (2.28), we obtain S0 · S0 ¼

1 ðn S  Lp þ GD rÞ · S0 : n2 1

Because S0 is a unit vector, S0 · S ¼ 1. Replacing S0 on the right side of the above equation by Eq. (2.29) and rearranging, we have ðGD Þ2 þ 2ðn1 S · r  Lp · rÞGD þ n21  n22 þ L2  2n1 LS · p ¼ 0:

(2.30)

As unit vectors r and p are perpendicular to each other (the direction of r is the local normal of the grating surface, and that of p is parallel to the grating surface), p · r ¼ 0. Equation (2.30) can be simplified as ðGD Þ2 þ 2n1 ðS · rÞGD þ n21  n22 þ L2  2n1 LS · p ¼ 0: Let

(2.31)

Geometrical Optics

53

a ¼ n1 ðS · rÞ ¼ n1 ðLS Lr þ M S M r þ N S N r Þ, b ¼ n21  n22 þ L2  2n1 LS · p ¼ n21  n22 þ L2  2n1 ðLS u þ M S v þ N S wÞ: By solving Eq. (2.31), we obtain pffiffiffiffiffiffiffiffiffiffiffiffiffi a2  b, pffiffiffiffiffiffiffiffiffiffiffiffiffi a2  b: GD 2 ¼ a  GD 1 ¼ a þ

(2.32)

Note that, for a refractive/reflective surface, the term Lp in Eq. (2.28) will not D exist. GD 1 corresponds to finite raytracing for refraction situations, while G1 corresponds to finite raytracing for reflection situations. Correspondingly, for diffraction raytracing, GD 1 is used for the description of diffraction by transmission gratings, and GD 2 by reflection gratings. Using Eqs. (2.29) and (2.32), diffraction raytracing through diffractive surfaces can be described. Note that, for refraction/reflection by refractive/ reflective surfaces, as mentioned in Section 2.3, the incident and refractive/ reflective rays, and the normal of the interface between two media are coplanar. However, for diffraction raytracing, vectors S0 , S, and r are no longer coplanar, which means that the diffractive ray does not lie in the incident plane, as shown in Fig. 2.24. After tracing a ray diffracted by a diffractive surface, the propagation of this diffracted ray to the next surface can be described by the propagation formula. Thus, diffraction raytracing can be completely described in the regime of geometrical optics.

2.6 Geometrical Aberrations

As seen in the previous section, under the linear approximation for the law of refraction, i.e., the paraxial approximation, Gaussian optics governs the behavior of an optical imaging system; thus, Gaussian optics is also referred to as the laws of a perfect optical imaging system. However, in a practical optical imaging system, the light rays involved in imaging generally arrive from a certain field of view, and the actual image formed by the optical imaging system almost always shows a certain amount of deviation from the image predicted by Gaussian optics due to the nonlinearity of the law of refraction. In other words, a practical optical imaging system with a certain field of view fails to produce a perfect image of an object. The failure of an optical system to form a perfect image is due to the so-called aberrations of this optical system. In the geometrical optics domain, aberrations are called geometrical, or ray, aberrations. The ultimate goal when designing an optical imaging system is to make the image formed by the optical system under design match the


perfect image predicted by Gaussian optics. To achieve this goal, aberrations of the optical system should be removed as much as possible. Therefore, the study of geometrical aberrations is an essential prerequisite for designing an optical imaging system. In general, the aberrations of an optical system can be determined by raytracing. Suppose that a ray in an optical imaging system emits from a point of the object and arrives at a point on the image plane after passing through this optical system. If this point on the image plane does not coincide with the corresponding point of the perfect image predicted by Gaussian optics, any deviation from the point predicted by Gaussian optics is due to aberrations introduced by this optical system. Aberrations are caused by the nonlinearity of the law of refraction, in which the sine function is nonlinear. The Taylor expansion of the sine function can be expressed as follows:

$$\sin u = u - \frac{u^3}{3!} + \frac{u^5}{5!} - \frac{u^7}{7!} + \cdots. \tag{2.33}$$
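To get a feel for how quickly the higher-order terms in Eq. (2.33) grow, the short check below (a sketch, not from the book) compares sin u with u and with the third- and fifth-order terms for a near-paraxial and a steep angle.

```python
import math

for deg in (5.0, 30.0):
    u = math.radians(deg)
    print(f"u = {deg:4.1f} deg: sin u = {math.sin(u):.6f}, u = {u:.6f}, "
          f"u^3/3! = {u**3/6:.2e}, u^5/5! = {u**5/120:.2e}")
# At 5 deg the cubic term is ~1e-4, so the paraxial approximation is nearly exact;
# at 30 deg it is ~2.4e-2 and can no longer be neglected.
```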

If the incident angle is very small, it is accurate enough to approximate the sine function in the law of refraction expressed by Eq. (2.2) by using only the first term on the right side of the above equation. In this case, the law of refraction obviously becomes a linear one, and the image formed by an optical system following this linear regime is perfect. In other words, the smaller the incident angle u, the better the linear approximation for sin u. Therefore, a practical optical imaging system can produce a nearly perfect image in its paraxial region. Otherwise, the high nonlinearity of the law of refraction leads to a departure of the image from the perfect one, and geometrical aberrations appear. The departures of the image from the perfect one caused by the term of the third power in Eq. (2.33) are called primary aberrations, while those caused by terms of higher odd powers are called high-order aberrations.

2.6.1 Primary aberrations

There are five primary monochromatic aberrations for a rotationally symmetric optical imaging system. These five basic types of aberrations, also called Seidel aberrations, are spherical aberration, coma, astigmatism, curvature of field, and distortion. Each type of Seidel aberration has its own characteristics and corresponding geometric shape. However, in a practical optical system, these five aberrations combine, but only one or a few of them are dominant. The optical system can be corrected according to the characteristics of the dominant aberrations. In the following subsections, the origins and characteristics of each type of Seidel aberration are discussed in detail for the case of a positive spherical lens.


2.6.1.1 Spherical aberration

As shown in Fig. 2.25, if light incident on a lens is very close to its optical axis, light rays will converge to the paraxial focal point predicted by Gaussian optics. As the heights of rays (incident on the lens) from the optical axis of the lens increase, the intersection points of the corresponding outgoing rays and the optical axis become closer and closer to the lens. As such, the image formed through the whole aperture of the lens will not be perfect, but blurred. This blurred image is caused by a type of aberration, spherical aberration, whose magnitude depends on the height of the ray in the entrance pupil. In a quantitative way, spherical aberration, for a particular height of rays on the lens, is defined as the distance along the optical axis from the perfect image of a point object to the cross-point of all rays at this height with the optical axis. Spherical aberration is a kind of on-axis aberration. As shown in Fig. 2.25, the farther the incident ray is from the optical axis, the farther its intersection point with the optical axis is from the paraxial focal point, and the greater the spherical aberration. The reason for this phenomenon is the high nonlinearity of the law of refraction, whose effect grows as the incident angle becomes large, as it does for rays far from the paraxial region. Thus, the marginal rays have the greatest spherical aberration.

Example A

A plano-convex lens is placed in a pencil of rays parallel to its optical axis, as shown in Fig. 2.26. Which of the two optical layouts in Fig. 2.26 has smaller spherical aberration? This problem can be easily solved by understanding the origin of spherical aberration in optical systems. As spherical aberration is caused by the fact that rays far from the optical axis have large incident and exiting angles, to minimize spherical aberration, it is preferable to use or design a lens for which rays incident on it have incident and refracted angles that are as small as possible. As shown in Fig. 2.26(a), rays are bent only at the rear surface of the

Figure 2.25 Diagram of spherical aberration for a simple lens.


Figure 2.26 Different layouts of a plano-convex lens leading to different spherical aberrations.

plano-convex lens, resulting in large incident and refracted angles on the rear surface for rays far from the optical axis. However, in Fig. 2.26(b), rays are bent on both surfaces of the lens, and both the incident and refracted angles on each surface are compromised and balanced compared with those of the layout in Fig. 2.26(a). Bending rays by two surfaces as shown in Fig. 2.26(b), which avoids the occurrence of a very large incident angle on each surface of the lens, can be considered as a way to share the optical power. So in the optical layout of Fig. 2.26(b), with the convex surface of the plano-convex lens facing toward the parallel light, the lens has smaller spherical aberration than in the layout of Fig. 2.26(a).

Example B

Why is the spherical aberration of a doublet smaller than that of a simple lens with the same clear aperture and EFL as the doublet? As shown in Fig. 2.27, a doublet generally consists of two single lenses: a positive one and a negative one. The actual focal point of the positive lens is to the left of its paraxial focal point, and the actual focal point of the negative lens is to the right of its paraxial focal point. According to the sign conventions presented in Section 2.3, spherical aberration of the positive lens is negative, and that of the negative lens is positive. If these two single lenses are designed to have spherical aberrations with the same absolute value but opposite signs, the spherical aberration of the doublet can be reduced to close to zero.

Two tips for eliminating spherical aberration can be summarized from these two examples:

1. Avoid the occurrence of a very large incident angle on any surface.
2. Increase the number of surfaces or lenses to share the optical power.

These two tips are also suitable for eliminating other aberrations, although they were drawn here from two examples concerning spherical aberration.

Figure 2.27 Diagram of eliminating spherical aberration by a doublet.

2.6.1.2 Coma

Spherical aberration is an on-axis aberration. When rays are obliquely incident on an optical system, the image of a point becomes nonrotationally symmetrical, and off-axis aberrations will appear. Coma, one of the off-axis aberrations, is caused by the variation of the magnification over the entrance pupil. The fundamental reason for different magnifications over the entrance pupil is the different incident angles for rays passing through the optical system. So coma of an optical system changes greatly with the position and size of the pupil/stop. Figure 2.28 is a diagram of the generation of coma for a simple positive lens. When parallel rays are incident on the lens at a certain angle with respect to the optical axis, rays passing through different portions of the lens—being bent to different extents due to the different incident angles on the lens—are converged at different heights. This leads to different magnifications compared with that of the central ray, which passes through the center of the beam incident on the lens. For the case shown in Fig. 2.28, the central ray passes only through the center of the lens. Rays having different distances from the central ray are imaged with different magnifications. Furthermore, the central ray is focused on the paraxial image point, and other incident rays—passing through different concentric circles on the lens and having a common center that is the cross point of the central ray and the lens—are imaged as a series of small circles. The centers of these small circles have different heights compared to the optical axis due to different magnifications, which is caused by different incident angles. These small circles and the paraxial image point form the shape of a comet, which is the origin of the name coma.


Figure 2.28 Diagram of coma for a simple lens.

2.6.1.3 Astigmatism

Astigmatism is the second off-axis aberration to be discussed. For the purpose of illustrating this aberration, two planes, called the tangential and sagittal planes, are introduced. The plane that includes the optical axis and an off-axis object point is called the tangential plane, and the plane perpendicular to the tangential plane and containing the chief ray is the sagittal plane. As shown in Fig. 2.29, when a pencil of oblique rays from an off-axis object point is incident on a simple lens, the cross-section of these rays on the front surface of the lens forms an ellipse. Meanwhile, the curvature of any spherical surface of the lens is the same everywhere. So the optical power in the tangential direction is different from that in the sagittal direction. From Fig. 2.29, it is easy to see that the incident angles of the rays on the lens in the tangential plane are, in general, larger than the incident angles of the rays on the lens in the sagittal plane. This fact leads to the rays in the tangential plane being more severely bent toward the lens than those in the sagittal plane; as such, the optical powers in the tangential and sagittal planes are different. As is well known, rays passing through lenses with different optical powers converge to different points; similarly, rays lying in the tangential plane and rays lying in the sagittal plane converge to two different points, each point being a different distance from the lens. More precisely, rays in the tangential plane converge to a line that is perpendicular to the tangential plane, and rays in the sagittal plane converge to a line that is perpendicular to the sagittal plane. So the two focused lines are perpendicular to each other, and the distance along the chief ray between these two focused lines is defined as the measure for the aberration of astigmatism.


Figure 2.29 Diagram of astigmatism of a simple lens.

2.6.1.4 Curvature of field

A diagram of the curvature of field, the third off-axis aberration to be discussed, is shown in Fig. 2.30. When parallel rays are incident on a lens with different angles to the optical axis, these rays are focused at different positions in image space. As the law of refraction is highly nonlinear, different angles between the incident rays and the optical axis result in different distances between the lens and the corresponding focused positions. The actual focal

Figure 2.30 Diagram of the generation of curvature of field for a simple lens.


plane is no longer flat, but curved. The difference between the curved plane and the paraxial focal plane is called curvature of field. For example, in order to correct curvature of field for an optical imaging system, photographic film can be curved on the image plane of a camera according to the amount of curvature of field for the imaging system.

2.6.1.5 Distortion

Similar to the generation of curvature of field, the aberration of distortion is also due to the rays incident on a lens with different angles to the optical axis converging to different positions. As distances between the lens and these positions are different, magnifications at these different positions on the image plane are also different. When an extended object is imaged by this lens, magnifications for different points on this extended object will be different, resulting in a distorted image of the extended object. The difference between the paraxial image and the distorted image is called the aberration of distortion. When the heights of off-axis image points of the extended object are larger than those of the corresponding paraxial image points, this kind of distortion is said to be positive; conversely, negative distortion is seen when the heights of off-axis image points of the extended object are smaller than those of the corresponding paraxial image points. Figure 2.31 shows two distorted images of four concentric squares. The squares depicted in solid lines are distorted images, while those depicted in dashed lines are perfect images. The pincushion shape is caused by positive distortion (or pincushion distortion), while the barrel shape results from negative distortion (or barrel distortion).
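A simple way to visualize the two cases of Fig. 2.31 is a purely radial mapping of the paraxial image height. The sketch below uses a common textbook cubic model as an illustration only; the coefficient k3 and its values are assumptions, not data from this chapter.

```python
import numpy as np

r = np.array([0.25, 0.5, 0.75, 1.0])        # normalized paraxial image heights
for k3, name in ((+0.1, "pincushion (positive)"), (-0.1, "barrel (negative)")):
    r_distorted = r * (1.0 + k3 * r**2)     # assumed cubic distortion model
    percent = 100.0 * (r_distorted - r) / r
    print(name, np.round(percent, 2), "%")  # distortion grows with field height
```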

Figure 2.31 (a) Pincushion and (b) barrel distortions.


2.6.2 High-order aberrations

As mentioned at the beginning of this section, all monochromatic aberrations except the primary aberrations are called high-order aberrations. In general, high-order aberrations include fifth-order, seventh-order, ninth-order, and other higher, odd-power aberrations. The aberrations for each order have many terms, e.g., fifth-order aberrations have nine terms. Thus, it is very useful and essential to use a general approach of analysis for high-order aberrations. A high-order aberration can be classified as spherical-aberration-like, astigmatic, or coma-like, according to its graphical representation.3 All aberrations, including primary and high-order, can be classified as astigmatic/symmetric and comatic/asymmetric aberrations.6 A common method of correcting for high-order aberrations is to balance these high-order aberrations with the proper number of low-order aberrations.8,9 One important reason for classifying aberrations into different categories is that aberrations belonging to the same category can be used to balance each other.

2.6.3 Chromatic aberrations

The primary aberrations discussed previously are monochromatic aberrations. When a lens is illuminated by a broadband light source, another type of aberration, chromatic aberration, will appear. Due to the dispersion of optical materials, as mentioned in Section 2.3, the performance of an optical component differs for light of different wavelengths, and all of the differences between the images resulting from light with different wavelengths caused by dispersion are called chromatic aberrations. Materials with normal dispersion have larger indices of refraction for shorter wavelengths than for longer wavelengths. As explained in the last part of Section 2.3, a triangular prism bends blue light more severely than red light because of dispersion. Similarly, a simple positive lens, approximately considered as a combination of two triangular prisms with their bases cemented together, will bend blue light toward its optical axis more than it bends red light [Fig. 2.32(a)] if the lens is made of an optical material with normal dispersion. As such, the positive lens has a larger focal length for red light (C line) than for blue light (F line). This variation in focal length with wavelength is called axial chromatic aberration or axial color. Lateral chromatic aberration, also known as lateral color, is caused by the dispersion of the chief ray.10 When a lens forms an image of an off-axis point for different wavelengths, lateral color occurs. As shown in Fig. 2.32(b), when a simple lens is illuminated by broadband light from an off-axis point, the image has a different lateral magnification for each color and is blurred by a radial color smear. Lateral color is often very difficult to correct for wide-field systems.


Figure 2.32 Diagram of chromatic aberrations of a simple lens: (a) axial chromatic aberration and (b) lateral chromatic aberration (adapted from Ref. 10).

As light incident on an optical system consists of multiple wavelengths in most cases, it is essential to correct chromatic aberrations of the optical system by using achromatic optical components. Here, two examples of correcting chromatic aberrations are demonstrated.

2.6.3.1 Example A: The principle of the positive achromatic doublet

A positive achromatic doublet is usually composed of two parts, a positive lens and a negative one. As shown in Fig. 2.33, supposing that there is no negative lens, the positive lens would focus blue light and red light at two different points (A and B, respectively). In order to pull these two focal points together, a negative lens, which can bend light away from its optical axis and introduce chromatic aberrations having signs opposite to those of the positive lens, should be cemented together with the positive lens to compensate for chromatic aberrations caused by the positive lens. Furthermore, the negative lens should have a lower optical power than the positive lens to ensure that the total optical power of the achromatic doublet is positive according to the formula for the optical power of a compound lens [Eq. (2.24)]. To eliminate chromatic aberrations and make the doublet simultaneously behave as a


Figure 2.33 Diagram of the principle of a positive achromatic doublet (adapted from Ref. 14).

positive lens for the positive achromatic doublet, the positive lens should be made of low-dispersion optical material, and the negative lens should be made of high-dispersion material. This is the basic working principle of an achromatic doublet. An achromatic doublet can focus blue light (F line) and red light (C line) at a common focal point, but focuses yellow light (d line) at a different location, as shown in Fig. 2.34. The distance between the coincident focal point for the F and C lines and that for the d line is called the secondary color (secondary spectrum) of the achromatic doublet. To correct the secondary color, a triplet or a compound composed of three singlets with different dispersion glasses is needed.

2.6.3.2 Example B: Binary optics used for chromatic aberration correction in the IR spectrum

As optical materials with high transmission in the IR band are practically limited to silicon, germanium, and zinc selenide, and all three of these materials have low dispersions, the types of achromatic lenses for use in the IR are limited, even if an air-spaced doublet and triplet are included. Here, the concept of using binary surfaces to correct chromatic aberrations is presented.

Figure 2.34 Diagram of secondary color (adapted from Ref. 10).


In this example, binary optical elements (BOEs) serve to compensate for chromatic aberrations. BOEs refer to optical elements with one or more etched surfaces, such as gratings or zone plates. Compared with the more common approach of cementing two lenses to create a doublet, a binary optical lens offers three significant advantages when used for correcting chromatic aberrations. First, a binary optical lens has the fundamental property that its focal length is inversely proportional to the wavelength; secondly, a binary optical lens behaves like an ultrahigh-index refractive lens with an index that varies linearly with wavelength;11 thirdly, a binary optical lens has an effective Abbe number of –3.45. These three properties of a binary lens allow for a new way to correct chromatic aberrations of a conventional refractive lens in the IR spectrum. Based on the method in example A, where chromatic aberrations are corrected using two lenses with different materials, an achromatic lens in the IR spectrum can be produced by etching one or more surfaces of a conventional IR lens according to the working principle of binary optics. This type of lens is called a hybrid diffractive–refractive lens. Alternatively, but similar in concept, chromatic aberration of an optical imaging system in the IR spectrum can also be eliminated by placing a binary optical lens at the proper position in the optical system. Detailed discussions on the correction of chromatic aberrations using binary optics can be found in Refs. 11–13.

In practice, it is impossible and unnecessary to completely eliminate aberrations since all detectors, including human eyes, have detection limits. In optical system design, we usually reduce the aberrations of an optical system under design to a certain tolerance range so that the residual aberrations have little influence on the image quality of the optical system or cannot be detected by detectors; e.g., the size of a point image is smaller than or equal to the size of the pixel of a CCD. Then the optical system is considered to be a well-designed system that can provide images with high quality.
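The power split implied by these properties can be estimated with the same achromatic condition that is applied to the glass doublet later in Section 2.7.3 [Eq. (2.38)], now pairing a refractive element with a binary surface of effective Abbe number –3.45. The sketch below is only an illustration: the refractive dispersion value and the 100-mm focal length are assumed numbers, not data from this book.

```python
# Hedged sketch: power split of a hybrid refractive-diffractive IR lens using
# the achromatic condition phi_r/V_r + phi_d/V_d = 0 with phi_r + phi_d = phi.
V_r = 1000.0       # assumed dispersion value for an IR material (illustrative only)
V_d = -3.45        # effective Abbe number of a binary (diffractive) surface
phi = 1.0 / 100.0  # assumed total power: 100-mm EFL

phi_r = phi * V_r / (V_r - V_d)   # refractive part
phi_d = phi * V_d / (V_d - V_r)   # diffractive part
print(phi_r, phi_d, phi_d / phi)  # the binary surface carries only ~0.3% of the power
```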

2.7 General Procedure for Designing Optical Imaging Systems

The design of an optical imaging system is both scientific and artistic. It is scientific because each optical imaging system is designed and evaluated based on physical laws and mathematics; it is artistic because the configuration of the designed optical imaging systems is basically dependent on the designer's personal experience, habits, interests, and understanding of optics. In this section, a brief introduction to the design of optical imaging systems is given. The criteria for evaluating an optical imaging system will be discussed in Section 3.6. The flow chart shown in Fig. 2.35 is a basic procedure for designing optical imaging systems. According to this procedure, the design of an optical imaging system can be divided into two parts: first-order design and detailed design.


Figure 2.35 Flow chart of the procedure for designing an optical imaging system.

2.7.1 First-order design of optical imaging systems

Generally speaking, the design of an optical imaging system is initiated by a need to develop a piece of equipment or an instrument for imaging. As shown in Fig. 2.35, the first-order design of an optical imaging system consists of two steps, i.e., (1) determining the optical parameters according to the requirements


of the end users and (2) choosing or designing an initial configuration for the optical imaging system. In order to design an optical imaging system that satisfies the requirements of the end users, the size and basic optical parameters of the imaging system must be determined. These basic optical parameters are set or calculated by the first-order approximation of geometrical optics. When determining these parameters, the mechanical configuration, electrical characteristics, and working environments should be also considered to ensure that the design is feasible in practical implementation. In general, four aspects need to be considered when designers determine the optical parameters of an optical imaging system to be designed. The first aspect is the basic parameters of the optical imaging system, such as the working wavelength band, the magnification, the field of view, the object distance, the image distance, the EFL, the entrance pupil, the exit pupil, and so on. The second aspect is the physical dimensions of the optical imaging system, which can be considered as a constraint in the first-order design. The third aspect is the image quality of the optical imaging system. The final aspect is the working environments of the optical imaging system, for which the system can be designed to tolerate specified degrees of variations in vibration, temperature, and humidity. Determination of the optical parameters is a challenging task for optical designers. Experienced designers having comprehensive knowledge of optical systems can properly adjust these parameters, and the selected parameters will not only meet the customer’s requirements but also make the resulting design and implementation go smoothly. However, parameters proposed by beginners can be hard to achieve in detailed design or even impossible to implement in practice. For that reason, designers should become familiar with various typical optical systems and gradually accumulate experience in optical design. Once these parameters are determined, the initial configuration of the optical imaging system can be chosen or designed. Generally, designers can take the configurations of optical systems with similar functions from published patents or related journals and books as the initial configuration of their designs. Whether the optical imaging system to be designed is simple or has some special characteristics, designers can determine the initial configuration using knowledge of Gaussian optics and theory of primary aberrations. For example, to design a photographic objective with a focal length of 50 mm, an F/# of 3.5, and a field of angle of 50 deg, generally speaking, a Cooke or Tessar form can be chosen as an initial configuration. If another objective has the same requirements as the above photographic objective except for the F/# being 2, the double-Gaussian form would be more suitable as the initial configuration. The reason for choosing these initial optical configurations for the required photographic objective is that they allow for easily attainable and effective design results.14 In terms of optimization, the goal is a good resulting design that can be easily realized.
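As a quick numerical aside (not from the book), the apertures implied by the photographic-objective examples above follow from the usual definition F/# = EFL/EPD, where EPD is the entrance pupil diameter:

```python
# Entrance pupil diameters implied by the 50-mm objective examples above.
efl = 50.0                      # mm
for f_number in (3.5, 2.0):
    print(f"F/{f_number}: EPD = {efl / f_number:.1f} mm")
# -> F/3.5: EPD ~ 14.3 mm (Cooke/Tessar territory); F/2.0: EPD = 25.0 mm (double Gauss)
```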


2.7.2 Detailed design of optical imaging systems

As shown in Fig. 2.35, the detailed design of an optical imaging system mainly consists of three steps: configuration optimization, tolerance analysis, and image quality evaluation. In order to ensure that the image quality achieves design targets, image quality evaluation is indispensable after each step of configuration optimization and tolerance analysis. Note that these procedures need to be repeated several times to adjust the configuration parameters of an optical system until the requirements of the end users are completely satisfied. Sometimes, regardless of how much effort has been put into the detailed design, a satisfactory design cannot be configured. In this case, the designer should return to the first-order design to change the initial configuration of the optical imaging system under design. Due to the development of computer technology, the entire process of the detailed design can be performed on a computer by professional software such as Code V or Zemax. Computer-assisted design makes the process of optical imaging system design simple and easy. However, if the design of the optical imaging system is totally dependent on computers and lacks a thorough understanding of the whole optical imaging system to be designed, it can be very difficult to get a feasible design. Next, the steps involved in configuration optimization and tolerance analysis are discussed.

2.7.2.1 Configuration optimization

Configuration optimization, the purpose of which is to improve the image quality of an optical imaging system, is a process of adjusting the configuration parameters of the system according to the results of raytracing. The adjustable configuration parameters include parameters of the surface type for each lens (radius of curvature of the spherical lens, high-order coefficients of the aspheric lens), thicknesses of lenses, intervals between lenses, materials of lenses, etc. Configuration optimization usually needs to be iterated many times, especially for optical imaging systems with strict requirements and complicated configurations. In general, the goal of configuration optimization is to find a reasonable set of configuration parameters with which aberrations of the optical system are equal to or smaller than those specified in the system requirements. In mathematics, the process of finding the set of configuration parameters can be considered as a nonlinear optimization problem, in which a minimum point in the solution space needs to be found without any information on the locations of the minimum point. As shown in Fig. 2.36, taking a nonlinear optimization problem in one dimension as an example, the process of optimization involves finding an acceptable point on a curve that represents the set of candidate solutions. The dashed line in Fig. 2.36 represents design requirements, and points C, F, and G below this line are called the acceptable points. If any one


Figure 2.36 Schematic illustrating the optimization of configuration parameters.

of the acceptable points is arrived at during the optimization, that point is viewed as the solution of this problem, and the optimization is complete. Currently, nearly all configuration optimization is performed with the help of professional optical design software (e.g., Zemax, Code V, OSLO—Optics Software for Layout and Optimization, etc.). Designers set the optimization variables (e.g., thickness, curvature, and materials) and choose the proper merit functions (e.g., wavefronts, axial color, EFL), then the process of optimization is performed automatically by the software. Usually, optimization variables are those configuration parameters that need to be adjusted. However, automatic optical design programs can converge to some local minimum points that are not acceptable (such as points B, D, or H in Fig. 2.36). In this case, designers need to adjust the variables and merit functions, and run the program again. Note that optical design software is not omnipotent, and additional guidance by human input is essential during computer-assisted configuration optimization.

2.7.2.2 Tolerance analysis

Evaluating the tolerances of a design scheme is an important, indispensable, and final step in optical imaging system design; only optical imaging systems that have been designed with the appropriate tolerances can be fabricated and used in the real world. Three steps are involved in establishing tolerances to parameters of a designed optical system. First, defects in the image that are caused by fabrication errors, assembly errors, and effects of the environment should be analyzed. Next, an appropriate tolerance must be established for each parameter of the designed system to make sure that the system can be manufactured. Finally, a different solution must be provided for each of the different working environments under which the optical system under design will be used. In general, as with configuration optimization, tolerance analysis can also be carried out with the help of optical design software as long as the software has its own integrated tolerance modules. Note that an optical system with strict tolerances is usually difficult or even impossible to manufacture. In this case, the optical system should be re-optimized or re-designed until an


acceptable tolerance distribution is achieved, as shown in Fig. 2.35. Then the optical system design is complete, and designers can draw diagrams of the optical elements and provide them to the manufacturers. Here, only a brief summary of the optical system design process is presented; detailed descriptions can be found in Refs. 4, 6, 9, and 14–18. The next subsection describes the design of an achromatic doublet that basically follows the design procedure provided above.

2.7.3 Design of an achromatic doublet

Due to the dispersion of optical materials, chromatic aberrations exist universally in optical imaging systems that use refractive components for broadband illumination. An achromatic doublet comprises two parts that have been cemented together. The two parts are made of materials having different dispersions. The primary axial chromatic aberration is reduced by drawing the F and C lines to a common focus. Meanwhile, spherical aberration of the achromatic doublet can also be reduced by bending each part of the doublet. Suppose that an achromatic doublet to be designed for visible light has the following optical parameters: the half field of view is 1 deg, the EFL is 150 mm, and the F/# is 6. The design of this achromatic doublet can be divided into the two separate steps explained earlier in this section. The first step, first-order design, calculates an initial configuration based on geometrical optics, or selects an initial configuration from an existing design according to the requirements of the doublet to be designed. The second step, detailed design, optimizes the initial configuration to seek a solution that satisfies all of the requirements of the achromatic doublet to be designed.

In the first step, the initial configuration of the doublet is calculated based on Gaussian optics. As the required doublet will be used for visible light, the working wavelengths of the doublet are selected in the range from 486.1 nm (F line) to 656.3 nm (C line), and a wavelength of 587.6 nm (d line) is used as a reference wavelength. As determined in Section 2.6, in order to correct chromatic aberrations, the doublet is a composite lens in which a positive lens with a low-dispersion glass and a negative lens with a high-dispersion glass have been cemented together. Accordingly, glasses N-BK7 (glass code: 517642.251) and N-SF5 (glass code: 673323.286) from the SCHOTT catalog19 are selected for the positive and negative lenses, respectively. The optical glasses used in the lens to be designed are identical to those of a commercial achromatic lens (from Edmund Optics, Stock No. #47-643).20

First, we derive the formula for the optical power variation of a single thin lens with its refractive index. By taking the differential of the Lensmaker's equation [Eq. (2.23)], it is easy to obtain the optical power variation of the single thin lens with the refractive index:

$$d\phi = \Delta C\,dn,$$


where $\Delta C$ is the difference in curvatures between the two surfaces of the single thin lens, and $dn$ is the variation in refractive index of the thin lens optical material with wavelength. For the F and C lines, the optical power variation of the single lens is

$$d\phi = \Delta C\,(n_F - n_C),$$

where $n_F$ and $n_C$ are the refractive indices of the optical material for the F and C lines, respectively. Multiplying $d\phi = \Delta C(n_F - n_C)$ by $(n_d - 1)/(n_d - 1)$ and rearranging in terms of the definition of the Abbe number $V$ presented in Section 2.3, the above equation can be written as

$$d\phi = \Delta C\,(n_F - n_C) = \Delta C\,(n_d - 1)\,\frac{n_F - n_C}{n_d - 1} = \frac{\phi_d}{V}, \tag{2.34}$$

where $n_d$ is the refractive index of the material of the lens for the d line, and $\phi_d$ is the optical power of the lens for the d line. Thus, we see that the optical power difference of the lens for the F and C lines equals the optical power for the d line divided by the Abbe number. Next, we try to analyze the achromatic doublet. As the distance between the two parts of the doublet is zero (they are cemented together), based on Eq. (2.24), the power of the doublet is

$$\phi = \phi_p + \phi_n, \tag{2.35}$$

where $\phi_p$ and $\phi_n$ are the optical powers of the positive and negative lenses of the doublet for the d line, respectively. The difference $d\phi$ between the optical powers of the F and C lines for the doublet can be determined by taking the differential of Eq. (2.35) as follows:

$$d\phi = d\phi_p + d\phi_n. \tag{2.36}$$

Using Eq. (2.34) for the positive and negative lenses, we have the following expression:

$$d\phi = \frac{\phi_p}{V_p} + \frac{\phi_n}{V_n}, \tag{2.37}$$

where $V_p$ and $V_n$ are the Abbe numbers of the optical glasses of the positive and negative lenses, respectively. Furthermore, by setting $d\phi = 0$, the foci of the F and C lines are forced to meet on the optical axis. Thus, we obtain the following equation:


$$\frac{\phi_p}{V_p} + \frac{\phi_n}{V_n} = 0. \tag{2.38}$$

By means of Eqs. (2.35) and (2.38), the optical powers of the positive and negative lenses of the doublet can be, respectively, calculated by

$$
\begin{aligned}
\phi_p &= \phi\,\frac{V_p}{V_p - V_n}, \\
\phi_n &= -\phi\,\frac{V_n}{V_p - V_n}.
\end{aligned} \tag{2.39}
$$

According to Ref. 19, the Abbe numbers for N-BK7 and N-SF5, i.e., $V_p$ and $V_n$, are 64.17 and 32.25, respectively. Substituting the values of the two Abbe numbers and the optical power of the doublet (the reciprocal of the EFL) into Eq. (2.39), the numerical values of the optical powers of the positive and negative lenses can be calculated as $\phi_p = 0.0134022556\ \mathrm{mm}^{-1}$ and $\phi_n = -0.0067355890\ \mathrm{mm}^{-1}$, respectively. To give an appropriate initial configuration for the doublet under design (Fig. 2.37), some relations for the curvatures $C_1$, $C_2$, $C_3$, and $C_4$ are tentatively given. First, for simplicity, the absolute curvatures of the two surfaces of the positive lens are set to be the same, i.e., $C_1 = -C_2$; second, the first surface of the negative lens is set to have the same curvature as that of the second surface of the positive lens ($C_2 = C_3$) since these two surfaces are cemented together. According to the Lensmaker's equation [Eq. (2.23)], the differences between the curvatures of the positive and negative lenses can be, respectively, calculated by

Figure 2.37 Diagram of an achromatic doublet under design.


$$
\begin{aligned}
C_1 - C_2 &= \frac{\phi_p}{n_d - 1}, \\
C_3 - C_4 &= \frac{\phi_n}{n'_d - 1},
\end{aligned} \tag{2.40}
$$

where $n_d$ and $n'_d$ are the refractive indices of glass N-BK7 and N-SF5 for the d line, respectively. The refractive indices at the d line are used because the d line acts as the reference light for correcting chromatic aberrations of the doublet. According to Ref. 19, the refractive indices of N-BK7 and N-SF5 for the d line, i.e., $n_d$ and $n'_d$, are 1.51680 and 1.67271, respectively. Substituting $n_d$ and $n'_d$ into Eq. (2.40), and using the relations $C_1 = -C_2$ and $C_2 = C_3$, the curvature of each surface of the doublet can be given by $C_1 = -C_2 = -C_3 = 0.0129665786\ \mathrm{mm}^{-1}$ and $C_4 = -0.0029539595\ \mathrm{mm}^{-1}$. Thus, the radius of curvature of each surface, i.e., the reciprocal of the curvature, can be given. Furthermore, the initial thicknesses of the two parts of the doublet are set to 0 mm, and the initial configuration of the doublet is determined as shown in Table 2.2. This initial configuration of the doublet has an EFL of 150 mm and an axial chromatic aberration of 3.764 × 10⁻⁴ mm.

The second step, optimization of the initial configuration, can be performed by professional optical design software, such as Zemax or Code V. In this example, Zemax is used to optimize the initial configuration of the doublet; discussion on optimization in Zemax can be found in Refs. 8 and 16. In Zemax, the radius of each surface and the thicknesses of the doublet to be designed are all set as variables, and Zemax's default merit function is used. Moreover, two additional constraints, the EFL for d light being 150 mm and the axial chromatic aberration for the F and C lines being zero, are added into the default merit function. When using the Zemax merit function, the center thicknesses of the glasses are limited to between 2 mm and 8 mm, and the edge thicknesses are limited to not less than 2 mm. During optimization, first, the radius of each surface of the doublet is set as a variable, and the two thicknesses are fixed; then, the radius of each surface of the doublet is fixed, and the two thicknesses are set as variables. After this process is repeated three times, good design results (Table 2.3) can be obtained. The achromatic doublet with this configuration has an EFL of 150.001 mm and an axial chromatic aberration of 60.55 μm, as shown in Fig. 2.38.

Table 2.2 Initial configuration for the achromatic doublet under design.

Surface    Radius (mm)    Glass    Thickness (mm)    Diameter (mm)
1          77.1213        N-BK7    0                 25
2          –77.1213       N-SF5    0                 25
3          –338.5287      —        —                 25
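The first-order numbers above can be reproduced in a few lines. The following sketch (not code from the book) simply evaluates Eqs. (2.39) and (2.40) with the catalog values quoted in the text; surface 3 of Table 2.2 corresponds to the curvature $C_4$ of the formulas.

```python
# Reproduce the initial doublet configuration of Table 2.2 from Eqs. (2.39)-(2.40).
f_efl = 150.0                    # required EFL for the d line (mm)
phi = 1.0 / f_efl                # total optical power (mm^-1)

Vp, Vn = 64.17, 32.25            # Abbe numbers of N-BK7 and N-SF5 (Ref. 19)
nd, nd_prime = 1.51680, 1.67271  # d-line refractive indices (Ref. 19)

phi_p = phi * Vp / (Vp - Vn)     # Eq. (2.39), positive element
phi_n = -phi * Vn / (Vp - Vn)    # Eq. (2.39), negative element

C1 = phi_p / (nd - 1.0) / 2.0    # Eq. (2.40) with C1 = -C2, so C1 - C2 = 2*C1
C2 = -C1
C3 = C2                          # cemented interface
C4 = C3 - phi_n / (nd_prime - 1.0)

for name, c in (("R1", C1), ("R2", C2), ("R3", C4)):
    print(name, round(1.0 / c, 4), "mm")
# -> R1 = 77.1213 mm, R2 = -77.1213 mm, R3 = -338.5287 mm, as in Table 2.2
```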


Table 2.3 Optimized configuration for the achromatic doublet.

Surface    Radius (mm)    Glass    Thickness (mm)    Diameter (mm)
1          91.994         N-BK7    4.203             25
2          –66.658        N-SF5    2                 25
3          –196.808       —        —                 25

Figure 2.38 Axial chromatic focal shift of the achromatic doublet with an optimized configuration.

To ensure that the achromatic doublet can actually be manufactured, test-plate fitting is performed on the optimized configuration shown in Table 2.3. After performing test-plate fitting on the optimized configuration, the configuration of the doublet is changed, as shown in Table 2.4. The curvatures of the surfaces in this configuration are slightly different from those in the optimized configuration. Because some parameters are changed after the test-plate fitting, the EFL as well as the axial chromatic aberration vary slightly. The EFL of the doublet becomes 149.733 mm, and the axial chromatic aberration becomes 88 μm. Note that if the EFL and the axial chromatic aberration cannot satisfy the requirements of the end users, the configuration after test-plate fitting should be further optimized.


Table 2.4 Configuration of the achromatic doublet after test-plate fitting.

Surface    Radius (mm)    Glass    Thickness (mm)    Diameter (mm)
1          92.052         N-BK7    4.203             25
2          –67.158        N-SF5    2                 25
3          –196.901       —        —                 25

The residual chromatic focal shift (the focal shift between the F, C, and d lines) shown in Fig. 2.38 is caused by the secondary spectrum (defined in Section 2.6), which can be further corrected by a triplet—a compound lens consisting of three lenses made of different dispersion glasses. For the configuration of the designed doublet shown in Table 2.4, the root mean square (RMS) of the wavefront aberrations at fields of view of 0, 0.707, and 1 deg for the d line are 0.0693, 0.1238, and 0.1805 waves, respectively, and the wavefront for the field of view of 0 deg for the d line is shown in Fig. 2.39. The modulation transfer function (to be presented in Chapter 3) of the doublet is shown in Fig. 2.40. The value of the modulation transfer function (MTF) stays above 0.4 for spatial frequencies lower than 65 lines/mm over the 1-deg field of view.
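For context (a standard diffraction-limit estimate, not a number taken from this design), the incoherent cutoff frequency of an aberration-free F/6 system at the d line is roughly 1/(λ · F/#), so the 65 lines/mm figure sits well below the theoretical ceiling:

```python
# Rough diffraction-limited cutoff of an F/6 system at the d line, using the
# standard incoherent-imaging estimate 1/(lambda * F#) (discussed with the MTF in Chapter 3).
wavelength_mm = 587.6e-6   # d line expressed in mm
f_number = 6.0
print(1.0 / (wavelength_mm * f_number))   # ~284 cycles/mm
```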

Figure 2.39 Wavefront of the doublet with a field of view of 0 deg for the d line.


Figure 2.40 Modulation transfer function of the designed doublet.

Finally, tolerance analysis should be performed to ensure that manufacturers following this design can fabricate the lens. The tolerance analysis module is embedded in most optical design software, so this content is not presented in this example.

References

1. R. P. Feynman, R. B. Leighton, and M. Sands, The Feynman Lectures on Physics, Volume III, Commemorative Issue Edition, Addison-Wesley, Boston (1989).
2. M. Born and E. Wolf, Principles of Optics, Seventh Edition, Cambridge University Press, Cambridge (1999).
3. W. T. Welford, Aberrations of Optical Systems, Adam Hilger, Bristol (1991).
4. W. J. Smith, Modern Optical Engineering, Fourth Edition, McGraw-Hill, New York (2008).
5. J. Mertz, Introduction to Optical Microscopy, Roberts and Company Publishers, Greenwood Village, Colorado, pp. 88–95 (2010).


6. R. Kingslake and R. B. Johnson, Lens Design Fundamentals, Second Edition, Academic Press, Burlington, Massachusetts and SPIE Press, Bellingham, Washington (2010).
7. Y. G. Soskind, Field Guide to Diffractive Optics, SPIE Press, Bellingham, Washington (2011) [doi: 10.1117/3.895041].
8. R. R. Shannon, The Art and Science of Optical Design, Cambridge University Press, Cambridge (1997).
9. J. Bentley and C. Olson, Field Guide to Lens Design, SPIE Press, Bellingham, Washington (2012) [doi: 10.1117/3.933997].
10. J. E. Greivenkamp, Field Guide to Geometrical Optics, SPIE Press, Bellingham, Washington (2004) [doi: 10.1117/3.547461].
11. M. W. Farn and W. B. Veldkamp, "Binary Optics," in Handbook of Optics I, Third Edition, Michael Bass, Ed., McGraw-Hill, New York, pp. 23.1–23.17 (2010).
12. T. Stone and N. George, "Hybrid diffractive-refractive lenses and achromats," Applied Optics 27(14), 2960–2971 (1988).
13. G. J. Swanson and W. B. Veldkamp, "Diffractive optical elements for use in infrared systems," Optical Engineering 28(6), 605–608 (1989) [doi: 10.1117/12.7977008].
14. R. E. Fischer, B. Tadic-Galeb, and P. R. Yoder, Optical System Design, Second Edition, McGraw-Hill, New York and SPIE Press, Bellingham, Washington (2008).
15. M. Laikin, Lens Design, Fourth Edition, CRC Press, Boca Raton, Florida (2007).
16. J. M. Geary, Introduction to Lens Design with Practical ZEMAX Examples, Willmann-Bell, Inc., Richmond (2002).
17. M. J. Riedl, Optical Design: Applying the Fundamentals, SPIE Press, Bellingham, Washington (2009) [doi: 10.1117/3.835815].
18. M. J. Riedl, Optics for Technicians, SPIE Press, Bellingham, Washington (2015) [doi: 10.1117/3.2197595].
19. SCHOTT Optical Glass Data Sheets: http://www.us.schott.com/advanced_optics/english/download/schott-optical-glass-collection-datasheets-july-2015us.pdf.
20. Edmund Optics: http://www.edmundoptics.com/optics/optical-lenses/achromatic-lenses/vis-0-coated-achromatic-lenses/47643/.

Chapter 3

Wave Optics

Wave optics, which deals with light as electromagnetic waves, is an important branch of optics. The main optical phenomena discussed in wave optics include diffraction, interference, and polarization of light. This chapter focuses on the basic knowledge of wave optics and its applications. First, the electromagnetic theory of optics is briefly introduced. Secondly, theories and examples of the diffraction and interference of light are given and explained. Next, a brief introduction of Fourier optics is presented. Then, wavefront aberration is introduced. Finally, the theoretical resolution limits of an optical imaging system are provided.

3.1 Electromagnetic Theory of Optics

As illustrated in Chapter 1, light can be thought of as a kind of electromagnetic wave, and many optical phenomena can be explained by the classical electromagnetic theory. This section describes the basic electromagnetic theory of optics.

3.1.1 Maxwell's equations

Maxwell's equations describe the relationships between the electric and magnetic fields. In the international MKS system of units (MKS indicating meter, kilogram, second), the differential form of Maxwell's equations in continuous media is as follows:

$$\nabla \cdot \mathbf{D} = \rho, \tag{3.1a}$$

$$\nabla \cdot \mathbf{B} = 0, \tag{3.1b}$$

$$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}, \tag{3.1c}$$

$$\nabla \times \mathbf{H} = \mathbf{j} + \frac{\partial \mathbf{D}}{\partial t}, \tag{3.1d}$$

where $\nabla\cdot$ is the divergence operator, $\nabla\times$ is the curl operator (both operators act only on a vector), $\mathbf{D}$ is the electric displacement, $\mathbf{B}$ is the magnetic induction, $\mathbf{E}$ is the electric vector (or electric field), $\mathbf{H}$ is the magnetic vector (or magnetic field), $\rho$ is the density of charge in media (or the electric charge density), and $\mathbf{j}$ is the density of current (or the electric current density). The four equations can be summarized as follows:

• Equation (3.1a), known as Gauss's law for electric fields, reveals the relationship between the distribution of electric charge and the corresponding resulting electric field; i.e., the electric field generated by charges is divergent from or convergent to them depending on their signs of charges. This equation also defines the electric charge density ρ.
• Equation (3.1b), known as Gauss's law for magnetism, implies that there are no free magnetic monopoles, such that the total magnetic flux (the integral of the normal component of the magnetic field over the surface through which this magnetic field passes) through a closed surface is zero.
• Equation (3.1c), known as Faraday's law of induction (or the Maxwell–Faraday equation), states that a time-varying magnetic field is always accompanied by a whirling, spatially varying electric field.
• Equation (3.1d), known as Ampère's law with Maxwell's displacement current addition, implies that both the conventional current and a time-varying electric field are always accompanied by a whirling, spatially varying magnetic field.

To uniquely determine field vectors in a medium from a given distribution of currents and charges using Maxwell's equations, material equations (or constitutive relations) should be employed. In general, material equations, which describe the behavior of substances under the influences of the electromagnetic field on the substances, are rather complicated. For the sake of simplicity, it is assumed that the bodies are motionless or in very slow motion relative to each other, and are isotropic media, in which case the material equations can be expressed as1

$$
\begin{cases}
\mathbf{j} = \sigma\mathbf{E} \\
\mathbf{D} = \varepsilon\mathbf{E} \\
\mathbf{B} = \mu\mathbf{H}
\end{cases} \tag{3.2}
$$

where $\sigma$ is the electric conductivity, $\varepsilon$ is the dielectric constant (or permittivity), and $\mu$ is the magnetic permeability.


In principle, Maxwell's equations combined with material equations are capable of handling many optical problems. However, solving these problems using Maxwell's equations and material equations can be very complicated due to the complexity of actual situations. Fortunately, most optical problems can be simplified by some reasonable approximations, and their solutions can be simultaneously simplified. The remainder of this chapter mainly concerns the use of Maxwell's equations and various approximations to solve particular optical problems.

3.1.2 Wave equations

Based on the descriptions of Maxwell's equations provided above, it can be concluded that a temporally varying magnetic field generates a spatially and temporally varying electric field; similarly, a temporally varying electric field generates a spatially and temporally varying magnetic field. It is the alternation of these excitations between electric and magnetic fields that causes electromagnetic waves to propagate in space.

3.1.2.1 Vector wave equation

Next, the existence of electromagnetic waves can be predicted based on Maxwell's equations. For an infinite, homogeneous, and isotropic optical medium, $\varepsilon$ and $\mu$ are constants, and $\sigma$ is negligibly small. If the electromagnetic fields are far from the radiation source, $\rho$ and $\mathbf{j}$ equal zero in space, and the previous assumptions are feasible for most optical problems. In this case, Maxwell's Eqs. (3.1) can be further simplified to

$$
\begin{cases}
\nabla \cdot \mathbf{E} = 0 \\
\nabla \cdot \mathbf{B} = 0 \\
\nabla \times \mathbf{E} = -\dfrac{\partial \mathbf{B}}{\partial t} \\
\nabla \times \mathbf{B} = \varepsilon\mu\,\dfrac{\partial \mathbf{E}}{\partial t}
\end{cases}
$$

Applying the vector operator $\nabla\times$ to the latter two equations of the set above, using the identity $\nabla\times(\nabla\times\mathbf{V}) = \nabla(\nabla\cdot\mathbf{V}) - \nabla^2\mathbf{V}$ (where $\mathbf{V}$ is a vector function, and $\nabla$ is the gradient), and considering the first two equations, it is easy to obtain

$$
\begin{cases}
\nabla^2\mathbf{E} - \dfrac{1}{v^2}\dfrac{\partial^2\mathbf{E}}{\partial t^2} = 0 \\
\nabla^2\mathbf{B} - \dfrac{1}{v^2}\dfrac{\partial^2\mathbf{B}}{\partial t^2} = 0
\end{cases} \tag{3.3}
$$

where $v = 1/\sqrt{\varepsilon\mu}$, and


$$\nabla^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$$

is the Laplacian. Equations (3.3) are standard vector wave equations that relate the second spatial and temporal derivatives of the electric and magnetic fields. These equations further state that electric and magnetic fields have the characteristics of waves and propagate in the form of waves. Equations (3.3) also imply that the propagation speed of electromagnetic waves is $v$. When electromagnetic waves propagate in vacuum, the propagation speed is

$$c = \frac{1}{\sqrt{\varepsilon_0 \mu_0}}, \tag{3.4}$$

where $\varepsilon_0$ and $\mu_0$ are the electric constant in vacuum (or electric permittivity) and the magnetic constant in vacuum (or magnetic permeability), respectively. As $\varepsilon_0 = 8.8542 \times 10^{-12}\ \mathrm{C^2N^{-1}m^{-2}}$ and $\mu_0 = 4\pi \times 10^{-7}\ \mathrm{N\,s^2C^{-2}}$, the propagation speed of electromagnetic waves in vacuum is $2.99794 \times 10^8\ \mathrm{m/s}$, or approximately $3 \times 10^8\ \mathrm{m/s}$. This value is consistent with the propagation speed of light in vacuum, measured by experiment. From this fact, Maxwell inferred that light is a kind of electromagnetic wave.
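As a quick check of Eq. (3.4) (a sketch, not from the book), the quoted constants reproduce the measured speed of light:

```python
import math

eps0 = 8.8542e-12          # C^2 N^-1 m^-2
mu0 = 4.0e-7 * math.pi     # N s^2 C^-2
c = 1.0 / math.sqrt(eps0 * mu0)
print(f"{c:.5e} m/s")      # ~2.9979e8 m/s, consistent with Eq. (3.4)
```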

3.1.2.2 Scalar wave equation

As shown above, since the electric field obeys the vector wave equation [Eq. (3.3)], it can be seen that all components of the electric field satisfy a scalar wave equation. For example, for the x component of the electric field, denoted as $u$, its scalar wave equation is

$$\nabla^2 u - \frac{1}{v^2}\frac{\partial^2 u}{\partial t^2} = 0. \tag{3.5}$$

In this way, properties of all components of the electric field can be described using a single scalar wave equation—Eq. (3.5). Note that the scalar wave equation is an approximation and is only valid in a dielectric medium that is linear, isotropic, homogeneous, nonmagnetic, and nondispersive. The media used in most optical imaging systems satisfy these properties, making the scalar wave equation a highly accurate approximation. Therefore, the propagation of light will be discussed based on a scalar wave equation.
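A short symbolic check (a sketch, not from the book) confirms that a harmonic wave of the form A cos(kx - ωt - δ), introduced formally in subsection 3.1.3, satisfies Eq. (3.5) when the speed is v = ω/k:

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
A, k, w, d = sp.symbols('A k omega delta', positive=True)

u = A * sp.cos(k * x - w * t - d)       # 1D harmonic wave, cf. Eq. (3.7a)
v = w / k                               # phase velocity
residual = sp.diff(u, x, 2) - sp.diff(u, t, 2) / v**2
print(sp.simplify(residual))            # prints 0: u satisfies Eq. (3.5)
```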

3.1.2.2.1 The Helmholtz equation

Let the electric field of light at position P and time t be represented by a scalar function u(P, t). For a monochromatic wave, its scalar field can be written explicitly as


$$u(P, t) = U(P)\,e^{-i\omega t},$$

where $U(P)$ is the space-dependent part of the scalar field, and $\omega$ is the angular frequency. Substituting this expression into Eq. (3.5), and considering the relationships $\omega = 2\pi\nu$ and $v = \nu\lambda$ (to be described in subsection 3.1.3), a time-independent equation, known as the Helmholtz equation, can be given as

$$\nabla^2 U + k^2 U = 0, \tag{3.6}$$

where $k = 2\pi/\lambda$ is known as the wavenumber. Since the time dependence of the light wave is known a priori, the complex function $U$ serves as an adequate description of light. Equation (3.6) is a starting point for obtaining the Rayleigh–Sommerfeld diffraction formula that will be presented in Section 3.2.

3.1.3 Light waves and their characteristics

Expressions (3.3) and (3.5) represent the vector and scalar wave equations of electromagnetic waves, respectively. There are different types of waves and many different solutions of wave equations according to the theory of differential equations. In particular, two important types of waves are plane waves and spherical waves. For the most part, in this section the scalar wave equation as it was explained in subsection 3.1.2 is used. Therefore, the expressions for plane and spherical waves presented here are based on the scalar wave equation [Eq. (3.5)]. However, the vector expression of plane waves based on the vector wave equation [Eq. (3.3)] is also provided for the purpose of discussing characteristics related to the directions of an electric or magnetic field.

3.1.3.1 Plane waves

One of the typical solutions of the wave equation is a plane wave. The general expression for a plane wave is $u(\mathbf{r}, t) = u(\mathbf{r} - \mathbf{v}t, 0)$, where $\mathbf{r}$ is a position vector, and $t$ is time. This expression signifies that the propagation of a plane wave is a "self-copy" of the oscillation of the source that travels through space and time. The concrete expression of a plane wave is the harmonic wave, which can be mathematically modeled by a cosine function,2

$$u(\mathbf{r}, t) = A\cos(\mathbf{k}\cdot\mathbf{r} - \omega t - \delta), \tag{3.7a}$$

or by

$$u(\mathbf{r}, t) = A\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t - \delta)}, \tag{3.7b}$$

which uses complex-exponential notation. In these two expressions, $A$ is the amplitude of the wave, $\mathbf{k}$ is the wave vector (the direction of $\mathbf{k}$ is the direction of propagation of light), $\omega$ is the angular frequency, and $\delta$ is an initial phase


measured in radians. In particular, the vector form of a plane harmonic wave is a solution of Eq. (3.3), as follows:

$$\mathbf{E}(\mathbf{r}, t) = \mathbf{A}\,e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t - \delta)}, \tag{3.8}$$

where $\mathbf{A}$ is the vector amplitude of the wave, and its direction is called the direction of polarization of light. Polarization refers to the geometrical orientation of the field oscillation of an electromagnetic wave. There are a variety of states of polarization of light: linearly polarized light, right-hand and left-hand circularly polarized light, right-hand and left-hand elliptically polarized light, and unpolarized light. Harmonic waves are important because they are the basic building blocks for almost all types of waves. The sum (or integral) of a number of harmonic waves with different frequencies can form most waveforms according to Fourier synthesis, which will be presented in Section 3.4. It should be noted that there is no ideal plane wave in nature, although plane waves are useful in academic research and engineering.
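The Fourier-synthesis remark can be made concrete with a short numerical sketch (not from the book): summing odd harmonic waves builds an increasingly sharp square wave.

```python
import numpy as np

t = np.linspace(0.0, 2.0, 2000)      # two periods of a 1-Hz waveform
u = np.zeros_like(t)
for n in range(1, 40, 2):            # odd harmonics 1, 3, 5, ...
    u += (4.0 / (np.pi * n)) * np.sin(2.0 * np.pi * n * t)
# 'u' now approximates a unit square wave; adding more harmonics sharpens the edges.
print(u.min(), u.max())
```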

3.1.3.2 Spherical waves

Another typical solution of the wave equation is a spherical wave, which can be expressed as

$$u(r, t) = \frac{A}{r}\,e^{i(kr - \omega t - \delta)}, \tag{3.9}$$

where r is the magnitude of vector r. Equation (3.9) represents a spherical wave diverging from its origin. The spherical wave is often used to describe a light wave emitted from a point source as well as a fundamental wave in Huygens’ principle. However, when a light wave far from the source is considered, the influence of r on the spherical wave is very small and applies only in a limited range of interest, and the light wave in this limited range would be considered as a plane wave. Detailed definitions of all quantities included in Eqs. (3.7) to (3.9) will be given in the next subsection on characteristics of light waves. All electromagnetic waves have two time-varying fields. One is the electric field, and the other, perpendicular to the electric field in an isotropic medium, is the magnetic field. The expression for the magnetic field is similar to that of the electric field. Generally, only the electric field is retained when considering a light field, mainly because its interaction with a medium is, in most cases, much stronger than that of the magnetic field. From the expressions that describe the electric and magnetic fields and Maxwell’s equations, the following three properties of light in an isotropic and unbounded medium can be obtained: (1) Light is a kind of transverse wave, as the directions of the electric and magnetic fields are both perpendicular to the

Wave Optics

83

direction of propagation of light. (2) The electric vector E, magnetic vector H, and wave vector k of light form a right-handed orthogonal triad of vectors. (3) The electric field E and the magnetic field H of light have the same phase, pffiffiffi pffiffiffiffi and their magnitudes follow the relation mH ¼ εE. Here, a letter in italics refers to the magnitude of a vector, and that vector is represented by the corresponding letter in bold. The remainder of this book follows this convention. Electromagnetic theory interprets light intensity as the energy flux of the electromagnetic field. The energy flux can be expressed as a vector (Poynting vector, S ¼ E  H) whose direction represents the direction of the transfer of energy, and its magnitude is defined as1 rffiffiffiffi rffiffiffiffi ε 2 m 2 E ¼ H : S ¼ EH ¼ m ε Generally speaking, light is detected by a sensor or a detector that responds in a certain format to the incident light by converting it to some other measurable attributes. Because the frequency of light is quite high (about 1015 Hz), no detector can sense the instantaneous value of the energy flux of light. Thus, the intensity of light is usually expressed as a time average of the energy flux by T0 T0 rffiffiffiffi 1 ε 2 1 I¼ ∫ Sdt ¼ mA T ∫ cos2ðk · r  vt  dÞdt: T0 0 0 0 pffiffiffiffiffiffiffiffi Setting C 0 ¼ ε∕m, and considering that T0 (the response time of the detector) is much greater than the period of the light, the expression of light intensity becomes

1 I ¼ C 0 A2 : 2

(3.10)

Equation (3.10) indicates that the intensity of light is proportional to the square of the amplitude of the electric field A. Next, some characteristics of light waves are briefly introduced. 3.1.3.3 Characteristics of light waves

A light wave is usually described by wave characteristics such as wavelength, frequency, period, phase, and amplitude. The wavelength of light l is the distance between one peak (or crest) of a wave and the next peak, as shown in Fig. 3.1. The magnitude of the wave vector k in Eq. (3.7) is 2p/l. The frequency of light is the number of oscillation cycles of the electric field (or magnetic field) per second in a fixed position, usually expressed as n. The product of the wavelength l and the frequency n is the speed of light, y ¼ nl. In practice, the angular frequency v is often used, and v ¼ 2pn. The reciprocal of the frequency is called the period, generally denoted as T. In Eq. (3.7), the

84

Chapter 3

Figure 3.1 Diagram of a simple electromagnetic wave.

argument k ∙ r – vt – d of the cosine term is called the phase. In actuality, the absolute phase of light is meaningless. Only the phase relative to a reference point, a reference surface, or the phase difference has physical significances. A surface perpendicular to the direction of the propagation of light, on which each point has the same phase, is called a wavefront. Figure 3.2 shows an ideal wavefront and a distorted wavefront. Figure 3.3 shows the spectrum—in a sequence of wavelengths or frequencies of radiation—of electromagnetic radiation waves, including radio waves, microwaves, infrared radiation, visible light, ultraviolet radiation, x rays, and g rays. The wavelengths of these waves range from several picometers to several meters. In this spectrum of electromagnetic waves, visible light is a very narrow band in the entire electromagnetic radiation spectrum, and its

Figure 3.2

Diagrams of (a) an ideal wavefront and (b) a distorted wavefront.

Wave Optics

85

Figure 3.3

Spectrum of electromagnetic radiation waves.

wavelengths range from 400 nm to 760 nm, corresponding to colors of light ranging from violet to red.

3.2 Diffraction In the previous section, we briefly reviewed Maxwell’s equations and the related wave equations, with which the propagation of light can be solved mathematically. In most common imaging geometries, light travels along the z axis from one plane to another, e.g., from plane z1 to plane z2, or from the exit pupil to the image plane in an optical system, as shown in Fig. 3.4. This describes the phenomena of the nonrectilinear propagation of light when it encounters an obstacle or passes through an aperture, known as diffraction of

Figure 3.4

Light propagation from one plane z1 to another plane z2.

86

Chapter 3

light. Because an understanding of the diffraction of light and its limitations is crucial for appreciating the optical properties involved in optical imaging, a brief introduction to scalar diffraction theory is presented. 3.2.1 Rayleigh–Sommerfeld diffraction formula In this subsection, the Rayleigh–Sommerfeld diffraction formula is briefly reviewed. The diffraction formula explains how to calculate the light field at plane z2 propagating from the light field at plane z1, as shown in Fig. 3.5, or, more specifically, how to calculate the light field at an arbitrary point P on plane z2. The Rayleigh–Sommerfeld diffraction formula is obtained by solving the Helmholtz equation [Eq. (3.6)] under the following assumptions by means of the theory of Green’s functions. The assumptions (or boundary conditions) are: 1. The field inside the aperture is exactly the same as it would be in the absence of an aperture. 2. The field outside the aperture vanishes. In addition, the Sommerfeld radiation condition further suggests that the solution of the diffraction problem can be given as UðPÞ ¼

1 eikrz ∫∫ UðP1 Þ cosðn, rz Þds, rz il S

(3.11)

where U(P1) is the light disturbance at point P1 on the aperture, rz is the distance between point P1 on plane z1 and point P on plane z2, and cos(n, rz)

Figure 3.5

Illustration of the Rayleigh–Sommerfeld diffraction formula.

Wave Optics

87

represents the cosine of the angle between the outward normal n at point P1 and the vector from P to P1. The integration is taken over the clear part of the aperture denoted by S. A detailed derivation of the Rayleigh–Sommerfeld solution [Eq. (3.11)] can be found in Ref. 3. We should bear in mind that the Rayleigh–Sommerfeld formula is an approximation of the diffraction of light in the theory of scalar diffraction. For most ordinary optical imaging systems, fortunately, the results deduced from the scalar diffraction theory of Rayleigh–Sommerfeld are in good agreement with the experimental reults if two conditions are satisfied: (1) the diffracting aperture is large compared with the wavelength l of light, and (2) the distance between the diffracting field observed and the aperture is much larger than the wavelength, i.e., rz ≫ l. Comparing Eq. (3.11) to the expression for the spherical wave [Eq. (3.9)], we can roughly give a “quasi-physical” interpretation of the Eq. (3.11). The main content of this interpretation is known as the Huygens–Fresnel principle. The integrand in Eq. (3.11) can be considered as the product of a spherical wave U(P1)[exp(ikrz)/rz] and an inclination factor cos(n, rz). This spherical wave originates from point P1 on the aperture, and U(P1) is the light field on that point. Thus, the interpretation can be stated as follows: Each point on the primary wavefront of light can be considered as a new source of a secondary spherical disturbance. The disturbance on the wavefront at a later instant is the coherent superposition of all of the secondary wavelets, and the wavefront at a later instant can be found by constructing the envelope of these secondary wavelets, as shown in Fig. 3.6. It can be clearly seen that the Huygens–Fresnel principle is simply the logical consequences of the electromagnetic wave nature of light. It satisfies the principle of linear superposition because of the linearity of Maxwell’s equations.

Figure 3.6

Illustration of the Huygens–Fresnel principle.

88

Chapter 3

The Rayleigh–Sommerfeld formula gives quite accurate results for diffraction problems, but the calculations of Eq. (3.11) are very complex in most instances. In practice, some approximations—the Fresnel and Fraunhofer diffraction formulas derived from the Rayleigh–Sommerfeld solution—are commonly used to process diffraction problems and will be described below. It should be noted that the Fresnel and Fraunhofer diffraction formulas are obtained from the Rayleigh–Sommerfeld formula under the paraxial approximation, so they are also called the paraxial approximations of scalar diffraction theory. 3.2.2 Fresnel approximation This subsection uses the Rayleigh–Sommerfeld diffraction integral [Eq. (3.11)] to introduce the Fresnel approximation. As shown in Fig. 3.7, the aperture lies in the x1-y1 plane and is illuminated by light coming from its left side. The light field in the x-y plane that is parallel to the x1-y1 plane will be calculated. The distance from the x1-y1 plane to the x-y plane is z. According to the Rayleigh–Sommerfeld diffraction formula, the light field can be expressed as UðPÞ ¼

1 expðikrz Þ ∫∫ UðP1 Þ cosuds, il S rz

where point P lies in the x-y plane, and point P1 lies in the x1-y1 plane; l is the pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi wavelength; k is the wavenumber; rz ¼ z2 þ ðx  x1 Þ2 þ ðy  y1 Þ2 , which is the distance between point P and point P1; u is the angle between the normal line at point P1 and vector rz; and cosu ¼ z/rz. Therefore, the above equation representing the light field can be rewritten as

Figure 3.7

Geometry of diffraction by an aperture.

Wave Optics

89

UðPÞ ¼

z expðikrz Þ ∫∫ Uðx1 ,y1 Þ dx1 dy1 : il S r2z

(3.12)

To calculate the integral of Eq. (3.12), two different approximations are taken for the distance rz that appears in the denominator and numerator of the right side of Eq. (3.12). As the diameter of the aperture is usually much smaller than distance z, rz in the denominator is substituted by z, and the error introduced by this approximation is usually small. However, due to the small size of the wavelength, a tighter approximation must be made for rz in the exponent. Using the binomial expansion of the square root, the expression of rz can be rewritten as          1 x  x1 2 y  y1 2 1 x  x1 2 y  y1 2 2 þ þ rz ¼ z 1 þ  z z z z 2 8  þ · · · : (3.13) When z is sufficiently large, the contribution of the third term in Eq. (3.13) to the phase krz is much smaller than 1 rad, and rz in the exponent can be replaced by the first two terms in Eq. (3.13). Thus, the diffraction integral can be rewritten as Uðx, yÞ ¼

eikz k 2 2 ∫∫ Uðx1 , y1 Þei2z½ðxx1 Þ þðyy1 Þ  dx1 dy1 : ilz S

(3.14)

This is the Fresnel approximation or the Fresnel diffraction integral. In the Fresnel approximation, the spherical secondary wavelets of the Rayleigh– Sommerfeld diffraction formula are replaced by parabolic wavelets. This integral actually performs an operation, the so-called 2D convolution in mathematics (see Appendix G). The physical interpretation of this convolution can be explained as follows: The light field U(x,y) of point P on the x-y plane is a linear superposition of all of the contributions of the light field U(x1, y1) weighted by a propagation factor eikz i k ½ðxx1 Þ2 þðyy1 Þ2  e 2z ilz at every point in the aperture S on the x1-y1 plane. Since this propagation factor only depends on the lateral coordinate differences (x – x1, y – y1) of point P on the x-y plane relative to point P1 on the x1-y1 plane, it is said that Eq. (3.14) describes a space-invariant system. In optics, this space-invariant property is also called the isoplanatic condition. The convolution kernel

90

Chapter 3

eikz i k ðx2 þy2 Þ , e 2z ilz which is independent of the location of point P1 on the x1-y1 plane, is also called a coherent point spread function of an optical system. We can rewrite Eq. (3.14) as Uðx, yÞ ¼

i eikz i k ðx2 þy2 Þ h k 2 2 2p e 2z ∫∫ Uðx1 , y1 Þei2zðx1 þy1 Þ ei lz ðxx1 þyy1 Þ dx1 dy1 , ilz S

(3.15)

which is another form of the Fresnel diffraction integral that can be recognized as the Fourier transform of the product of the light field on the aperture and a quadratic phase exponential. The domain where the Fresnel approximation is valid is called the region of Fresnel diffraction. As previously mentioned, it can be deduced that the region of Fresnel diffraction must satisfy the inequality 2  p 3 2 2 z ≫ : (3.16) ðx  x1 Þ þ ðy  y1 Þ 4l max Note that this inequality is a very strict condition for the validity of Fresnel diffraction. We take a numerical example to estimate distance z if the inequality (3.16) is satisfied. For a circular aperture of size 20 mm, a circular observation region of size 20 mm, and a light wavelength of 0.55 mm, according to inequality (3.16), distance z should be much greater than 610 mm for the validity of Fresnel approximation. The accuracy of the Fresnel approximation is actually extremely good, even for a distance that is very close to the aperture—usually about 40 to 50 times (or more) the wavelength of light. In the following discussion, we will explain the reason for this conclusion according to the approach of Ref. 3. The accuracy of the Fresnel integral, expressed by Eq. (3.14), is determined by the errors caused by higher-order terms neglected in Eq. (3.13) that are larger than the first order when distance z is not large enough. A sufficient condition for the accuracy of the Fresnel approximation, as expressed in Eq. (3.16), is that it guarantees the maximum phase change induced by the neglected second-order term to be much less than 1 rad. However, this condition is not essential for ensuring the accuracy of the Fresnel approximation, while it is necessary that the phase change induced by higher-order terms does not significantly change the Fresnel integral. Considering the convolution [Eq. (3.14)], if the contributions to the Fresnel integral that come from points (x1, y1) (for which x1 ≈ x and y1 ≈ y) are dominant, the particular values of high-order terms in Eq. (3.13) are unimportant to the value of the Fresnel integral. To investigate this point, we expand the convolution kernel of Eq. (3.14),

Wave Optics

91

eikz i k ðx2 þy2 Þ , e 2z ilz into its real and imaginary parts using Euler’s formula:      1 i k ðx2 þy2 Þ 1 p 2 p 2 2 2 e 2z ðx þ y Þ þ i sin ðx þ y Þ , ¼ cos ilz ilz lz lz where the phasor eizk has been dropped by redefining a phase reference, and k is replaced by 2p/l. To analyze the cosine and sine functions of a quadratic phase, the plots of 1D functions of cos(pZ2) and sin(pZ2) are shown in Figs. 3.8(a) and (b),

Figure 3.8 Quadratic-phase function of (a) cosine and (b) sine.

92

Chapter 3

respectively. We can see from these figures that the two functions oscillate very rapidly outside the (approximate) range of –2 < Z < 2. It is easily concluded that the contributions of the convolutions of these two functions with another function that is smooth and slowly varying will be negligible outside the range of –2 < Z < 2, since the rapid oscillations of functions cos(pZ2) and sin(pZ2) lead to the same number of roughly equal positive and negative contributions to the integral. Therefore, the major contribution to the Fresnel integral comes from the range of –2 < Z < 2 due to the slow oscillations of cos(pZ2) and sin(pZ2) in this range. Similar to the above discussion in one dimension, for the convolution kernel of Eq. (3.14), the majority of the contribution to the Fresnel integral comes from points in a square on the x1-y1 plane. The square, centered at the pffiffiffiffiffi point (x1 ¼ x and y1 ¼ y), has a width of 4 lz (derived from expressions pZ2 ¼ (p/lz)x2 and Z ¼ 2). From this conclusion, the contribution of the light pffiffiffiffiffi field on the x1-y1 plane lies in a square aperture of size 4 lz, not in its entire physical aperture size. For this reason, the inequality [Eq. (3.16)] can be simplified as pffiffiffiffiffi pffiffiffiffiffi p z3 ≫ ½ð2 lzÞ2 þ ð2 lzÞ2 2 4l ¼ 16plz2 , i.e., z ≫ 16pl. For the same numerical example presented previously, the distance should satisfy z ≫ 27.64 mm, which is much closer to the aperture than that calculated by Eq. (3.16) using the entire physical aperture. Note that if the light field in the diffraction aperture on the x1-y1 plane is not a relatively smooth and slowly varying function, the above conclusion might not be valid. However, for a common optical system, the diffraction aperture generally does not contain fine structures, and the illumination on the aperture can be considered as relatively smooth and slowly varying. Thus, the above conclusion is generally valid for common optical systems. If distance z is allowed to approach zero, i.e., the observation points approach the diffraction aperture, the 2D quadratic-phase function [the convolution kernel of Eq. (3.14)] behaves as a d function (see the appendices), indicating a light field U(x, y) that is identical to that in the aperture, i.e., U(x1, y1). This means that a light field behind an aperture is just the projection of the light field in the aperture on the plane of observation, which implies that the prediction of geometrical optics for the light field is valid in this scenario. 3.2.3 Fraunhofer approximation In this subsection Fraunhofer diffraction will be discussed. In the region of Fresnel approximation, if distance z between the aperture and the diffraction plane becomes very large and satisfies the inequality

Wave Optics

93

z≫ k

pðx21 þ y21 Þmax , l

(3.17)

the quadratic phase factor ei2zðx1 þy1 Þ in Eq. (3.15) approximately equals 1 over the entire aperture—a region called the Fraunhofer diffraction region. In this region, the distribution on the diffraction plane can be calculated by k

Uðx, yÞ ¼

2

2

eikz ei2zðx þy Þ k ∫∫ Uðx1 , y1 Þei z ðxx1 þyy1 Þ dx1 dy1 : ilz S 2

2

(3.18)

This equation is called the Fraunhofer approximation or Fraunhofer diffraction integral. It is can be found that Eq. (3.18) is simply the product of the Fourier transform of the light field on the aperture and an extra phase factor. From a comparison of the conditions of the Fresnel and Fraunhofer diffraction regions [Eqs. (3.16) and (3.17)], we find that the distance required for validating the Fraunhofer approximation is much longer than that of the Fresnel approximation. Due to this fact, Fraunhofer diffraction is also known as far-field diffraction, and Fresnel diffraction is also known as near-field diffraction. For visible light, the conditions of Fraunhofer diffraction are very strict. For example, at a wavelength of 0.5 mm and with an aperture width of 20 mm, distance z must be much greater than 1200 m. Fortunately, if a positive lens, which has the Fourier-transforming property (which will be presented later), is properly situated behind the aperture, Fraunhofer diffraction patterns can be observed at a distance much closer than that implied by Eq. (3.17). Obviously, the Fraunhofer diffraction region is included in the Fresnel diffraction region, and the Fresnel diffraction integral can be used to calculate diffracted fields in the Fraunhofer region, but not vice versa. 3.2.4 Examples Here, several examples of Fraunhofer diffraction patterns for different apertures are discussed. For most detectors, the output signal is linearly proportional to the incident power of light, and the intensity of light is directly proportional to the power of light, as described in subsection 3.1.3. Thus, in the following subsection, the distributions of the intensity, i.e., the diffraction pattern, for different apertures will be calculated and presented. 3.2.4.1 Circular aperture

As shown in Fig 3.9, a circular aperture is illuminated by a beam of parallel monochromatic light with uniform amplitude. A positive lens is placed behind the aperture, and a flat screen or a detector for observing the diffraction pattern is placed at the focal plane of the lens.

94

Chapter 3

Figure 3.9

Illustration of Fraunhofer diffraction by an aperture.

Suppose that the radius of the circular aperture is a, the aperture is on the x1-y1 plane, and the center of the aperture is at the origin of the x1-y1 plane. Then, the amplitude transmittance function of the aperture is given by   r tðr1 Þ ¼ circ 1 , a where r1 is the radius coordinate in the plane of the aperture, r12 ¼ x12 þ y12, and circ(x) is the circular function, which equals 1 for x ≤ 1, and zeros for others. As the aperture is illuminated by a unit amplitude, normally incident, monochromatic plane wave, the field distribution across the aperture equals the transmittance function t(r1) according to the theory presented in subsection 3.2.2. Using Eq. (3.18) and changing the Cartesian coordinates into cylindrical coordinates, we can calculate the amplitude distribution of the Fraunhofer diffraction pattern at the flat screen. If the focal length of the lens is f, then z in Eq. (3.18) is replaced by f. Finally, the amplitude distribution of the Fraunhofer diffraction pattern for the circular aperture can be expressed by UðPÞ ¼ pa2 C

2J 1 ðkauÞ , kau

(3.19)

where P is a point on the flat screen, C ¼ eikf/(ilf ), u ¼ r/f is the angle between the direction of diffraction and the optical axis, and J1(∙) is the Bessel function of the first kind of order one. Details of the calculation of the Eq. (3.19) can be found in Ref. 1. The intensity distribution (diffraction pattern) can be written as

Wave Optics

95

 2J 1 ðkauÞ 2 I ðPÞ ¼ jUðPÞj ¼ I 0, kau 

2

where I0, being equal to p2a4C2, is the intensity at the center of the diffraction pattern. This intensity distribution is called the Airy pattern, as shown in Fig. 3.10. A bright spot at the center and several bright and dark rings around the central spot in the Airy pattern can be seen in Fig. 3.10. The central spot is much brighter than the surrounding rings. The brightness of the rings diminishes as the radii of rings increase. Figure 3.11 shows a cross section of the Airy pattern. Note that the radius of the central spot, which corresponds to the first zero point from the origin in Fig. 3.11, is kau ¼ 1.22p: Thus, the angular radius of the spot is l u ¼ 0.61 : a

(3.20)

This equation indicates that the angular radius of the Airy pattern is proportional to the wavelength and inversely proportional to the radius of the aperture. As the central spot has most energy of the Airy pattern (about 84%),

Figure 3.10 Airy pattern.

96

Chapter 3

Figure 3.11 Cross section of the Airy pattern.

the radius of the spot is usually taken as a measure of the Airy pattern. Furthermore, since most optical imaging systems have circular stops, their angular resolutions are determined by Eq. (3.20); this concept will be discussed in detail in Section 3.6. 3.2.4.2 Rectangular aperture

Consider a rectangular aperture with width 2a and height 2b. If the origin of the x1-y1 plane is at the center of the rectangle, and axes of the x1-y1 plane are parallel to the sides of the rectangle, and the amplitude transmittance function of the aperture is given by 

   x1 y tðx1 , y1 Þ ¼ rect rect 1 , 2a 2b where rect(x) is the rectangular function, which equals 0 for |x| > 0.5, 0.5 for |x| ¼ 0.5, and 1 for |x| < 0.5. And the Fraunhofer diffraction integral [Eq. (3.18)] becomes k

b

a

eikf ei2f ðx þy Þ Uðx, yÞ ¼ ∫ ∫ eikfðxx1þyy1 Þdx1 dy1 : ilf b a 2

2

Calculating the above integral, the amplitude distribution is

Wave Optics

97

UðPÞ ¼ 4abC

sinðkax∕f Þ sinðkby∕f Þ : kax∕f kby∕f

Hence, the intensity distribution is given by 

sinðkax∕f Þ I ðPÞ ¼ jUðPÞj ¼ kax∕f 2

2 

 sinðkby∕f Þ 2 I 0, kby∕f

where I0, equaling 16a2b2C2, is the intensity at the center of the diffraction pattern. Note that the size of the central portion is Dx  Dy. Here, 8 l > < Dx ¼ f a > : Dy ¼ l f : b

(3.21)

Figure 3.12 shows the diffraction pattern of a square aperture. Since the majority of the energy of the diffraction pattern lies in the central portion, the size of the central portion is normally used for measuring the effect of diffraction. Equation (3.21) shows that the size of the central portion is inversely proportional to the size of the aperture. Thus, the smaller the aperture the larger the diffraction pattern.

Figure 3.12

Fraunhofer diffraction pattern of a square aperture.

98

Chapter 3

3.2.4.3 Other aperture shapes

In practice, there may be some apertures with other shapes, and the diffraction patterns of these apertures are all different from. Figure 3.13 presents some typical apertures and their diffraction patterns. Some general rules concerning the relation between the shape of the aperture and its diffraction pattern are summarized below using the shapes in Fig. 13.3 as examples.

Figure 3.13 (left) Apertures with different shapes and (right) their corresponding diffraction patterns.

Wave Optics

99

1. The diffracted direction of light is perpendicular to the edge of a diffracting object. For example, if the aperture has a triangular shape, the diffracted direction of light is perpendicular to each side of the aperture, and the diffraction pattern will be a star shape with six spikes, as shown in Fig. 3.13(b). 2. If an aperture has a shape similar to another aperture (such as an ellipse and a circle), the diffraction pattern of the first aperture can be determined by extending or compressing the diffraction pattern of the typical one, as in the diffraction pattern of the ellipse aperture shown in Fig. 3.13(c). The rule is that “when the aperture is uniformly extended in a a:1 ratio in a particular direction, the Fraunhofer pattern contracts in a 1:a ratio, and the intensity in the new pattern is a2 times the intensity at the corresponding point of the original pattern.” A detailed derivation of this rule can be found in Ref. 1.

3.3 Interference The diffraction of light was discussed in Section 3.2. In this section, we discuss the interference of light, which is widely used for highly accurate optical testing in optical factories and laboratories. Interference is a phenomenon in which two or more light waves are superposed, and the resultant intensities in the regions of superposition are reinforced or attenuated. However, this does not mean that the superposition of any two arbitrary light waves can produce a ‘stationary’ interference pattern (fringe pattern). Actually, only the superposition of light waves correlated to each other can produce a stationary interference pattern. This correlation between light waves is also known as the coherence of light. In this section, the coherence of light is introduced, then some typical examples of the interference of light are given. 3.3.1 Coherence Coherence between light waves at different locations and different instants of time is the correlation between light waves and can be quantitatively measured by interference patterns. Since real light encompasses a large range of frequencies (i.e., the linewidth), the behavior of a real light wave appears to be a random property that is not fully predictable; e.g., its amplitude and phase are random.2 This randomness in light waves leads to different extents of the coherence of light waves. Let us denote a light wave with a range of frequencies as U(P, t) at location P and at time t. For a fixed P, the entire wave U(P, t) is just a function of t defined for all t. Such a defined wave is called a wave train passing through P. We define the coherence of two wave trains using the concept of correlation (see the appendices) between wave trains according to the following points:

100

Chapter 3

1. Two wave trains are coherent if they are fully correlated. 2. Two wave trains are incoherent if they are uncorrelated. 3. Two wave trains are partially coherent if they are partially correlated. Furthermore, the coherence between two wave trains can be expressed quantitatively by the normalized correlation function. Figure 3.14 shows two light wave trains, denoted as U1(P1, t) and U2(P2, t) at points P1 and P2, respectively. Suppose that these two wave trains propagate to point P without any attenuation. Then the light field at this point is UðP, tÞ ¼ U 1 ðP, t  t1 Þ þ U 2 ðP, t  t2 Þ, where t1 and t2 are the times needed for wave trains propagating from P1 to P and from P2 to P, respectively; U1(P, t – t1) is the light field at point P arriving from U1(P1, t); and U2(P, t – t2) is the light field at point P arriving from U2(P2,t). Then the intensity at P is I ðPÞ ¼ hUðP, tÞU  ðP, tÞi ¼ hU 1 ðP, t  t1 ÞU 1 ðP, t  t1 Þi þ hU 2 ðP, t  t2 ÞU 2 ðP, t  t2 Þi þ hU 1 ðP, t  t1 ÞU 2 ðP, t  t2 Þi þ hU 1 ðP, t  t1 ÞU 2 ðP, t  t2 Þi ¼ I 1 ðPÞ þ I 2 ðPÞ þ 2RefhU 1 ðP, t  t1 ÞU 2 ðP, t  t2 Þig,

(3.22)

where h · i stands for the time average that is caused by the finite response time of the detector, and Re{∙} is the real part. Note that the third term on the right

Figure 3.14

Superposition of two light wave trains.

Wave Optics

101

side of Eq. (3.22) is the statistical correlation of U1 and U2 (the definition of statistical correlation can be found in the appendices). Setting G12 ðtÞ ¼ hU 1 ðP, t  t1 ÞU 2 ðP, t  t2 Þi,

(3.23)

where t ¼ t1 – t2, and G12(t) is known as the cross-correlation function or mutual coherence function of these two wave trains. Usually, the cross-correlation function can be normalized as G12 ðtÞ G12 ðtÞ ffipffiffiffiffiffiffiffiffiffiffiffiffi , pffiffiffiffiffiffiffiffiffiffiffiffiffi ¼ pffiffiffiffiffiffiffiffiffiffiffi g12 ðtÞ ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffi G11 ð0Þ G22 ð0Þ I 1 ðPÞ I 2 ðPÞ

(3.24)

where g12(t) is called the complex degree of coherence. Then, Eq. (3.22) can be rewritten in the form pffiffiffiffiffiffiffiffiffiffiffiffipffiffiffiffiffiffiffiffiffiffiffiffi I ðPÞ ¼ I 1 ðPÞ þ I 2 ðPÞ þ 2 I 1 ðPÞ I 2 ðPÞRefg12 ðtÞg:

(3.25)

Equation (3.25) is the general interference law for two optical fields, and the third term on the right side of this equation is called the interference term. This equation shows that, in order to determine the intensity arising from the superposition of two light wave trains, one must know not only the intensity of each wave train but also the real part of the complex degree of coherence g12(t) of the two light wave trains. According to the definition of the complex degree of coherence [Eq. (3.24)] and the Cauchy–Schwarz inequality (found in the appendices), it can be shown that the modulus of g12(t) is between zero and unity; i.e., 0 ≤ jg12 ðtÞj ≤ 1: A value of zero for |g12(t)| would mean that these two wave trains are incoherent, and a value of unity would mean that these two wave trains are completely coherent; other values of |g12(t)| would mean that these two wave trains are partially coherent. Note that all of the fringe patterns mentioned above are “stationary.” In actuality, as long as two electric fields are not perpendicular to each other, the superposition of light waves can produce fringe patterns that vary very quickly, typically with the same frequency as that of light. Since all current detectors, such as CCDs, CMOSs, and human eyes, have response times much larger than the time period of the change in the “instantaneous” fringe patterns, they cannot capture “instantaneous” fringe patterns but rather capture the time average of these “instantaneous” fringe patterns during the response time of the corresponding detector. This averaging process of detectors leads to an absence of fringes for incoherent light waves. It should

102

Chapter 3

also be known that the response time of the detector can be considered as the integration time of the time average in Eq. (3.22). A monochromatic light wave—an ideal model never encountered in nature—is normally used as a starting point to study properties of interference. For two monochromatic light wave trains in air meeting at point P, their light fields can be represented as U1 ðPÞ ¼ A1 exp½iðk1 · r1  v1 t þ d1 Þ, U2 ðPÞ ¼ A2 exp½iðk2 · r2  v2 t þ d2 Þ: According to Eq. (3.24), the complex degree of coherence of these two monochromatic light wave trains is g12 ðtÞ ¼

A1 · A2 hexpfi½ðk1 · r1  k2 · r2 Þ  ðv1 t  v2 tÞ þ ðd1  d2 Þgi: A1 A2

Let us take a simple situation: if A1 and A2 have the same direction, k1 ¼ k2, v1 ¼ v2, and the initial phase difference, d1 – d2, is assumed to be zero, then g12(t) have the form g12 ðtÞ ¼ expðikDlÞ, where Dl ¼ r1 – r2 is the optical path difference (as mentioned in Section 2.1, and as will be presented in detail in Section 3.5) of the two light wave trains at point P, and t ¼ (r1 – r2)/c is the time difference for these two wave trains propagating to point P. Because |g12(t)| ¼ 1, these monochromatic light wave trains are completely coherent. On substituting g12(t) ¼ exp(ikDl) into Eq. (3.25), the intensity at point P is pffiffiffiffiffiffiffiffiffi I ¼ I 1 þ I 2 þ 2 I 1 I 2 cosðkDlÞ:

(3.26)

From Eq. (3.26), it can be concluded that, when kDl ¼ mp,

jmj ¼ 0, 2, 4, : : : ,

the intensity at point P has the maximum value; when kDl ¼ mp,

jmj ¼ 1, 3, 5, : : : ,

the intensity at point P has the minimum value. The intensity distribution of the fringe pattern of monochromatic light waves is constituted by a series of bright and dark bands, as shown in Fig. 3.15. In order to measure the contrast of the fringe pattern, the visibility of fringes at point P is defined as

Wave Optics

103

Figure 3.15 Simulated intensity distribution of the interference fringe of monochromatic light waves.



I max  I min , I max þ I min

(3.27)

where Imax and Imin are the maximum and minimum intensities, respectively, pffiffiffiffiffiffiffiffiffi of the fringe near pointpP. According to Eq. (3.26), I max ¼ I 1 þ I 2 þ 2 I 1 I 2 , ffiffiffiffiffiffiffiffi ffi and I min ¼ I 1 þ I 2  2 I 1 I 2 . Then, the visibility of fringes can be rewritten as V¼

pffiffiffiffiffiffiffiffiffi 2 I 1I 2 : I1 þ I2

When I1 ¼ I2, the visibility has a maximum value of unity. Here we also briefly discuss the relationship between diffraction and interference of light. Considering the Huygens–Fresnel principle presented in Section 3.2, a light wave at a later instant of time can be considered as the linearly coherent superposition of the secondary wavelets from the wavefront of the light wave at the earlier instant of time. Based on this point of view, diffraction, in essence, is a kind of interference of an infinite number of light waves. 3.3.1.1 Temporal coherence

Temporal coherence is the measure of correlation between a light wave train and a copy of itself delayed by t. As the duration or the length of a light wave train emitted from a source is finite, if the delay between two light wave trains divided from a light wave train is less than the duration of this mother wave train, the two derived wave trains are coherent; otherwise, the two derived wave trains are incoherent. Temporal coherence of a light wave train can be explained with the help of the Michelson interferometer, as shown in Fig. 3.16. In the interferometer, mirror 1 is a fixed mirror, while mirror 2 is a movable mirror that can be shifted along the propagating direction of the incoming beam. When the interferometer is working, a light wave from the light source is divided into two waves by a beam splitter. Each of the two waves is reflected back to the

104

Chapter 3

Figure 3.16 Illustration of temporal coherence of light (Michelson interferometer).

beam splitter by the corresponding mirror. The wave reflected from mirror 1 passes through the beam splitter and reaches the screen, while the wave reflected from mirror 2 is reflected by the beam splitter and reaches the same screen. These two waves superpose on the screen to generate a fringe pattern. This arrangement is intended to superimpose a light wave train with a timeshifted copy of itself. Since the duration of a light wave train emitted from a source is finite, if the difference between the two light paths is large enough, the reflected wave train A2 from mirror 2 cannot catch up to the reflected wave train A1 from mirror 1, as shown in Fig. 3.16(b). Then there are no fringes on the screen. This maximum difference of light path is the coherence length of light. It can be easily deduced that the coherence length is the length of the wave train, and the coherence time is the duration of the wave train. Let y denote the propagation speed of light; the relationship between coherence length lc and the coherence time tc is l c ¼ ytc :

(3.28)

In this way, the coherence time and coherence length are both used to characterize the interference of light. In accordance with Eq. (3.28), they are simply different representations of the coherence of light. However, they also represent the degree of monochromaticity of the light source. It has been proven that the effective frequency range of light is approximately the reciprocal of the duration of a single wave train:2 Dv 

1 : tc

Thus, the longer the coherence length the narrower the frequency range of the light source. 3.3.1.2 Spatial coherence

The interference fringes of light waves become blurred when sources are spatially extended—a phenomenon due to the spatial coherence of the

Wave Optics

105

extended source. Here, the spatial coherence of light will be explained with the help of Young’s experiment. As shown in Fig. 3.17, an extended source illuminates a screen with two pinholes L1 and L2. Behind the screen is a plane on which the fringe pattern is observed. Here we consider the influence of the size of the source on the fringe pattern. For convenience, the discussion below is given only on the x-z plane. In this case, the extended source becomes a slit source. As shown in Fig. 3.17, the bottom end of this slit source, denoted as S1, is on the z axis, while its top end is denoted as S2, and the size of the slit source is a ¼ |S1 – S2|. The two pinholes are on the both sides of the z axis, and the distances to the z axis from the two pinholes are identical. The light wave emitted from point S1 of the source passes through two pinholes, L1 and L2, and produces two subwaves, one from each pinhole. These two subwaves superpose and produce a fringe pattern on the observing plane. As shown in Fig. 3.17, the intensity distribution produced by a point source is a series of bright and dark bands. The extended source can be considered as a collection of independent point sources. Because the subwaves emitting from these independent point sources are incoherent, the fringe pattern produced by an extended source is the superposition of the individual fringe patterns generated by each point on the extended source. The superposed fringe pattern near point P—the cross point of the observing plane with the z axis—is taken as an example to show the spatial coherence of the extended source. As the light path from S1 through L1 to P equals the light path from S1 through L2 to P, the intensity on point P produced by light waves from S1 through L1 and L2 is at the maximum value. The optical path difference between optical paths from each point on the slit source through L1 and L2 to point P on the observing plane is different, such that the fringe pattern generated by each point source has a different lateral

Figure 3.17 Illustration of spatial coherence (Young’s experiment).

106

Chapter 3

shift relative to that generated by S1. Considering, in particular, the top end point S2 of the extended source, if the size of the extended source is small, it will produce a maximum intensity at point P 0 next to point P, as shown in Fig. 3.17. When the size of the extended source becomes larger, P 0 is farther away from P, and the resultant fringe pattern will be blurred. There is a limiting case in which S2 produces the minimum intensity at point P while S1 produces the maximum at the same point P; then the fringe pattern disappears, and the size of the source is limited to that which can produce a stationary fringe pattern. From Eq. (3.26), the minimum intensity requires the OPD of light waves to be an odd multiple of p, corresponding to an odd multiple of one-half of a working wavelength. Since L1P ¼ L2P and S1L1 ¼ S1L2, the limiting condition for the fringe disappearance requires that S2L1 – S2L2 ¼ l/2. According to Eq. (3.26), superposing the individual fringe patterns generated by each point on the extended source is mathematically equivalent to integrating fringe intensities from all independent point sources. The integral for the interference term, a cosine function, is zero on the interval (0, p), such that the fringe pattern fades. In the following discussion, the concrete expression for this limiting condition will be derived. According to the Pythagorean theorem of the rightangled triangle, the following expressions can be determined: 2 d a , ¼R þ 2 2  d 2 þa , ¼R þ 2 

r221 r222

2

where r21 is the distance from S2 to L1, and r22 is the distance from S2 to L2. Then, we can obtain r22  r21 ¼

2da l ¼ : r21 þ r22 2

As R is much larger than a (the size of source) and d (the distance between the two pinholes), r21 þ r22 ≈ 2R; therefore, da l ¼ : R 2

(3.29a)

Setting b ≈ d/R, where b is the angle subtended by the line segment L1L2 to any point on the slit source, we obtain

Wave Optics

107

ba ¼

l : 2

(3.29b)

The above expression is, in fact, the Lagrange invariant of the optical system in Young’s experiment for the existence of a fringe pattern under the limiting condition. According to Eq. (3.29b), both the size of the source and the distance between the two pinholes have significant effects on the visibility of the fringe pattern near point P. If da/R < l/2, the fringe pattern is visible. Conversely, if da/R > l/2, fringes are absent. 3.3.2 Examples We now present several typical examples of interference and explain some applications of the interference of light. 3.3.2.1 Two-beam interference

First, we present two examples of two-beam interference—one based on division of the wavefront and the other based on division of the amplitude. Two-beam interference I: division of the wavefront

One of the most famous two-beam interference examples based on division of the wavefront is Young’s experiment, which was also used to investigate spatial coherence. As shown in Fig. 3.18, a screen A with two pinholes L1 and L2 is illuminated by a monochromatic point source S. The fringe pattern is observed on plane M far from screen A. The two pinholes L1 and L2 are close to each other and are equidistant from source S. If a light wave train is emitted from source S, its wavefront will reach the two pinholes L1 and L2 at the same time. Then the two pinholes act

Figure 3.18 Young’s experiment.

108

Chapter 3

as two in-phase monochromatic point sources, and light waves from them superpose on plane M. The fringe pattern can be observed on plane M. Suppose that plane M is on the x-y plane and the origin O of the x-y plane is on axis z, as shown in Fig. 3.18. L1L2 is parallel to the x axis. Let d be the separation between pinhole L1 and pinhole L2, and D be the distance between screen A and plane M. Now, we consider the intensity at an arbitrary point P with coordinates (x, y) on plane M. According to Eq. (3.24), the intensity at point P is dependent on the difference of the geometrical paths for light waves reaching P from L1 and L2. If the distances from L1 to P and from L2 to P are r1 and r2, respectively, then   d 2 , r21 ¼ D2 þ y2 þ x  2   d 2 2 2 2 r2 ¼ D þ y þ x þ , 2 and the difference of the geometrical paths can be expressed as Dl ¼ jr1  r2 j ¼

2xd : r1 þ r2

As D is much larger than d, x, and y, it is easy to obtain r1 þ r2  2D, and Dl ¼

xd : D

If the refractive index of the surrounding medium is n, the optical path difference from L1 and L2 to P is nDl ¼ nxd/D. According to Eq. (3.76), which will be presented in Section 3.5, the corresponding phase difference is d¼

2p nxd : l D

(3.30)

Since D is much larger than d, x, and y, intensities of the two light waves at P from L1 and L2 are approximately the same, labeled as I0. Then, according to Eq. (3.26), the intensity at point P can be expressed as d I ¼ 4I 0 cos2 : 2 Substituting Eq. (3.30) into the above equation, the intensity can be calculated as

Wave Optics

109

I ¼ 4I 0 cos2

pnxd : lD

(3.31)

According to Eq. (3.31), there are maxima of intensity when x¼

m Dl , 2 nd

jmj ¼ 0, 2, 4, : : : ,

and minima of intensity when x¼

m Dl , 2 nd

jmj ¼ 1, 3, 5, : : : :

The fringe pattern in the immediate vicinity of the origin O consists of bright and dark bands, as shown in Fig. 3.18. Those bright or dark bands are equidistant and perpendicular to the x axis. The separation between adjacent bright fringes is Dl/nd. At any point of the fringe pattern, the order of interference is defined by the number N ¼ d/2p, and the bright fringes correspond to integer orders. Two-beam interference II: division of the amplitude

The interference example based on division of the amplitude given here concerns the interference pattern produced by a transparent film with two reflecting plane surfaces. As shown in Fig. 3.19(a), a thin film is illuminated by a beam of monochromatic light. A ray incident on the film is reflected by the upper surface and the lower surface of the film. The two reflected rays intersect at point P. The difference between the two optical paths from point A to P is Dl ¼ nðAB þ BCÞ  n0 ðAP  CPÞ, where n and n0 are the refractive indices of the film and the surrounding medium, respectively. If the film is sufficiently thin and the incident angle u1 of the ray incident on the film is small, Dl can be expressed as1

Figure 3.19 (a) Interference on a transparent film and (b) the corresponding fringes.

110

Chapter 3

l Dl ¼ 2nh cos u2  , 2 where h is the thickness of the film at point B, u2 is the refracted angle of the incident ray, and l/2 is introduced due to the phase change of p by the reflection at one of the surfaces of the film. Note that, if light rays are normally incident on the film, u1 and u2 are nearly zero. Thus, l Dl ¼ 2nh  : 2 According to Eq. (3.26), when 2nh 

l m ¼ l, 2 2

jmj ¼ 0, 2, 4, : : : ,

the intensity at point P has the maximum value, and when 2nh 

l m ¼ l, 2 2

jmj ¼ 1, 3, 5, : : : ,

the intensity at point P has the minimum value. Then, in such a normally incident case, intensities for every point over the film can be determined, and an interference pattern can be observed. Furthermore, the fringe with the same order corresponds to the locus of points with the same thickness of the film, so this type of interference pattern is called fringes of equal thickness. Figure 3.20(a) is an application of fringes of equal thickness. The setup shown is used for testing curvatures of surfaces of a lens by fringes of equal thickness. As shown in Fig. 3.20(a), a plano-convex lens is placed on a plane surface, and the air gap between the convex surface of the lens and the plane surface can be considered as a thin film with a refractive index of 1. If light

Figure 3.20 Illustration of (a) the setup for generating (b) Newton rings.

Wave Optics

111

rays are normally incident on the plane surface of the plano-convex lens, fringes of equal thickness will be generated, as shown in Fig. 3.20(b). From the distribution of the fringe pattern, the thickness of the air gap at any point can be calculated according to Ref. 4. In this way, the curvature of convex surfaces can be determined. 3.3.2.2 Multibeam interference

For practical applications, a fringe pattern having very narrow and bright fringes is very useful and can be generated by multibeam interference. Here, we discuss the Fabry–Pérot interferometer, whose working principle is based on multibeam interference. The Fabry–Pérot interferometer is widely used in spectroscopy to measure the wavelength of a light wave. As shown in Fig. 3.21(a), the optical layout of the Fabry–Pérot interferometer mainly consists of two glass plates. The inner surfaces of the two glass plates, being parallel to each other, are coated with partially transparent films having high reflectivity. The fringe pattern produced by the Fabry–Pérot interferometer is shown in Fig. 3.21(b). To expound the principle of the Fabry–Pérot interferometer, first the principle of two-beam interference generated by two parallel plane surfaces is presented. Consider two parallel plane surfaces in air and illuminated by a monochromatic light source, as shown in Fig. 3.22. A ray emitted from source S is reflected by the first and second plane surfaces, and the two transmitted rays are focused at point P by a lens. The optical path difference between the two rays is Dl ¼ (AB þ BC) – AN. The distance between the two plane surfaces is h, and the incident angle and reflected angle for the input ray SA are u and u0 , respectively. In practice, assuming that u ¼ u0 is a reasonable approximation. Then AB ¼ BC ¼ h/cosu, and AN ¼ ACsinu ¼ 2htanusinu. Thus, we can obtain Dl ¼ 2h cos u:

(3.32)

According to Eq. (3.26), bright fringes are generated when

Figure 3.21 (a) Diagram of the Fabry–Pérot interferometer and (b) the corresponding fringe pattern.

112

Chapter 3

Figure 3.22

Illustration of the formation of fringes of equal inclination.

2h cos u ¼

m l, 2

and dark fringes are generated when m 2h cos u ¼ l, 2

jmj ¼ 0,2,4, : : : ,

jmj ¼ 1,3,5, : : : :

It can be seen that a given fringe is characterized by the value of incident angle u. For this reason, fringes are often called fringes of equal inclination. In actuality, if a ray is incident on two parallel plane surfaces such as in the Fabry–Pérot interferometer setup, multiple reflections between the two parallel plane surfaces occur because of the high reflectivity of the two plane surfaces, as shown in Fig. 3.23. Thus, a series of beams will be incident on the lens and will participate in producing interference fringes on the screen. Denoting R as the reflectivity of the two plane surfaces, the intensity of the transmitted light waves can be calculated by the superposition of an infinite number of transmitted light waves as It ¼

I0 , 4R 2d 1þ sin 2 ð1  RÞ2

Figure 3.23 Illustration of multibeam interference.

(3.33)

Wave Optics

113

where I0 is the intensity of the incident light; and d, which can be calculated by (2p/l)Dl using Eq. (3.32), is the phase difference between each transmitted light wave and the preceding wave. A detailed derivation of Eq. (3.33) can be found in Refs. 1 and 5. Figure 3.24 shows the intensity distribution of the fringes with several different reflectivities of the two plane surfaces. If the reflectivities of the two plane surfaces are sufficiently high, the intensities of the beams passing through the two parallel plane surfaces are approximately equal. The fringe pattern then differs from that produced by two-beam interference and consists of very narrow, bright fringes on an almost completely dark background, as shown in Fig. 3.21(b). The sharpness of the fringe is conveniently measured by the distance between the half-maximum intensity points on both sides of a maximum intensity point; this distance is called the half-intensity width. According to the derivation in Ref. 1, the ratio of the separation between adjacent fringes and the half-intensity width can be expressed as pffiffiffiffi p R z¼ , 1R where z is also known as the finesse of fringes, which is a useful measure of Fabry–Pérot fringe sharpness. 3.3.2.3 Fourier transform spectrometer

The Fourier transform spectrometer, which is based on the temporal coherence of light and the Fourier transform, is an instrument that measures the Fourier spectrum of a light source. A typical sketch of the Fourier transform spectrometer setup is shown in Fig. 3.25, where it can be seen that the optical layout of the Fourier transform spectrometer is based on the Michelson interferometer. By scanning the movable mirror over different

Figure 3.24 Multiple beam fringes of equal inclination of transmitted light waves.

114

Chapter 3

Figure 3.25

Illustration of the Fourier transform spectrometer setup.

distances, a series of intensities corresponding to different distances are recorded by a point detector. Then the spectrum of the light source can be determined through the Fourier transform of the series of recorded intensities. The principle of Fourier spectroscopy is presented next. Suppose that light emitted from a source is monochromatic, the amplitudes of two light waves arriving at the point detector are the same, and the corresponding intensity for each light wave is denoted as I0(k). According to Eq. (3.26), the intensity on the point detector can be expressed as I ðDlÞ ¼ 2I 0 ðkÞ½1 þ cosðk · DlÞ,

(3.34)

where k ¼ 2p/l, and Dl is the difference between the optical paths of two light waves passing through the two arms of the Michelson interferometer. Since light emitted from an actual source includes light with different wavelengths, the intensity on the point detector can be expressed as the integral with an integrand of Eq. (3.34); i.e., `

I ðDlÞ ¼ ∫ 2I 0 ðkÞ½1 þ cosðk · DlÞdk 0 `

`

¼ ∫ 2I 0 ðkÞdk þ ∫ 2I 0 ðkÞ cosðk · DlÞdk 0

0

`

¼ 2I 0 þ ∫ I 0 ðkÞðeikDl þ eikDl Þdk: 0

From the above equation, it can be seen that the first term on the right side of the equation is a constant. Next, we prove that the integral in the

Wave Optics

115

second term is actually a Fourier transform of I0(k). By setting I0(–k) ¼ I0(k), this integral can be extended to negative frequencies [I0(0) being counted twice]. Denoting this extended integral as I0 (Dl), it can be expressed as `

0

I ðDlÞ ¼

∫ I 0 ðkÞeikDl dk:

`

This equation indicates that I0 (Dl) and I0(k) are a Fourier transform pair. (The Fourier transform is presented in both the next section and the appendices. Thus, the power spectrum of the light source I0(k) can be determined by an inverse Fourier transform of I0 (Dl), and `

I 0 ðkÞ ¼

∫ I 0 ðDlÞeikDl dDl:

`

The difference of the optical paths Dl is proportional to the displacement of the movable mirror in Fig. 3.25. Thus, by measuring a series of intensities as a function of displacements of the mirror, the power spectrum of the light source under measurement can be determined. 3.3.2.4 Stellar interferometer

The stellar interferometer was invented by Michelson and is used to determine the diameter of a star and the separation of binary stars based on the spatial coherence of light introduced in subsection 3.3.1. The star to be determined can be treated as an extended source, and the two mirrors M1 and M2 shown in Fig. 3.26 are just like the two pinholes in Fig. 3.17.

Figure 3.26 Illustration of a stellar interferometer setup.

116

Chapter 3

According to subsection 3.3.1, if interference fringes can be observed in the focal plane of the lens in the stellar interferometer as shown in Fig. 3.26, the diameter a of the star, the distance R between the star and the interferometer, and the distance d between M1 and M2 must satisfy the inequality da l ≤ : R 2 Since R is much larger than a, a/R is usually replaced by u, known as the angular diameter of the star. Then the inequality can be rewritten as du ≤

l : 2

With an increase of distance d, the fringe pattern will be blurred. When the fringe pattern just fades away, the following equation is satisfied: du ¼

l : 2

Then the angular diameter of the star can be calculated by u¼

l : 2d

3.4 Fourier Optics: An Introduction An optical imaging system can be considered as a linear system. This means that the response of an optical system to several stimuli simultaneously acting on the system is the same as the sum of the responses of the system to each individual stimulus acting on it. As the Fourier transform is a powerful and useful tool for analyzing linear systems, it can also be used to study optical imaging systems, which is called Fourier optics. Fourier optics is one of the most important branches of optics, and a brief introduction of Fourier optics will be presented in this section. 3.4.1 Fourier transform The behavior of the Fourier transform of a signal is very similar to that of a prism dispersing white light into its constituents having different wavelengths. Like the prism spreading white light into its spectrum, the Fourier transform mathematically splits the input signal into its frequency components. For a review of the Fourier transform, please refer to the book by Papoulis6 as well as the appendices of this book. Because optical systems generally process 2D spatial signals, the Fourier transform presented in this book is limited to two

Wave Optics

117

dimensions. The Fourier transform can decompose a complex 2D light field U(x, y) into a collection of its spatial frequency components by

$$\tilde{U}(u, v) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} U(x, y)\,\exp[-i2\pi(ux + vy)]\, dx\, dy, \qquad (3.35)$$

where u and v are spatial frequencies in the frequency domain, and Ũ(u, v) is the Fourier spectrum of U(x, y). Compared with Eq. (3.7b), the integral kernel exp[–i2π(ux + vy)] in Eq. (3.35) is actually a plane wave. Therefore, Eq. (3.35) describes the complex amplitudes of the various plane wave components that comprise the light field and is known as the Fourier analysis. Meanwhile, the Fourier spectrum Ũ(u, v) can be used to reconstruct the original light field U(x, y) by the inverse Fourier transform:

$$U(x, y) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} \tilde{U}(u, v)\,\exp[i2\pi(ux + vy)]\, du\, dv. \qquad (3.36)$$

This expression is also known as the Fourier synthesis.

Next, two sinusoidal periodic patterns are taken as an example to provide a rough picture of the 2D Fourier transform. Figure 3.27 shows two sine-type periodic patterns and their corresponding spatial frequency spectra. The first periodic pattern, shown in Fig. 3.27(a), has a structure of sparse stripes. Its frequency spectrum obtained by the Fourier transform is shown in Fig. 3.27(b). The brightest peak in the center is the power of the zero frequency, and the other two bright peaks are the powers of the nonzero frequencies encoded in the sinusoidal periodic pattern. The second sinusoidal periodic pattern [Fig. 3.27(c)] is the same as the first pattern except for its denser stripes. Comparing the Fourier spectra of these two sinusoidal periodic patterns, it can be clearly seen that both have three frequency components, and that the dense stripe pattern has higher nonzero spatial frequencies than the sparse pattern. A more detailed explanation is found in the example called Sinusoidal amplitude gratings in subsection 3.4.4. Descriptions of a 2D signal in the spatial domain and in the frequency domain carry exactly the same information, one domain having no more information than the other.

Figure 3.27 (a) and (c) Two sinusoidal patterns; and (b) and (d) their spatial frequency spectra.

3.4.2 Angular spectrum expansion

As pointed out in the last subsection, using the Fourier transform, a light field can be decomposed into a series of plane waves, whose complex amplitudes form its Fourier spectrum. By reformulating this Fourier spectrum, a new description of light fields, namely, the angular spectrum expansion, can be determined. The propagation of the angular spectrum provides another way to describe the diffraction of light. In this subsection, the angular spectrum expansion and its propagation are discussed. Because plane waves play a key role in the angular spectrum expansion, plane waves will be discussed before presenting the content of the angular spectrum expansion.

By ignoring the time factor in Eq. (3.7), a plane wave propagating with a wave vector k has the complex exponential form

$$p(x, y, z) = A\exp(i\mathbf{k}\cdot\mathbf{r}), \qquad (3.37)$$

where A is the amplitude of the plane wave; $\mathbf{r} = x\bar{x} + y\bar{y} + z\bar{z}$ is a position vector; and $\bar{x}$, $\bar{y}$, and $\bar{z}$ denote the unit vectors for the x, y, and z coordinates, respectively. In order to introduce the concept of the angular spectrum, let α, β, and γ be the angles between the propagation direction of the plane wave and the x, y, and z axes, respectively. As such, the propagation direction cosines are l = cos α, m = cos β, and n = cos γ, with l² + m² + n² = 1, as shown in Fig. 3.28. With this notation, we can determine that kx = (2π/λ)l, ky = (2π/λ)m, and

$$k_z = \frac{2\pi}{\lambda}\, n = \frac{2\pi}{\lambda}\sqrt{1 - l^2 - m^2}.$$

Then the plane wave expressed by Eq. (3.37) can be alternatively written as

$$p(x, y, z) = A\exp\!\left[i\frac{2\pi}{\lambda}(lx + my)\right]\exp\!\left[i\frac{2\pi}{\lambda}\sqrt{1 - l^2 - m^2}\, z\right]. \qquad (3.38)$$

Expressing the integral kernel in Eq. (3.35), i.e., a plane wave, with the propagation direction cosines l and m of the plane wave instead, the expression of the Fourier transform [Eq. (3.35)] can be rewritten as

$$\tilde{U}(l, m) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} U(x, y)\exp\!\left[-i\frac{2\pi}{\lambda}(lx + my)\right] dx\, dy. \qquad (3.39)$$

The Fourier spectrum Ũ(l, m), i.e., the complex amplitude of the plane wave, is a function of the propagation direction cosines l and m of the plane waves. For this reason, Eq. (3.39) is called the angular spectrum of the light field U(x, y).

Figure 3.28 Wave vector of a plane wave.

Suppose that the light field at plane z1 = 0 is expressed as U(x1, y1, 0). When this light field propagates to plane z2, it is notated as U(x, y, z). As shown in Fig. 3.29, our goal is to deduce U(x, y, z) from U(x1, y1, 0) by the propagation of the angular spectrum. According to Eq. (3.39), the angular spectra of the light fields at planes z1 = 0 and z2 = z can be, respectively, expressed by

$$\tilde{U}(l, m, 0) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} U(x_1, y_1, 0)\exp\!\left[-i\frac{2\pi}{\lambda}(lx_1 + my_1)\right] dx_1\, dy_1,$$
$$\tilde{U}(l, m, z) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} U(x, y, z)\exp\!\left[-i\frac{2\pi}{\lambda}(lx + my)\right] dx\, dy. \qquad (3.40)$$

Figure 3.29 Propagation of light from plane z1 = 0 to z2 = z.

Next, the propagation of the angular spectrum, i.e., the relationship between Ũ(l, m, 0) and Ũ(l, m, z), is deduced. According to the physical meaning of the angular spectrum expansion, for particular direction cosines l0, m0, and n0, the plane wave at planes z1 = 0 and z2 = z can be, respectively, expressed using the angular spectrum as

$$\tilde{U}(l_0, m_0, 0)\exp\!\left[i\frac{2\pi}{\lambda}(l_0 x_1 + m_0 y_1)\right], \qquad (3.41)$$

and

$$\tilde{U}(l_0, m_0, z)\exp\!\left[i\frac{2\pi}{\lambda}(l_0 x + m_0 y)\right]. \qquad (3.42)$$

At the same time, according to Eq. (3.38), the general expression of a plane wave at planes z1 = 0 and z2 = z can also be, respectively, written in terms of the propagation direction cosines of the plane wave:

$$A\exp\!\left[i\frac{2\pi}{\lambda}(l_0 x_1 + m_0 y_1)\right], \qquad (3.43)$$

and

$$A\exp\!\left[i\frac{2\pi}{\lambda}(l_0 x + m_0 y)\right]\exp\!\left(i\frac{2\pi}{\lambda}n_0 z\right). \qquad (3.44)$$

As Eqs. (3.41) and (3.43) are different expressions for the same plane wave at plane z1 = 0, comparing them, it is easy to obtain

$$\tilde{U}(l_0, m_0, 0) = A. \qquad (3.45)$$

Similarly, comparing Eqs. (3.42) and (3.44), we have

$$\tilde{U}(l_0, m_0, z) = A\exp\!\left(i\frac{2\pi}{\lambda}n_0 z\right). \qquad (3.46)$$

Substituting $n_0 = \sqrt{1 - l_0^2 - m_0^2}$ into Eq. (3.46) and combining Eqs. (3.45) and (3.46), it is easy to obtain

$$\tilde{U}(l_0, m_0, z) = \tilde{U}(l_0, m_0, 0)\exp\!\left(i\frac{2\pi}{\lambda}\sqrt{1 - l_0^2 - m_0^2}\, z\right).$$

Although this relationship is derived for a particular set of direction cosines, the cosines of other directions also satisfy it:

$$\tilde{U}(l, m, z) = \tilde{U}(l, m, 0)\exp\!\left(i\frac{2\pi}{\lambda}\sqrt{1 - l^2 - m^2}\, z\right). \qquad (3.47)$$

We can arrive at the same conclusion by substituting the angular spectrum into the Helmholtz equation and solving this differential equation; the details of this derivation can be found in Ref. 3. Equation (3.47) means that there is a phase delay for the propagation of the angular spectrum from one plane to another for l² + m² ≤ 1. However, if l and m satisfy l² + m² > 1, the phase delay becomes a damping factor, $\exp\!\left(-\frac{2\pi}{\lambda}\sqrt{l^2 + m^2 - 1}\, z\right)$, which means that the wave becomes evanescent.

By applying the inverse Fourier transform to Ũ(l, m, z), the light field at plane z2 = z can be expressed as

$$U(x, y, z) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} \tilde{U}(l, m, 0)\exp\!\left(i\frac{2\pi}{\lambda}\sqrt{1 - l^2 - m^2}\, z\right)\exp\!\left[i\frac{2\pi}{\lambda}(lx + my)\right] dl\, dm. \qquad (3.48)$$

Furthermore, when the integral is limited to a circular region, the light field at plane z2 = z can be written as

$$U(x, y, z) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} \tilde{U}(l, m, 0)\exp\!\left(i\frac{2\pi}{\lambda}\sqrt{1 - l^2 - m^2}\, z\right)\mathrm{circ}\!\left(\sqrt{l^2 + m^2}\right)\exp\!\left[i\frac{2\pi}{\lambda}(lx + my)\right] dl\, dm. \qquad (3.49)$$

The propagation of the angular spectrum is widely used as a method for calculating the light field at a given plane due to its accuracy and simplicity; e.g., it is used in wavefront sensing to calculate the light field on the detecting plane.7
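Because the transfer-function form of Eqs. (3.47) and (3.48) maps directly onto a pair of FFTs, the method is straightforward to implement numerically. The sketch below is a minimal Python/NumPy illustration, assuming a square sampling grid and hypothetical values for the wavelength, pixel size, aperture radius, and propagation distance; it is not the authors' code.

```python
# A minimal sketch of angular-spectrum propagation [Eqs. (3.39)-(3.48)] on a sampled grid.
# Grid size, wavelength, aperture radius, and propagation distance are assumed values.
import numpy as np

def angular_spectrum_propagate(U0, wavelength, dx, z):
    """Propagate a sampled field U0 (N x N, pixel pitch dx) over a distance z."""
    N = U0.shape[0]
    fx = np.fft.fftfreq(N, d=dx)                 # spatial frequencies u, v (cycles/m)
    FX, FY = np.meshgrid(fx, fx)
    # Direction cosines are l = lambda*u, m = lambda*v; evanescent components are suppressed.
    arg = 1.0 - (wavelength * FX)**2 - (wavelength * FY)**2
    kz = 2 * np.pi / wavelength * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)          # transfer function of Eq. (3.47)
    return np.fft.ifft2(np.fft.fft2(U0) * H)     # spectrum -> propagate -> field

# Example: a plane wave truncated by a circular aperture, propagated 50 mm.
N, dx, wavelength = 512, 10e-6, 633e-9           # assumed sampling and wavelength
x = (np.arange(N) - N // 2) * dx
X, Y = np.meshgrid(x, x)
U0 = (np.hypot(X, Y) < 0.5e-3).astype(complex)   # 0.5-mm-radius aperture
Uz = angular_spectrum_propagate(U0, wavelength, dx, z=50e-3)
print(np.abs(Uz).max())
```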

3.4.3 Fourier transform in optics

The Fourier transform can be performed by a simple lens, and the Fourier-transforming property of a lens is a starting point for the entire field of Fourier optics. In this subsection, the phase transformation of a positive lens is presented first; then the Fourier-transforming property of the positive lens is introduced.

3.4.3.1 Phase transformation of a positive lens

According to the discussion in Chapter 2, if a bundle of rays parallel to the optical axis of a thin positive lens is incident on this lens, the rays will converge to the focal point of the lens under the paraxial approximation. In wave optics, this process can also be regarded as a plane wave being converted into an ideal spherical wave by this lens under the paraxial approximation, as shown in Fig. 3.30(a). Similarly, for a thin lens, the coordinates at which an incoming ray enters the front surface of the lens [(x, y) in Fig. 3.30(b)] can be considered to be the same as the coordinates at which the ray exits the back surface of the lens.

Suppose that wavefront S in Fig. 3.30(a) is just behind the thin lens. Compared to the ray passing through point A on wavefront S, the ray passing through the center of the lens has a delay z in the propagation path due to the different thicknesses of the lens at different radial positions. Since wavefront S is an ideal sphere, in the right-angled triangle FAB, it is easy to obtain

$$(f - z)^2 + r^2 = f^2, \qquad (3.50)$$

where f is the focal length of the lens, and $r = \sqrt{x^2 + y^2}$ is the radial distance between point (x, y) on the lens and the center of the lens, as shown in Fig. 3.30(b).

Figure 3.30 Diagram of the phase transformation of a positive thin lens.

Considering the paraxial approximation, the second-order term z² can be neglected in the expansion of Eq. (3.50), and the delay z can be expressed as

$$z = \frac{r^2}{2f}. \qquad (3.51)$$

Converting this delay into a phase delay using Eq. (3.76) in Section 3.5, the phase transformation of a positive thin lens can be given by kr²/(2f). The field transmission function of the lens can be expressed as

$$t(x, y) = \exp\!\left(-ik\frac{r^2}{2f}\right), \qquad (3.52)$$

where the negative sign in the exponent is due to the convergence of the spherical wave. Note that, although this phase transformation is deduced in the case of a positive lens, it is also suited for other types of lenses. However, as the phase transformation is obtained under the paraxial approximation, it is not suitable for nonparaxial cases.

3.4.3.2 Fourier transform by a lens

Next, the Fourier-transforming property of a positive lens is discussed. Figure 3.31 shows an object placed at the first focal plane of a positive lens with a focal length of f. Suppose that this object is illuminated by a monochromatic plane wave, and the light field behind the object is notated as U0(x0, y0). According to Eq. (3.14), the light field just in front of the lens, UL1(x1, y1), can be given by

$$U_{L1}(x_1, y_1) = \frac{e^{ikf}}{i\lambda f}\iint_S U_0(x_0, y_0)\, e^{i\frac{k}{2f}\left[(x_1 - x_0)^2 + (y_1 - y_0)^2\right]}\, dx_0\, dy_0. \qquad (3.53)$$

This means that UL1(x1, y1) is the convolution of U0(x0, y0) and $\frac{e^{ikf}}{i\lambda f}e^{i\frac{k}{2f}(x_1^2 + y_1^2)}$. According to the convolution theorem of the Fourier transform presented in the appendices, the equivalent expression of Eq. (3.53) in the frequency domain can be written as

$$\tilde{U}_{L1}(u, v) = \tilde{U}_0(u, v)\, e^{ikf}\exp[-i\pi\lambda f(u^2 + v^2)], \qquad (3.54)$$

where $e^{ikf}\exp[-i\pi\lambda f(u^2 + v^2)]$ is the Fourier transform of $\frac{e^{ikf}}{i\lambda f}e^{i\frac{k}{2f}(x_1^2 + y_1^2)}$. Considering the field transmission function of the lens [Eq. (3.52)], the light field just behind the lens, UL2(x1, y1), can be written as

$$U_{L2}(x_1, y_1) = U_{L1}(x_1, y_1)\exp\!\left[-i\frac{k}{2f}(x_1^2 + y_1^2)\right]. \qquad (3.55)$$

Figure 3.31 Fourier transform by a lens.

Then, the light field at the second focal plane of the lens can be determined by using the Fresnel approximation [Eq. (3.15)]:

$$U_f(x, y) = \frac{e^{ikf}}{i\lambda f}e^{i\frac{k}{2f}(x^2 + y^2)}\iint_S \left[U_{L2}(x_1, y_1)\, e^{i\frac{k}{2f}(x_1^2 + y_1^2)}\right] e^{-i\frac{2\pi}{\lambda f}(x_1 x + y_1 y)}\, dx_1\, dy_1. \qquad (3.56)$$

Substituting Eq. (3.55) into Eq. (3.56), the quadratic phase factor in the square brackets of Eq. (3.56) is cancelled, and Eq. (3.56) can be rewritten as

$$U_f(x, y) = \frac{e^{ikf}}{i\lambda f}e^{i\frac{k}{2f}(x^2 + y^2)}\iint_S U_{L1}(x_1, y_1)\, e^{-i\frac{2\pi}{\lambda f}(x_1 x + y_1 y)}\, dx_1\, dy_1. \qquad (3.57)$$

Letting u = x/(λf) and v = y/(λf), and substituting u and v into Eq. (3.57), the following expression can be determined:

$$\tilde{U}_f(u, v) = \frac{e^{ikf}}{i\lambda f}e^{i\pi\lambda f(u^2 + v^2)}\left[\iint_S U_{L1}(x_1, y_1)\, e^{-i2\pi(x_1 u + y_1 v)}\, dx_1\, dy_1\right]. \qquad (3.58)$$

Obviously, the expression in the square brackets of Eq. (3.58) is the Fourier transform of UL1(x1, y1). Therefore, substituting Eq. (3.54) into Eq. (3.58), it is easy to obtain

$$\tilde{U}_f(u, v) = \frac{e^{i2kf}}{i\lambda f}\tilde{U}_0(u, v) = \frac{e^{i2kf}}{i\lambda f}\iint_S U_0(x_0, y_0)\, e^{-i2\pi(ux_0 + vy_0)}\, dx_0\, dy_0. \qquad (3.59)$$

It can be found that the light field at the second focal plane of the positive lens is exactly the Fourier transform of the object located at the first focal plane of the lens. Moreover, if the object is placed at another plane, the light field at the second focal plane of the positive lens will still be the Fourier transform of the object, although there may be an additional quadratic-phase delay.3 Thus, the second focal plane of the positive lens is also called the Fourier spectrum plane. Note that in Eq. (3.59) the effect of the lens aperture has been neglected. If the effect of the aperture is considered, Eq. (3.59) can be rewritten as3

$$\tilde{U}_f(u, v) = \frac{e^{i2kf}}{i\lambda f}\iint_S U_0(x, y)\, P(x, y)\, e^{-i2\pi(ux + vy)}\, dx\, dy, \qquad (3.60)$$

where P(x, y) is the pupil function of the lens, which equals 1 inside the pupil and 0 otherwise. From the above discussion, it should be clear that Fourier optics is paraxial optics.

3.4.4 Examples of optical Fourier spectra

In Section 3.2, the diffraction patterns of several geometrical structures were calculated through integration. These diffraction patterns can also be determined by taking the square of the moduli of the corresponding Fourier transforms of these geometrical structures. In this sense, these diffraction patterns are also the optical Fourier spectra of objects. In the following examples, the Fourier spectra of various objects are analyzed from the perspective of the Fourier transform.

3.4.4.1 Point sources

In Fig. 3.32, a monochromatic point source is placed at point (–a, 0) on the first focal plane of a simple lens. Mathematically, the light field on the object plane is U0δ(x + a)δ(y), where U0 is the amplitude of the light field, and δ(·) (the δ function) is zero everywhere except at zero, where it is infinite. A detailed description of δ(·) and its Fourier transform can be found in the appendices of this book. The Fourier spectrum of the point source at the rear focal plane of the lens can be determined by the Fourier transform

$$\tilde{U}_{ps}(u, v) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} U_0\,\delta(x + a)\delta(y)\exp[-i2\pi(ux + vy)]\, dx\, dy = U_0\exp(i2\pi a u). \qquad (3.61)$$

Figure 3.32 Optical Fourier spectrum of a point source.

This means that the Fourier spectrum of a point source is a uniformly distributed plane wave. The optical Fourier spectrum of this point source is |U0|², a constant. Obviously, this example also implies that a translation in the spatial domain introduces a phase shift in the frequency domain, i.e., the shift theorem of the Fourier transform presented in the appendices. If two monochromatic point sources are symmetrically placed on the x axis, e.g., located at (–a, 0) and (a, 0) in Fig. 3.33, the light field at the object plane is expressed as U0δ(y)[δ(x – a) + δ(x + a)]. Similar to the Fourier spectrum of one point source, the Fourier spectrum of two point sources is a superposition of two plane waves, i.e.,

$$U_0[\exp(i2\pi a u) + \exp(-i2\pi a u)] = 2U_0\cos(2\pi a u).$$

The optical Fourier spectrum of two point sources is 4U0²cos²(2πau) on the Fourier spectrum plane, as shown in Fig. 3.34.

Figure 3.33 Optical Fourier spectrum of two point sources.

Figure 3.34 (a) Two point sources and (b) the corresponding optical Fourier spectrum.

3.4.4.2 Plane waves

As shown in Fig. 3.35(a), a plane wave parallel to the optical axis of a lens has a uniform amplitude U0 at the object plane. Its Fourier spectrum can be obtained by

$$\tilde{U}_{pw}(u, v) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} U_0\exp[-i2\pi(ux + vy)]\, dx\, dy = U_0\,\delta(u)\delta(v). \qquad (3.62)$$

This means that the Fourier spectrum of a plane wave is a point at the Fourier spectrum plane. Moreover, if the propagating direction of a plane wave is oblique to the optical axis of a lens, there will be a phase shift at the object plane for the plane wave. According to the shift theorem of the Fourier transform given in the appendices, there will be a translation at the Fourier spectrum plane, as shown in Fig. 3.35(b).

Figure 3.35 Optical Fourier spectra of plane waves (a) parallel and (b) oblique to the optical axis.

3.4.4.3 Slits

Slits are a commonly used type of optical component in wave optics, e.g., the slits in the famous Young's double-slit experiment. Suppose that an infinitely long and infinitely narrow slit (the vertical bold line in Fig. 3.36) along the x axis is illuminated by a uniform monochromatic plane wave. The light field behind the slit can be expressed as U0δ(y), and its Fourier spectrum is

$$\tilde{U}_s = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} U_0\,\delta(y)\exp[-i2\pi(ux + vy)]\, dx\, dy = U_0\,\delta(u). \qquad (3.63)$$


Figure 3.36 Optical Fourier spectrum of an infinitely long and infinitely narrow slit; the vertical bold line represents the slit, and the other bold line represents its optical Fourier spectrum.

Figure 3.37 (a) Infinitely long and narrow slit and (b) its optical Fourier spectrum.

It can be seen that the Fourier spectrum of this type of slit is a line perpendicular to the slit, as shown in Fig. 3.37. If an infinitely long slit with a finite width a is illuminated as in the above example, the light field behind the slit is U0 rect(y/a). As the Fourier transform of the rectangle function is a sinc function, defined as sinc(x) = sin(πx)/(πx), the Fourier spectrum of this slit is U0 a δ(u) sinc(av). As shown in Fig. 3.38, this optical Fourier spectrum is a line that is also perpendicular to the slit but modulated by the sinc function.


Figure 3.38 (a) Infinitely long and finitely wide slit and (b) its optical Fourier spectrum.

3.4.4.4 Circular apertures

Figure 3.39 Optical Fourier spectrum of a circular aperture.

Circular apertures are the most commonly used apertures in optical systems. If a circular aperture with a radius of a is illuminated by a monochromatic plane wave as shown in Fig. 3.39, the light field behind the aperture is

$$U(x, y) = U_0\,\mathrm{circ}\!\left(\frac{r}{a}\right), \qquad (3.64)$$

where $r = \sqrt{x^2 + y^2}$. The Fourier transform of the circular function is

$$\tilde{U}_{circ}(u, v) = \frac{aU_0}{\rho}\, J_1(2\pi a\rho), \qquad (3.65)$$

where J1(·) is the Bessel function of the first kind of order one, and $\rho = \sqrt{u^2 + v^2}$. This expression has the same form as Eq. (3.19), and the optical Fourier spectrum of the circular aperture is the diffraction pattern shown in Fig. 3.10.

3.4.4.5 Sinusoidal amplitude gratings

In subsection 3.4.1, two sinusoidal amplitude gratings with different periods were taken as examples to show the concept of spatial frequency. Here, explanations of the results presented in that subsection are provided. Suppose that a sinusoidal grating has a grating constant d (the period of the grating) and is illuminated by a monochromatic plane wave, as shown in Fig. 3.40. The light field behind the grating can be expressed as

$$U_g(x, y) = U_0\sin^2\!\left(\frac{\pi x}{d}\right) = U_0\left[\frac{1}{2} - \frac{1}{4}\exp\!\left(i\frac{2\pi x}{d}\right) - \frac{1}{4}\exp\!\left(-i\frac{2\pi x}{d}\right)\right]. \qquad (3.66)$$

According to the Fourier spectra of point sources and plane waves, the Fourier spectrum of this grating is easy to obtain by performing the Fourier transform on the light field expressed by Eq. (3.66):

$$\tilde{U}_g(u, v) = \frac{1}{2}U_0\,\delta(u)\delta(v) - \frac{1}{4}U_0\,\delta\!\left(u + \frac{1}{d}\right)\delta(v) - \frac{1}{4}U_0\,\delta\!\left(u - \frac{1}{d}\right)\delta(v). \qquad (3.67)$$

Figure 3.40 Optical Fourier spectrum of a sinusoidal amplitude grating.
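As a numerical check on Eq. (3.67), the sketch below samples the grating of Eq. (3.66) and takes its discrete Fourier transform; the grid size, pixel pitch, and grating period are assumed values chosen so that the period fits the grid exactly and no spectral leakage occurs.

```python
# Discrete check of Eq. (3.67): the spectrum of U0*sin^2(pi*x/d) has exactly three peaks.
import numpy as np

N, dx = 256, 10e-6                   # samples and pixel pitch (assumed)
d = 32 * dx                          # grating period chosen to fit the grid exactly
x = (np.arange(N) - N // 2) * dx
X, _ = np.meshgrid(x, x)
Ug = 1.0 * np.sin(np.pi * X / d)**2  # grating transmission with U0 = 1

spectrum = np.fft.fftshift(np.fft.fft2(Ug))
power = np.abs(spectrum)**2
u = np.fft.fftshift(np.fft.fftfreq(N, d=dx))

# The three strongest peaks should sit at u = 0 and u = +/- 1/d (along the v = 0 row).
row = power[N // 2]
peaks = u[np.argsort(row)[-3:]]
print(sorted(peaks), 1.0 / d)
```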

Thus, the optical Fourier spectrum of a sinusoidal grating is three points, as shown in Fig. 3.40.

3.4.4.6 Phase-contrast microscopes

Some examples of optical Fourier spectra have already been provided. Now a particular example of manipulating the Fourier spectrum, phase-contrast microscopy, is presented. One typical optical layout used for phase-contrast microscopy, shown in Fig. 3.41, is called the 4f optical imaging system. In this optical system, the light field on the image plane is the Fourier transform of the light field on the Fourier transform plane, which, in turn, is the Fourier spectrum of the input light field at the object plane. The manipulation of the Fourier spectrum of the input light field is performed at the Fourier spectrum plane. Phase-contrast microscopy, which converts the phase of the light field passing through the object into an intensity distribution, is actually a contrast-enhancing technique. If a phase object is located at the object plane and illuminated by a monochromatic plane wave as shown in Fig. 3.41, the light field behind the object can be expressed as A exp[iφ(x, y)], where φ(x, y) is the phase of the object. If there is no operation on the Fourier spectrum plane, after performing the Fourier transform twice, the light field on the image plane will be A exp[iφ(–x, –y)], where the negative signs are due to a 180-deg rotation between the object and its image. The intensity distribution |A|² will be uniform

Figure 3.41 Diagram of a 4f optical imaging system setup.


at the image plane, implying that the phase object cannot be seen. For small φ(x, y), the light field just behind the object can be approximated as A[1 + iφ(x, y)] according to the Taylor expansion. Its Fourier transform at the Fourier spectrum plane can be written as $A[\delta(u)\delta(v) + i\tilde{\varphi}(u, v)]$, where $\tilde{\varphi}(u, v)$ is the Fourier spectrum of φ(x, y). If a phase plate, which introduces a phase delay of π/2 only for the zero-order spectrum, is placed on the Fourier spectrum plane, the Fourier transform behind this phase plate will be $A[\delta(u)\delta(v)\exp(i\pi/2) + i\tilde{\varphi}(u, v)] = iA[\delta(u)\delta(v) + \tilde{\varphi}(u, v)]$. According to the Fourier integral theorem in the appendices, the light field at the image plane will be iA[1 + φ(–x, –y)]. Thus, the image intensity at the image plane is |A[1 + φ(–x, –y)]|² ≈ |A|²[1 + 2φ(–x, –y)], omitting the term that is second order in the small quantity. As the phase difference is converted into variations of the light intensity, even a slight phase difference can be observed. This is the principle of phase-contrast microscopy for observing phase objects. Becoming familiar with the Fourier spectra of objects is helpful for understanding the theory of image formation in Fourier optics.

3.4.5 Formulas governing image formation in Fourier optics

The concept of imaging and the imaging laws in geometrical optics have already been discussed. However, the real image of an object is different from that predicted by geometrical optics because geometrical optics neglects the wave nature of light. For example, due to the diffraction of light, the image of a point object, even for an aberration-free optical imaging system, is not a point image but rather a diffraction pattern determined by the Fourier transform of the pupil function of the optical system. In the following three subsections, image formation is explained from the viewpoint of Fourier optics, an approach that now plays a fundamental role in the theory of imaging systems.

3.4.5.1 Point spread functions

The point spread function is defined as the response of an optical imaging system to a point object. Since optical imaging systems are usually considered to be linear, the characteristics of their image formation can be conveniently described by their point spread functions. Moreover, as light sources used in optical systems vary—generally as coherent or incoherent, as will be discussed in detail in Chapter 4—there are two kinds of point spread functions for different illumination sources: coherent and incoherent point spread functions. Any optical imaging system composed of either a single lens or several lenses can be abstracted as an entrance pupil and an exit pupil, as shown in Fig. 3.42. For an ideal optical system, a divergent spherical wave on the entrance pupil emitting from a point object on the object plane is converted by the ideal optical system into a perfectly convergent spherical wave on the exit pupil. This forms an ideal point at the image plane of the optical system if the effect of diffraction is not considered. The resulting perfectly spherical wave

on the exit pupil is taken as a reference surface for evaluating the wavefront aberration of a real optical imaging system. If the imaging system is aberrated, the wavefront on the exit pupil will deviate from the perfectly spherical wavefront, and the image will be blurred. For this reason, the exit pupil of an optical imaging system plays a key role in the study of optical imaging systems.

Figure 3.42 Diagram of an abstracted optical imaging system.

Here a generalized pupil function on a circular exit pupil is defined as

$$P(x_1, y_1) = \mathrm{circ}\!\left(\sqrt{x_1^2 + y_1^2}\right)\exp[i\varphi(x_1, y_1)],$$

where φ(x1, y1) is the equivalent aberrated wavefront of the imaging system on the exit pupil plane. The relationship between a light field on the exit pupil and the corresponding light field on the image plane is a Fourier transform; i.e.,

$$h(x, y) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} P(x_1, y_1)\exp\!\left[-i\frac{2\pi}{\lambda z}(x x_1 + y y_1)\right] dx_1\, dy_1, \qquad (3.68)$$

where h(x, y) is known as the coherent point spread function, and z is the distance between the exit pupil and the image plane. In the incoherent case, the incoherent point spread function is |h(x, y)|². The coherent point spread function represents the optical field on the image plane generated by the field of a point object propagating through the optical imaging system. The incoherent point spread function represents the optical intensity on the image plane corresponding to the intensity of a point object.

3.4.5.2 Image formation with coherent illumination

Suppose that m is the magnification of an optical imaging system. If an extended object is illuminated by coherent light, the light field from the extended object is O(x0, y0), and its field at the image plane for an ideal imaging system is (1/m)[O(x, y)], where x = mx0 and y = my0. The light field


(1/m)[O(x, y)] is denoted as o(x, y), the light field of an ideal image on the image plane. For a practical imaging system, the ideal image field o(x, y) does not exist. The actual light field at the image plane of an optical imaging system satisfying the isoplanatic condition can be expressed as

$$U_c(x, y) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} o(\xi, \eta)\, h(x - \xi, y - \eta)\, d\xi\, d\eta. \qquad (3.69)$$

The image of the extended object is

$$I_c(x, y) = |U_c(x, y)|^2. \qquad (3.70)$$

Let u = x1/(λz) and v = y1/(λz), which are spatial frequencies in the frequency domain. Then, according to the convolution theorem of the Fourier transform presented in the appendices, Eq. (3.69) has the following form in the frequency domain:

$$\tilde{U}_c(u, v) = \tilde{o}(u, v)\,\tilde{h}(u, v), \qquad (3.71)$$

where $\tilde{U}_c(u, v)$, $\tilde{o}(u, v)$, and $\tilde{h}(u, v)$ are the Fourier transforms of Uc(x, y), o(x, y), and h(x, y), respectively. $\tilde{h}(u, v)$ is called the coherent transfer function or amplitude transfer function (ATF) of the optical imaging system. Note that, because the coherent point spread function is the Fourier transform of the generalized pupil function as described by Eq. (3.68), the ATF is the same as the generalized pupil function for coherent illumination. This conclusion completely reveals the behavior of a coherent imaging system in the frequency domain. The pupil function of a coherent imaging system determines a finite passband in the frequency domain. Only frequencies within this passband can pass through the coherent imaging system, while all others are totally eliminated. In particular, the frequency at the boundary of this passband, at which the response of the coherent imaging system suddenly drops to zero, is defined as the cutoff frequency of the system. For example, if the radius of the exit pupil of a coherent imaging system is r, the cutoff frequency Qc of this coherent imaging system is Qc = r/(λz).

3.4.5.3 Image formation with incoherent illumination

Similar to the case of coherent illumination, suppose that m is the magnification of an optical imaging system using incoherent illumination. If an extended object is illuminated by incoherent light, and the light intensity of the extended object is Π(x0, y0), its corresponding intensity at the image plane for an ideal imaging system is the ideal image (1/m²)[Π(x, y)], where x = mx0 and y = my0. The ideal image (1/m²)[Π(x, y)] on the image plane is denoted as I(x, y). In the incoherent imaging regime, the object can be thought of as a collection of different point objects. Phase changes of light beams from different points are


very fast and random, which smears out any interference phenomena in the image. For this reason, image formation in the incoherent imaging regime is simply the linear superposition of the intensities of the light waves from each point of the object. Therefore, the image of an extended object in an optical imaging system satisfying the isoplanatic condition can be expressed as

$$I_{inc}(x, y) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} I(\xi, \eta)\, |h(x - \xi, y - \eta)|^2\, d\xi\, d\eta. \qquad (3.72)$$

Similar to the coherent illumination case, in the frequency domain, image formation with incoherent illumination can be rewritten as

$$\tilde{I}_{inc}(u, v) = \tilde{I}(u, v)\,\tilde{H}(u, v), \qquad (3.73)$$

where $\tilde{I}_{inc}(u, v)$ and $\tilde{I}(u, v)$ are the Fourier transforms of Iinc(x, y) and I(x, y), respectively, and u and v are defined as for coherent illumination. The incoherent transfer function $\tilde{H}(u, v)$ is defined as

$$\tilde{H}(u, v) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} |h(x, y)|^2\exp[-i2\pi(ux + vy)]\, dx\, dy \qquad (3.74)$$

and is also well known as the optical transfer function (OTF) of the optical imaging system. The modulus of the OTF is called the modulation transfer function (MTF) of the optical imaging system. The OTF is important in the characterization of an optical imaging system and can comprehensively evaluate the performance of that system. According to the autocorrelation theorem of the Fourier transform found in the appendices, the OTF can be written as the autocorrelation of the generalized pupil function:

$$\tilde{H}(u, v) = \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} P(x_1, y_1)\, P^*(\lambda z u + x_1, \lambda z v + y_1)\, dx_1\, dy_1, \qquad (3.75)$$

where the superscript * stands for the complex conjugate. Similar to the ATF, the OTF also reveals the behavior of an incoherent imaging system; a detailed discussion can be found in Ref. 3.
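The chain from pupil function to point spread function to OTF maps directly onto discrete Fourier transforms, so it can be explored numerically. The sketch below, written in Python/NumPy, samples a circular pupil with an assumed defocus-like phase error and computes the coherent PSF, the incoherent PSF, and the MTF; the grid size and aberration amplitude are hypothetical choices, not values from the text.

```python
# A minimal numerical sketch of Eqs. (3.68), (3.72), and (3.74): coherent PSF, incoherent
# PSF, and MTF of a circular pupil with a defocus-like aberration. Pupil sampling and the
# aberration amplitude are assumed values for illustration only.
import numpy as np

N = 256
x = np.linspace(-1, 1, N)                      # pupil coordinates normalized to the pupil radius
X, Y = np.meshgrid(x, x)
R2 = X**2 + Y**2
pupil_mask = (R2 <= 1.0).astype(float)

phi = 2.0 * (2 * R2 - 1) * pupil_mask          # assumed aberration: ~2 rad of defocus-like phase
P = pupil_mask * np.exp(1j * phi)              # generalized pupil function

h = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(P)))   # coherent PSF, Eq. (3.68) up to scale factors
psf = np.abs(h)**2                                        # incoherent PSF
otf = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(psf)))
mtf = np.abs(otf) / np.abs(otf).max()                     # modulus of the OTF, normalized

print("peak of incoherent PSF:", psf.max())
print("MTF at zero frequency:", mtf[N // 2, N // 2])      # should be 1 after normalization
```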

3.5 Wavefront Aberrations

A wavefront aberration is defined as the deviation of a wavefront from a desired perfect wavefront. In geometrical optics, geometrical aberrations mainly concern the distribution of the intersections between light rays and the paraxial image plane, while wavefront aberrations can be considered as an alternative representation of aberrations taken from the perspective of wave


optics. In this section, wavefront aberrations are presented. First, a key concept for understanding wavefront aberrations, the optical path difference (OPD), is presented. Next, methods to evaluate wavefront aberrations are introduced. Finally, the Zernike representation, a practical method for describing wavefront aberrations, is explained.

3.5.1 Optical path difference

The optical path length (OPL), also called the optical path, is defined as the product of the geometrical path and the refractive index of the medium through which light passes, as presented in Chapter 2. According to the definition of the refractive index in Section 2.1, the optical path in a homogeneous medium is usually expressed as OPL = nl = c(l/v) = cΔt, where n is the refractive index of the medium, l is the distance in the medium through which light travels, v is the speed of light in the medium, and Δt is the time interval of light passing through the medium. As such, the OPL can be thought of as the distance that light travels in vacuum during the same time interval Δt in which it travels a distance l in the medium. The optical path difference (OPD) is the difference between two optical paths or the difference between two different points on one optical path. According to the definition of the OPL, a given OPD corresponds to a certain phase delay. The relationship between the OPD Δl and the phase delay Δφ can be expressed as

$$\Delta\varphi = \frac{2\pi}{\lambda}\Delta l, \qquad (3.76)$$

where λ is the wavelength of light traveling in vacuum.

3.5.1.1 Example: Light traveling in different media of the same thickness

As shown in Fig. 3.43, two rays A and B coming from a point source at infinity travel through a piece of glass and a piece of ice of the same thickness d. What is the OPD between them when they pass through the media? If the refractive indices of the glass and the ice are denoted as n1 and n2, respectively, the OPLs introduced by the glass and the ice are, respectively,

$$l_1 = n_1 d, \qquad l_2 = n_2 d.$$

Then, the OPD between them is

$$\mathrm{OPD} = (n_1 - n_2)d,$$

corresponding to a phase delay of

$$\frac{2\pi}{\lambda}(n_1 - n_2)d.$$

Figure 3.43 Light traveling in different media of the same thickness.

3.5.2 Peak-to-valley and root-mean-square values of a wavefront aberration

The deviation in the definition of a wavefront aberration is quantitatively measured by the value of the OPD; more precisely, a wavefront aberration is a collection of OPDs between each point on the wavefront and the corresponding point on the desired perfect wavefront. Therefore, each wavefront aberration has a particular corresponding surface, as shown in Fig. 3.44.

Figure 3.44 Diagram of a wavefront aberration.


Usually, the peak-to-valley (PV) value and the root-mean-square (RMS) value of a wavefront aberration are used to evaluate the wavefront aberration. The PV value of a wavefront aberration is simply the difference between that aberration's maximum and minimum values. For example, if the maximum value of a wavefront aberration is 1/3 wave and the minimum value is –1/6 wave, then the PV value of the wavefront aberration is 1/2 wave. The RMS value of a wavefront aberration is defined as the square root of the average of the squares of all of the differences between the wavefront aberration and its mean value. Using polar coordinates, a wavefront aberration on a normalized circular pupil can be denoted as W(r, θ). The RMS value of the wavefront aberration can be written as

$$\mathrm{RMS} = \sqrt{\frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^1 [W(r, \theta) - \bar{W}]^2\, r\, dr\, d\theta}, \qquad (3.77)$$

where

$$\bar{W} = \frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^1 W(r, \theta)\, r\, dr\, d\theta$$

is the mean value of the wavefront aberration over the pupil. Obviously, it is convenient and simple to specify a wavefront aberration using the PV value. However, as the PV value indicates only the maximum range of a wavefront aberration, it cannot reveal much information about the error distribution over the wavefront. Therefore, the PV value can sometimes be misleading when evaluating optical systems. For example, a wavefront aberration with a large PV value [Fig. 3.45(a)] can result in a better image than one with a small PV value [Fig. 3.45(b)]. In general, if a wavefront aberration is relatively smooth, the PV value of the wavefront aberration can characterize it effectively. However, if abruptly irregular variations on the wavefront exist, the

Figure 3.45 Radial profiles of wavefront aberrations having (a) a large PV value and (b) a small PV value.


RMS value of the wavefront aberration should be used. The RMS value of a wavefront aberration typically ranges from approximately 1/5 to 1/3 of the PV value of the wavefront aberration.8 This ratio is highly dependent on the shape of the wavefront aberration. For a very smooth wavefront aberration caused by defocus, a typical value of the ratio is 2/7.2 If the wavefront aberration is less smooth, such as a wavefront aberration caused by high-order aberrations or by fabrication errors, the value of the ratio will be smaller, typically 1/5.

3.5.3 Zernike representation of wavefront aberrations

A wavefront aberration can usually be expressed mathematically by a set of polynomials. One of the widely used sets is the Zernike polynomials, a sequence of polynomials that are continuous and orthogonal over the unit circle. Theoretically, any continuous wavefront can be expressed by the Zernike polynomials. There are slightly varying forms of the Zernike polynomials, with similar expressions presented by different researchers. Here, we adopt the form given by Noll.9 According to Noll's definition, the Zernike polynomials in polar coordinates can be expressed as follows:

$$Z_{j(\mathrm{even})} = \sqrt{2(n+1)}\, R_n^m(r)\cos(m\theta), \quad \text{for } m \neq 0,$$
$$Z_{j(\mathrm{odd})} = \sqrt{2(n+1)}\, R_n^m(r)\sin(m\theta), \quad \text{for } m \neq 0,$$
$$Z_j = \sqrt{n+1}\, R_n^0(r), \quad \text{for } m = 0,$$

where r is restricted to the unit circle (0 ≤ r ≤ 1), meaning that the radial coordinate is normalized by the radius of the pupil, and the radial function $R_n^m(r)$ is given by

$$R_n^m(r) = \sum_{s=0}^{(n-m)/2} \frac{(-1)^s (n-s)!}{s!\left(\frac{n+m}{2} - s\right)!\left(\frac{n-m}{2} - s\right)!}\, r^{n-2s},$$

where the symbol "!" stands for the factorial. It must be noted that the values of n and m are always integers and must satisfy m ≤ n, with n – |m| even. The index j is a mode-ordering number and is a function of n and m. Note that different orderings for the Zernike polynomials are used in other literature. The first 15 Zernike polynomials in Noll's ordering are listed in Table 3.1.

Table 3.1 The first fifteen Zernike polynomials.

j     n     m      Z_j(r, θ)                        Aberration name
1     0     0      1                                Piston
2     1     1      2r cos θ                         x tilt
3     1     –1     2r sin θ                         y tilt
4     2     0      √3 (2r² – 1)                     Defocus
5     2     –2     √6 r² sin 2θ                     Primary astigmatism
6     2     2      √6 r² cos 2θ                     Primary astigmatism
7     3     –1     √8 (3r³ – 2r) sin θ              Primary y coma
8     3     1      √8 (3r³ – 2r) cos θ              Primary x coma
9     3     –3     √8 r³ sin 3θ
10    3     3      √8 r³ cos 3θ
11    4     0      √5 (6r⁴ – 6r² + 1)               Primary spherical
12    4     2      √10 (4r⁴ – 3r²) cos 2θ           Secondary astigmatism
13    4     –2     √10 (4r⁴ – 3r²) sin 2θ           Secondary astigmatism
14    4     4      √10 r⁴ cos 4θ
15    4     –4     √10 r⁴ sin 4θ

A wavefront aberration can be expanded in terms of a sequence of Zernike polynomials as

$$W(r, \theta) = \sum_j C_j Z_j(r, \theta), \qquad (3.78)$$

where Cj denotes the Zernike coefficients, which can be determined by

$$C_j = \frac{1}{\pi}\iint W(r, \theta)\, Z_j(r, \theta)\, r\, dr\, d\theta, \qquad (3.79)$$

where the integration is performed over the unit circle. There are many distinct advantages to using the Zernike polynomials to describe wavefront aberrations. For example, the RMS of a wavefront aberration can be obtained directly from the Zernike coefficients by

$$\mathrm{RMS}^2 = \sum_{j > 1} C_j^2. \qquad (3.80)$$

Note that if you use other forms of the Zernike polynomials, Eq. (3.80) might not be correct. Taking the Zernike polynomials given by Born and Wolf1 as an example: since each of the Zernike polynomials given here should be multiplied by a factor of 1/[2(n + 1)]^{1/2} to obtain the corresponding terms given by Born and Wolf, the RMS value of a wavefront aberration represented by Born's expression of the Zernike polynomials can be obtained by

$$\mathrm{RMS}^2 = \sum_{j > 1} \frac{A_j^2}{2(n + 1)},$$

where Aj denotes the Zernike coefficients in Born's expression of the Zernike polynomials.


In short, with the help of the Zernike polynomials, a wavefront aberration can be conveniently represented and manipulated.
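To make the Zernike machinery concrete, the sketch below evaluates a few of the Noll-ordered terms from Table 3.1 on a sampled unit pupil, builds a wavefront from assumed coefficients via Eq. (3.78), and checks Eq. (3.80) against a direct RMS computation; the grid size and coefficient values are hypothetical.

```python
# A small sketch of the Noll Zernike terms listed in Table 3.1, and of Eq. (3.80):
# a wavefront built from a few coefficients and its RMS value. Coefficients are assumed.
import numpy as np

def zernike_noll(j, r, theta):
    """A subset of the first 15 Noll-ordered Zernike polynomials, hard-coded from Table 3.1."""
    terms = {
        1: 1.0 + 0 * r,
        2: 2 * r * np.cos(theta),
        3: 2 * r * np.sin(theta),
        4: np.sqrt(3) * (2 * r**2 - 1),
        5: np.sqrt(6) * r**2 * np.sin(2 * theta),
        6: np.sqrt(6) * r**2 * np.cos(2 * theta),
        7: np.sqrt(8) * (3 * r**3 - 2 * r) * np.sin(theta),
        8: np.sqrt(8) * (3 * r**3 - 2 * r) * np.cos(theta),
        11: np.sqrt(5) * (6 * r**4 - 6 * r**2 + 1),
    }
    return terms[j]

# Sample the unit pupil and build W(r, theta) = sum_j C_j Z_j(r, theta)  [Eq. (3.78)].
N = 256
x = np.linspace(-1, 1, N)
X, Y = np.meshgrid(x, x)
r, theta = np.hypot(X, Y), np.arctan2(Y, X)
mask = r <= 1.0

coeffs = {4: 0.10, 7: 0.05, 11: 0.02}      # assumed coefficients, in waves
W = sum(c * zernike_noll(j, r, theta) for j, c in coeffs.items())

rms_from_coeffs = np.sqrt(sum(c**2 for c in coeffs.values()))   # Eq. (3.80), piston excluded
rms_numerical = np.std(W[mask])                                  # direct evaluation on the pupil
print(rms_from_coeffs, rms_numerical)                            # the two values should nearly agree
```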

3.6 Resolution Limits of Optical Imaging Systems

In the previous sections, the basic knowledge of wave optics has been discussed. In this section, theoretical resolution limits for optical imaging systems are introduced. Due to the diffraction of light, the resolution of an optical imaging system is limited, even for an aberration-free system. In the remainder of this section, two theoretical resolution limits of optical imaging systems, the Sparrow criterion and the Rayleigh criterion, and a measure of image quality, the Strehl ratio, are presented to evaluate the performance of an optical imaging system.

When two monochromatic point sources with the same brightness are imaged simultaneously by an aberration-free optical imaging system, each point source will form an Airy pattern (due to the diffraction of light) on the imaging plane. When these point sources are too close, they cannot be resolved due to the overlap of their Airy patterns. As shown in Fig. 3.46(a), if the interval between the two Airy patterns is 0.5λ(F/#), there will be a flat intensity profile between the maxima of the two Airy patterns, and these two points are considered to be barely resolved. This is the Sparrow criterion for the resolution of an optical imaging system. When the interval between the two Airy patterns increases to 0.61λ(F/#), as shown in Fig. 3.46(b), the maximum of one Airy pattern is at the first dark ring of the other Airy pattern, and these two points can be obviously discerned. This is the Rayleigh criterion for the resolution of an optical imaging system and is widely used to determine the theoretical resolution limit of an optical imaging system.

The Strehl ratio, a measure of the image quality of an optical imaging system, is defined by the ratio of the central peak intensity of an aberrated

Figure 3.46 Diagram of the theoretical limits of resolution using: (a) Sparrow criterion and (b) Rayleigh criterion. The thin dashed lines are the individual Airy patterns for corresponding point objects, and the bold solid lines are the final combined diffracted patterns.


image for a point source to that of an aberration-free image for the same point source. In particular, for an optical imaging system with small aberrations, its Strehl ratio can be approximately expressed as

$$S \approx \exp(-\sigma^2), \qquad (3.81)$$

where σ (in radians) is the RMS value of the wavefront aberration of the optical imaging system. Note that the Strehl ratio is only valid for evaluating the image quality of optical imaging systems with small wavefront aberrations or of well-corrected optical imaging systems, such as microscopes and astronomical telescopes compensated by adaptive optics systems. This limitation is attributed to the difficulty of locating the central peak intensity of the image of a point source through an optical imaging system with large aberrations.
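As a quick numerical reading of Eq. (3.81), the sketch below converts an assumed RMS wavefront error of λ/14 into radians and evaluates the corresponding Strehl ratio; the chosen error level is only an illustrative example.

```python
# Quick numerical reading of Eq. (3.81): Strehl ratio for an assumed RMS wavefront error
# of lambda/14 (an illustrative, commonly quoted near-diffraction-limited level).
import math

rms_waves = 1 / 14                        # assumed RMS wavefront error in waves
sigma = 2 * math.pi * rms_waves           # convert to radians
strehl = math.exp(-sigma**2)
print(f"S = {strehl:.2f}")                # prints roughly 0.82
```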

References

1. M. Born and E. Wolf, Principles of Optics, Seventh Edition, Cambridge University Press, Cambridge (1999).
2. G. Brooker, Modern Classical Optics, Oxford University Press, Oxford (2003).
3. J. W. Goodman, Introduction to Fourier Optics, Second Edition, McGraw-Hill, New York (1996).
4. D. Malacara, Optical Shop Testing, Third Edition, John Wiley & Sons, Hoboken, New Jersey (2007).
5. W. Lauterborn and T. Kurz, Coherent Optics: Fundamentals and Applications, Second Edition, Springer-Verlag, Berlin-Heidelberg-New York (2003).
6. A. Papoulis, The Fourier Integral and Its Applications, McGraw-Hill, New York, San Francisco, London, Toronto (1962).
7. C. W. Li, B. M. Li, and S. J. Zhang, "Phase retrieval using a modified Shack–Hartmann wavefront sensor with defocus," Applied Optics 53(4), 618–624 (2014).
8. R. E. Fischer, B. Tadic-Galeb, and P. R. Yoder, Optical System Design, Second Edition, McGraw-Hill, New York and SPIE Press, Bellingham, Washington (2008).
9. R. J. Noll, "Zernike polynomials and atmospheric turbulence," Journal of the Optical Society of America 66(3), 207–211 (1976).

Part II COMPONENTS AND CASE STUDIES

Chapter 4

General Optical Components in Optical Systems

In the previous chapters, we mainly focused on the theories of optics necessary for understanding optical systems. From this chapter on, we turn to the practical side, moving from general optical components to case studies of optical systems. In this chapter, some commonly used optical components are presented to show their working principles, structures, and functions in optical systems.

4.1 Light Sources

Light sources, including natural ones, are indispensable optical components in optical systems. Generally, there are two types of light sources. One is the incoherent light source, and the other is the coherent light source. Figure 4.1 shows schematics of incoherent and coherent light waves emitted from incoherent and coherent sources. To examine the coherence of a light source, we can observe the intensity pattern produced by the superposition of two or more light waves emitted from this light source. If there are interference fringes in the intensity pattern, the light source is coherent; otherwise, the light source is incoherent. More precisely, the coherence of a light source can be identified according to the definition of the coherence of light given in Chapter 3.

To gain a basic understanding of light sources, the three Einstein coefficients are briefly elucidated to explain the interaction between light and matter based on the energy levels introduced in Chapter 1. A two-level atom (Fig. 4.2) is taken as an example for introducing the three Einstein coefficients. When such an atom is illuminated by light with the frequency (E2 – E1)/h, the atom can absorb a photon of light and make a transition from state E1 to state E2. The probability of absorption per second depends on the energy levels of this atom and is proportional to the intensity of the incident light. This process of absorption is called stimulated absorption, which can be described by the Einstein coefficient B12. When an atom is in the excited


Figure 4.1 Schematic of (a) incoherent light waves and (b) coherent light waves.

Figure 4.2 Diagram of the processes of (a) stimulated absorption, (b) spontaneous emission, and (c) stimulated emission for an atom with two energy levels.

state E2, there is a chance that it will transit to the lower state E1 and emit a photon, regardless of whether light is present. This process is called spontaneous emission, and its probability, denoted by the Einstein coefficient A21, depends on the energy levels of the atom and is independent of light. When light with frequency (E2 – E1)/h shines on an atom in the excited state E2, this atom can emit a photon that is identical to the incident photon. The probability of this emission per second depends on the energy levels of this atom and is proportional to the intensity of the incident light. This kind of emission is called stimulated emission, which can be described by the Einstein coefficient B21. Figure 4.2 illustrates the three processes involved when light interacts with atoms.

4.1.1 Incoherent sources

Most light sources are incoherent. Incoherent sources emit light waves with random phases, different frequencies, and even random amplitudes. The process of light generation by transitions between energy levels in an


incoherent source is random because the physical mechanism of light generation for incoherent sources is mainly spontaneous emission. Therefore, the light waves emitted by an incoherent source have no correlation with each other, as shown in Fig. 4.1(a). Conventional light sources are all incoherent sources; these include light-emitting diodes (LEDs), superluminescent diodes, and broadband light sources, e.g., fluorescent lamps, tungsten halogen lamps, etc.

4.1.2 Coherent sources

A coherent source emits light waves that are in phase and have the same frequency, so there are no abrupt phase changes between light waves within the coherence length. If all atoms in a light source react in an almost identical manner, the source produces powerful coherent waves. Invented in 1960, the laser attains this goal. The basic principles of a laser can be summarized as follows.

The word LASER is an acronym that stands for Light Amplification by Stimulated Emission of Radiation. In its current use, the word laser is also used to describe a laser device. The main components of a laser, in principle, are the pumping source, the active medium, and an optical resonator that usually consists of two mirrors, as shown in Fig. 4.3. Coherent light is generated in the active medium of a laser. By pumping the atoms in the active medium with external energy delivered in an appropriate form, these atoms are excited into their higher energy levels. At the beginning of the process, the energy stored in the higher energy levels of the atoms is partially transformed into light by spontaneous emission between the two laser energy levels. Because of the optical resonator, the only photons that survive are those with a wavelength λ that satisfies the resonant condition λ = 2L/m, where L is the length of the optical resonator, and m is an integer.

Figure 4.3 Main components of a laser.


Laser emission is generated by these surviving photons, which act as seeds for the process. The laser-generating process can be roughly illustrated by taking one surviving photon as an example. A surviving photon in the active medium of a laser will induce (through stimulated emission) an excited atom to emit a photon identical to the surviving photon. After this process, two identical photons exist in the active medium. In this way, an avalanche emission of photons in a chain of stimulated emission occurs. When the number of photons generated by this multiplicative process exceeds the losses of such photons, laser light is produced. All of the photons in the laser light are identical due to the stimulated emission process in the active medium. Based on this mechanism of laser generation, the characteristics of laser light are monochromaticity, coherence, and directionality.

Note that, in order to generate laser light, a critical condition called population inversion must be satisfied. When the population inversion condition is fulfilled, the upper laser energy level is more densely populated than the lower level. Only when this condition is satisfied is stimulated emission able to dominate stimulated absorption in the active medium, which is a prerequisite for the generation of laser light. It is impossible to achieve population inversion between the ground state and an excited state in a system with only two energy levels.1 Fortunately, it is possible to achieve population inversion in an active medium with more than two energy levels, e.g., in a three-level system, as shown in Fig. 4.4.

The laser has become a very important type of light source and is widely used in academic, industrial, and commercial fields. The many types of lasers

Figure 4.4 Population inversion between energy levels E1 and E0 for a three-level laser system.


that have been developed can be classified according to their active medium materials, e.g., solid state lasers, gas lasers, semiconductor lasers, etc. Generally speaking, solid state lasers generate light with higher power than other laser types due to the high densities of solid media; gas lasers produce light with lower divergence angle and high coherence due to the homogeneities of gas media; and semiconductor lasers, also called laser diodes, are small, compact, and highly efficient, so they are widely used in various applications such as optical pumping, laser pointers, and compact disc players.

4.2 Lenses

A lens (or singlet) is usually made of a piece of glass or transparent plastic with two refractive surfaces. Lenses can be divided into two types, positive (converging) and negative (diverging) lenses, according to whether they converge or diverge light from infinity by refraction on their two surfaces. Each surface of a lens can be convex, concave, or planar, as shown in Fig. 4.5. Positive lenses include bi-convex lenses, plano-convex lenses, and positive meniscus lenses; negative lenses include bi-concave lenses, plano-concave lenses, and negative meniscus lenses.

4.2.1 Spherical lenses

In practice, both of the refractive surfaces of a lens are usually part of a sphere with different radii, and this type of lens is called a spherical lens. Note that a plane can be considered as part of a sphere with infinite radius. A majority of the discussion in Sections 2.4 and 2.5 in Chapter 2 is based on spherical lenses. In particular, the simplest positive doublet, which consists of two lenses cemented together in order to reduce on-axis spherical and chromatic aberrations, was also introduced in Section 2.5. Here, spherical lenses will not be described in detail.

Figure 4.5 Illustration of different lens types.


In addition to the common lens geometries described at the start of Section 4.2, lenses can be produced in a wide variety of other shapes to meet the different requirements of a particular application.

4.2.2 Spherical ball lenses

A spherical ball lens is a sphere made of a transparent medium. Its geometry can be described solely by its diameter. Rays from all incident angles exhibit equal properties in a ball lens, and its focal length depends only on its diameter and refractive index. As shown in Fig. 4.6, a beam of light with diameter d is incident on a ball lens with diameter D, and, in general, d < D. By tracing a ray that is incident on the ball lens from infinity and does not pass through the center of the lens, it can be found that the principal planes of this lens lie at the center of the lens. According to Eq. (2.22), the effective focal length fe of the ball lens can be expressed as

$$f_e = \frac{nD}{4(n-1)}, \qquad (4.1)$$

where n is the refractive index of the lens. Then, the back focal length (bfl) is merely the difference between the effective focal length and the radius D/2 of the ball lens:

$$\mathrm{bfl} = f_e - \frac{D}{2} = \frac{D(2-n)}{4(n-1)}.$$

The numerical aperture in image space (NAimage) of the ball lens in air is

$$\mathrm{NA}_{image} = \sin u = \frac{1}{\sqrt{1 + 4\left[\dfrac{nD}{4d(n-1)}\right]^2}}. \qquad (4.2)$$

Figure 4.6 Parameters of a ball lens (adapted from Edmund Optics).

In the paraxial region (i.e., d/D ≪ 1), it is easy to determine that sin u ≈ tan u = d/(2fe) = 1/[2(F/#)], and the numerical aperture in image space can be estimated from its F/#, i.e.,

$$\mathrm{NA}_{image} \approx \frac{1}{2\,(F/\#)} = \frac{2d(n-1)}{nD}. \qquad (4.3)$$
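For concreteness, the short sketch below evaluates Eqs. (4.1) through (4.3) for an assumed ball lens; the refractive index and the diameters are example values only, not data from the text.

```python
# Numerical sketch of Eqs. (4.1)-(4.3) for a ball lens. The refractive index, ball diameter,
# and beam diameter below are assumed example values (roughly N-BK7-like glass).
import math

n = 1.517        # refractive index (assumed)
D = 2.0e-3       # ball lens diameter, 2 mm (assumed)
d = 1.0e-3       # input beam diameter, 1 mm (assumed)

fe  = n * D / (4 * (n - 1))                                    # effective focal length, Eq. (4.1)
bfl = fe - D / 2                                               # back focal length
na  = 1 / math.sqrt(1 + 4 * (n * D / (4 * d * (n - 1)))**2)    # exact NA, Eq. (4.2)
na_paraxial = 2 * d * (n - 1) / (n * D)                        # paraxial estimate, Eq. (4.3)

print(f"fe = {fe*1e3:.3f} mm, bfl = {bfl*1e3:.3f} mm")
print(f"NA = {na:.3f} (exact), {na_paraxial:.3f} (paraxial)")
```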

Ball lenses are very useful in fiber-optic communication, endoscopy, microscopy, and laser measurement systems. For convenience of mounting and alignment, the upper and lower portions of a ball lens can be cut off; the resulting lens, which within its clear aperture has the same optical properties as the original ball lens, is called a drum lens,2 as shown in Fig. 4.7. Note that the back focal length (bfl) of a ball or drum lens is zero for a material with a refractive index of 2. For materials whose refractive index is larger than 2, the focal points lie within the ball or drum lens (bfl < 0). Therefore, most optical materials used in the infrared spectrum, such as silicon, germanium, etc., are not suitable for ball or drum lenses.

4.2.3 Cylindrical lenses

A cylindrical lens is a portion of a cylinder flattened on one surface, such that the lens has curvature in one direction and no curvature in the perpendicular direction. Thus, when a collimated beam of light is incident on a cylindrical lens, the beam will be focused into a line, known as a focal line, at the focal plane of the lens. This focal line is parallel to the intersection line of the cylindrical surface and its tangent plane, as shown

Figure 4.7 Illustration of a drum lens.

Figure 4.8 Illustration of a cylindrical lens.

in Fig. 4.8. In an imaging system, a cylindrical lens compresses the image in the direction perpendicular to the focal line and leaves it unaltered in the direction parallel to the focal line. This means that a cylindrical lens in an optical system takes effect only in the direction having curvature. Cylindrical lenses are useful as laser line generators or for focusing light into a slit.

4.2.4 Axicons

An axicon, having a conical surface, is also known as a conical lens. An axicon can focus a collimated beam into a focal line along its optical axis [Fig. 4.9(a)] or transform it into a ring-shaped beam [Fig. 4.9(b)]. Axicons can be used in eye surgery where a ring-shaped spot is required.

Figure 4.9 Illustration of an axicon focusing a collimated beam into (a) a focal line or (b) a ring-shaped beam.


4.2.5 Aspheric lenses

An aspheric lens is a lens with more complex curved surfaces rather than simple spherical surfaces. Using aspheric lenses in an optical imaging system can considerably improve the image quality of the system while using only a small number of optical elements. For example, spherical lenses are unable to effectively correct distortion in an optical imaging system with an ultrawide angular field of view; if an aspheric lens is used in such a system, the distortion can be removed or greatly minimized.

One of the most notable benefits of aspheric lenses is their ability to correct spherical aberration. As discussed in Section 2.6, spherical aberration is proportional to the aperture of a spherical lens and is difficult to eliminate using lenses with spherical surfaces. Spherical aberration is much more easily corrected with an aspheric lens. This is illustrated in Fig. 4.10(b), which shows a spherical lens with significant spherical aberration, compared to an aspheric lens with minimized spherical aberration [Fig. 4.10(a)]. As mentioned in Section 2.6, for a spherical surface, the incident angle of a ray far from the optical axis is larger than that of a ray close to the optical axis. Because of the nonlinearity of Snell's law, spherical aberration of a spherical lens is inevitable. However, if the shape of the spherical surface is gradually flattened in the region where the rays pass gradually farther from the optical axis, the incident angles of the corresponding rays will be reduced accordingly. Thus, spherical aberration can be reduced and all rays can be focused to a

Figure 4.10 Comparison of (a) a spherical and (b) an aspheric lens, specifying (c) the difference in shape and edge steepness (adapted from Ref. 3).


common focal point, as shown in Fig. 4.10(a). The difference in shape between the spherical surface and the aspheric surface is shown in Fig. 4.10(c). Obviously, the edges of the aspheric surface are less steep than those of the spherical surface. A spherical surface can be described by only one parameter, the radius or curvature of the surface. However, an aspheric surface is usually defined by not only the curvature but also the conic constant and other coefficients. As shown in Fig. 4.11, the sag—the Z component of the displacement of the surface from the vertex—of an aspheric surface can be expressed as

Z(r) = \frac{C r^2}{1 + \sqrt{1 - (1 + K) C^2 r^2}} + A_4 r^4 + A_6 r^6 + A_8 r^8 + \cdots,    (4.4)

where Z(r) is the sag of an aspheric surface, C is the curvature, r is the radial distance from the optical axis, K is the conic constant, and A4, A6, A8, . . . are the coefficients of the fourth-, sixth-, eighth-, . . . order aspheric terms, respectively. When the higher-order aspheric coefficients, i.e., A4, A6, A8, . . . , are zero, the resulting aspheric surface takes the form of a rotationally symmetric conic surface with the sag given as

Z(r) = \frac{C r^2}{1 + \sqrt{1 - (1 + K) C^2 r^2}}.    (4.5)

The shape of the actual conic surface depends on the magnitude and sign of the conic constant K, as shown in Table 4.1.

Figure 4.11 Sag of an aspheric surface.


Table 4.1 Conic constants for different conic surfaces.

Conic Constant      Conic Surface
K = 0               Sphere
K > −1              Ellipse
K = −1              Parabola
K < −1              Hyperbola
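To make Eqs. (4.4) and (4.5) concrete, the short Python sketch below evaluates the sag of a surface for a few illustrative parameter values; the curvature, conic constants, and aspheric coefficients used here are arbitrary examples chosen for demonstration, not values taken from the text.

```python
import math

def aspheric_sag(r, C, K, higher_order=()):
    """Sag Z(r) of an aspheric surface, Eq. (4.4).

    C : vertex curvature (1/radius), K : conic constant,
    higher_order : coefficients (A4, A6, A8, ...) of r^4, r^6, r^8, ...
    With no higher-order terms this reduces to the conic sag, Eq. (4.5).
    """
    conic = C * r**2 / (1 + math.sqrt(1 - (1 + K) * C**2 * r**2))
    poly = sum(A * r**(4 + 2 * i) for i, A in enumerate(higher_order))
    return conic + poly

# Example (hypothetical values): a surface with a 50-mm radius of curvature
C = 1 / 50.0          # curvature in 1/mm
r = 10.0              # radial height in mm
for K, name in [(0.0, "sphere"), (-0.5, "ellipse"), (-1.0, "parabola"), (-2.0, "hyperbola")]:
    print(f"{name:9s} K = {K:+.1f}  sag = {aspheric_sag(r, C, K):.4f} mm")

# Adding small fourth- and sixth-order terms (again arbitrary) departs from the pure conic:
print("aspheric sag =", round(aspheric_sag(r, C, -1.0, higher_order=(1e-6, -1e-10)), 4), "mm")
```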

Aspheric lenses have growing applications in optics, such as digital cameras, CD players, laser diode collimation, astronomical telescopes, etc., due to their distinct advantages. However, aspheric lenses are generally difficult to fabricate and test, and are therefore much more expensive than spherical lenses. Proposed guidelines for the use of aspheric lenses can be found in Optical System Design3 by Fischer, Tadic-Galeb, and Yoder. Currently, due to technological advances, the fabrication and testing of aspheric lenses have become easier, providing great convenience and freedom in the design of optical systems.

4.2.6 Plane-parallel plates

If the curvatures of both surfaces of a lens become zero, the lens becomes an ordinary plane-parallel plate. Next, the properties of plane-parallel plates used in collimated and in converging light are discussed. When a light ray is incident on a plane-parallel plate in air at an incident angle θ (see Fig. 4.12), the ray exits from the other side of the plane-parallel plate in the same direction, but with a lateral displacement d relative to the incident ray, given by

Figure 4.12 Lateral displacement of a ray by a plane-parallel plate.


d = t \frac{\sin(\theta - \theta')}{\cos\theta'},    (4.6)

where t is the thickness of the plane-parallel plate, and θ′ is the refractive angle, as shown in Fig. 4.12. If the incident angle is small, according to the Taylor expansion of the sine and cosine function, Eq. (4.6) can be rewritten as

d = \frac{n - 1}{n} t \theta,    (4.7)

where n is the refractive index of the plane-parallel plate, and θ is in radians. Thus, when a collimated beam of light traveling through a plane-parallel plate exits the plate, it is still collimated but with a lateral displacement. If a plane-parallel plate is placed in converging light, it will cause the focal point to move away from the plate along its optical axis, as shown in Fig. 4.13. The image displacement d along the optical axis is d = [(n − 1)/n]t.

4.2.7 Optical wedges

An optical wedge is a plane-parallel plate whose two faces are inclined to each other by a small wedge angle. Suppose that the refractive index of an optical wedge is n and its very small wedge angle is α. If a ray of light is incident on the optical wedge, the exiting ray will deviate from the direction of the incident ray by an angle β (Fig. 4.14) given by β = (n − 1)α.
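As a quick numerical check of Eqs. (4.6) and (4.7), the focal-point shift, and the wedge deviation above, the following Python sketch compares the exact and small-angle expressions for a hypothetical BK7-like plate (n ≈ 1.517); the thickness, angles, and wedge values are illustrative assumptions only.

```python
import math

n = 1.517        # refractive index (BK7-like, illustrative)
t = 5.0          # plate thickness in mm
theta = math.radians(10.0)               # incident angle

# Exact lateral displacement, Eq. (4.6)
theta_p = math.asin(math.sin(theta) / n)             # Snell's law at the first face
d_exact = t * math.sin(theta - theta_p) / math.cos(theta_p)

# Small-angle approximation, Eq. (4.7)
d_approx = (n - 1) / n * t * theta

# Longitudinal image shift for a plate placed in converging light
shift = (n - 1) / n * t

# Deviation of a thin wedge with a small wedge angle alpha
alpha = math.radians(1.0)
beta = (n - 1) * alpha

print(f"lateral displacement: exact {d_exact:.4f} mm, paraxial {d_approx:.4f} mm")
print(f"focal-point shift along the axis: {shift:.4f} mm")
print(f"wedge deviation: {math.degrees(beta):.4f} deg")
```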

4.3 Mirrors and prisms 4.3.1 Mirrors A mirror works by reflecting light off its surface and can change the propagation direction of light in an optical system. Mirrors are used in a wide

Figure 4.13 Image displacement of a focal point caused by a plane-parallel plate.

Figure 4.14 Illustration of an optical wedge.

range of fields, such as life sciences, astronomy, metrology, etc. They are categorized into two types according to the shape of the mirror surface: plane mirrors and curved mirrors. 4.3.1.1 Plane mirrors

A plane mirror can reflect light off its reflecting surface by what is called specular reflection. Figure 4.15 shows how the image of an object formed by a plane mirror is determined. If light rays from one point of an object hit a plane mirror, these rays are reflected by the mirror at angles obeying the law of reflection, and the backward extensions of these reflected rays are focused at

Figure 4.15 Imaging by a plane mirror.


one point behind the mirror, producing a virtual image. Since each point of the object produces a virtual image, a virtual image of the object can be formed behind the mirror. The object and its virtual image are located at equal distances from the reflecting surface of the plane mirror, and the image appears the same size as the object. The only difference between the image and the object is that the image is a left–right reversal of the object, which means that the left side of the image is the right side of the object. Thus, a plane mirror is most often used to change the orientation of an image or to fold an optical system for better packaging. Thus, in addition to redirecting the path of light, specular reflection by a plane mirror changes the parity in the image from right to left, or from left to right. Moreover, it should be noted that an even (odd) number of reflections maintains (changes) the parity, a fact that will be used in the following discussion. 4.3.1.2 Curved mirrors

Curved mirrors can be roughly divided into two categories, concave and convex, depending on whether the centers of curvature of the curved surface are located on the left or right side of the reflecting surface, respectively. A concave mirror reflects light from infinity inwards to its focal point and can be used to focus light, as shown in Fig. 4.16(a). A convex mirror reflects light from infinity outwards, as shown in Fig. 4.16(b). The optical properties of concave and convex mirrors are similar to those of positive and negative lenses, respectively. Thus, the imaging formulas in Chapter 2 can also be used to describe images formed by concave and convex mirrors. Moreover, according to the law of reflection, the refractive indices of these mirrors are set to –1 for raytracing. In the paraxial region, the focal length of a curved mirror is one-half of the radius of its reflecting curved surface in magnitude. A spherical mirror with a spherical reflecting surface is a kind of curved mirror. Using the graphical raytracing method, Fig. 4.17(a) shows how a

Figure 4.16 (a) A concave and (b) a convex mirror.


Figure 4.17 Imaging by (a) a spherical concave mirror and (b) a spherical convex mirror.

spherical concave mirror forms a real, inverted image of an extended object, and Fig. 4.17(b) shows how a spherical convex mirror produces a virtual, erect image of an extended object. The reflecting surfaces can also be cylindrical, paraboloidal, hyperboloidal, or ellipsoidal, in addition to being spherical. Similar to the aspheric lenses presented in Section 4.2, aspheric mirrors offer the same kinds of advantages and disadvantages relative to spherical mirrors in optical systems. We will not introduce each type of aspheric mirror in detail, but simply present illustrations of each. A cylindrical mirror (Fig. 4.18) reflects light rays into a focal line. A paraboloidal mirror (Fig. 4.19) focuses a parallel beam of light into a point. A hyperboloidal mirror (Fig. 4.20) produces virtual images of objects at its focal point. An ellipsoidal mirror (Fig. 4.21), which has two focal points and is useful as a reflector, focuses light from one focal point to the other. Like lenses, spherical and aspheric mirrors also suffer from some common aberrations, including coma, field curvature, and distortion, but are free of chromatic aberration, one of the aberrations experienced with lenses used in broad-spectrum light. We should mention the concept of freeform optics because of its novelty and promising future. With the development of high-precision, single-point diamond turning and mass-replication technology, along with the increasing


Figure 4.18 Illustration of a cylindrical mirror.

Figure 4.19 Illustration of a paraboloidal mirror.

use of plastics for optical components, optics manufacturing allows increasingly precise fabrication of surfaces with almost any shape (freeform). By removing restrictions on surface geometry, freeform optics is opening new possibilities in optical designs and optical systems. A well-designed, freeform lens or mirror surface might be able to accomplish multiple specific functions—such as the simultaneous correction of multiple aberrations—that would otherwise require several lenses or mirrors. This means that freeform surfaces of lenses or mirrors can help scientists and engineers develop optical systems having extremely compact structure and high performance.4,5 In fact, some freeform surfaces—such as headlights of cars for illumination and ophthalmic innovations—have been successfully used in commerce. However, lenses and mirrors with freeform


Figure 4.20 Illustration of hyperboloidal mirrors: (a) a convex mirror and (b) a concave mirror.

Figure 4.21 Illustration of an ellipsoidal mirror.


surfaces are difficult to fabricate and test, and require newly proposed methods for fabrication and testing6,7 due to their freeform features. Nevertheless, these challenges are not dampening the interest of scientists in freeform optics due to its extremely promising outlook. 4.3.2 Prisms A prism is a transparent optical component with flat polished surfaces that refract or reflect light. In most optical systems, prisms serve one of two major purposes. The first is to disperse light, i.e., to separate light into its constituents with different wavelengths; the second is to displace, deviate, or give the proper orientation of a beam of light or an image. In this second type of use, the prism should be carefully arranged so that it does not disperse light. 4.3.2.1 Dispersing prisms

A dispersing prism in air is schematically shown in Fig. 4.22. Light is incident on one surface of the prism with apex angle A and leaves the prism from the other surface. The outgoing ray is not parallel to the incident ray, and the angle of deviation D is

D = (\theta_1 - \theta_1') + (\theta_2 - \theta_2'),    (4.8)

where θ₁ and θ₁′ are the incident and refractive angles, respectively, at the first surface of the prism; and θ₂ and θ₂′ are the incident and refractive angles, respectively, at the second surface of the prism, as shown in Fig. 4.22. According to Snell’s law, the relationship between the incident and refractive angles can be expressed as

Figure 4.22 Diagram of the angle of deviation D for a dispersing prism.


\sin\theta_1 = n \sin\theta_1', \qquad \sin\theta_2 = \frac{1}{n} \sin\theta_2',    (4.9)

where n is the refractive index of the prism material. From the geometrical relationship shown in Fig. 4.22, it can be seen that

A = \theta_1' - \theta_2.    (4.10)

As presented in Principles of Optics by Born and Wolf,8 when the ray in the prism makes equal angles with the respective normal of each face of the prism, i.e., θ₁ = –θ₂′, the angle of deviation will be a minimum for any given wavelength. According to Eqs. (4.9) and (4.10), θ₁′ = –θ₂ = A/2 and θ₁ = –θ₂′ = arcsin[n sin(A/2)] in this situation. Thus, the angle of minimum deviation D can be expressed as

D = 2 \arcsin\left(n \sin\frac{A}{2}\right) - A,    (4.11)

where D depends only on the apex angle of the prism and the refractive index of the prism material. A prism can perform dispersion of light because the refractive index n varies with the wavelength of the light being refracted, as shown in Fig. 4.23. (Dispersion of light by a prism was discussed in Chapter 2.) Note that a dispersing prism is generally oriented to the position where the central wavelength of the spectral range of the incident light has a minimum angle of deviation. This arrangement results in an almost linear relationship between the angle of deviation and the wavelength.
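The minimum-deviation relation (4.11) is often used in the laboratory to measure the refractive index of a prism material. The short Python sketch below evaluates Eq. (4.11) for a hypothetical 60-deg prism and also inverts it to recover n from a measured minimum deviation; the index values are illustrative, not data from the text.

```python
import math

def min_deviation(n, apex_deg):
    """Angle of minimum deviation, Eq. (4.11), in degrees."""
    A = math.radians(apex_deg)
    return math.degrees(2 * math.asin(n * math.sin(A / 2)) - A)

def index_from_min_deviation(D_deg, apex_deg):
    """Invert Eq. (4.11): n = sin[(A + D)/2] / sin(A/2)."""
    A, D = math.radians(apex_deg), math.radians(D_deg)
    return math.sin((A + D) / 2) / math.sin(A / 2)

apex = 60.0
for n in (1.51, 1.52, 1.53):           # illustrative indices at different wavelengths
    print(f"n = {n:.3f} -> minimum deviation = {min_deviation(n, apex):.3f} deg")

# Round trip: a measured minimum deviation of about 38.9 deg gives back the index
print("recovered n =", round(index_from_min_deviation(38.9, apex), 4))
```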

Figure 4.23 Illustration of a dispersing prism.


4.3.2.2 Reflecting prisms

Reflecting prisms are used to redirect light at a desired angle. Most reflecting prisms reflect light using the principle of total internal reflection. According to the description in Chapter 2, total internal reflection is extremely efficient and occurs only when the incident angle exceeds the critical angle. If any reflecting surfaces of a prism cannot satisfy this critical condition, these surfaces must be coated with a metal to achieve high reflectivity. A reflecting prism is typically used for changing the orientation of an image or to fold an optical system for better packaging, as shown in Fig. 4.24, where a right-angle prism is used to fold an optical system. Moreover, because the number of reflections by this prism is one, the parity of the image is changed compared with that of the original. In general, two or more reflecting prisms are used together in an optical system to obtain an image with the intended orientation. Unlike the system shown in Fig. 4.24, in many practical optical systems, such as the Porro prism system shown in Fig. 4.25, the orientation of the image cannot be determined directly. Warren J. Smith proposed a useful technique9 to determine the image orientation based on the reflecting surfaces of the optical components in a system, including plane mirrors and reflecting prisms. Smith’s technique can be explained by envisioning the image as a transverse arrow (or pencil) that bounces off the reflecting surface when it meets that surface. Each part of the arrow is executed on a “first meet, first rebound” basis. As shown in Fig. 4.26 [in the sequence (a), (b), (c), (d)], the arrow first approaches and strikes the reflecting surface, then its head bounces off the reflecting surface and its tail continues in the original direction; the arrow is then in a new

Figure 4.24 Diagram of a right-angle reflecting prism folding an optical system.


Figure 4.25 Porro prism system.

Figure 4.26 Diagram of Smith’s technique for determining the orientation of a reflected image.

orientation [Fig. 4.26(d)] after this reflection. For the direction perpendicular to the plane of the paper, by repeating this process, the orientation of the image in this direction can also be determined. In this way, the orientation of


the image produced by this reflecting surface can be determined. When multiple reflecting surfaces exist in an optical system, the entire procedure should be repeated on each reflecting surface to determine the orientation of the image on the image plane. According to the technique presented above, the orientation of the final image of the Porro system has a 180-deg rotation compared with that of the original. Furthermore, as the number of reflections inside the Porro system is four, the parity of the image is maintained. Most prism systems can be considered as equivalent to a thick, folded glass block, such that a prism can be expanded as an equivalent plane-parallel glass plate, e.g., the expansion of the Porro prism system as indicated in Fig. 4.27. There are many types of reflecting prisms. Next, we will introduce some of them.

4.3.2.3 Right-angle prism

The right-angle prism, with angles of 45-45-90 deg, is typically used to bend image paths or to redirect light at 90 deg, as shown in Fig. 4.28. In addition, the parity on the left side of Fig. 4.28 is maintained, and that on the right side of Fig. 4.28 is changed. In general, two right-angle prisms are used together to produce an erect image from an inverted one, as in the Porro prism system shown in Fig. 4.25.

Figure 4.27 Expansion of the Porro prism system.

Figure 4.28 Illustration of the expansion of a right-angle prism.


Figure 4.29 Illustration of a Dove prism.

4.3.2.4 Dove prism

A Dove prism is a type of reflective prism that is used to invert an image. As shown in Fig. 4.29, a Dove prism is shaped from a truncated rightangle prism. The working principle of the Dove prism can be simply described as follows. A beam of light, being incident on one sloped face of the prism in the direction parallel to its bottom, is first refracted downward to the bottom, then reflected by the bottom of the prism upward to the other sloped face of the prism, and finally refracted by this sloped face. After the second refraction, the light beam, with an inverted orientation, exits the prism parallel to the incident light beam. In this way, images passing through the prism are inverted but not laterally transposed while changing the original parity. The Dove prism is used almost exclusively in parallel light since it introduces some aberrations, such as astigmatism, if used in converging light. The Dove prism’s most important effect on the orientation of the image is that if the prism is rotated around the optical axis (dash-dotted line shown in Fig. 4.29), the image will rotate twice as fast as the prism. 4.3.2.5 Pentaprism

A pentaprism is used to deflect a beam of light by 90 deg, regardless of the incident angle of the entering beam. As shown in Fig. 4.30, the beam reflects twice inside the prism, and the image through it will be neither inverted nor reversed. Additionally, the pentaprism maintains the original parity. Since the reflection inside the pentaprism cannot satisfy the critical condition of total internal reflection, the two reflecting surfaces should be coated to improve their reflectivities.


Figure 4.30 Illustration of a pentaprism.

4.3.2.6 Beam-splitting prisms

Another type of prism, the beam-splitting prism, is widely used in many optical systems, such as amplitude division interferometers, binocular microscopes, and adaptive optics systems. This prism divides a beam of light into two beams that travel in different directions with reduced intensity and the same diameter as the original beam. A beam-splitting prism is usually formed by two right-angled, isosceles prisms that are cemented together to form a cube, as shown in Fig. 4.31. The hypotenuse face of one of the prisms is deposited with a thin film designed to cause a portion (typically half) of the light to be reflected with the parity of this portion of light changing, and the remainder of the light to be transmitted. In addition to the cubic beam-splitting prisms introduced above, there are other types of beam-splitting prisms, such as the polarizing beam-splitting and dichroic prisms. Due to space limitations, these types of beam-splitting prisms are not described in this book.

Figure 4.31 Illustration of a cubic beam-splitting prism.


4.4 Diffractive Optical Elements Diffractive optical elements (DOEs), including diffraction gratings, binary optics, and holographic optical elements, are an important category in the realm of optical elements, and each has its own characteristics and merits, such as the example of using a binary surface for correction of chromatic aberrations in the IR regime as was presented in Section 2.6. In this section, some basic concepts of a DOE are presented. DOEs have two types of diffraction efficiencies: the absolute diffraction efficiency and the relative diffraction efficiency. The absolute diffraction efficiency is the ratio of the optical power of monochromatic light diffracted into a given order (diffraction order will be discussed in the next subsection) by a DOE to the optical power incident onto this DOE. The concept underlying the relative diffraction efficiency of a reflecting DOE is that the optical power diffracted into a given order accounts for the percentage of the optical power reflected from a plane mirror coated with the same material as the DOE coating. The relative diffraction efficiency of a transmitting (uncoated) DOE is the ratio of the optical power in a given diffraction order to the optical power transmitted by an uncoated substrate. Diffraction efficiency is an important measure of a DOE and depends on many parameters, such as angles of incidence and diffraction, the profile function of the DOE, the refractive index of the material of the DOE, the polarization and wavelength of the incident light, among others. The complete calculation of the DOE’s diffraction efficiency requires a rigorous analysis of Maxwell’s equations applied to the profile of the DOE’s diffractive surface. However, if the periodic structure of a DOE is several times the wavelength of the incident light, the scalar diffraction theory is valid. In this case, the diffraction efficiency of a DOE can also be calculated by scalar diffraction theory. The spectral dependency of diffraction efficiency, also known as polychromatic diffraction efficiency, means that the diffraction efficiency is dependent on the diffraction wavelength. For example, with regard to a diffractive singlet, the diffraction efficiency for a given diffraction order can be maximized only at a single wavelength. A DOE having high diffraction efficiency in the working order is required for detection of low levels of light. The diffraction efficiency of a DOE (e.g., a blazed grating) can be increased by effectively controlling the DOE’s groove shape. Imperfections in a DOE can lead to a reduction in diffraction efficiency. For example, periodic errors in a DOE’s groove spacing give rise to spurious orders, which reduce the resolution or contrast of the DOE between diffraction orders. In practical applications, it is very difficult or even impossible to fabricate a DOE in a continuous profile over small intervals. This limitation


has led to the development of binary optics, a surface-relief optics technology that can produce a discrete approximation to the continuous profile of a DOE based on very large-scale-integration fabrication techniques. In addition to the diffraction efficiency parameters mentioned above, the diffraction efficiency for a DOE with a binary profile is also dependent on the number of levels of the binary profile; the more levels the higher the diffraction efficiency. For example, two levels can only produce a diffraction efficiency on the order of 40%, while eight or more levels will produce a diffraction efficiency of 95% or higher.10 More-detailed discussions of the diffraction efficiency of DOEs can be found in other books.11,12 The following subsections present a detailed examination of a simple and widely used DOE, the diffraction grating. A diffraction grating, classified as reflecting or transmitting, is an optical element with a periodic structure that disperses polychromatic light (e.g., white light) into its monochromatic components. In a reflecting grating, the periodic structure is etched on a highly reflective surface, while in a transmitting grating, the periodic structure is etched on a transparent substrate. The diffraction grating is of utmost importance in spectroscopy because of its property of highly efficient dispersion. Thus, diffraction gratings are commonly used in monochromators and spectrometers. 4.4.1 Principle of a grating and diffraction order The principle of a grating is presented by studying the Fraunhofer diffraction pattern of the simplest transmitting grating, which has many evenly spaced parallel slits etched on a transparent substrate. As shown in Fig. 4.32, a grating G is illuminated by a monochromatic parallel beam with an incident

Figure 4.32 Illustration of diffraction of the simplest grating.


angle of zero, and the diffraction pattern can be observed on a screen M. If all slits of the grating have the same width a, and the slits are arrayed with a spacing d along the x axis, the distribution of the light field on screen M is the superposition of the light fields diffracted from all slits of the grating. The diffraction pattern can be expressed as8

I(P) = I_0 \left[ \frac{\sin\left(\frac{ka\sin\theta}{2}\right)}{\frac{ka\sin\theta}{2}} \right]^2 \left[ \frac{\sin\left(\frac{Nkd\sin\theta}{2}\right)}{\sin\left(\frac{kd\sin\theta}{2}\right)} \right]^2,    (4.12)

where P is a point on screen M, k is the wavenumber, θ is the angle between the diffracted light and the z axis, and N is the number of slits of the grating. The term [sin((ka sin θ)/2)/((ka sin θ)/2)]² in Eq. (4.12) is known as the single-slit diffraction factor, which describes the effect of a single slit and represents the intensity distribution of the diffraction pattern of a single slit, as shown in Fig. 4.33(b). The term [sin((Nkd sin θ)/2)/sin((kd sin θ)/2)]² in Eq. (4.12) is known as the multibeam interference factor, which represents the effect of the periodic distribution of multiple slits. When (kd sin θ)/2 = mπ, the interference factor has its maximum value (corresponding to a bright fringe), and the integer m is known as the order of interference. Thus, the diffraction pattern generated by the grating can be considered as the result of an interference pattern of multiple beams modulated by the diffraction pattern of a single slit, as shown in Fig. 4.33. Note that Fig. 4.33 is drawn for d/a = 2. If the ratio of d to a is changed, the diffraction pattern will be different.

4.4.2 Grating equation

Figures 4.34(a) and (b) show the cross sections of a transmitting and a reflecting grating, respectively. The rays in these figures indicate the direction of the incident and diffracted light. The incident angle (the angle between the incident light and the normal to the grating) and the diffraction angle (the angle between the diffracted light and the normal to the grating) are denoted as α and β, respectively. The optical path difference Δd between light waves from two adjacent slits can be expressed as Δd = d(sin α − sin β), where d is the distance between two adjacent slits and is called the grating constant. According to our understanding of interference (Section 3.3), if there is a maximum value (bright fringe), the optical path difference must be an integral multiple of the wavelength λ:

d(\sin\alpha - \sin\beta) = m\lambda,    (4.13)


Figure 4.33 Diagram of (a) the normalized multibeam interference pattern, (b) the normalized diffraction pattern of a single slit, and (c) the normalized diffraction pattern of a grating.

where m = 0, ±1, ±2, . . . are orders of the grating, corresponding to the orders of the spectral lines (bright fringes). Equation (4.13), known as the grating equation, describes the relationship between the direction of incident light and the positions of spectral lines.
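To connect Eqs. (4.12) and (4.13), the Python sketch below evaluates the N-slit Fraunhofer pattern on a grid of diffraction angles and then solves the grating equation for the low-order diffraction angles; the wavelength, slit width, spacing, and slit count are arbitrary illustrative values.

```python
import numpy as np

lam = 550e-9                  # wavelength (m), illustrative
a, d, N = 1e-6, 2e-6, 20      # slit width, slit spacing (d/a = 2), number of slits
k = 2 * np.pi / lam           # wavenumber

theta = np.radians(np.linspace(-25.0, 25.0, 2001))
u = k * a * np.sin(theta) / 2          # single-slit argument
v = k * d * np.sin(theta) / 2          # multibeam interference argument

single_slit = np.sinc(u / np.pi) ** 2  # [sin(u)/u]^2; numpy's sinc(x) is sin(pi x)/(pi x)
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = np.sin(N * v) / np.sin(v)
interference = np.where(np.abs(np.sin(v)) < 1e-12, float(N), ratio) ** 2
intensity = single_slit * interference  # Eq. (4.12), up to the constant I0

print(f"peak of the pattern (zeroth order) = {intensity.max():.0f}  (expected N^2 = {N**2})")

# Grating equation, Eq. (4.13), at normal incidence (alpha = 0): sin(beta) = -m*lam/d
for m in (0, 1, 2):
    beta = np.degrees(np.arcsin(-m * lam / d))
    print(f"order m = {m}: diffraction angle = {beta:7.2f} deg")
```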

Figure 4.34 Principle of (a) a transmitting and (b) a reflecting grating.

4.4.3 Dispersion

The main function of a diffraction grating is to spatially disperse light by wavelength. The measure of dispersion by a grating is the angular or spatial separation between diffracted light of different wavelengths. If the incident angle α is regarded as a constant, the angular dispersion of a grating can be expressed by differentiating both sides of Eq. (4.13) as

\frac{d\beta}{d\lambda} = \frac{m}{d\cos\beta}.    (4.14)

Then, including the focal length f of the converging optical component in the optical device having a grating, the spatial dispersion of the grating can be expressed as

\frac{dl}{d\lambda} = f\,\frac{d\beta}{d\lambda} = \frac{mf}{d\cos\beta}.

4.4.4 Resolution of a grating

The resolution of a grating specifies the ability to distinguish spectral lines of light having two neighboring wavelengths λ and λ + Δλ. In general, the spectral line generated by the grating has a limited width. According to the Rayleigh criterion, when the maximum of the spectral line for wavelength λ coincides with the first minimum of the spectral line for another wavelength λ + Δλ, the two spectral lines can just be distinguished, as shown in Fig. 4.35. Using the concept of optical path difference (OPD), this problem can be explained as follows. In order to have the maximum of the spectral line for wavelength λ′ at angle β in the mth-order beam, according to Eq. (4.13), the


Figure 4.35 Resolution of a grating.

OPD of rays from the slits on both ends of the grating must equal Nmλ′, where N is the number of slits of the grating. In other words, Nd(sin α − sin β) = Nmλ′. For the other beam of wavelength λ, we want to have a minimum at the angle β; i.e., we want the OPD of rays from the slits on both ends of the grating to be such that Nd(sin α − sin β) = Nmλ + λ. The way to make the spectral line have a minimum at angle β for this OPD is explained as follows. Consider the entire grating with a total length of Nd as a diffracting object and divide the grating into two parts (the upper part and the lower part). If the OPD of rays from both ends of the entire grating has increased by an extra wavelength λ, the optical fields from the upper and lower parts of the grating have exactly opposite phases. Therefore, the diffraction pattern has a minimum at angle β. From this explanation, we can determine that

Nm\lambda + \lambda = mN\lambda'.    (4.15)

If we let λ′ = λ + Δλ, we find that

\frac{\lambda}{\Delta\lambda} = mN.    (4.16)

The ratio λ/Δλ is called the resolution of the grating. From Eq. (4.16), for a fixed grating constant d, it can be concluded that, at a given wavelength, the larger the aperture size of the grating, the better the grating resolution. This conclusion is consistent with the relationship between the resolution of an optical system and its aperture size.

4.4.5 Free spectral range

When light with a wide range of wavelengths is dispersed by a grating, the spectra for different wavelengths might partially overlap at neighboring orders, as shown in Fig. 4.36. When the spectral line at order m + 1 for wavelength λ coincides with the spectral line at order m for wavelength


Figure 4.36 Diagram of the free spectral range of a grating.

λ + Δλ, the spectral lines at order m for wavelengths between λ and λ + Δλ cannot overlap those at other orders for which the order number is smaller than m. Thus, the free spectral range, the range for which there is no overlap between λ and λ + Δλ, of the grating should satisfy m(λ + Δλ) = (m + 1)λ, and can be expressed as

\Delta\lambda = \frac{\lambda}{m}.    (4.17)

The order m of the spectrum that is most often used is quite small, typically one or two. Thus, the free spectral range of a grating can be very wide. For example, when the incident light has wavelengths greater than 380 nm, if first-order spectra are used, the free spectral range will be from 380 nm to 760 nm; if second-order spectra are used, the free spectral range will be from 380 nm to 570 nm.

4.4.6 Blazing

As shown in Fig. 4.33(c), the most intense grating order is the zeroth, which carries no useful spectroscopic information. If the direction of diffraction can be changed, the zeroth order will fade and other useful orders, e.g., the first order, will be enhanced.


Figure 4.37 Diagram of a blazed grating.

Figure 4.37 demonstrates a way to enhance the useful order and decrease the zeroth-order diffraction of a grating. The grating in this figure has sawtooth-shaped grooves that are tilted by a blaze angle γ with respect to the surface of the substrate. If a parallel beam of light is incident on the grating at an incident angle α, the diffraction angle is β = α − 2γ. Using Eq. (4.13), we can obtain

d[\sin\alpha - \sin(\alpha - 2\gamma)] = m\lambda.

Thus, for properly chosen values of α and γ, the intensity at the desired order can be enhanced. Figure 4.38 shows the 1D diffraction distribution of a blazed grating. There are many types of gratings classified by different criteria,13 such as plane and concave gratings, ruled and holographic gratings, transmission and reflection gratings, amplitude and phase gratings, gratings with triangular, sinusoidal, and trapezoidal groove shapes, etc. We do not include a discussion of these gratings here. Readers who are interested in different types of gratings can refer to the book by E. G. Loewen and E. Popov.13
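The relations of Sections 4.4.2–4.4.6 are easy to evaluate numerically. The Python sketch below does so for a hypothetical 600-groove/mm reflecting grating used in first order near 550 nm; the groove density, wavelength, focal length, and illuminated width are illustrative assumptions, not values from the text.

```python
import math

# Illustrative grating parameters
grooves_per_mm = 600.0
d = 1e-3 / grooves_per_mm          # grating constant (m)
lam = 550e-9                       # wavelength (m)
m = 1                              # working order
alpha = math.radians(15.0)         # incident angle
f = 0.30                           # focal length of the focusing optic (m)
width = 50e-3                      # illuminated grating width (m)

# Grating equation (4.13): d(sin(alpha) - sin(beta)) = m*lam
beta = math.asin(math.sin(alpha) - m * lam / d)

# Angular and spatial dispersion, Eq. (4.14) and the relation that follows it
dbeta_dlam = m / (d * math.cos(beta))          # rad per m of wavelength
dl_dlam = f * dbeta_dlam                       # focal-plane position per m of wavelength

# Resolution, Eq. (4.16), with N taken as illuminated width / d
N = int(width / d)
resolution = m * N                             # lambda / delta_lambda
delta_lam = lam / resolution

# Free spectral range, Eq. (4.17)
fsr = lam / m

print(f"diffraction angle beta       : {math.degrees(beta):.2f} deg")
print(f"angular dispersion           : {dbeta_dlam*1e-9:.6f} rad/nm")
print(f"spatial dispersion           : {dl_dlam*1e-9*1e3:.3f} mm/nm")
print(f"resolving power mN           : {resolution}")
print(f"just-resolvable d(lambda)    : {delta_lam*1e12:.1f} pm")
print(f"free spectral range lambda/m : {fsr*1e9:.0f} nm")
```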

4.5 Optical Filters The goal of an optical filter is to transmit or reject light having a certain wavelength or range of wavelengths. Optical filters have wide applications in the fields of fluorescence microscopy, spectroscopy, chemistry, astronomy, and others. In the following subsections, several types of optical filters are reviewed.

Figure 4.38 Diagram of the 1D diffraction distribution of a blazed grating.

4.5.1 Absorptive and interference filters Based on their working principles, optical filters can be generally classified as absorptive filters, interference (or dichroic) filters, Lyot filters, and Christiansen filters, among others. Here we introduce only the first two types, especially the interference filter due to its wide applications. Descriptions of other types of filters can be found in Handbook of Optical Engineering edited by Daniel Malacara and Brian J. Thompson.14 4.5.1.1 Absorptive filters

An absorptive filter is generally made of inorganic or organic materials that are deposited on a glass substrate. These materials can absorb some wavelengths of light while transmitting others. The spectral band of an absorptive filter is not regular because it is dependent on the characteristic absorption of the material that is coated on the substrate. 4.5.1.2 Interference filters

An interference filter is made by coating a series of optical films having different thicknesses and refractive indices on a glass substrate. Interference filters usually reflect light having unwanted wavelengths and transmit the remaining light. The films on the glass substrate form a sequence of reflective cavities. Due to the constraints of these cavities, the intensity of light with desired wavelengths is enhanced by interference, and the intensity of light with other wavelengths is diminished by interference. As shown in Fig. 4.39, the


Figure 4.39 Principle of an interference filter.

function of these reflective cavities is the same as the working principle of the Fabry–Pérot interferometer introduced in Chapter 3, and the interference of light produced by these reflective cavities is mainly related to the incident angle of light. Thus, the transmissions of these interference filters are functions of the incident angle of light. 4.5.2 Optical filters with different functions Optical filters can also be classified as longpass filters, shortpass filters, bandpass filters, and neutral-density filters, again, named according to their functions. A brief introduction to these filters is provided. It should be noted that optical filters are usually described in terms of wavelength rather than frequency, as in the field of electronics. 4.5.2.1 Longpass filters

A longpass (or cut-on) filter is an optical filter that rejects light with shorter wavelengths and transmits light with longer wavelengths over the spectrum of interest (ultraviolet, visible, or infrared), as shown in Fig. 4.40. The cut-on wavelength for a longpass filter is defined as the wavelength at which its transmission is 50% of its peak transmission.


Figure 4.40 Diagram of the transmission of a longpass filter.

4.5.2.2 Shortpass filters

A shortpass (or cutoff) filter is an optical filter that rejects light with longer wavelengths and transmits light with shorter wavelengths over the spectrum of interest (usually in the ultraviolet and visible region), as shown in Fig. 4.41. The cutoff wavelength for a shortpass filter is also defined as the wavelength at which its transmission is 50% of its peak transmission. Shortpass filters are frequently employed in fluorescence microscopes. 4.5.2.3 Bandpass filters

A bandpass filter transmits only light with a desired wavelength band and blocks light of other wavelengths, as shown in Fig. 4.42. Two important parameters are associated with specifications of bandpass filters. One is the center wavelength, defined as the average of the two wavelengths at which

Figure 4.41 Diagram of the transmission of a shortpass filter.


Figure 4.42 Diagram of the transmission of a bandpass filter.

the transmittance is 50% of the peak transmittance. The other is the bandwidth, which is the full width in wavelength at 50% of the peak transmittance. Comparing the diagrams of a shortpass filter and a longpass filter, it is easy to see that a bandpass filter can be made by combining a longpass and a shortpass filter. Bandpass filters are often used in astronomy when the goal is to observe a certain process associated with specific spectral lines. In particular, almost all solar telescopes have bandpass filters for observing the Sun in a particular range of the spectrum. Narrow bandpass filters, or spike filters, selectively transmit a narrow wavelength band of light, typically between 0.2 nm and 8.0 nm wide.

4.5.2.4 Neutral-density filters

The neutral-density filter has a constant attenuation across a range of wavelengths and is used to reduce the intensity of light by reflection or absorption. The attenuation of the neutral-density filter is specified by the optical density defined as

\mathrm{OD} = -\log_{10}\frac{I}{I_0},

where I is the intensity of light attenuated by the neutral-density filter, and I0 is the intensity of light incident on the neutral-density filter. Neutral-density filters can be reflective or absorptive. In the visible band, the reflective type looks like reflective mirrors, and the absorptive type has a dark appearance when its optical density is high. Note that all optical filters have their own working spectral ranges. The working spectral range of an optical filter used in an optical system needs to be carefully considered.
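As a small worked example of the optical-density definition above, the Python lines below convert between OD and the transmitted fraction of light; the OD values chosen are illustrative.

```python
import math

def transmittance(od):
    """Fraction of incident intensity passed by a neutral-density filter: I/I0 = 10**(-OD)."""
    return 10.0 ** (-od)

def optical_density(i, i0):
    """OD = -log10(I/I0) for measured incident (i0) and attenuated (i) intensities."""
    return -math.log10(i / i0)

for od in (0.3, 1.0, 2.0):
    print(f"OD {od:.1f}: transmits {transmittance(od)*100:.1f}% of the light")

# Stacking two filters multiplies their transmittances, so their optical densities add:
stacked = transmittance(0.3) * transmittance(1.0)
print("OD of an OD 0.3 + OD 1.0 stack =", round(optical_density(stacked, 1.0), 2))
```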


4.6 Optical Fibers An optical fiber, which is a cylindrical dielectric waveguide, guides light through it to a desired location by total internal reflection. As shown in Fig. 4.43, an optical fiber mainly consists of a central core and a cladding, both of which are made of dielectric materials with different refractive indices. According to the mode of transition of the refractive index at the boundary between the core and cladding of the fiber, the fiber is classified as a step-index or a graded-index fiber. In the step-index fiber, the transition of the refractive index from the core to the cladding varies in an abrupt manner, and the refractive index n1 of the core is slightly higher than the index n2 of the cladding; in the graded-index fiber, this transition gradually varies from the maximum in the center of the core to the minimum at the boundary between the core and cladding. The distribution of the refractive index across the fiber for a step-index and a graded-index fiber is shown in Figs. 4.44(a) and (b), respectively. When light is coupled into a step-index fiber, the light is reflected into the core by total internal reflection when it encounters the boundary between

Figure 4.43 Cross-section profile of an optical fiber.

Figure 4.44 Transition of the refractive index between a core and a cladding for (a) a step-index fiber and (b) a graded-index fiber.


Figure 4.45 Diagram of the propagation of light in a step-index fiber.

the core and cladding, as shown in Fig. 4.45. In this way, light propagates inside the fiber. As total internal reflection only occurs when light encounters the boundary at an angle greater than the critical angle θc, as shown in the figure, only light entering the fiber from air at an incident angle smaller than θmax = arcsin(√(n₁² − n₂²)) can propagate in the core of the fiber. The sine of this angle is defined as the numerical aperture of the fiber located in air, denoted as NA = √(n₁² − n₂²). The numerical aperture of a fiber implies its light-gathering capability. However, in a graded-index fiber, because the refractive index of the core decreases gradually from the core to the cladding, the propagating direction of light continuously bends back toward the core of the fiber while approaching the boundary, as shown in Fig. 4.46. The light rays that enter the fiber at different angles can refocus at some points (such as points A, B, and C in Fig. 4.46), so the intermodal dispersion (which will be discussed in subsection 4.6.3) can be greatly reduced in a graded-index fiber by carefully designing the function of the refractive index transiting from the core to the cladding.
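The numerical-aperture expression above is easy to evaluate. The Python sketch below computes NA, the maximum acceptance angle from air, and the critical angle at the core–cladding boundary for a hypothetical step-index fiber with n₁ = 1.48 and n₂ = 1.46; these index values are illustrative.

```python
import math

n1, n2 = 1.48, 1.46       # core and cladding refractive indices (illustrative)

na = math.sqrt(n1**2 - n2**2)                  # numerical aperture of a fiber in air
theta_max = math.degrees(math.asin(na))        # maximum acceptance angle from air
theta_c = math.degrees(math.asin(n2 / n1))     # critical angle at the core-cladding boundary

print(f"NA                    = {na:.3f}")
print(f"acceptance half-angle = {theta_max:.1f} deg")
print(f"critical angle        = {theta_c:.1f} deg")
```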

Figure 4.46 Diagram of the propagation of light in a graded-index fiber.


An optical fiber constrains (in the transverse direction) the electromagnetic waves traveling along it. In this way, optical fibers can be classified as multimode or single-mode according to the number of modes of light waves along their cores. A multimode fiber can support several transverse modes of light waves along it, while a single-mode fiber can only support one transverse mode. Multimode fibers have large core diameters—usually much larger than the wavelength of light traveling in them—for supporting several modes of light waves. The modes of light waves in a multimode fiber can be roughly considered in terms of geometrical optics as the different propagation paths that light travels in the fiber, as shown in Fig. 4.45. All modes of light waves in a multimode fiber can be obtained by solving Maxwell’s equations with boundary conditions. The advantages of multimode fibers are their high coupling and transmission efficiencies. However, the intermodal dispersion in a multimode fiber is severe. Compared with a multimode fiber, a single-mode fiber has a relatively narrow core diameter, typically 3–5 μm for visible light and 8–10 μm for the infrared. Due to the narrow core diameter of the single-mode fiber, which is comparable to the wavelength of light traveling in it, the propagation of light in a single-mode fiber cannot be analyzed by geometrical optics, but only by Maxwell’s equations with boundary conditions. It is difficult to couple light into a single-mode fiber because of the small core diameter. However, because a single-mode fiber only supports one mode of light propagation, intermodal dispersion is avoided, and light waves can propagate a long distance in it.

Figure 4.47 Diagram of dispersion of an optical fiber.


the refractive index of the dielectric material slightly varies with the wavelength. Due to the intrinsic characteristic of the material, chromatic dispersion exists in both single-mode and multimode fibers. The second type of dispersion is intermodal dispersion. Intermodal dispersion, also known as modal dispersion, occurs because the different modes in fibers have different propagation constants k, leading to different optical paths. Intermodal dispersion only occurs in multimode fibers. An efficient way to reduce intermodal dispersion in a multimode fiber is to use a graded-index fiber. The last type of dispersion is intramodal dispersion, which occurs because the different frequency components of light have different group velocities even for the same transverse mode. Despite the fact that attenuation and dispersion of fibers degrade the input light waves to some extent, fibers have been widely used in many areas, including telecommunications and industry. Furthermore, many specialpurpose fibers, such as dispersion-compensated fibers, polarization-maintaining fibers, and photonic crystal fibers have been fabricated for particular applications. For example, photonic crystal fibers with periodic transverse microstructures along their lengths can avoid attenuation and dispersion by confining light waves in their hollow cores.

4.7 Optical Detectors Optical detectors play a very important role in optical systems. Most optical systems require optical detectors either to record intensity distributions or to measure optical powers or energies at the detecting planes of optical systems. In this section, several types of optical detectors and some frequently used parameters are introduced. Some of the contents of this section are adapted from Ref. 15. 4.7.1 Types of optical detectors Optical detectors can be divided into two categories. One is thermal detectors, which absorb the energy of the incident light, convert the absorbed energy into heat, and measure the changes in temperature. The other is photon detectors, which convert the photon energy into voltages or electric currents by the photoelectric effect. 4.7.2 Thermal detectors Two types of thermal detectors are presented here: thermoelectric and pyroelectric detectors. A thermoelectric detector used together with an attached thermocouple measures the amount of heat in a material. The thermocouple generally


consists of two dissimilar metals joined at two junctions. One junction of the thermocouple is attached to the absorbing surface of the material, and the other, the reference junction of the thermocouple, is held at a constant temperature. When light is incident on the absorbing surface, the temperature difference between the two junctions of the thermocouple produces a voltage that acts as a measure of the intensity of the incident light. The drawback of the thermoelectric detector is its slow response time. The pyroelectric detector is based on the pyroelectric effect, which is the change of the dipole moment of a ferroelectric material caused by heat converted from the absorbed energy of incident photons. When a pyroelectric material is illuminated, due to the pyroelectric effect of the pyroelectric material, there is a voltage across the material that can be taken as a measure for the energy of the incident light. Compared with the thermoelectric detector, the pyroelectric detector has a shorter response time. Since the mechanism of a thermal detector involves sensing a change in the parameters associated with the temperature and caused by absorbing incident light, the response of a thermal detector is independent of the wavelength of the incident light. 4.7.3 Photon detectors Photon detectors operate based on the photoelectric effect, in which photoelectrons are released from the surface of a metal, as discussed in Chapter 1. The release of photoelectrons requires that the incident photons have sufficient energies. Meanwhile, as the energy of a photon is inversely proportional to its wavelength, there should be a long-wavelength cutoff, at which the response of a photon detector rapidly drops to zero due to the photon having insufficient energy to free an electron. According to different physical effects, photon detectors can be divided into photoemissive detectors, photoconductive detectors, and photovoltaic detectors. 4.7.3.1 Photoemissive detectors

A photoemissive detector has a cathode coated with a metal undergoing the photoelectric effect to release photoelectrons when it is illuminated, and an anode to receive photoelectrons emitted from the cathode. The photomultiplier is a typical photoemissive detector. In addition to a cathode and an anode, a photomultiplier also has a series of dynodes to which incrementally high voltages are applied, as shown in Fig. 4.48. Photoelectrons released by incident light on the surface of the cathode are successively accelerated by the series of dynodes. Each dynode, having a secondary electron emission, releases two electrons from its surface for each incident electron. Finally, a large number of electrons reach the anode of the photomultiplier and produce a relatively large current in a peripheral circuit. Photomultipliers are


Figure 4.48 Diagram of a photomultiplier (adapted from Ref. 15).

highly sensitive and widely used for low-light detection. However, due to the rapidly dropping responses at the cutoff wavelengths, their applications are limited to the wavelengths ranging from the ultraviolet to the near infrared. 4.7.3.2 Photoconductive detectors

Photoconductive detectors rely on changes in the electrical conductivities of materials, which are attributed to the generation of charge carriers caused by absorbing incident photons. Semiconductor materials are primarily used for photoconductive detectors due to the controllability of their conductivities. In thermal equilibrium, the conductivity of a semiconductor is low due to the small number of charge carriers (electrons and holes) in it. However, if a semiconductor is illuminated by light whose photon energy is larger than the energy gap between the valence band and the conduction band, some electrons in the valence band will transit to the conduction band by absorbing the incident photons. The increase in the population of charge carriers leads to an increase in electrical conductivity of the semiconductor, which can generate an electric current in the peripheral circuit. By recording the generated electric current, the energy of incident light can be measured. Some photoconductive semiconductors, such as CdS and CdSe, are commonly used to detect light in the visible range, and others, such as PbS and HgCdTe, are used to detect light in the infrared region. The working wavelength ranges of different photoconductive detectors are only dependent on the energy gaps of semiconductors. 4.7.3.3 Photovoltaic detectors

A photovoltaic detector contains a p–n junction. A p–n junction is made up of two types of semiconductors: the first is a p-type, which is an intrinsic semiconductor (containing no impurities) doped with an element having a deficiency in valence electrons, resulting in mobile holes as the majority


carriers; the other is an n-type, which is an intrinsic semiconductor doped with an element having excess valence electrons, resulting in mobile electrons as the majority carriers. When these two types of materials are joined together, the holes in the p-type semiconductor and the electrons in the n-type semiconductor diffuse to the opposite side of the junction due to concentration diffusion, leading to a negatively charged region on the p-side and a positively charged region on the n-side. The two charged regions form a potential electric field pointing from the n-side to the p-side. When the p–n junction is illuminated by light, a large number of electrons and holes are generated and are drawn to the n-side and p-side, respectively, by the potential field to form a voltage that is proportional to the intensity of incident light. The photodiode, which is a typical photovoltaic detector, is widely used in photonic applications. For example, the segmented quadrant detector, using four photodiodes as its four quadrants, is commonly used for tip/tilt sensing in adaptive optics. As shown in Fig. 4.49, when the detector is illuminated, each quadrant of the detector produces a voltage that is proportional to the intensity of light in that quadrant. The lateral displacements of the incident light can be estimated by [(V1 + V4) − (V2 + V3)]/(V1 + V2 + V3 + V4) on the x axis and [(V1 + V2) − (V3 + V4)]/(V1 + V2 + V3 + V4) on the y axis, where V1, V2, V3, and V4 are the voltages in the four quadrants. To increase the sensitivity of the photodiode, a sufficiently high reverse-bias voltage is applied, which stimulates secondary electron emission. Then, similar to the operation of the photomultiplier, electron multiplication occurs. This type of photodiode is called an avalanche photodiode and is used for low-light detection due to its high sensitivity.
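A minimal numerical sketch of the quadrant-detector centroid estimate described above is given below in Python; the quadrant numbering (1 = upper right, 2 = upper left, 3 = lower left, 4 = lower right) and the example voltages are assumptions made for illustration.

```python
def quad_cell_estimate(v1, v2, v3, v4):
    """Normalized x/y displacement estimates from the four quadrant voltages.

    Quadrants are assumed numbered counterclockwise from the upper right:
    1 = upper right, 2 = upper left, 3 = lower left, 4 = lower right.
    """
    total = v1 + v2 + v3 + v4
    x = ((v1 + v4) - (v2 + v3)) / total
    y = ((v1 + v2) - (v3 + v4)) / total
    return x, y

# A spot centered on the detector gives zero displacement:
print(quad_cell_estimate(1.0, 1.0, 1.0, 1.0))        # (0.0, 0.0)

# A spot shifted toward the upper-right quadrant gives positive x and y estimates:
print(quad_cell_estimate(1.4, 0.9, 0.6, 1.1))
```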

Figure 4.49 Diagram of a segmented quadrant detector.


4.7.3.4 Detector arrays

Detector arrays were developed to record the intensity distribution at the 2D detecting plane and are more complicated than the detectors discussed above. Here, a brief introduction to two typical detector arrays, charge-coupled devices (CCDs) and complementary metal–oxide–semiconductor (CMOS) sensors, are presented; more-detailed discussions of CCDs and CMOS sensors can be found in Refs. 16 and 17. The CCD is commonly used to record the intensity distribution at the imaging plane of an optical system. A typical diagram of a CCD is shown in Fig. 4.50. In the column direction, boundaries achieved by applying sustained and different voltages on pixels in this direction limit the charges to movement between two adjacent pixels. In the row direction, the movement of charges between adjacent pixels is known as the charge transfer in the CCD, which is commanded by applying voltages on pixels controlled with a clock signal. The operation of CCDs can be divided into three steps:18 (1) charge generation and collection, (2) charge transfer, and (3) amplification. 1. Charge generation and collection. The pixels of a CCD, which are thick oxide layers evaporated with metal electrodes, are grown on a piece of a semiconductor doped with impurities.17 As shown in Fig. 4.51, each pixel has three associated gates (subpixel-sized electrodes), allowing for individual application of different voltages controlled by clock circuits.16 These gates provide the feasibility of collecting and storing charges for each pixel. When light illuminates

Figure 4.50 Diagram of a CCD array.


Figure 4.51 Diagram of (a) charge collection and (b) transfer in a CCD.

the CCD array, charges (electrons) are generated in each pixel. As shown in Fig. 4.51(a), once charges are generated for each pixel, they are collected and stored in the corresponding potential wells formed by applying a set of voltages (that satisfy V1 < V2 < V3) to three gates during the exposure time. As electrons move from low potential to high potential, the generated electrons are stored in gate 3 of each pixel in the CCD. 2. Charge transfer. When the exposure ends, another set of voltages satisfying V2 < V3 < V1 and controlled by the same clock circuit is simultaneously applied to the three gates of each pixel. Similar to the electron movement in step 1, electrons move from gate 3 of one pixel to gate 1 of the next pixel, as shown in Fig. 4.51(b). In this way, charges are transferred from one pixel to the next along the row direction, then are further transferred to the charge shift register (shown in Fig. 4.50). 3. Amplification. Charges transferred to the charge shift register are amplified and converted into current signals, typically, by field effect transistors. CCDs are currently the best imaging devices due to their low readout noise. However, the signal readout method greatly decreases the CCD speed of response. Moreover, CCDs are highly expensive, especially those with a rapid response time for low-light detection. The CMOS sensor—with a grid of photodiodes and a readout circuitry produced on a single semiconductor wafer, meaning that each pixel has its individual readout amplifiers17—has faster response than the CCD and is lower in cost. Although CMOS sensors


have poorer performance than CCDs in terms of image quality (especially in the detection of low light), they are more flexible, less expensive, and widely used in many areas.

4.7.4 Performance characteristics

The performance of optical detectors is generally evaluated by using technical parameters frequently provided in manufacturers' manuals. In order to select an appropriate optical detector for a specific application, the implications of these technical parameters must be understood. Some frequently used parameters are listed below.

Responsivity is generally defined as the ratio of the output current/voltage of the detector to the input power of light. The units for responsivity are amperes/watt (A/W) or volts/watt (V/W). Responsivity is used for estimating the magnitude of the output of a detector in different applications.

Noise equivalent power (NEP), which depends on the characteristics of the noise, is defined as the incident optical power on a detector that produces an output equal to the noise of the detector in a 1-Hz bandwidth. NEP, given in units of W/Hz^(1/2), describes the performance of a detector in detecting low light in the presence of noise. Any undesired signal from the output of the detector can be considered as noise. This noise is random, meaning that the detector output is unpredictable at any instant. Generally, there are three types of noise. The first type, called Johnson noise (or thermal noise), is generated by thermal fluctuations in the detecting material; to reduce Johnson noise, the detector should be operated at low temperatures. The second type, shot noise, results from the discreteness of the photons to be detected. The third type, 1/f noise or modulation noise, is inversely proportional to the frequency of the signal.

Detectivity, generally represented by D*, is defined as the ratio of the square root of the area of a detector to its corresponding NEP. Because D* is independent of the area of the detector, it reflects the intrinsic quality of the material used for the detector.

Quantum efficiency is defined as the ratio of the number of photoelectrons produced by photons incident on a detector to the number of incident photons. High quantum efficiency in a detector means a high efficiency of detection.

Spectral response is the response of the detector, with acceptable quantum efficiency, as a function of the wavelength of the incident light. This parameter characterizes the spectral response range of a detector.


Response time is a measure of the response speed of a detector. It is characterized by the minimum interval of time over which a detector can just discern its response to a variation in light intensity.

Linearity is the degree to which the output of a detector is proportional to the intensity of the incident light. The lower detection limit of a detector is determined by noise, and the upper limit is determined by the saturation value of the detector output. A detector operating in the range between the lower and upper limits should be linear to ensure the accuracy of measurements of the incident light.
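To make the relations among responsivity, NEP, and detectivity concrete, the short calculation below uses an assumed responsivity, noise-current density, and active area for a hypothetical photodiode; none of these numbers come from the text, and D* follows the definition given above.

```python
import math

# Illustrative values for a hypothetical silicon photodiode
responsivity = 0.5          # A/W
noise_current = 1e-14       # noise current density, A/Hz^(1/2)
area_cm2 = 0.1 * 0.1        # 1 mm x 1 mm active area, in cm^2

# NEP: incident power that produces an output equal to the noise in a 1-Hz bandwidth
nep = noise_current / responsivity            # W/Hz^(1/2)

# Detectivity D*, following the definition in the text: sqrt(area)/NEP
d_star = math.sqrt(area_cm2) / nep            # cm*Hz^(1/2)/W

print(f"NEP = {nep:.2e} W/Hz^0.5")            # 2.00e-14
print(f"D*  = {d_star:.2e} cm*Hz^0.5/W")      # 5.00e+12
```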

References

1. O. Svelto, Principles of Lasers, Fifth Edition, Springer, New York, Dordrecht, Heidelberg, London (2010).
2. http://www.edmundoptics.co.uk/optics/optical-lenses/ball-condenser-lenses/drum-lenses.
3. R. E. Fischer, B. Tadic-Galeb, and P. R. Yoder, Optical System Design, Second Edition, McGraw-Hill, New York and SPIE Press, Bellingham, Washington (2008).
4. F. Duerr, Y. Meuret, and H. Thienpont, “Potential benefits of free-form optics in on-axis imaging applications with high aspect ratio,” Optics Express 21(25), 31072–31081 (2013).
5. J. Rolland and K. Thompson, “Freeform optics: Evolution? No, revolution!” SPIE Newsroom, 19 July (2012) [doi: 10.1117/2.1201207.004309].
6. K. Fuerschbach, K. Thompson, and J. Rolland, “A new generation of optical systems with φ-polynomial surfaces,” Proc. ASPE 51, 3–7 (2011).
7. S. DeFisher, T. Lambropoulos, S. Bambrick, and S. Shafrir, “Metrology with a non-contact probe for manufacturing freeform optics,” Proc. ASPE 51, 34–37 (2011).
8. M. Born and E. Wolf, Principles of Optics, Seventh Edition, Cambridge University Press, Cambridge (1999).
9. W. J. Smith, Modern Optical Engineering, Fourth Edition, McGraw-Hill, New York (2008).
10. “ASAP Technical Guide, Diffraction Gratings and DOEs,” Breault Research Organization, Inc., Tucson, Arizona (2007).
11. Y. G. Soskind, Field Guide to Diffractive Optics, SPIE Press, Bellingham, Washington (2011) [doi: 10.1117/3.895041].
12. C. Palmer and E. Loewen, Diffraction Grating Handbook, Newport Corporation, Irvine, California (2005).
13. E. G. Loewen and E. Popov, Diffraction Gratings and Applications, Marcel Dekker, Inc., New York (1997).


14. D. Malacara and B. J. Thompson, Handbook of Optical Engineering, Marcel Dekker, Inc., New York (2001).
15. J. Ready, “Optical detectors and human vision,” in Fundamentals of Photonics, C. Roychoudhuri, Ed., SPIE Press, Bellingham, Washington, pp. 211–233 (2008).
16. S. B. Howell, Handbook of CCD Astronomy, Second Edition, Cambridge University Press, Cambridge (2006).
17. G. H. Rieke, Detection of Light from the Ultraviolet to the Submillimeter, Second Edition, Cambridge University Press, Cambridge (2003).
18. J. W. Beletic, “Optical and infrared detectors for astronomy,” in Optics in Astrophysics, R. Foy and F. Foy, Eds., Springer, Dordrecht (2005).

Chapter 5

Case Study 1: Confocal Microscopes

The confocal microscope is an optical instrument that can obtain a high-resolution 3D image of the microscopic structures of an object. In this chapter, the confocal microscope is examined—from its working principle to its optical layout—using mainly geometrical optics. With this example, and using knowledge of geometrical optics, readers can acquire some skills to understand the working principles of other optical systems. In the following sections, first, the fundamentals of a standard optical microscope are summarized in order to help readers understand the confocal microscope; next, the basic principles of the confocal microscope and some of its main components are provided; finally, two typical confocal microscopes are presented.

5.1 Fundamentals of Standard Optical Microscopes

A standard optical microscope is an optical system that provides a magnified image of the microscopic structures of an object. Standard optical microscopes are generally used in biology, medicine, materials science, and the integrated circuit industry.

5.1.1 Configuration and characteristics of the standard microscope

The configuration of a standard optical microscope is shown in Fig. 5.1 and consists of an objective, an eyepiece, and an illumination system. A specimen to be observed is placed on the object plane of the objective. The objective produces a real, inverted, and magnified image of the specimen. Then, the eyepiece further magnifies this real, inverted, magnified image and produces a virtual, inverted image of the specimen. Thus, the image of the specimen has been magnified twice after passing through the microscope, and


Figure 5.1 Basic configuration of a standard optical microscope.

the magnification m of the microscope is the product of the magnification m_o of the objective and the magnification m_e of the eyepiece:

m = m_o m_e.    (5.1)

According to the transverse magnification formula [Eq. (2.10)] introduced in Chapter 2, m_o = n l'_o/(n' l_o), where n and n' are the refractive indices of the media in the object and image spaces of the objective, respectively, and –l_o and l'_o are the object and image distances, respectively, in relation to the objective, as shown in Fig. 5.1. The image space of the objective in a microscope is commonly in air; thus, n' = 1. The magnification of the eyepiece is also known as the visual magnification. The visual magnification of the eyepiece is defined as the ratio of angle A to angle B, where A is the angle subtended at the naked eye by the virtual image produced by the eyepiece, and B is the angle subtended at the naked eye by the real image produced by the objective placed at the least distance of distinct vision (generally 250 mm). Usually, the real image produced by the objective is located at the object-side focal plane of the eyepiece, and under the paraxial approximation the magnification of the eyepiece can be simply expressed as m_e = L/f_eye, where L is the least distance of distinct vision (250 mm), and f_eye is the effective focal length of the eyepiece. Thus, the magnification of the standard microscope can be rewritten as

m = m_o m_e = −(n l'_o / l_o)(L / f_eye).
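As a quick numerical check of Eq. (5.1), the sketch below evaluates the magnification for an assumed 40× objective paired with a 10× eyepiece; the object and image distances are illustrative values, not taken from the text.

```python
# Illustrative check of the microscope magnification formula
l_o = -4.5          # object distance in mm (negative by sign convention)
l_o_img = 180.0     # image distance l'_o in mm
n = 1.0             # object space assumed to be in air
L = 250.0           # least distance of distinct vision in mm
f_eye = 25.0        # eyepiece focal length in mm

m_obj = n * l_o_img / (1.0 * l_o)   # transverse magnification of the objective
m_eye = L / f_eye                   # visual magnification of the eyepiece
m = m_obj * m_eye
print(m_obj, m_eye, m)              # -40.0, 10.0, -400.0 (inverted image)
```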

5.1.1.1 Field

A field stop is usually located at the image plane of the objective in a standard microscope, as shown in Fig. 5.1. If the image size of the specimen is larger than the field stop, the portions outside the field stop cannot be seen


through the eyepiece. Thus, the field of the microscope is defined by the diameter of the field stop as

y = D_F / m_o,    (5.2)

where D_F is the diameter of the field stop, and m_o is the magnification of the objective. In general, the field of the microscope is given as a length and defines the observable area of the specimen.
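For instance, with an assumed 20-mm field stop and a 40× objective, Eq. (5.2) gives a 0.5-mm field on the specimen:

```python
# Illustrative field-of-view calculation using Eq. (5.2); both values are assumed.
D_F = 20.0   # field-stop diameter in mm
m_o = 40.0   # objective magnification
field = D_F / m_o
print(f"Field of view: {field:.2f} mm")   # 0.50 mm
```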

5.1.1.2 Resolution

Since the eyepiece magnifies the image produced by the objective, the image cannot provide more detail than that produced by the objective. Thus, the resolution of a microscope is determined by its objective. The resolution of the objective is ultimately limited by the diffraction of light. According to Eq. (3.20) in Chapter 3, the angular radius of the Airy disk produced by the objective is inversely proportional to the size of the clear aperture of the objective. Correspondingly, the radius of the Airy disk in the image plane is given by

d' = 1.22 λ l'_o / D,    (5.3)

where λ is the working wavelength, D is the diameter of the clear aperture of the objective, and l'_o is the image distance. According to the Rayleigh criterion presented in Chapter 3, when the interval between two Airy disks equals d', the corresponding two points can just be discerned. In a microscope, the distance between these two points in object space is the lateral resolution. If the transverse magnification of the objective is denoted as m_o, the lateral resolution can be obtained by

d = d'/|m_o| = 1.22 λ l'_o / (D |m_o|),

where |m_o| is the absolute value of m_o. Using the magnification expressed by Eq. (2.10), we get

d = 1.22 λ n' |l_o| / (D n),

where l_o is the object distance for the objective, |l_o| is the absolute value of l_o, and n and n' are the refractive indices of the media in the object and image spaces of the objective, respectively. For a microscope, the image space of the objective is commonly in air, and n' = 1. Moreover, as shown in Fig. 5.2, if the angle of inclination of the marginal ray is denoted as U, we


Figure 5.2 Illustration of the resolution of an objective.

have sin U ≈ D/(–2l_o). Substituting this relation into the equation above, we obtain

d = 0.61 λ / (n sin U),

where n sin U is commonly called the numerical aperture (NA) of the objective. Finally, the lateral resolution of the objective can be given by

d = 0.61 λ / NA.    (5.4)

Therefore, the lateral resolution of an objective is determined by its numerical aperture and the working wavelength.

5.1.2 Main elements of standard optical microscopes

The main elements of a standard optical microscope, including the illumination system, the eyepiece, and the objective, are presented next. In a standard optical microscope, the eyepiece and objective are usually interchangeable for acquiring different resolutions and magnifications for different specimens. Therefore, the tube length of the standard optical microscope, which is the distance from the second focal point of the objective to the first focal point of the eyepiece, is usually a constant for each manufacturer (e.g., the standard tube length for Zeiss is 160 mm).

5.1.2.1 Illumination system

The illumination system in a standard microscope illuminates the specimen to be observed. Thus, the field of the illumination system should be slightly larger than that of the microscope. A proper illumination system can improve the contrast of the image produced by the objective. There are many different illumination systems, such as Abbe illumination, Nelson illumination,


Koehler illumination, and others. More details on microscope illumination systems can be found in the book Optical Design of Microscopes.1

5.1.2.2 Objective

The objective is a critical component of an optical microscope that determines the resolution of that microscope. It can be characterized by two parameters, its magnification and its NA. The magnification of an objective typically ranges from 5× to 100×, while the NA typically ranges from 0.14 to 0.7. Generally speaking, an objective with a higher magnification also has a higher NA. According to Eq. (5.4), a higher NA value is needed for a higher resolution. Thus, oil-immersion or water-immersion objectives have been fabricated to increase the NAs of the objectives. These objectives are used with immersion oil, water, or other index-matching materials whose refractive indices are larger than that of air. The NA of an oil-immersion objective is typically larger than 1. Because the magnification and the NA are important parameters of an objective, they are usually labeled on the objective barrel. The design of the objective is generally one of the main topics in most books on optical system design. Readers who are interested in learning more about the objective can refer to books such as Refs. 1–3.
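To give a feel for the numbers, the sketch below evaluates Eq. (5.4) for two assumed objectives at an assumed wavelength of 550 nm; the NA values are typical rather than taken from any particular product.

```python
# Rough lateral-resolution estimates from Eq. (5.4) at an assumed wavelength
wavelength_um = 0.55   # 550 nm, green light
for name, na in [("dry objective, NA 0.7", 0.7),
                 ("oil-immersion objective, NA 1.3", 1.3)]:
    d = 0.61 * wavelength_um / na
    print(f"{name}: d ~= {d:.2f} um")
# dry objective, NA 0.7: d ~= 0.48 um
# oil-immersion objective, NA 1.3: d ~= 0.26 um
```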

5.1.2.3 Eyepiece

The function of the eyepiece in an optical microscope is to further magnify the image produced by the objective. As shown in the setup of Fig. 5.3, the eyepiece produces a magnified image at infinity for a relaxed eye, requiring that the image produced by the objective be located near the object-side focal plane of the eyepiece. The magnitude of magnification of the eyepiece depends

Figure 5.3 Schematic of an eyepiece.


on its focal length. Commonly, the eyepiece consists of a set of lenses labeled with the value of the visual magnification. Several properties of the eyepiece should be understood before using or designing one. The eyepiece can be considered as an optical system, and its entrance and exit pupils are always outside of itself. The entrance pupil of the eyepiece in a microscope usually lies on the objective. The exit pupil of the eyepiece should be located a proper distance behind its last surface, since the eye needs to be held at a certain distance behind the last surface of the eyepiece. The distance between the last surface and the exit pupil of an eyepiece is called the eye relief. It is generally recommended that the eye relief be larger than 10 mm to accommodate the eyelashes of the observer. There are a variety of eyepieces for microscopes, such as Huygens' eyepieces, Ramsden eyepieces, and Kellner eyepieces, among others. Details on the configurations of these eyepiece types can be found in other books.2,3

It should be noted that the standard optical microscope can be used to observe or image a thin specimen lying on the object plane (close to the object-side focal plane) of the objective. However, if a standard optical microscope images a thick specimen, the thickness of the specimen along the optical axis will cause the portions of the specimen not lying on the object plane to produce a blurred image on the image plane. This blur is due to defocus, as shown in Fig. 5.4, and degrades the image produced by the microscope. A 3D image of a thick specimen is needed in many applications. Since a standard optical microscope can only produce an image of a flat, thin specimen, a thick specimen must be cut into many thin slices, each of which is imaged separately. A 3D image of the thick specimen can then be synthesized from the images of all of the thin slices. For this reason, if the specimen is in vivo, the standard optical microscope is not an appropriate tool for imaging or observing it. To overcome the problem of defocus that occurs when imaging a thick specimen, a new type of optical microscope has been developed: the confocal microscope, which can produce a high-resolution 3D image of a specimen in vivo.

Figure 5.4 Influence of defocus on the resolution of the image.


5.2 Confocal Microscopes

The idea underlying modern confocal microscopes derives from a scanning optical microscope for biological imaging designed by Young and Roberts4 in 1951. A few years later, in 1957, Minsky invented a confocal microscope to obtain a 3D image of the brain.5 In recent years, new developments in laser, CCD, and electronics technologies have greatly advanced the development of confocal microscopes as well as improved their performance.

5.2.1 Principles of confocal microscopes and their configurations

A confocal microscope, which is also called a confocal scanning optical microscope (CSOM), is a type of scanning optical microscope. The basic setup of the confocal microscope places two pinholes, one behind the light source to generate a point source and the other just in front of the detector, on two conjugate planes; the images of these two pinholes are confocal at a single point in the specimen (thus, the term confocal microscope).6 With this configuration, the confocal microscope images only the illuminated spot in the specimen, and the resolution is greatly improved because the stray light from other defocused points in the specimen is blocked by the pinhole in front of the detector. Each point of the specimen is then imaged sequentially, and all of these images are combined to reconstruct a 3D image of the specimen.

The stage-scanning confocal microscope invented by Marvin Minsky5 is taken here as an example to explain the principle of the confocal microscope in detail. The configuration of the stage-scanning confocal microscope is illustrated in Fig. 5.5. Light from the point source produced by pinhole P1 is focused on spot S1 in the specimen by lens L1. Then, the scattered light from spot S1 is focused by the objective through pinhole P2 onto the detector. Note that pinholes P1 and P2 and spot S1 in the specimen are conjugate to one another. Thus, pinhole P2 blocks much of the stray light from other points (e.g., spot S2) in the specimen and only allows light from S1 to pass through it.

Figure 5.5 Schematic of a stage-scanning confocal microscope (adapted from Ref. 6).


In this way, the confocal microscope illuminates and detects only a single spot (at the object plane of the confocal microscope) of the specimen at a time. Thus, the confocal microscope allows for simultaneous enhancement of the axial and transverse resolution. Typically, the resolution of a confocal microscope is about 1.4× higher than that of a standard optical microscope.6 By scanning the illuminated spot in the specimen point by point, or scanning the specimen in both the axial and the transverse directions, a complete 3D image can be reconstructed. Most modern confocal microscopes work in reflection mode, as shown in Fig. 5.6. In reflection mode, only one objective is used both to illuminate a spot in the specimen and to image the illuminated spot. Importantly, this reflection mode is convenient for observing a specimen in vivo.

5.2.2 Main components of confocal microscopes

In this subsection, the key components of a confocal microscope, such as light sources, pinholes, objectives, detectors, and scanning systems, are briefly described. Of course, there are many other important components, such as optical filters, beam splitters, and transfer lenses, which are described in Chapter 4 and thus will not be presented here.

5.2.2.1 Light sources

Light sources in confocal microscopes illuminate the specimen to be observed or imaged. Different types of light sources are used in different types of confocal microscopes. Light sources used in confocal microscopes can be divided into the coherent and incoherent source types categorized in Section 4.1. The most common examples of coherent sources are lasers (see Section 4.1). Since lasers can provide extremely bright, monochromatic, and stable light beams, they are commonly used in confocal microscopes. Most modern

Figure 5.6 Schematic of a reflection-mode confocal microscope.


confocal microscopes use one or more lasers as light sources to provide a wider selection of working wavelengths.6 On the other hand, common examples of incoherent sources are filament lamps, fluorescent lamps, and gas discharge lamps.

In confocal microscopes, the characteristics of the light source, such as its wavelength band, intensity, intensity stability, and noise, are very important. Illumination light of different wavelengths is generally required for viewing/imaging different specimens with confocal microscopes. Increasing the intensity of illumination results in an increased signal-to-noise ratio (SNR) and a sharper image. However, illumination with an intensity that is too high can damage fluorescent molecules implanted in a specimen, or even damage the specimen itself. Thus, care must be taken when increasing the power of a light source is required. Furthermore, since a change in the intensity of a light source can be mistakenly interpreted as a change in the optical characteristics of a specimen, the intensity stability of the light source is the most important requirement. Another very important requirement when illuminating a specimen is that the light beam from the light source be expanded in diameter to overfill the entrance pupil of the objective in order to obtain better resolution.

5.2.2.2 Objectives

The objective is undoubtedly the most important component of a confocal microscope. The characteristics of the objective have significant effects on the image quality of this microscope type. The objective of the confocal microscope should be suitable for scanning thick specimens and for obtaining a 3D image with a high resolution. Thus, the objective commonly has a long working distance, a high NA, a planar image field, and very low axial chromatic aberration. In most confocal microscopes, the objective is used for both illumination and imaging. Thus, aberrations of the objective will seriously deteriorate the image quality. For this reason, aberrations, especially spherical aberration and axial chromatic aberration, must be reduced to a minimum to obtain distinct images.

5.2.2.3 Pinholes

A confocal microscope usually has two pinholes called the illumination pinhole and the detector pinhole. The illumination pinhole forms a point source for illuminating the specimen. The detector pinhole is usually placed in front of the detector to convert an area detector into a point detector. These two pinholes have an important effect on both the axial and transverse resolutions of the confocal microscope. If the pinhole is too large, a confocal microscope becomes a standard one. The stray light from out-of-focus spots of the specimen will contribute to the image, and the


resolution of the confocal microscope will become worse. Conversely, if the pinhole is too small, less light reaches the detector and the SNR decreases, but the resolution of the confocal microscope is enhanced. In short, a larger pinhole results in lower resolution and a higher SNR, while a smaller pinhole results in higher resolution and a lower SNR. Therefore, the size of the pinholes should be a compromise involving a tradeoff between the resolution and the SNR. In practice, the pinhole should be smaller than the central spot of the Airy pattern, typically about the full width at half maximum (FWHM) of the Airy pattern,7 or 50–75% of the size of the central spot of the Airy pattern.6 According to Eqs. (5.3) and (5.4), the size of the Airy pattern is determined by the working wavelength and the NA of the objective; therefore, the sizes of the pinholes are also decided by these two parameters. Note that the center of the detector pinhole should be aligned with the Airy pattern; otherwise, the resolution will be degraded.
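A rough pinhole-sizing sketch is given below. It assumes that the Airy disk diameter projected onto the pinhole plane is about 1.22λM/NA, with M the total magnification between the specimen and the pinhole plane, and takes the pinhole as 60% of that diameter (within the 50–75% guideline above); the wavelength, NA, and magnification are illustrative values.

```python
# Hedged pinhole-diameter estimate; all numbers are assumptions for illustration.
wavelength_um = 0.532   # illustrative laser wavelength
NA = 1.2                # assumed water-immersion objective
M = 60.0                # assumed total magnification to the pinhole plane

airy_diameter_um = 1.22 * wavelength_um * M / NA   # Airy disk at the pinhole plane
pinhole_um = 0.6 * airy_diameter_um                # 60% of the central spot

print(f"Airy disk at pinhole plane: {airy_diameter_um:.1f} um")  # ~32.5 um
print(f"Suggested pinhole diameter: {pinhole_um:.1f} um")        # ~19.5 um
```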

5.2.2.4 Detectors

According to the type of confocal microscope, a point detector or a 2D detector is required. Most types of detectors described in Section 4.7, such as the photomultiplier tube (PMT) and the charge-coupled-device (CCD) array, can be used in confocal microscopes. The PMT is the most common detector used in confocal microscopes due to its sensitivity, stability, ultrafast response, high bandwidth, and relatively low cost.6

5.2.2.5 Scanning systems

The scanning system in a confocal microscope is necessary to build the image of the specimen point by point. Two types of scanning systems are used in confocal microscopes. One is the specimen-scanning system, and the other is the beam-scanning system. In a specimen-scanning system, the specimen is moved transversely in the object plane of the objective in order to reconstruct the sectioning image. A confocal microscope with a specimen-scanning system is usually simple, as shown in Fig. 5.5. Since only the axial region of the objective is used during the scanning of the specimen, off-axis aberrations have only slight effects on the resolution of the image, resulting in better resolution. Moreover, a significant advantage of the specimen-scanning system is that the resolution and the contrast are identical across every region of the specimen due to constant illumination during the scanning of the specimen. The biggest disadvantage of the specimen-scanning system is its slow speed. It normally takes a few seconds to produce one frame of an image in this system. The beam-scanning system, which is used in many commercial confocal microscopes, is much faster than the specimen-scanning system. In a confocal microscope equipped with a beam-scanning system, the light beam is scanned


Figure 5.7 Schematic of a confocal microscope with a beam-scanning system.

over the object plane of the objective, and the specimen is stationary. Several types of scanners have been designed for beam scanning, such as galvanometer-type mirrors, polygon mirrors, and acousto-optic beam deflectors. A simple schematic diagram of a confocal microscope with a beam-scanning system using a galvanometer-type mirror is illustrated in Fig. 5.7. Note that the beam should always overfill the entrance pupil of the objective during beam scanning to ensure the condition of constant illumination. Moreover, in practice, several transfer lenses are needed to image the center of the scanner into the entrance pupil plane of the objective. More details about beam-scanning systems can be found in Refs. 6 and 7.

In addition to the two scanning systems introduced above, there are some other useful techniques, such as using a Nipkow disk or a slit to replace the pinhole, in a setup that can be considered as another type of beam scanning, namely, multiple-beam scanning.6 These additional techniques will be introduced in the next section. Due to the high speed of beam scanning, many confocal microscopes using this technique can produce real-time images.

5.3 Types of Confocal Microscopes

In subsection 5.2.1, the basic principles of the confocal microscope are presented based on a specimen-scanning confocal microscope due to its simple optical configuration. To gain a comprehensive understanding of confocal microscopes, in this section two types of confocal microscopes based on beam scanning are briefly introduced.

5.3.1 Nipkow-disk scanning confocal microscopes

The confocal microscope with a beam-scanning system based on the Nipkow disk is known as the Nipkow-disk scanning confocal microscope (NSCM).


The NSCM was historically the first real-time confocal microscope with which an image could be observed with the naked eye. It was first developed by Petráň and Hadravský in the mid-1960s and was at that time named the tandem-scanning reflected-light microscope.8 The principle of the NSCM is described as follows. As shown in Fig. 5.8, two Nipkow disks are used as substitutes for the pinholes in a common confocal microscope. One disk is located behind the source, and the other is located in front of the detector. The two disks are conjugate to each other. In practice, each Nipkow disk contains several sets of pinholes (30–80 μm in diameter) arranged in several sets of Archimedes spirals,6 and each set contains about 100 pinholes. The separation between adjacent pinholes is about 10× the aperture size of the pinhole to minimize crosstalk between pinholes. Each pinhole on one Nipkow disk has an equivalent and conjugate pinhole on the other disk. When this kind of confocal microscope works, illumination light passes through a set of pinholes and forms a set of point sources. Those point sources are focused by the objective to form a set of spots on the specimen to be observed. Then the reflected and scattered light from the spots on the specimen is imaged by the objective into a conjugate set of pinholes on the other Nipkow disk. Finally, a set of point images is recorded by the detector. By synchronously spinning the two Nipkow disks, a complete 2D sectioned image of the specimen at the object plane can be observed. Since the NSCM scans the specimen using a set of spots, compared to a microscope with single-point scanning, it greatly reduces the time needed to capture a frame of the image and can produce a real-time image. The sun or an arc lamp can normally be used as a source for the NSCM, which produces a true-color image for observation. Thus, the advantages of the NSCM are that it allows for real-time, true-color, and direct observation. It should be noted that Petráň and Hadravský used both sides of a Nipkow disk with central symmetry for illumination and imaging in the design of their

Figure 5.8 Schematic of a Nipkow-disk scanning confocal microscope.


tandem-scanning confocal microscope. Thus, their design contains only one Nipkow disk. One notable disadvantage of the NSCM is the large loss of illumination light (low light efficiency), since the ratio of the total area of the pinholes to the area of the Nipkow disk is usually only about 1–2%. To overcome this drawback, the NSCM has been further improved by introducing another disk with a series of micro-lenslets.6

5.3.2 Scanning-slit confocal microscopes

As pointed out above, the Nipkow-disk confocal microscope has a very low light efficiency. In this subsection, another type of confocal microscope having high efficiency in the illumination of the specimen is introduced. This type of confocal microscope uses slits instead of pinholes and is usually known as the scanning-slit confocal microscope (SSCM). The slits used in the SSCM allow more light to pass through than do pinholes; thus, the illumination efficiency of the SSCM is greatly improved over that of the NSCM. Because each slit can be considered as the combination of a series of points along its direction, only a beam scan in the direction perpendicular to the slit is needed to reconstruct a complete 2D sectioned image. This not only simplifies the design and construction of scanning systems, but also reduces the time needed to capture a frame of the image. Thus, most SSCMs can work at video rate. Another advantage of the SSCM is that the width of the slit can be easily adjusted, which is very useful in certain applications, such as the observation of a normal human cornea in vivo. Note that the width of the slit serves a function similar to that of the size of the pinhole in a confocal microscope. A wide slit results in a higher SNR and lower resolution, while a narrow slit results in a lower SNR and higher resolution.

There are also several disadvantages of SSCMs. One disadvantage is that the resolution of the SSCM is lower than that of the pinhole-based confocal microscope. This lower resolution is due to the fact that the slits are truly confocal only in the direction perpendicular to them. Contrary to the pinholes used in the pinhole-based confocal microscope, the slits cannot block the stray light along the direction parallel to the slits. Thus, along the direction perpendicular to the slits, the resolution is the same as that of the pinhole-based confocal microscope; along the direction of the slits, the resolution is the same as that of a standard microscope. Another disadvantage is that when coherent light sources such as lasers are used, speckles can appear on the image due to the interference of coherent light in the direction parallel to the slits.7

The most commonly used confocal microscopes today are laser-scanning confocal microscopes (LSCMs), also known as confocal scanning laser microscopes (CSLMs). Figure 5.6 is a diagram of a simple LSCM with a laser


as the light source. The principle, configuration, and characteristics of the LSCM are fundamentally the same as those of the common confocal microscopes introduced in Section 5.2.

References

1. G. H. Seward, Optical Design of Microscopes, SPIE Press, Bellingham, Washington (2010) [doi: 10.1117/3.855480].
2. W. J. Smith, Modern Optical Engineering, Fourth Edition, McGraw-Hill, New York (2008).
3. M. Laikin, Lens Design, Fourth Edition, CRC Press, Boca Raton, Florida (2007).
4. J. Z. Young and F. Roberts, “A flying-spot microscope,” Nature 167, 231 (1951).
5. M. Minsky, “Microscopy apparatus,” U.S. Patent 3013467, December 19, 1961.
6. B. R. Masters, Confocal Microscopy and Multiphoton Excitation Microscopy: The Genesis of Live Cell Imaging, SPIE Press, Bellingham, Washington (2006) [doi: 10.1117/3.660403].
7. T. R. Corle and G. S. Kino, Confocal Scanning Optical Microscopy and Related Imaging Systems, Academic Press, London (1996).
8. M. Petráň, M. Hadravský, M. D. Egger, and R. Galambos, “The tandem scanning reflected light microscope,” Journal of the Optical Society of America 58(5), 661–664 (1968).

Chapter 6

Case Study 2: Online Cophasing Optical Systems for Segmented Mirrors

An online cophasing optical system for cophasing of segmented mirrors—a Mach–Zehnder interferometer equipped with the technique of dual-wavelength digital holography—is presented in this chapter. The principle and optical layout of this optical system are explained mainly using wave optics. With this example, readers can better understand optical systems in view of wave optics.

Compared with monolithic primary mirrors, segmented primary mirrors are more affordable and practical for ground-based astronomical telescopes with large apertures due to their lower cost and easier fabrication. However, in order to use a segmented mirror to acquire an image quality that is equivalent to that of a monolithic mirror with the same aperture, cophasing of the segmented mirror is required. Cophasing, especially online cophasing, of segmented mirrors is a great challenge and has attracted much research interest.1–3

The process of cophasing a segmented mirror involves measuring and then removing the height errors among all segments of the entire segmented mirror, including the relative piston errors between adjacent segments and the tip/tilt errors of each individual segment. A key step in this process is measuring these height errors (piston and tip/tilt errors). In optics, the height errors of a segmented mirror are generally determined by measuring the phase of light reflected from the segmented mirror. Due to the discontinuity of the segmented mirror and the existence of atmospheric turbulence, the problem of 2π ambiguities in the phase measurement makes measurement of the phase of light reflected from the segmented mirror very difficult. In this chapter, dual-wavelength digital holography, a technique commonly used for phase measurement4 and high-resolution imaging,5,6 is adopted to solve this problem.7

In the following sections of this chapter, measurement of the phase of light reflected from a segmented mirror will be explained in detail. First, the


principles of dual-wavelength digital holography for phase measurement are presented. Then, a point diffraction Mach–Zehnder interferometer designed for recording holograms is discussed. Next, an algorithm for numerical processing of interferograms is presented. Finally, the cophasing of two simulated segmented mirrors using dual-wavelength digital holography is performed.

6.1 Principles of Dual-Wavelength Digital Holography for Phase Measurement

To elaborate on the principles of dual-wavelength digital holography for phase measurement, single-wavelength digital holography for phase measurement is presented first. Then, single-wavelength digital holography is extended to a dual-wavelength technique to increase the range of phase measurement.

6.1.1 Single-wavelength digital holography for phase measurement

Digital holograms are normally recorded by an interferometer. Here, an interferometer with an off-axis configuration is adopted for the explanation of phase measurement using single-wavelength digital holography. Figure 6.1(a) schematically shows the recording part of an interferometer with an off-axis configuration. In this configuration, the object beam (solid lines) containing the phase of the object to be measured interferes with the reference beam (dashed lines), which is incident on the recording plane at an offset angle of θ. The interference pattern (or interferogram) recorded by the interferometer with this off-axis configuration is also called an off-axis hologram and contains the phase information of the object to be measured. According to our knowledge of the superposition of light fields (presented in Chapter 3), a recorded off-axis interferogram can be expressed as

Figure 6.1 Diagram of (a) the off-axis configuration for hologram recording and (b) the Fourier spectrum of an off-axis hologram.


I_H(x, y) = |O + R|² = |O|² + |R|² + OR* + O*R,    (6.1)

where O(x, y) = o exp[–iφ(x, y)] is the light field of the object beam at the recording plane, o is the amplitude of the light field of the object beam, φ(x, y) is the phase of the object to be measured, R = r exp{–i2π[(sin θ)/λ]x} is the light field of the reference beam, r is the amplitude of the light field of the reference beam, θ is the offset angle between the object and reference beams, λ is the recording wavelength, and the superscript * stands for the complex conjugate. In Eq. (6.1), the interference term of interest is OR*, which can be extracted by Fourier transforming and filtering the off-axis hologram. If the Fourier transform is performed on Eq. (6.1), according to the shift theorem of the Fourier transform (see the appendices), the reference beam r exp{–i2π[(sin θ)/λ]x} results in a translation by (sin θ)/λ (the spatial frequency of the reference beam) in the Fourier frequency domain, as shown in Fig. 6.1(b). This translation in the Fourier frequency domain provides great convenience when extracting the spectrum of the required term OR* in Eq. (6.1) by performing a filtering operation on the Fourier spectrum of the off-axis hologram. If another light field R' that has the same phase as the reference light field R can be reconstructed, irrespective of its amplitude, the term R*R' will be real. Once OR* is obtained, OR*R' can be further obtained from OR* by multiplying by R'. The phase of the object can be extracted by taking the phase argument of the complex amplitude OR*R'.

In order to obtain the complex amplitude OR*R' that contains only the phase of the object, four steps are taken. First, perform the Fourier transform on the off-axis interferogram denoted by Eq. (6.1). As shown in Fig. 6.1(b), the Fourier spectrum of the off-axis hologram contains three parts. The central part is the Fourier transform of the sum of the terms |O|² and |R|² on the right side of Eq. (6.1), i.e., the Fourier spectra of the sum of |O|² and |R|². The left and right parts, translated relative to the central part due to the linear phase shift of the reference beam in the space domain, are the Fourier spectra of the term OR* and its conjugate term O*R. Secondly, filter out the spectrum of the central part and the spectrum of the term O*R with a digital filter so that only the Fourier spectrum of the term OR* on the right side of Eq. (6.1) is retained. (The design of the digital filter will be discussed in Section 6.3.) Thirdly, perform the inverse Fourier transform on the filtered spectrum to obtain the term OR*. Finally, multiply OR* by the light field R' to obtain the complex amplitude OR*R', which contains only the phase of the object. (The reconstruction of the light field R' will be presented in Section 6.3.) The four steps described above can be mathematically expressed as

OR*R' = F⁻¹{SF[F(I_H)]} R',    (6.2)

where F stands for the Fourier transform, F⁻¹ is the inverse Fourier transform, and SF is a digital filter. Then, the phase of the object can be


extracted by φ(x, y) = arg[OR*R'], where arg(·) is the operation that extracts the phase of the complex amplitude OR*R'. The height error z(x, y) can then be calculated by z(x, y) = (λ/2π)φ(x, y).

It should be noted that the offset angle is required to ensure that there is no overlap among the three parts of the Fourier spectrum of an off-axis hologram shown in Fig. 6.1(b). To achieve this goal, the spatial frequency of the reference beam, (sin θ)/λ, must satisfy8

sin θ / λ ≥ 3 ξ_max,

where ξ_max is the highest spatial frequency of the object of interest. Moreover, because a CCD camera—a commonly used device for recording interferograms—is pixelated, the intensity distribution recorded by the CCD camera only provides a sampled interferogram. This sampled interferogram contains information on the object with limited spatial frequencies that are ultimately confined by the pixel pitch (the distance between adjacent pixels) of the CCD camera. In order to properly sample the interferogram with the CCD camera, according to the Nyquist criterion, the pixel pitch Δx of the CCD camera should satisfy

Δx ≤ 1 / [2(sin θ / λ + ξ_max)].

Because the spatial frequency of the reference beam (sin θ)/λ is generally much larger than the highest spatial frequency of the object beam ξ_max in the off-axis configuration, the pixel pitch requirement of the CCD camera can be further approximated as

Δx ≤ 1 / [2(sin θ / λ)] = λ / (2 sin θ).
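The four-step extraction of Eq. (6.2) can be sketched numerically as follows. The hologram is simulated, the carrier frequency is assumed known, and an idealized rectangular window stands in for the filter SF described in Section 6.3; the sketch shows only the order of operations, not the actual processing used in this chapter.

```python
import numpy as np

N = 256
x = np.arange(N) / N                        # normalized coordinates
X, Y = np.meshgrid(x, x)

phi = 2.0 * np.exp(-((X - 0.5)**2 + (Y - 0.5)**2) / 0.02)   # assumed object phase (rad)
fc = 64                                     # assumed carrier frequency, cycles/frame
O = np.exp(-1j * phi)                       # object beam (unit amplitude)
R = np.exp(-1j * 2 * np.pi * fc * X)        # tilted reference beam
I_H = np.abs(O + R)**2                      # simulated off-axis hologram

# Step 1: Fourier transform of the hologram
S = np.fft.fftshift(np.fft.fft2(I_H))

# Step 2: keep only the spectrum of the OR* term (around +fc on the x axis)
mask = np.zeros_like(S)
c = N // 2
mask[c - 20:c + 20, c + fc - 20:c + fc + 20] = 1.0
S_filtered = S * mask

# Step 3: inverse transform gives OR*
OR_conj = np.fft.ifft2(np.fft.ifftshift(S_filtered))

# Step 4: multiply by R' (same phase as R) and take the phase argument
R_prime = np.exp(-1j * 2 * np.pi * fc * X)
phi_rec = -np.angle(OR_conj * R_prime)      # recovered object phase

# Residual away from the borders should be small if the filter captures the object spectrum
print(np.max(np.abs(phi_rec[40:-40, 40:-40] - phi[40:-40, 40:-40])))
```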

Note that, due to the 2π ambiguity, the measurement range using single-wavelength digital holography is limited to one-half of the working wavelength at which the off-axis interferograms are recorded. This range of measurement is insufficient for phase measurement in practical applications. In order to increase the measurement range of this method, single-wavelength digital holography is extended to a dual-wavelength approach, which is presented next.

6.1.2 Dual-wavelength digital holography for phase measurement

Two off-axis interferograms I_H1 and I_H2 for two different wavelengths are required for dual-wavelength digital holography. Then, using Eq. (6.2), two


complex amplitudes containing only the phase of the object for the corresponding wavelengths can be expressed as

O1R1*R1' = F⁻¹{SF[F(I_H1)]} R1',
O2R2*R2' = F⁻¹{SF[F(I_H2)]} R2',    (6.3)

where the subscripts 1 and 2 stand for the two different wavelengths λ1 and λ2, respectively, and R1' and R2' are, again, light fields having the same phase as the reference light fields R1 and R2, respectively, but for different recording wavelengths. In order to facilitate an understanding of the principles of dual-wavelength digital holography [illustrated mathematically by Eq. (6.4)], the two complex amplitudes in Eq. (6.3) can be written as O1R1*R1' = A1 exp[–iφ1(x, y)] and O2R2*R2' = A2 exp[–iφ2(x, y)], where A1 and φ1(x, y) are the real amplitude and phase of the object at wavelength λ1, respectively, and A2 and φ2(x, y) are the amplitude and phase of the object at wavelength λ2, respectively. Suppose that wavelength λ1 is slightly smaller than wavelength λ2. Then, the phase of the object for the synthetic wavelength of λ1 and λ2, φ_syn = φ1 – φ2, can be extracted by

φ_syn(x, y) = arg[(O1R1*R1')(O2R2*R2')*]
            = 2π (1/λ1 – 1/λ2) z(x, y)
            = (2π/λ_syn) z(x, y),    (6.4)

where λ_syn = λ1λ2/|λ1 – λ2| is defined as the synthetic wavelength of λ1 and λ2, and z(x, y) is the height error at point (x, y). Figure 6.2 shows the ranges of measurement for two different wavelengths and the corresponding synthetic wavelength. For example, the ranges of measurement for a phase using wavelengths of 1 μm and 1.5 μm are 0.5 μm and 0.75 μm, respectively, as shown by the gray dashed and gray dotted lines in Fig. 6.2. However, as the synthetic wavelength of 1 μm and 1.5 μm is 3 μm, calculated by λ_syn = λ1λ2/|λ1 – λ2|, the range of measurement for the synthetic wavelength can be extended to 1.5 μm, as shown by the black dashed line in Fig. 6.2. In addition, the smaller the difference between the two wavelengths, the longer the synthetic wavelength, and the larger the range of measurement of the dual-wavelength technique for phase measurement. However, it must be noted that the difference between the two wavelengths should not be too small; otherwise, the two interferograms recorded at the different wavelengths cannot be resolved by the CCD camera. In the following sections, the method of dual-wavelength digital holography for online cophasing of segmented mirrors is presented. This


Figure 6.2 Schematic of ranges of measurement for two different wavelengths and the corresponding synthetic wavelengths.

method was published in 2014 in the journal Publications of the Astronomical Society of the Pacific.7
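Before turning to the interferometer, the synthetic-wavelength arithmetic of Eq. (6.4) can be sketched with the wavelengths used later in Section 6.4; the 1.2-μm height step below is an assumed test value.

```python
import numpy as np

lam1, lam2 = 0.532, 0.632                 # um (same as Section 6.4)
lam_syn = lam1 * lam2 / abs(lam1 - lam2)  # synthetic wavelength, ~3.36 um
print(f"synthetic wavelength = {lam_syn:.2f} um")
print(f"measurement range    = {lam_syn / 2:.2f} um")

# An assumed 1.2-um height step wraps at either single wavelength
# but not at the synthetic wavelength.
z = 1.2                                              # um
phi1 = np.angle(np.exp(1j * 2 * np.pi * z / lam1))   # wrapped single-wavelength phases
phi2 = np.angle(np.exp(1j * 2 * np.pi * z / lam2))
phi_syn = np.angle(np.exp(1j * (phi1 - phi2)))       # synthetic phase, wrapped to (-pi, pi]
z_recovered = phi_syn * lam_syn / (2 * np.pi)
print(f"recovered height     = {z_recovered:.2f} um")   # ~1.20
```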

6.2 Design of the Holographic Recorder: A Point Diffraction Mach–Zehnder Interferometer

In order to acquire off-axis interferograms for online cophasing of segmented mirrors, a point diffraction Mach–Zehnder interferometer9 is designed. The off-axis configuration of the point diffraction Mach–Zehnder interferometer is shown in Fig. 6.3. The light wavefront, distorted by atmospheric turbulence, is incident on a telescope with a segmented primary mirror and converges to the focal point of the telescope. The diverging beam from the focus of the telescope is collimated by a positive lens and is then incident on the point diffraction Mach–Zehnder interferometer. To record off-axis interferograms, a small offset angle θ between the beams from the two arms of the interferometer is introduced in the reference arm by tipping a flat mirror FM1, as shown in Fig. 6.3. In this same reference arm, a reference beam is generated by using a pinhole as a spatial filter located at the common focal plane of lenses L2 and L3. To balance lenses L2 and L3 in the reference beam, two lenses, L4 and L5, which are identical to lenses L2 and L3, respectively, are inserted in the other arm, i.e., the object arm. The reference and object beams from the two arms of the interferometer are recombined using a cubic


Figure 6.3 Diagram of the point diffraction Mach–Zehnder interferometer.

beam splitter CBS2. The off-axis interferograms formed by the reference and object beams are recorded by a high-speed, high-sensitivity CCD camera. Moreover, to obtain light of the desired wavelengths for acquiring interferograms at different wavelengths, narrow-bandpass filters are located in front of the interferometer, as shown in Fig. 6.3.

As pointed out in Section 6.1, in order to acquire the phase of the complex amplitude OR*R', i.e., the phase of the segmented mirror, a complex amplitude R' must be reconstructed. As shown in Fig. 6.3, when the pinhole in the reference arm is removed from the optical layout, the only difference in phase between the two beams is the linear phase shift, i.e., the phase of the reference beam before removing the pinhole. In this case, an interferogram denoted as I_H^R is recorded. By numerically processing interferogram I_H^R, the complex amplitude R' having the same phase as the reference can be reconstructed. The process of reconstructing R' from hologram I_H^R will be presented in the next section.

The steps involved in recording the interferograms required for cophasing of segmented mirrors by dual-wavelength digital holography are as follows.

First, remove the pinhole in the reference arm from the optical layout and record the two interferograms I_H1^R and I_H2^R using two narrow-bandpass filters at the central wavelengths of λ1 and λ2, respectively. These two holograms are used for reconstructing the complex amplitudes of R1' and R2' at wavelengths λ1 and λ2, respectively.

Second, add the pinhole back into the optical layout and record two additional holograms I_H1 and I_H2 at wavelengths λ1 and λ2, respectively, using the same two narrow-bandpass filters as were used in step 1. These two


interferograms are used for extracting the phase of the segmented mirror at the synthetic wavelength of λ1 and λ2.

With these four interferograms (I_H1, I_H2, I_H1^R, and I_H2^R), cophasing of the segmented mirror can be performed. A detailed algorithm for cophasing of segmented mirrors is presented in the next section.

6.3 Algorithm for Numerical Processing of Interferograms

Numerical processing of off-axis interferograms for online cophasing of segmented mirrors can be divided into two stages: (I) phase extraction and (II) plane fitting.

Stage I. Phase extraction

According to the principles of dual-wavelength digital holography for phase measurement described in Section 6.1, stage I can be divided into three steps.

Step 1: Reconstruct the complex amplitudes of R1' and R2' from interferograms I_H1^R and I_H2^R, respectively. As interferograms I_H1^R and I_H2^R both contain only the phase of the reference beam, the complex amplitudes R1'* and R2'* can be easily reconstructed using the first three steps of the procedure for reconstruction of the complex amplitude OR*R' presented in Section 6.1. Referring to Eq. (6.2), the complex amplitudes of R1' and R2' can be expressed, respectively, as

R1' = (F⁻¹{SF[F(I_H1^R)]})*,
R2' = (F⁻¹{SF[F(I_H2^R)]})*.    (6.5)

The digital filter SF is designed as shown in Fig. 6.4. The spectrum of the off-axis interferogram in the dashed rectangle is set to zero, and the cutoff is set at the position of the minimum intensity between the required spectrum and the central part of the spectrum of the off-axis interferogram.

Figure 6.4 Diagram of the digital filter SF.


Step 2: Reconstruct the complex amplitudes O1R1*R1' and O2R2*R2' containing only the phase of the segmented mirror from interferograms I_H1 and I_H2, respectively, by using Eq. (6.3). The digital filter used in Eq. (6.3) is the same as that used in Eq. (6.5).

Step 3: Extract the phase of the segmented mirror for the synthetic wavelength using Eq. (6.4). To overcome atmospheric turbulence, an averaged synthetic phase of the segmented mirror is obtained by averaging many extracted phases, each of which is obtained by following the three steps above. To achieve this goal, the recorded interferograms should satisfy two conditions. The first condition is that the time interval between recordings of the interferograms should be longer than the coherence time of the atmospheric turbulence. The second condition is that the exposure time of the CCD camera recording the interferograms must be shorter than the coherence time of the atmospheric turbulence.

Stage II. Plane fitting

Suppose that each segment of the segmented mirror has only piston and tip/tilt errors. To eliminate any errors caused by measurement noise, the piston and tip/tilt coefficients for each segment of the segmented mirror are obtained by fitting a plane to the averaged synthetic phase extracted in stage I with Eq. (6.6) below. Furthermore, as the phase of the segmented mirror is extracted by performing inverse Fourier transforms on a filtered spectrum, the influence of the Gibbs phenomenon on the accuracy of the extracted synthetic phase is significant, especially at the edges of each segment. To reduce the influence of the Gibbs phenomenon as much as possible, a central region of each segment of the segmented mirror is selected for plane fitting. From this central region of each segment, the piston coefficient c_i is extracted by taking the average of the phase over the selected region. Once the piston coefficient for each segment is obtained, it is substituted into Eq. (6.6) and fixed for the plane fitting. The tip/tilt coefficients a_i and b_i for each segment can then be obtained by fitting the averaged synthetic phase of the segmented mirror over the selected region with the following expression:

φ_i^s(x, y) = a_i x + b_i y + c_i,    (6.6)

where φ_i^s(x, y) is the averaged synthetic phase in the selected region of the ith segment of the segmented mirror at point (x, y), and a_i, b_i, and c_i are the tip/tilt and piston coefficients for the ith segment of the segmented mirror, respectively. Note that the coefficients a_i, b_i, and c_i for each segment are in radians. Since height errors are used to regulate the position of each segment of the segmented mirror, these piston and tip/tilt coefficients should be converted to the corresponding coefficients in length units by multiplying by a factor of λ_syn/(2π).
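A minimal least-squares sketch of this per-segment fit is given below. The segment phase, noise level, and region size are simulated assumptions, and the coordinates are measured from the segment center so that the piston can be taken as the mean phase over the selected region, as described above.

```python
import numpy as np

ny, nx = 64, 64
y, x = np.mgrid[0:ny, 0:nx].astype(float)
x -= x.mean()                       # center the coordinates on the segment
y -= y.mean()

a_true, b_true, c_true = 0.010, -0.020, 0.50        # assumed tip/tilt/piston (rad)
phase = a_true * x + b_true * y + c_true + 0.01 * np.random.randn(ny, nx)
mask = np.hypot(x, y) < 20                           # central region of the segment

# Piston: average phase over the selected region, then held fixed
c_i = phase[mask].mean()

# Tip/tilt: least-squares fit of a_i*x + b_i*y to the piston-removed phase
A = np.column_stack([x[mask], y[mask]])
a_i, b_i = np.linalg.lstsq(A, phase[mask] - c_i, rcond=None)[0]

print(a_i, b_i, c_i)                                 # close to 0.010, -0.020, 0.50
# Conversion to height units would multiply each coefficient by lam_syn/(2*pi).
```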


For online cophasing of the segmented mirror with relatively high accuracy, the above procedure may need to be repeated about three times or more. Moreover, since the offset angle between the two beams in the interferometer is fixed, the complex amplitudes of R1' and R2' reconstructed by step 1 of stage I are invariant and only need to be measured once.

6.4 Performance

Numerical experiments are performed for the cophasing of two simulated segmented mirrors in the presence of noise and atmospheric turbulence. The first simulated segmented mirror (hereafter abbreviated as S1) contains only piston errors, and the second simulated segmented mirror (hereafter abbreviated as S2) contains both piston and tip/tilt errors. Recording wavelengths λ1 and λ2 for the simulated interferograms are taken as 0.532 μm and 0.632 μm, respectively. The corresponding synthetic wavelength is 3.36 μm, meaning that the range of phase measurement can be up to 1.68 μm. Note that the interferograms in this section are all numerically simulated, and Gaussian white noise with an SNR (see definition in Ref. 10) of 20 is added. Random phase screens simulated according to the Kolmogorov model (to be introduced in Section 7.1) are used to disturb the interferograms in order to simulate the situation of online cophasing of segmented mirrors. The performance of cophasing simulated mirrors S1 and S2 by dual-wavelength digital holography is presented next.

6.4.1 Online cophasing of S1 by dual-wavelength digital holography

Figure 6.5 shows simulated segmented mirror S1 with 37 hexagonal segments. The surface of S1 has a PV value of 1.5 μm and an RMS value of 0.41 μm. As shown in Fig. 6.6, interferograms I_H1^R and I_H2^R are numerically generated at wavelengths λ1 and λ2, respectively, when the pinhole in the reference arm shown in Fig. 6.3 is removed. The interferograms in Fig. 6.6 have only straight fringes, indicating that they contain only the phase difference of tip caused by the offset angle between the two beams of the interferometer. By applying Eq. (6.5) to interferograms I_H1^R and I_H2^R, R1' and R2' for wavelengths λ1 and λ2, respectively, can be reconstructed. One thousand atmospheric phase screens with an averaged PV value of 0.51 waves are used to disturb the interferograms. For each wavelength, 500 interferograms, each of which contains both the phase of S1 and one of 500 phase screens, are numerically generated. Figure 6.7 shows two interferograms I_H1 and I_H2, each of which is one of the 500 interferograms for the corresponding wavelength. Following stage I in Section 6.3, as shown in Fig. 6.8, the averaged synthetic phase for S1 is obtained by averaging 500 synthetic phases.


Figure 6.5 Diagram of S1 with only piston errors (scales in microns).

Figure 6.6 Holograms of I_H1^R and I_H2^R at wavelengths of (a) 0.532 μm and (b) 0.632 μm.

Following stage II in Section 6.3, by fitting planes on each segment of the averaged synthetic phase of S1 shown in Fig. 6.8, the piston and tip/tilt coefficients for each segment of the segmented mirror can be obtained. Note that, although S1 contains only piston errors, the fitted tip/tilt coefficients are not exactly zero; they are extremely small values arising from numerical errors. After repeating the above procedure for cophasing of segmented mirrors three times, the residual height error of S1 is obtained, as shown in Fig. 6.9. The PV value of the residual height error is 0.088 μm, and its RMS value is 9.75 × 10⁻³ μm. Figure 6.10 shows the central part of the point spread function of the simulated mirror S1 with and without compensation at a wavelength of 0.532 μm. Obviously, after cophasing S1, the point spread function (presented in Chapter 3) is brighter and narrower than that without compensation, indicating that the image quality is greatly improved.

Figure 6.7 Holograms $I_{H1}$ and $I_{H2}$ for S1 at wavelengths of (a) 0.532 μm and (b) 0.632 μm.

Figure 6.8 Averaged synthetic phase of S1 (scale in radians).

6.4.2 Online cophasing of S2 by dual-wavelength digital holography

Figure 6.11 shows simulated segmented mirror S2, also with 37 hexagonal segments. The surface of S2 has a PV value of 1.5 μm and an RMS value of 0.24 μm. Similar to the online cophasing of S1 in the presence of noise and atmospheric turbulence, 500 interferograms for each wavelength are generated.

Figure 6.9 Residual height error of S1 (scale in microns).

Figure 6.10 Central part of the point spread function for S1 (a) with and (b) without compensation at a wavelength of 0.532 μm.

Two of these interferograms for each wavelength are shown in Fig. 6.12. In each segment of S2, the fringe spacings and orientations of the interferograms shown in Fig. 6.12 differ greatly from those shown in Fig. 6.7. These differences in fringe spacing and orientation are caused by the different piston and tip/tilt errors of each segment of S2. Following the steps in stage I of Section 6.3, the averaged synthetic phase of S2 over 500 synthetic phases is extracted and shown in Fig. 6.13. Following stage II in Section 6.3, the piston and tip/tilt coefficients can be obtained for regulating each segment of S2.



Figure 6.11 Diagram of S2 (scale in microns).

Figure 6.12 Holograms of S2 for (a) 0.532 μm and (b) 0.632 μm.

After repeating the above procedure for the cophasing of a segmented mirror three times, the residual height error of S2 is obtained and shown in Fig. 6.14. The PV and RMS values of the residual height error of S2 are 0.154 μm and 0.019 μm, respectively. The central part of the point spread function of simulated mirror S2 with and without compensation at a wavelength of 0.532 μm is shown in Fig. 6.15. The improvement in image quality for simulated mirror S2 is even more prominent than that for mirror S1.



Figure 6.13 Averaged synthetic phase of S2 (scale in radians).

Figure 6.14 Residual height error of S2 (scale in microns).

The performance study of the cophasing of S1 and S2 in the presence of noise and atmospheric turbulence shows that online cophasing of segmented mirrors based on dual-wavelength digital holography can reach relatively high accuracy. This means that a point diffraction Mach–Zehnder interferometer equipped with dual-wavelength digital holography can sense height errors of segmented mirrors due to misalignments.



Figure 6.15 Central part of the point spread function for S2 (a) with and (b) without compensation at 0.532 μm.

References

1. G. Chanan, C. Ohara, and M. Troy, "Phasing the mirror segments of the Keck telescopes II: the narrow-band phasing algorithm," Applied Optics 39(25), 4706–4714 (2000).
2. C. Pizarro, J. Arasa, F. Laguarta, N. Tomàs, and A. Pinto, "Design of an interferometric system for the measurement of phasing errors in segmented mirrors," Applied Optics 41(22), 4562–4570 (2002).
3. N. Yaitskova, K. Dohlen, P. Dierickx, and L. Montoya, "Mach–Zehnder interferometer for piston and tip-tilt sensing in segmented telescopes: theory and analytical treatment," Journal of the Optical Society of America A 22(6), 1093–1105 (2005).
4. J. Gass, A. Dakoff, and M. Kim, "Phase imaging without 2π ambiguity by multiwavelength digital holography," Optics Letters 28(13), 1141–1143 (2003).
5. F. Charrière, N. Pavillon, T. Colomb, C. Depeursinge, T. J. Heger, E. A. Mitchell, P. Marquet, and B. Rappaz, "Living specimen tomography by digital holographic microscopy: morphometry of testate amoeba," Optics Express 14(16), 7005–7013 (2006).
6. C. Mann, L. Yu, C. Lo, and M. Kim, "High-resolution quantitative phase-contrast microscopy by digital holography," Optics Express 13(22), 8693–8698 (2005).
7. C. Li and S. Zhang, "A digital holography approach for co-phasing of segmented telescopes: proof of concept using numerical simulations," Publications of the Astronomical Society of the Pacific 126(937), 280–286 (2014).



8. P. Hariharan, Optical Holography: Principles, Techniques, and Applications, Second Edition, Cambridge University Press, Cambridge (1996).
9. J. Angel, "Ground-based imaging of extrasolar planets using adaptive optics," Nature 368(17), 203–207 (1994).
10. M. C. Roggemann, D. W. Tyler, and M. F. Bilmont, "Linear reconstruction of compensated images: theory and experimental results," Applied Optics 31(35), 7429–7441 (1992).

Chapter 7

Case Study 3: Adaptive Optics Systems

Adaptive optics (AO) is an effective optical technique that enables ground-based astronomical telescopes to attain sharp images by reducing the effect of atmospheric turbulence on the wavefronts of incoming light waves. In this chapter, an AO system for a ground-based astronomical telescope is taken as an example to facilitate understanding of an optical system, from its design to its implementation, using both geometrical and wave optics. The image quality of a ground-based astronomical telescope is greatly degraded by the wavefront distortions that atmospheric turbulence imposes on incoming light waves. To improve the image quality of the telescope, the distorted wavefronts of the incoming light must be compensated in real time, which is the task of AO. Although the basic idea of AO is simple, it is very difficult to implement a practical AO system because the implementation involves multiple scientific branches and technical realms such as optics, automatic control, and electronics, among others. In order to provide a deep understanding of one AO system, this final chapter is structured as follows. First, the principles of AO are presented. Second, brief descriptions of an astronomical telescope and atmospheric seeing are given as prerequisites to designing a practical AO system. Third, the detailed design of an AO system for a telescope is illustrated. Then, the core components and related algorithms of the telescope AO system are given. Next, two criteria are discussed: one for the order estimation in the modal wavefront reconstruction, and the other for the matching problem between the Shack–Hartmann sensor and the deformable mirror. Finally, the implementation and performance of the AO system are presented.

7.1 Principles of Adaptive Optics

In Chapter 3, it was shown that the angular resolution of an optical imaging system is ultimately determined by the system's aperture size due to diffraction




of light for a certain wavelength: the larger the aperture of an optical imaging system, the better its angular resolution at that wavelength. This is one of the most important reasons that ground-based astronomical telescopes keep being built with ever-larger apertures. However, the practical resolution of a large-aperture telescope is greatly limited by atmospheric turbulence. For example, the resolution of a large-aperture telescope is generally equivalent to that of a telescope with an aperture of only 10–20 cm. This degradation in the resolution of an astronomical telescope is attributed to the distorted wavefronts of the incident light waves, which are caused by random fluctuations in the refractive index of the atmosphere above the telescope. AO was first proposed by Babcock1 in 1953 and has since developed into an indispensable technique for astronomical telescopes with large apertures. As shown in Fig. 7.1, an AO system consists of three main parts: a wavefront sensor, a wavefront corrector, and a controller. In an AO system, the distorted wavefront of the incident light is measured by the wavefront sensor, and the sensed wavefront is converted into control signals for the wavefront corrector by the controller. The signals from the controller drive the wavefront corrector to compensate the incoming wavefront in real time.

Figure 7.1 Schematic of an AO system.

In this way, the image quality of a telescope equipped with an AO system can be greatly improved. From this working principle, an AO system can clearly be regarded as a closed-loop control system. Because AO was developed to overcome the blurring effects of atmospheric turbulence on imaging, before understanding and designing an AO system, it is essential to become familiar with the theory of imaging through turbulence. Therefore, before introducing the principles of wavefront sensing, wavefront correction, and the control system, the theory behind imaging through atmospheric turbulence is first summarized.

7.1.1 Imaging through atmospheric turbulence

Since the fluctuations in the refractive index of the turbulent atmosphere are random, the distorted wavefronts they cause are also random, and these phenomena can only be described statistically. In fact, the statistical properties of distorted wavefronts provide a highly effective way to estimate the image quality of an astronomical telescope and can further serve as a starting point for designing an AO system.

7.1.1.1 Structure function of the refractive index and its power spectrum

The temperature of the atmosphere surrounding the earth is a function of altitude. Air currents mix turbulent air at different altitudes, producing inhomogeneities in the atmospheric temperature. These temperature inhomogeneities in turn cause random fluctuations in the refractive index of the atmosphere. The variance of the difference between the refractive indices at two points separated by $\boldsymbol{\rho}$ is defined as

$$D_n(\boldsymbol{\rho}) = \langle |n(\mathbf{r}) - n(\mathbf{r} + \boldsymbol{\rho})|^2 \rangle, \qquad (7.1)$$

where $D_n(\boldsymbol{\rho})$ is the structure function of the refractive index, and $n(\mathbf{r})$ is the refractive index at position $\mathbf{r}$. Generally speaking, atmospheric turbulence can be considered isotropic and homogeneous. According to Kolmogorov's dimensional analysis for atmospheric turbulence,2 the structure function of the refractive index defined in Eq. (7.1) satisfies

$$D_n(\rho) = C_n^2\, \rho^{2/3}, \qquad (7.2)$$

where $\rho = |\boldsymbol{\rho}|$, and $C_n^2$ is the structure coefficient of the refractive index of atmospheric turbulence. This power law of $D_n(\rho)$ has also been verified experimentally for moderate values of $\rho$. In Eq. (7.2), the value of $C_n^2$, which evolves with time and depends mainly on altitude, characterizes the strength of the fluctuations in the refractive index of atmospheric turbulence.



So far, our attention has focused on the spatial characteristics of atmospheric turbulence. To gain a more comprehensive picture, the power spectrum of the refractive-index fluctuations is also presented. It can be obtained by taking the Fourier transform of the structure function of the refractive index:3

$$\tilde{F}_n(K) = 0.033\, C_n^2\, K^{-11/3}, \qquad (7.3)$$

where $\mathbf{K}$ is the spatial frequency of atmospheric turbulence, and $K = |\mathbf{K}|$ with $\mathbf{K} = (K_x, K_y, K_z)$. Kolmogorov provided a pictorial description of motion in atmospheric turbulence.2 Large-scale motion continually breaks down into smaller-scale motion, so the scales of motion become smaller and smaller. During this breakdown process, the kinetic energy of large-scale motion is transferred to the kinetic energy of small-scale motion. However, the scale of motion cannot become infinitely small, because the Reynolds number VL/ν0 must exceed a limit value for turbulence to occur, where V is the characteristic velocity of the motion, L is its characteristic scale, and ν0 is the kinematic viscosity of the atmosphere. Once the Reynolds number is sufficiently small, the turbulence disappears and the kinetic energy of the smallest-scale motion is converted into heat by viscous friction. This process is shown schematically in Fig. 7.2. In order for Eq. (7.3) to be valid, the spatial frequency of atmospheric turbulence needs to satisfy $L_0^{-1} \le K \le l_0^{-1}$, where $L_0$ is the outer scale of the atmospheric turbulence, typically several tens of meters or more, and $l_0$ is the inner scale, typically several tens of millimeters. In the spatial domain, moderate values of $\rho$ lie in the range $(l_0, L_0)$.

Figure 7.2 Schematic of the Kolmogorov spectrum on a log–log scale.



7.1.1.2 Phase structure function and its power spectrum

When light propagates through atmospheric turbulence, according to the relationship between the phase delay and the corresponding OPD expressed by Eq. (3.76), the phase delay resulting from the OPD caused by the refractive index of atmospheric turbulence can be calculated as

$$w(x, y) = k \int n(x, y, z)\, dz, \qquad (7.4)$$

where $k = 2\pi/\lambda$ is the wavenumber of light, x and y are the coordinates on a plane through which the light passes, and z is along the propagation path of the light in atmospheric turbulence. Based on the assumptions of isotropy and homogeneity of atmospheric turbulence, the variance of the phase difference between two points is defined by

$$D_w(\boldsymbol{\rho}) = \langle |w(\mathbf{r}) - w(\mathbf{r} + \boldsymbol{\rho})|^2 \rangle, \qquad (7.5)$$

where $D_w(\boldsymbol{\rho})$ is known as the phase structure function, and $\rho = \sqrt{x^2 + y^2}$. By substituting Eqs. (7.4) and (7.2) into Eq. (7.5) and integrating the difference between the propagation paths of the light waves through the two points, the phase structure function can be rewritten as3

$$D_w(\rho) = 2.91\, k^2 \rho^{5/3} \int C_n^2(z)\, dz. \qquad (7.6)$$

As the structure coefficient $C_n^2$ depends mainly on altitude, the phase structure function can be expressed directly as an integral over altitude:

$$D_w(\rho) = \frac{2.91\, k^2}{\cos\gamma}\, \rho^{5/3} \int C_n^2(h)\, dh, \qquad (7.7)$$

where $\gamma$ is the zenith angle, and h is the altitude. By defining

$$r_0 = \left[ 0.423\, k^2 (\cos\gamma)^{-1} \int C_n^2(h)\, dh \right]^{-3/5},$$

which is called the Fried parameter or the coherence length of atmospheric turbulence,4 Eq. (7.7) can be simplified to

$$D_w(\rho) = 6.88 \left( \frac{\rho}{r_0} \right)^{5/3}. \qquad (7.8)$$

Although $r_0$ is only a statistical quantity of atmospheric turbulence, owing to the uncertainty of $C_n^2$, it is of great significance when estimating the image quality of an astronomical telescope. Obviously, if the aperture of a telescope is less



than or equal to $r_0$, the phase variance caused by atmospheric turbulence can be neglected. However, for a telescope with an aperture much larger than $r_0$, the image quality will be greatly degraded by the large phase variance caused by atmospheric turbulence. The variance calculated by Eq. (7.8) describes the spatial coherence of atmospheric turbulence. Similarly, the temporal coherence of atmospheric turbulence can be expressed by

$$D_w(\tau) = \langle |w(\mathbf{r}, t) - w(\mathbf{r}, t + \tau)|^2 \rangle, \qquad (7.9)$$

where $D_w(\tau)$ is known as the temporal phase structure function of atmospheric turbulence, and $\tau$ is the time interval. If all of the atmospheric turbulence above the telescope propagated at the same velocity $v$, the wavefront distortion would also propagate at velocity $v$ without noticeable change while crossing the telescope aperture. This assumption is called the Taylor approximation, also known as Taylor's frozen-turbulence approximation. In this case, the phase $w(\mathbf{r}, t + \tau)$ can simply be taken to equal $w(\mathbf{r} + \mathbf{v}\tau, t)$. Thus, according to the spatial phase structure function [Eq. (7.8)], the temporal phase structure function of atmospheric turbulence can be written as

$$D_w(\tau) = 6.88 \left( \frac{v\tau}{r_0} \right)^{5/3}. \qquad (7.10)$$

Corresponding to the coherence length of atmospheric turbulence, the coherence time of atmospheric turbulence is defined as $\tau = r_0/v$. The isoplanatic phase distortion can be obtained in a manner similar to the treatment of the phase distortion evolving with time. Suppose that the angle subtended at the telescope between a target star and a guide star lying at zenith angle $\gamma$ is $\theta$ (see Fig. 7.3). Because the angle $\theta$ is generally very small, the separation between the two lines of sight at altitude h can be taken as $h\theta/\cos\gamma$. Referring to the phase structure function in space, the angular phase structure function describing the correlation between the phase at $\gamma$ and the phase at $\gamma + \theta$ can be written as5,6

$$D_w(\theta) = 6.88 \left( \frac{h\theta}{r_0 \cos\gamma} \right)^{5/3}. \qquad (7.11)$$


The power spectrum of the phase structure function can be obtained by performing the Fourier transform on the phase structure function:7

$$\tilde{w}(K) = 0.023\, r_0^{-5/3} K^{-11/3}, \qquad (7.12)$$


Figure 7.3 Geometrical diagram of a target star and a guide star.

where $\tilde{w}(K)$ is the power spectrum of the spatial phase structure function, and K is the spatial frequency of the atmospheric turbulence. The power spectrum indicates that the low spatial frequencies of atmospheric turbulence dominate the phase distortion of a light wave.
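Phase screens following this spectrum (like those used to disturb the interferograms in Chapter 6) can be generated with the common FFT method sketched below. The function name, the frequency convention (f in cycles per meter), and the neglect of low-order subharmonics are choices of this illustration rather than a prescription from the text.

```python
import numpy as np

def kolmogorov_phase_screen(n, pixel_scale, r0, rng=None):
    """FFT-based Kolmogorov phase screen in radians (a simple sketch).

    n           -- grid size in pixels
    pixel_scale -- physical size of one pixel (m)
    r0          -- Fried parameter (m) at the wavelength of interest
    """
    rng = np.random.default_rng() if rng is None else rng
    df = 1.0 / (n * pixel_scale)                 # frequency spacing (cycles/m)
    fx = np.fft.fftfreq(n, d=pixel_scale)
    fxx, fyy = np.meshgrid(fx, fx)
    f = np.hypot(fxx, fyy)
    f[0, 0] = np.inf                             # drop the undefined piston term

    # Kolmogorov phase power spectral density, cf. Eq. (7.12)
    psd = 0.023 * r0 ** (-5.0 / 3.0) * f ** (-11.0 / 3.0)

    # Complex Gaussian Fourier coefficients weighted by sqrt(PSD)
    coeff = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    coeff *= np.sqrt(psd) * df

    return np.fft.ifft2(coeff).real * n * n      # back to the spatial domain
```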

Now that the basic theory behind phase distortions caused by atmospheric turbulence has been covered, the theory of image formation through atmospheric turbulence can be discussed. Phase distortions caused by atmospheric turbulence have different effects on imaging quality, and these effects are dependent on exposure time. Therefore, long-exposure and short-exposure imaging modes are reviewed next. 7.1.1.3.1 Long-exposure imaging

Suppose that a telescope is an aberration-free optical system so the wavefront distortion on the pupil plane of the telescope is only due to the distortion from atmospheric turbulence. The generalized pupil function (as defined as in Section 3.4) of the telescope is denoted as P(r) ¼ circ(r)exp[iw(r)], where circ(·) is the circular function, and w(r) is the phase distortion (caused by atmospheric turbulence) on the pupil plane of the telescope. The composite OTF [as presented by Eq. (3.75) in Section 3.4] of the telescope and atmospheric turbulence for long-exposure-time imaging can be written as a time average by

234

Chapter 7

*2p R ˜ L ðKÞ ¼ H

+

∫ ∫ PðrÞPðr þ lZKÞdrdu 0 0

2p R

¼ hexpfi½wðrÞ  wðr þ lZKÞgi ∫ ∫ circðrÞcircðr þ lZKÞdrdu 0 0

˜ ¼ hexpfi½wðrÞ  wðr þ lZKÞgiTðlZKÞ,

(7.13)

where R is the radius of the telescope, Z is the distance from the pupil plane to ˜ the image plane of the telescope, K ¼ r/(lZ), and TðlZKÞ—the autocorrelation of the circular function determined by the aperture of the telescope—is the OTF of the telescope regardless of distortions caused by atmospheric turbulence. Note that the integral time for the time average in Eq. (7.13) corresponds to the exposure time of the camera. The exposure time for longexposure imaging should be much longer than the coherence time of the atmospheric turbulence. Since the phase distortion w(r) in Eq. (7.13) is caused by the integral of random fluctuations in the refractive index of the atmospheric turbulence along the propagation path of light, according to the central-limit theorem, this phase distortion as well as the difference of phase distortion, w(r) – w(r þ lZK), should follow Gaussian statistics. Then, the composite OTF expressed by Eq. (7.13) can be further calculated by 

 1 2 ˜ L ðKÞ ¼ TðlZKÞ ˜ H exp  hjwðrÞ  wðr þ lZKÞj i : 2

(7.14)

Substituting Eq. (7.8) into Eq. (7.14), the composite OTF can be simplified as i h ˜ ˜ L ðKÞ ¼ TðlZKÞ (7.15) exp 3.44ðlZK∕r0 Þ5∕3 : H From Eq. (7.15), it can be seen that even an image from a diffraction-limited telescope, and especially an image from a large-aperture telescope, would be greatly degraded by atmospheric turbulence. The OTF of atmospheric turbulence Ã(lZK) is expressed as h i ˜ AðlZKÞ ¼ exp 3.44ðlZK∕r0 Þ5∕3 : (7.16) 7.1.1.3.2 Short-exposure imaging

The Kolmogorov power spectrum has shown that the major contributors to phase distortion are the low-frequency components of atmospheric turbulence. In the power spectrum of atmospheric turbulence, approximately 87% of the total power roughly corresponds to that of tip/tilt.7 With short-exposure

Case Study 3: Adaptive Optics Systems

235

imaging, the exposure time is so short that the sharpness of the image is insensitive to the tip/tilt components of phase distortion. Hence, tip/tilt components, being dominant portions of the phase distortion caused by atmospheric turbulence, should be subtracted from the total distortion. Therefore, the composite OTF for short-exposure-time imaging can be expressed as *2p R ˜ S ðKÞ ¼ H

+

∫ ∫ PðrÞP ðr þ lZKÞdrdu 

0 0

0 ˜ ðrÞ  w0 ðr þ lZKÞgi, ¼ TðlZKÞhexpfi½w

(7.17)

where w0 ðrÞ ¼ wðrÞ  ðax þ byÞ, w0 ðr þ lZKÞ ¼ wðr þ lZKÞ  ½aðx þ lZK x Þ þ bðy þ lZK y Þ, a and b are random coefficients of tip/tilt, and Kx and Ky are spatial frequencies corresponding to the x and y axes, respectively. Note that the integral time for the time average in Eq. (7.17) should be shorter than the coherence time of the atmospheric turbulence. During the exposure time of the camera, the distortion caused by atmospheric turbulence can be considered as frozen. According to Fried’s assumptions8 that w0 (r) and w0 (r þ lZK) are independent of the coefficients of a and b, and obey Gaussian statistics, Eq. (7.17) can be further simplified as   1 2 ˜ ˜ H S ðKÞ ¼ TðlZKÞ exp  ½Dw ðlZKÞ  hðalZK x þ blZK y Þ i : (7.18) 2 According to Refs 8 and 9, hðalZK x þ blZK y Þ2 i can be calculated by  1∕3 lZK : (7.19) hðalZK x þ blZK y Þ2 i ¼ Dw ðlZKÞ Dw ðlZKÞ By substituting Eq. (7.19) into Eq. (7.18), the composite OTF for shortexposure-time imaging can be expressed as    1∕3  1 lZK ˜ ˜ H S ðKÞ ¼ TðlZKÞ exp  Dw ðlZKÞ 1  2 Dw ðlZKÞ    1∕3  lZK 5∕3 ˜ ¼ TðlZKÞ exp 3.44ðlZK∕r0 Þ 1 : (7.20) Dw ðlZKÞ 7.1.2 Wavefront sensing Distorted wavefronts of light waves must first be measured before atmospheric turbulence can be compensated. Wavefront sensing provides a means of measuring wavefronts of light waves. A review of wavefront sensing

236

Chapter 7

by Campbell et al. can be found in Ref. 10. Wavefronts or phases of light waves are lost during the measurement process because detectors can only record intensity distributions of light waves. Therefore, wavefront sensors are forced to sense wavefronts indirectly, e.g., by reconstructing a wavefront using local slopes or curvatures of the phase distortion extracted from the intensity distribution recorded by the sensor. The three most commonly used wavefront sensors are the Shack–Hartmann wavefront sensor, the lateral shearing interferometer, and the curvature wavefront sensor. The first two of these sensors reconstruct the wavefront by measuring local slopes, and the third reconstructs the wavefront by measuring local curvatures. Next, these three wavefront sensors are briefly introduced. 7.1.2.1 Shack–Hartmann wavefront sensor

The Shack–Hartmann (SH) wavefront sensor is the most widely used and well-known wavefront sensor. As shown in Fig. 7.4, the SH sensor consists of two parts: a lenslet array with the same focal length for each lenslet, and a detector placed on the focal plane of the lenslet array. The lenslet array, being equivalent to an array of subapertures, divides the wavefront of an incoming

Figure 7.4 Diagram of a SH sensor and the resulting spot patterns for (a) a plane wave and (b) a distorted wave.

Case Study 3: Adaptive Optics Systems

237

light beam into corresponding subregions and focuses the light beam in each lenslet onto the detector. When a plane wave is incident on the SH sensor, a regular spot pattern is formed on the focal plane of the lenslet array, as shown in Fig. 7.4(a). This regular spot pattern is taken as a reference spot array for measuring the distorted wavefront incident on the SH sensor. Here we assume that (xj, yj) are the coordinates of the centroid of the jth reference spot, i.e., the intersection point of the optical axis of the jth lenslet and the detector. When a distorted wave is incident on the SH sensor, the spot formed by each lenslet deviates from its reference spot, as shown in Fig. 7.4(b). The coordinates of the centroid of the jth spot from a distorted wavefront of an incoming light wave on the SH sensor can be simply calculated by P P ðxh , yh Þ∈Aj xh I ðxh , yh Þ ðx , y Þ∈A yh I ðxh , yh Þ , ycj ¼ P h h j , (7.21) xcj ¼ P ðxh , yh Þ∈Aj I ðxh , yh Þ ðxh , yh Þ∈Aj I ðxh , yh Þ where (xcj, ycj) are the calculated coordinates of the centroid of the jth spot, I(xh, yh) is the intensity recorded by the detector at pixel (xh, yh), Aj is the domain of the jth spot, and ∈ stands for “belongs to.” As shown in Fig. 7.5, due to geometrical similarity, angle u1 equals angle u2. Angle u1 can be approximately expressed as the slope of the wavefront at point (xj, yj) along the x axis: l ­wðx, yÞ , 2p ­x ðxj , yj Þ

Figure 7.5 Geometrical relationship between the slope of the wavefront on a lenslet and the deviation of the spot from its reference spot.

238

Chapter 7

where w(x, y), expressed in radians, is the wavefront of the incident light wave on the lenslet array; l is a working wavelength; and angle u2 can be approximately written as (xcj – xj)/f, where f is the focal length of the lenslet. Hence, it can be found that xcj  xj l ­wðx, yÞ ¼ f 2p ­x ðxj , yj Þ along the x axis. By the same reasoning, it can be found that ycj  yj l ­wðx, yÞ ¼ f 2p ­x ðxj , yj Þ along the y axis. Because these two equations are the theoretical basis for sensing a wavefront using a SH wavefront sensor, we join them as follows: xcj  xj l ­wðx, yÞ , ¼ f 2p ­x ðxj , yj Þ

ycj  yj l ­wðx, yÞ : ¼ f 2p ­y ðxj , yj Þ

(7.22)

The wavefront of the incoming light wave can be reconstructed by the measured slopes across all subapertures of the SH sensor. The reconstruction algorithm for the SH senor will be presented in Section 7.4. In order for a SH sensor to accurately measure the local slopes of a wavefront caused by atmospheric turbulence, the size of the sensor subaperture should be smaller than or equal the coherence length of atmospheric turbulence. Thus, high spatial frequencies of the wavefront that greatly degrade the image quality of a telescope can be sufficiently sampled and can alleviate the phenomenon of frequency aliasing during wavefront sensing. The SH sensor is simple, compact, highly sensitive, and independent of the wavelengths of incident light waves. Due to these multiple advantages, SH sensors are widely used not only in AO for astronomy but also in optical testing and other areas. 7.1.2.2 Lateral shearing interferometer

The lateral shearing interferometer (LSI) was a commonly used wavefront sensor in the early development of AO for astronomy. As shown in Fig. 7.6, two copies of the wavefront of an incoming light wave on the pupil plane are generated by a shearing device. One of these wavefronts undergoes a small shift s created by the shearing device. Together, the two wavefronts now produce an interference pattern in their overlapping area on the detector. According to the theory of the interference of light (Section 3.3), the recorded interference pattern can be expressed as

Case Study 3: Adaptive Optics Systems

239

Figure 7.6 Diagram of a LSI wavefront sensor.

I ðrÞ ¼ j exp½iwðrÞ þ exp½iwðr þ sÞj2 ¼ 2 þ 2 cos½wðrÞ  wðr þ sÞ,

(7.23)

where w(r) is one wavefront, and w(r þ s) is the other wavefront having a small shift s. If the shift quantity in the x and y axes is small, Eq. (7.23) can be simplified by the Taylor expansion as    ­wðx, yÞ I x ðx, yÞ ¼ 2 1 þ cos sx , ­x    ­wðx, yÞ I y ðx, yÞ ¼ 2 1 þ cos sy , (7.24) ­y where Ix(x, y) and Iy(x, y) are interference patterns along the x and y axes, respectively; sx and sy are the x and y components of the small shift s, respectively; l ­wðx, yÞ 2p ­x

and

l ­wðx, yÞ 2p ­y

are the slopes of the wavefront at point (x, y) along the x and y axes, respectively; and l is a working wavelength. Thus, the wavefront of the incoming light

240

Chapter 7

wave can be reconstructed using the slopes of the wavefront along the x and y axes.11 In a practical AO system performing wavefront sensing, the LSI requires an appropriate shear that fulfills two conditions. First, the shear of the LSI should be smaller than the coherence length of the atmospheric turbulence r0 (defined in Section 7.1). Second, the phase difference between light waves with and without a shear must be smaller than 2p to avoid the 2p ambiguity in wavefront reconstruction. 7.1.2.3 Curvature wavefront sensor

The curvature wavefront sensor (CS) was first proposed and developed by Roddier12 in 1988. The CS measures local curvatures of a wavefront using two intensity distributions recorded at two planes (such as planes P1 and P2 in Fig. 7.7) that are symmetrically located at the both sides of the focal plane. The locally convex curvature of the wavefront produces a small, bright illumination on plane P1 and a large, dim illumination on plane P2. Conversely, the locally concave curvature of the wavefront produces the opposite illuminations on planes P1 and P2. The difference between the intensity distributions on the two planes provides information on the local curvatures of the incoming wavefront. The local curvatures of the incoming wavefront can be described mathematically by the intensity transport equation as follows:12      I 1 ðrÞ  I 2 ðrÞ lf 2 fr fr 2 ¼ ∇w dc  ∇ w , I 1 ðrÞ þ I 2 ðrÞ 2pl l l

(7.25)

where f is the focal length; l is the depth of defocus; I1(r) and I2(–r) are intensity distributions corresponding to the position vector r on planes P1 and P2, respectively; ∇w is the first radial derivative of the wavefront at the edge;

Figure 7.7

Diagram of a CS.

Case Study 3: Adaptive Optics Systems

241

dc is a linear impulse distribution around the pupil edge; and ∇2 is the Laplacian. Note that the negative symbol in I2(–r) is due to a 180-deg rotation between I1 and I2 by the lens. With boundary conditions, the wavefront of the incoming light wave can be reconstructed by the algorithm described in Ref. 12. The CS is very sensitive to the distance between the focal plane and the recording planes. When sensing a highly distorted wavefront, the CS requires this distance to be small to avoid ambiguity. When sensing a small distorted wavefront, the CS requires a large distance between the focal plane and the recording planes. This characteristic greatly limits the CS’s dynamic range for wavefront sensing. 7.1.3 Wavefront correction The operation of wavefront correction is performed on a distorted wavefront that has been detected by a wavefront sensor. The purpose of this distortion compensation is to improve the image quality of an astronomical telescope. This operation is performed by a wavefront corrector, which is a key device due to its crucial influence on the performance of an AO system. A wavefront corrector compensates the distorted wavefront by generating a wavefront that is conjugate to the incident distorted wavefront. In this section, a brief review of wavefront correctors is given; a more detailed discussion of wavefront correctors can be found in Ref. 13. 7.1.3.1 Types of wavefront correctors

Wavefront correctors are generally divided onto two types according to the way they generate conjugate wavefronts to compensate the phase distortion of the incoming light wave. The first type is transmitted wavefront correctors, which generate conjugate wavefronts for the incoming light waves by changing the refractive indices of the media through which the light waves pass. An example of a transmitted wavefront corrector is the liquid crystal wavefront corrector. The second type is reflective wavefront correctors, which generate conjugate wavefronts for the incoming light waves by deforming the surface shapes of reflecting mirrors such as deformable mirrors. The advantages of reflective wavefront correctors are their high speed of response, large range of correction, high optical efficiency, and wavelength-independent capability to correct wavefronts. Due to these merits, reflective wavefront correctors are widely used in AO for astronomy. Reflective wavefront correctors can be divided into two classes according to their applications in AO systems. One type of reflective wavefront corrector is used to correct tip/tilt components in wavefronts caused by atmospheric turbulence and is therefore called a tip/tilt mirror or a fast-steering mirror. A tip/tilt mirror is a flat mirror with several actuators that drive the mirror to be inclined in two orthogonal directions for compensating the tip/tilt

242

Chapter 7

components in wavefronts. Because tip/tilt components dominate wavefront distortions caused by atmospheric turbulence, a tip/tilt mirror is indispensable as a first-order wavefront-compensation tool in an AO system. The other type of reflective wavefront corrector is used to correct high-order components in wavefront distortions caused by atmospheric turbulence; this type of reflective wavefront corrector is called a deformable mirror. In this subsection deformable mirrors used for correcting the high-order components of wavefront distortions are discussed in detail. A deformable mirror can be further divided into a segmented or continuous mirror, depending on its facesheet. A segmented mirror is made up of an arrangement of individual mirrors, each of which can be a square, rectangular, or hexagonal flat mirror. Each segment is driven by one or more individual actuators. The number of actuators for each segment determines the degrees of freedom of that segment. For example, the segments of the segmented mirror might each have one single actuator [Fig. 7.8(a)], meaning that each segment can compensate only the local piston errors of the wavefronts. However, if each segment of the segmented mirror has two or more actuators [Fig. 7.8(b)], each segment can compensate local piston and local tip/tilt errors of the wavefronts. The merits of a segmented mirror are a large dynamic range of correction, easy assembly, and easy replacement. Its drawbacks are a loss of optical energy due to the gaps between the mirror segments, and a high degree of fitting error between the incoming wavefront and the conjugate wavefront compared to the fitting error of a continuous deformable mirror with the same aperture and same number of actuators.

Figure 7.8 Schematic of two segmented mirrors compensating (a) only piston errors and (b) piston errors plus tip/tilt errors.

Case Study 3: Adaptive Optics Systems

243

Figure 7.9 Schematic of two continuous deformable mirror types: (a) a membrane deformable mirror and (b) a piezoelectric deformable mirror.

Figure 7.9 shows diagrams of two types of continuous deformable mirrors (both of which will be discussed in detail later in this subsection): a membrane deformable mirror [Fig. 7.9(a)] and a piezoelectric deformable mirror [(Fig. 7.9(b)]. Compared to segmented mirrors, continuous deformable mirrors have high optical efficiencies and a low degree of fitting error due to their continuous facesheets. The drawbacks of continuous deformable mirrors are that they have a small dynamic range of compensation, and are difficult to repair and maintain. Continuous deformable mirrors are commonly used in AO for astronomy due to their high performance in compensating distorted wavefronts caused by atmospheric turbulence. Hereafter, the deformable mirror is abbreviated as DM. DMs can also be classified according to the force that drives them, e.g., piezoelectric DMs, electrostatic DMs, magnetostrictive DMs, electromagnetic DMs, etc. Among these DM types, electrostatic and piezoelectric DMs are the most widely used and are discussed in detail in the following two subsections. 7.1.3.2 Membrane deformable mirrors

The membrane deformable mirror (MDM) is an example of an electrostatic DM. A MDM is a conductive membrane coated with a highly reflective material. It is driven by electrostatic forces that are generated by applying a voltage to each of its electric poles. When the voltage is applied to the electric pole, the portion of the conductive membrane corresponding to this electric pole is pulled, and the shape of the membrane is deformed. The higher the voltage the higher the degree of deformation. The advantages of the MDM are its compact structure, high-speed response, and low cost, giving it many potential applications in many

244

Chapter 7

Figure 7.10 Schematic of an MDM (adapted from OKO (Netherlands) MDM user manual).

technological areas. Figure 7.10 is a schematic of a MDM. The membrane is made up of silicon nitride coated with aluminum on its top surface for reflecting light waves and gold on its bottom for excellent conduction. Actuators are packaged on a printed circuit board (PCB). Because the conductive membrane is deformed by electrostatic forces, when bidirectional deformation is required, it is essential to apply a bias voltage to all electric poles to draw the membrane in the middle of its maximum stroke. 7.1.3.3 Piezoelectric deformable mirrors

The continuous-facesheet mirror of a piezoelectric DM is made up of quartz or fused quartz coated with a highly reflective material, and its actuators are made up of piezoelectric ceramics. The piezoelectric DM is driven by forces generated by the piezoelectric effect. If a voltage is applied to a piezoelectric ceramic disk along its longitudinal direction, randomly oriented dipoles in the material will align parallel to the applied electric field, leading to a longitudinal deformation in the piezoelectric ceramic disk. Generally speaking, an electric field with several hundred voltages can only generate a deformation of 0.1  0.2 mm in the disk actuator of a piezoelectric DM. This deformation is too small to compensate the distorted wavefront caused by atmospheric turbulence. To increase the deformation size of the piezoelectric DM, several piezoelectric ceramic disks are stacked together to obtain the desired deformation range. The actuators in this type of piezoelectric DM are called stacked disk actuators. Compared with MDMs, the merits of piezoelectric DMs are their larger apertures and larger dynamic ranges for correction of distorted wavefronts. The drawback of piezoelectric DMs is the inherent hysteresis exhibited in piezoelectric ceramics. Fitting error due to this hysteresis can be compensated by the piezoelectric DM itself when the DM is operating in a closed-loop

Case Study 3: Adaptive Optics Systems

245

feedback with a high frequency. However, for extremely high-performance AO systems, fitting error caused by the hysteresis of piezoelectric ceramics should be considered. 7.1.3.4 Technical parameters of the deformable mirror

The performance of a DM can be evaluated by a number of technical parameters that are frequently used by manufacturers to specify their DMs. The following technical parameters provide guidelines for the selection of DMs for particular applications. Number of actuators

Generally speaking, the number of actuators in a DM determines the fitting error, i.e., the spatial bandwidth, of that DM. The fitting error for the conjugated wavefront generated by a DM with more actuators is lower than that generated by a DM with the same aperture but fewer actuators. However, the cost of the DM will rapidly increase with an increase in the number of actuators, and this increase will further require a more advanced control system. A DM with an appropriate number of actuators should be selected according to the particular requirements of the AO system. Maximum stroke

The maximum stroke of a DM is the maximum range of wavefront compensation. In general, the wavefront to be compensated by an AO includes two types of distortion: the static distortion of the telescope and the AO system, and the dynamic distortion caused by atmospheric turbulence. Therefore, the maximum stroke of the DM should be greater than the sum of the static distortion and the dynamic distortion. Influence function and crosstalk value

The influence function of an actuator is the deformation that the DM undergoes when a unit voltage is applied to the actuator. Under the linear assumption, the combined influence function from each actuator in a stack can be considered as the influence function of the DM. (The influence function of a DM will be discussed in detail in subsection 7.4.2.) The crosstalk value of a DM is the ratio of the center deformation of an adjacent actuator caused by the working actuator to that of the working actuator. Both the influence function and the crosstalk value influence the performance of the DM. Crosstalk leads to the coupling of actuators, which should be decoupled by control algorithms. Crosstalk values in the range of 5  12% are generally considered to be reasonable for a DM. Sensitivity and response time

The sensitivity of a DM, also called the resolution of the actuator displacement, is the deformation of the DM when a minimum permissible voltage is applied

246

Chapter 7

on the actuator. The response time is the time interval between applying a voltage on the actuator and achieving steady state deformation of the mirror. In AO for astronomy, the sensitivity of a DM should be on the order of 10 nm, and the response time should be shorter than 1 ms. Initial shape

It is very difficult to manufacture a DM with a very “good” initial shape due to its extremely thin and quite complicated structure. The wavefront distortion caused by a good initial shape of a DM can be compensated by AO. However, the influence of a “bad” initial shape of a DM on the performance of the AO system must be compensated when designing the AO system. The technical parameters discussed above should be carefully considered when incorporating DMs into an AO system design. 7.1.4 Control system As mentioned earlier, an AO system can be considered as an automatic control system. The main task of this control system is to convert the wavefront information obtained from the wavefront sensor into control signals using a control algorithm that drives the DM to correct the distorted wavefront of the incoming light wave. This subsection outlines the control system and the basic idea of the modal control approach. 7.1.4.1 Closed-loop control of an AO system

A negative-feedback control loop is usually adopted in an AO system. The advantage of this closed-loop control approach is that it can overcome the phenomena of hysteresis and the creeping motions of the DM. Figure 7.11 is a block diagram of an AO system with a negative-feedback control loop, in which the wavefront sensor measures the residual wavefront after correction by the DM. As shown in the diagram, the main components of an AO system are: a wavefront sensor (WFS), a high-speed digital processor (HDP), a digital-to-analog convertor (DAC), a high-voltage amplifier (HVA), and a DM. wtur(x, y, t) is the distorted wavefront caused by

Figure 7.11 Block diagram of an AO system.

Case Study 3: Adaptive Optics Systems

247

atmospheric turbulence; wdm(x, y, t) is the wavefront conjugate to the distorted wavefront and was generated by the DM; and wres(x, y, t) is the residual wavefront, such that wres ðx, y, tÞ ¼ wtur ðx, y, tÞ  wdm ðx, y, tÞ: The goal of the AO system is to make wdm(x, y, t) track wtur(x, y, t) in real time to minimize wres(x, y, t) as far as possible. As shown in Fig. 7.11, the residual wavefront is sampled by the WFS and reconstructed by the HDP; the HDP calculates the digital control signals from the wavefront reconstructed by the WFS and sends these control signals to the DAC to convert them into analog signals; then, the analog signals are amplified by the HVA to drive the DM to correct the distorted wavefront. 7.1.4.2 Description of modal control

The distorted wavefront caused by atmospheric turbulence is a random signal that continuously varies with space and time, and its frequencies in both space and time are much larger than those that an AO system can process. Therefore, an AO system can only partially correct the distorted wavefront. To address this limitation, modal control was developed for correcting the modal components, which mostly degrade the image quality of a telescope, of the distorted wavefront. The principle of the modal control used in an AO system is schematically shown in Fig. 7.12. The reconstructed wavefront is decomposed into m modal components (Modal 1 to Modal m). Via the m controllers, H 1CC , H 2CC , · · · H m CC , these modals are further converted to m corresponding voltage vectors, u1,u2,. . . ,um, the length of each vector equaling the number of actuators in the DM. Then, these m voltage vectors are combined to one voltage vector u, which is applied to the DM. The rapid increase in aperture size of current astronomical telescopes requires that AO systems adopt more-advanced control algorithms,14–16 e.g., linear quadratic Gaussian control17 based on the state-space approach.

Figure 7.12

Diagram showing the scheme of the modal control used in an AO system.

248

Chapter 7

7.2 Astronomical Telescopes and Atmospheric Seeing Before designing an AO system for an astronomical telescope, it is essential to be familiar with the astronomical telescope concerned and the atmospheric seeing at the observatory where the astronomical telescope is located. 7.2.1 Astronomical telescopes If an astronomical telescope is required to be equipped with an AO system, some important optical parameters of this telescope should be known. One of such optical parameters is the telescope’s clear aperture, which can be used to estimate the variance of the distorted wavefront caused by atmospheric turbulence based on the discussion in the first section of this chapter; another is the position of the focal point of the telescope and the telescope’s corresponding F/#. To improve the observing efficiency of a telescope, the telescope generally has more than one focal point. The appropriate focal points should be selected after considering space limitation and installation convenience. The 2.16-m telescope at the Xinglong Observatory in the Hebei province of China has a clear aperture of 2.16 m and has three focal points: the primary, the Cassegrain, and the coudé. The F/# of the coudé focal point is 45. The coudé focal point is chosen as the station for installing the AO system due to its location and stationary nature. 7.2.2 Atmospheric seeing Atmospheric seeing (or simply seeing) is a measure of the strength of atmospheric turbulence and causes the image degradation of an astronomical telescope. In order to improve the image quality of the telescope using an AO system, the influence of the seeing on the image quality of the telescope should be thoroughly examined. As presented in Section 7.1, the coherence length, coherence time, and average wind velocity of atmospheric turbulence, which are closely related to the design decisions for an AO system, should be carefully considered. In astronomy, the seeing is estimated by the full width at half maximum of the long-exposure image of a point object, which can be expressed as angle l/r0, where l is the wavelength of light, and r0 (which varies as l6/5) is the coherence length of atmospheric turbulence as defined in subsection 7.1.1. The seeing determines the image quality of a ground-based telescope with an aperture whose diameter is larger than r0. This fact can be explained by the OTF of atmospheric turbulence, as expressed by Eq. (7.16). The OTF of atmospheric turbulence with poor seeing and a small r0 greatly decreases for high spatial frequencies due to the negative exponent function in Eq. (7.16). The cutoff frequency of the composite OTF expressed by Eq. (7.15) is determined by r0. The size of r0 determines the resolution of long-exposure imaging through atmospheric turbulence using a telescope with an aperture that is larger than r0.

Case Study 3: Adaptive Optics Systems

249

The coherence length of atmospheric turbulence at the Xinglong Observatory is about 6 cm at a wavelength of 0.55 mm, and the average wind velocity of atmospheric turbulence is 6 m/s. Compared with the seeing at other sites,18,19 the seeing at Xinglong site is poor. This means that the AO system for the 2.16-m telescope requires more-advanced hardware to achieve the corresponding performance compared to AO systems for telescopes on sites with good seeing.
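A short back-of-the-envelope check using the site values quoted above illustrates what this seeing implies. It uses the simple relations already introduced in Section 7.1 (seeing angle roughly λ/r0 and coherence time r0/v); the script and its printout are illustrative only.

```python
import numpy as np

r0 = 0.06              # coherence length at 0.55 um, in meters
wavelength = 0.55e-6   # meters
wind_speed = 6.0       # average wind velocity, m/s

seeing_rad = wavelength / r0
seeing_arcsec = np.degrees(seeing_rad) * 3600
tau = r0 / wind_speed

print(f"seeing ~ {seeing_arcsec:.1f} arcsec")   # ~1.9 arcsec
print(f"coherence time ~ {tau * 1e3:.0f} ms")   # ~10 ms
```

A roughly 10-ms coherence time is what drives the requirement for kilohertz-class wavefront sensing and correction hardware mentioned above.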

7.3 Optical Design of the AO System In this section, an AO system developed for the 2.16-m telescope by NIAOT (Nanjing Institute of Astronomical Optics & Technology) is taken as an example to explain how to design an AO system for an astronomical telescope. Firstly, the first-order design of the AO system is presented. Then, the detailed design of the AO system in Zemax is provided. The core components used in the AO system for the 2.16-m telescope will be further described in Section 7.4. 7.3.1 First-order design of the AO system Several important points need to be considered before starting to design an AO system for an astronomical telescope. The first point is that the working wavelength band of the AO system to be designed must be determined; the second point is the that F/# of the light input part of the AO system should match that of the telescope; the third point is that the optical components should be reflective to avoid chromatic aberrations in the AO system; the final point is that the field of view for the AO system must be determined; the field of view for a conventional AO system is generally very small due to the small isoplanatic angle of the atmospheric turbulence—roughly about 10 arcsec. A diagram of the AO system for the 2.16-m telescope is shown in Fig. 7.13. Note that this diagram a detailed design result from Zemax; it is used here simply to illustrate the first-order design of the AO system. As shown in Fig. 7.13, an off-axis parabolic mirror (OAP1) collimates light waves from the coudé focal point of the 2.16-m telescope. In order to match the F/# of the 2.16-m telescope at the coudé focal point, the F/# of OAP1 should be smaller than 45 for no loss of light energy from the telescope. The collimated light beam from OAP1 is expanded by the combination of OAP2 and OAP3 to match the apertures of the tip/tilt mirror and a piezoelectric DM (DM1). To compensate the static aberrations of the AO system, including aberrations caused by the initial shape of DM1, another piezoelectric DM (DM2) is used. The use of DM2 permits the stroke of DM1 to be completely employed to correct the dynamic distorted wavefronts caused by atmospheric turbulence. To achieve this goal, the static aberrations of the entire AO system must be smaller than the range of correction of DM2. To match the aperture

250

Chapter 7

Figure 7.13 Diagram of the AO system for the 2.16-m telescope.

of DM2, the light beam from DM1 is compressed by the combination of OAP4 and OAP5. Then, the compressed light beam from DM2 is divided into three branches of beams by two beam splitters, BS1 and BS2. One beam, converged by an achromatic doublet L1, is incident on a quadrant avalanche photodiode (see Section 4.7) for tip/tilt sensing in the visible band. Another beam, compressed by the combination of two achromatic doublets (L2 and L3) to match the aperture of the wavefront sensor, is incident on a SH sensor for highorder wavefront sensing in the visible band. The final beam is focused on a science camera by OAP6 for imaging in the IR at a wavelength of 2.2 mm. Note that to improve the efficiency of optical energy use, and to satisfy the requirements of wavefront sensing and imaging in different bands, BS1 should have high reflectivity in the IR band and a splitting ratio of 50:50 in the visible band; BS2 should be a dichroic beam splitter that is longpass for the IR band and has high reflectivity for the visible band. It should be pointed out that the positions of the DM1 and DM2 mirror surfaces and the lenslet array of the SH sensor should all be conjugate to the pupil plane of the telescope. 7.3.2 Detailed design of the AO system The results of the first-order design for the AO system described above should be further adjusted to minimize the static aberrations of the three branches of

Case Study 3: Adaptive Optics Systems

251

beams by optimizing the structure parameters of the optical components in the AO system using the procedure presented in Section 2.7. To achieve this goal, the results of the first-order design are optimized in the commercial optical design software Zemax. The optical layout of the AO system after optimization is shown in Fig. 7.13. The field of view of this AO system is defined in term of angles. The full field of view of the AO system for the 2.16-m telescope is about 10 arcsec. The static wavefronts for the three branches of the beams of the AO system (see Fig. 7.14) are all in the half field of view of 5 arcsec. The wavefronts of the three beams in the AO system indicate that the optimized design result satisfies the requirements of the AO system. Note that the initial shapes of DM1 and DM2 are set as ideal planes in the Zemax detailed design of the AO system. Because the initial shapes of DM1 and DM2 can be substantially corrected by DM2 during the operation of the AO system, this negligence of their initial shapes is plausible. Furthermore, the optimized design result is a good guide for implementation of the AO system.

7.4 Core Components of the AO System and Related Algorithms In this section, both the core components and the related algorithms of the AO system are presented. The discussion covers the SH wavefront sensor and the wavefront reconstruction algorithm, two piezoelectric DMs and their control algorithms, and the tip/tilt mirror. 7.4.1 Shack–Hartmann wavefront sensor 7.4.1.1 Technical parameters

The technical parameters of the SH sensor used in the AO system are shown in Table 7.1. When choosing the SH sensor, both the atmospheric seeing and the requirements of the AO system are considered. When the SH sensor works under the mode of 256  256 pixels, the acquisition rate can be up to 1000 frames per second (fps), and the size of the spot array of the SH sensor is 11  11, approximately corresponding to a diameter of 3.5 mm for the input beam. 7.4.1.2 Wavefront reconstruction algorithm

As pointed out in Chapter 3, the distorted wavefront incident on the SH sensor can be expressed as a linear combination of the Zernike polynomials by wðx, yÞ ¼ a0 þ

m X i¼1

ci Zi ðx, yÞ,

(7.26)

where a is the piston error of the distorted wavefront, Zi(x, y) is the ith term of the Zernike polynomials in the Cartesian coordinates (x, y), and ci is the

252

Chapter 7

Figure 7.14 Wavefronts of the three branches of beams of the AO system.

Case Study 3: Adaptive Optics Systems

253

Table 7.1 Technical parameters of the SH wavefront sensor. Parameter

Description

Diameter of aperture Pixel size RMS value of the measurement accuracy Lenslet diameter Lenslet focal length Working wavelength Acquisition frequency

5 mm 14 mm l/100 300 mm 4.5 mm 350–1100 nm ≥ 1000 fps (256  256 pixels/frame)

Note that, because the piston error has no effect on the image degradation of the telescope, the term representing the piston is neglected.

Suppose that the spot array acquired by the SH sensor consists of J spots, and let the shifts of the centroid of the jth spot in two orthogonal directions be denoted by \Delta x_j and \Delta y_j, respectively. Eq. (7.22) can then be rewritten as

\frac{\Delta x_j}{f} = \frac{\lambda}{2\pi} \left. \frac{\partial w(x, y)}{\partial x} \right|_{(x_j, y_j)}, \qquad \frac{\Delta y_j}{f} = \frac{\lambda}{2\pi} \left. \frac{\partial w(x, y)}{\partial y} \right|_{(x_j, y_j)}.    (7.27)

Substituting Eq. (7.26) into Eq. (7.27), the equations for the J spots can be written as

c_1 \frac{\partial Z_1}{\partial x}\Big|_{(x_1, y_1)} + c_2 \frac{\partial Z_2}{\partial x}\Big|_{(x_1, y_1)} + \cdots + c_m \frac{\partial Z_m}{\partial x}\Big|_{(x_1, y_1)} = \frac{2\pi}{\lambda} \frac{\Delta x_1}{f}
    \vdots
c_1 \frac{\partial Z_1}{\partial x}\Big|_{(x_J, y_J)} + c_2 \frac{\partial Z_2}{\partial x}\Big|_{(x_J, y_J)} + \cdots + c_m \frac{\partial Z_m}{\partial x}\Big|_{(x_J, y_J)} = \frac{2\pi}{\lambda} \frac{\Delta x_J}{f}
c_1 \frac{\partial Z_1}{\partial y}\Big|_{(x_1, y_1)} + c_2 \frac{\partial Z_2}{\partial y}\Big|_{(x_1, y_1)} + \cdots + c_m \frac{\partial Z_m}{\partial y}\Big|_{(x_1, y_1)} = \frac{2\pi}{\lambda} \frac{\Delta y_1}{f}
    \vdots
c_1 \frac{\partial Z_1}{\partial y}\Big|_{(x_J, y_J)} + c_2 \frac{\partial Z_2}{\partial y}\Big|_{(x_J, y_J)} + \cdots + c_m \frac{\partial Z_m}{\partial y}\Big|_{(x_J, y_J)} = \frac{2\pi}{\lambda} \frac{\Delta y_J}{f}    (7.28)

In matrix notation, the equations in Eq. (7.28) can be written compactly as

D c = G,    (7.29)

where D, belonging to R^{2J \times m}, is the matrix consisting of the slope values of each Zernike polynomial at the J subapertures; c, belonging to R^{m \times 1}, is the vector comprising the coefficients of the Zernike polynomials; and G, belonging to R^{2J \times 1}, is the vector comprising the slope values of the residual wavefront at the J


subapertures measured by the SH sensor. The solution c can be expressed via the generalized inverse of D as

c = D^+ G,    (7.30)

where D^+ stands for the generalized inverse of D. D^+, belonging to R^{m \times 2J}, is also known as the reconstruction matrix. Note that the number of terms of the Zernike polynomials used for correcting the distorted wavefront is a very important parameter for wavefront reconstruction; the criterion for estimating this number is presented in Section 7.5.
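As an illustration of Eqs. (7.26)–(7.30), the following minimal numerical sketch builds the slope matrix D from a few hard-coded low-order Zernike-like terms and recovers the coefficient vector with a generalized inverse. The particular modes, the 11 × 11 grid of subaperture centers, and all variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Analytic x- and y-gradients of a few low-order Cartesian Zernike-like terms
# (tilt x, tilt y, defocus, astigmatism); a real system would use the full set
# of Zernike polynomials up to the order chosen in Section 7.5.
gradients = [
    (lambda x, y: np.ones_like(x),  lambda x, y: np.zeros_like(x)),   # Z1 = x
    (lambda x, y: np.zeros_like(x), lambda x, y: np.ones_like(x)),    # Z2 = y
    (lambda x, y: 4 * x,            lambda x, y: 4 * y),              # Z3 = 2(x^2 + y^2) - 1
    (lambda x, y: 2 * x,            lambda x, y: -2 * y),             # Z4 = x^2 - y^2
]

# Subaperture (lenslet) centers on the unit pupil, e.g., an 11 x 11 grid.
xs, ys = np.meshgrid(np.linspace(-1, 1, 11), np.linspace(-1, 1, 11))
inside = xs**2 + ys**2 <= 1.0
xj, yj = xs[inside], ys[inside]               # J subaperture centers

# Matrix D of Eq. (7.29): 2J rows (x slopes, then y slopes), m columns.
Dx = np.column_stack([dzdx(xj, yj) for dzdx, _ in gradients])
Dy = np.column_stack([dzdy(xj, yj) for _, dzdy in gradients])
D = np.vstack([Dx, Dy])

# Simulate the measured slope vector G for a known coefficient vector.
c_true = np.array([0.3, -0.1, 0.05, 0.2])
G = D @ c_true                                # in practice G = (2*pi/lambda) * shifts / f

# Reconstruction matrix D+ (generalized inverse), Eqs. (7.29)-(7.30).
c_hat = np.linalg.pinv(D) @ G
print(np.allclose(c_hat, c_true))             # True: coefficients recovered
```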

7.4.2 Piezoelectric deformable mirrors

7.4.2.1 Technical parameters

The technical parameters of the two piezoelectric DMs used in the AO system of the 2.16-m telescope are listed in Table 7.2. DM1, which has 109 actuators, is used to correct the distorted wavefront caused by atmospheric turbulence; DM2, which has 37 actuators, is employed to correct the static aberrations of the entire AO system.

Table 7.2 Technical parameters of two piezoelectric deformable mirrors.

Parameter                                      DM1                     DM2
Diameter of clear aperture                     50 mm                   30 mm
Number of actuators                            109                     37
Initial RMS deviation from reference sphere    < 2 µm                  < 0.077 µm
Main initial aberration                        sphere with R ≈ 40 m    sphere with R ≈ 50 m
Actuator voltages                              0–400 V (maximum)
Maximum stroke                                 8 µm

7.4.2.2 Influence function matrix

As mentioned in subsection 7.1.3, when a unit voltage is applied to an actuator of the DM, the change in the wavefront \Delta w(x, y) caused by the resulting deformation of the DM is taken as the influence function of this actuator, which can be expressed by the Zernike coefficient vector c_i = [c_{i1}, c_{i2}, \ldots, c_{im}]^T, where T stands for the transpose and c_i is the influence function vector of this actuator of the DM. When considering the influence function of the entire DM, two assumptions are made: one is an approximately linear relationship between the deformation of the DM surface and the change in the voltages applied to the actuators of the DM; the other is that the total deformation of the DM surface can be regarded as a linear superposition of the deformations caused by the individual actuators. Based on these two assumptions, the influence


function of the DM can be expressed as a stack of the influence function vectors of all actuators of the DM, i.e., M = [c_1 c_2 ... c_n], where M is the influence function matrix of the DM and, obviously, M belongs to R^{m \times n}. With the influence function matrix of the DM, the control vector of the DM can be obtained by solving

M s + c = 0,    (7.31)

where s is the vector of voltages that will be applied to the DM actuators, and c is the vector of Zernike coefficients of the distorted wavefront to be corrected, i.e., the reconstructed wavefront obtained by solving Eq. (7.30). Equation (7.31) can be solved using the method of modal control, as explained in the following subsection.
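A minimal sketch of how such an influence function matrix might be calibrated and used is given below. The hidden linear map standing in for the DM, the measurement function, and the matrix sizes are assumptions made purely for illustration; only the column-by-column unit-voltage poking and the least-squares solution of Eq. (7.31) follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 37                        # m Zernike terms, n actuators (cf. DM2 in Table 7.2)

# Stand-in for the real measurement chain: here the "DM" is a hidden linear
# map, so poking it simply returns one column of that map plus a little noise.
M_true = rng.normal(size=(m, n))
def measure_zernike_coeffs(voltages):
    return M_true @ voltages + 1e-3 * rng.normal(size=m)

# Build the influence function matrix M column by column (unit-voltage pokes).
M = np.zeros((m, n))
for k in range(n):
    v = np.zeros(n)
    v[k] = 1.0                       # unit voltage on actuator k only
    M[:, k] = measure_zernike_coeffs(v)

# Least-squares solution of Eq. (7.31), M s + c = 0, i.e., s = -M^+ c.
c = rng.normal(size=m)               # reconstructed wavefront to be corrected
s = -np.linalg.pinv(M) @ c
print(np.linalg.norm(M @ s + c))     # small residual
```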

7.4.2.3 Modal control method

According to the singular value decomposition, for the influence function matrix M there exist two orthogonal matrices U and V, with U belonging to R^{m \times m} and V belonging to R^{n \times n}, that satisfy

M = U S V^T,    (7.32)

where S, belonging to R^{m \times n}, is a diagonal matrix containing the singular values of M. It can be written in block form as

S = \begin{pmatrix} S_r & 0 \\ 0 & 0 \end{pmatrix}, \qquad S_r = \mathrm{diag}(s_1, s_2, \ldots, s_r),

where s_1 to s_r are the nonzero singular values of M and satisfy s_1 \geq s_2 \geq \cdots \geq s_r > 0, and r is the rank of the influence function matrix. Furthermore, the matrices U and V are written as U = [u_1, u_2, \ldots, u_m] and V = [v_1, v_2, \ldots, v_n], where u_1, u_2, \ldots, u_m and v_1, v_2, \ldots, v_n are the column vectors of U and V, respectively. The column vectors of U represent different deformation modes of the DM, and the column vectors of V represent different voltage modes applied to the actuators of the DM. Because a particularly small singular value means that a large actuator voltage is required to produce a unit amplitude of the corresponding deformation mode, the modes corresponding to particularly small singular values should be discarded to improve the performance of the AO system. In the case where only the first p (p < r) modes


are selected for correcting the distorted wavefront, the influence function matrix can be rewritten as

M_p = \sum_{i=1}^{p} s_i u_i v_i^T.    (7.33)

Obviously, the rank of the matrix M_p is p. Then it is easy to find that

M_p^+ = \sum_{i=1}^{p} \frac{1}{s_i} v_i u_i^T,    (7.34)

where M_p^+ is the generalized inverse of the matrix M_p. In the modal control of the AO system for the 2.16-m telescope, the control voltages of the DM are updated by the following algorithm:

s^{N+1} = h(s^N - \mu M_p^+ c^N),    (7.35)

where s^{N+1} is the vector of control voltages for the actuators of the DM at the (N+1)th iteration, c^N is the vector of Zernike coefficients of the residual wavefront at the Nth iteration, \mu is the integration factor, and h(·) is an n-dimensional vector function that limits the control voltages to the ranges allowed by the DM.
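The closed-loop update of Eq. (7.35) with a truncated-SVD generalized inverse can be sketched as follows. The influence matrix, the gain, the clipping range, and the simulated residual model are illustrative assumptions; only the mode truncation of Eqs. (7.33)–(7.34) and the update law of Eq. (7.35) come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 20, 37, 15                 # Zernike terms, actuators, retained modes (p < r)
M = rng.normal(size=(m, n))          # influence function matrix (illustrative)

# Truncated-SVD generalized inverse M_p^+ of Eqs. (7.33)-(7.34).
U, s_vals, Vt = np.linalg.svd(M, full_matrices=False)
Mp_pinv = Vt[:p].T @ np.diag(1.0 / s_vals[:p]) @ U[:, :p].T

def h(v, vmax=1.0):
    # stand-in for the limiter h(.) in Eq. (7.35): clip to the allowed range
    return np.clip(v, -vmax, vmax)

mu = 0.5                             # integration factor
c_dist = 0.05 * rng.normal(size=m)   # distorted-wavefront coefficients
s = np.zeros(n)                      # actuator voltages
for _ in range(50):
    c_res = c_dist + M @ s           # residual seen by the SH sensor (simulated)
    s = h(s - mu * Mp_pinv @ c_res)  # closed-loop update, Eq. (7.35)

print(np.linalg.norm(c_dist + M @ s))  # only the discarded modes remain uncorrected
```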

7.4.3 Piezoelectric tip/tilt mirror

A tip/tilt mirror (TTM) is used in AO systems to correct the tip/tilt components of the distorted wavefront. The primary technical parameters of the TTM used in the AO system for the 2.16-m telescope are shown in Table 7.3. This TTM is driven by two pairs of actuators made of piezoelectric ceramics. Figure 7.15 is a schematic of a pair of piezoelectric actuators showing the working principle of a TTM in one dimension, i.e., along the x axis. To smoothly incline the mirror platform of the TTM along the x axis (as shown in the figure), the pair of actuators works in a differential push–pull mode: any increase (decrease) in the voltage applied to actuator 1 of the pair is accompanied by a decrease (increase) of exactly the same magnitude in the voltage applied to actuator 2.

Table 7.3 Technical parameters of the tip/tilt mirror.

Parameter                      Description
Diameter of clear aperture     50 mm
Tip/tilt angle                 2 mrad
Tip/tilt angle resolution      0.02 mrad


Figure 7.15 Schematic of a pair of piezoelectric actuators in a tip/tilt mirror (adapted from a PI user manual).

A detailed description of the differential push–pull mode can be found in the user manuals of the fast-steering mirrors made by Physik Instrumente (PI) GmbH & Co. KG.

When an AO system is operating, the TTM together with the quadrant avalanche photodiode (QPD) forms a closed control loop for correcting the tip/tilt components of the distorted wavefront. In this control loop, the tip/tilt components of the distorted wavefront shift the light spot in two orthogonal directions on the QPD detector (see subsection 4.7.3.3). The shift of the light spot produces different voltages in the four quadrants of the QPD, and the differences between these four voltages characterize the magnitude of the tip/tilt components. These voltage differences are converted by the control algorithm into voltages for driving the actuators of the TTM. In this way, the tip/tilt components of the wavefront are corrected. The control algorithm for the TTM in the AO system also adopts the modal control scheme explained for the piezoelectric DM (see subsection 7.4.2.3).

7.5 Order Estimation in Modal Wavefront Reconstruction

Up to now, both the design and the core components of the AO system for the 2.16-m telescope have been described. This section and Section 7.6 present two useful criteria for the stable operation of an AO system. A criterion for estimating the order of Zernike polynomials in modal wavefront reconstruction using an SH sensor is discussed first.

As pointed out by Noll (Ref. 7), ideally, the residual reconstruction error, i.e., the difference between the distorted wavefront and the reconstructed wavefront, is determined only by the order of Zernike polynomials used for wavefront reconstruction.


The higher the order of Zernike polynomials used, the smaller the residual reconstruction error. However, the wavefront is sampled by a finite number of subapertures of the SH sensor, so the order of Zernike polynomials that can be used to reconstruct the distorted wavefront is limited by this sampling. Therefore, the number of subapertures of the SH sensor determines the order of Zernike polynomials used for reconstructing the distorted wavefront. The problem of wavefront reconstruction using Zernike polynomials can be roughly analogized to the problem of curve fitting with polynomials. If a curve is sampled at M points, the highest polynomial order that still gives a well-posed fit is M - 1. In the problem of reconstructing the distorted wavefront with Zernike polynomials, if the number of subapertures of the SH sensor is M × N, the appropriate order of the Zernike polynomials used to reconstruct the distorted wavefront in a 2D plane is min(M, N) - 1. If the order of Zernike polynomials is larger than this value, the reconstructed wavefront is not accurate because the sampling provided by the subapertures of the SH sensor is insufficient. If the order of Zernike polynomials is smaller than min(M, N) - 1, the sensing capability of the SH sensor in the AO system might not be fully exploited.
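The following small calculation is a sketch of how the stated rule of thumb translates into a number of Zernike terms for the 11 × 11 spot array of Table 7.1; the helper function and its name are illustrative, and piston is included in the term count.

```python
# Illustration of the order criterion of Section 7.5: the usable radial order
# is taken as min(M, N) - 1 for an M x N array of subapertures.
def zernike_term_count(order):
    # number of Zernike terms up to and including radial order `order`
    # (piston included in the count)
    return (order + 1) * (order + 2) // 2

M, N = 11, 11                          # spot array of the SH sensor (Table 7.1)
max_order = min(M, N) - 1              # = 10
print(max_order, zernike_term_count(max_order))   # 10 and 66 terms
```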

7.6 Matching Problem between the SH Sensor and the DM in an AO System

In this section, another criterion, concerning the matching between the SH sensor and the DM in an AO system, is discussed. As pointed out in Section 7.1, the voltages applied to the actuators of a DM to correct the distorted wavefront are calculated from the wavefront reconstructed from the SH sensor measurements. That is, in an AO system, the number of subapertures of the SH sensor determines the accuracy of wavefront reconstruction, and the number of actuators of the DM determines the accuracy of wavefront correction. To maximize the performance of an AO system, the wavefront-sensing capability of the SH sensor should match the wavefront-correction capability of the DM; otherwise, the performance of the SH sensor or the DM might be limited. This means that the number of subapertures (or lenslets) of the SH sensor and the number of actuators of the DM should satisfy a certain condition to maximize the performance of the AO system. Let the number of lenslets of the SH sensor and the number of actuators of the DM be P and Q, respectively. Mathematically, the matching problem becomes the problem of solving 2P equations for the voltages of the Q actuators of the DM; the number of equations is 2P because the SH sensor measures the slopes of the wavefront in two orthogonal directions. If 2P is smaller than Q, the solution for the Q voltages is not unique, and the operation of the AO system is not stable. However, if 2P is larger


than Q, then a unique least-squares solution for the voltages of the Q actuators always exists, and the operation of the AO system is stable. When matching the SH sensor to the DM in an AO system, the criterion P ≥ Q/2 should therefore be satisfied. In practice, 2P should not be much larger than Q, because in that case the performance of the AO system is limited mainly by the DM.
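This matching rule amounts to a one-line comparison; the numbers below simply reuse the 11 × 11 spot array and the 109 actuators of DM1 quoted earlier and are not a statement about how the match was verified in practice.

```python
# Matching criterion of Section 7.6: the 2P slope measurements should at least
# determine the Q actuator voltages, i.e., 2P >= Q (equivalently P >= Q/2).
def sensor_matches_dm(num_lenslets, num_actuators):
    return 2 * num_lenslets >= num_actuators

print(sensor_matches_dm(11 * 11, 109))   # True: 242 slope equations for 109 unknowns
```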

7.7 Implementation of the AO System Controller

To implement the AO system, the optical and mechanical components are first manufactured and assembled according to the corresponding designs, and the controller is then developed to achieve the expected performance of the entire AO system. The manufacture and assembly of the optical and mechanical components are omitted here owing to space limitations; interested readers can refer to Ref. 20 for related topics. Because the controller coordinates the operation of the whole AO system, it is described in detail as follows.

Figure 7.16 shows the architecture of the controller of the AO system designed for the 2.16-m telescope. The controller consists of three control loops that are all commanded by a master computer.

Figure 7.16 Schematic of the architecture of the AO system controller for the 2.16-m telescope.


The first control loop, composed of a real-time computer (RTC1), the TTM, and the QPD, is used for correcting the tip/tilt components of the distorted wavefront caused by atmospheric turbulence. RTC1 is based on a high-speed field-programmable gate array (FPGA) computing platform and includes an analog-to-digital converter (ADC), a digital-to-analog converter (DAC), and a high-voltage amplifier (HVA). The second control loop, which comprises RTC2, DM1, and the SH sensor, is employed to correct the high-order components of the distorted wavefront caused by atmospheric turbulence. RTC2, also based on an FPGA, is linked to the SH sensor by a Camera Link interface. The third loop, which consists of the master computer, DM2, and the SH sensor, is used to correct the static aberrations of the entire AO system. Because DM2 does not need to be controlled at a high frequency, its control is carried out by the master computer.

To achieve the expected performance of the AO system, the bandwidth of the system should be larger than the temporal bandwidth of atmospheric turbulence for a prescribed residual wavefront error. A detailed discussion of the bandwidth specification for the controller of an AO system is presented by Greenwood in Ref. 21. Considering that the control-loop frequency is roughly 10 times the bandwidth of the AO system designed for the 2.16-m telescope, the loop frequencies of RTC1 and RTC2 are 2000 Hz and 1000 Hz, respectively. The master computer is connected to RTC1, RTC2, and DM2 by Ethernet, and the IR camera is connected to the master computer by USB. A suite of user-friendly interface software has been developed to allow free interaction between an operator and each subsystem of the three control loops. The performance of the entire AO system can be comprehensively tested under the coordination of the controller described above.
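The factor-of-ten rule quoted above can be expressed as a trivial sizing calculation; the bandwidth figures passed in below are back-computed from the stated loop rates purely for illustration and are not specifications taken from the book.

```python
# Rule of thumb from Section 7.7: control-loop frequency ~ 10x the required
# closed-loop bandwidth of the AO system.
def loop_frequency(required_bandwidth_hz, factor=10.0):
    return factor * required_bandwidth_hz

print(loop_frequency(200.0))   # 2000 Hz, matching the RTC1 (tip/tilt) loop
print(loop_frequency(100.0))   # 1000 Hz, matching the RTC2 (high-order) loop
```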

7.8 Performance of the AO System

The installation of the designed AO system on the telescope for astronomical observations has not yet been completed at the time of writing this book, so all experimental data presented in this section have been acquired in the laboratory.

A laser diode with a working wavelength of 0.66 µm, coupled into a single-mode fiber, is used as a point source to emulate a star for testing the performance of the AO system. Dynamic distorted wavefronts generated by a rotating atmospheric-turbulence phase screen are introduced into the AO system. The phase screen is a disk of quartz with simulated phase distortions etched onto it (Ref. 22). Images of the point source distorted by the simulated atmospheric-turbulence phase screen before and after correction by the AO system are shown in Fig. 7.17. Before correction of the distorted wavefront, the maximum intensity of the image is 10, and the full width at half maximum


Figure 7.17 Images of the point source (a) before and (b) after correction by the AO system.

of the image is 0.82 arcsec, corresponding to a Strehl ratio of about 0.02. After correction of the distorted wavefront, the maximum intensity of the image increases to 235, and the full width at half maximum of the image decreases to 0.064 arcsec, corresponding to a Strehl ratio of 0.67.
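One common way to obtain such Strehl ratios is to compare the measured peak intensity with that of a diffraction-limited reference image carrying the same total flux. The book does not state which estimator was used; the sketch below, with synthetic Gaussian stand-ins for the point spread functions, is only meant to illustrate the idea.

```python
import numpy as np

def strehl_estimate(measured_image, reference_image):
    # peak of the flux-normalized measured image relative to the peak of the
    # flux-normalized diffraction-limited reference image
    meas = measured_image / measured_image.sum()
    ref = reference_image / reference_image.sum()
    return meas.max() / ref.max()

x = np.linspace(-1, 1, 201)
X, Y = np.meshgrid(x, x)
ref = np.exp(-(X**2 + Y**2) / (2 * 0.02**2))    # stand-in for the ideal PSF
blur = np.exp(-(X**2 + Y**2) / (2 * 0.06**2))   # stand-in for a degraded PSF
print(strehl_estimate(blur, ref))                # about (0.02/0.06)^2 = 0.11
```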

References

1. H. W. Babcock, "Possibility of compensating astronomical seeing," Publications of the Astronomical Society of the Pacific 65(386), 229–236 (1953).
2. V. I. Tatarski, Wave Propagation in a Turbulent Medium, Sixth Edition, Translated by R. A. Silverman, McGraw-Hill, New York (1961).
3. F. Roddier, "The effects of atmospheric turbulence in optical astronomy," Progress in Optics 19, 283–376 (1981).
4. D. L. Fried, "Statistics of a geometric representation of wavefront distortion," Journal of the Optical Society of America 55(11), 1427–1435 (1965).
5. D. L. Fried, "Anisoplanatism in adaptive optics," Journal of the Optical Society of America 72(1), 52–61 (1982).
6. F. Roddier, "Imaging through turbulence," in Adaptive Optics in Astronomy, F. Roddier, Ed., Cambridge University Press, Cambridge, pp. 9–22 (1999).
7. R. J. Noll, "Zernike polynomials and atmospheric turbulence," Journal of the Optical Society of America 66(3), 207–211 (1976).
8. D. L. Fried, "Optical resolution through a randomly inhomogeneous medium for very long and very short exposures," Journal of the Optical Society of America 56(10), 1372–1379 (1966).


9. D. L. Fried, "Statistics of a geometric representation of wavefront distortion," Journal of the Optical Society of America 55(11), 1427–1435 (1965).
10. H. I. Campbell and A. H. Greenaway, "Wavefront sensing: from historical roots to the state-of-the-art," in Astronomy with High Contrast Imaging III, A. Ferrari, M. Carbillet, C. Aime, and A. Ferrari, Eds., EAS Publications Series 22, 165–185 (2006).
11. C. Roddier, "Measurements of the atmospheric attenuation of the spectral components of astronomical images," Journal of the Optical Society of America 66(5), 478–482 (1976).
12. F. Roddier, "Curvature sensing and compensation: a new concept in adaptive optics," Applied Optics 27(7), 1223–1225 (1988).
13. M. Séchaud, "Wave-front compensation devices," in Adaptive Optics in Astronomy, F. Roddier, Ed., Cambridge University Press, Cambridge, pp. 57–90 (1999).
14. D. Gavel and M. Reinig, "Wavefront control algorithms for the Keck next-generation adaptive optics system," Proc. SPIE 7736, 773616 (2010) [doi: 10.1117/12.857738].
15. A. G. Basden and R. M. Myers, "The Durham adaptive optics real-time controller: capability and extremely large telescope suitability," Monthly Notices of the Royal Astronomical Society 424(2), 1483–1494 (2012).
16. A. Guesalaga, B. Neichel, F. Rigaut, J. Osborn, and D. Guzman, "Comparison of vibration mitigation controllers for adaptive optics systems," Applied Optics 51(19), 4520–4535 (2012).
17. C. Kulcsár, H. Raynaud, C. Petit, and J. Conan, "Minimum variance prediction and control for adaptive optics," Automatica 48(9), 1939–1954 (2012).
18. F. Rigaut, G. Rousset, P. Kern, J. C. Fontanella, J. P. Gaffard, F. Merkle, and P. Léna, "Adaptive optics on a 3.6-m telescope: results and performances," Astronomy and Astrophysics 250(1), 280–290 (1991).
19. F. J. Roddier, L. L. Cowie, J. Elon Graves, A. Songaila, D. L. McKenna, J. Vernin, M. Azouit, J. L. Caccia, E. J. Limburg, C. A. Roddier, D. A. Salmon, S. Beland, D. J. Cowley, and S. Hill, "Seeing at Mauna Kea: a joint UH-UN-NOAO-CFHT study," Proc. SPIE 1236, 485–491 (1990) [doi: 10.1117/12.19219].
20. P. R. Yoder Jr., Opto-Mechanical Systems Design, Third Edition, CRC Press, Boca Raton, Florida (2006).
21. D. P. Greenwood, "Bandwidth specification for adaptive optics system," Journal of the Optical Society of America 67(3), 390–393 (1977).
22. P. Jia and S. Zhang, "Simulation of the phase screen for atmospheric turbulence based on the fractal," Research in Astronomy and Astrophysics 12(5), 584–590 (2012).

Appendices

Appendix A: Dirac \delta Function

A.1 Definition

The Dirac \delta function (or simply, the \delta function) is a generalized function that is zero everywhere except at the origin, where it is infinite. In mathematics, the 1D \delta function can be written as

\delta(x) = \begin{cases} +\infty, & x = 0 \\ 0, & x \neq 0, \end{cases}

and it also satisfies

\int_{-\infty}^{+\infty} \delta(x) \, dx = 1.

The 2D \delta function, frequently used in optics, is defined as

\delta(x, y) = \begin{cases} +\infty, & x = 0, y = 0 \\ 0, & \text{else}, \end{cases}

and

\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \delta(x, y) \, dx \, dy = 1.

Furthermore, the 2D \delta function is the product of 1D \delta functions in each variable, given formally as \delta(x, y) = \delta(x)\delta(y).

A.2 Properties

A.2.1 Scaling property

For a nonzero scalar a, the \delta function satisfies the following scaling property:

\delta(ax) = \delta(x) / |a|.

The symmetry property then follows immediately as \delta(-x) = \delta(x), which means that the \delta function is an even function.

A.2.2 Shifting property

For an integrable function f(x),

\int_{-\infty}^{+\infty} f(x) \delta(x - x_0) \, dx = f(x_0),

where \delta(x - x_0) is a \delta function located at x = x_0. This integral is referred to as the shifting property, or the sampling property, because it picks out the value of the function f(x) at the location of the \delta function. The 2D version of this property,

\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x, y) \delta(x - x_0, y - y_0) \, dx \, dy = f(x_0, y_0),

is used extensively in image formation theory.

A.3 \delta Function as a limit

The \delta function can be viewed as the limit of a sequence of functions:

\delta(x) = \lim_{N \to \infty} g_N(x),

where the g_N are functions having a tall spike at the origin that becomes narrower as N grows large. Many functions can be chosen to generate the sequence g_N. For example, using the rectangular function

rect(x) = \begin{cases} 0, & |x| > 0.5 \\ 0.5, & |x| = 0.5 \\ 1, & |x| < 0.5, \end{cases}

the \delta function can be represented as

\delta(x) = \lim_{N \to \infty} N \, rect(Nx).

In the above representation, N rect(Nx) is a function with a rectangular shape located at the origin and having unit area. This rectangular shape becomes taller and narrower as N grows large, and N rect(Nx) becomes a \delta function as N tends to infinity. Two other commonly used limit representations of the \delta function are based on the Gaussian function, i.e.,


\delta(x) = \lim_{N \to \infty} \frac{N}{\sqrt{2\pi}} e^{-N^2 x^2 / 2},

and

\delta(x) = \lim_{N \to \infty} \frac{N}{\sqrt{2\pi i}} e^{i N^2 x^2 / 2}.

A.4 A useful formula

A useful formula involving the \delta function is

\delta(x - x_0) = \int_{-\infty}^{+\infty} e^{-i 2\pi (x - x_0)\xi} \, d\xi,

which is in fact the Fourier transform of the unit constant function.
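As a quick numerical sanity check of the sifting property using the Gaussian limit just given (the sample function, grid, and values of N below are arbitrary choices):

```python
import numpy as np

# As N grows, the integral of f(x) * delta_N(x - x0) approaches f(x0),
# where delta_N is the Gaussian approximation of the delta function above.
f = np.cos
x0 = 0.7
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
for N in (10, 100, 1000):
    delta_N = N / np.sqrt(2 * np.pi) * np.exp(-N**2 * (x - x0)**2 / 2)
    print(N, np.sum(f(x) * delta_N) * dx)   # tends to cos(0.7) ~ 0.7648
```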

Appendix B: Convolution

B.1 Definition

Let f(x) and g(x) be two 1D functions. The convolution of f(x) and g(x), denoted by (f * g)(x), is defined as

(f * g)(x) = \int_{-\infty}^{+\infty} f(\xi) g(x - \xi) \, d\xi.

Similarly, the convolution of the 2D functions f(x, y) and g(x, y) is defined as

(f * g)(x, y) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(\xi, \eta) g(x - \xi, y - \eta) \, d\xi \, d\eta.

B.2 Description

The process involved in the convolution of two functions f(x) and g(x) can be described as follows. According to the definition of (f * g)(x), g is first reversed and then shifted by an amount x, i.e., g(\xi) ⇒ g(-\xi) ⇒ g(x - \xi), where \xi is a dummy variable. The integral of the pointwise multiplication of f(\xi) and g(x - \xi) is then taken as the value of f * g at x. Because the amount of shift x varies from -\infty to +\infty, a function (f * g)(x) is obtained. In fact, f * g at x can be considered a weighted average of the function f(\xi), where the weighting function is g(\xi) with a shift of x. (This weighting function is often referred to as the convolution kernel.) Therefore, the convolution of two functions produces a third function that can be viewed as a modified version of one of the original functions, taking the other original function as the weighting function.


B.3 Properties

B.3.1 Algebraic properties

Let f, g, and h be arbitrary functions, and let a be a scalar. Convolution satisfies the following four algebraic properties:

Commutativity: f * g = g * f
Associativity: f * (g * h) = (f * g) * h
Distributivity: f * (g + h) = (f * g) + (f * h)
Associativity with scalar multiplication: a(f * g) = (af) * g

B.3.2 Convolution with the \delta function

f * \delta = f

B.3.3 Translation invariance

\tau_x(f * g) = (\tau_x f) * g = f * (\tau_x g),

where \tau_x f is the translation of the function f by x, defined by (\tau_x f)(\xi) = f(\xi - x).
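A discrete analogue of two of these properties can be verified in a few lines (the sample sequences are arbitrary):

```python
import numpy as np

# Convolving with a discrete unit impulse returns the original sequence,
# and discrete convolution is commutative.
f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 0.25, 0.25])
delta = np.array([1.0])                                     # discrete unit impulse

print(np.convolve(f, delta))                                # [1. 2. 3. 4.]
print(np.allclose(np.convolve(f, g), np.convolve(g, f)))    # True
```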

Appendix C: Correlation

C.1 Definition

Correlation, also known as cross-correlation, is a mathematical operation similar to convolution. The 1D cross-correlation of two functions f(x) and g(x), denoted by (f ⊗ g)(x), is defined as

(f ⊗ g)(x) = \int_{-\infty}^{+\infty} f(\xi) g^*(x + \xi) \, d\xi,

where g^* denotes the complex conjugate of g. (f ⊗ g)(x) is often referred to as a correlation function. The cross-correlation of a function with itself, defined as

(f ⊗ f)(x) = \int_{-\infty}^{+\infty} f(\xi) f^*(x + \xi) \, d\xi,

is also called the autocorrelation or an autocorrelation function. The cross-correlation of the 2D functions f(x, y) and g(x, y) is defined as

(f ⊗ g)(x, y) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(\xi, \eta) g^*(x + \xi, y + \eta) \, d\xi \, d\eta.

C.2 Description

Although the definition of correlation is similar to that of convolution, the physical significance of the two operations is different. As described in its definition, the correlation of two functions at x is the integral of the pointwise multiplication of one function and the complex conjugate of the other function with a shift x. This integral can be considered a measure of similarity, or more precisely a sliding similarity, between the two functions. Because the amount of shift x varies from -\infty to +\infty, the similarities between the two functions at different shifts are obtained.

C.3 Properties

The correlation of functions f(x) and g(x) is equivalent to the convolution of f(x) and g^*(-x). Therefore, if g is a Hermitian function, i.e., g(-x) = g^*(x), the correlation of f(x) and g(x) has the same properties as the convolution of f(x) and g(x). But if g(x) is not Hermitian, these properties may not hold. For example, the ordinary correlation operation does not obey the commutativity rule, i.e., f ⊗ g ≠ g ⊗ f.

Appendix D: Statistical Correlation

The correlation operation above is defined for deterministic functions. For stochastic functions, whose values may vary over time, the previously defined correlation operation is not suitable, and statistical correlation is preferred. For stationary stochastic functions f_1(x) and f_2(x), the statistical correlation between them is defined as (Ref. 1)

\Gamma_{12}(\tau) = \langle f_1(x) f_2(x + \tau) \rangle,

where \tau is a shift between f_2 and f_1, and \langle \cdot \rangle is the ensemble average. In practice, the ensemble average is replaced by the time average. To obtain a measure of the statistical correlation irrespective of the individual magnitudes of f_1(x) and f_2(x), a normalized correlation function is defined as


\gamma_{12}(\tau) = \frac{\Gamma_{12}(\tau)}{\sqrt{\Gamma_{11}(0)} \sqrt{\Gamma_{22}(0)}},

where \Gamma_{11}(0) and \Gamma_{22}(0) are the statistical autocorrelation functions at zero shift of f_1 and f_2, respectively. By the Cauchy–Schwarz inequality,

|\langle f_1(x), f_2(x + \tau) \rangle|^2 \leq \langle f_1(x), f_1(x) \rangle \langle f_2(x + \tau), f_2(x + \tau) \rangle,

it can be inferred that

0 \leq |\gamma_{12}(\tau)| \leq 1.

A value of |\gamma_{12}(\tau)| = 1 indicates that f_1 and f_2 are completely correlated, and |\gamma_{12}(\tau)| = 0 indicates that they are uncorrelated; the closer the value of |\gamma_{12}(\tau)| is to 1, the stronger the correlation between f_1 and f_2. Statistical correlation has been defined here for 1D functions; its counterpart for 2D spatial functions can be defined similarly.
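A time-average estimate of this normalized correlation for two partially correlated signals can be sketched as follows (the signal model, sample size, and names are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100000
common = rng.normal(size=n)
f1 = common + 0.5 * rng.normal(size=n)     # two signals sharing a common part
f2 = common + 0.5 * rng.normal(size=n)

def g12(a, b, tau=0):
    # time-average estimate of Gamma_12(tau), normalized by Gamma_11(0) and Gamma_22(0)
    prod = a[:n - tau] * b[tau:] if tau > 0 else a * b
    return prod.mean() / np.sqrt((a**2).mean() * (b**2).mean())

print(g12(f1, f2))                          # ~0.8: strongly, but not completely, correlated
print(g12(f1, rng.normal(size=n)))          # ~0: uncorrelated
```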

Appendix E: 2D Fourier Transform

E.1 Definition

The Fourier transform of a 2D spatial function f(x, y) is defined as

\tilde{f}(u, v) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x, y) \exp[-i 2\pi (ux + vy)] \, dx \, dy,    (E.1)

where u and v are the spatial frequencies conjugate to x and y, respectively, and \tilde{f}(u, v) is the Fourier spectrum of f(x, y). Conversely, f(x, y) can be reconstructed from \tilde{f}(u, v) via the inverse Fourier transform, defined as

f(x, y) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \tilde{f}(u, v) \exp[i 2\pi (ux + vy)] \, du \, dv.    (E.2)

The Fourier transform is also known as Fourier analysis, while the inverse Fourier transform is also known as Fourier synthesis. Hereafter, \mathcal{F}(\cdot) and \mathcal{F}^{-1}(\cdot) are used to represent the Fourier transform and the inverse Fourier transform, respectively.

E.2 Description

The Fourier transform is a tool that decomposes a function into an alternative representation in the frequency domain. As indicated in Eq. (E.1), f(x, y) is projected onto a complex exponential function \exp[i 2\pi (ux + vy)] through the integral of the pointwise multiplication of f(x, y) and \exp[-i 2\pi (ux + vy)], and


the result of this integral is the magnitude of the component of f(x, y) at frequency (u, v). Mathematically speaking, the integral of the product of two functions is a kind of inner product (the minus sign in \exp[-i 2\pi (ux + vy)] is due to the conjugation required by the inner product operation). The complex exponentials \{\exp[i 2\pi (ux + vy)] \mid u, v \in (-\infty, +\infty)\} are a set of complete orthogonal basis functions. Therefore, f(x, y) can be completely represented by a combination of its Fourier spectrum \tilde{f}(u, v), as shown in Eq. (E.2).

E.3 Properties

In this section, some important and useful properties of the Fourier transform are presented as mathematical theorems. The proofs of these theorems, which can be found in Ref. 2, are omitted here.

E.3.1 Linearity theorem

For any scalars a and b, if \mathcal{F}[f(x, y)] = \tilde{f}(u, v) and \mathcal{F}[g(x, y)] = \tilde{g}(u, v), then

\mathcal{F}[a f(x, y) + b g(x, y)] = a \tilde{f}(u, v) + b \tilde{g}(u, v),

which means that the Fourier transform of a linear combination of two functions equals the same linear combination of their individual Fourier transforms.

E.3.2 Similarity theorem

For any nonzero scalars a and b, if \mathcal{F}[f(x, y)] = \tilde{f}(u, v), then

\mathcal{F}[f(ax, by)] = \frac{1}{|ab|} \tilde{f}\left(\frac{u}{a}, \frac{v}{b}\right),

which indicates that a stretch (or contraction) of the coordinates in one domain results in a contraction (or stretch) of the coordinates in the other domain.

E.3.3 Shift theorem

For any real numbers x_0 and y_0, if \mathcal{F}[f(x, y)] = \tilde{f}(u, v), then

\mathcal{F}[f(x - x_0, y - y_0)] = \tilde{f}(u, v) \exp[-i 2\pi (u x_0 + v y_0)],

which means that a translation in the space domain leads to a linear phase shift in the frequency domain. This is the spatial-shifting form of the property. The symmetrical counterpart for frequency shifting, for any real numbers u_0 and v_0 with \mathcal{F}[f(x, y)] = \tilde{f}(u, v), is


\mathcal{F}\{\exp[i 2\pi (u_0 x + v_0 y)] f(x, y)\} = \tilde{f}(u - u_0, v - v_0),

which means that a translation in the frequency domain likewise introduces a linear phase shift in the spatial domain.

E.3.4 Parseval's theorem

If \mathcal{F}[f(x, y)] = \tilde{f}(u, v), then

\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} |f(x, y)|^2 \, dx \, dy = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} |\tilde{f}(u, v)|^2 \, du \, dv.

Parseval's theorem, as an expression of the law of conservation of energy, indicates that the total energy contained in f(x, y), summed over all x and y, equals the total energy of its Fourier transform \tilde{f}(u, v), summed over all u and v. Therefore, the Fourier transform preserves the energy of the original quantity.

E.3.5 Convolution theorem

If \mathcal{F}[f(x, y)] = \tilde{f}(u, v) and \mathcal{F}[g(x, y)] = \tilde{g}(u, v), then

\mathcal{F}[f(x, y) * g(x, y)] = \tilde{f}(u, v) \tilde{g}(u, v).

Similarly,

\mathcal{F}[f(x, y) g(x, y)] = \tilde{f}(u, v) * \tilde{g}(u, v).

The convolution theorem indicates that a convolution in one domain becomes a simple multiplication in the other domain. This theorem can be used to avoid performing the complicated convolution operation directly; e.g., the convolution of two functions in the space domain can be obtained by the inverse Fourier transform of the product of the Fourier spectra of the two functions in the frequency domain.

E.3.6 Autocorrelation theorem

If \mathcal{F}[f(x, y)] = \tilde{f}(u, v), then

\mathcal{F}[(f ⊗ f)(x, y)] = |\tilde{f}(u, v)|^2.

Similarly,

\mathcal{F}[|f(x, y)|^2] = (\tilde{f} ⊗ \tilde{f})(u, v).


The autocorrelation of f(x, y) is just the convolution of f(x, y) and f^*(-x, -y), so the autocorrelation theorem can be regarded as a special form of the convolution theorem. Additionally, this theorem relates the autocorrelation function to the power spectral density (presented in Appendix F) via the Fourier transform.

E.3.7 Fourier integral theorem

At each continuity point of f(x, y),

\mathcal{F}\mathcal{F}^{-1}[f(x, y)] = \mathcal{F}^{-1}\mathcal{F}[f(x, y)] = f(x, y),

and

\mathcal{F}\mathcal{F}[f(x, y)] = \mathcal{F}^{-1}\mathcal{F}^{-1}[f(x, y)] = f(-x, -y).

The Fourier integral theorem shows that successively performing the Fourier transform and the inverse Fourier transform on a function yields the same function again, except at points of discontinuity, while applying the Fourier transform twice (or the inverse transform twice) rotates the function by 180 deg about the origin. These relations are frequently used and provide great convenience when manipulating Fourier transforms.
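The discrete (FFT) analogues of Parseval's theorem and the convolution theorem can be checked numerically; the array sizes and random data below are arbitrary, and the direct circular convolution is written out explicitly only for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parseval (discrete form): sum |f|^2 = (1/N) * sum |F|^2 for an N-point FFT.
f = rng.normal(size=(64, 64))
F = np.fft.fft2(f)
print(np.allclose(np.sum(np.abs(f)**2), np.sum(np.abs(F)**2) / f.size))   # True

# Convolution theorem: circular convolution in space = product of spectra.
n = 8
a = rng.normal(size=(n, n))
b = rng.normal(size=(n, n))
conv_direct = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        for xi in range(n):
            for eta in range(n):
                conv_direct[x, y] += a[xi, eta] * b[(x - xi) % n, (y - eta) % n]
conv_fft = np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)).real
print(np.allclose(conv_direct, conv_fft))                                 # True
```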

Appendix F: Power Spectrum

The power spectrum, also known as the power spectral density (PSD), of a signal describes the distribution of the average power of the signal over frequency. For a 2D stationary stochastic signal f(x, y), the spatial power spectrum can be defined as

S(u, v) = \lim_{M \to \infty} \langle |L(u, v)|^2 \rangle,

where

L(u, v) = \frac{1}{4M^2} \int_{-M}^{M} \int_{-M}^{M} f(x, y) \exp[-i 2\pi (ux + vy)] \, dx \, dy,

u and v are the spatial frequencies conjugate to x and y, respectively, [-M, M] is the range for both the x and y axes, and \langle \cdot \rangle is the ensemble average. As previously mentioned, in practice the ensemble average is replaced by the time average.
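A minimal periodogram-style estimate of such a power spectrum, with the ensemble average replaced by an average over many independent realizations, can be sketched as follows (white Gaussian noise is used because its spectrum should come out flat; all sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 64, 200
S = np.zeros((n, n))
for _ in range(trials):
    f = rng.normal(size=(n, n))                 # one realization of the 2D signal
    S += np.abs(np.fft.fft2(f))**2 / f.size     # periodogram of this realization
S /= trials                                     # average standing in for < . >
print(S.mean(), S.std())   # mean ~1 (flat spectrum); fluctuations shrink with more trials
```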


Appendix G: Linear Systems

This appendix is adapted from Ref. 3. Let f(x_1, y_1) be the input of a system and g(x_2, y_2) be the corresponding response of the system to the input; then the system can be expressed as

g(x_2, y_2) = S\{f(x_1, y_1)\},

where S\{\cdot\} represents the mapping from the input of the system to the response of the system. A system is linear if it satisfies the superposition property

S\{a f_1(x_1, y_1) + b f_2(x_1, y_1)\} = a S\{f_1(x_1, y_1)\} + b S\{f_2(x_1, y_1)\},

where f_1(x_1, y_1) and f_2(x_1, y_1) are two arbitrary input functions, and a and b are two arbitrary scalars.

G.1 Impulse response and superposition integral

By the shifting property of the \delta function, a function f(x_1, y_1) can be expressed as a linear combination of weighted and displaced \delta functions:

f(x_1, y_1) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(\xi, \eta) \delta(x_1 - \xi, y_1 - \eta) \, d\xi \, d\eta.

Then the response of the system to the input f(x_1, y_1) can be written as

g(x_2, y_2) = S\left\{ \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(\xi, \eta) \delta(x_1 - \xi, y_1 - \eta) \, d\xi \, d\eta \right\}.

Since f(\xi, \eta) is the weight factor for \delta(x_1 - \xi, y_1 - \eta) in this representation, if the system is linear, S\{\cdot\} can operate directly on \delta(x_1 - \xi, y_1 - \eta), yielding

g(x_2, y_2) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(\xi, \eta) S\{\delta(x_1 - \xi, y_1 - \eta)\} \, d\xi \, d\eta.

S\{\delta(x_1 - \xi, y_1 - \eta)\}, the response of the system to a \delta function, is called the impulse response of the system and is denoted by h(x_2, y_2; \xi, \eta). The system input and response can now be related by

g(x_2, y_2) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(\xi, \eta) h(x_2, y_2; \xi, \eta) \, d\xi \, d\eta.    (G.1)


This equation, called the superposition integral, expresses the very important fact that the impulse response h(x_2, y_2; \xi, \eta) completely characterizes the system.

G.2 Invariant linear systems

A linear system is space-invariant (or isoplanatic, in optics) if its impulse response h(x_2, y_2; \xi, \eta) depends only on the distances (x_2 - \xi) and (y_2 - \eta). The impulse response of such a system can be written as

h(x_2, y_2; \xi, \eta) = h(x_2 - \xi, y_2 - \eta).

For an invariant linear system, the superposition integral [Eq. (G.1)] reduces to a particularly simple form:

g(x_2, y_2) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(\xi, \eta) h(x_2 - \xi, y_2 - \eta) \, d\xi \, d\eta,    (G.2)

where the response of the system to the input function is now a 2D convolution of the input function with the impulse response of the system. Taking the Fourier transform of both sides of Eq. (G.2) and using the convolution theorem, the system's spectral response \tilde{g}(u, v) and spectral input \tilde{f}(u, v) are related by the much simpler equation

\tilde{g}(u, v) = \tilde{h}(u, v) \tilde{f}(u, v),

where \tilde{h}(u, v), the Fourier transform of the impulse response, is called the transfer function of the system.
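The transfer-function relation and the space invariance it rests on can be illustrated with a discrete low-pass system applied by multiplying spectra; the particular kernel, array size, and shift amounts are arbitrary choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
f = rng.normal(size=(n, n))                      # input of the system

# An illustrative transfer function h~(u, v): a Gaussian low-pass filter.
u = np.fft.fftfreq(n)
h_tilde = np.exp(-(u[:, None]**2 + u[None, :]**2) / (2 * 0.05**2))

def apply_system(f_in):
    # g~ = h~ f~, then back to the space domain
    return np.fft.ifft2(h_tilde * np.fft.fft2(f_in)).real

g = apply_system(f)
g_from_shifted = apply_system(np.roll(f, (3, 5), axis=(0, 1)))
# Space invariance: shifting the input only shifts the output.
print(np.allclose(np.roll(g, (3, 5), axis=(0, 1)), g_from_shifted))   # True
```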

References

1. J. W. Goodman, Statistical Optics, Wiley Classics Library Edition, John Wiley & Sons Inc., New York (2000).
2. A. Papoulis, The Fourier Integral and Its Applications, McGraw-Hill Inc., New York (1962).
3. J. W. Goodman, Introduction to Fourier Optics, Second Edition, McGraw-Hill, New York (1996).

Index

A
absorptive filter, 179; adaptive optics, 225; Airy pattern, 95–96, 142; amplitude of a wave, 81, 83; amplitude transfer function (ATF), 135; angular dispersion, 175; angular frequency, 81, 83; angular spectrum, 117–121; aspheric lens, 155–157; aspheric surface, 156; atmospheric turbulence, 225, 229–230, 232–234; axicon, 154

B
bandpass filter, 181; beam-scanning system, 204–205; beam-splitting prism, 170; blazing, 178

C
charge-coupled device (CCD), 190; CMOS sensor, 191; coherence, 99, 100; coherence length, 104, 231; coherence time, 104, 232, 234–235; coherent light source, 149; complex degree of coherence, 101–102; concave mirror, 160–161; concentric beam, 18; confocal microscope, 201–202; convex mirror, 160–161; cophasing, 208, 215, 218; cross-correlation function (see also mutual coherence function), 101; curvature wavefront sensor, 240–241; curved mirror, 160; cylindrical lens, 153–154; cylindrical mirror, 161

D
deformable mirror, 241–243; detectivity, 192; detector array, 190; detector pinhole, 203; dielectric constant (see also electric permittivity), 78; diffraction, 6, 9, 10–11; diffraction factor, 173; dispersing prism, 164–165; division of the amplitude, 109; division of the wavefront, 107; Dove prism, 169; drum lens, 153; dual-wavelength digital holography, 208, 210

E
Einstein coefficients, 147–148; electric charge density, 78; electric conductivity, 78; electric current density, 78; electric displacement, 78; electric field (see also electric vector), 78, 83; electric permittivity (see also dielectric constant), 78, 80; electric vector (see also electric field), 78, 83; ellipsoidal mirror, 161; energy levels, 147–150; eye relief, 200; eyepiece, 198–200

F
Fabry–Pérot interferometer, 111; far-field diffraction, 93; filter, 179; finesse of fringes, 113; Fourier optics, 116, 122, 125, 133; Fourier transform, 90, 93, 113, 115–117, 119, 122–123, 125, 129–131, 135; Fraunhofer approximation, 93; Fraunhofer diffraction integral, 93; free spectral range, 177; frequency, 83, 116–117, 135; Fresnel approximation, 88–90, 93; Fresnel diffraction integral, 89–90, 93; fringe pattern (see also interference pattern), 99, 101–102, 105, 107, 109, 111, 116; fringe sharpness, 113; fringes of equal inclination, 112; fringes of equal thickness, 110

G
graded-index fiber, 183; grating resolution, 175–176; gratings, 171

H
half-intensity width, 113; Helmholtz equation, 81, 86; higher-index medium, 25; Huygens–Fresnel principle, 87, 103; hyperbolical mirror, 161

I
illumination pinhole, 203; incident angle, 22, 24–26, 42, 55–56; incident plane, 53; incident ray, 22–24, 31, 55, 57; incoherent light sources, 148; index of refraction, 19–20, 26; influence function, 245, 254–255; interference, 6, 11, 76, 99, 101–103, 109, 179; interference filter, 180; interference pattern (see also fringe pattern), 99, 109, 110; interference term, 101, 106; isoplanatic condition, 89, 135

K
Kolmogorov power spectrum, 234

L
laser, 149–150; lateral shearing interferometer, 238, 240; law of reflection, 23–24; law of refraction, 23–24, 28, 54–55, 59; light, 3–4, 9, 10; light intensity, 83, 95, 100, 102, 105, 108, 114, 135; light rays, 18, 20, 29, 41; light sources, 147; linearity, 193; longpass filter, 180; lower-index medium, 25

M
magnetic field (see also magnetic vector), 78; magnetic induction, 78; magnetic permeability, 78, 80; magnetic vector (see also magnetic field), 78, 83; Maxwell's equations, 76, 79; microscope, 194; mode, 183–184, 186; modulation transfer function (MTF), 136; multibeam interference, 111; multibeam interference factor, 173; multimode fiber, 185; mutual coherence function (see also cross-correlation function), 101

N
near-field diffraction, 93; negative (diverging) lenses, 151, 160; neutral-density filter, 182; noise, 192; noise equivalent power, 192; numerical aperture (NA), 198

O
objective, 194, 197, 199, 203; optical detectors, 186; optical fiber, 183; optical filter, 178; optical Fourier spectrum, 126; optical path (see also optical path length), 105, 109, 137; optical path difference (OPD), 102, 105–106, 108, 137; optical path length (OPL) (see also optical path), 20, 30, 137; optical systems, 10, 12; optical transfer function (OTF), 136; optical wedge, 158; order of grating/interference, 173–174; order of interference, 109

P
paraboloidal mirror, 161; parallel beam, 18–19; peak-to-valley (PV) value, 139; pentaprism, 169; phase, 81, 83–84, 132; phase structure function, 231–232; photoconductive detector, 188; photodiode, 189; photoelectric effect, 4–5; photoemissive detector, 187; photomultiplier, 187; photon detector, 187; photovoltaic detector, 188; pinholes, 203–204; plane mirror, 159–160; plane wave, 18, 22, 81; plane-parallel plate, 157–158; point diffraction Mach–Zehnder interferometer, 214–215; point spread function, 90, 133–135; polarization, 76, 82; Porro prism system, 166, 168; positive (converging) lens, 151, 160; Poynting vector, 83; prism, 164; pyroelectric detector, 187

Q
quantum efficiency, 192

R
Rayleigh criterion, 142; Rayleigh–Sommerfeld formula, 81, 86–88; reflected angle, 23–24; reflected ray, 23, 25; reflecting prism, 166; reflection, 18, 25, 43, 51; refracted angle, 23–25, 28; refracted ray, 23, 25; refraction, 18, 28, 43, 51; response time, 193; responsivity, 192; right-angle prism, 168; root-mean-square (RMS) value, 139

S
scanning system, 204; seeing, 248–249; segmented mirror, 208, 215, 242; Shack–Hartmann (SH) wavefront sensor, 236, 238, 251; shortpass filter, 181; single-mode fiber, 185; Snell's law, 23, 25, 41, 51; space-invariant system, 89; spatial coherence, 105, 115; spatial dispersion, 175; Sparrow criterion, 142; specimen-scanning system, 204; spectral lines, 174–176; spectral response, 192; spherical ball lens, 152; spherical lens, 151, 155–156; spherical waves, 18, 81–82, 87, 133; spontaneous emission, 148–149; step-index fiber, 183; stimulated absorption, 147–148, 150; stimulated emission, 148, 150; Strehl ratio, 142–143; structure function of the refractive index, 229–230

T
temporal coherence, 103, 113; thermal detectors, 186; thermoelectric detector, 186; tip/tilt mirror, 241, 256–257; total internal reflection, 25–26; two-beam interference, 107, 109, 111

V
visibility of fringes, 102, 107

W
wave equations, 80–81; wave vector, 81, 83, 118; wave–particle duality, 6–7, 12; wavefront, 11, 18, 30; wavefront aberration, 136, 138–140, 142; wavefront correction, 241, 258; wavefront sensing, 235, 258; wavelength of light, 83–84, 90, 111; wavenumber, 81; work function, 5

Z
Zernike polynomials, 140, 142

Sijiong Zhang received his Ph.D. degree in optical instruments from the department of optical engineering at Beijing Institute of Technology in 1996, his M.Sc. in optics at Xi'an Institute of Optics and Fine Mechanics in 1989, and his B.Sc. degree in physics at Inner Mongolia University in 1986. He was an optical scientist and senior optical scientist at Heriot-Watt University, the University of Cambridge, and STS, Imperial College, from 1999 to 2010. He has been a professor of adaptive optics for astronomy at Nanjing Institute of Astronomical Optics and Technology (Chinese Academy of Sciences) since July 2010. His research interests are in adaptive optics and optical imaging.

Changwei Li received his Ph.D. degree in optics from the department of physics at Harbin Institute of Technology in 2010, and his B.Sc. degree in physics at Northeast Normal University in 2005. He held a postdoctoral position in adaptive optics from 2010 to 2012 at Nanjing Institute of Astronomical Optics and Technology (Chinese Academy of Sciences) before becoming a full-time faculty member in 2012. His research interest is in adaptive optics.

Shun Li received his Ph.D. degree in optics from Changchun Institute of Optics, Fine Mechanics and Physics (Chinese Academy of Sciences) in 2012, and his B.Sc. degree in electronic science and technology from the School of Physical Engineering at Zhengzhou University in 2004. He has been a staff member of Nanjing Institute of Astronomical Optics and Technology (Chinese Academy of Sciences) since 2012. His research interests include adaptive optics and digital holography.
