Imaging Life: Image Acquisition and Analysis in Biology and Medicine
Lawrence R. Griffing, Biology Department, Texas A&M University, Texas, United States
Copyright © 2023 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750–8400, fax (978) 750–4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748–6011, fax (201) 748–6008, or online at http://www.wiley.com/go/permission. Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762–2974, outside the United States at (317) 572–3993 or fax (317) 572–4002. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com. A catalogue record for this book is available from the Library of Congress Hardback ISBN: 9781119949206; ePub ISBN: 9781119081579; ePDF ISBN: 9781119081593 Cover image(s): © Westend61/Getty Images; Yaroslav Kushta/Getty Images Cover design: Wiley Set in 9.5/12.5pt STIXTwoText by Integra Software Services Pvt. Ltd., Pondicherry, India
Contents

Preface xii
Acknowledgments xiv
About the Companion Website xv

Section 1 Image Acquisition 1

1 Image Structure and Pixels 3
1.1 The Pixel Is the Smallest Discrete Unit of a Picture 3
1.2 The Resolving Power of a Camera or Display Is the Spatial Frequency of Its Pixels 6
1.3 Image Legibility Is the Ability to Recognize Text in an Image by Eye 7
1.4 Magnification Reduces Spatial Frequencies While Making Bigger Images 9
1.5 Technology Determines Scale and Resolution 11
1.6 The Nyquist Criterion: Capture at Twice the Spatial Frequency of the Smallest Object Imaged 12
1.7 Archival Time, Storage Limits, and the Resolution of the Display Medium Influence Capture and Scan Resolving Power 13
1.8 Digital Image Resizing or Scaling Match the Captured Image Resolution to the Output Resolution 14
1.9 Metadata Describes Image Content, Structure, and Conditions of Acquisition 16

2 Pixel Values and Image Contrast 20
2.1 Contrast Compares the Intensity of a Pixel with That of Its Surround 20
2.2 Pixel Values Determine Brightness and Color 21
2.3 The Histogram Is a Plot of the Number of Pixels in an Image at Each Level of Intensity 24
2.4 Tonal Range Is How Much of the Pixel Depth Is Used in an Image 25
2.5 The Image Histogram Shows Overexposure and Underexposure 26
2.6 High-Key Images Are Very Light, and Low-Key Images Are Very Dark 27
2.7 Color Images Have Various Pixel Depths 27
2.8 Contrast Analysis and Adjustment Using Histograms Are Available in Proprietary and Open-Source Software 29
2.9 The Intensity Transfer Graph Shows Adjustments of Contrast and Brightness Using Input and Output Histograms 30
2.10 Histogram Stretching Can Improve the Contrast and Tonal Range of the Image without Losing Information 32
2.11 Histogram Stretching of Color Channels Improves Color Balance 32
2.12 Software Tools for Contrast Manipulation Provide Linear, Non-linear, and Output-Visualized Adjustment 34
2.13 Different Image Formats Support Different Image Modes 36
2.14 Lossless Compression Preserves Pixel Values, and Lossy Compression Changes Them 37

3 Representation and Evaluation of Image Data 42
3.1 Image Representation Incorporates Multiple Visual Elements to Tell a Story 42
3.2 Illustrated Confections Combine the Accuracy of a Typical Specimen with a Science Story 42
3.3 Digital Confections Combine the Accuracy of Photography with a Science Story 45
3.4 The Video Storyboard Is an Explicit Visual Confection 48
3.5 Artificial Intelligence Can Generate Photorealistic Images from Text Stories 48
3.6 Making Images Believable: Show Representative Images and State the Acquisition Method 50
3.7 Making Images Understood: Clearly Identify Regions of Interest with Suitable Framing, Labels, and Image Contrast 51
3.8 Avoid Dequantification and Technical Artifacts While Not Hesitating to Take the Picture 55
3.9 Accurate, Reproducible Imaging Requires a Set of Rules and Guidelines 56
3.10 The Structural Similarity Index Measure Quantifies Image Degradation 57

4 Image Capture by Eye 61
4.1 The Anatomy of the Eye Limits Its Spatial Resolution 61
4.2 The Dynamic Range of the Eye Exceeds 11 Orders of Magnitude of Light Intensity, and Intrascene Dynamic Range Is about 3 Orders 63
4.3 The Absorption Characteristics of Photopigments of the Eye Determines Its Wavelength Sensitivity 63
4.4 Refraction and Reflection Determine the Optical Properties of Materials 67
4.5 Movement of Light Through the Eye Depends on the Refractive Index and Thickness of the Lens, the Vitreous Humor, and Other Components 69
4.6 Neural Feedback in the Brain Dictates Temporal Resolution of the Eye 69
4.7 We Sense Size and Distribution in Large Spaces Using the Rules of Perspective 70
4.8 Three-Dimensional Representation Depends on Eye Focus from Different Angles 71
4.9 Binocular Vision Relaxes the Eye and Provides a Three-Dimensional View in Stereomicroscopes 74

5 Image Capture with Digital Cameras 78
5.1 Digital Cameras are Everywhere 78
5.2 Light Interacts with Silicon Chips to Produce Electrons 78
5.3 The Anatomy of the Camera Chip Limits Its Spatial Resolution 80
5.4 Camera Chips Convert Spatial Frequencies to Temporal Frequencies with a Series of Horizontal and Vertical Clocks 82
5.5 Different Charge-Coupled Device Architectures Have Different Read-out Mechanisms 85
5.6 The Digital Camera Image Starts Out as an Analog Signal that Becomes Digital 87
5.7 Video Broadcast Uses Legacy Frequency Standards 88
5.8 Codecs Code and Decode Digital Video 89
5.9 Digital Video Playback Formats Vary Widely, Reflecting Different Means of Transmission and Display 91
5.10 The Light Absorption Characteristics of the Metal Oxide Semiconductor, Its Filters, and Its Coatings Determine the Wavelength Sensitivity of the Camera Chip 91
5.11 Camera Noise and Potential Well Size Determine the Sensitivity of the Camera to Detectable Light 93
5.12 Scientific Camera Chips Increase Light Sensitivity and Amplify the Signal 97
5.13 Cameras for Electron Microscopy Use Regular Imaging Chips after Converting Electrons to Photons or Detect the Electron Signal Directly with Modified CMOS 99
5.14 Camera Lenses Place Additional Constraints on Spatial Resolution 101
5.15 Lens Aperture Controls Resolution, the Amount of Light, the Contrast, and the Depth of Field in a Digital Camera 106
5.16 Relative Magnification with a Photographic Lens Depends on Chip Size and Lens Focal Length 107

6 Image Capture by Scanning Systems 111
6.1 Scanners Build Images Point by Point, Line by Line, and Slice by Slice 111
6.2 Consumer-Grade Flatbed Scanners Provide Calibrated Color and Relatively High Resolution Over a Wide Field of View 111
6.3 Scientific-Grade Flatbed Scanners Can Detect Chemiluminescence, Fluorescence, and Phosphorescence 114
6.4 Scientific-Grade Scanning Systems Often Use Photomultiplier Tubes and Avalanche Photodiodes as the Camera 118
6.5 X-ray Planar Radiography Uses Both Scanning and Camera Technologies 119
6.6 Medical Computed Tomography Scans Rotate the X-ray Source and Sensor in a Helical Fashion Around the Body 121
6.7 Micro-CT and Nano-CT Scanners Use Both Hard and Soft X-Rays and Can Resolve Cellular Features 123
6.8 Macro Laser Scanners Acquire Three-Dimensional Images by Time-of-Flight or Structured Light 125
6.9 Laser Scanning and Spinning Disks Generate Images for Confocal Scanning Microscopy 126
6.10 Electron Beam Scanning Generates Images for Scanning Electron Microscopy 128
6.11 Atomic Force Microscopy Scans a Force-Sensing Probe Across the Sample 128

Section 2 Image Analysis 135

7 Measuring Selected Image Features 137
7.1 Digital Image Processing and Measurements are Part of the Image Metadata 137
7.2 The Subject Matter Determines the Choice of Image Analysis and Measurement Software 140
7.3 Recorded Paths, Regions of Interest, or Masks Save Selections for Measurement in Separate Images, Channels, and Overlays 140
7.4 Stereology and Photoquadrat Sampling Measure Unsegmented Images 144
7.5 Automatic Segmentation of Images Selects Image Features for Measurement Based on Common Feature Properties 146
7.6 Segmenting by Pixel Intensity Is Thresholding 146
7.7 Color Segmentation Looks for Similarities in a Three-Dimensional Color Space 147
7.8 Morphological Image Processing Separates or Connects Features 149
7.9 Measures of Pixel Intensity Quantify Light Absorption by and Emission from the Sample 153
7.10 Morphometric Measurements Quantify the Geometric Properties of Selections 155
7.11 Multi-dimensional Measurements Require Specific Filters 156

8 Optics and Image Formation 161
8.1 Optical Mechanics Can Be Well Described Mathematically 161
8.2 A Lens Divides Space Into Image and Object Spaces 161
8.3 The Lens Aperture Determines How Well the Lens Collects Radiation 163
8.4 The Diffraction Limit and the Contrast between Two Closely Spaced Self-Luminous Spots Give Rise to the Limits of Resolution 164
8.5 The Depth of the Three-Dimensional Slice of Object Space Remaining in Focus Is the Depth of Field 167
8.6 In Electromagnetic Lenses, Focal Length Produces Focus and Magnification 170
8.7 The Axial, Z-Dimensional, Point Spread Function Is a Measure of the Axial Resolution of High Numerical Aperture Lenses 171
8.8 Numerical Aperture and Magnification Determine the Light-Gathering Properties of the Microscope Objective 172
8.9 The Modulation (Contrast) Transfer Function Relates the Relative Contrast to Resolving Power in Fourier, or Frequency, Space 172
8.10 The Point Spread Function Convolves the Object to Generate the Image 176
8.11 Problems with the Focus of the Lens Arise from Lens Aberrations 177
8.12 Refractive Index Mismatch in the Sample Produces Spherical Aberration 182
8.13 Adaptive Optics Compensate for Refractive Index Changes and Aberration Introduced by Thick Samples 183

9 Contrast and Tone Control 189
9.1 The Subject Determines the Lighting 189
9.2 Light Measurements Use Two Different Standards: Photometric and Radiometric Units 190
9.3 The Light Emission and Contrast of Small Objects Limits Their Visibility 194
9.4 Use the Image Histogram to Adjust the Trade-off Between Depth of Field and Motion Blur 194
9.5 Use the Camera's Light Meter to Detect Intrascene Dynamic Range and Set Exposure Compensation 196
9.6 Light Sources Produce a Variety of Colors and Intensities That Determine the Quality of the Illumination 197
9.7 Lasers and LEDs Provide Lighting with Specific Color and High Intensity 199
9.8 Change Light Values with Absorption, Reflectance, Interference, and Polarizing Filters 200
9.9 Köhler-Illuminated Microscopes Produce Conjugate Planes of Collimated Light from the Source and Specimen 203
9.10 Reflectors, Diffusers, and Filters Control Lighting in Macro-imaging 207

10 Processing with Digital Filters 212
10.1 Image Processing Occurs Before, During, and After Image Acquisition 212
10.2 Near-Neighbor Operations Modify the Value of a Target Pixel 214
10.3 Rank Filters Identify Noise and Remove It from Images 215
10.4 Convolution Can Be an Arithmetic Operation with Near Neighbors 217
10.5 Deblurring and Background Subtraction Remove Out-of-Focus Features from Optical Sections 221
10.6 Convolution Operations in Frequency Space Multiply the Fourier Transform of an Image by the Fourier Transform of the Convolution Mask 222
10.7 Tomographic Operations in Frequency Space Produce Better Back-Projections 224
10.8 Deconvolution in Frequency Space Removes Blur Introduced by the Optical System But Has a Problem with Noise 224

11 Spatial Analysis 231
11.1 Affine Transforms Produce Geometric Transformations 231
11.2 Measuring Geometric Distortion Requires Grid Calibration 231
11.3 Distortion Compensation Locally Adds and Subtracts Pixels 231
11.4 Shape Analysis Starts with the Identification of Landmarks, Then Registration 232
11.5 Grid Transformations are the Basis for Morphometric Examination of Shape Change in Populations 234
11.6 Principal Component Analysis and Canonical Variates Analysis Use Measures of Similarity as Coordinates 237
11.7 Convolutional Neural Networks Can Identify Shapes and Objects Using Deep Learning 238
11.8 Boundary Morphometrics Analyzes and Mathematically Describes the Edge of the Object 240
11.9 Measurement of Object Boundaries Can Reveal Fractal Relationships 245
11.10 Pixel Intensity–Based Colocalization Analysis Reports the Spatial Correlation of Overlapping Signals 246
11.11 Distance-Based Colocalization and Cluster Analysis Analyze the Spatial Proximity of Objects 250
11.12 Fluorescence Resonance Energy Transfer Occurs Over Small (1–10 nm) Distances 252
11.13 Image Correlations Reveal Patterns in Time and Space 253

12 Temporal Analysis 260
12.1 Representations of Molecular, Cellular, Tissue, and Organism Dynamics Require Video and Motion Graphics 260
12.2 Motion Graphics Editors Use Key Frames to Specify Motion 262
12.3 Motion Estimation Uses Successive Video Frames to Analyze Motion 265
12.4 Optic Flow Compares the Intensities of Pixels, Pixel Blocks, or Regions Between Frames 266
12.5 The Kymograph Uses Time as an Axis to Make a Visual Plot of the Object Motion 268
12.6 Particle Tracking Is a Form of Feature-Based Motion Estimation 269
12.7 Fluorescence Recovery After Photobleaching Shows Compartment Connectivity and the Movement of Molecules 273
12.8 Fluorescence Switching Also Shows Connectivity and Movement 276
12.9 Fluorescence Correlation Spectroscopy and Raster Image Correlation Spectroscopy Can Distinguish between Diffusion and Advection 280
12.10 Fluorescent Protein Timers Provide Tracking of Maturing Proteins as They Move through Compartments 282

13 Three-Dimensional Imaging, Modeling, and Analysis 287
13.1 Three-Dimensional Worlds Are Scalable and Require Both Camera and Actor Views 287
13.2 Stacking Multiple Adjacent Slices Can Produce a Three-Dimensional Volume or Surface 291
13.3 Structure-from-Motion Photogrammetry Reconstructs Three-Dimensional Surfaces Using Multiple Camera Views 292
13.4 Reconstruction of Aligned Images in Fourier Space Produces Three-Dimensional Volumes or Surfaces 295
13.5 Surface Rendering Produces Isosurface Polygon Meshes Generated from Contoured Intensities 296
13.6 Texture Maps of Object Isosurfaces Are Images or Movies 299
13.7 Ray Tracing Follows a Ray of Light Backward from the Eye or Camera to Its Source 300
13.8 Ray Tracing Shows the Object Based on Internal Intensities or Nearness to the Camera 300
13.9 Transfer Functions Discriminate Objects in Ray-Traced Three Dimensions 301
13.10 Four Dimensions, a Time Series of Three-Dimensional Volumes, Can Use Either Ray-Traced or Isosurface Rendering 303
13.11 Volumes Rendered with Splats and Texture Maps Provide Realistic Object-Ordered Reconstructions 303
13.12 Analysis of Three-Dimensional Volumes Uses the Same Approaches as Two-Dimensional Area Analysis But Includes Voxel Adjacency and Connectivity 305
13.13 Head-Mounted Displays and Holograms Achieve an Immersive Three-Dimensional Experience 307

Section 3 Image Modalities 313

14 Ultrasound Imaging 315
14.1 Ultrasonography Is a Cheap, High-Resolution, Deep-Penetration, Non-invasive Imaging Modality 315
14.2 Many Species Use Ultrasound and Infrasound for Communication and Detection 315
14.3 Sound Is a Compression, or Pressure, Wave 316
14.4 The Measurement of Audible Sound Intensity Is in Decibels 317
14.5 A Piezoelectric Transducer Creates the Ultrasound Wave 318
14.6 Different Tissues Have Different Acoustic Impedances 319
14.7 Sonic Wave Scatter Generates Speckle 321
14.8 Lateral Resolution Depends on Sound Frequency and the Size and Focal Length of the Transducer Elements 322
14.9 Axial Resolution Depends on the Duration of the Ultrasound Pulse 323
14.10 Scatter and Absorption by Tissues Attenuate the Ultrasound Beam 324
14.11 Amplitude Mode, Motion Mode, Brightness Mode, and Coherent Planar Wave Mode Are the Standard Modes for Clinical Practice 324
14.12 Doppler Scans of Moving Red Blood Cells Reveal Changes in Vascular Flows with Time and Provide the Basis for Functional Ultrasound Imaging 327
14.13 Microbubbles and Gas Vesicles Provide Ultrasound Contrast and Have Therapeutic Potential 329

15 Magnetic Resonance Imaging 334
15.1 Magnetic Resonance Imaging, Like Ultrasound, Performs Non-invasive Analysis without Ionizing Radiation 334
15.2 Magnetic Resonance Imaging Is an Image of the Hydrogen Nuclei in Fat and Water 337
15.3 Magnetic Resonance Imaging Sets up a Net Magnetization in Each Voxel That Is in Dynamic Equilibrium with the Applied Field 338
15.4 The Magnetic Field Imposed by Magnetic Resonance Imaging Makes Protons Spin Like Tops with the Same Tilt and Determines the Frequency of Precession 338
15.5 Magnetic Resonance Imaging Disturbs the Net Magnetization Equilibrium and Then Follows the Relaxation Back to Equilibrium 339
15.6 T2 Relaxation, or Spin–Spin Relaxation, Causes the Disappearance of Transverse (x-y Direction) Magnetization Through Dephasing 342
15.7 T1 Relaxation, or Spin-Lattice Relaxation, Causes the Disappearance of Longitudinal (z-Direction) Magnetization Through Energy Loss 342
15.8 Faraday Induction Produces the Magnetic Resonance Imaging Signal (in Volts) with Coils in the x-y Plane 343
15.9 Magnetic Gradients and Selective Radiofrequency Frequencies Generate Slices in the x, y, and z Directions 343
15.10 Acquiring a Gradient Echo Image Is a Highly Repetitive Process, Getting Information Independently in the x, y, and z Dimensions 344
15.11 Fast Low-Angle Shot Gradient Echo Imaging Speeds Up Imaging for T1-Weighted Images 346
15.12 The Spin-Echo Image Compensates for Magnetic Heterogeneities in the Tissue in T2-Weighted Images 346
15.13 Three-Dimensional Imaging Sequences Produce Higher Axial Resolution 347
15.14 Echo Planar Imaging Is a Fast Two-Dimensional Imaging Modality But Has Limited Resolving Power 347
15.15 Magnetic Resonance Angiography Analyzes Blood Velocity 347
15.16 Diffusion Tensor Imaging Visualizes and Compares Directional (Anisotropic) Diffusion Coefficients in a Tissue 349
15.17 Functional Magnetic Resonance Imaging Provides a Map of Brain Activity 350
15.18 Magnetic Resonance Imaging Contrast Agents Detect Small Lesions That Are Otherwise Difficult to Detect 351

16 Microscopy with Transmitted and Refracted Light 355
16.1 Brightfield Microscopy of Living Cells Uses Apertures and the Absorbance of Transmitted Light to Generate Contrast 355
16.2 Staining Fixed or Frozen Tissue Can Localize Large Polymers, Such as Proteins, Carbohydrates, and Nucleic Acids, But Is Less Effective for Lipids, Diffusible Ions, and Small Metabolites 361
16.3 Darkfield Microscopy Generates Contrast by Only Collecting the Refracted Light from the Specimen 365
16.4 Rheinberg Microscopy Generates Contrast by Producing Color Differences between Refracted and Unrefracted Light 368
16.5 Wave Interference from the Object and Its Surround Generates Contrast in Polarized Light, Differential Interference Contrast, and Phase Contrast Microscopies 369
16.6 Phase Contrast Microscopy Generates Contrast by Changing the Phase Difference Between the Light Coming from the Object and Its Surround 369
16.7 Polarized Light Reveals Order within a Specimen and Differences in Object Thickness 374
16.8 The Phase Difference Between the Slow and Fast Axes of Ordered Specimens Generates Contrast in Polarized Light Microscopy 376
16.9 Compensators Cancel Out or Add to the Retardation Introduced by the Sample, Making It Possible to Measure the Sample Retardation 379
16.10 Differential Interference Contrast Microscopy Is a Form of Polarized Light Microscopy That Generates Contrast Through Differential Interference of Two Slightly Separated Beams of Light 383

17 Microscopy Using Fluoresced and Reflected Light 390
17.1 Fluorescence and Autofluorescence: Excitation of Molecules by Light Leads to Rapid Re-emission of Lower Energy Light 390
17.2 Fluorescence Properties Vary Among Molecules and Depend on Their Environment 391
17.3 Fluorescent Labels Include Fluorescent Proteins, Fluorescent Labeling Agents, and Vital and Non-vital Fluorescence Affinity Dyes 394
17.4 Fluorescence Environment Sensors Include Single-Wavelength Ion Sensors, Ratio Imaging Ion Sensors, FRET Sensors, and FRET-FLIM Sensors 399
17.5 Widefield Microscopy for Reflective or Fluorescent Samples Uses Epi-illumination 402
17.6 Epi-polarization Microscopy Detects Reflective Ordered Inorganic or Organic Crystallites and Uses Nanogold and Gold Beads as Labels 405
17.7 To Optimize the Signal from the Sample, Use Specialized and Adaptive Optics 405
17.8 Confocal Microscopes Use Accurate, Mechanical Four-Dimensional Epi-illumination and Acquisition 408
17.9 The Best Light Sources for Fluorescence Match Fluorophore Absorbance 410
17.10 Filters, Mirrors, and Computational Approaches Optimize Signal While Limiting the Crosstalk Between Fluorophores 411
17.11 The Confocal Microscope Has Higher Axial and Lateral Resolving Power Than the Widefield Epi-illuminated Microscope, Some Designs Reaching Superresolution 415
17.12 Multiphoton Microscopy and Other Forms of Non-linear Optics Create Conditions for Near-Simultaneous Excitation of Fluorophores with Two or More Photons 419

18 Extending the Resolving Power of the Light Microscope in Time and Space 427
18.1 Superresolution Microscopy Extends the Resolving Power of the Light Microscope 427
18.2 Fluorescence Lifetime Imaging Uses a Temporal Resolving Power that Extends to Gigahertz Frequencies (Nanosecond Resolution) 428
18.3 Spatial Resolving Power Extends Past the Diffraction Limit of Light 429
18.4 Light Sheet Fluorescence Microscopy Achieves Fast Acquisition Times and Low Photon Dose 432
18.5 Lattice Light Sheets Increase Axial Resolving Power 435
18.6 Total Internal Reflection Microscopy and Glancing Incident Microscopy Produce a Thin Sheet of Excitation Energy Near the Coverslip 437
18.7 Structured Illumination Microscopy Improves Resolution with Harmonic Patterns That Reveal Higher Spatial Frequencies 440
18.8 Stimulated Emission Depletion and Reversible Saturable Optical Linear Fluorescence Transitions Superresolution Approaches Use Reversibly Saturable Fluorescence to Reduce the Size of the Illumination Spot 447
18.9 Single-Molecule Excitation Microscopies, Photo-Activated Localization Microscopy, and Stochastic Optical Reconstruction Microscopy Also Rely on Switchable Fluorophores 452
18.10 MINFLUX Combines Single-Molecule Localization with Structured Illumination to Get Resolution below 10 nm 455

19 Electron Microscopy 461
19.1 Electron Microscopy Uses a Transmitted Primary Electron Beam (Transmission Electron Micrography) or Secondary and Backscattered Electrons (Scanning Electron Micrography) to Image the Sample 461
19.2 Some Forms of Scanning Electron Micrography Use Unfixed Tissue at Low Vacuums (Relatively High Pressure) 462
19.3 Both Transmission Electron Micrography and Scanning Electron Micrography Use Frozen or Fixed Tissues 465
19.4 Critical Point Drying and Surface Coating with Metal Preserves Surface Structures and Enhances Contrast for Scanning Electron Micrography 467
19.5 Glass and Diamond Knives Make Ultrathin Sections on Ultramicrotomes 468
19.6 The Filament Type and the Condenser Lenses Control Illumination in Scanning Electron Micrography and Transmission Electron Micrography 471
19.7 The Objective Lens Aperture Blocks Scattered Electrons, Producing Contrast in Transmission Electron Micrography 474
19.8 High-Resolution Transmission Electron Micrography Uses Large (or No) Objective Apertures 475
19.9 Conventional Transmission Electron Micrography Provides a Cellular Context for Visualizing Organelles and Specific Molecules 479
19.10 Serial Section Transmitted Primary Electron Analysis Can Provide Three-Dimensional Cellular Structures 482
19.11 Scanning Electron Micrography Volume Microscopy Produces Three-Dimensional Microscopy at Nanometer Scales and Includes In-Lens Detectors and In-Column Sectioning Devices 483
19.12 Correlative Electron Microscopy Provides Ultrastructural Context for Fluorescence Studies 488
19.13 Tomographic Reconstruction of Transmission Electron Micrography Images Produces Very Thin (10-nm) Virtual Sections for High-Resolution Three-Dimensional Reconstruction 490
19.14 Cryo-Electron Microscopy Achieves Molecular Resolving Power (Resolution, 0.1–0.2 nm) Using Single-Particle Analysis 492

Index 497
Preface

Imaging Life Has Three Sections: Image Acquisition, Image Analysis, and Imaging Modalities

The first section, Image Acquisition, lays the foundation for imaging by extending prior knowledge about image structure (Chapter 1), image contrast (Chapter 2), and proper image representation (Chapter 3). The chapters on imaging by eye (Chapter 4), by camera (Chapter 5), and by scanners (Chapter 6) relate to prior knowledge of sight, digital (e.g., cell phone) cameras, and flatbed scanners. The second section, Image Analysis, starts with how to select features in an image and measure them (Chapter 7). With this knowledge comes the realization that there are limits to image measurement set by the optics of the system (Chapter 8), a system that includes the sample and the light- and radiation-gathering properties of the instrumentation. For light-based imaging, the nature of the lighting and its ability to generate contrast (Chapter 9) optimize the image data acquired for analysis. A wide variety of image filters (Chapter 10) that operate in real and reciprocal space make it possible to display or measure large amounts of data or data with low signal. Spatial measurement in two dimensions (Chapter 11), measurement in time (Chapter 12), and processing and measurement in three dimensions (Chapter 13) cover many of the tenets of image analysis at the macro and micro levels. The third section, Imaging Modalities, builds on some of the modalities necessarily introduced in previous chapters, such as computed tomography (CT) scanning, basic microscopy, and camera optics. Many students interested in biological imaging are particularly interested in biomedical modalities. Unfortunately, most of the classes in biomedical imaging are not part of standard biology curricula but are taught in biomedical engineering. Likewise, students in biomedical engineering often get less exposure to microscopy-related modalities. This section brings the two together. The book does not use examples from materials science, although some materials science students may find it useful.
Imaging Life Can Be Either a Lecture Course or a Lab Course

This book can stand alone as a text for a lecture course on biological imaging intended for junior or senior undergraduates or first- and second-year graduate students in life sciences. The annotated references section at the end of each chapter provides the URLs for supplementary videos available from iBiology.com and other recommended sites. In addition, the recommended text-based internet, print, and electronic resources, such as microscopyu.com, provide expert and in-depth materials on digital imaging and light microscopy. However, these resources focus on particular imaging modalities and exclude some (e.g., single-lens reflex cameras, ultrasound, CT scanning, magnetic resonance imaging [MRI], structure from motion). The objective of this book is to serve as a solid foundation in imaging, emphasizing the shared concepts of these imaging approaches. In this vein, the book does not attempt to be encyclopedic but instead provides a gateway to the ongoing advances in biological imaging.

The author's biology course non-linearly builds off this text with weekly computer sessions. Every third class session covers practical image processing, analysis, and presentations with still, video, and three-dimensional (3D) images. Although these computer labs may introduce Adobe Photoshop and Illustrator and MATLAB and Simulink (available on our university computers), the class primarily uses open-source software (i.e., GIMP2, Inkscape, FIJI [FIJI Is Just ImageJ], Icy, and Blender). The course emphasizes open-source imaging. Many open-source software packages use published and archived algorithms. This is better for science, making image processing more reproducible. They are also free or at least cheaper for students and university labs. The images the students acquire on their own with their cell phones, in the lab (if taught as a lab course), or from online scientific databases (e.g., Morphosource.org) are the subjects of these tutorials. The initial tutorials simply introduce basic features of the software that are fun, such as 3D model reconstruction in FIJI of CT scans from Morphosource, and informative, such as how to control image size, resolving power, and compression for analysis and publication. Although simple, the tutorials address major pedagogical challenges caused by the casual, uninformed use of digital images. The tutorials combine the opportunity to judge and analyze images acquired by the students with the opportunity to learn about the software. They are the basis for weekly assignments. Later tutorials provide instruction on video and 3D editing, as well as more advanced image processing (filters and deconvolution) and measurement. An important learning outcome for the course is that the students can use this software to rigorously analyze and manage imaging data, as well as generate publication-quality images, videos, and presentations.

This book can also serve as a text for a laboratory course, along with an accompanying lab manual that contains protocols for experiments and instructions for the operation of particular instruments. The current lab manual is available on request, but it has instructions for equipment at Texas A&M University. Besides cell phones, digital single-lens reflex cameras, flatbed scanners, and stereo-microscopes, the first quarter of the lab includes brightfield transmitted light microscopy and fluorescence microscopy. Assigning Chapter 16 on transmitted light microscopy and Chapter 17 on epi-illuminated light microscopy early in the course supplements the lab manual information and introduces the students to microscopy before covering it during class time. Almost all the students have worked with microscopes before, but many have not captured images that require better set-up (e.g., Köhler illumination with a sub-stage condenser) and a more thorough understanding of image acquisition and lighting.

The lab course involves students using imaging instrumentation. All the students have access to cameras on their cell phones, and most labs have access to brightfield microscopy, perhaps with various contrast-generating optical configurations (darkfield, phase contrast, differential interference contrast). Access to fluorescence microscopy is also important. One of the anticipated learning outcomes for the lab course is that students can troubleshoot optical systems. For this reason, it is important that they take apart, clean, and correctly reassemble and align some optical instruments for calibrated image acquisition. With this knowledge, they can become responsible users of more expensive, multi-user equipment. Some might even learn how to build their own! Access to CT scanning, confocal microscopy, multi-photon microscopy, ultrasonography, MRI, light sheet microscopy, superresolution light microscopy, and electron microscopy will vary by institution. Students can use remote learning to view demonstrations of how to set up and use them. Many of these instruments have linkage to the internet.
Zoom (or other live video) presentations provide access to operator activity for the entire class and are therefore preferable for larger classes that need to see the operation of a machine with restricted access. Several instrument companies provide video demonstrations of the use of their instruments. Live video is more informative, particularly if the students read about the instruments first with a distilled set of instrument-operating instructions, so they can then ask questions of the operators. Example images from the tutorials for most of these modalities should be available for student analysis.
Acknowledgments Peter Hepler and Paul Green taught a light and electron microscopy course at Stanford University that introduced me to the topic while I was a graduate student of Peter Ray. After working in the lab of Ralph Quatrano, I acquired additional expertise in light and electron microscopy as a post-doc with Larry Fowke and Fred Constabel at the University of Saskatchewan and collaborating with Hilton Mollenhauer at Texas A&M University. They were all great mentors. I created a light and electron microscopy course for upper-level undergraduates with Kate VandenBosch, who had taken a later version of Hepler’s course at the University of Massachusetts. However, with the widespread adoption of digital imaging, I took the course in a different direction. The goals were to introduce students to digital image acquisition, processing, and analysis while they learned about the diverse modalities of digital imaging. The National Science Foundation and the Biology Department at Texas A&M University provided financial support for the course. No single textbook existing for such a course, I decided to write one. Texas A&M University graciously provided one semester of development leave for its completion. Martin Steer at University College Dublin and Chris Hawes at Oxford Brookes University, Oxford, read and made constructive comments on sections of the first half of the book, as did Kate VandenBosch at the University of Wisconsin. I thank them for their help, friendship, and encouragement. I give my loving thanks to my children. Alexander Griffing contributed a much-needed perspective on all of the chapters, extensively copy edited the text, and provided commentary and corrections on the math. Daniel Griffing also provided helpful suggestions. Beth Russell was a constant source of enthusiasm. My collaborators, Holly Gibbs and Alvin Yeh at Texas A&M University, read several chapters and made comments and contributions that were useful and informative. Jennifer Lippincott-Schwartz, senior group leader and head of Janelia’s four-dimensional cellular physiology program, generously provided comment and insight on the chapters on temporal operations and superresolution microscopy. I also wish to thank the students in my lab who served as teaching assistants and provided enthusiastic and welcome feedback, particularly Kalli Landua, Krishna Kumar, and Sara Maynard. The editors at Wiley, particularly Rosie Hayden and Julia Squarr, provided help and encouragement. Any errors, of course, are mine. The person most responsible for the completion of this book is my wife, Margaret Ezell, who motivates and enlightens me. In addition to her expertise and authorship on early modern literary history, including science, she is an accomplished photographer. Imaging life is one of our mutual joys. I dedicate this book to her, with love and affection.
About the Companion Website

This book is accompanied by a companion website:

www.wiley.com/go/griffing/imaginglife

Please note that the resources are password protected. The resources include:
● Images and tables from the book
● Examples of the use of open source software to introduce and illustrate important features with video tutorials on YouTube
● Data, and a description of its acquisition, for use in the examples
Section 1 Image Acquisition
1 Image Structure and Pixels

1.1 The Pixel Is the Smallest Discrete Unit of a Picture

Images have structure. They have a certain arrangement of small and large objects. The large objects are often composites of small objects. The Roman mosaic from House VIII.1.16 in Pompeii, the House of Five Floors, has incredible structure (Figure 1.1). It has lifelike images of a bird on a reef, fishes, an electric eel, a shrimp, a squid, an octopus, and a rock lobster. It illustrates Aristotle's natural history account of a struggle between a rock lobster and an octopus. In fact, the species are identifiable and are common to certain bays on the Italian coast, a remarkable example of early biological imaging. It is a mosaic of uniformly sized square colored tiles. Each tile is the smallest picture element, or pixel, of the mosaic. At a certain appropriate viewing distance from the mosaic, the individual pixels cannot be distinguished, or resolved, and what is a combination of individual tiles looks solid or continuous, taking the form of a fish, or lobster, or octopus. When viewed closer than this distance, the individual tiles or pixels become apparent (see Figure 1.1); the image is pixelated. Beyond viewing it from the distance that is the height of the person standing on the mosaic, pixelation in this scene was probably further reduced by the shallow pool of water that covered it in the House of Five Floors.

The order in which the image elements come together, or render, also describes the image structure. This mosaic was probably constructed by tiling the different objects in the scene, then surrounding the objects with a single layer of tiles of the black background (Figure 1.2), and finally filling in the background with parallel rows of black tiles. This form of image construction is object-order rendering. The background rendering follows the rendering of the objects. Vector graphic images use object-ordered rendering. Vector graphics define the object mathematically with a set of vectors and render it in a scene, with the background and other objects rendered separately. Vector graphics are very useful because any number of pixels can represent the mathematically defined objects. This is why programs, such as Adobe Illustrator, with vector graphics for fonts and illustrated objects are so useful: the number (and, therefore, size) of pixels that represent the image is chosen by the user and depends on the type of media that will display it. This number can be set so that the fonts and objects never have to appear pixelated. Vector graphics are resolution independent; scaling the object to any size will not lose its sharpness from pixelation.

Another way to make the mosaic would be to start from the upper left of the mosaic and tile in rows. One row near the top of the mosaic contains parts of three fishes, a shrimp, and the background. This form of image structure is image-order rendering. Many scanning systems construct images using this form of rendering. A horizontal scan line is a raster. Almost all computer displays and televisions are raster based. They display a rasterized grid of data, and because the data are in the form of bits (see Section 2.2), it is a bitmap image. As described later, bitmap graphics are resolution dependent; that is, as they scale larger, the pixels become larger, and the images become pixelated.

Even though the pixel is the smallest discrete unit of the picture, it does have structure. The fundamental unit of visualization is the cell (Figure 1.3).
A pixel is a two-dimensional (2D) cell described by an ordered list of four points (its corners or vertices), and geometric constraints make it square. In three-dimensional (3D) images, the smallest discrete unit of the volume is the voxel. A voxel is the 3D cell described by an ordered list of eight points (its vertices), and geometric constraints make it a cube.
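The difference between a mathematically defined (vector) object and its rasterized bitmap can be made concrete with a short sketch. The Python/NumPy fragment below is illustrative only and is not taken from the book; the function name, the 0–1 coordinate system, and the grid sizes are arbitrary choices made for the example. A disk defined by a center and radius plays the role of the vector object; asking for more pixels changes only the bitmap, not the object.

```python
# A minimal sketch (not from the book) of the vector-versus-bitmap idea:
# a disk defined mathematically (center and radius, like a vector object)
# is rasterized onto pixel grids of two different resolving powers.
import numpy as np

def rasterize_disk(center_xy, radius, grid_size):
    """Sample the mathematical disk on a grid_size x grid_size pixel grid.

    The disk itself is resolution independent; only the bitmap produced
    here depends on how many pixels we ask for.
    """
    ys, xs = np.mgrid[0:grid_size, 0:grid_size]
    # Place pixel centers on a 0..1 coordinate system, independent of grid size.
    px = (xs + 0.5) / grid_size
    py = (ys + 0.5) / grid_size
    cx, cy = center_xy
    return ((px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2).astype(np.uint8)

coarse = rasterize_disk((0.5, 0.5), 0.3, 8)    # 8 x 8 bitmap: visibly pixelated
fine = rasterize_disk((0.5, 0.5), 0.3, 512)    # 512 x 512 bitmap: looks smooth

print(coarse)                  # the blocky, mosaic-tile version of the disk
print(fine.shape, fine.sum())  # same object, many more (smaller) pixels
```

Printing the coarse array shows the blocky, mosaic-like rendering of the same disk that the fine grid reproduces smoothly, which is the sense in which bitmap graphics are resolution dependent and vector descriptions are not.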
Figure 1.1 The fishes mosaic (second century BCE) from House VII.2.16, the House of Five Floors, in Pompeii. The lower image is an enlargement of the fish eye, showing that light reflection off the eye is a single tile, or pixel, in the image. Photo by Wolfgang Rieger, http://commons.wikimedia.org/wiki/File:Pompeii_-_Casa_del_Fauno_-_MAN.jpg and is in the public domain (PD-1996).
Figure 1.2 Detail from Figure 1.1. The line of black tiles around the curved borders of the eel and the fish is evidence that the mosaic employs object-order rendering.
Color is a subpixel component of electronic displays; printed material; and, remarkably, some paintings. Georges Seurat (1859–1891) was a famous French post-impressionist painter. Seurat communicated his impression of a scene by constructing his picture from many small dabs or points of paint (Figure 1.4); he was a pointillist. However, each dab of paint is not a pixel. Instead, when standing at the appropriate viewing distance, dabs of differently colored paint combine to form a new color. Seurat pioneered this practice of subpixel color. Computer displays use it, each pixel being made up of stripes (or dots) of red, green, and blue color (see Figure 1.4). The intensity of the different stripes determines the displayed color of the pixel. For many printed images, the halftone cell is the pixel. A halftone cell contains an array of many black and white dots or dots of different colors (see Figure 1.10); the more dots within the halftone cell, the more shades of gray or color that are possible. Chapter 2 is all about how different pixel values produce different shades of gray or color.
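To see why more dots per halftone cell means more printable tones, consider the short sketch below. It is not the book's code; the 4 × 4 fill order is an arbitrary Bayer-style matrix chosen only for illustration. Because anywhere from 0 to 16 dots in the cell can be inked, a 4 × 4 cell reproduces 17 distinct gray levels.

```python
# Illustrative sketch (not the book's code): a 4 x 4 halftone cell can show
# 4*4 + 1 = 17 gray levels, because 0 to 16 of its dots can be turned on.
import numpy as np

# A 4 x 4 "fill order" for the cell: dot k turns on once the gray level exceeds k.
FILL_ORDER = np.array([[ 0,  8,  2, 10],
                       [12,  4, 14,  6],
                       [ 3, 11,  1,  9],
                       [15,  7, 13,  5]])

def halftone_cell(gray_level):
    """Render one halftone cell (1 = ink dot, 0 = paper) for gray_level in 0..16."""
    return (FILL_ORDER < gray_level).astype(np.uint8)

for level in (0, 4, 8, 16):
    cell = halftone_cell(level)
    print(f"gray level {level:2d}: {cell.sum()} of 16 dots inked")
```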
Figure 1.4 This famous picture A Sunday Afternoon on the Island of La Grande Jatte (1884–1886) by Georges Seurat is made up of small dots or dabs of paint, each discrete and with a separate color. Viewed from a distance, the different points of color, usually primary colors, blend in the mind of the observer and create a canvas with a full spectrum of color. The lower panel shows a picture of a liquid crystal display on a laptop that is displaying a region of the Seurat painting magnified through a lens. The view through the lens reveals that the image is composed of differently illuminated pixels made up of parallel stripes of red, green, and blue colors. The upper image is from https://commons.wikimedia.org/wiki/File:A_Sunday_on_La_Grande_Jatte,_Georges_Seurat,_1884.jpg. Lower photos by L. Griffing.
Figure 1.3 Cell types found in visualization systems that can handle two- and three-dimensional representation. Diagram by L. Griffing.
1.2 The Resolving Power of a Camera or Display Is the Spatial Frequency of Its Pixels

In biological imaging, we use powerful lenses to resolve details of far away or very small objects. The round plant protoplasts in Figure 1.5 are invisible to the naked eye. To get an image of them, we need to use lenses that collect a lot of light from a very small area and magnify the image onto the chip of a camera. Not only is the power of the lens important but also the power of the camera. Naively, we might think that a powerful camera will have more pixels (e.g., 16 megapixels [MP]) on its chip than a less powerful one (e.g., 4 MP). Not necessarily! The 4-MP camera could actually be more powerful (require less magnification) if the pixels are smaller. The size of the chip and the pixels in the chip matter.

The power of a lens or camera chip is its resolving power, the number of pixels per unit length (assuming a square pixel). It is not the number of total pixels but the number of pixels per unit space, the spatial frequency of pixels. For example, the eye on the bird in the mosaic in Figure 1.1 is only 1 pixel (one tile) big. There is no detail to it. Adding more tiles to give the eye some detail requires smaller tiles, that is, the number of tiles within that space of the eye increases – the spatial frequency of pixels has to increase. Just adding more tiles of the original size will do no good at all.

Common measures of spatial frequency and resolving power are pixels per inch (ppi) or lines per millimeter (lpm – used in printing). Another way to think about resolving power is to take its inverse, the inches or millimeters per pixel. Pixel size, the inverse of the resolving power, is the image resolution. One bright pixel between two dark pixels resolves the two dark pixels. Resolution is the minimum separation distance for distinguishing two objects, dmin. Resolving power is 1/dmin. Note: Usage of the terms resolving power and resolution is not universal. For example, Adobe Photoshop and Gimp use resolution to refer to the spatial frequency of the image. Using resolving power to describe spatial frequencies facilitates the discussion of spatial frequencies later.

As indicated by the example of the bird eye in the mosaic and as shown in Figure 1.5, the resolving power is as important in image display as it is in detecting the small features of the object. To eliminate pixelation detected by eye, the resolving power of the eye should be less than the pixel spatial frequency on the display medium when viewed from an appropriate viewing distance. The eye can resolve objects separated by about 1 minute (one 60th) of 1 degree of the almost 140-degree field of view for binocular vision. Because things appear smaller with distance, that is, occupy a smaller angle in the field of view, even things with large pixels look non-pixelated at large distances. Hence, the pixels on roadside signs and billboards can have very low spatial frequencies, and the signs will still look non-pixelated when viewed from the road.

Figure 1.5 Soybean protoplasts (cells with their cell walls digested away with enzymes) imaged with differential interference contrast microscopy and displayed at different resolving powers. The scale bar is 10 μm long. The mosaic pixelation filter in Photoshop generated these images. This filter divides the spatial frequency of pixels in the original by the "cell size" in the dialog box (filter > pixelate > mosaic). The original is 600 ppi. The 75-ppi images used a cell size of 8, the 32-ppi image used a cell size of 16, and the 16-ppi image used a cell size of 32. Photo by L. Griffing.

Appropriate viewing distances vary with the display device. Presumably, the floor mosaic (it was an interior shallow pool, so it would have been covered in water) has an ideal viewing distance, the distance to the eye, of about 6 feet. At this distance, the individual tiles would blur enough to be indistinguishable. For printed material, the closest point at which objects come into focus is the near point, or 25 cm (10 inches) from your eyes. Ideal viewing for typed text varies with the size of font but is between 25 and 50 cm (10 and 20 inches). The ideal viewing distance for a television display, with 1080 horizontal raster lines, is four times the height of the screen or two times the diagonal screen dimension. When describing a display or monitor, we use its diagonal dimension (Table 1.1). We also use numbers of pixels. A 14-inch monitor with the same number of pixels as a 13.3-inch monitor (2.07 × 10⁶ in Table 1.1) has larger pixels, requiring a slightly farther appropriate viewing distance. Likewise, viewing a 24-inch HD 1080 television from 4 feet is equivalent to viewing a 48-inch HD 1080 television from 8 feet.

Table 1.1 Laptop, Netbook, and Tablet Monitor Sizes, Resolving Power, and Resolution.

Size (Diagonal) | Horizontal × Vertical Pixel Number | Resolving Power: Dot Pitch (ppi) | Resolution or Pixel Size (mm) | Aspect Ratio (W:H) | Pixel Number (×10⁶)
6.8 inches (Kindle Paperwhite 5) | 1236 × 1648 | 300 | 0.0846 | 4:3 | 2.03
11 inches (iPad Pro) | 2388 × 1668 | 264 (retina display) | 0.1087 | 4:3 | 3.98
10.1 inches (Amazon Fire HD 10 e) | 1920 × 1200 | 224 | 0.1134 | 16:10 | 2.3
12.1 inches (netbook) | 1400 × 1050 | 144.6 | 0.1756 | 4:3 | 1.4
13.3 inches (laptop) | 1920 × 1080 | 165.6 | 0.153 | 16:9 | 2.07
14 inches (laptop) | 1920 × 1080 | 157 | 0.161 | 16:9 | 2.07
14 inches (laptop) | 2560 × 1440 | 209.8 | 0.121 | 16:9 | 3.6
15.2 inches (laptop) | 1152 × 768 | 91 | 0.278 | 3:2 | 0.8
15.6 inches (laptop) | 1920 × 1200 | 147 | 0.1728 | 8:5 | 2.2
15.6 inches (laptop) | 3840 × 2160 | 282.4 | 0.089 | 16:9 | 8.2
17 inches (laptop) | 1920 × 1080 | 129 | 0.196 | 16:9 | 2.07

There are different display standards, based on aspect ratio, the ratio of width to height of the displayed image (Table 1.2). For example, the 15.6-inch monitors in Table 1.1 have different aspect ratios (Apple has 8:5 or 16:10, while Windows has 16:9). They also use different standards: a 1920 × 1200 monitor uses the WUXGA standard (see Table 1.2), and the 3840 × 2160 monitor uses the UHD-1 standard (also called 4K, but true 4K is different; see Table 1.2). The UHD-1 monitor has half the pixel size of the WUXGA monitor. Even though these monitors have the same diagonal dimension, they have different appropriate viewing distances. The standards in Table 1.2 are important when generating video (see Sections 5.8 and 5.9) because different devices have different sizes of display (see Table 1.1). Furthermore, different video publication sites such as YouTube and Facebook and professional journals use standards that fit multiple devices, not just devices with high resolving power. We now turn to this general problem of different resolving powers for different media.
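The dot pitch and pixel size columns of Table 1.1 follow directly from a display's pixel dimensions and diagonal size. The sketch below is illustrative only (it is not from the book); it recomputes the 13.3-inch laptop row as a check.

```python
# A minimal sketch (not from the book) of the quantities tabulated in Table 1.1:
# given a display's pixel dimensions and diagonal size, compute its dot pitch
# (resolving power, ppi), pixel size (mm), and total pixel count.
import math

def display_metrics(h_pixels, v_pixels, diagonal_inches):
    diagonal_pixels = math.hypot(h_pixels, v_pixels)
    ppi = diagonal_pixels / diagonal_inches          # resolving power (dot pitch)
    pixel_size_mm = 25.4 / ppi                       # resolution = 1 / resolving power
    megapixels = h_pixels * v_pixels / 1e6
    return ppi, pixel_size_mm, megapixels

# Example: the 13.3-inch 1920 x 1080 laptop row of Table 1.1.
ppi, pixel_mm, mp = display_metrics(1920, 1080, 13.3)
print(f"{ppi:.1f} ppi, {pixel_mm:.3f} mm pixels, {mp:.2f} x 10^6 pixels")
# Roughly 165.6 ppi, 0.153 mm, and 2.07 x 10^6 pixels, matching the table.
```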
Table 1.2 Display Standards. Aspect Ratio (Width × Height in Pixels).

4:3 | 8:5 (16:10) | 16:9 | Various
QVGA 320 × 240 | CGA 320 × 200 | – | SIF/CIF 384 × 288, 352 × 288
VGA 640 × 480 | WVGA (5:3) 800 × 480 | WVGA 854 × 480 | –
PAL 768 × 576 | – | PAL 1024 × 576 | –
SVGA 800 × 600 | WSVGA 1024 × 600 | – | –
XGA 1024 × 768 | WXGA 1280 × 800 | HD 720 1280 × 720 | –
SXGA+ 1400 × 1050 | WXGA+ 1680 × 1050 | HD 1080 1920 × 1080 | SXGA (5:4) 1280 × 1024
UXGA 1600 × 1200 | WUXGA 1920 × 1200 | 2K (17:9) 2048 × 1080 | UWHD (21:9) 2560 × 1080
QXGA 2048 × 1536 | WQXGA 2560 × 1600 | WQHD 2560 × 1440 | QSXGA (5:4) 2560 × 2048
– | – | UHD-1 3840 × 2160 | UWQHD (21:9) 3440 × 1440
– | – | 4K (17:9) 4096 × 2160 | –
– | – | 8K 7680 × 4320 | –

1.3 Image Legibility Is the Ability to Recognize Text in an Image by Eye

Table 1.3 Image Legibility.

Resolving Power (ppi) | Resolving Power (lpm) | Legibility | Quality
200 | 8 | Excellent | High clarity
100 | 4 | Good | Clear enough for prolonged study
50 | 2 | Fair | Identity of letters questionable
25 | 1 | Poor | Writing illegible
ppi, pixels per inch; lpm, lines per millimeter.

Image legibility, or the ability to recognize text in an image, is another way to think about resolution (Table 1.3). This concept incorporates not only the resolution of the display medium but also the resolution of the recording medium, in this case, the eye. Image legibility depends on the eye's inability to detect pixels in an image. In a highly legible image, the eye does not see the individual pixels making up the text (i.e., the text "looks" smooth). In other words, for text to be highly legible, the pixels should have a spatial frequency near to or exceeding the resolving power of the eye. At near point (25 cm), it is difficult for the eye to resolve two points separated by 0.1 mm or less. An image that resolves 0.1 mm pixels has a resolving power of 10 pixels per mm (254 ppi). Consequently, a picture reproduced at 300 ppi would have excellent text legibility (see Table 1.3). However, there are degrees of legibility; some early computer displays had a resolving power, also called dot pitch, of only 72 ppi. As seen in Figure 1.5, some of the small particles in the cytoplasm of the cell vanish at that resolving power. Nevertheless, 72 ppi is the borderline between good and fair legibility (see Table 1.3) and provides enough legibility for people to read text on the early computers.

The average computer is now a platform for image display. Circulation of electronic images via the web presents something of a dilemma. What should the resolving power of web-published images be? To include computer users who use old displays, the solution is to make it equal to the lowest resolving power of any monitor (i.e., 72 ppi). Images at this resolving power also have a small file size, which is ideal for web communication. However, most modern portable computers have larger resolving powers (see Table 1.1) because as the numbers of horizontal and vertical pixels increase, the displays remain a physical size that is portable. A 72-ppi image displayed on a 144-ppi screen becomes half the size in each dimension. Likewise, high-ppi images become much bigger on low-ppi screens. This same problem necessitates reduction of the resolving power of a photograph taken with a digital camera when published on the web. A digital camera may have 600 ppi as its default output resolution. If a web browser displays images at 72 ppi, the 600-ppi image looks eight times its size in each dimension. This brings us to an important point. Different imaging media have different resolving powers. For each type of media, the final product must look non-pixelated when viewed by eye (Table 1.4). These values are representative of those required for publication in scientific journals. Journals generally require grayscale images to be 300 ppi, and color images should be 350–600 ppi. The resolving power of the final image is not the same as the resolving power of the newly acquired image (e.g., that on the camera chip). The display of images acquired on a small camera chip requires enlargement. How much is the topic of the next section.

Table 1.4 Resolving Power Required for Excellent Images from Different Media.

Imaging Media | Resolving Power (ppi)
Portable computer | 90–180
Standard print text | 200
Printed image | 300 (grayscale), 350–600 (color)
Film negative scan | 1500 (grayscale), 3000 (color)
Black and white line drawing | 1500 (best done with vector graphics)
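A practical use of Table 1.4 is to ask how large a captured image can be reproduced in each medium before its spatial frequency drops below the required value. The sketch below is a hedged illustration, not the book's code; the 3000-pixel image width and the dictionary of required values (taken from the figures quoted above) are example inputs.

```python
# Illustrative sketch (not from the book): how large can a captured image be
# reproduced in different media without dropping below the resolving powers
# of Table 1.4?
REQUIRED_PPI = {"web display (legacy)": 72,
                "printed grayscale figure": 300,
                "printed color figure": 350}

def max_print_width_inches(h_pixels, required_ppi):
    """Widest reproduction that still meets the required spatial frequency."""
    return h_pixels / required_ppi

image_width_px = 3000   # hypothetical camera image, 3000 pixels wide
for medium, ppi in REQUIRED_PPI.items():
    width = max_print_width_inches(image_width_px, ppi)
    print(f"{medium:>26s}: up to {width:.1f} inches wide")
```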
1.4 Magnification Reduces Spatial Frequencies While Making Bigger Images
As discussed earlier, images acquired at high resolving power are quite large on displays that have small resolving power, such as a 72-ppi web page. We have magnified the image! As long as decreasing the spatial frequency of the display does not result in pixelation, the process of magnification can reveal more detail to the eye. As soon as the image becomes pixelated, any further magnification is empty magnification. Instead of seeing more detail in the image, we just see bigger image pixels. In film photography, the enlargement latitude is a measure of the amount of negative enlargement before empty magnification occurs and the image pixel, in this case the photographic grain, becomes obvious. Likewise, for chip cameras, it is the amount of enlargement before pixelation occurs. Enlargement latitude is
E = R / L, (1.1)
in which E is enlargement magnification, R is the resolving power (spatial frequency of pixels) of the original, and L is the acceptable legibility. For digital cameras, it is how much digital zoom is acceptable (Figure 1.6). A sixfold magnification reducing the resolving power from 600 to 100 ppi produces interesting detail: the moose calves become visible, and markings on the female become clear. However, further magnification produces pixelation and empty magnification. Digital zoom magnification is common in cameras. It is very important to realize that digital zoom reduces the resolving power of the image. For scientific applications, it is best to use only optical zoom in the field and then perform digital zoom when analyzing or presenting the image. The amount of final magnification makes a large difference in the displayed image content. The image should be magnified to the extent that the subject or region of interest (ROI) fills the frame but without pixelation. The ROI is the image area of the most importance, whether for display, analysis, or processing. Sometimes showing the environmental context of a feature is important.
Figure 1.6 (A) A photograph of a moose at 600 ppi. (B) When A is enlarged sixfold by keeping the same information and dropping the resolving power to 100 ppi, two calves become clear (and a spotted rump on the female). (C) Further magnification of 1.6× produces pixelation and blur. (D) Even further magnification of 2× produces empty magnification. Photo by L. Griffing.
Figure 1.7 (A) A 600-ppi view of two grizzlies in Alaska shows the terrain and the distance between the two grizzlies. Hence, even though the grizzlies themselves are not clear, the information about the distance between them is clear. (B) A cropped 100-ppi enlargement of A that shows a clearly identifiable grizzly, which fills the frame. Although the enlargement latitude is acceptable, resizing for journal publication to 600 ppi would use pixel interpolation. Photo by L. Griffing.
Figure 1.7 is a picture of a female brown bear being "herded" by or followed by a male in the spring (depending on who is being selective for their mate, the male or the female). The foliage in the alders on the hillside shows that it is spring. Therefore, showing both the bears and the time of year requires most of the field of view in Figure 1.7A as the ROI. On the other hand, getting a more detailed view of the behavior of the female responding to the presence of the male requires the magnified image in Figure 1.7B. Here, the position of the jaw (closed) and ears (back) are clear, but they were not in the original image. This digital zoom is at the limit of pixelation. If a journal were to want a 600-ppi image of the female, it would be necessary to resize the 100-ppi image by increasing the spatial frequency to 600 ppi using interpolation (see Section 1.7).
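Equation 1.1 can be checked against the enlargements in Figures 1.6 and 1.7. A short sketch, assuming both R and the acceptable legibility L are expressed in ppi:

def enlargement_latitude(original_ppi, acceptable_ppi):
    """Equation 1.1: E = R / L, the enlargement possible before pixelation becomes objectionable."""
    return original_ppi / acceptable_ppi

# The moose photograph of Figure 1.6: a 600-ppi capture viewed at an acceptable
# legibility of 100 ppi tolerates roughly a sixfold enlargement.
print(enlargement_latitude(600, 100))   # 6.0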
1.5 Technology Determines Scale and Resolution
Recording objects within vast or small spaces, changing over long or very short times, requires technology that aids the eye (Figure 1.8). Limits of resolution set the boundaries of scale intrinsic to the eye (see Section 4.1) or any sensing device. The spatial resolution limit is the shortest distance between two discrete points or lines. To extend the spatial resolution of the eye, these devices provide images that resolve distances less than 0.1 mm apart at near point (25 cm) or angles of separation less than 1 arc-minute (objects farther away have smaller angles of separation). The temporal resolution limit is the shortest time between two separate events. To extend the temporal resolution of the eye, devices detect changes that are faster than about one-twentieth of a second.
Figure 1.8 Useful range for imaging technologies. 3D, three dimensional; CT, computed tomography. Diagram by L. Griffing.
The devices that extend our spatial resolution limit include a variety of lens and scanning systems based on light, electrons, or sound and magnetic pulses (see Figure 1.8), described elsewhere in this book. In all of these technologies, to resolve an object, the acquisition system must have a resolving power that is double the spatial frequency of the smallest objects to be resolved. The technologies provide magnification that lowers the spatial frequency of these objects to half (or less) that of the spatial frequency of the recording medium. Likewise, to record temporally resolved signals, the recording medium has to sample at a rate that is twice (or more) the frequency of the fastest recordable event. Both of these rules are a consequence of the Nyquist criterion.
1.6 The Nyquist Criterion: Capture at Twice the Spatial Frequency of the Smallest Object Imaged
In taking an image of living cells (see Figure 1.5), there are several components of the imaging chain: the microscope lenses and image modifiers (the polarizers, analyzers, and prisms for differential interference contrast), the lens that projects the image onto the camera (the projection lens), the camera chip, and the print from the camera. Each one of these links in the image chain has a certain resolving power. The lenses are particularly interesting because they magnify (i.e., reduce the spatial frequency). They detect a high spatial frequency and produce a lower one over a larger area. Our eyes can then see these small features. We use still more powerful cameras to detect these lowered spatial frequencies. The diameter of small organelles, such as mitochondria, is about half of a micrometer, not far from the diffraction limit of resolution with light microscopy (see Sections 5.14, 8.4, and 18.3), about a fifth of a micrometer.
Figure 1.9 (A) and (B) Capture when the resolving power of the capture device is equal to the spatial frequency of the object pixels. (A) When pixels of the camera and the object align, the object is resolved. (B) When the pixels are offset, the object “disappears.” (C) and (D) Doubling the resolving power of the capture device resolves the stripe pattern of the object even when the pixels are offset. (C) Aligned pixels completely reproduce the object. (D) Offset pixels still reproduce the alternating pattern, with peaks (white) at the same spatial frequency as the object. Diagram by L. Griffing.
To resolve mitochondria with a camera that has a resolving power of 4618 ppi (5.5-μm pixels, Orca Lightning; see Section 5.3, Table 5.1), the spatial frequency of the finest diffraction-limited detail (about 127,000 ppi for 0.2-μm detail; the 0.5-μm mitochondrion itself corresponds to about 50,800 ppi) has to be reduced by at least a factor of 40 (to about 3175 ppi) to capture that detail pixel by pixel. Ah ha! To do this, we use a 40× magnification objective. However, we have to go up even further in magnification because the resolving power of the capture device needs to be double the new magnified image spatial frequency. To see why the resolving power of the capture device has to be double the spatial frequency of the object or image it captures, examine Figure 1.9. In panel A, the object is a series of alternating dark and bright pixels. If the capture device has the same resolving power as the spatial frequency of the object and its pixels align with the object pixels, the alternating pixels are visible in the captured image. However, if the pixels of the camera are not aligned with the pixels of the object (panel B), then the white and black pixels combine to make gray in each capture device pixel (bright + dark = gray), and the object pattern disappears! If we double the resolving power of the capture device, as in panels C and D, the alternating pixel pattern is very sharp when the pixels align (panel C). However, even if the capture device pixels do not align with the object pixels (panel D), an alternating pattern of light and dark pixels is still visible; it is still not perfect, with gray between the dark and bright pixels. To resolve the alternating pattern of the object pixels, the camera has to sample the object at twice its spatial frequency. This level of sampling meets the Nyquist criterion and comes from statistical sampling theory. If the camera has double the spatial frequency of the smallest object in the field, the camera faithfully captures the image details. In terms of resolution, the inverse of resolving power, the pixel size of the capturing device should be half the size of the smallest resolvable feature of the object. The camera needs finer resolution than the projected image of the object. Getting back to the magnification needed to resolve this detail with a 4618-ppi camera, a 40× lens produces an image in which the finest detail has a spatial frequency of about 3175 ppi. To reduce the spatial frequency in the image even more, a projection lens magnifies the image 2.5 times, projecting an enlarged image onto the camera chip. This is the standard magnification of projection lenses, which are a basic part of compound photomicroscopes. The projection lens produces an image in which this detail has a spatial frequency of 1270 ppi, well below the 2309-ppi limit that the 4618-ppi camera can sample at the Nyquist criterion. Sampling at the Nyquist criterion reduces aliasing, in which the image pixel value changes depending on the alignment of the camera with the object. Aliasing produces moiré patterns. In Figure 1.10, the woven pattern in the sports coat generates wavy moiré patterns when the spatial frequency of the woven pattern matches or exceeds the spatial frequency of the camera and its lens. A higher magnification lens can eliminate moiré patterns by reducing the spatial frequency of the weave. In addition, reducing the aperture of the lens will eliminate the pattern by only collecting lower spatial frequencies (see Sections 5.14 and 8.3). The presence of moiré patterns in microscopes reveals that there are higher spatial frequencies to capture.
Figure 1.10 An image of a coat captured with a color camera showing regions of moiré patterns. Image from Paul Roth. Used with permission.
Capturing frequencies higher than the diffraction limit, which appear as moiré patterns, by illuminating the sample with structured light (see Section 18.7) produces a form of superresolution microscopy, structured illumination microscopy.
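The sampling chain worked through in this section can be recomputed directly. The sketch below assumes the conversion 1 inch = 25,400 µm and the values used above (0.2-µm diffraction-limited detail, a 5.5-µm-pixel camera, a 40× objective, and a 2.5× projection lens):

MICROMETERS_PER_INCH = 25_400

def ppi_of_feature(size_um):
    """Spatial frequency (ppi) of a feature of the given size."""
    return MICROMETERS_PER_INCH / size_um

def camera_ppi(pixel_um):
    """Resolving power (ppi) of a camera with square pixels of the given pitch."""
    return MICROMETERS_PER_INCH / pixel_um

detail = ppi_of_feature(0.2)               # ~127,000 ppi for diffraction-limited detail
camera = camera_ppi(5.5)                   # ~4,618 ppi for 5.5-µm pixels
nyquist_limit = camera / 2                 # ~2,309 ppi: highest image frequency the camera samples faithfully

after_objective = detail / 40              # ~3,175 ppi: still too high for the camera
after_projection = after_objective / 2.5   # ~1,270 ppi: now below the Nyquist limit
print(after_objective > nyquist_limit, after_projection < nyquist_limit)   # True True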
1.7 Archival Time, Storage Limits, and the Resolution of the Display Medium Influence Capture and Scan Resolving Power
Flatbed scanners archive photographs, slides, gels, and radiograms (see Sections 6.1 and 6.2). Copying with scanners should use the Nyquist criterion. For example, most consumer-grade electronic scanners for printed material now come with a 1200 × 1200 dpi resolving power because half this spatial frequency, 600 ppi, is optimal for printed color photographs (see Table 1.4). For slide scanners, the highest resolving power should be 1500 to 3000 dpi, 1500 dpi for black and white and 3000 dpi for color slides (see Table 1.4).
Figure 1.11 (A) Image of the central region of a diatom scanned at 72 ppi. The vertical stripes, the striae, on the shell of the diatom are prominent, but the bumps, or spicules, within the striae are not. (B) Scanning the image at 155 ppi reveals the spicules. However, this may be too large for web presentation. (C) Resizing the image using interpolation (bicubic) to 72 ppi maintains the view of the spicules and is therefore better than the original scan at 72 ppi. This is a scan of an image in Inoue, S. and Spring, K. 1997. Video Microscopy. Second Edition. Plenum Press, New York, NY. p. 528.
When setting scan resolving power in dots per inch, consider the final display medium of the copy. For web display, the final ppi of the image is 72. However, the scan should meet or exceed the Nyquist criterion of 144 ppi. In the example shown in Figure 1.11, there is a clear advantage to using a higher resolving power, 155 ppi, in the original scan even when software rescales the image to 72 ppi. If the output resolution can only accommodate a digital image of low resolving power, then saving the image as a low-resolving-power image will conserve computer disk space. However, if scanning time and storage limits allow, it is always best to save the original scan that used the Nyquist criterion. This fine-resolution image is then available for analysis and display on devices with higher resolving powers.
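A minimal sketch of this archive-then-derive workflow with the Pillow library (Pillow 9.1 or later for the Resampling constants; the file names are hypothetical): keep the original Nyquist-level scan and compute the 72-ppi web copy by bicubic resampling.

from PIL import Image

scan = Image.open("diatom_scan.tif")             # hypothetical 155-ppi archival scan
scale = 72 / 155                                  # factor from the scan ppi down to the web ppi
web_size = (round(scan.width * scale), round(scan.height * scale))

# Bicubic resampling groups neighboring pixels rather than simply discarding them.
web_copy = scan.resize(web_size, resample=Image.Resampling.BICUBIC)
web_copy.save("diatom_web.png")
# The 155-ppi original stays on disk for later analysis or display at higher resolving power.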
1.8 Digital Image Resizing or Scaling Match the Captured Image Resolution to the Output Resolution
If the final output resolution is a print, there are varieties of printing methods, each with its own resolving power. Laser prints with a resolving power of 300 dpi produce high-quality images of black and white text with excellent legibility, as would be expected from Table 1.1. However, in printers that report their dpi to include the dots inside half-tone cells (Figure 1.12), which are the pixels of the image, the dpi set for the scan needs to be much higher than the value listed in Table 1.4. Printers used by printing presses have the size of their half-tone screens pre-set. The resolution of these printers is in lines per inch or lines per millimeter, each line being a row of half-tone cells. For these printers, half-tone images of the highest quality come from a captured image resolving power (ppi) that is two times (i.e., the Nyquist criterion) the printer half-tone screen frequency. Typical screen frequencies are 65 lpi (grocery coupons), 85 lpi (newsprint), 133 lpi (magazines), and 177 lpi (art books).
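A small sketch of the arithmetic in this paragraph and in Figure 1.12: capture at twice the half-tone screen frequency, and note that a printer's effective cells per inch is its dpi divided by the number of dots along one side of a half-tone cell.

def capture_ppi_for_screen(lpi):
    """Nyquist: capture at twice the half-tone screen frequency."""
    return 2 * lpi

def halftone_cells_per_inch(printer_dpi, dots_per_cell_side):
    """Effective image pixels per inch for a half-tone printer."""
    return printer_dpi / dots_per_cell_side

def gray_levels(dots_per_cell_side):
    """Shades a half-tone cell can represent, from all dots off through all dots on."""
    return dots_per_cell_side ** 2 + 1

for lpi in (65, 85, 133, 177):                                # coupons, newsprint, magazines, art books
    print(lpi, "lpi screen needs a", capture_ppi_for_screen(lpi), "ppi capture")

print(halftone_cells_per_inch(300, 5), gray_levels(5))        # 60.0 cells/inch, 26 shades
print(halftone_cells_per_inch(1440, 16), gray_levels(16))     # 90.0 cells/inch, 257 shades (the text rounds to 256)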
Figure 1.12 Half-tone cells for inkjet and laser printers. (A) Two half-tone cells composed of a 5 × 5 grid of dots. A 300-dpi printer with 5 × 5 half-tone cells would print at 300/5 or 60 cells per inch (60 ppi). This is lower resolution than all computer screens. These cells could represent 26 shades of gray. (B) Two half-tone cells composed of a 16 × 16 grid of dots. A 1440-dpi printer with 16 × 16 half-tone cells would print at 90 cells per inch. This is good legibility but not excellent. These cells (90 ppi) could represent 256 shades of gray. Diagram by L. Griffing.
Figure 1.13 Information loss during resizing. (A) The original image (2.3 inches × 1.6 inches). (B) The result achieved after reducing A about fourfold (0.5 inches in width) and re-enlarging using interpolation during both shrinking and enlarging. Note the complete blurring of fine details and the text in the header. Photo by L. Griffing.
For image display at the same size in both a web browser and a printed presentation, scan the image at the resolution needed for printing and then rescale it for display on the web. In other words, always acquire images at the resolving power required for the display with the higher resolving power and rescale them for the lower resolving power display (see Figure 1.11). Digital image resolving power diminishes when resizing or scaling produces fewer pixels in the image. Reducing the image to half size could just remove every other pixel. However, this does not result in a satisfactory image because it discards much of the information in the scene that the image could otherwise incorporate. A more satisfactory way is to group several pixels together and make a single new pixel from them. The value assigned to the new pixel comes from the values of the grouped pixels. However, even with this form of reduction, there is, of course, lost resolving power (compare Figure 1.11C with 1.11B and Figure 1.13B with 1.13A). Computationally resizing and rescaling a fine-resolution image (Figure 1.11C) is better than capturing the image at lower resolving power (Figure 1.11A). Enlarging an image can either make the pixels bigger or interpolate new pixels between the old pixels. The accuracy of interpolation depends on the sample and the process used. Three approaches for interpolating new pixel values, in order of increasing accuracy and processing time, are the nearest-neighbor process, the bilinear process, and the bicubic process (see also Section 11.3 for 3D objects). Generating new pixels might result in a higher pixels-per-inch value, but all of the information necessary to generate the scene resides in the original smaller image. True resolving power is not improved; in fact, some information might be lost. Even simply reducing the image is problematic because shrinking the image by the grouped-pixel process described earlier changes the information content of the image.
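A sketch of the "group several pixels into one" reduction described above, using NumPy block averaging; enlargement by nearest-neighbor, bilinear, or bicubic interpolation is available in most imaging libraries, so only the reduction step is spelled out here.

import numpy as np

def downsample_by_block_mean(image, factor):
    """Shrink a 2-D grayscale array by averaging non-overlapping factor x factor blocks."""
    h, w = image.shape
    h, w = h - h % factor, w - w % factor                     # trim so the blocks tile evenly
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(480, 640)).astype(float)
reduced = downsample_by_block_mean(original, 2)               # each new pixel summarizes four old ones
print(original.shape, reduced.shape)                          # (480, 640) (240, 320)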
1.9 Metadata Describes Image Content, Structure, and Conditions of Acquisition
Recording the settings for acquiring an image in scientific work (pixels per inch of acquisition device, lenses, exposure, date and time of acquisition, and so on) is very important. Sometimes this metadata is part of the image itself (Figures 1.13 and 1.14). In the picture of the bear (Figure 1.13), the metadata is a header stating the time and date of image acquisition. In the picture of the plant meristem (Figure 1.14), the metadata is a footer stating the voltage of the scanning electron microscope, the magnification, a scale bar, and a unique numbered identifier. Including the metadata as part of the image has advantages. A major advantage is that an internal scale bar provides accurate calibration of the image upon reduction or rescaling. A major disadvantage is that resizing the image can make the metadata unreadable as the resolving power of the image decreases (Figure 1.13B, header). Because digital imaging can rescale the x and y dimensions differently (without a specific command such as holding down the shift key), a 2D internal scale bar would be best, but this is rare. For digital camera and recording systems, the image file stores the metadata separately from the image pixel information. The standard metadata format is the EXIF (Exchangeable Image File) format. Table 1.5 provides an example of some of the recorded metadata from a consumer-grade digital camera. However, not all imaging software recognizes and uses the same codes for metadata. Therefore, the software that comes with the camera can read all of the metadata codes from that camera, but other more general image processing software may not. This makes metadata somewhat volatile because just opening and saving images in a new software package can remove it. Several images may share metadata. Image scaling (changing the pixels per inch) is a common operation in image processing, making it very important that there be internal measurement calibration on digital scientific images. Fiducial markers are calibration standards of known size contained within the image, such as a ruler or coin (for macro work), a stage micrometer (for microscopy), or gold beads (fine-resolution electron microscopy). However, their inclusion as an internal standard is not always possible. A separate picture of such calibration standards taken under identical conditions as the picture of the object produces a fiducial image, and metadata can refer to the fiducial image for scaling information of the object of interest. Image databases use metadata. A uniform EXIF format facilitates integration of this information into databases. There are emerging standards for the integration of metadata into databases, but for now, many different standards exist. For example, medical imaging metadata standards are different from the standards used for basic cell biology research. Hence, the databases for these professions are different. However, in both these professions, it is important to record the conditions of image acquisition in automatically generated EXIF files or in lab, field, and clinical notes.
Figure 1.14 Scanning electron micrograph with an internal scale bar and other metadata. This is an image of a telomerase-minus mutant of Arabidopsis thaliana. The accelerating voltage (15 kV), the magnification (×150), a scale bar (100 µm), and a negative number are included as an information strip below the captured image. Photo by L. Griffing.
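EXIF fields like those in Table 1.5 can be read programmatically. A minimal sketch with the Pillow library (the file name is a placeholder; tag coverage varies with the camera and with any software that has resaved the file, which is the volatility noted above):

from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open("IMG_6505.JPG")        # placeholder file name
exif = img.getexif()                    # returns an empty mapping if the metadata was stripped

for tag_id, value in exif.items():
    name = TAGS.get(tag_id, f"Unknown tag ({tag_id:#06x})")
    print(f"{name}: {value}")

# Typical output lines resemble Table 1.5: Make: Canon, Model: Canon EOS DIGITAL REBEL XT, ...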
Table 1.5 Partial Exchangeable Image File Information for an Image from a Canon Rebel.
Title: IMG_6505
Image description: Icelandic buttercups
Make: Canon
Model: Canon EOS DIGITAL REBEL XT
Orientation: Left side, bottom
X resolution: 72 dpi
Y resolution: 72 dpi
Resolution unit: Inch
Date/time: 2008:06:15 01:42:00
YCbCr positioning: Datum point
Exposure time: 1/500 sec
F-number: F5.6
Exposure program: Program action (high-speed program)
ISO speed ratings: 400
Exif version: 2.21
Date/time original: 2008:06:15 01:42:00
Date/time digitized: 2008:06:15 01:42:00
Components configuration: YCbCr
Shutter speed value: 1/256 sec
Aperture value: F5.6
Exposure Bias value: 0
Metering mode: Multi-segment
Flash: Unknown (16)
Focal length: 44.0 mm
User comment:
FlashPix version: 1
Color space: sRGB
EXIF image width: 3456 pixels
EXIF image height: 2304 pixels
Focal plane X resolution: 437/1728000 inches
Focal plane Y resolution: 291/1152000 inches
Focal plane resolution unit: Inches
Compression: JPEG compression
Thumbnail offset: 9716 bytes
Thumbnail length: 12,493 bytes
Thumbnail data: 12,493 bytes of thumbnail data
Macro mode: Normal
Self-timer delay: Self-timer not used
Unknown tag (0xc103): 3
Flash mode: Auto and red-eye reduction
Continuous drive mode: Continuous
Focus mode: AI Servo
Image size: Large
Easy shooting mode: Sports
Contrast: High
Saturation: High
Sharpness: High
Annotated Images, Video, Web Sites, and References
1.1 The Pixel Is the Smallest Discrete Unit of a Picture
The mosaic in Figures 1.1 and 1.2 resides in the Museo Archeologico Nazionale (Naples). For image-order and object-order rendering, see Schroder, W., Martin, K., and Lorensen, B. 2002. The Visualization Toolkit. Third Edition. Kitware Inc. p. 35–36. For a complete list of the different cell types, see Schroder, W., Martin, K., and Lorensen, B. 2002. The Visualization Toolkit. Third Edition. Kitware Inc. p. 115. The original painting in Figure 1.4 resides at the Art Institute of Chicago. More discussion of subpixel color is in Russ, J. 2007. The Image Processing Handbook. CRC Taylor and Francis, Boca Raton, FL. p. 136.
1.2 The Resolving Power of a Camera or Display Is the Spatial Frequency of Its Pixels
The reciprocal relationship between resolving power and resolution is key to understanding the measurement of the fidelity of optical systems. The concept of spatial frequency, also called reciprocal space or k space, is necessary for the future treatments in this book of Fourier optics, found in Chapters 8 and 14–19. For more on video display standards, see https://en.wikipedia.org/wiki/List_of_common_resolutions. Appropriate viewing distance is in Anshel, J. 2005. Visual Ergonomics Handbook. CRC Press, Taylor and Francis Group, Boca Raton, FL.
1.3 Image Legibility Is the Ability to Recognize Text in an Image by Eye
Williams, J.B. 1990. Image Clarity: High Resolution Photography. Focal Press, Boston, MA. p. 56, further develops the information in Table 1.3. Publication guidelines in journals are the basis for the stated resolving power for different media. For camera resolving powers, see Section 5.3 and Table 5.1.
1.4 Magnification Reduces Spatial Frequencies While Making Bigger Images
More discussion of the concept of enlargement latitude is in Williams, J.B. 1990. Image Clarity: High Resolution Photography. Focal Press, Boston, MA. p. 57.
1.5 Technology Determines Scale and Resolution
Chapters 8 and 14–19 discuss the resolution criteria for each imaging modality.
1.6 The Nyquist Criterion: Capture at Twice the Spatial Frequency of the Smallest Object Imaged
The Nyquist criterion is from Shannon, C. 1949. Communication in the presence of noise. Proceedings of the Institute of Radio Engineers 37:10–21, and Nyquist, H. 1928. Certain topics in telegraph transmission theory. Transactions of the American Institute of Electrical Engineers 47:617–644.
1.7 Archival Time, Storage Limits, and the Resolution of the Display Medium Influence Capture and Scan Resolving Power
Figure 1.11 is a scan of diatom images in Inoue, S. and Spring, K. 1997. Video Microscopy. Second Edition. Plenum Press, New York, NY. p. 528.
1.8 Digital Image Resizing or Scaling Match the Captured Image Resolution to the Output Resolution
See the half-tone cell discussion in Russ, J. 2007. The Image Processing Handbook. CRC Taylor and Francis, Boca Raton, FL. p. 137. Printer technology is now at the level where standard desktop inkjet printers are satisfactory for most printing needs.
1.9 Metadata Describes Image Content, Structure, and Conditions of Acquisition
Figure 1.14 is from the study reported in Riha, K., McKnight, T., Griffing, L., and Shippen, D. 2001. Living with genome instability: Plant responses to telomere dysfunction. Science 291: 1797–1800. For a discussion of metadata and databases, see Chapter 7 on measurement.
2 Pixel Values and Image Contrast
2.1 Contrast Compares the Intensity of a Pixel with That of Its Surround
How well we see a pixel depends not only on its size, as described in Chapter 1, but also on its contrast. If a ladybug's spots are black, then they stand out best on the part of the animal that is white, its thorax (Figure 2.1A and C). Black pixels have the lowest pixel value, and white pixels have the highest (by most conventions); the difference between them is the contrast. In this case, the spots have positive contrast; subtracting the black spot value from the white background value gives a positive number. Negative contrast occurs when white spots occur against a dark background. Figure 2.1B, the "negative" of Figure 2.1A, shows the ladybug's spots as white. They have high negative contrast against the darker wings; subtracting the white spot value from the dark background value gives a negative number.
Figure 2.1 Grayscale and color contrast of ladybugs on a leaf. Positive-contrast images (A and C) compared with negative-contrast images (B and D). In the positive-contrast images, the ladybugs' spots appear dark against a lighter background. In the negative-contrast images, the spots appear light against a darker background. The contrast between the ladybugs and the leaf in C is good because the colors red and green are nearly complementary. A negative image (D) produces complementary colors, and the negative or complementary color to leaf green is magenta. (E) Histograms display the number of pixels at each intensity. Grayscale positive and negative images have mirror-image histograms. (F) The histograms of color images show the number of pixels at each intensity of the primary colors, red, green, and blue. A color negative shows the mirror image of the histogram of the color positive: making a negative "flips" the histogram. Photo by L. Griffing.
The terms positive contrast and negative contrast come directly from the algebraic definition of percent contrast in Figure 2.2. If pixels in the background have higher intensity than the pixels of the object, then the value of the numerator is positive, and the object has positive contrast. If the object pixels have a higher intensity than the background pixels, then the value in the numerator is negative, and the object has negative contrast. The negatives of black-and-white photographs, or grayscale photographs, have negative contrast. Although the information content in the positive and negative images in Figure 2.1 is identical, our ability to distinguish features in the two images depends on the perception of shades of gray by eye and on psychological factors that may influence that perception. In a color image, intensity values are combinations of the intensities of the primary colors, red, green, and blue. While the human eye (see Section 4.2) can only distinguish 50–60 levels or tones of gray on a printed page (Figure 2.3), it can distinguish millions of colors (Figure 2.4). Consequently, color images can have much more contrast and more information than grayscale images. In Figure 2.1C, the distinction between the orange and red ladybugs is more apparent than in Figure 2.1A. Figure 2.1D shows the negative color contrast image of Figure 2.1C. The negative of a color is its complementary color (see Figure 2.4). The highest contrast between colors occurs when the colors are complementary.
Figure 2.2 Algebraic definition of percent contrast. If I_bkg > I_obj, there is positive contrast. If I_obj > I_bkg, there is negative contrast. Diagram by L. Griffing.
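The definition diagrammed in Figure 2.2 translates directly into code. A sketch, assuming the usual convention that the background intensity appears in the denominator:

def percent_contrast(i_obj, i_bkg):
    """Percent contrast of an object against its background (Figure 2.2).
    Positive when the background is brighter than the object, negative otherwise."""
    return 100.0 * (i_bkg - i_obj) / i_bkg

print(percent_contrast(i_obj=30, i_bkg=220))    # dark spot on a light thorax: positive contrast
print(percent_contrast(i_obj=220, i_bkg=30))    # light spot on dark wings: negative contrast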
Figure 2.3 Grayscale spectra. (A) Spectrum with 256 shades of gray. Each shade is a gray level or tone. The tones blend so that each individual tone is indistinguishable by eye. (B) Spectrum with 64 shades of gray. The tones at this point cease to blend, revealing some of the individual “slices” of gray. (C) Spectrum with 16 shades of gray. The individual bands or regions of one gray level are apparent. Such color or gray-level banding is “posterized.” Diagram by L. Griffing.
Figure 2.4 (A) Red, green, blue (RGB) color spectrum. (B) Color negative of (A), with complementary colors of the primary RGB shown with arrows. Diagram by L. Griffing.
2.2 Pixel Values Determine Brightness and Color
That pixels have intensity values is implicit in the definition of contrast (see Figure 2.2). In a black-and-white, or grayscale, image, intensity values are shades of gray. If the image has fewer than 60 shades of gray, adjacent regions (where the gray values should blend) become discrete, producing a banded, or posterized, appearance to the image. Dropping the number of gray values from 64 to 16 produces a posterized image, as shown in Figure 2.3B and C. Likewise, as the number of gray values diminishes below 64, more posterization becomes evident (Figure 2.5).
Figure 2.5 Grayscale images at various pixel depths of a plant protoplast (a plant cell with the outer wall removed) accompanied by their histograms. The differential interference contrast light micrographs show the object using a gradient of gray levels across it (see Section 16.10). Histograms can be evaluated in Photoshop by selecting Window > Histogram; for ImageJ, select Analyze > Histogram. (A) An 8-bit image (8 bits of information/pixel) has 2⁸ or 256 gray levels. Scale bar = 12 μm. (B) A 6-bit image has 2⁶ or 64 gray levels; discernible gray-level "bands," or posterization, begin to appear in the regions of shallow gray-level gradients. The features of the cell, like the cell cortical cytoplasm, are still continuous. (C) A 4-bit image has 2⁴ or 16 gray levels. Posterization is severe, but most of the cell, such as the cortical cytoplasm and cytoplasmic strands, is recognizable. (D) A 2-bit image has 2² or 4 gray levels. Much of the detail is lost; the cell is becoming unrecognizable. (E) A 1-bit image has 2¹ or 2 gray levels. The cell is unrecognizable. (a–e) Histograms of images A–E. These plot the number of pixels occurring in the image at each gray level against the possible number of gray levels in the image, which determines the pixel depth. This is just a plot of the number of gray levels, removing all of the spatial information. Note that this image has no pixels in the lowest and highest intensity ranges at pixel depths between 8 bit and 4 bit. In these images, the tonal range of the image is lower than the pixel depth of the image. Photo by L. Griffing.
In digital imaging, the image information comes in discrete information bits, the bit being a simple "on/off" switch having a value of 0 or 1. The more bits in a pixel, the larger the amount of information and the greater the pixel depth. Increasing the number of bits increases the information by powers of 2 for the two states of each bit. Consequently, an image with 8 bits, or one byte, per pixel has 2⁸, or 256, combinations of the "on/off" switches. Because 0 is a value, the grayscale values range from 0 to 255 in a 1-byte image. Computers that process and display images with large pixel depth have a lot of information to handle. To calculate the amount of information in a digital image, multiply the pixel dimensions in height, width, and pixel depth. A digitized frame 640 pixels in height × 480 pixels in width × 1 byte deep requires 307.2 kilobytes (kB) for storage and display. A color image with three 1-byte channels and the same pixel height and width will be 921.6 kB. The pixel depth of the image limits the choice of software for processing the image (Table 2.1). A major distinguishing feature of different general-purpose image processing programs is their ability to handle the high pixel depth commonly found in scientific cameras and applications (see Table 2.1). Regardless of the pixel depth, a graph of how many pixels in the image have a certain pixel value, or the image histogram, is available in most software.
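A sketch of that storage arithmetic (1 kB taken as 1000 bytes, as the 307.2-kB figure implies):

def image_size_kb(width_px, height_px, bytes_per_pixel):
    """Uncompressed storage for a raster image, in kilobytes (1 kB = 1000 bytes)."""
    return width_px * height_px * bytes_per_pixel / 1000

print(image_size_kb(640, 480, 1))   # 307.2 kB for an 8-bit grayscale frame
print(image_size_kb(640, 480, 3))   # 921.6 kB for three 8-bit color channels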
Table 2.1 Raster Graphics Image Processing Programs Commonly Used for Contrast Analysis and Adjustment.ᵃ
The programs compared are Adobe Photoshop, Corel Paint Shop Pro, IrfanView, Paint.Net, Google Photos, GIMP2 (or GIMPShop), and ImageJ. For each program, the table notes whether it is proprietary (purchase or freeware) or open source; the operating systems supportedᵇ (Win, OSX, Lin); the supported features (histogram, editable selectionᶜ, layersᵈ, and large pixel depthᵉ); the color spaces and image modes supported (sRGB, aRGBᶠ, CMYKᵍ, indexed, and grayscale); and the file formats supported (TIFF, SVG, and XCF, in addition to PNG, JPG, and RAWʰ). Entries are graded Yes, No, Some, Imprt (opens as an imported file), or Plgin (opens with a plugin).
ᵃ There is no standard image processing software for the "simple" tasks of contrast enhancement in science. (The image analysis software for image measurement is described in Chapter 8.) The demands of scientific imaging include being able to recognize images of many different formats, including some RAW formats proprietary to certain manufacturers, and large pixel depths of as much as 32 bits per channel. XCF is the native format for GIMP2. PNG, JPG, and RAW are all supported.
ᵇ Win = Windows (Microsoft), OSX = OSX (Apple), and Lin = Unix (The Open Group) or Linux (multiple distributors).
ᶜ Editable selections can be either raster or vector based. Vector based is preferred.
ᵈ Layers can include contrast adjustment layers, whereby the layer modifies the contrast in the underlying image.
ᵉ Usually includes 12-, 16-, and 32-bit grayscale and higher bit depth multi-channel images.
ᶠ aRGB = Adobe (1998) RGB colorspace, sRGB = standard RGB.
ᵍ CMYK = cyan, magenta, yellow, and black colorspace.
ʰ PNG = portable network graphic, JPG = joint photographic experts group, RAW = digital negative, TIFF = tagged image file format, SVG = scalable vector graphic, XCF = experimental computing facility format.
2.3 The Histogram Is a Plot of the Number of Pixels in an Image at Each Level of Intensity
The image histogram is a plot of the number of pixels (y-axis) at each intensity level (x-axis) (see Figure 2.5a–e). As the bit depth and intensity levels increase, the number on the x-axis of the histogram increases (see Figure 2.5a–e). For 8-bit images, the brightest pixel is 255 (see Figure 2.5a). The blackest pixel is zero. Histograms of color images show the number of pixels at each intensity value of each primary color (see Figure 2.1F). To produce a negative image, "flip" or invert the intensity values of the histogram of grayscale images (see Figure 2.1E) and of each color channel in a color image (see Figure 2.1F). As the pixel depth decreases, the number of histogram values along the x-axis (see Figure 2.5a–e) of the histograms decreases. When there are only two values in the histogram, the image is binary (Figure 2.5E). Gray-level detail in the image diminishes as pixel depth decreases, as can be seen in the posterization of the protoplast in Figure 2.5B–E. The histogram has no spatial information in it other than the total number of pixels in the region represented by the histogram, which, for purposes of discussion in this chapter, is the whole image. To get spatial representations of pixel intensity, intensities can be plotted along a selected line, the intensity plot (Figure 2.6A and B), or the intensities can be mapped across a two-dimensional region of interest (ROI) with a three-dimensional surface plot (Figure 2.6A and C).
Figure 2.6 Intensity plots of a grayscale light micrograph of a plant protoplast taken with differential interference contrast optics. (A) Plant protoplast with a line selection (yellow) made across it in ImageJ. (B) Intensity plot of line selection (A) using the Analyze > Plot Profile command in ImageJ. (C) Surface plot of the grayscale intensity of the protoplast in (A) using the Analyze > Surface Plot command in ImageJ. Photo by L. Griffing.
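A minimal sketch of computing the histogram and an intensity plot with NumPy and Matplotlib; a synthetic 8-bit gradient stands in for the micrograph:

import numpy as np
import matplotlib.pyplot as plt

# Synthetic 8-bit grayscale image: a horizontal gradient with a little noise.
rng = np.random.default_rng(1)
gradient = np.tile(np.linspace(40, 220, 256), (256, 1))
image = np.clip(gradient + rng.normal(0, 10, gradient.shape), 0, 255).astype(np.uint8)

counts, _ = np.histogram(image, bins=256, range=(0, 256))   # pixels at each gray level
profile = image[128, :]                                     # intensities along one horizontal line

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(np.arange(256), counts, width=1)                    # the image histogram
ax1.set(xlabel="gray level", ylabel="number of pixels")
ax2.plot(profile)                                           # like ImageJ's Analyze > Plot Profile
ax2.set(xlabel="position along the line", ylabel="intensity")
plt.tight_layout()
plt.show()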
Information about the histogram in standard histogram displays (Figure 2.7, insets B and D) includes the total number of pixels in the image, as well as the median value, the mean value, and the standard deviation around the mean value. The standard deviation is an important number because it shows the level of contrast or variation between dark and light, higher standard deviations meaning higher contrast. One way to assess sharpening of the image is by pixel intensity standard deviation, with higher standard deviations producing higher contrast and better “focus” (see Section 3.8). However, there is a trade-off between contrast and resolution (see Section 5.15) – higher contrast images of the same scene do not have higher resolving power.
2.4 Tonal Range Is How Much of the Pixel Depth Is Used in an Image
The number of gray levels represented in the image is its tonal range. The ideal situation for grayscale image recording is that the tonal range matches the pixel depth, and there are pixels at all the gray values in the histogram. If only a small region of the x-axis of the image histogram has values in the y-axis, then the tonal range of the image is too small. With a narrow range of gray tones, the image becomes less recognizable, and features may be lost as in Figure 2.7A, in which the toads are hard to see. Likewise, if the tonal range of the scene is greater than the pixel depth (as in over- and underexposure; see Section 2.5), information and features can be lost. In Figure 2.7C, the tonal range of the snail on the leaf is good, but there are many pixels in the histogram at both the 0 and 255 values (Figure 2.7D). Objects with gray-level values below zero, in deep shade, and above 255, in bright sunlight, have no contrast and are lost. The values above 255 saturate the 255 limit of the camera. The ROI, such as the snail on the leaf in Figure 2.7C, may have good tonal range even though there are regions outside the ROI that are over- or underexposed. A histogram of that ROI would reveal its tonal range. Do not confuse the sensitivity of the recording medium with its pixel depth, its capacity to record gray-level gradations of light. The ISO setting on cameras (high ISO settings, 800–3200, are for low light) adjusts pixel depth, but this does not make the camera more sensitive to light; it just makes each pixel shallower, saturating at lower light levels (see Section 5.11). Scene lighting and camera exposure time are the keys to getting good tonal range (see Section 9.5). Many digital single lens reflex (SLR) cameras display the image histogram of the scene in real time. The photographer can use that information to match the tonal range of the scene to the pixel depth of the camera, looking for exposure and lighting conditions where there are y-axis values for the entire x-axis on the image histogram. After taking the image, digital adjustment (histogram stretching; see Section 2.10) of the tonal range can help images with limited tonal range that are not over- or underexposed. However, it is always better to optimize tonal range through lighting and exposure control (see Sections 9.5 and 9.10).
Figure 2.7 Grayscale images and their histograms. (A) Grayscale image of toads. (B) Histogram of the image in (A) shows a narrow tonal range with no pixels having high- or low-intensity values. (C) Image of a land snail on plants. (D) Pixels in (C) have a wide tonal range. The histogram shows pixels occupy almost all of the gray values. Photos by L. Griffing.
2.5 The Image Histogram Shows Overexposure and Underexposure
Underexposed images contain more than 5% of pixels in the bottom four gray levels of a 256 grayscale. The ability of the human eye to distinguish 60 or so gray levels (see Chapter 5) determines this; four gray levels is about 1/60th of 256. Hence, differences within the first four darkest intensities are indistinguishable by eye. A value of zero means that the camera was not sensitive enough to capture light during the exposure time. Assuming that the ROI is 100% of the pixels, then underexposure of 5% of the pixels is a convenient statistically acceptable limit. There are underexposed areas in Figure 2.7C but none in Figure 2.7A. However, less than 5% of Figure 2.7C is underexposed. Underexposed Figure 2.8A has more than 40% of the pixels in the bottom four gray levels. The argument is the same for overexposure, the criterion being more than 5% of the pixels in the top four gray levels of a 256 grayscale. For pixels with a value in the highest intensity setting, 255 in a 256–gray-level spectrum, the camera pixels saturate with light, and objects within those areas have no information. Figure 2.7C has some bright areas that are overexposed, as shown by the histogram in Figure 2.7D. However, the image as a whole is not overexposed. Figure 2.9A is bright but not overexposed, with just about 5% of the pixels in the top four gray levels (Figure 2.9B).
Figure 2.8 Low-key grayscale image and its histogram. (A) Low-key image of the upper epidermis of a tobacco leaf expressing green fluorescent protein engineered so that it is retained in the endoplasmic reticulum. Like many fluorescence microscopy images, most of the tonal range is in the region of low intensity values, so it is very dark. (B) Histogram of the fluorescence image. The large number of pixels at the lowest intensities shows that the image is low key and underexposed. Photo by L. Griffing.
Figure 2.9 High key grayscale image and its histogram. (A) High-key image of polar bears. Most of the tonal range is in the region of high intensity values, so it is very light. (B) Histogram of the polar bear image. Note that although there are primarily high intensity values in the image, there are very few at the highest value; the image is not overexposed. Photo © Jenny Ross, used with permission.
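The 5% criterion described above is easy to apply in code. A sketch for an 8-bit grayscale array:

import numpy as np

def exposure_check(image, fraction=0.05, edge_levels=4):
    """Flag under-/overexposure: more than `fraction` of pixels in the bottom or top `edge_levels` gray levels."""
    pixels = image.size
    underexposed = np.count_nonzero(image <= edge_levels - 1) / pixels > fraction
    overexposed = np.count_nonzero(image >= 256 - edge_levels) / pixels > fraction
    return underexposed, overexposed

# Example: a frame that is mostly black, as in Figure 2.8A.
dark = np.zeros((100, 100), dtype=np.uint8)
dark[:40, :] = 120
print(exposure_check(dark))   # (True, False): well over 5% of the pixels sit in gray levels 0-3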
2.6 High-Key Images Are Very Light, and Low-Key Images Are Very Dark
Figure 2.9A is high key because most of the pixels have an intensity level greater than 128, half of the 255 maximum intensity value. Figure 2.8A is low key because most of the pixels have values less than 128. For these scenes, the exposure – light intensity times the length of time for the shutter on the camera to stay open – is set so that the interesting objects have adequate tonal range. Exposure metering of the ROI (spot metering; see Sections 5.2 and 9.4) is necessary because taking the integrated brightness of the images in Figures 2.8 and 2.10 for an exposure setting would result in overexposed fluorescent cells. Hence, the metered region should only contain fluorescent cells. Over- or underexposure of regions that are not of scientific interest is acceptable if the ROIs have adequate tonal range. Low-key micrographs of fluorescent (see Section 17.3) or darkfield (see Sections 9.3 and 16.3) objects are quite common.
2.7 Color Images Have Various Pixel Depths
Pixels produce color in a variety of modes. Subpixel color (see Section 1.1) produces the impression that the entire pixel has a color when it is composed of different intensities of three primary colors. The lowest pixel depth for color images is 8 bit, in which the intensity of a single color ranges from 0 to 255. These are useful in some forms of monochromatic imaging, such as fluorescence microscopy, in which a grayscale camera records a monochromatic image. To display the monochromatic image in color, the entire image is converted to indexed color and pseudocolored, or false colored, with a 256-level color table or look-up table (LUT) (Figure 2.10). Color combinations arise by assigning different colors, not just one color, to the 256 different pixel intensities in an 8-bit image; 8-bit color images are indexed color images. Indexed color mode uses less memory than other color modes because it uses only one channel. Not all software supports this mode (see Table 2.1).
Figure 2.10 Indexed color image of Figure 2.8 pseudocolored using the color table shown. The original sample emitted monochromatic green light from green fluorescent protein captured with a grayscale camera. Converting the grayscale image to an indexed color image and using a green lookup table mimics fluoresced green light. Photo by L. Griffing.
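A sketch of pseudocoloring with a 256-entry green look-up table, as in Figure 2.10, assuming an 8-bit grayscale array:

import numpy as np

def apply_green_lut(gray):
    """Map an 8-bit grayscale image through a green look-up table (index -> RGB)."""
    lut = np.zeros((256, 3), dtype=np.uint8)
    lut[:, 1] = np.arange(256)           # green channel ramps from 0 to 255; red and blue stay 0
    return lut[gray]                     # indexed color: each pixel value picks a table row

gray = np.linspace(0, 255, 256, dtype=np.uint8).reshape(16, 16)
rgb = apply_green_lut(gray)
print(rgb.shape, rgb[..., 1].max(), rgb[..., 0].max())   # (16, 16, 3) 255 0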
Figure 2.11 Color image and its associated red, green, and blue channels. (A) Full-color image of a benthic rock outcrop in Stetson Bank, part of the Flower Gardens National Marine Sanctuary, shows all three channels, red, green, and blue. There is a circle around the red fish. (B) Red channel of the color image in A shows high intensities, bright reds, in the white and orange regions of the image. Note the red fish (circle) produces negative contrast against the blue-green water, which has very low red intensity. (C) Green channel of the image. Note that the fish (circle) is nearly invisible because it has the same intensity of green as the background. (D) Blue channel of the image. Note that the fish (circle) is now visible in positive contrast because it has very low intensities of blue compared with the blue sea surrounding it. Photo by S. Bernhardt, used with permission.
A more common way of representing color is to combine channels of primary colors to make the final image, thereby increasing the pixel depth with each added channel. The two most common modes are RGB (for red, green, and blue) and CMYK (for cyan, magenta, yellow, and black). Figure 2.11 is an RGB image in which a red fish (circled) appears in the red (negative contrast) and blue (positive contrast) channels but disappears in the green channel. The three 8-bit channels, each with its own histogram (see Figure 2.1), add together to generate a full-color final image, producing a pixel depth of 24 bits (3 bytes, or three channels that are 8 bits each). Reducing the pixel depth in the channels produces a color-posterized image (Figure 2.12B–D). At low pixel depths, background gradients become posterized (arrows in Figure 2.12), and objects such as the fish become unrecognizable (Figure 2.12D). Video and computer graphic displays use RGB mode, whereby adding different channels makes the image brighter, producing additive colors. The CMYK mode uses subtractive colors, whereby combining different cyan, magenta, and yellow channels makes the image darker, subtracting intensity, as happens with inks and printing. Because the dark color made with these three channels never quite reaches true black, the mix includes a separate black channel. Consequently, CMYK mode uses four 8-bit channels, or has a pixel depth of 32 bits (4 bytes). Because these color spaces are different, they represent a different range, or gamut, of colors (see Section 4.3). Even within a given color space, such as RGB, there are different gamuts, such as sRGB and Adobe RGB (see Table 2.1).
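To make the additive/subtractive distinction concrete, here is a sketch of the textbook (naive) RGB-to-CMYK conversion; real print workflows use ICC color profiles rather than this simple formula:

def rgb_to_cmyk(r, g, b):
    """Naive conversion of 8-bit RGB to CMYK fractions (0-1), ignoring color profiles."""
    r, g, b = r / 255, g / 255, b / 255
    k = 1 - max(r, g, b)                      # black channel: how dark the color is overall
    if k == 1:                                # pure black is carried entirely by the black channel
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return c, m, y, k

print(rgb_to_cmyk(255, 0, 0))    # red -> (0, 1, 1, 0): printed by subtracting with magenta and yellow ink
print(rgb_to_cmyk(0, 0, 0))      # black -> (0, 0, 0, 1)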
Figure 2.12 Color versions of the same image produced from reduced pixel depth in each of its red, green, and blue channels. The double arrow identifies posterization of the blue gradient of the background. (A) Eight bits/channel, 256 levels of intensity/channel. Pixel depth = 24 bits (8 bits/channel × 3 channels). (B) Three bits/channel, 8 levels of intensity/channel. Pixel depth = 9 bits. (C) Two bits/channel, 4 levels of intensity per channel. Pixel depth = 6 bits. (D) 1 bit/channel, two levels of intensity per channel. Pixel depth = 3 bits. Photo by S. Bernhardt, used with permission.
2.8 Contrast Analysis and Adjustment Using Histograms Are Available in Proprietary and Open-Source Software
Both proprietary and open-source software are available to the scientist for contrast analysis and adjustment (see Table 2.1). On a note of caution and advice, it is best for scientists to use open-source image processing and analysis programs with published algorithms and source code for the operations. Proprietary software does not usually publish its algorithms and source code, so the actual operations done on the image are frequently unknown. As software versions change, so do the algorithms and source code. Image processing operations done using proprietary software algorithms may become unrepeatable because the software version used for the operations is no longer available or no longer works on newer operating systems. With many open-source imaging programs, published algorithms and source code are available. Archives of older versions often exist. Unfortunately, there are no community standards for image processing algorithms in the life sciences, although there are standards for general contrast enhancement (see Section 3.6) in biology. Many of the example software packages used to demonstrate image processing and measurement in this book are open source. This has two advantages in addition to having published algorithms and source code. First, it is frequently available for download at no charge. Second, coders within the community often solve complexities or problems in the software, thereby increasing the usability of these packages. At the end of each chapter are sources and the current internet references (links) for the described software. Because updates are common for open-source software, it is good to archive older versions. Some sites keep these archives for you, but the lifetime of these archives is quite variable. It is best practice to keep an old software package and download the new one instead of "updating" an existing package, particularly if archives are not maintained by the community.
2.9 The Intensity Transfer Graph Shows Adjustments of Contrast and Brightness Using Input and Output Histograms
Changing pixel intensity values changes contrast, and contrast enhancement of the entire image operates at the level of the image histogram. Scientific publication allows only entire-image contrast enhancement (see Section 3.9). Plotting the image histogram before the operation along the x-axis ("pixel in" values) and the image histogram after the operation on the y-axis ("pixel out" values) produces an intensity transfer graph (Figure 2.13). If there is no change in intensity, the plot is a line with a "pixel out" intercept at 0 and with a slope of 1. For linear relationships, the slope is the gamma of the image, while for non-linear relationships, the curve is the gamma. In scientific imaging, use linear contrast enhancement by increasing the slope (see Figure 2.13, lower panel). Changing the y intercept and the single-value slope is easier to reproduce than complicated curves. In addition, increasing contrast in just the light or dark areas of an image using a curved gamma comes dangerously close to image manipulation in regions of the image rather than across the entire image. Photographers frequently use gamma to describe the relationship between the intensity of output (e.g., photographic density, camera chip voltage) and the intensity of input (e.g., exposure to light with time). When displayed in tabular form, the mapping of input colors or intensities to output colors or intensities is an LUT. Figure 2.10 shows an example LUT of colors assigned to each gray level of a 256-grayscale image.
Figure 2.13 Intensity transfer graph. This graph shows the relationship between the values of the pixels before (x-axis) and after (y-axis) a linear operation changes pixel intensities. The upper panel shows the identity relationship where the slope of the line is 1. The slope of the line is the gamma. Increasing the slope increases contrast and tonal range of the image, as shown in the lower panel. Note how changing the slope from 1 (upper panel) to 1.6 (lower panel) changes the output histograms. Decreasing the slope decreases contrast. Diagram by L. Griffing.
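A sketch of a linear intensity transfer in code: output = gamma (slope) × input + offset, clipped to the 8-bit range. The clipping step is exactly where the information loss discussed next occurs:

import numpy as np

def linear_transfer(image, gamma=1.0, offset=0.0):
    """Apply a linear intensity transfer graph: out = gamma * in + offset, clipped to 0-255."""
    out = gamma * image.astype(float) + offset
    return np.clip(out, 0, 255).astype(np.uint8)

img = np.linspace(60, 180, 256, dtype=np.uint8).reshape(16, 16)    # narrow tonal range
brighter = linear_transfer(img, gamma=1.0, offset=40)              # shifts the histogram right
higher_contrast = linear_transfer(img, gamma=1.6, offset=-60)      # a steeper slope spreads the histogram
print(img.min(), img.max(), higher_contrast.min(), higher_contrast.max())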
To brighten or darken images, simply move the gamma line up or down the y-axis, thereby increasing or decreasing "pixel out" values, respectively (Figure 2.14A–C). As the line of the intensity transfer graph moves up the y-axis, the values in the processed image become higher, or brighter (see Figure 2.14B), and the output histogram is right shifted to the higher values. During the right shift, the second peak on the original histogram moves "off" the output histogram and into its highest value, and all of the information in that peak is lost in the same way that it would be lost with overexposure. As the line moves down the y-axis, the processed image becomes darker (see Figure 2.14C), and the histogram is left shifted to darker values. A similar loss of information is apparent in the left-shifted histogram in Figure 2.14C. Beware – simple brightening and darkening of images can cause loss of information! Contrast in the image increases as the gamma of the linear intensity transfer graph increases (Figure 2.14D). This makes the shadows deeper and the highlights brighter but results in information loss if the y-intercepts of the gamma are too high or too low, as is apparent in the output histogram in Figure 2.14D. The next technique, histogram stretching, increases contrast without diminishing information content by paying very close attention to the input histogram.
Figure 2.14 Changing brightness and contrast with the ImageJ brightness/contrast tool (Image > Adjust > Brightness/Contrast). (A) Unprocessed image of a plant protoplast (see Figure 3.4). (B) Brightened image: the red arrow on the brightness bar shows that it moves to the right. The intensity transfer graph in the dialog box shifts upward (red arrow) from where it is in the original (red line). The output histogram shifts right. (C) Darkened image: the red arrow on the brightness bar shows that it moves to the left. The intensity transfer graph is lower on the y-axis from where it was originally (red line). The output histogram shifts left. (D) Contrast-enhanced image: the red arrow on the contrast bar shows that it moves to the right. The slope of the original intensity transfer graph (red line) increases, shifting the upper values of the intensity transfer graph to the left (red arrow) and the lower values to the right (red arrow). The central peak of the original histogram then spreads out across a larger range of gray values. Photo by L. Griffing.
Figure 2.15 Gray-level histogram stretching produces contrast enhancement. (A) Original gray-level image of toads, where they lack contrast. (B) Original histogram dialog box with the Levels command (Image > Adjustments > Levels) in Photoshop. To stretch the histogram, upper (white) and lower (black) triangles move in the direction of the red arrows. (C) Contrast-enhanced image of toads. (D) New histogram seen in the Levels dialog box. Photo by L. Griffing.
2.10 Histogram Stretching Can Improve the Contrast and Tonal Range of the Image without Losing Information
Changing the slope of a linear intensity transfer graph changes the upper and lower values in the original image to new ones (see Figure 2.14D). When increasing the slope of gamma, a narrow window of pixel values is linearly "stretched" over a broader window of pixel values, as shown in the bottom panel of Figure 2.13. This operation is histogram stretching. Histogram stretching uniformly inserts "unoccupied" gray values into the "occupied" gray levels of the histogram. It makes whites "whiter" and darks "darker." In Figure 2.15A, there is a limited range of pixel intensities, and the picture is low contrast. By moving the darkest value of the input levels up to where there are pixels and the lightest value of the input levels down to where there are pixels (Figure 2.15B), the new image has higher contrast toads (Figure 2.15C), and the histogram (Figure 2.15D) shows that the values are stretched to occupy the entire spectrum of gray levels. Information is not lost because all the pixel values in the input range are between 0 and 255 in the output range; there is no accumulation of pixels in the overexposure or underexposure regions of the image. The final image has more gray values to represent the image, so small differences in contrast between the object and surround become greater. The final image has a greater tonal range. Histogram stretching matches the grayscale values of the image to the pixel depth. The final image uses all 256 gray levels (or more if the camera has greater than 8-bit pixel depth) instead of only a fraction of them as in the original image. The improvement in tonal range is the main advantage of histogram stretching.
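A sketch of linear histogram stretching: map the lowest and highest occupied input levels to 0 and 255 so the full pixel depth is used without clipping any occupied values:

import numpy as np

def stretch_histogram(image, low=None, high=None):
    """Linearly stretch the occupied gray levels [low, high] to the full 0-255 range."""
    low = image.min() if low is None else low
    high = image.max() if high is None else high
    stretched = (image.astype(float) - low) * 255.0 / (high - low)
    return np.clip(stretched, 0, 255).astype(np.uint8)

flat = np.linspace(90, 160, 256, dtype=np.uint8).reshape(16, 16)   # low-contrast image
full = stretch_histogram(flat)
print(flat.min(), flat.max(), "->", full.min(), full.max())        # 90 160 -> 0 255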
2.11 Histogram Stretching of Color Channels Improves Color Balance The purpose of histogram stretching in color images is not only to enhance contrast but also to balance the color, so that the image has similar tonal ranges in each of the color channels. In Figure 2.16A, the intensity or total luminosity histogram of the original RGB color image has three peaks, each representing a primary color, with the high intensities in the red, mid intensities in the green, and low intensities in the blue. Histogram stretching each of the color
Figure 2.16 Color balancing a light micrograph of dermal cells on a scale from a red drum fish. (A) Original image and red, green, blue (RGB) histogram in the Levels dialog box in Photoshop (Image > Adjustments > Levels). The three separate peaks of the histogram are the RGB peaks, as revealed from the histogram of each channel (B–D). (B) Image resulting from stretching the red channel histogram by bringing up the lower value (black triangle) of the Input levels. (C) Image resulting from stretching the histogram of the green channel, in addition to the red channel in (B). (D) Image resulting from stretching the blue channel histogram in addition to the other two channels. (E) Final image and histogram. The histogram shows a single peak in the RGB histogram with the RGB peaks superimposed. The orange condensed pigment granules in the chromophore and the dispersed pigment granules in the melanophore have higher contrast than in the original image (A). The background color is now white. Photo by L. Griffing.
channels enhances contrast and improves white balance or color balance (Figure 2.16B–D). During the histogram-stretching process, the image color changes from a slightly darker red in Figure 2.16B to brighter and darker greens in Figure 2.16C to mostly brighter blues in Figure 2.16D. The final image (Figure 2.16E) shows a white, color-balanced background with more contrast for the contracted red pigment granules of the chromophores and the dispersed pigment granules in the melanophore on the fish scale. The idea of color balance is to generate a natural white level in the background and true colors of the object in that light. Tonal range and color balance depend on the spectrum of the illumination and the color sensitivity of the camera (see Sections 9.6 and 9.7). If these are mismatched, then histogram stretching may compensate.
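In code, color balancing by per-channel stretching is simply the grayscale stretch applied to each channel separately. A hedged sketch, assuming an 8-bit RGB image held as a NumPy array of shape height × width × 3 (function name is illustrative):

```python
import numpy as np

def stretch_channels(rgb):
    """Stretch the red, green, and blue histograms independently.

    Each channel's darkest occupied value maps to 0 and its brightest to
    255, which pushes a uniformly lit background toward neutral white,
    as in Figure 2.16E.
    """
    out = np.empty(rgb.shape, dtype=np.float64)
    for c in range(rgb.shape[-1]):
        ch = rgb[..., c].astype(np.float64)
        lo, hi = ch.min(), ch.max()
        out[..., c] = (ch - lo) * 255.0 / max(hi - lo, 1)
    return np.clip(out, 0, 255).astype(np.uint8)
```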
2.12 Software Tools for Contrast Manipulation Provide Linear, Non-linear, and Output-Visualized Adjustment There are other ways to perform histogram stretching and contrast enhancement than those shown in Figures 2.15 and 2.16. Frequently, software provides contrast and brightness (exposure, offset, gamma, highlights, shadows) user interfaces that do not include a histogram or an intensity transfer graph. The amount of information cut off by adjusting contrast is unknown when using such sliders, so avoid them. Interfaces that provide an image histogram (as in Figures 2.15 and 2.16), an intensity transfer graph, or both (Figure 2.17) allow precise adjustment of brightness and contrast without information loss. The increase in the slope of the intensity transfer graph, or gamma, is readily apparent. Higher gamma is higher contrast. Other interfaces independently adjust the contrast of dark (low-key) and light (high-key) regions of images. Figure 2.18 shows how the poorly color-balanced original picture of the dermal cells of fish scales in Figure 2.16 can be color balanced and contrast enhanced with a user interface that includes many variations of the image. In this case, only the high-key part of the image is adjusted because there are few low-key pixels in the original histogram (see Figure 2.16A). This visual interface provides a color guide within the image for when information is being lost. As shown in Figure 2.18A, regions of over- and underexposure are pseudocolored to highlight the loss (the cyan regions in the “more red,” “more yellow,” and “more magenta” options). Repetitive clicks on the “more cyan” option (Figure 2.18B) and “more
Figure 2.17 Contrast enhancement using a dialog box that has both an image histogram and an intensity transfer graph – the Curves command in Photoshop (Image > Adjustments > Curves). (A) Enhanced contrast of toad image. (B) Enhancing contrast by raising the dark level and lowering the light level of the input pixels (red arrows). Note the increased slope, or gamma value, of the intensity transfer graph. Photo by L. Griffing.
Figure 2.18 Color balance and contrast enhancement with an image-based dialog box – the Variations command of Photoshop (Image > Adjustments > Variations). (A) Only the highlights are chosen for changing. The original image is the current pick. More red shows cyan spots that are overexposed. (B) After three clicks on more cyan in A, the current pick is now more cyan. (C) After four clicks on more blue in B, the correct image is now the current pick. Photo by L. Griffing.
blue” option (Figure 2.18C) produces a final “current pick,” which is an improvement over the original but not as easy to obtain (and to record the method used) as the image with histogram-stretched channels (see Figure 2.16E). Situations arise when there are high- and low-key regions that need more gray levels to represent them (higher contrast) and middle tones that need fewer (lower contrast). As mentioned earlier, scientific publication does not allow just changing the bright or dark regions of an image (see Section 3.9). That said, histogram equalization operates on the entire image and produces non-linear adjustment of the intensity transfer graph (Figure 2.19). This command equalizes the histogram so that there are relatively equal numbers of pixels occupying each intensity region. Compare the histogram of Figure 2.19 with those in Figures 2.17B and 2.15B. The equalized histogram is much “flatter,” stretching some regions of the histogram and compressing others. Histogram equalization is sometimes offered as the enhance contrast command, as it is in ImageJ, but as seen earlier, there are sometimes better ways to enhance contrast.
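For comparison with the linear methods above, here is a short sketch of histogram equalization (the non-linear adjustment shown in Figure 2.19), assuming an 8-bit grayscale NumPy array. It follows the cumulative-count rule given in the annotated note for this section.

```python
import numpy as np

def equalize_histogram(img):
    """Non-linear contrast adjustment via the cumulative histogram.

    Each gray level j is remapped to the fraction of pixels with values
    <= j, scaled back to 0-255, which flattens the output histogram.
    """
    counts = np.bincount(img.ravel(), minlength=256)
    cdf = counts.cumsum() / img.size            # running fraction of pixels
    lut = np.round(255 * cdf).astype(np.uint8)  # intensity transfer function as a LUT
    return lut[img]
```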
Figure 2.19 Histogram equalization. A single command (Image > Adjustments > Equalize) produces a flattened histogram where there are relatively equal numbers of pixels at all of the different intensities. If there are mostly mid-tones in the image, the intensity transfer graph that produces this result is a non-linear S-curve. Photo by L. Griffing.
2.13 Different Image Formats Support Different Image Modes Color and grayscale image formats have different ways of representing the pixel values in the image, called image modes (Table 2.2). Some modes are for printing (CMYK), others are for computer displays (RGB), and still others are useful for analysis (grayscale). They represent differing color spaces with differing gamuts (see Section 4.3). Most scientific applications use grayscale, indexed color, RGB, CMYK, or multichannel modes (see Table 2.2). Grayscale images are simple intensity maps representing pixel depths of 256 gray levels (8 bit), 65,536 gray levels (16 bit), or 4,294,967,296 gray levels (32 bit). Indexed color images represent a color image with a single channel of 256 separate colors rather than combinations of colors. This mode can represent an intensity spectrum within one color (see Figure 2.10). RGB images have three channels, each with a pixel depth of 8, 16, or 32 bits. If 8-bit channels are used, the image pixels have a total depth of 24 bits (8 bits of red, 8 bits of green, and 8 bits of blue). This generates a color space of almost 16 million colors. Almost all computer displays use RGB. Multichannel images contain 256 gray levels in each of several channels and are useful for scientific imaging in which several different channels are acquired at once, such as during confocal microscopy. These 8-bit intensity mapped channels are spot channels. Converting an RGB image to multichannel mode creates RGB spot channels. Digital images come in a variety of file formats, such as jpg, tiff, and raw (see Table 2.3). Different formats support different image modes, depending on the pixel depth and number of channels that they can specify (Table 2.3). The information in the image file is not a simple listing of each pixel, with a value for the pixel in bits. All image files have a header that gives information about the number of pixels in the picture width and height, the number of channels represented, and the pixel bit depth in each channel, followed by a listing of codewords that represent the intensity values of each pixel. Codewords are a set of binary values represented by a fixed number of bits. All of this is important information for correct read-out and display. Also associated with the file header is image metadata that gives the date of creation, modification, camera type used, and other EXIF (Exchangeable Image File) information (see Section 1.8).
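A quick way to confirm an image’s mode, channel count, and size before analysis is to read its header with a library such as Pillow. A hedged sketch (the file name is hypothetical):

```python
from PIL import Image

img = Image.open("specimen.tif")      # hypothetical file
print(img.mode)                       # e.g. 'L' (grayscale), 'P' (indexed), 'RGB', 'CMYK'
print(img.getbands())                 # tuple of channel names, e.g. ('R', 'G', 'B')
print(img.size)                       # (width, height) in pixels
# An 8-bit RGB image therefore carries 3 x 8 = 24 bits per pixel,
# or 2**24 (about 16 million) possible colors.
```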
Table 2.2 Image Mode and Color Space Dependency on Pixel Depth and Channel Number.

Image Mode | Bits/Pixel | Channels
Bitmap | 1 | None
Grayscale | 8, 16, 32 | Original and alpha
Duotone | 8 | One for each ink (up to four)
Indexed color | 8 | One
RGB color (additive color – displays) | 24, 48, 96 (8, 16, and 32 bits/channel) | Red, green, blue, and alpha
CMYK color (subtractive color – prints) | 32, 64 (8, 16 bits/channel) | Cyan, magenta, yellow, K-black, and alpha
Lab color (device independent) | 24, 48 (8, 16 bits/channel) | Lightness, a, b, and alpha
Multichannel | 8 | Multiple color spot channels, each mapped to 256 gray levels

CMYK, cyan, magenta, yellow, and black; RGB, red, green, blue.
Table 2.3 Still Image File Formats; the Modes, Color Spaces, and Pixel Depths Supported; and Nature of Compression.

Format | Pixel Depth and Mode | Compression: Lossy | Compression: Lossless
TIFF: Tagged Image File Format | Up to 32 bits per channel; many channels; for fax, 8 bit | Can use JPEG compression for color, CCITT compression for faxes | LZW compression
JPEG: Joint Photographic Experts Group | 8 bits per channel; three to four channels (RGB or CMYK) | Compresses similar features within 8 × 8 blocks |
GIF: Graphics Interchange Format (pronounced JIF) | 8-bit single channel, indexed color with 256 colors | Indexing color leads to less accuracy and loss of color | Further compression can be done with LZW compression
PNG: Portable Network Graphics | Up to 16 bits per channel, three channels, one alpha channel; does not support CMYK | | Uses LZ77 variant for compression
BMP: bitmapped picture, developed for Microsoft Windows | Various pixel depths selectable (up to 24) | Lower bit depth values lose information | Lossless compression available
PICT: picture file developed for Apple Macintosh | Various pixel depths selectable (up to 24) | Lower bit depth values lose information; JPEG compression available |
PGM, PPM, and PBM: portable grey, pix, or bit map; originally on UNIX | More than 8 bits per channel, three channels | Not compressed | Not compressed
PSD: Photoshop Document, used for Adobe Photoshop | More than 8 bits per channel, multiple channels, multiple layers, paths | Not compressed | Not compressed

CMYK, cyan, magenta, yellow, and black; RGB, red, green, blue; JPEG, Joint Photographic Experts Group; CCITT, compression scheme from the telecommunications standardization sector (black and white only); LZW, Lempel-Ziv-Welch compression; LZ77, Lempel-Ziv compression from 1977.
2.14 Lossless Compression Preserves Pixel Values, and Lossy Compression Changes Them Using a smaller amount of information to communicate an image than is in the original file is file compression. Lossy compression removes valuable, unrecoverable information from the image file. Lossless compression removes redundant, recoverable information in the image file. Different file formats have different levels of compression; although some have no compression, others have lossless or lossy compression or both (see Table 2.3). A suffix to the file name following a period specifies the format; for example, image.tiff describes the file image in the tiff format.
After acquisition, if the camera does not compress or process the image for display (e.g., with a given, often camera-dependent, color space or gamut [see Section 4.3]), the file format is called raw. The purpose of having a raw format is to save the information from the image sensor with minimum loss and with the metadata intact. Many camera types have a raw format for their pictures, but these files are usually very large because there is no compression. Some cell phones can record raw images, but the default is often jpg. A raw file is a digital negative containing all of the metadata and image values. Not all raw formats are the same; different manufacturers have different raw formats with different suffix names (e.g., crw or cr2 – Canon, nrw – Nikon, and orf – Olympus), some encrypting the information with proprietary software. Consequently, to get everything out of a raw format file, use the proprietary software available from the manufacturer of the camera. The danger of storing raw images for archival purposes is that manufacturers and their software change unpredictably, and certain raw formats may become unreadable over time. It is therefore advisable to save the original raw file, convert a copy to a standard tiff file with lossless compression for archiving, and then export all of the EXIF information to a spreadsheet if the tiff format does not completely import it. Check the EXIF files with an EXIF reader (available as a plugin in ImageJ and in FIJI). The de facto standard for most biological research is the tagged image file format, or tiff. There are many different ways to store tiff images, some of which are lossy (see Table 2.3). It is important to use lossless compression, such as LZW (Lempel-Ziv-Welch) compression. Be aware of the possibility that a tiff file of unknown origin may have lossy compression. Another excellent lossless format is png (portable network graphics). It uses a variant of Lempel-Ziv (LZ77) lossless compression (see Table 2.3). The ability of the tiff format to contain up to 32-bit pixels is very important. For most imaging devices on high-end research equipment, the bit depth is greater than 8 bits per channel. Most digital video cameras for research have 12- to 16-bit pixels (see Section 5.11). Most consumer electronic cameras record their images in jpeg format. Jpeg (also designated jpg) is short for the Joint Photographic Experts Group, in which companies and academic institutions come together to form two international bodies that determine international standards for information technology and telecommunications. The jpeg format is the international standard for image communication via the internet and other telecommunications. It can handle 32-bit images (four 8-bit channels, e.g., CMYK). It is a lossy compression format, becoming lossier at lower settings. The compression occurs through three operations: a discrete cosine transform in 8- × 8-pixel matrices, matrix multiplication for quantization, and coding of the quantized product. Quantization occurs when the codewords representing pixel intensity levels are set to various lengths. The user can select the level of compression by choosing the amount of quantization (smaller codewords mean more compression), which “tosses out” information about the picture. As shown in Figure 2.20, images degrade during the loss. This photograph starts out with
Figure 2.20 Noticeable differences between jpeg compression images. (A) JPEG compression setting 10 – best image, lowest level of compression. The image, however, already has a low 100-ppi resolving power. (B) JPEG compression setting 5 – slightly degraded image, mid-range compression. (C) JPEG compression setting at the highest compression setting (1) produces a noticeably degraded image. Photo by L. Griffing.
marginal legibility at 100 ppi (Figure 2.20A). Jpeg compression degrades it more with medium (setting 5) and high (setting 1) compression values (Figure 2.20B–C). In Figure 2.20C, the 8- × 8-pixel compression is visible by its blocky appearance. Other image formats, such as gif, bmp, and pict, are typically lossy; however, note that bmp can be saved with lossless compression as well (see Table 2.3). They compress files by representing color with a limited number of channels and pixel depths. Lossy compression is unacceptable for life science images involving future processing and intensity analysis. Lossy formats are satisfactory for final print output or for web page display but are unacceptable as archived original data unless constrained by the acquisition system.
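As a practical illustration of the archiving advice above, the sketch below (using the Pillow library; file names are hypothetical) saves a copy of an image as a losslessly compressed tiff and, for comparison, as a lossy jpeg whose quality setting controls the amount of quantization.

```python
from PIL import Image

img = Image.open("original.tif")                   # hypothetical source image
img.save("archive.tif", compression="tiff_lzw")    # lossless LZW-compressed tiff for archiving
img.save("figure_draft.jpg", quality=50)           # lossy jpeg; lower quality settings give
                                                   # stronger 8 x 8 block quantization artifacts
```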
Annotated Images, Video, Web Sites, and References 2.1 Contrast Compares the Intensity of a Pixel with That of Its Surround Note that the definition of contrast in Figure 2.2 is sometimes expressed as an index of visibility, the ratio of the difference between the object and background intensities divided by their sum. See Slayter, E.M. 1970. Optical Methods in Biology. Robert E. Krieger Publishing Company, Huntington, NY. p. 253.
2.2 Pixel Values Determine Brightness and Color Table 2.1 shows the proprietary and open-source software on the three major platforms. For further information, consult this updated comparison of raster-graphics image editors: https://en.wikipedia.org/wiki/Comparison_of_raster_graphics_editors.
2.3 The Histogram Is a Plot of the Number of Pixels in an Image at Each Level of Intensity User interfaces frequently show histograms graphically. Saving the data for a histogram is possible using, for example, the “List” option in the ImageJ histogram user interface. This approach informed the construction of Figure 2.5.
2.4 Tonal Range Is How Much of the Pixel Depth Is Used in an Image Tonal range also relates to the information capacity of the image. The information capacity (C) of the image is the number of pixels (n) times the binary log of the number of intensity gradations or gray levels (d): C = n log₂(d). The binary log of the number of gray levels is the bit depth. The number of gradations required for presentation to the human eye is only 60, but an object may contain many more meaningful gradations than detectable with the human eye, and these are distinguished or measured with higher bit-depth sensors.
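A quick worked example of this formula (a sketch with arbitrary example values, not taken from the text):

```python
import math

n = 1024 * 1024              # pixels in a 1-megapixel image
d = 256                      # gray levels in an 8-bit image
C = n * math.log2(d)         # information capacity in bits
print(C)                     # 8,388,608 bits, i.e. 1 MiB per 8-bit channel
```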
2.5 The Image Histogram Shows Overexposure and Underexposure The top five and bottom five gray-level criteria for over- and underexposure derive from its definition on film. An optimal exposure on film is when the least exposed area is 0.1 density units above the base plus fog (80%–100% light transmission by the film negative); Williams, J.B. 1990. Image Clarity: High Resolution Photography. Focal Press. Boston, MA p. 140. Overexposed regions are greater than 2 density units (less than 1% light transmission by the film negative); Slayter, E.M. 1970. Optical Methods in Biology. Robert E. Krieger Publishing Company, Huntington, NY. p. 463.
2.6 High-Key Images Are Very Light and Low-Key Images Are Very Dark Figure 2.9 © Jenny Ross, used with permission.
2.7 Color Images Have Various Pixel Depths Figures 2.11–2.12 are from a study reported in Bernhardt, S.P. and Griffing, L.R. 2001. An evaluation of image analysis at benthic sites based on color segmentation. Bulletin of Marine Science 69: 639–653. See also Sections 7.4 and 7.7, Figures 7.11 and 7.18.
2.8 Contrast Analysis and Adjustment Using Histograms Are Available in Proprietary and Open-Source Software Downloads for the open-source graphics packages in Table 2.1:
Gimp: http://www.gimp.org/downloads
CinePaint: The Windows version is old, is not updated or supported, and is not recommended. To get the OSX and Linux versions, see http://sourceforge.net/projects/cinepaint/files/CinePaint
ImageJ: https://imagej.nih.gov/ij/download.html. Many plugins are available separately on the plugins page: https://imagej.nih.gov/ij/plugins/index.html
Downloads for proprietary freeware:
Irfanview: http://www.irfanview.com/main_download_engl.htm
Paint.Net: https://www.dotpdn.com/downloads/pdn.html
Google Photos: https://photos.google.com
Purchase sites for proprietary packages:
Adobe Photoshop: For educational use: http://www.adobe.com/products/catalog.html?marketSegment=EDU
Corel Paint Shop Pro: http://www.corel.com/corel/allProducts.jsp. This is much cheaper than Photoshop.
2.9 The Intensity Transfer Graph Shows Adjustments of Contrast and Brightness Using Input and Output Histograms The intensity transfer graph is the graph of the intensity transformation function. LUTs store intensity transformation functions in tabular form. González, R.C. and Woods, R.E. 2008. Digital Image Processing. Prentice Hall, Englewood Cliffs, NJ. p. 85. The term intensity transfer graph is in common use in software guides and web tutorials. See the interactive Java tutorial: Spring, K., Russ, J., Parry-Hill, M. and Davidson, M. Brightness and Contrast in Digital Images. http://micro.magnet.fsu.edu/primer/java/olympusmicd/digitalimaging/contrast/index.html.
2.10 Histogram Stretching Can Improve the Contrast and Tonal Range of the Image Without Losing Information See the interactive Java tutorial: Contrast stretching and histogram normalization by Spring, K., Russ, J., Parry-Hill, J., Long, J., Fellers, T. and Davidson, M. https://micro.magnet.fsu.edu/primer/java/digitalimaging/processing/histogramstretching/index.html.
2.11 Histogram Stretching of Color Channels Improves Color Balance This approach is novel. It is not included in standard works such as González, R.C. and Woods, R.E. 2008. Digital Image Processing. Prentice Hall, Englewood Cliffs, NJ. p. 438, who suggest histogram equalization followed by saturation adjustment in the HSI color space.
2.12 Software Tools for Contrast Manipulation Provide Linear, Non-linear, and Output-Visualized Adjustment Histogram equalization assigns a new brightness value, k, to each pixel based on its old value, j, by counting the number of pixels in the image with a value equal to or less than j and dividing by the total number of pixels, T: k = (Σ_{i=0}^{j} N_i) / T. Russ, J. 2007. The Image Processing Handbook. CRC Taylor and Francis, Boca Raton, FL. p. 275.
2.13 Different Image Formats Support Different Image Modes There are hundreds of graphics file formats. They all start with a signature or “magic number” in their headers that identifies them. Many are device-specialized (e.g., all of the raw types), many are software-specialized (e.g., Photoshop .psd), and many are interchange formats (e.g., tiff and pdf). The ones listed in Table 2.3 are the most important. A comprehensive, detailed, and advanced resource that has been on the web since 1994 is Martin Reddy’s Graphics File Format page: http://www.martinreddy.net/gfx.
2.14 Lossless Compression Preserves Pixel Values, and Lossy Compression Changes Them There is a good, easy-to-understand treatment of this in Efford, N. Image compression. In Digital Image Processing: A Practical Introduction Using Java. Addison-Wesley, Harlow, England. pp. 298–322. See Section 3.10 on just noticeable differences and SSIM, Figure 3.17.
3 Representation and Evaluation of Image Data 3.1 Image Representation Incorporates Multiple Visual Elements to Tell a Story Effective image data representation often uses the visual confection, elegantly defined by Edward R. Tufte in his book Visual Explanations as an assembly of many visual events selected …from various Streams of Story, then brought together and juxtaposed on the still flatland of paper. By means of a multiplicity of image-events, confections illustrate an argument, combine the real and the imagined, and tell us yet another story (p. 121). Juxtaposing related images and quantifying the data within them drive the story forward. In the primary research literature, they combine with other data to tell the story of a new discovery. In the secondary literature of textbooks, reviews, and grants, they summarize results or illustrate scientific models. When combined with text or mathematical description, they can answer questions such as how many, how often, what color, what size, how fast, and how close to each other. The research articles of many top-tier science journals, such as Nature and Science, routinely incorporate figures with multiple visual events, illustrating a story summarized by a brief text statement at the beginning of the figure legend. Research article summaries and perspectives often explain the story with additional illustrations. These confections are high-quality image data representations.
3.2 Illustrated Confections Combine the Accuracy of a Typical Specimen with a Science Story Important examples of illustrated confections are botanical illustrations (Figure 3.1), including those in field guides and keys. These illustrations serve well in producing a drawing of a perfectly representational typical specimen, incorporating information from many individuals and eliminating their various faults. Like all good illustrations, they provide an image of an intelligently constructed model of the living creature. Although illustration has the potential for exaggeration and caricature, it can also simplify and select features in a meaningful way, thereby clarifying communication. The confection in Figure 3.1 includes the bog asphodel (Figure 3.1B), showing a combination of a detail of the anther in the flower, the entire plant, scale bars showing the size of the anther (6 mm), and a 1-cm scale bar indicating the size of the plant and its flowers. Botanical illustration can magnify important diagnostic features, such as the hairs (trichomes) on the anther, while providing a detailed visual explanation of the characteristics that distinguish it from other similar species, such as color and shape. The depth of field possible with illustration provides a three-dimensionality that also aids identification. Related lilies, the star of Bethlehem in both its white and yellow forms, have similar flowers, but their arrangement, color, and size differ from those of the asphodel (see Figure 3.1). Comparing illustrated field guides to photographic field guides, illustrations provide novices with more immediate access to comparative analysis. Variations in lighting and the surrounding environment often make it difficult to clearly reveal important characters with photographs from nature. For example, even though the photograph of the bog asphodel in Figure 3.2 is very good, it does not as clearly reveal the characters of the stem, leaves, and roots as the illustration in Figure 3.1B, and it obscures parts of the individual flowers, the region of interest (ROI). Labeling and scale bars are often
Figure 3.1 Bog asphodel illustration. A) Flower. B) Whole plant, scale bar 1 cm. C) Pistil. D) Flower spike. E) Anther, scale bar 6 mm. F) Mature flower spike with seed pods. G) Seed pod. H) Carpels. I) Seed. From Sturm, J. 1796. Deutschlands Flora in Abbildungen. Vol. 1 - Plate 41. https://commons.wikimedia.org/w/index.php?curid=968517 modified by Kilom691.
missing from photographs and, when present, are difficult to clearly distinguish from complicated natural backgrounds. On the other hand, photographs from nature can reveal an environmental context for the species of interest. Hence, the use of different media in image confections depends on the story being told. Popular birding and scuba field guides use illustrated confections to highlight the differences between similar birds or fishes. Aligning illustrations of similar species side by side visually emphasizes their differences (Figure 3.3A). Just this simple juxtaposition of multiple illustrations of similar species (or other objects) can tell a story (see Figure 3.1). Highlighting these differences with text clarifies the differences. Label lines across the bird would interfere with the drawing itself (Figure 3.3B). By providing a map of the seasonal geographic distribution of the species of interest and a description of the bird song, the confection also tells the science stories of migration, overlapping territories, potential for hybridization, and changes in the population over time with climate. A different example of an illustrated confection in the life sciences is the cladogram, the graph of a group of organisms descended from a common ancestor. The difference in the number of characters (either morphological or genetic) is the x-axis of the graph, and the vertical dimension simply provides room for the different species names (Figure 3.4). Although it is a branched structure, more than two branches from a single branch indicates uncertain evidence. This cladogram tells the story of the relationship of the earliest modern primate, Teilhardina asiatica, to other primates (euprimates) and non-primates (the out-group, identified on the vertical). Good design techniques are apparent in the color-coded branches that tell the story of diurnal or nocturnal habit, asterisks that show extant taxa, and marginal annotation that indicates groups within the euprimates and the skull of T. asiatica itself. A point of confusion might be that the scale bars in the cladogram and
Figure 3.2 Bog asphodel. Narthecium ossifragum, flowers, Black Isle, Scotland. Photograph by Aroche. Creative Commons 2.5. https://en.wikipedia.org/wiki/Narthecium_ossifragum#/media/File:Narthecium_ossifragum3.jpg.
Figure 3.3 Figures for identification of the American tree sparrow and the field sparrow. (A) Comparison of the birds in the genus Spizella. (B) Comparison of flight, coloration, song, and distribution of the American tree sparrow and the field sparrow. From Sibley, D.A. 2014. The Sibley Guide to Birds. Second Edition. Alfred A. Knopf, New York, NY. pp. 504, 515.
Figure 3.4 Cladogram and skull reconstruction of the earliest modern primate. (A) Strict consensus of 33 equally parsimonious trees with the optimization of activity patterns. Tree length = 2,076, consistency index (CI) = 0.3685, retention index (RI) = 0.5519. Asterisks denote extant taxa. Daggers denote the terminal taxa presenting reconstructions of activity patterns. Blue, diurnal; green, nocturnal; orange, equivocal. Scale bar, 30 characters. (B) Reconstruction of the skull of Teilhardina asiatica sp. nov. (IVPP V12357) with a gray shadow indicating the missing parts. Scale bar, 5 mm. From Ni, X., Wang, Y., Hu, Y. et al. 2004. A euprimate skull from the early Eocene of China. Nature 427: 65–68. https://doi.org/10.1038/nature02126. Springer Nature. Used with permission.
skull reconstruction are different, so labeling them with their units in the figure itself, rather than the figure legend, would be good practice in this case. When Tufte analyzed this cladogram, he pointed out the good design features but worried that the description of the statistical approaches used in its construction as a “strict consensus of parsimonious trees” has two potential meanings, one being very narrow and technical, the other being a broad feel-good pitch (who would object to a consensus, let alone a strict one?). Avoid jargon if possible, particularly if it might give a false impression.
3.3 Digital Confections Combine the Accuracy of Photography with a Science Story There are large collections of preserved plants, birds, insects, invertebrates, and vertebrates in museums, universities, and private holdings around the world. The preservation differs depending on creature, when it was collected (more recent collections may use freeze preservation), and the stated goal of the collection (e.g., biomolecular analysis or
imaging analysis). Mounted collections of insects and ethanol-embalmed creatures in jars accompany the skins and other taxidermy of larger feathered and furry animals. Herbarium vouchers are the currency of plant herbaria collections around the world. These are dried, pressed, and preserved plant specimens mounted on sheets of archival paper. Vouchers may not include all of the plant; typical leaves and flowers, the root system if small, the stem and branches, and a packet of seeds may make up the voucher. There is also metadata associated with each specimen – who collected what, where, and when (specimen-level data capture). Of course, as these dead creatures age, they change, losing their color and flexibility. There are now worldwide efforts to digitize these collections because they provide a record of plants and animals over hundreds of years. Some of the specimens are holotypes, original preserved specimens used as the basis for the description of an entire species. The task is daunting. Insects alone represent more than a million described species and more than half a billion preserved specimens. Robot-assisted image acquisition uses a variety of imaging tools covered later in this book, such as macro- and microphotography (Figure 3.5), micro computed tomography (CT) scanning (Figure 3.6; see Section 6.7), and confocal microscopy (see Sections 6.9 and 17.10). Digital searchable databases store the results. For example, iDigBio.org provides image documentation of many plants, insects (see Figure 3.5), and other invertebrates, and morphosource.org provides micro-CT data sets of many vertebrates (see Figure 3.6) (usually archived as image slices; see Sections 13.6 and 13.8 for three-dimensional [3D] reconstruction).
Figure 3.5 Archived images of the Old-World swallowtail butterfly, Papilio machaon linnaeus from collections held by the Museum of Comparative Zoology at Harvard University available from iDigBio.org. The images include a machine-readable QR (quick response) code, a ruler, and a color card. (A) Specimen from the Mekong River Valley in China’s Yunnan Province. (B) Specimen from Switzerland. Museum of Comparative Zoology, Harvard University. Retrieved from A) https://www.idigbio.org/portal/mediarecords/7a49372c-293b4248-bcc7-1e805ce59da5. B) https://www.idigbio.org/portal/mediarecords/00e45814-63a0-4a3e-b44c-86e40fa7e291.
Figure 3.6 Two three-dimensional (3D) reconstructions of micro computed tomography scans of the common leopard frog from morphosource.org, Morphosource media_169946. (A) Ray-traced 3D image of the body and soft tissue. (B) 3D reconstruction of the skeleton of the frog in A. Images reconstructed using the Fiji plugin, 3D Viewer by L. Griffing.
Some of the questions that arise about digitization of collections are: What scale and resolution are best for recording the sample? What metadata should reside on the image? What type of symbols should represent geolocation, color, or stage of growth and development? What is the best format for storing other metadata? What standards should apply to recording the separate parts of larger organisms or recording the images in three dimensions? The answers to these questions depend on the story being told, the story behind the scientific model that serves as the basis for questions about the images and the life they represent. If the story is a description of an adult plant, Figure 3.7 provides an accurate, calibrated, detailed example. It is a botanical image confection of the flax plant that uses high-quality digital imaging at several different scales to show the exact position and size of the different plant parts. It shows the important stages of the growth and development of the flower and seed. There are icons for the sex of the plant and its annual growth habit. Color and lighting play a large role in this work. Most raw data from digital archives include a color calibration card and an internal ruler in the field of view (FOV) of the photographed specimen (see Sections 5.10 and 9.7). In Figure 3.7, there is a standardized color scheme based on the color standards adopted by the Royal Horticultural Society to describe flower and plant colors. These are important to relate the colors in modern botanical illustration to the digital botanical confections. It is important to maintain optimal storage conditions for both digital information and the preserved specimens themselves. Although plant vouchers and other preserved specimens often lose their color, preserved specimens are a potential repository of DNA and other biomolecules. However, without maintaining optimized storage conditions, these molecules may not be extractable intact. Likewise, the digital information
Figure 3.7 Digital voucher of the flax plant. Note the color standards in the left; the scale bars for every feature; the symbols in the upper left, which in this case show the plant is monoecious and annual; and that the flowering time for the plant is in May and June. A) Lower portion of the stem including the root system. B) Top portion of the stem with opening flower. C) Bud. D) Top portion of the flowering stem with buds. E) Single flower with front petal removed. F) Flower with all petals and calyx removed. G) Two views of the stamen. H) Gynoecium. J) Petal. K) Sepal. L) Calyx. M) Side view of the flower. N) Flower from above. P) Upper surface of a leaf. Q) Lower surface of a leaf. R) Side view of a fruit. S) Immature fruit cross-section. T) Top portion of a fruiting stem. U) Seeds. From Simpson, N. and Barnes, P.G. 2008. Photography and contemporary botanical illustration. Curtis’s Botanical Magazine 25: 258–280. Used with permission.
may be lost when operating systems or publication software change. Routine updates of preserved digital documents are necessary, just as preservation of animal and plant collections requires routine maintenance.
3.4 The Video Storyboard Is an Explicit Visual Confection Researchers often collect data as video, and reporting their work in a video format is now possible online (see Sections 5.8, 5.9, 12.1, and 12.2). Furthermore, video in the appropriate format helps to explain science and tell its story to people outside the field. Videos usually tell a story with an introduction, body, and conclusion. To plan a video, directors make an illustrated storyboard from the script. This is a series of images that tell the story as a timed series of scenes in the movie. Video image confections bridge the worlds of space and time. The best way to see how a storyboard works is by example. Figure 3.8 shows how a storyboard is a visual video plan, describing the scenes and transitions in the video “Fantastic Vesicle Traffic” (see the images and annotated reference list for the URL of the brief movie). The video tells the story of planting a seed of a transgenic plant expressing a gene that codes for a fusion between green fluorescent protein (GFP; see Section 17.2) and a small guanosine triphosphate–binding protein on the surface of secretory vesicles. The secretory vesicles labeled with GFP bind a motor protein, myosin, and move along actin filaments. This, briefly, is the story told by this 2-minute movie. However, for most storyboards, the storyboard exists before shooting the movie. They are illustrations of what the story ideally would look like from different camera points of view (POVs). A storyboard is a plan for a movie used by the director to decide how a movie will be shot and edited. Hence, the storyboard also shows edits, cuts, and transitions between the scenes. The storyboard translates the screenplay into timed visual elements. The storyboard shows the time of first appearance of a scene, its importance, and its duration. It sets the pace of the video by allocating time for each cut. Transitioning recognizably from one scene to the next between cuts maintains the flow of the story. Video editing uses transitions and POV to effectively tell the story. There is a maxim for movie directors and video producers: “Think like an editor.” This holds for most storytelling.
3.5 Artificial Intelligence Can Generate Photorealistic Images from Text Stories An artificial intelligence (AI) approach (there are several) that can generate images from text snippets (stories) is a neural network (see Section 11.7), trained on a variety of image and text pairs, called contrastive language-image pretraining (CLIP). Convolutional neural networks can identify images with training (see Section 11.7), but CLIP generates images from text. The algorithm uses both the style and syntax of the text. For example, the Dall · E 2 implementation of CLIP can produce the photorealistic image in Figure 3.9 from the text snippet “a closeup of a handpalm with leaves growing from it.” The photorealism of some Dall · E 2 compositions can pass the Turing test, that is, most people would guess that they are real pictures, not generated by computer. They are deep fakes. There is, of course, the potential for the fraudulent claim that such a computer-generated image is real. If life science researchers use any of these images, clear statements about their method of generation by computer must accompany them. In the ideal situation, the code could be archived or published as part of a Jupyter notebook (see Section 7.1) so that others could reproduce the same image with the same software. These images could be a powerful aid for modeling potential imaging outcomes from experiments or for planning still or video compositions. A video storyboard (see Figure 3.8) is a planning tool composed of illustrations made prior to video production to give direction to the camera operator, actors, special effects team, and film editors. Hand-drawn illustrations for the imagined scenes in a video sequence are time-consuming, even for those skilled in illustration. An AI approach, such as Dall · E 2, could supply such illustrations from text snippets, making storyboarding easier for the research scientist. As computer-generated photorealistic images become more common, it becomes even more important to communicate the believability of the images presented in research. This has always been an issue when comparing illustration with photography. Illustration is inherently less believable because it is less objective than photography, giving rise to the old saw “the camera does not lie.” However, the camera lies all the time! Quite different views of the same event arise from changes in lighting, sample preparation, camera angle, and calibration. Representative images make the camera more truthful.
Figure 3.8 Video storyboard for the “Fantastic Vesicle Traffic” video. A cut-away edit used between scenes 1 and 2 is to hands planting the seeds. This theme develops with a zoom-in cut showing tweezers planting seeds. Scene 5 frames the seed, producing fill and reveal framing, where the seed fills the field of view (FOV), then the frame reveals the activity—planting and germinating. A cut-away from the petri dish shows the scientist in scene 7, then zooms in to show the individual seeds before they germinate. A match cut matches up seeds between two scenes. Then seed germination shows a slice of life in time-lapse. Scene 11 uses a spiral transition to the brightfield image of the growing root hair. There is a multi-take edit in scene 12, photographing the seedlings with different optics, brightfield and fluorescence microscopy. Scene 13 zooms in to show real-time movement of vesicles in the root hair. A final zoom-in transition leads to the thesis of the video—the movement of the vesicle along actin strands—all this in 1 minute! The transition to the three-dimensional animation in scene 15 uses virtual camera effects. Placing the virtual camera in front of an actin filament model in scene 16 allows the vesicle and myosin molecule to walk into the FOV. This is walk and reveal framing. A pull-focus spiral transition leads to an image of the growing seedlings in scene 19 followed by a flash-back cut to the seedlings in a petri dish. The final scene before the credits is a look-at edit, wherein the scientist looks at the seedling and shows his response. Still images used with permission from Daniel Wangenheim.
Figure 3.9 Photorealistic composition generated by computer using the open.ai program, Dall · E 2. The text snippet for generation of this image was “a closeup of a handpalm with leaves growing from it.” From Ramesh, A. Dhariwal, P., Nichol, A., et al. 2022. Hierarchical textconditional image generation with CLIP latents. arXiv. https:// doi.org/10.48550/arXiv.2204.06125.
3.6 Making Images Believable: Show Representative Images and State the Acquisition Method Figures 3.10, 3.11, and 3.12 are micrographs from scanning electron microscopy (SEM) that tell the stories of the distinguishing features of red and white blood cells and, in Figures 3.11C and 3.12, their relationship to platelets and fibrin in a blood clot. How many similar images are necessary before the image becomes representative? This question goes to the heart of imaging as a form of data representation. Take images from reproduced experiments (usually greater than three) and state that number and the number of within-experiment replicates that the images represent. A representative image from multiple repeats of an experiment makes it more believable. Some images require fewer replicates, as in Figure 3.10, in which the large FOV provides many similar red blood cells, making their biconcave shape more believable. For features within images, statistical sampling rules apply (see Section 7.4).
Figure 3.10 The figure legend for this image is “a scanning electron micrograph of human red blood cells. The cells have a biconcave shape and lack a nucleus and other organelles.” From Alberts, B., Raff, M., Lewis, J. et al. 2002. Molecular Biology of the Cell. Fourth Edition. Garland Science, Taylor and Francis Group New York, NY. p. 600.
Figure 3.11 Textbook examples of blood cells with their original captions. (A) Red Blood Cells. Oxygen is carried in your blood by red blood cells. These cells are specialized for this function. Unlike other cells in the body, they have no mitochondria or nuclei. (B) White Blood Cells. A whole army of cells in the blood protect your body against disease. These are the white blood cells. When you get sick, the number of white blood cells increases. (C) Platelets. Colorless bits of cells called platelets, cause your blood to clot. When a blood vessel is injured, proteins in the plasma form long sticky threads called fibrin. The fibrin traps platelets, which then collect, form a clot, and plug the hole. (D) The red blood cells on the left have been magnified 1,300 times. How do they compare with the ones you saw through the microscope? (E) The white blood cells on the right have been magnified 1650 times. A–C from DiSpezio, M.A. 1999. Exploring Living Things (Science Insights). Scott Foresman— Addison Wesley, Menlo Park, CA p. 464, D–E from Photo Researchers Inc. Used with permission.
Figure 3.10 shows an artifact-free representation of red blood cells. It is believable because all of them are very similar, and they are not artifactually clumped or damaged by the processing. Figures 3.11A and 3.11D show red blood cells stacked like coins, the “rouleaux” artifact common in blood smears when the plasma proteins are elevated through evaporation. Rouleaux is also common in several blood pathologies. In Figure 3.11D, the apparent blebbing or crenulation of some of the cells is another artifact of preparation and a symptom of blood pathology arising, for example, from a venomous snakebite. These damaged cells make the claim that red blood cells are biconcave less believable. Scanning electron micrographs of the type in Figures 3.10 and 3.11 reveal only the surfaces of cells, not their interiors. Therefore, to make the point that red blood cells do not have nuclei or other internal organelles while referring to these images is misleading. The audience would not see these organelles even if they were present. As in Figure 3.10, the caption for Figure 3.11 should state that these images are from the scanning electron microscope and reveal shape differences between red and white blood cells but not internal differences. Furthermore, the caption to Figure 3.11D calls for students to compare these images with the ones that they have seen, presumably with the light microscope. These two microscopies do not image the same thing, and without some form of explanation, students could easily be confused or, worse, disbelieve their own light microscopy. Simply including the imaging method and its limitations in the caption avoids such confusion.
3.7 Making Images Understood: Clearly Identify Regions of Interest with Suitable Framing, Labels, and Image Contrast Figure 3.12 shows how post-processing a single original SEM image (see Figure 3.12A) achieves a successful representation. The story here is to represent different blood cell types while also showing their association with each other in a blood clot. The first step is to identify the ROI (see Section 1.4). This ROI contains a concave red blood cell, a white blood cell, and a fibrin–platelet tangle. The grayscale labels in the grayscale ROI (see Figure 3.12B) have low contrast. However, in Figure 3.12C, pseudocoloring improves the contrast of the labels. Pseudocoloring SEM is a common practice but often is not needed (see Figure 3.10) and may give novices the false impression that SEM records color. Converting the grayscale image to an indexed color image (see Section 2.7, Figure 2.10 and Section 2.13, Table 2.2) is the best technique for pseudocoloring.
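A sketch of that indexed-color approach using Pillow (file names and the magenta ramp are illustrative): a 256-entry lookup table is attached to the grayscale image so that each gray level maps to a color, rather than painting colors in by hand.

```python
from PIL import Image

gray = Image.open("sem_roi.png").convert("L")    # hypothetical grayscale SEM crop
# Build a 256-entry magenta lookup table: gray level i -> (R=i, G=0, B=i).
palette = []
for i in range(256):
    palette += [i, 0, i]
gray.putpalette(palette)                         # attaching a palette makes the image indexed ('P')
gray.convert("RGB").save("sem_roi_magenta.png")  # flatten to RGB for display or print
```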
Figure 3.12 Scanning electron micrograph (SEM) of a blood clot highlighting red blood cells (RBCs), a white blood cell (WBC), and strands of fibrin with entangled platelets. (A) Original SEM micrograph, with the region of interest highlighted. Original scale bar in metadata line is 10 µm. (B) Flipped, cropped, resized, and labeled image. The new scale bar is 5 µm. (C) Magenta pseudocolored image of B with high-contrast labels. (D) Partial red and yellow pseudocolored image with a WBC remaining white in grayscale. Color range selection tools (e.g., the magic wand in Photoshop ) can isolate connected colored objects (see Section 7.7). The pseudocolor distinguishes the different cell types. RBCs are red, WBCs are white, and the fibrin–platelet tangle is yellow-orange. The arrowhead points to a platelet in a fibrin tangle. Scale bar is 5 μm. Images by David Gregory and Debbie Marshall. Attribution 4.0 International (CC BY 4.0).
The pseudocolor palette is the indexed color lookup table, which in Figure 3.12C is the spectrum of different intensities of a single color, magenta. A different pseudocolor palette eliminates the need for text labels altogether. Text labels can obscure image features (see Section 3.9), and if they do so, eliminating them or removing them from the picture is important. Pseudocoloring the red blood cells red, the white blood cell white, and fibrin–platelet tangle yellow (see Figure 3.12D) improves their contrast and color codes cell identity. Compare the pseudocolors of Figure 3.12D with those in Figure 3.11. The pseudocolor of white blood cells is yellow in Figure 3.11 and white in Figure 3.12D. Coloring white blood cells yellow in Figure 3.11 may confuse a novice audience (if they are yellow, why call them white?). The red blood cell pseudocoloring in Figure 3.11 also has flaws. In Figure 3.11A, it is incomplete with magenta and grayscale features in the background. In Figure 3.11D, there is a red haze “outside the lines” of the edges of the cells, probably painted in manually. In Figure 3.11C, the platelet–fibrin tangle is not pseudocolored, but red blood cells are. The caption refers to the colorless regions as platelets with no identification of platelets and fibrin threads in the dark film (which could equally well be the mounting adhesive on the SEM sample holder; see Section 19.5). The fibrin thread tangle contains platelets in Figure 3.12D, and platelet form is better appreciated from Figure 3.13, which does not need pseudocolor for comparing platelets with red blood cells. Manual pseudocoloring, in which the colors are painted in with a digital paintbrush, often produces contouring, contrast generated by abrupt intensity or color transitions. Contouring can also occur during contrast enhancement and feature sharpening (see Section 10.4). Avoid it. It is easy to detect, obscures dim features, and diminishes the believability of the image. Pseudocoloring is common in fluorescence microscopy. Many fluorescence systems use filter sets to collect different colors with a device that simply records their intensity. These different colors can make up different channels, such as the red, green, and blue channels (see Sections 2.7 and 2.13) in a color image (Figure 3.14). The color represented in each channel may not be the color of the fluorescence seen by eye but a pseudocolor. Combining the channels produces a multicolored image. This technique is multiplexing (see Sections 17.5 and 17.8) and may include many more than three different colors or channels.
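A minimal sketch of merging multiplexed channels into a pseudocolored composite, assuming three 8-bit single-channel acquisitions saved as separate grayscale files (file names and channel assignments are hypothetical, mirroring Figure 3.14):

```python
import numpy as np
from PIL import Image

dapi = np.array(Image.open("dapi.png").convert("L"))         # nuclear stain -> blue channel
ck   = np.array(Image.open("cytokeratin.png").convert("L"))  # tumor marker -> red channel
cd45 = np.array(Image.open("cd45.png").convert("L"))         # white-cell marker -> green channel

merged = np.dstack([ck, cd45, dapi])                         # stack as (R, G, B)
Image.fromarray(merged, mode="RGB").save("merged_composite.png")
```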
Figure 3.13 A Scanning electron micrograph of a platelet and a red blood cell showing the form and size of a platelet (left). Scale bar = 1 μm. Images by David Gregory and Debbie Marshall. Attribution 4.0 International (CC BY 4.0).
Figure 3.14 Multiplexed image of circulating tumor cells marked with cytokeratin against a background of white blood cells. Each row shows a colored merged image from the grayscale signals from the blue fluorescent DNA dye, DAPI (4′,6-diamidino-2-phenylindole), a red fluorescent antibody to cytokeratin, and a green fluorescent antibody to CD45, a cytoplasmic white blood cell marker. (A) Typical circulating tumor cell. (B) Circulating tumor cell that is the same size as surrounding white blood cells. (C) Circulating tumor cell aggregate. Scale bar = 10 μm. Creative Commons. Werner, S.L. et al. 2015. Analytical validation and capabilities of the Epic CTC platform: enrichment-free circulating tumour cell detection and characterization. Journal of Circulating Biomarkers 4: 3. doi: 10.5772/60725. Used with permission.
Brightfield microscopy, which uses white light illumination, can, unlike SEM, use cameras to record the true color of a biological specimen. However, along with the virtues of color come its vices. A green background (Figure 3.15A) occurs when using a complementary green filter to increase the contrast of red objects (see Section 2.1). Unfortunately, this contrast increase also changes the color of the objects. Not achieving correct white balance is probably the most common error in scientific photography (see Sections 2.11 and 9.7). Images should have a white background and represent the true color of the object. A yellow-orange or red background (Figure 3.15B) occurs when using a tungsten light source while the camera is set to daylight white balance. A blue background (Figure 3.15C) occurs when using a white light-emitting diode
Figure 3.15 Light micrographs of blood cells and platelets. Original captions: (A) These platelets have been magnified 1000 times. Platelets are much smaller than red and white cells. From Capra, J and Jefferson County Public School District. 1991. Middle School Life Science. Kendall-Hunt Publishing Company, Dubuque, Iowa. (B) Two white blood cells are shown among many red blood cells. Platelets are cell fragments. From Purves, W.K. et al. 2001. Life: The Science of Biology. Sixth Edition, Sinauer Associates and W.H. Freeman and Company, New York, NY. Figure 18.2. (C) A light micrograph of a blood smear stained with the Romanowsky stain, which colors the white blood cells strongly. From Alberts, B., Johnson, A., Lewis J., et al. 2002. Molecular Biology of the Cell. Fourth Edition. Garland Science, Taylor and Francis Group, New York, NY. p. 1285.
(LED) source while the camera is set to tungsten white balance (although in this case, it might be residual Romanowsky stain). Capturing the true color of an object requires using some form of calibration standard for the camera, such as the color card (see Figure 3.5) or an 18% gray card (see Section 9.6). Software solutions are available for images taken with incorrect white balance (see Figures 2.16 and 2.18, Section 2.11), but it is best to get it right in the original image.

In an adequately enlarged image, the main object fills the frame (see Section 1.4). There are many different ways to frame cells. The frame should not cut off many of the cells in the image, as is done in Figure 3.11D. Circular framing in Figure 3.15A occurs when the FOV of the camera exceeds the size of the image projected through the microscope tube. This is a form of vignetting. Avoid vignetting; journals do not accept vignetted images. Trim vignetted images to remove the blank field region and produce a rectangular image. Likewise, out-of-bounds photographs with objects outside the frame (see Figure 3.11B) might be useful for a pseudo-3D effect in art, but research publications do not include them.

Compare the labeling in Figures 3.15B and 3.15C. When composing an image for publication, arrange the frame and image to leave room for labeling. In both of these images, the labels reside outside of the frame and do not obscure the features within the frame. However, for professional publication, the lines and text or call-out labels indicating the features of interest should be thin and muted, as in Figure 3.15C, unlike the comic speech bubbles in Figure 3.15B. Using explanations within the context of the label is common in textbooks and reviews (see Figure 3.3) but rare in the primary literature. Brief labels should draw the eye to the features of interest while providing identification. Labels should not use bit-mapped fonts (see Section 1.1). Programs that use vector graphics fonts, such as Adobe Photoshop, Adobe Illustrator, and Inkscape, are excellent tools for preparing images for presentation. Vector graphics produce image labels that are scalable without pixelation.
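Software white balance correction usually amounts to rescaling the color channels so that a known neutral region (the 18% gray card or the blank background mentioned above) becomes gray. The following is a minimal sketch, assuming an RGB NumPy array and a user-chosen reference region; the file name in the usage comment is hypothetical.

```python
import numpy as np

def white_balance(img, reference_patch):
    """Rescale each RGB channel so that a neutral reference region (a gray
    card or the white background of a brightfield image) becomes gray."""
    img = img.astype(np.float64)
    means = reference_patch.reshape(-1, 3).mean(axis=0)  # per-channel mean of the patch
    gains = means.mean() / means                         # bring channels to a common level
    return np.clip(img * gains, 0, 255).astype(np.uint8)

# Usage (hypothetical file; the top-left corner is assumed to be blank background):
# from skimage import io
# img = io.imread("blood_smear_tungsten.tif")
# corrected = white_balance(img, img[:50, :50])
```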
3.8 Avoid Dequantification and Technical Artifacts While Not Hesitating to Take the Picture
Scientific imaging needs to include quantification, and for most imaging, this means having a scale bar or internal standard for size. There are other standards for object color and intensity (see Figure 3.5 and Section 9.6). Omitting scale bars and standards, image dequantification, leads to misunderstanding by the audience and misrepresentation of the data. In Figures 3.11A–C, there is no indication of size, but the white blood cells are represented as being about 10 times the size of red blood cells, clearly a misrepresentation based on images containing both white and red blood cells (see Figures 3.12 and 3.15B and C). From Figures 3.9 and 3.12D, the red blood cell is 5–7 μm in diameter. From Figures 3.12D and 3.15B and C, white blood cells vary between 8 and 15 μm in diameter, while platelets are about 2 μm in diameter (see Figure 3.13). When both red and white blood cells are present and the images in the confection have different magnifications (Figure 3.16), the relative size of the different white blood cell types is not immediately obvious in the absence of scale bars. Contrast this with a single FOV that includes the different white blood cell types along with red blood cells (see Figure 3.15C) and has a scale bar.

Simply stating the magnification of the original image, as done in Figures 3.11D and E and 3.15A, is inadequate for two reasons. First, as the images are copied and re-printed, the magnification can easily change. Digital resizing can also alter the true width-to-height ratio, the aspect ratio, of the image, so it is important to be careful that both dimensions are equally enlarged. In this regard, a two-dimensional scale bar is superior to a one-dimensional line but is rarely used. Second, even if the stated magnification is correct, knowing the exact size of an object requires measuring the printed image of the object and calculating the actual size from the stated magnification. Not many readers would bother.

The gallery of images used in an immunology textbook (see Figure 3.16) has technical errors in panels A, B, and D in addition to the absence of scale bars. Compare this gallery with the single image in Figure 3.15C, which is free of technical problems (outside of the blue background). Panel A is out of focus and has an artifactual line above the blurry lymphocyte. One way to quantify the relative focus of an image is to determine the pixel standard deviation from the image histogram (see Section 2.3). Images with low standard deviations, such as in Figure 3.16A, have more blur than those with higher pixel standard deviations, such as Figure 3.16C.
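The pixel standard deviation is easy to compute directly from the image array, although it only ranks focus fairly between images of the same subject, stain, and illumination. A small sketch, assuming grayscale NumPy arrays; the variance-of-the-Laplacian measure is a commonly used alternative added here for comparison, not part of the analysis in Figure 3.16.

```python
import numpy as np
from scipy import ndimage

def focus_metrics(gray):
    """Whole-image sharpness measures: the pixel standard deviation (as
    reported for Figure 3.16) and the variance of the Laplacian, which
    also falls as an image blurs."""
    gray = gray.astype(float)
    return gray.std(), ndimage.laplace(gray).var()

# Illustration on a synthetic image: blurring lowers both measures.
rng = np.random.default_rng(0)
sharp = rng.integers(0, 256, (256, 256)).astype(float)
blurred = ndimage.gaussian_filter(sharp, sigma=3)
print(focus_metrics(sharp))    # higher SD and Laplacian variance
print(focus_metrics(blurred))  # lower values for the blurred copy
```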
Figure 3.16 A comparison of blood cell types from an immunology textbook. Analysis of pixel standard deviation (SD) of five sampled points in each image. (A) Lymphocyte (SD, 58.4). Out of focus. Large scratch above lymphocyte. Red blood cells (RBCs) in rouleaux. (B) Basophil (SD, 66.5). Large mark to left of basophil. (C) Monocyte (SD, 62.9). (D) Polymorphonuclear leucocytes (SD, 61.8). Deformed RBCs. No scale bars are present. All are at different magnifications based on the sizes of the RBCs. Adapted from Hood, L.E. et al. 1984. Immunology. Second Edition. Benjamin/Cummings Publishing Company, Inc., Menlo Park, CA. p. 5.
Panel B has another line that looks like a shadow from misapplied adhesive tape and low cytoplasmic contrast outside of the granules in the basophil. Panel D has malformed red blood cells. These technical errors arise during sample preparation as in panel D (poor smear technique), initial image acquisition as in panel A (out of focus), or post-production as in panels A and B (contamination with mounting adhesives).

The challenges of taking good pictures in biological research can come in many forms, not the least of which are keeping the equipment very clean and obtaining a sample that is not contaminated. However, some forms of contamination or dirt in the optical system are unavoidable. Do not hesitate to capture “messy” pictures of important, reproducible data. That said, even in unavoidably compromised optical systems, digital subtraction of a field without the sample can remove dirt, scratches, and other contaminants in the optical path (see Section 10.5), while proper Köhler illumination also removes contaminants from the optical path (see Section 9.9, Figure 9.19). Background subtraction requires detailed reporting in the publication.

If there is good information in an image, even if there are technical flaws, the image is valuable. The data within the image are much more important than getting a beautiful, technically perfect image. The work of scientists is to produce usable data. Usable data are not necessarily aesthetically pretty, but there is a component of aesthetics that allows the captured image to be maximally useful for analysis and for publication. The rule is “when in doubt, take the picture!”
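Blank-field subtraction of the kind described above can be sketched in a few lines, assuming a blank image captured with the sample removed and, optionally, a camera dark frame; whether simple subtraction or flat-field division is appropriate depends on the imaging mode, and either way the operation should be reported, as the text notes. The file names in the usage comment are hypothetical.

```python
import numpy as np

def subtract_background(img, blank, dark=None):
    """Remove fixed dirt, scratches, and uneven illumination by subtracting a
    blank field recorded with no sample. Flat-field division (img / blank) is
    an alternative when illumination varies strongly across the field."""
    img = img.astype(np.float64)
    blank = blank.astype(np.float64)
    if dark is not None:                       # optional camera dark frame
        img -= dark
        blank -= dark
    corrected = img - blank + blank.mean()     # re-center to keep overall brightness
    return np.clip(corrected, 0, 255).astype(np.uint8)

# Usage (hypothetical files):
# from skimage import io
# cleaned = subtract_background(io.imread("sample.tif"), io.imread("blank_field.tif"))
```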
3.9 Accurate, Reproducible Imaging Requires a Set of Rules and Guidelines

The professional scientific community abides by several image-processing rules. Adoption of these rules was the consequence of several cases of scientific fraud that occurred using digital techniques, “photoshopping.”

1) Do not enhance, obscure, move, remove, or introduce any specific image feature. By adding or subtracting features in your image, you are committing scientific fraud, and the punishments can be severe. Figure 3.17A shows several new cells introduced into the image after acquisition (those with black backgrounds in Figure 3.17B). There are digital techniques for background subtraction and removing dirt and noise, but these operate on the whole image, not on a specific feature.

2) Explicitly state (in the figure legend) and demarcate (e.g., by using dividing lines) the grouping of images from different parts of the same gel or from different gels, fields, or exposures. This rule addresses problems of the addition of bands on macromolecule separation gels or the addition of cells from different photographic fields. False juxtaposition or combining of features from separate images (a common practice in the paparazzi industry) is fraudulent in science. However, explicit and well-documented forms of juxtaposition or “cut-and-paste” composition during the creation of image confections are quite legitimate and are some of the best means of visual science communication.
Figure 3.17 Fraudulent altering of a field of cultured cells by cutting and pasting. (A) Altered, manipulated image. (B) Contrast enhancement of the low-key region of the image reveals the cut-and-pasting modification. Many journals have an automated screen such as this to check all images submitted for publication. From Rossner, M. and Yamada, K.M. 2004 What’s in a picture? The temptation of image manipulation. Journal of Cell Biology 166: 11–15.
3) Apply adjustments of brightness, contrast, or color balance to the whole image without obscuring or eliminating information present in the original. Just like adding or subtracting features, contrast enhancement of single features can easily lead to misinterpretation. Record any specific adjustments of low- and high-key regions of an image (i.e., nonlinear adjustments) and state the values in the figure legend or methods section of the article.

In addition to the rules that address fraud are guidelines that ensure reproducibility and accuracy. The most important guideline is the proper and complete reporting of image processing steps. As mentioned earlier (see Section 2.8), some algorithms (often those that are proprietary) can be a black box, where the steps are not published or are unclear. Reproducibility requires transparency in the acquisition, storage, and processing of data. Maintaining original data archives stored without data loss and using open-source software with archived versions for image processing go a long way toward assuring reproducibility. However, it is also important to archive all the steps and the values used in these steps in an image processing sequence or pipeline (see Section 7.1). The accuracy of standard image processing steps, such as noise reduction in an image, varies with the algorithm used. Hence, an additional important guideline is to assess the accuracy of the processing algorithm when several are available (e.g., whether to denoise with Gaussian smoothing, median filtering, or non-local means denoising), as in the sketch below.
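A minimal sketch of such an assessment, assuming SciPy and scikit-image are available: noise is simulated on a stock test image so that each denoiser can be scored against a known clean reference (here with SSIM, introduced in the next section). The parameter values are illustrative, not recommendations.

```python
import numpy as np
from scipy import ndimage
from skimage import data, util
from skimage.metrics import structural_similarity as ssim
from skimage.restoration import denoise_nl_means, estimate_sigma

clean = util.img_as_float(data.camera())     # stock reference image
noisy = util.random_noise(clean, var=0.01)   # simulated noisy acquisition

sigma = np.mean(estimate_sigma(noisy))       # rough noise estimate for NL-means
candidates = {
    "gaussian": ndimage.gaussian_filter(noisy, sigma=1),
    "median": ndimage.median_filter(noisy, size=3),
    "nl_means": denoise_nl_means(noisy, h=1.15 * sigma, fast_mode=True,
                                 patch_size=5, patch_distance=6),
}
for name, result in candidates.items():
    # Higher SSIM against the clean reference = more accurate denoising here.
    print(name, round(ssim(clean, result, data_range=1.0), 3))
```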
3.10 The Structural Similarity Index Measure Quantifies Image Degradation

In Figure 3.18, blurring degrades an image. Can you see the difference? Human visual perception is not a constant. It differs among individuals (e.g., astigmatism as in Section 4.1 and color blindness as in Section 4.2) and changes as people age. Experienced people may see more than inexperienced ones, hence the concept of the trained eye, in which people with more experience looking for something in images can see it more clearly. The just-noticeable difference (JND) is a measure of the subjective judgment of a group of people: the probability that the group will see differences between images of higher and lower quality. A more objective quantification of image degradation is the structural similarity index measure (SSIM). The SSIM measures the similarity between an unprocessed, uncompressed, or distortion-free reference image and an altered test image. The test image could be either the same image, only processed, or an image of the same object with different optical settings. For example, Figure 3.18A is the reference for the blurred images in Figure 3.18B and C. The SSIM value, 0.4936,
Figure 3.18 Structural similarity index measure (SSIM) comparisons between blurred and non-blurred indexed color images. (A) Reference image. (B) Gaussian blur of reference image using 1-pixel radius. SSIM = 0.4936. (C) Gaussian blur of reference image using a 3-pixel radius. SSIM = 0.0898. These values are from the SSIM plugin for ImageJ. Photograph by L. Griffing.
for the image in Figure 3.18B, blurred using a 1-pixel radius Gaussian blur (see Section 10.4), is higher than the SSIM value, 0.0898, for the blurrier image (Figure 3.18C) blurred with a 3-pixel radius Gaussian blur. The less similar the images, the lower the SSIM value. In image deconvolution, an operation that decreases the optical blur intrinsic to optics (see Section 10.8, Table 10.2), even the best deconvolution operations result in relatively low SSIM values compared with an unblurred reference image. The SSIM employs human perceptual properties such as luminance masking and contrast masking. Luminance masking means that distortions tend to be less obvious in regions of high intensity. Contrast masking means that distortions become less visible when there is “texture” to the image. A form of SSIM, multi-scale SSIM, analyzes images over multiple scales and regions of sampling as the image rescales (scale space and scale-invariant feature transforms; see Section 11.4).
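The same kind of comparison as in Figure 3.18 can be run with the SSIM implementation in scikit-image. This is a sketch on a stock test image; the values will not match the figure exactly because implementations and blur radius/sigma conventions differ from the ImageJ plugin cited there.

```python
from scipy import ndimage
from skimage import data, util
from skimage.metrics import structural_similarity as ssim

ref = util.img_as_float(data.camera())          # stand-in reference image
blur_mild = ndimage.gaussian_filter(ref, sigma=1)
blur_heavy = ndimage.gaussian_filter(ref, sigma=3)

print(ssim(ref, blur_mild, data_range=1.0))     # higher value: more similar
print(ssim(ref, blur_heavy, data_range=1.0))    # lower value: more degraded
```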
Annotated Images, Video, Web Sites, and References

3.1 Image Representation Incorporates Multiple Visual Elements to Tell a Story

There are discussions of the importance of stories in science communication in the following: Jones, M.D. and Crow, D.A. 2017. How can we use the “science of stories” to produce persuasive scientific stories? Palgrave Communications 3: 53. https://doi.org/10.1057/s41599-017-0047-7. Olson, R. 2015. Houston, We Have a Narrative: Why Science Needs Story. University of Chicago Press, Chicago, IL. Dahlstrom, M.F. 2014. Using narratives and storytelling to communicate science with nonexpert audiences. PNAS 111(Suppl. 4): 13614–13620. https://doi.org/10.1073/pnas.1320645111. There are also arguments against storytelling: Katz, Y. 2013. Against storytelling of scientific results. Nature Methods 10: 1045. Some of the discussion in the above references revolves around what constitutes a story and the advantages and disadvantages of various narrative devices in science communication. In this chapter, the device is visual, and the story can be as simple as a declarative statement such as the summary statements that come first in figure legends in Science and Nature. The concept of visual confections, with more examples, is in E.R. Tufte, 1997. Visual confections: juxtaposition from the ocean of the streams of story. In Visual Explanations. Graphics Press, Cheshire, CT. pp. 121–151. The design elements for good images can be found throughout the series of books authored by Tufte, which, besides Visual Explanations, include The Visual Display of Quantitative Information (1983, 2001), Envisioning Information (1990), and Beautiful Evidence (2006), all published by Graphics Press, Cheshire, CT. They are excellent.
3.2 Illustrated Confections Combine the Accuracy of a Typical Specimen with a Science Story

A discussion of using adjacently mapped images, such as in bird field guides, for explanatory image analysis is in Tufte, E.R. 2006. Beautiful Evidence. Graphics Press, Cheshire, CT. p. 45. He makes the distinction between explanatory presentations and exploratory presentations in which such mapping might divert viewers who are trying to see the image with fresh eyes. He also points out that a measurement scale in these field guides would be welcome. The historical distinction between a type specimen (an individual representative specimen) and a typical specimen (a representation of the true ideal type of a species) is in Daston, L. and Galison, P. 2007. Objectivity. Zone Books, Brooklyn, NY. pp. 105–113. This book also has extensive analysis of what constitutes an objective record for scientific research and how this has changed over the years, from illustrated records to photographic records. It is a comprehensive treatment of the history of scientific imaging and data analysis. Tufte’s analysis of the cladogram is in Tufte, E.R. 2006. Beautiful Evidence. Graphics Press, Cheshire, CT. p. 74.
3.3 Digital Confections Combine the Accuracy of Photography with a Science Story

For approaches to digitization of collections, see, for example, Short, A.E.Z., Dikow, T., and Moreau, C.S. 2018. Entomological collections in the age of big data. Annual Review of Entomology 63: 513–530.
Work on botanical symbols is in Simpson, N. 2010. Botanical symbols: a new symbol set for new images. Botanical Journal of the Linnean Society 162: 117–129. Work on color standards is in Simpson, N. 2009. Colour and contemporary digital botanical illustration. Journal of Optics and Laser Technology 43: 330–336. doi: 10.1016/j.optlastec.2008.12.014.
3.4 The Video Storyboard Is an Explicit Image Confection

The “Fantastic Vesicle Traffic” video is on YouTube at http://www.youtube.com/watch?v=7sRZy9PgPvg. For a description of the art of storyboarding, see Chand, A. 2020. What is storyboarding for film? The Conversation. https://theconversation.com/explainer-what-is-storyboarding-for-film-131205. See https://seeingthewoods.org/2019/12/19/storytelling-and-storyboarding-science-the-global-science-film-festival for a summary of a recent meeting on this topic.
3.5 Artificial Intelligence Can Generate Photorealistic Images from Text Stories

For an introduction to DALL·E 2, see https://openai.com/dall-e-2. The publication describing DALL·E 2 is Ramesh, A., Dhariwal, P., Nichol, A., et al. 2022. Hierarchical text-conditional image generation with CLIP latents. arXiv. https://doi.org/10.48550/arXiv.2204.06125.
3.6 Making Images Believable: Show Representative Images and State the Acquisition Method

Red blood cell crenation and rouleaux formation: Turgeon, M.L. 2005. Clinical Hematology: Theory and Procedures. Fourth Edition. Lippincott, Williams, and Wilkins, Baltimore, MD. pp. 101–105. Figure 3.11D–E: From a textbook attributing Photo Researchers Inc., now Getty Images.
3.7 Making Images Understood: Clearly Identify Regions of Interest with Suitable Framing, Labels, and Image Contrast

For the originals in Figure 3.11, see https://wellcomecollection.org/works. Labels modify these images of a scanning electron micrograph of blood corpuscles in clot: Figure 3.11A and B: Wellcome B0000565. Figure 3.11C: Wellcome B0004829. Figure 3.11D: Wellcome B0004831. Figure 3.12: Modified from Wellcome B0000583. The Wellcome Trust in the United Kingdom has a yearly imaging competition similar to the Nikon Small World competition. There are two sites; the current winners are at https://wellcome.ac.uk/what-we-do/our-work/wellcome-photography-prize/2021. Keelan, B.W. 2002. Handbook of Image Quality. Marcel Dekker, New York, NY. pp. 136–138 discusses contouring in the context of just-noticeable differences. A discussion of white balance for microscopy using video chips and tubes is in Inoue, S. and Spring, K. 1997. Video Microscopy. Second Edition. Plenum Press, New York, NY. p. 377. E.R. Tufte describes how authors should design labels (visual elements that make a clear difference but no more, the smallest effective difference) in Visual Explanations. 1997. Graphics Press, Cheshire, CT. pp. 73–78.
3.8 Avoid Dequantification and Technical Artifacts While Not Hesitating to Take the Picture

E.R. Tufte’s works often argue against dequantification of scientific images. Of particular note is Chapter 1, Images and quantities. In Visual Explanations. 1997. Graphics Press, Cheshire, CT. pp. 13–27. Digital montaging using vector graphics, as opposed to mounting pictures and photographing the montage, diminishes the chances that there will be scratches on prints and negatives.
3.9 Accurate, Reproducible Imaging Requires a Set of Rules and Guidelines

The initial article in the Journal of Cell Biology that laid out these rules is Rossner, M. and Yamada, K.M. 2004. What’s in a picture? The temptation of image manipulation. Journal of Cell Biology 166: 11–15. An update on the guidelines for image production is in Blatt, M. and Martin, C. 2013. Manipulation and misconduct in the handling of image data. Plant Physiology 163: 3–4. A detailed analysis of the software and its reproducible use is in Miura, K. and Nørrelykke, S.F. 2021. Reproducible image handling and analysis. The EMBO Journal 40: e105889. An important guide to accurately reporting image processing to make it reproducible is in Aaron, J. and Chew, T.L. 2021. A guide to accurate reporting in digital image processing – can anyone reproduce your quantitative analysis? Journal of Cell Science 134(6): jcs254151.
3.10 The Structural Similarity Index Measure Quantifies Image Degradation

Just-noticeable differences measure very subtle differences detected by the human visual system, as described by Cohn, T.E. and Lasley, D.J. 1986. Visual sensitivity. Annual Review of Psychology 37: 495–521. Establishing image quality standards and calibrating psychometrics is in Keelan, B.W. 2002. Handbook of Image Quality. Marcel Dekker, New York, NY. The structural similarity index measure (SSIM) is discussed in Wang, Z., Bovik, A.C., Sheikh, H.R., and Simoncelli, E.P. 2004. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4): 600–612. doi: 10.1109/TIP.2003.819861. ISSN 1057-7149. The inventors of SSIM received a Primetime Engineering Emmy Award in 2015 for their contribution to the video industry.
4 Image Capture by Eye

4.1 The Anatomy of the Eye Limits Its Spatial Resolution

The lens of the eye inverts incoming light, which shines on the light-sensitive tissue at the back of the eye, the retina (Figure 4.1). The retina integrates the information and transmits it to the optic nerve. The optic chiasm receives the signal from the optic nerve and splits it so that the right hemisphere of the brain processes views from our left, and the left hemisphere processes views from our right. The nerves pass through the lateral geniculate nuclei (a knee-like bend; Figure 4.1A) and radiate to the primary visual cortex in the back of the head. The lateral geniculate nucleus is the region that integrates feedback from the primary visual cortex with incoming signals so that we can see what we “expect” to see. The crystalline lens of the eye has a much higher refractive index than the liquid (humor) in the vitreal chamber, thereby bending the light along the visual axis to focus on the spot called the fovea (see Figure 4.1B). The muscles surrounding the lens can adjust the focal length of the eye so that focus can be achieved, a process called accommodation. The spacing of the light-sensitive cells, the rods and cones, in the retina (Figure 4.2) determines ultimate visual resolution. The cones in the retina sense bright colors, providing photopic vision, while the rods sense dim, mostly green, light, providing scotopic vision (Figure 4.3).
Figure 4.1 Anatomy of the eye and its connection to the brain. (A) Frozen cross-section of the eye and brain showing the perception of objects to our right, sensed and transmitted by green-tinted tissue, is in the left brain and objects to our left, sensed and transmitted by red-shaded tissue, in the right brain. From Braune W et al. 1888 / Leipzig : Verlag von Veit & Comp. (B) Cross-section of the human eye showing the optic and visual axes. The gradient of numbers and spacing of cones and rods on the retina is shown by the color gradients indicated. n = refractive index. Diagram by L. Griffing.
The fovea (see Figure 4.1B) is the most sensitive region of the retina. The center-to-center spacing of the outer segments of the retinal cones in the fovea is about 2.5 μm. The spacing there is closer than in the rest of the retina because there are no intervening rods, whereas just outside the fovea, the number of rods peaks and then diminishes (orange-to-yellow gradient; see Figure 4.1B). Hence, the resolving power in the fovea is higher than that outside of the fovea. For the cones in the fovea to resolve two points of light, there would have to be an intervening unilluminated cone between the two illuminated cones. Therefore, the smallest resolvable spacing is 5 μm (the distance across three cones, with the unilluminated one in the center). The lens of the eye is a reducing lens. It reduces an object held at near point (see Section 1.2), about 25 cm, by about 10-fold. At near point, the eye could theoretically resolve two points about 50 μm (10 times the resolvable spacing) apart. This is the value for the fovea. Outside of the fovea, the cone density diminishes, and the numbers of obscuring blood vessels and cell nuclei increase. This increases the value of the average spacing for three cones to 7.5 μm. Like all other image-sensing devices, the resolution of the eye is subject to the Nyquist criterion (see Section 1.5), which dictates that the pixels of the eye, the rods and cones, need to oversample by twofold the spatial frequency of the incoming image. Therefore, a general value for the resolution of the eye is 10 (reducing power of the lens) × 7.5 μm (spacing of photosensors) × 2 (Nyquist criterion) = 150 μm.

Another way of looking at the resolution of the eye is with the contrast transfer function (Figure 4.4), the standard method of describing the contrast and resolution of an image acquisition system (see Section 8.9). It plots the detectable contrast of an object against the resolving power of the system. This graph shows the trade-off between contrast and resolution: as resolving power increases, contrast declines. The contrast transfer function of the eye falls to zero, or has a cut-off frequency, at 20 lines/mm, a resolution of 50 μm (see Figure 4.4). However, the eye only achieves acceptable contrast of 85% at resolving powers lower than 6–7 lines/mm, or a resolution of about 150 μm. This contrast transfer function assumes only one diameter of the pupil of the iris. When the pupil is larger, the eye resolves smaller objects than when the pupil is smaller. This is because the size of the pupil determines how well the eye collects light; in other words, it determines the aperture of the eye. The aperture of an imaging system limits resolution because larger apertures collect higher orders of diffracted light, the light scattered by an edge (see Sections 5.14 and 8.4).

Figure 4.2 Cross-section of the primate retina outside the fovea where there are more rods than cones. Light has to be transmitted through the ganglion and bipolar cells before being collected by the rods and cones. The different colors of the rods and cones indicate their differing sensitivity to different wavelengths of light. Diagram by L. Griffing.

Figure 4.3 The dynamic range of the human eye as measured by photometric light intensities visible to the eye in foot candles and lux and by ambient light. Adapted from Inoue, S. and Spring, K. 1997. Video Microscopy. Second Edition. Plenum Press, New York, NY. p. 167.
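The resolution estimate above is simple arithmetic; a short sketch using only the values quoted in the text (no new measurements are introduced):

```python
# Values from the text for the resolution of the eye at the 25-cm near point.
cone_spacing_um = 7.5    # average spacing across three cones outside the fovea
lens_reduction = 10      # ~10-fold reduction of an object held at near point
nyquist_factor = 2       # sensors must oversample the image twofold

resolution_um = lens_reduction * cone_spacing_um * nyquist_factor
print(resolution_um)     # 150 (micrometers), the general value quoted for the eye
```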
Other resolution limitations are the optical aberrations, the optical errors of the eye that produce different types of blur. Astigmatism, for example (see Section 8.11), is produced by more curvature across one axis of the lens than another (see Section 4.5). As with other lens systems, as the pupil gets larger, the eye collects more light and sees smaller things, but it uses a larger part of the lens, so any deformations of the lens that could contribute to aberration begin to influence resolution. No wonder doctors dilate your pupils before looking for aberrations!
Figure 4.4 The contrast transfer function of the human eye plotting image contrast versus the log scale of the spatial frequency of the incoming image. The absolute cut-off frequency (highest detectable frequency, smallest detectable object) is a resolving power of about 20 lines per mm or a resolution of about 50 μm. However, acceptable contrast (about 80%) is only achieved at a resolving power of about 7 lines per millimeter, or 142 μm. Adapted from Williams, J.B. 1990. Image Clarity: High Resolution Photography. Focal Press, Boston, MA. pp. 39–46.
4.2 The Dynamic Range of the Eye Exceeds 11 Orders of Magnitude of Light Intensity, and Intrascene Dynamic Range Is about 3 Orders

The dynamic range of an imaging system is the range of intensity it detects, from minimum detectable light intensities to saturating light intensities. The eye has a very high dynamic range (see Figure 4.3). In high light intensities, it cuts out bright light with the iris, decreasing pupil diameter from 8 to 2 mm. On a bright day at the beach, it may take a while for the eyes to adjust their pupil size enough to read text from a white sheet of paper. At very low light, scotopic vision has an adaptation response, whereby the eye can adapt to darkness so well that it can detect a single photon. Adaptation in the dark, or in red or far-red light that the rods do not detect (see Figure 4.8), takes several seconds. Without these adaptations of the iris and retina, the eye has an intrascene dynamic range, the minimum-to-maximum intensity detected by the eye within a single scene. This is only about 3 orders of magnitude of intensity, rather than 11, with the ability to distinguish only about 60 levels of intensity or shades of gray (see Section 2.1). Hence, when changing from brightfield microscopy to fluorescence or darkfield microscopy, the eye needs time to readapt; without this time, features in the fluorescence or darkfield image will be invisible. During this process, not only does the iris readjust, but the primary set of dim-light photosensors, the rods, also has to take over from the cones.
4.3 The Absorption Characteristics of Photopigments of the Eye Determine Its Wavelength Sensitivity

Before continuing our discussion of light in the eye, it is important to review a few fundamental physical properties of light. Wavelength, λ, is the distance from one wave crest to the next (Figure 4.5). Amplitude, A0, is observable as intensity; the intensity is proportional to the amplitude squared. Measurement of amplitude or intensity is done with a variety of calibrated photosensors, some sensitive only to visible light, which measure in photometric units (e.g., foot candles and lux; see Figure 4.3), and others sensitive to many different wavelengths of electromagnetic radiation, which use radiometric units (e.g., watts per square meter; see Section 9.2). When using wavelength to describe visible light, remember that for this purpose, wavelength is a distance in nanometers (nm) measured in a vacuum. As described later, the wavelength of light changes as it moves through media other than a vacuum. The mnemonic used to remember the order of the colors, ROY G BIV (red, orange, yellow, green, blue, indigo, and violet), starts with the longer, lower energy wavelengths, red (680 nm), and decreases in 50-nm steps to the short, high-energy wavelengths, violet (380 nm) (Figure 4.6). The sensitivity of the eye to different wavelengths of light is limited to the region of the spectrum approximately between 375 nm (violet) and 740 nm (far red). These colors are a combination of the three primary colors, red, green, and blue. Young (1802) and Helmholtz (1866) suggested that three forms of color receptors combine their signals to form the colors perceived by the human visual system. In 1931, the Commission Internationale de l’Eclairage (CIE) developed an
Figure 4.5 The wavelength (λ) of light is the distance from one crest of the wave to the next. The amplitude (A0) of light is shown as the height of the crest. Diagram by L. Griffing.
Figure 4.6 The wavelength of visible light is a small region in the electromagnetic spectrum. The visible wavelengths vary between 380 and 740 nm. You can use the ROY G BIV (red, orange, yellow, green, blue, indigo, and violet) mnemonic to remember the order of the colors and make it quantitative by starting at 680 and using a 50-nm interval between the colors. Note that energy increases as wavelength decreases. Diagram by L. Griffing.
international chromaticity standard based on the theories of Young and Helmholtz, which weight the three primary colors linearly to give the color space shown in Figure 4.7. The names for colors in the CIE chart differ slightly from the ROY G BIV designation (see Figure 4.6). Orange in ROYGBIV would be on the boundary of reddish orange and red in the CIE chart. In addition, the CIE chart does not use the terms indigo and violet. The eye can distinguish about 60 shades of gray within a scene under favorable conditions (see Section 2.1). Color gives much more information in a scene, but the scene does need to be well-illuminated (see Figure 4.3) to lie within the dynamic range of the cones. In an extremely satisfying confirmation of theory, physiologists found, about a century after Helmholtz, that there are three different cone populations, with absorption maxima in blue, green, and red (Figure 4.8). The rods have peak sensitivity in the blue-green region of the spectrum, at 496 nm. The insensitivity of rods to red light means that when the eyes are dark adapted to dim light, as when viewing dim objects through a telescope or microscope, red light is useful for reading star charts or notes without influencing the dark adaptation of the rods. The perception of color as bright or dim
Figure 4.7 Diagram based on 1976 Commission Internationale de l’Eclairage chromaticity space showing the different colors perceived by the human eye based on a combination of three primary colors, red, green, and blue. Adapted by L. Griffing from Photo Research image in Inoue, S. 1986. Video Microscopy. Plenum Press, New York, NY. p. 89.
Figure 4.8 Wavelength absorbance of the rods (scotopic) and cones (photopic) in the eye. The rods have less color sensitivity, being able to sense blue well but sensing yellow, orange, and red more poorly than the cones. Hence, much of our color vision comes from the cones. Adapted by L. Griffing from Neitz, J. and Neitz, M. 2011. The genetics of normal and defective color vision. Vision Research 51: 633 and Dartnall, H., Bowmaker, J., and Mollon, J. 1983. Human visual pigments: microspectrophotometric results from the eyes of seven persons. Proceedings of the Royal Society B: Biological Sciences 220: 115.
reflects our ability to detect it. Because we sense light in red and blue less than in yellow and green, intense red and blue colors, when translated to grayscale, are often darker than intense green or yellow. This quality of color brightness is its luminance. Because different colors have different luminance values, indexing color to luminance can make a fair approximation of color in 8-bit indexed color images.

Color sensing starts with absorption of light by the pigment, retinal, covalently linked to the protein, opsin, to produce the cone pigments, iodopsins, and the rod pigment, rhodopsin. Together the iodopsins have a light absorption spectrum that closely matches the wavelength sensitivity of the eye (see Figure 4.8). The long wavelength, L, or red, iodopsins differ in gene sequence from the medium wavelength, M, or green, iodopsins, giving rise to their different absorption spectra. The differing gene sequences of fluorescence reporter proteins mediate similar changes in absorption spectra (see Section 17.2). Color blindness is a sex-linked genetic disease affecting one in every 10 males. Protanopic color blindness occurs because the L-cone pigments are absent or only partially functional. The remaining cones produce a yellow-blue spectrum (Figure 4.9). Deuteranopic color blindness occurs when M-cone pigments are absent or non-functional. In tritanopia, the S-cones are completely missing, and only the L and M ones exist. Hence, before publishing figures with color gradients or color contrast, authors of scientific publications should check the contrast of the figure using the color-blind settings or plugins now available (see Figure 4.9).

The CIE graph (see Figure 4.7) shows the full range, or gamut, of wavelengths of light detected by the eye. However, most color displays and printing technologies are incapable of showing all of these colors. Therefore, the gamut of colors in a display only makes up a section of the CIE graph (Figure 4.10). The standard red, green, blue (sRGB) color gamut is the 1996 standard developed by Microsoft and Hewlett-Packard based on television displays (HDTV) and is the standard for displays on the internet. For printing, the Adobe RGB (aRGB) standard extends the sRGB in the green. The color gamut of the final output device (web, printing, e-book) produces other standards that are only a subset of the CIE standards.
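This luminance relationship is what standard grayscale-conversion formulas encode. A small sketch, assuming 8-bit RGB NumPy arrays and using the Rec. 709 luma weights, one common convention (Rec. 601 weights are another):

```python
import numpy as np

def luminance(rgb):
    """Convert an RGB image to luminance with Rec. 709 weights, which give
    green the largest contribution and blue the smallest, mirroring the
    eye's unequal sensitivity to the three primaries."""
    weights = np.array([0.2126, 0.7152, 0.0722])          # R, G, B
    return rgb[..., :3].astype(float) @ weights

# A saturated green and a saturated blue of equal digital intensity differ
# sharply in luminance, which is why pure blue prints darker in grayscale.
print(luminance(np.array([[[0, 255, 0]]])))   # bright green
print(luminance(np.array([[[0, 0, 255]]])))   # much darker blue
```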
Figure 4.9 Diagram in Figure 4.7 reproduced using the protanopic setting in Photoshop (View > Proof setup > Colorblindness – Protanopic type). Plugins for Photoshop and ImageJ are also available for free download at www.vischeck.com.
Figure 4.10 Color gamut of commonly used computer display color standards, standard red, green, blue (sRGB) and AdobeRGB (aRGB). The main difference is the extended green primary of aRGB. Replot of color gamuts on top of Adoniscik / Wikipedia / Public Domain.
4.4 Refraction and Reflection Determine the Optical Properties of Materials

Light goes through several changes in angle and speed as it traverses the eye. It does so because the eye is made of many different transparent materials (see Figure 4.1B). The speed of light in a vacuum is a common physical constant, 299,792 km/s (metric), or 186,282 miles/s (English). How much different materials slow the light, compared with its speed in a vacuum, is the refractive index of the material. Table 4.1 shows the refractive indices of a variety of common media used in biological imaging. Frequency, ν, is the number of wave cycles per second, or speed/wavelength. Frequency, unlike the speed and wavelength of light, does not change in different media and is therefore probably a better way to describe a certain color of light than the more common wavelength number. Given that frequency stays constant, it is easy to see why slowing light decreases its wavelength (Figure 4.11). When light travels through a medium of higher refractive index, the optical path length of the ray of light increases because there is an increased number of wavelengths through the more refractive medium (see Figure 4.11). As light emerges from a material, its phase, or position of the wave in space, may be different. Lasers produce beams of in-phase, or coherent, light. For example, in Figure 4.11, traversing glass produces a half wavelength of optical path length difference (OPD, in red) with the light emerging from the oil. If these waves interact through interference (Figure 4.12) and the crest of one wave matches the trough of another, it reduces the resulting amplitude of the light, a process called destructive interference. If, on the other hand, the light emerging from oil combined with the light just passing through water, these waves produce
Table 4.1 The Refractive Index Properties of Several Materials.

Medium                 Refractive Index, n*
Vacuum                 1.0
Air                    1.000277
Water                  1.333
Water + 1% protein     1.3348
Water + 2% protein     1.3362
Glycerin               1.4730
Oil                    1.515
Optical glass          1.45–1.95
Diamond                2.42

*Refractive index, n(medium) = speed of light in vacuum / speed of light in medium.
Figure 4.11 The slowing of light when it enters a medium of greater refractive index produces an increase in the optical path of the light, indicated in red as the optical path difference (OPD). In oil, a complete wave cycle is added to the optical path length (red) compared with the path in water alone. In glass, the light adds a half wavelength relative to the oil-traversing beam and 1 1/2 wavelengths relative to the water-traversing beam. Consequently, when it emerges, it is out of phase with these beams and would destructively interfere with them (Figure 4.12). Diagram by L. Griffing.
Figure 4.12 Constructive versus destructive interference. Interfering wave trains of light that differ by a whole integer multiple of the wavelength show maximal constructive interference. Wave trains that differ by an odd multiple of half a wavelength destructively interfere. Diagram by L. Griffing.
Figure 4.13 (A) The bending of light, or refraction, that occurs when light moves from a medium with refractive index n1 into a medium of refractive index n2. The angle of incidence = i. The angle of refraction = r. Snell’s law states that the sine of i over the sine of r is equal to the ratio of n2 to n1. (B) If a beam of light shines along the surface of the interface between two different refractive indices (white arrow, angle of incidence = i, or 90 degrees), then it is bent by a critical angle, ac. If light comes in from the other side at the critical angle, it propagates along the interface (grey arrow). If light comes in from the other side at a more oblique angle than the critical angle (black arrow), there is nowhere for it to go in the second medium, and it reflects back by total internal reflection. Diagram by L. Griffing.
constructive interference and become brighter. This is the basis for contrast generation in several types of microscopy (see Sections 16.7–16.11), but in the eye, the beams entering the eye travel through the same media on the way to the retina, so phase differences do not come into play. What does come into play is the bending of light caused by refraction. Changing the wavelength of a ray of light bends it; stretching a part of a spring has the same effect. Consequently, as light travels through media of differing refractive indices, it gets bent. The amount of bending depends on the ratio of the refractive indices of the two media, as described by Snell’s law (Figure 4.13A). An important consequence is that when light hits a refractive index interface at an angle greater than a critical angle, the interface acts like a mirror, producing total internal reflection (black arrows in Figure 4.13B). With total internal reflection, light beams can move long distances in glass fibers because they travel through the fiber at an angle at which the edges of the fiber become mirrors.
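Snell's law and the critical angle can be written as two short functions. A sketch, assuming angles in degrees and refractive indices like those in Table 4.1; the specific values in the usage lines are only examples.

```python
import numpy as np

def refraction_angle(incidence_deg, n1, n2):
    """Snell's law, n1*sin(i) = n2*sin(r): returns the refraction angle in
    degrees, or None when the ray undergoes total internal reflection."""
    s = n1 * np.sin(np.radians(incidence_deg)) / n2
    if s > 1:                     # no real refraction angle exists
        return None
    return float(np.degrees(np.arcsin(s)))

def critical_angle(n1, n2):
    """Critical angle for light passing from a denser medium n1 into a rarer
    medium n2 (n1 > n2); beyond it, the interface acts as a mirror."""
    return float(np.degrees(np.arcsin(n2 / n1)))

print(critical_angle(1.515, 1.000277))          # oil/glass (n = 1.515) to air: ~41 degrees
print(refraction_angle(30, 1.000277, 1.333))    # air into water: bent toward the normal
print(refraction_angle(60, 1.515, 1.000277))    # beyond the critical angle: None
```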
4.5 Movement of Light Through the Eye Depends on the Refractive Index and Thickness of the Lens, the Vitreous Humor, and Other Components

Returning to the path of light through the eye (see Figure 4.1B), light bends most as it enters the eye from air. Inside the eye, light bends most when entering and leaving the lens. The amount of refraction and the curvature of the lens dictate the focus of the lens (see Section 8.2). The difference in the combined thickness and refractive index of the crystalline lens and the vitreous humor (see Figure 4.1) produces the visual axis of the eye. Hence, changing the thickness of the lens or the cornea changes the focus of light on the retina and fovea. Small changes in the curvature of the lens can produce large results, making minor laser surgery on the eye quite a feasible operation. Sometimes there is uneven curvature, thickness, or refractive index of the lens so that the light from one axis (e.g., horizontal) comes to focus in a different region of the eye than light from another axis (e.g., vertical). This is astigmatism, whereby the lens does not bend light equally in all axes of view.
4.6 Neural Feedback in the Brain Dictates Temporal Resolution of the Eye

The brain handles visual perception of color, form, and texture differently from movement. The apparent movement of one or two objects, displaced and sequentially shown to the viewer at different rates, falls into two categories, long-range movements and short-range movements. Magno cells, which reside in the lateral geniculate nucleus (see Figure 4.1A), detect short-range (near-contiguous) movements. Parvo cells detect color, form, and texture.
It is unclear whether a series of short-range movements differs neurophysiologically from continuous movement. Hence, as far as the brain is concerned, for video and movies, a sequence of images that looks continuous is continuous. A more precise biological restatement of “the hand is quicker than the eye” is “the hand is quicker than the lateral geniculate nucleus.” To appear in motion, objects showing small displacements in sequential stills require a playback frequency of 20–30 frames per second (see Section 12.1). Most commercial movies play at 24 frames per second. Television broadcasts occur at 25 frames per second in Europe and 30 frames per second in the United States (see Sections 5.7–5.9). For larger-range motion, the lateral geniculate nucleus is also important, but it is unclear specifically where the signal is processed. If there are bright light flashes in a dark room, the action begins to look jerky. This is the special effect achieved in certain party or dance lighting. However, as the frequency of the light flashes increases, the motion looks continuous. The jerkiness is most severe around 10 flashes per second, a flash rate of 10 Hz (hertz, or cycles per second). This can produce epileptic seizures in some people. The annoying sensation of fluttering light, or flicker, can occur up into the tens of hertz, particularly with bright light. The frequency at which things begin to look continuous is the critical flicker frequency. Knowing the critical flicker frequency for display devices is important because the sensation of flicker is very annoying. As light becomes brighter, the critical flicker frequency increases. Hence, the minimum refresh frequency for most desktop monitors is around 60 Hz.
4.7 We Sense Size and Distribution in Large Spaces Using the Rules of Perspective

Perspective is a sense of distance achieved when viewing a scene. It is different from the sense of distance achieved by the parallax of binocular vision, described in the next section. Hence, perspective uses monocular depth cues. Light coming from the scene converges at the viewpoint, the position of the eye or the camera. In mathematical terms, an image with perspective is a perspective projection. It maps the image to a plane, and all the rays converge to the viewpoint. As the angle of view changes, the perspective projection changes. This is different from other kinds of projections, such as the orthographic projection, in which the rays of light do not converge when mapped to the projection and are parallel to each other (Figure 4.14). As the angle of view changes, the orthographic projection does not change. Because the rays of light converge in perspective, there is a vanishing point, where point objects vanish (i.e., become too small to see). Because light travels in straight lines, all straight lines remain straight (or appear as points when viewed end-on), and the lines in a scene with perspective either remain parallel or intersect at the vanishing point. The contextual size of objects gives the impression of distance. If objects uniformly increase in size, they appear closer. The angle of view of the scene determines the horizon on the image. Low horizons give high angles of view, and high horizons give low angles of view. Planar figures foreshorten in the direction of their angle from the observer. Circular objects appear as ellipses, depending on their angle from the observer. In both perspective and orthographic projections, closer objects occlude farther objects. In addition, the portion of the object closest to a light source is brighter. In views of three-dimensional (3D) worlds using computer graphics (see Section 13.1),
Figure 4.14 The difference between the convergence of the projection rays in perspective versus orthographic projection. In perspective projection, all the rays converge to a single viewpoint, whereas with orthographic projection, the rays do not intersect each other. The front and back clipping planes set the limits of view for three-dimensional visualization programs. Diagram by L. Griffing.
there are clipping planes (see Figure 4.14). The rear clipping plane represents the limit of view behind the object of interest, while the front clipping plane marks the boundary of view in front of the object.
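The difference between the two projections can be expressed directly as coordinate mappings. A minimal sketch, assuming points given as (x, y, z) rows in a camera-centered frame with the viewpoint at the origin; clipping against the front and back planes is omitted.

```python
import numpy as np

def perspective_project(points, focal=1.0):
    """Rays converge at a viewpoint at the origin, so x and y are divided by
    depth z: distant points crowd toward the vanishing point."""
    pts = np.asarray(points, dtype=float)
    return focal * pts[:, :2] / pts[:, 2:3]

def orthographic_project(points):
    """Parallel rays: depth is dropped, so size does not change with distance
    and there is no vanishing point."""
    return np.asarray(points, dtype=float)[:, :2]

# Two identical posts, one twice as far away: perspective halves the far one's
# projected height; the orthographic projection leaves both the same.
near_post = [[1.0, 0.0, 2.0], [1.0, 1.0, 2.0]]
far_post = [[1.0, 0.0, 4.0], [1.0, 1.0, 4.0]]
for project in (perspective_project, orthographic_project):
    print(project.__name__, project(near_post), project(far_post))
```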
4.8 Three-Dimensional Representation Depends on Eye Focus from Different Angles

The resting point of accommodation is the distance at which the eye focuses when there is nothing to look at, about 32 inches (81 cm). Viewing computer monitors or other devices at distances closer than this can produce eyestrain. As people age, this strain gets worse. The near point of accommodation, or the closest distance at which an object can be focused, changes from about 3 inches (8 cm) in 15-year-olds to 10 inches (25 cm) for 40-year-olds to 40 inches (1 m) for 60-year-olds. The resting point of vergence is the distance at which a completely relaxed eye can bring an image to focus on the same point in the retinas of both eyes. This is about 40 inches (1 m). Vergence actually has a greater influence on eyestrain than accommodation because even if people have an accommodation near point of 20 inches (50 cm), they still have less eyestrain when viewing a monitor at 40 inches (1 m) than at 20 inches (50 cm). The relative height of a monitor also makes a difference. The resting point of vergence for a 30-degree upward tilt of the head is about 55 inches (140 cm). For a 30-degree downward tilt, it is about 35 inches (89 cm). People tend to prefer to look down on their computers!

The point of vergence presents a problem when designing 3D displays. 3D displays can project the slightly differing views of an object from each eye, or the parallax generated from the eye angle (Figure 4.15), which is the angular difference between the eyes. Some 3D cameras have two lens systems and camera chips that act essentially like the two eyes, separated by a short distance and with a slightly different angle of view. These cameras are not common in consumer electronics. Many consumer 3D cameras that do exist use just one imaging lens and chip, and the camera moves, producing 3D structure from motion (see Section 13.3). More professional-grade 3D cameras capture scenes primarily for entertainment, virtual tours, and urban design. These can incorporate photogrammetric tools and infrared range finders that provide mm resolution. Photogrammetry is the use of image capture, often in 3D, to measure objects.

Two cameras separated by the eye angle produce images that, when slightly offset from each other, generate a 3D image if the right eye sees primarily the right image and the left eye sees the left image. These are anaglyphs. They achieve this with a red/green or red/blue tint (Figure 4.16), and the observer uses red/green or red/blue tinted glasses. Anaglyphs also use differentially polarized light, with the observer wearing polarized light glasses, or lenticular displays that project the
Figure 4.15 Three-dimensional (3D) objects are focused through accommodation at the point of vergence, which determines the eye angle. 3D projection screens close to the eye can result in a shortened focal distance that causes the 3D object to become blurred. Separated 3D projection screens provide separate images to each eye and are usually attached to headgear. Figure by L. Griffing.
Figure 4.16 Red/blue anaglyph of a pot of the model plant, Arabidopsis, constructed from two photographs taken with a single camera at slightly different angles and tilt. Photos by L. Griffing.

two images at slightly different angles and do not require glasses. Polarized light anaglyphs work quite successfully for 3D movies in theaters in which the image is far away. However, when eyes focus on near display devices, like computer screens, the objects blur because the eye focuses on the screen and not the virtual image (see Figure 4.15). Another more hardware-intensive method is to use a split-screen display, one for each eye (see Figure 4.15). Although this was a popular way to view 3D stills in the early 20th century with devices called stereopticons, 21st century 3D headgear provides a more complete 3D experience, virtual reality (VR), with split-screen technology from stand-alone computers such as cell phones. However, VR is memory and computationally intensive, often requiring linkage to another computer for more complex image rendering. Parallax also generates our stereoscopic acuity, the ability to resolve small distances along the line of sight. As described earlier, the lateral resolution of the eye is about 150 μm or 1 minute of arc. Stereoscopic acuity is quite a bit better than that: about 10 seconds of arc under optimal illumination. Hence, our axial resolution (resolution along the z-axis) is better than our lateral resolution (resolution along the x- and y-axes)! A final, more sophisticated 3D display uses holography, which records an image of an object with coherent, in-phase beams of light, usually from a laser. The recording can be a photosensitive transparency (film negative) or a spatial light modulator (SLM). For electronic and digital devices, it is an SLM. An electronically addressable SLM works by changing the phase of reflected light when on and off (Figure 4.17) by reorienting the optic axis of liquid crystals in the SLM. In an optically addressable SLM, light itself can reorient the liquid crystal. This way, an SLM can act as a light recorder (Figure 4.18A) as well as an electronically addressable display component (Figures 4.18D and E). In holography, the SLM works in a manner similar to the transparency in an overhead projector, whereby a focusing lens generates an image of the transparency on a projection screen (Figure 4.18B). Only in this case, an optically addressed
Figure 4.17 An electronically addressable spatial light modulator. In the off state, the input and reflected output beams are out of phase. In the on state, the input and reflected light are in phase. This takes place because there is a change in the refractive index of the liquid crystal when a charge is applied, and the optic axes of the molecules re-orient. Adapted by L. Griffing from J. Bertolotti, https://commons.wikimedia.org/wiki/File:Liquid_Crystal_based_Spatial_Light_Modulator.gif.
Figure 4.18 Holography. (A) A recording device such as a film negative or an optically addressable spatial light modulator (SLM) records the image of the object using a coherent light source without a lens. (B) By illuminating the SLM with the same coherent light source, a three-dimensional (3D) image, or hologram, comes to focus with a lens. (C) Lensless holography starts with an image formed by the interference of a reference beam and the reflected image of the object being recorded by an SLM. (D) Viewing through the SLM (if the rear mirror is partially transparent) illuminated with the same coherent light source produces a virtual 3D image of the object. (E) Computer-generated holography also uses a coherent light source to illuminate an SLM that can then produce an image of the object with diffracted wavefront optics. Diagram by L. Griffing.
(light sensitive) SLM records the phase change relations of the otherwise coherent (in phase) illuminating beam. Illuminated with the same coherent light, the recorded phase changes of the SLM interfere when focused in holographic 3D space to produce a 3D image, or hologram. For a lensless hologram, a beam splitter (a partially silvered mirror) splits the coherent beam into an object beam and a reference beam. The reflected object and reference beams interfere on the recording SLM (Figure 4.18C). In Figure 4.18D, the same coherent beam illuminates the semi-transparent SLM that captured the interference. When viewed from the opposite side by the observer, a virtual hologram appears. Really quite magical. For computer-generated holography (see Figure 4.18E), a coherent (laser) wave front illuminates the SLM, and wave-front optics project the hologram. These devices exist and may come to the consumer electronics market, depending on the demand. SLM optics are important components in deep-penetration two-photon or multiphoton fluorescence microscopies (see Section 17.12).
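Returning to the simpler anaglyph displays described earlier, combining a stereo pair into a red/cyan anaglyph like Figure 4.16 is just channel reassignment. A sketch assuming two aligned RGB NumPy arrays; the file names in the usage comment are hypothetical.

```python
import numpy as np

def red_cyan_anaglyph(left_rgb, right_rgb):
    """Combine an aligned stereo pair into one red/cyan anaglyph: the left
    view supplies the red channel and the right view supplies green and blue,
    so tinted glasses route each view to the proper eye."""
    anaglyph = np.empty_like(left_rgb)
    anaglyph[..., 0] = left_rgb[..., 0]      # red from the left eye's view
    anaglyph[..., 1:] = right_rgb[..., 1:]   # green and blue from the right eye's view
    return anaglyph

# Usage (hypothetical file names for two shots taken a few degrees apart):
# from skimage import io
# left = io.imread("arabidopsis_left.jpg")
# right = io.imread("arabidopsis_right.jpg")
# io.imsave("arabidopsis_anaglyph.jpg", red_cyan_anaglyph(left, right))
```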
4.9 Binocular Vision Relaxes the Eye and Provides a Three-Dimensional View in Stereomicroscopes

Many imaging systems set up for visual observation are binocular, allowing the use of both eyes. However, many students, when viewing through binocular microscopes, use only one eye. This is usually the result of prior experience in which trying to look through two eyepieces, or oculars, has been difficult because (1) the interpupillary distance of the two oculars was maladjusted and (2) the distance of the eye from the oculars was not correct. Often this is the result of having to look through poorly adjusted or cheap binoculars or microscopes. A quick and easy way to overcome these problems (for both cheap and expensive binocular systems) is to use a tissue or white sheet of paper to find where the light from the ocular focuses to its smallest disk (the Ramsden disk). The distance behind the ocular where this disk of light appears is where your eyes (and pupils) should go (see Section 9.9). Using both eyes helps relax the eyes because having one eye shut increases the chance of tension headaches and decreases the ability to see through the open eye, which is then in a squint.

Using both eyes is absolutely required to take advantage of a stereomicroscope. Stereomicroscopes are set up so that one eye views the object from a slightly different angle than the other, producing the parallax for unaided 3D vision. Stereomicroscopes are not simply low-magnification (2–250×) systems; a good macro lens on a camera can do as much. They provide the depth perception required for manipulating or dissecting objects in 3D space. Hence, they are sometimes called dissecting microscopes. Stereomicroscopes are essentially two compound microscopes, one for each eye. There are two configurations for this lens combination: the Greenough design and the Common Main Objective (CMO) lens system (Figure 4.19). The Greenough stereomicroscope design has two advantages: the small lenses in the nosepiece necessitated by the microscope design are cheap to make, and the compact nosepiece can get into confined spaces. The main disadvantage is that focus is not sharp in the peripheral regions of the specimen. It is only sharp parallel to the axis of tilt for each objective, and that axis differs for the two eyes.
Figure 4.19 Designs of the stereomicroscope. Left is the Greenough design, and right is the Common Main Objective (CMO) design. Coma is comatic aberration (see Section 9.11). Diagram by L. Griffing adapted from Nothnagel, P. et al. https://www.microscopyu.com/techniques/stereomicroscopy/introduction-to-stereomicroscopy.
Figure 4.20 Stereomicroscopy images of the preserved embryos of Molossus rufus, the black mastiff bat. These images formed part of an embryonic staging system for this species. Dr. Dorit Hockman entered the images into the 2012 Nikon Small World Competition. According to Dr. Hockman, “The limbs start off positioned alongside the sides of the embryo, and then eventually get tucked in under the chin as they grow, which is how they are seen in the first image. As the limbs get bigger they move upwards covering the embryo’s face in the ‘peek-a-boo’ type pose seen in the second and third images.” They are in “hear no evil, speak no evil, and see no evil” postures. Pretty cute. Used with permission. Compare with Figure 14.1, ultrasonography of bat fetuses.
The main advantage of the CMO stereomicroscope is that the image is in focus over the entire field of view. It can also be infinity corrected. Infinity correction requires the incorporation of a tube lens in the optical path of the objective to generate an image; the objective alone will not form an image. The advantage of infinity correction is that the magnification does not change even when varying the distance between the camera and the objective (see Section 8.5). This makes it possible to put in other optics (e.g., fluorescence optics) without changing magnification. The CMO stereomicroscope is the most popular stereomicroscope design, producing different ranges of magnification with a simple sliding diopter adjustment on the objective. A design disadvantage is that light coming from the object travels through the edge of the objective, requiring special aberration corrections in the lens. Stereomicroscopes require good illumination. The devices for epi-illumination, or illumination from above the sample, range from focused fiber optics to light sources mounted on booms for use during surgery. Epi-illumination provided from the side gives high surface and texture contrast (see Section 9.10). Bright illumination from the side, when photographed against a black matte-finish plate on a stand, produces a darkfield image (Figure 4.20). Ring illuminators mount in a ring around the objective, providing bright, uniform illumination against a white polished plate. Trans-illumination, or lighting from below the sample, provides spot lighting or diffuse lighting for the sample mounted on a glass plate. Using stereomicroscopes for photography does not generally produce a 3D image; the camera usually receives light from just one pathway to one of the eyes. Nonetheless, it is an excellent tool for intermediate magnifications (see Figure 4.20).
Annotated Images, Video, Web Sites, and References 4.1 The Anatomy of the Eye Limits Its Spatial Resolution Figure 4.1B. The gradient of rods and cones comes from Osterberg, G. 1935. Topography of the layer of Rods and Cones in the Human Retina (dissertation), Levin and Munksgaard, Copenhagen. The anatomy of the eye is from Polyak, S. 1957. The Vertebrate Visual System. University of Chicago Press, Chicago, IL. p. 208. Figure 4.2. The organization of the cell types is from Polyak, S. 1957. The Vertebrate Visual System. University of Chicago Press, Chicago, IL. p. 254. More information on contrast transfer functions of the eye is in Campbell, F.W. and Gubisch, R.W. 1966. The optical quality of the human eye. Journal of Physiology 186: 558–578.
4.2 The Dynamic Range of the Eye Exceeds 10 Orders of Magnitude of Light Intensity, and Intrascene Dynamic Range Is About 3 Orders A more complete discussion of adaptation is in Clayton, R. 1971. Light and Living Matter: A Guide to the Study of Photobiology. Volume 2: The Biological Part. McGraw-Hill, New York, NY. p. 105.
4.3 The Eye Determines Its Wavelength Sensitivity Helmholtz, H. 1866. Handbuch der Physiologischen Optik. Voss, Leipzig, Germany. Young, T. 1802. On the theory of light and colours. Philosophical Transactions 92: 12–48. The genetics of the iodopsin gene sequence differences that make the different cone photopigments, and the genetic differences that give rise to color blindness, are discussed at length in Neitz, J. and Neitz, M. 2011. The genetics of normal and defective color vision. Vision Research 51: 633–651. Figure 4.8 uses some of the information in Figure 3 of that article. It is also adapted from Dartnall, H., Bowmaker, J., and Mollon, J. 1983. Human visual pigments: microspectrophotometric results from the eyes of seven persons. Proceedings of the Royal Society B: Biological Sciences 220: 115–130. Figure 4.10 is a replot of color gamuts using the 1976 CIE plot by Adoniscik from https://en.wikipedia.org/wiki/Chromaticity#/media/File:CIE_1976_UCS.png.
4.4 Refraction and Reflection Determine the Optical Properties of Materials A basic coverage of physics of light as waves and optical path differences generated by different refractive indices is in Slayter, E. M. 1970. Optical Methods in Biology. Robert E. Krieger Publishing Company, Huntington, NY. A tutorial on wavelength and color is here: Parry-Hill, M., Sutter, R.T., and Davidson, M.W. https://www.olympuslifescience.com/en/microscope-resource/primer/java/wavebasics. The text by Born, M. and Wolf, E., 1999. Principles of Optics. Seventh Edition. Cambridge University Press, New York, NY, covers reflection and refraction on pp. 38–39 and discusses the laws of refraction and reflection on pp. 132–141.
4.5 Movement of Light Through the Eye Depends on the Refractive Index and Thickness of the Lens, the Vitreous Humor, and Other Components See Campbell, F.W. and Gubisch, R.W. 1966. The optical quality of the human eye. Journal of Physiology 186: 558–578.
4.6 Neural Feedback in the Brain Dictates Temporal Resolution of the Eye The designation of the roles of parvo and magno cells is in Livingstone, M. and Hubel, D. 1988. Segregation of form, color, movement, and depth: anatomy, physiology, and perception. Science 240: 740–749. Further description of light intensities as they relate to critical flicker frequency is in Inoue, S. 1986. Video Microscopy. Plenum Press, New York, NY. pp. 71–92. A treatment debunking the older concepts of persistence of vision and the phi effect, which many still use to describe the response of the eye and brain to motion, can be found in Anderson, J. and Anderson, B. 1993. The myth of persistence of vision revisited. Journal of Film and Video 45: 3–12.
4.7 We Sense Size and Distribution in Large Spaces Using the Rules of Perspective A treatment of computer graphics for 3D is in Schroder, W., Martin, K., and Lorensen, B. 2002. The Visualization Toolkit. Third Edition. Kitware Inc. pp. 44 and 234–237.
4.8 Three-Dimensional Representation Depends on Eye Focus from Different Angles More information on resting point of vergence and accommodation is in Anshel, J. 2005. Visual Ergonomics Handbook. CRC Press. Taylor and Francis Group, Boca Raton, FL, and Stroebel, L., and Zakia, R.D. 1993. The Focal Encyclopedia of Photography. Third Edition. Butterworth-Heinemann, Woburn, MA.
A comparison of the different forms of computer graphic displays, including holography, is in Reichelt, S., Häussler, R., Fütterer, G., and Leister, N. 2010. Depth cues in human visual perception and their realization in 3D displays. Proceedings of SPIE, Three-Dimensional Imaging, Visualization, and Display, 7690, 76900B. doi:10.1117/12.850094. The computer hardware required for holography is described in Wang, Y., Dong, D., Christopher, P.J., et al. 2020. Hardware implementations of computer-generated holography: a review. Optical Engineering 59: 102413. doi:10.1117/1.OE.59.10.102413 and Slinger, C., Cameron, C., and Stanley, M. 2005. Computer-generated holography as a generic display technology. Computer 38: 46–53. doi:10.1109/MC.2004.260
4.9 Binocular Vision Relaxes the Eye and Provides a Three-Dimensional View in Stereomicroscopes A comparison of prices from 2005 and the advantages of infinity-corrected optics on CMO stereomicroscopes is in Blackman, S. 2005. Stereo microscopes: still changing after all these years. The Scientist. https://www.the-scientist.com/technology/stereo-microscopes-still-changing-after-all-these-years-48496. Nothnagel, P., Chambers, W., and Davidson, M.W. Introduction to stereomicroscopy. https://www.microscopyu.com/techniques/stereomicroscopy/introduction-to-stereomicroscopy.
5 Image Capture with Digital Cameras

5.1 Digital Cameras are Everywhere

The digital camera is an everyday accessory; one or more are typically on our smartphones; they are on our cars, in our buildings and streets; and they look down from the skies. They all turn photons into electrons. As they get smaller, we invent new technologies to improve their ability to detect and store electrons (e.g., backside illumination and improved quantum efficiencies). With miniaturization, the lenses and electrical read-out of cameras, which affect their light sensitivity, focus, and resolution, must change as well. This chapter provides an in-depth analysis of key components of the digital camera, focusing on those that most affect the images. Chapter 9 treats how to control tone and contrast with the lighting and lens settings.
5.2 Light Interacts with Silicon Chips to Produce Electrons

Silicon can convert light to electrons. This has many applications. Solar panels are silicon sheets or thin layers of silicon optimized to produce electrons in response to light. The efficiency of light conversion to electrons, the quantum efficiency (Panel 5.1), is high in solar panels. They can produce electrons from low- (10 picowatts, 10⁻¹¹ W) to high- (1 milliwatt, 10⁻³ W) intensity light sources. The electrons leave the solar cells, producing an electrical current that can charge batteries or power devices.

How does silicon produce the electrical current? Silicon is a light-sensitive semiconductor. In the absence of light, it is an insulator, while in the presence of light, it is a conductor. However, the answer involves more than just silicon. Different atoms introduced into the silicon lattice change the electrical conductance of silicon. Figure 5.1 shows a layered silicon microchip. Silicon has four electrons in its outer shell (+4). The silicon in the bottom layer bonds with atoms having only three electrons in their outer shells (e.g., boron or gallium). The bond produces electron “holes.” This is the p-type semiconductor because it has a positive charge created by the electron holes. In the top layer, silicon bonds with atoms having five electrons in their outer shells (e.g., phosphorus or arsenic). This is the n-type semiconductor because the extra electrons produce a negative charge. Adding these extra elements to the silicon chip is doping the silicon.

The p-n junction semiconductor is a diode because current only flows in one direction. When the negative pole of a battery in the circuit connects to the n-layer and its positive pole to the p-layer, the forward bias arrangement, current flows through the p-n junction as the negative pole of the battery repels electrons in the n-layer, which move to the p-layer and to the positive pole of the battery that attracts them. Reversing the polarity of the battery makes the reverse bias arrangement: the positive pole connects to the n-layer and the negative pole to the p-layer. No current flows across the p-n junction because all the n-layer electrons go to the nearby positive pole and the holes go to the negative pole. However, shining a light on the p-n junction produces a current across it (see Figure 5.1). Light breaks silicon bonds in the p-layer, thereby creating electron holes and producing electrons, photoelectrons, which move through the n-layer to the positive pole of the battery. Electrons move in one direction when illuminated by light; hence, it is a photodiode.

Photodiodes are useful electronic exposure meters or light meters for cameras. Most cameras have built-in exposure meters that measure light reflected from a surface. These are reflectance meters.
Panel 5.1 The relationship between quantum efficiency, wavelength, and signal strength.

Quantum efficiency (QE) = 100 × (electrons out / photons in)   (5.1)

in which electrons out = amperes = coulombs/second = 6.2 × 10¹⁸ electrons/second, and photons in = watts = joules/second.   (5.2)

Wavelength, photons, and energy: The number of photons per joule varies with wavelength,

E = (hc/λ) × Avogadro's number   (5.3)

At shorter wavelengths, there are more kilojoules (kJ) per mole of photons (380 nm: 315 kJ/mol photons; Figure 4.6) than at longer wavelengths (680 nm: 176 kJ/mol photons; Figure 4.6).

Signal strength:

S = I × QE × T   (5.4)

The total signal, S, in electrons generated in a pixel is the product of the intensity, I (in photons/second), the quantum efficiency, QE = 100 × (electrons out / photons in), and the integration time, T (in seconds).
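The relationships in Panel 5.1 can be checked with a few lines of arithmetic. The following Python sketch (not from the text; the 1-femtowatt, 550-nm, 40% QE, 100-ms values are illustrative assumptions) converts incident power to photons per second and then to signal electrons using Equations 5.1–5.4.

H = 6.626e-34   # Planck's constant, J s
C = 2.998e8     # speed of light, m/s
N_A = 6.022e23  # Avogadro's number

def kj_per_mol_photons(wavelength_nm):
    # Equation 5.3: energy of a mole of photons at this wavelength, in kJ/mol
    return H * C / (wavelength_nm * 1e-9) * N_A / 1000.0

def signal_electrons(power_watts, wavelength_nm, qe_percent, exposure_s):
    # Equations 5.1 and 5.4: power -> photons per second -> electrons over the exposure
    photon_energy_j = H * C / (wavelength_nm * 1e-9)
    photons_per_s = power_watts / photon_energy_j
    return photons_per_s * (qe_percent / 100.0) * exposure_s

print(round(kj_per_mol_photons(380)))                 # ~315 kJ/mol, as in the panel
print(round(kj_per_mol_photons(680)))                 # ~176 kJ/mol
print(round(signal_electrons(1e-15, 550, 40, 0.1)))   # ~110 electrons from 1 fW of 550-nm light in 100 ms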
Figure 5.1 Circuit dynamics in silicon p-n semiconductors. The tetravalent silicon has trivalent elements in the p layer (positive) and pentavalent elements in the n layer (negative). Adding extra elements is “doping” the silicon. In a circuit with a forward bias battery, the “extra” electron in the n layer goes to the p layer and then to the + pole of the battery. Current flows in one direction. It acts as a diode. In the reverse-bias battery configuration, no current flows from the p- to the n-layer unless the p-layer absorbs light, producing a photoelectron. Current flows from the p-type semiconductor to the n-type semiconductor as long as the light is on. In this configuration, it is a photodiode. Diagram by L. Griffing.
A very precise form of reflectance metering is spot metering, which monitors reflectance from small regions of the scene. This type of metering is important when taking pictures of backlit dark objects surrounded by light (e.g., a person standing against a sunrise) or bright objects surrounded by darkness (e.g., fluorescent cells). To check how much light illuminates a studio scene, a meter inside the scene monitors the amount of light. These incident light meters monitor side lighting by lamps or flash from the camera to help determine the exposure time. Even with a light meter, bracketing the exposure time by taking longer and shorter exposures on either side of the light meter reading often achieves the best contrast and tone control (see Section 9.5).
5.3 The Anatomy of the Camera Chip Limits Its Spatial Resolution

The design of a camera chip resembles the distribution of square pixels on a digital image. Each of the imaging elements is a pixel. The array of pixels stores the photoelectrons that accumulate during the exposure time to light. The read-out of the electrons occurs in an order that accurately reconstructs the final image, with different read-outs for different kinds of chips (Figure 5.2). In cameras with charge-coupled device (CCD) chips, each pixel's electrons have to transfer to an adjacent pixel during the read-out. Other cameras use complementary metal oxide semiconductor (CMOS) chips, in which the read-out is independent for each pixel; in other words, the pixels are independently addressable by their attached electronics. Table 5.1 compares the imaging chips used in consumer-grade cameras (light blue rows) and scientific-grade cameras (dark blue, red, and green rows). All of the consumer-grade cameras use CMOS chips. CMOS chips are faster, acquiring images at video rates (20–30 frames per second; see Section 12.1) and beyond (>100 frames per second). The historical disadvantage of CMOS chips was that the sampling and switching necessary to read each pixel independently produced fixed-pattern noise. New chips overcome this problem, and CMOS chips are becoming more common in scientific
Figure 5.2 Charge-coupled device (CCD) versus complementary metal oxide semiconductor (CMOS) design of camera chips. The arrows represent the read-out sequence, with read-out by the CCD through charge coupling of the individual pixels and read-out of the CMOS by independently addressable (active pixel) separate electronics. One way these chips can detect color is to use a mosaic color filter with a Bayer pattern. Diagram by L. Griffing, adapted from T. Ang. 2002. Handbook of Digital Photography. Dorling-Kindersley, London. p. 19.
Table 5.1 Comparison of Consumer-Grade (Light Blue) and Scientific-Grade Complementary Metal Oxide Semiconductor Cameras (Dark Blue), Including Backside-Illuminated Smartphone Sensors, Scientific-Grade Interline Charge-Coupled Device Cameras (Red), and Scientific-Grade Electron Multiplication Charge-Coupled Devices (EMCCDs) (Green).a
Camera | Pixel Size (μm) | Pixel Dimensions (W × H) | Chip Dimensions (mm) | Full Well Capacity at Saturation (Electrons); Bit Depth | DR; Color and Intensity; RN or TN
Nikon Z7 II | 4.35 | 8288 × 5520 | 35.9 × 23.9 | 57,640; 14 bit | 19,875; 26-bit RGB, 14.7 EVs; 2.9 e− TN
Canon EOS R5 | 4.39 | 8192 × 5464 | 36 × 24 | 43,768; 14 bit | 13,677; 25.3-bit RGB, 14.6 EVs; 3.2 e− TN
Samsung Galaxy S22 Ultra (ISOCELL line) | 0.8 | 12,000 × 9,000 | 12.03 × 12.03 | 6,000, 12,000 (binned nonacell); 10 bit | Variable with auto ISO; 2.7 e− RN
Pixel 6 (Samsung ISOCELL) | 1.2 | 8669 × 5768 | 12.21 × 12.21 | ~8,000?; 10 bit | Variable with auto ISO; 2 e−? RN
ORCA-Lightning | 5.5 | 4608 × 2592 | 25.34 × 14.26 | 38,000 (binned) or 17,000 (standard) | 1000 (high) or 650 (standard); 2.0–2.7 e−
Prime 95B | 11 | 1200 × 1200 (size varies) | 11 × 11 | 80,000 | 61,500; 1.6 e− RN (41 fps)
Zyla 5.5 | 6.5 | 2560 × 2160 | 16.6 × 14 | 30,000 | 33,000; 0.9–2.4 e−
HiCAM Fluo | 6.6 | 1280 × 1024 | 12.78 × 12.68 | N/A | 53.7 dB (DR)
Retiga-E7 | 4.54 | 3200 × 2200 | 14.4 × 9.9 | 19,000 | 2000; 2.2 e− RN
C11090-22B | 13 | 1024 × 1024 | 13.3 × 13.3 | 80,000 | 13,333; 6 e−
ImagEM X2-1K | 13 | 1024 × 1024 | 13.3 × 13.3 | 50,000 (normal), 400,000 (EMCCD, 8× amp.) | Varies with read-out speed; 10 e− (normal)
a Full well capacity is at base ISO. ?, estimate, specs unavailable; bit depth, bit depth per channel; dB, 20 log(SAT/Ncamera); DR, dynamic range at base ISO; EV, exposure value (defined in Section 9.5); H, height; N/A, not available; RGB, red, green, blue; RN, read noise; TN, total noise; W, width.
cameras. One advantage of CCD chips is they are easier and cheaper to make. A CCD read-out has less sampling noise and requires less processing. However, the frame rate of most high-sensitivity scientific-grade CCDs is slow (5–10 frames per second). Both types of chips use metal oxide semiconductors (MOSs) that produce photoelectrons (Figure 5.3). Unlike a photodiode, the MOS pixel stores the photoelectrons until it reads out. It has a capacitance to store electrons as electron–hole pairs in a potential well. In the most typical color chip, each pixel contains three potential wells, one for each primary color, red, green, and blue. These three-phase chips (Figure 5.4) insulate the electrons stored within each pixel from those in the surrounding pixels. The barriers that insulate the pixels are channel stops, regions of boron-infused silicon separating the pixels (see Figure 5.4A). Selectively opening the potential wells, or gating, moves electrons from one side of the pixel to the other. Applying a voltage to the polysilicon gate achieves gating. The physical size of the chip, and its potential wells, determines the maximum number of stored electrons, the full well capacity (FWC) or saturation (SAT) level of the pixel.
Figure 5.3 Diagram of a pixel in a charge-coupled device. Either electrons or electron holes can be stored in the potential well depending on the design. Diagram by L. Griffing.
A rule of thumb is that the limitation on SAT is about 1000 times the pixel area in square micrometers, so for a 20- × 20-μm pixel, the 400-μm² area gives a saturation level (SAT) of about 400,000 photoelectrons. Larger pixels with higher FWC have the capacity to represent more gray levels accurately (i.e., they can have greater pixel depth). The resolving power of a camera is half of its pixel spatial frequency (see Section 1.5). It is pixel size that matters, not the number of megapixels on the camera chip. As with the eye (see Section 4.1), cameras can theoretically resolve two points of light detected by two illuminated pixels separated by an unilluminated pixel. Using a typical 6-µm pixel from Table 5.1, the center-to-center spacing of two illuminated pixels separated by an unilluminated one is 12 µm. Likewise, the Nyquist criterion imposes the condition that the recording medium should have twice the spatial frequency of the image, making 12 µm the theoretical limit of resolution of that camera chip (see Table 5.7). Consequently, there is a trade-off between pixel size and pixel depth. Smaller pixels produce higher resolving power while collecting fewer electrons before reaching their SAT: better resolution, lower pixel depth. Pixel size and the size of the imaging chip determine the number of megapixels in a camera chip. In the examples in Table 5.1, all of the consumer-grade cameras have higher megapixel values (multiply the horizontal pixel number by the vertical pixel number) than all of the scientific-grade cameras. However, the pixel size in the scientific-grade cameras is about the same as in consumer-grade digital single-lens reflex cameras (DSLRs) (the first two blue rows), except for electron multiplication CCDs (EMCCDs; green row), which have large and very deep pixels with high FWC. Smartphones and point-and-shoot consumer-grade cameras have small chips and small, shallow pixels with low FWC.
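As a quick illustration of these two rules of thumb (the ~1000 e− per square micrometer saturation estimate and the Nyquist-limited resolution of two pixel widths), here is a short Python sketch; the 20-µm and 6-µm pixel sizes are the examples used in the text, not specifications of any particular camera.

def saturation_estimate(pixel_width_um, pixel_height_um):
    # Rule of thumb from the text: roughly 1000 electrons per square micrometer of pixel area
    return 1000 * pixel_width_um * pixel_height_um

def resolution_limit_um(pixel_size_um):
    # Nyquist criterion: two pixels per resolved spacing, so the limit is twice the pixel pitch
    return 2 * pixel_size_um

print(saturation_estimate(20, 20))   # ~400,000 e- full well capacity
print(resolution_limit_um(6))        # 12 um for a typical 6-um scientific-camera pixel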
5.4 Camera Chips Convert Spatial Frequencies to Temporal Frequencies with a Series of Horizontal and Vertical Clocks

Just as applying a potential to the polysilicon gate in a three-phase pixel moves the electrons from one potential well to the next within a pixel, applying a voltage to the gate on an adjacent pixel phase moves electrons from one pixel to the next. With gating, the potential well in the adjacent pixel collects electronic charge from the first pixel, thereby transferring the charge. This is similar to a bucket brigade, in which a line of people transfers buckets (potential wells in pixels) with water (electrons) from one brigade member to the next. To see how these transfers are coordinated, consider the full-frame architecture CCD (Figure 5.5). This chip has a large two-dimensional (2D) imaging area, the parallel register, and a single one-dimensional row, the serial register (see Figure 5.5). For the example in Figure 5.6, the parallel register is a small 5- × 5-pixel array, but pixel arrays in reality can be a thousand-fold larger in each dimension (see Table 5.1; up to 8K standard, see Table 2.2). During exposure to light, a “latent” electronic image forms through the accumulation of charge in each potential well in the parallel register (see Figure 5.5). The length of time taken to expose the chip is the exposure time. During the exposure, charge accumulation is charge integration. The voltage read-out of the CCD is a set of timed voltage changes whereby the integrated charge of each pixel produces a proportional voltage change during a controlled time interval. The whole idea is to generate a timed frequency of voltage pulses that represents the spatial frequency of integrated charge in the pixels of the chip (see Figure 5.6A). A display device (Figures 5.6F and 5.7) interprets this timed signal and displays large voltage changes as bright pixels and small voltage changes as dim pixels. The beginning of the CCD read-out is a signal for the display device that a new frame is starting, a vertical V-blanking pulse, which occurs during the vertical retrace for broadcast video (see Figure 5.7). Blanking
Figure 5.4 Movement of charge through a three-phase charge-coupled device chip using gating. (A) A gate potential at phase 1 gathers the photoelectrons to that well. (B) Both phase 2 and phase 1 gates have an intermediate potential, and electrons move from one well to the next. (C) The potential in phase 1 sets back to zero, and the photoelectrons move to the next well. This gating can read out the color information from three phase chips. Diagram by L. Griffing.
Figure 5.5 Full-frame architecture charge-coupled device. The unmasked parallel register provides the two-dimensional image area. This area can record very long exposures and is useful for very faint signals for on-chip integration. The masked serial array sequentially reads out the rows following exposure of the chip. Diagram by L. Griffing.
Figure 5.6 Charge-coupled device read-out from a full-frame architecture camera. (A) Collection of light in the parallel register. Each dot represents a photoelectron that resulted from exposure of the 5- × 5-pixel array to light. (B) The read-out sequence for the parallel chip starts with a blanking pulse to register the beginning of a new frame (first blanking pulse in E and F). A vertical shift timed by the parallel clock shifts the top row of the parallel register into the serial register. (C) The serial register makes a horizontal shift using the serial clock. The output amplifier amplifies the electrons stored in the first pixel, producing an analog signal. In this case, there are no photoelectrons. Read-out “a” produces no voltage change. (D) The horizontal shift repeats at an interval determined by the serial clock. Read-out “b” produces a voltage change following amplification. (E) The electronic timed signal resulting from the read-out of all five rows of the chip. The spatial frequencies of the chip become a temporal frequency in the electronic signal. Note the read-out “a” and “b” values in the first signal after the vertical blanking (V-blanking) pulse. (F) The final image on a device displays the electronic signal in (E), converting it back to a spatial frequency and showing a gray level intensity proportional to the voltage in the signal. Diagram by L. Griffing.
Figure 5.7 Raster scan convention for analog video. Solid red horizontal raster scan lines make up the image. Dotted lines show the retrace pathways. Diagram by L. Griffing.
pulses are voltage dips that do not show up on the display. V-blanking pulses occur between frames, while horizontal H-blanking pulses occur between the raster lines that form the image (see Figure 5.7). After the V-blanking pulse, the parallel clocks associated with the parallel register open the polysilicon gates on the “top” of each pixel, producing a vertical shift in charge, filling the serial register with the charge from the top row of the parallel register and emptying the bottom row of the parallel register. Then the horizontal row is read out at precise intervals dictated by the serial clock in the serial register, shifting each serial column horizontally (see Figure 5.6C and D), producing read-outs a and b, respectively (see Figure 5.6E). In this 5- × 5-pixel array, the serial register completely reads out with five horizontal shifts followed by an H-blanking pulse. Then the parallel clock shifts the rows vertically again, filling the serial register, which then reads out with another five horizontal shifts. This repeats until the serial register reads out the last row of the parallel register, followed by a V-blanking pulse, which starts the new frame. Most electronic imaging devices convert spatial frequencies to time frequencies in a similar way. The image display device, here for simplicity made up of 5 × 5 pixels, reads these electronic pulses and reassembles them into an image using the blanking pulses and the short synchronization (sync) pulses embedded within the blanking pulses. The intensity of each pixel in the display is proportional to the voltage in the signal. The clocking speed of the parallel and serial clocks sets the limit on how fast a chip reads out. However, because all of the pixels in the chip are involved in the read-out from a full-frame architecture CCD, a shutter covers them during this process. If the shutter remains open, incoming light interferes with the read-out. In slow-scan CCDs, the time for read-out of each horizontal raster line is slower than the standard video read-out time of a raster line (63.5 microseconds in Japan and the United States; see Section 5.7). Slow-scan CCDs find use in low-light, high-resolution imaging. A shutter is also important for taking quick exposures of bright scenes. However, read-out time extends the time between short exposures, making high-speed imaging impossible. The lag of the camera is the time required for a camera to respond to changes in light intensity. The limiting factor is camera read-out speed. Lag is an important consideration when examining quickly moving objects. CCDs can capture a single instance of the moving object, but recording the movement continuously requires more frames per second, or a higher frame rate. Independently addressable pixels in CMOS chips do not use the other pixels for read-out, so they are capable of high-speed imaging using a rolling shutter. However, there are CCD architectures that are capable of faster read-out.
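The conversion of spatial frequency to temporal frequency by a full-frame CCD can be mimicked in software. The sketch below (an illustration, not a model of any specific chip) steps a 5 × 5 parallel register through the vertical and horizontal shifts of Figure 5.6, producing a single timed stream of values that a display could reassemble into the image; blanking and sync pulses are omitted for brevity.

import numpy as np

def full_frame_readout(parallel_register):
    # Read out a full-frame CCD: vertical shifts fill the serial register one row
    # at a time; horizontal shifts then clock each pixel through the output amplifier.
    rows, cols = parallel_register.shape
    register = parallel_register.copy()
    signal = []
    for _ in range(rows):
        serial_register = register[0, :].copy()     # parallel (vertical) shift of the top row
        register = np.roll(register, -1, axis=0)
        register[-1, :] = 0                          # the emptied row of the parallel register
        for _ in range(cols):                        # serial (horizontal) shifts, one per clock tick
            signal.append(serial_register[0])
            serial_register = np.roll(serial_register, -1)
            serial_register[-1] = 0
        # an H-blanking pulse would separate raster lines here
    return np.array(signal)

latent = np.random.poisson(lam=20, size=(5, 5))      # integrated charge, as in Figure 5.6A
video_signal = full_frame_readout(latent)            # spatial frequency becomes temporal frequency
reassembled = video_signal.reshape(latent.shape)     # what a display device reconstructs
assert np.array_equal(reassembled, latent)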
5.5 Different Charge-Coupled Device Architectures Have Different Read-out Mechanisms

An alternative slow-scan architecture, the frame-transfer architecture (Figure 5.8), somewhat overcomes this time limitation by providing a separate, masked parallel register to which an original exposure transfers quickly, followed by a new exposure in the unmasked parallel register. The shift to the masked parallel array typically takes about 1 ms. A shutter does not cover the unmasked array during transfer, so light falling on the unmasked array during this interval reduces image quality. The most common CCD design uses interline-transfer architecture (Figure 5.9), in which each horizontal row has alternating masked and unmasked pixels. The charge accumulated in the unmasked pixel rapidly transfers to the adjacent
Figure 5.8 Frame-transfer architecture charge-coupled device. This arrangement is good for two very quickly acquired images. The shift to the storage array can occur very quickly, but then the read-out occurs in the same way as in full-frame architecture. Diagram by L. Griffing.
Figure 5.9 Interline-transfer architecture charge-coupled device. This arrangement reads out the parallel array to a masked parallel array that then clocks out to the serial array quickly. These cameras provide better real-time imaging than either the full-frame or the frame-transfer architectures but have lower quantum efficiency. Diagram by L. Griffing.
Figure 5.10 A microlens captures the light in an interline-transfer charge-coupled device that would otherwise fall on the interline transfer mask. The lens focuses the light onto the region of the pixel called the photosite. An exposure gate transfers the photoelectrons to the interline transfer potential well that then reads out into the serial array. Diagram by L. Griffing.
masked pixel. The masked pixels transfer their charge to the serial register through vertical shifts that can occur while the unmasked pixel is exposed. The downside of this architecture is that very little (about 23%) of the pixel is available to act as a photosite for imaging. However, a slightly different diode architecture covered with a microlens (Figure 5.10) can focus 70%–80% of the light of the pixel onto the photosite region. The high degree of curvature of this lens optimizes object-side aperture (see Section 8.1). The CMOS chip has the fastest and most flexible form of read-out. The pixels are independently addressable, making the read-out of any one pixel independent of the others. Instead of reading out the pixels in a lock-step, the same ordered output of the pixels occurs much more quickly, producing frame rates of greater than 100 frames per second.
5.6 The Digital Camera Image Starts Out as an Analog Signal that Becomes Digital

The signal coming from an electronic photodetector, such as a CCD, is analog and must be made digital (Figure 5.11). Digitization makes a continuous function discrete (i.e., a set of individual, quantized values). The mathematics of discrete functions is the theoretical framework for digital processing. From an intuitive point of view, if there is a continuously changing event, many small discrete values represent it more accurately than a few large discrete values. With poor digitization, the signal, although continuously varying with large peaks and troughs, becomes degraded if the discrete values that represent it do not show the peaks and troughs. This happens if the time of sampling is too infrequent. A timed sampling frequency that is too slow produces aliasing, just as too low a spatial sampling frequency produces aliasing (see Section 1.5). The ideal square waves of voltage change in Figures 5.6 and 5.11 are in reality curves (Figure 5.12). Increasing the temporal frequency of digitization makes the curves “more square” by enabling faster rise times. However, if the voltage change is too fast, it overshoots, as in Figure 5.12, and ringing results, producing contoured edges (see Section 3.7) in the image.

Analog-to-digital converter (ADC) chips on the same printed circuit board as the imaging chip digitize the signal and provide direct read-out of digital frame memory (the frame buffer) (see Figure 5.11). To generate the digital frame, the ADC quantizes the analog signal; that is, it converts continuous voltage changes to discrete values expressed in digital bits. The ADCs supply digital video frames directly using high-speed connections called buses. A 400- to 600-MHz bus provides a high-resolution output signal, which plays at video rates from an incoming camera signal. High-speed connections include the universal serial bus (USB). For display on standard video displays that use analog signals, another chip, the digital-to-analog converter, converts the signal back to an analog signal (see Figure 5.11).
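A minimal sketch of what the ADC does, and of why too low a sampling frequency causes the aliasing described above (the 5-cycle test signal and 8-bit depth are illustrative choices, not camera specifications):

import numpy as np

def digitize(analog, v_min=0.0, v_max=1.0, bits=8):
    # Quantize continuous voltages into 2**bits discrete levels, as an ADC does
    levels = 2 ** bits
    clipped = np.clip(analog, v_min, v_max)
    return np.round((clipped - v_min) / (v_max - v_min) * (levels - 1)).astype(np.uint16)

t = np.linspace(0, 1, 10_000)                        # a "continuous" 1-second voltage trace
analog = 0.5 + 0.4 * np.sin(2 * np.pi * 5 * t)       # five cycles of signal

well_sampled  = digitize(analog[::10])    # 1000 samples: peaks and troughs are preserved
under_sampled = digitize(analog[::1250])  # 8 samples: below the Nyquist rate, the 5-cycle signal aliases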
Figure 5.11 There are frequently two digital converters in a modern camera. One, the analog-to-digital converter, provides the ability to store the image in frame memory. The digital-to-analog converter reconverts the digital image to a broadcast electronic signal. Computer monitors can read the digital frame information if it is saved as a digital file. Video monitors display the broadcast analog signal, converting the image back to a spatial frequency. Most video display cards and monitors have the electronics to display both digital and analog signals. CCD, charge-coupled device. Diagram by L. Griffing.
Figure 5.12 The frequency of transmission or digitization determines the rise time. The ideal is a square wave (black). Long rise times (green) produce blurry edges. Short rise times (blue) produce sharper edges. Very short rise times (red) can generate ringing, where bright and dark bands outline the edges of the objects. Diagram by L. Griffing.
It is important to include the transmission of color in addition to just gray-scale intensity. If the signal transmits color, it can do so using several different color standards. Component video transmits the red, green, blue, and sync signals separately. Composite video transmits the color within a single signal stream. S-video transmits two components, a luminance (Y) and a chrominance (C) signal. Most digital video signals transmit composite, progressive (non-interlaced) video signals in which all of the raster lines of the video signal are in one field.
5.7 Video Broadcast Uses Legacy Frequency Standards

For American and Japanese (NTSC) video, the latent image on the video camera sensor is read out every 1/60 of a second, or every 16.7 ms, based on the 60-Hz AC cycles in the power lines in these countries. Other standards use 50-Hz AC cycles (Table 5.2).
Table 5.2 Comparison of World Video Standards.

                | NTSC   | PAL    | SECAM  | HDTV
Field rate (Hz) | 60     | 50     | 50     | —
Lines/frame     | 525    | 625    | 625    | 720, 1080
Lines/second    | 15,750 | 15,625 | 15,625 | —
Aspect ratio    | 4:3    | 4:3    | 4:3    | 16:9
NTSC, America and Japan; PAL, Europe and South America; SECAM, Russia and France.
Panel 5.2 The bandwidth of NTSC systems.

The NTSC standard is interlaced; in other words, there are two fields of 262.5 adjacent horizontal scans at 60 Hz, or 60 times per second. The horizontal (H) scan rate is 262.5 × 60 = 15,750 Hz. The H scan interval is therefore 1/15,750 second, or 63.5 µs. Subtracting the H-blanking interval, 11.4 µs, the total time to get the information from a single raster scan is reduced to 52.1 µs. The aspect ratio of the active video frame is 4:3 by convention, so the total time for equal information in the x and y dimensions is 3/4 of 52.1 µs, or about 39 µs. The horizontal bandwidth for NTSC is then 262.5 cycles (sampling using the Nyquist criterion) every ~40 µs, or 6.6 MHz. The vertical bandwidth is limited by the discrete nature of the interlaced scan lines, doubling the time for the vertical scan (two sets of 40-microsecond lines) and adding the time it takes for the vertical retrace. It is less than half the bandwidth of the horizontal scan.
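The arithmetic of Panel 5.2 can be reproduced directly; the short Python calculation below follows the panel's steps, rounding only at the end, so the final figure comes out near the panel's 6.6 MHz.

fields_per_second = 60
lines_per_field = 262.5

h_scan_rate_hz = lines_per_field * fields_per_second   # 15,750 Hz
h_interval_us = 1e6 / h_scan_rate_hz                    # ~63.5 us per raster line
active_line_us = h_interval_us - 11.4                   # subtract the H-blanking interval: ~52.1 us
equal_info_us = active_line_us * 3 / 4                  # 4:3 aspect ratio: ~39 us
bandwidth_mhz = lines_per_field / equal_info_us         # Nyquist-sampled cycles per microsecond = MHz

print(round(bandwidth_mhz, 1))                          # ~6.7 MHz horizontal bandwidth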
For the NTSC standard, the limiting resolution for video systems is how many cycles of spatial frequency can be put into a horizontal raster lasting about 40 µs (Panel 5.2). This is the horizontal bandwidth of the system. For NTSC systems, it is about 6.6 MHz (see Panel 5.2). The bandwidth determines the resolving power of the system. The higher the temporal frequency of acquisition and transmission, the higher the spatial frequency of objects faithfully represented in the image.
5.8 Codecs Code and Decode Digital Video

There are many advantages of digital video over analog video. Digital video copies have complete fidelity. They are more resistant to signal degradation by attenuation, distortion, and noise. Signal delay and special effects are easier. The delivery bitstream can be scalable, providing support for multiple spatial and temporal resolutions. The multiple formats are fairly easily interconvertible. To convert analog video to digital video in the absence of an ADC chip, converters first eliminate high-frequency components that are too high to be digitized and would cause aliasing; then the converters sample the signal at a rate that exceeds the Nyquist limit. For a 525-line NTSC signal, this rate is 13.5–14.2 MHz (exceeding 2 × 6.6 MHz), which translates to 162 Mbps (megabits per second) for standard-definition digital television (Rec. 601). Note that in digital video, bandwidth is in Mbps instead of MHz (Table 5.3). With digitization of the analog signal, the format quantizes it, making, for example, 8-bit signals with 256 levels. Finally, codecs, short for coder and decoder, encode the video. Codecs are software that implement standards issued by international organizations (Table 5.4). They not only transmit video, but they also do it efficiently with compression based on interframe, or temporal, redundancy. In other words, if pixels with the same address have the same values on sequential frames, a code describes that redundancy rather than the pixels themselves. Predictive coding techniques predict whether the next frame is going to be identical to the current frame; if it is not, the difference is simply determined by subtracting the two. Predictive coding uses techniques from motion estimation (see Section 12.4). Codecs also compress using spatial transforms, such as the discrete cosine transform used for JPEG compression (see Section 2.14), or with advanced region- or object-based coding techniques.
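A toy example of interframe (temporal) redundancy, the idea behind predictive coding: real codecs add motion estimation and entropy coding, but the principle is that an unchanged pixel costs almost nothing to transmit. The frame sizes and values below are arbitrary illustrations.

import numpy as np

def encode_interframe(previous, current):
    # Store only the difference from the previous frame (mostly zeros if little changes)
    return current.astype(np.int16) - previous.astype(np.int16)

def decode_interframe(previous, difference):
    return (previous.astype(np.int16) + difference).astype(np.uint8)

prev_frame = np.random.randint(0, 256, (480, 720), dtype=np.uint8)
curr_frame = prev_frame.copy()
curr_frame[100:120, 200:260] = 200            # only a small region changes between frames

diff = encode_interframe(prev_frame, curr_frame)
print(np.count_nonzero(diff), "of", diff.size, "pixels actually need to be coded")
assert np.array_equal(decode_interframe(prev_frame, diff), curr_frame)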
Table 5.3 Example Digital Video Playback Formats.

Format                                   | Application       | Size (Width × Height) | Frame Rate              | Raw Data (Mbps)
QCIF: quarter common intermediate format | Video telephone   | 176 × 144             | 30 fps, P               | 9.1
CIF: common intermediate format          | Videoconference   | 352 × 288             | 30 fps, P               | 37
SIF: source intermediate format          | VCD, MPEG-1       | 352 × 240             | 30 or 25 fps, P         | 30
Rec. 601                                 | Video production  | 720 × 480             | 60 or 50 fps, I         | 162
SMPTE 296M                               | HDTV distribution | 1280 × 720            | 24, 30, or 60 fps, P    | 265, 332, or 664
SMPTE 274M                               | HDTV distribution | 1920 × 1080           | 24 P, 30 P, or 60 I fps | 597, 746, or 746
fps, frames per second; I, interlaced scan; P, progressive scan.
Table 5.4 International Standards and Example Formats.a

Compression Standard | Compression Formats
MPEG | MPEG-1 (VCD videos); MPEG-2 (digital broadcast TV, DVDs); MPEG-4 (range of applications)
ITU-T (International Telecommunications Union) | H.261 (video telephony); H.263 (videoconferencing); H.264 (all video – with MPEG-4 AVC, advanced video coding); VVC (versatile video coding, MPEG-I Part 3, H.266) – compression recommended by the JVET (Joint Video Experts Team), a working group of both ITU-T and MPEG

Containers (and Their Suffixes)
Open source | FFmpeg framework; MXF (.mxf), DCP; Matroska (.mkv); WebM (.webm); Ogg (.ogg, .ogv)
Proprietary | Microsoft (.avi, .asf); Apple QuickTime (.mov); Adobe Flash (.fla, .fv4 – legacy b); Hardware: DV (.dv), 3GP (.3gp, .3gpp); DivX (.avi, .divx), legacy b, used by PS3, TV, VOD, and Blu-ray devices

Codecs | Example Formats
Open source | x264 (H.264/MPEG-4 AVC), lossy but high quality; Theora and VP9, lossy, used with Ogg and WebM respectively; FFmpeg #1, lossless, RGBA c; SVT-AV1, the Intel open-source streaming codec used by Netflix; Fraunhofer HHI's open-source VVC (H.266) encoder VVenC and decoder VVdeC
Proprietary | QuickTime (.qt), RGBA c; DivX (MPEG-4); DV (.dv), lossy; Real Networks' RealVideo; Sorenson (used by QuickTime and Flash)
Others | Camtasia, for instruction; PNG – frame by frame, lossless, RGBA c; HUFF-YUV, lossless; JPG – frame by frame, lossy; DNxHD – either lossless or lossy; used for editing and presentation; proprietary by Avid but usable at 10 bit in the FFmpeg utility

a Not comprehensive. b Legacy; no longer supported by the companies making them. c Red, green, blue, and alpha; supports the alpha channel in red, green, blue (RGB) space.
A big confusion in digital video arises from saving video with a certain file extension such as .avi or .mov without specifying or knowing the codec. These file formats are not codecs but containers that can map to different subsets of codecs (e.g., .avi can use either png or jpg compression in ImageJ). Containers act as wrappers around encoded audiovisual data and their associated metadata (see Table 5.4). Some more recent containers such as Ogg and Matroska can hold an unlimited number of video, audio, picture or title tracks in one file. Only a few contain codecs that are lossless and are RGBA (red, green, blue, and alpha), supporting the alpha channel in the red, green, blue (RGB) color space. Alpha channels contain information such as titles or image modification commands that are separate from the color channels (see Table 2.3 for different image file formats that include alpha channels). Some older containers, such as QuickTime, can hold multiple tracks but can only operate within an Apple iOS. The DNxHD codec offers either lossless or lossy compression and
provides an intermediate for either editing or presentation. If there is an option within a video editing program, probably the most universal single codec to assign for a high-quality, small-file-size presentation recognized by multiple devices is one using the H.264 standard (see Table 5.4). A more recent implementation is the VVC (H.266) compression standard recommended by a joint video group from both the MPEG and ITU organizations. However, note that these are lossy codecs, the JPEG analogs of video. For editing and compositing research video, it is important to use a non-lossy format such as PNG. Some recent codecs for video streaming are VP9, which is the only file type contained in the WebM container, and SVT-AV1, used by most professional streaming services, such as Netflix. Professional digital cinema uses the MXF (material exchange format) container, which can use the advanced authoring format (AAF) for enabling workflows between nonlinear editing systems, cameras, and other hardware. When rendering video from a video sequence editor like Blender, Adobe Premiere, Adobe After Effects, or Apple's Final Cut Pro, rendering the final production using two or three different codecs takes relatively little time and provides playback on many devices. To run an unknown container or codec, use the FFmpeg utility. While not a container or codec itself, it provides support for multiple recent containers and codecs and is an open-source program.
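As a hedged example of this workflow, the following Python sketch shells out to FFmpeg to probe an unknown file and then re-encode it with the H.264 (libx264) codec for presentation; the file names are placeholders, and the exact options available depend on how the local FFmpeg build was compiled.

import subprocess

# Inspect the container and codecs of an unknown clip
subprocess.run(["ffprobe", "-hide_banner", "input_clip.mov"], check=True)

# Re-encode to H.264 in an MP4 container for broad playback compatibility
subprocess.run([
    "ffmpeg", "-i", "input_clip.mov",
    "-c:v", "libx264", "-crf", "18",   # visually high quality, but still a lossy codec
    "-pix_fmt", "yuv420p",             # pixel format most players accept
    "presentation_copy.mp4",
], check=True)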
5.9 Digital Video Playback Formats Vary Widely, Reflecting Different Means of Transmission and Display

Another complication arising from different display devices and modes of transmission is that small formats over limited bandwidth are used for cell phones, while larger formats with large bandwidths are used for HDTVs (see Table 5.3). For most scientific publication, high-quality video production should use the Rec. 601 format (the ITU-R, International Telecommunication Union – Radiocommunication Sector, formerly the CCIR, Recommendation BT.601-5 4:2:2). This video compresses from 162 Mbps to 4–8 Mbps using MPEG-2 (used for most DVD compression). It provides either PAL or NTSC formats for broadcast. Conversion between NTSC and PAL formats is relatively lengthy (deinterlacing, line rate conversion, frame rate conversion, and re-interlacing), so if the intended audience needs both formats, the video should be “saved as” both PAL and NTSC versions with the video sequence editor. This format provides high-quality, lower resolution intermediate formats because they are lower multiples of the frame size (720 × 480) (see Table 5.3). Of course, using smaller frame sizes than those provided by the camera (many cameras now use HDTV recording) decreases the resolving power of the final production. Use care to avoid distortion whenever incorporating a different format image into a video sequence from, for example, a still camera. Older digital video formats use frames of 640 × 480 pixels (a 4:3 pixel ratio) rather than the 720 × 480 (3:2 pixel ratio) of Rec. 601. If using older video at that frame size, do not stretch the frame from 640 to 720; instead, use a blank 80-pixel edge buffer. Likewise, use a blank pixel buffer in the horizontal dimension following linear interpolation (1.5× or 2.25×) to convert the 720 × 480 Rec. 601 video standard to the 16:9 aspect ratio 1280 × 720 (720p) or 1920 × 1080 (1080i) HDTV formats. As SMPTE standards supporting HDTV become more common in cameras and capture devices, they will replace the Rec. 601 format.
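The padding and interpolation steps described above can also be scripted with FFmpeg's scale and pad filters. This is a sketch under the stated frame sizes (640 × 480 centered in a 720 × 480 Rec. 601 frame, and 720 × 480 scaled 1.5× and padded to 1280 × 720), with placeholder file names.

import subprocess

# Center a 640 x 480 clip in a 720 x 480 frame with blank 40-pixel buffers on each side
subprocess.run([
    "ffmpeg", "-i", "old_640x480.avi",
    "-vf", "pad=720:480:40:0",
    "rec601_padded.mov",
], check=True)

# Scale Rec. 601 video 1.5x (720 x 480 -> 1080 x 720), then pad horizontally to 1280 x 720
subprocess.run([
    "ffmpeg", "-i", "rec601_clip.mov",
    "-vf", "scale=1080:720,pad=1280:720:100:0",
    "hdtv_720p.mov",
], check=True)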
5.10 The Light Absorption Characteristics of the Metal Oxide Semiconductor, Its Filters, and Its Coatings Determine the Wavelength Sensitivity of the Camera Chip

The quantum efficiency of CCDs is much higher than that of the human eye (Figure 5.13). Full-frame architecture CCDs and the newer high-sensitivity, back-illuminated, back-thinned chips have peak quantum efficiencies in the far-red region of the spectrum, around 700 nm. Interline transfer architectures have a blue-shifted peak quantum efficiency. Because silicon has relatively high quantum efficiency in the infrared, these chips often have an infrared-blocking filter if they are intended for visible-light imaging. CCDs detect color with a series of on-chip or off-chip filters. High-resolution photography takes three images with a small-pixel, single-well sensor using motorized red, green, and blue filters. These separate images combine as channels to give an RGB color image (see Section 2.13). Fluorescence microscopy using probes that emit at different wavelengths employs filter sets that optimize the signal from the probes by matching the filter transmission to the emission wavelengths of the probes. Using multiplexing, separate pseudocolored channels represent each probe in the final image (see Sections 3.3 and 17.5). A problem that arises from this approach is the misregistration of the separate channels, particularly if the sample is alive and moving, but even on fixed and unmoving samples. Consequently, multiplexing includes a
Figure 5.13 The spectral dependence of quantum efficiency of a front-illuminated charge-coupled device (CCD) (black line) and the same chip, back thinned and back illuminated (red line; see Figure 16.27). Adapted from Scientific Technologies Inc. 1995. Lit. No. ST-002A. Compare with the scotopic (green line) and photopic (orange line) human vision curves from Pelli, D.G. 1990. Vision: Coding and Efficiency. Cambridge University Press, Cambridge, UK.
software-based registration step for fixed tissue (see Figure 3.18, Section 3.7 and Figure 17.8, Section 17.3). Alternatively, different pixels on the camera can have filters permanently mounted on them. In this case, different pixels detect and read out red, green, and blue. This is a mosaic filter. The pattern most frequently used in mosaic filters is the Bayer pattern (see Figure 5.2), in which every other pixel is green, and every fourth pixel is red or blue. This, of course, diminishes resolution because the “superpixel” that senses all three colors is larger than the individual pixels for each color, like the color produced by a variety of dots in Seurat's paintings (see Section 1.1). Another approach is to have triple-well sensors in three-phase chips, in which the wavelength of the light determines its depth of penetration, blue wavelengths penetrating least and red wavelengths penetrating most. Each pixel thereby has the capability of reading red, green, and blue. Like the single-well sensor, these have the advantage of using all of the pixels for all of the colors, thereby maintaining high resolution. A final, less common architecture uses a beam splitter that splits the incoming beam three ways, one going to red-, one to green-, and one to blue-filtered CCDs. This configuration requires excellent design so that the pixel registration is precise on all three chips. Testing the fidelity of color capture employs a pattern of color standards. Figure 5.14 shows the analysis of how faithful color capture is on a Canon EOS 10D.
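A minimal sketch of how the color samples sit in a Bayer mosaic, assuming an RGGB layout that starts with red at the top-left corner (real sensors vary, and real demosaicing interpolates the missing values rather than simply splitting them out):

import numpy as np

def split_bayer_rggb(raw):
    # Pull the red, green, and blue samples out of an RGGB mosaic frame
    red    = raw[0::2, 0::2]
    green1 = raw[0::2, 1::2]     # green samples on the red rows
    green2 = raw[1::2, 0::2]     # green samples on the blue rows
    blue   = raw[1::2, 1::2]
    return red, (green1 + green2) / 2.0, blue

raw = np.random.randint(0, 4096, (8, 8)).astype(float)   # a tiny 12-bit mosaic frame
r, g, b = split_bayer_rggb(raw)
# Each color channel has half the linear resolution of the full sensor,
# which is the resolution penalty of the "superpixel" described in the text.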
Figure 5.14 Calibrating color. (A) A standard color card used to color check the camera under the existing lighting and then set the white balance with color and gray patches, as described in Chapter 10. © GretagMacbeth. Used with permission. A table of standard red, green, blue (RGB) or cyan, magenta, yellow, and black (CMYK) values for each color patch is available. This particular color card is part of the X-Rite color checking system. (B) The error output from Imatest ColorChecker using an image from a Canon EOS-10D at ISO 400 and RAW converted with Canon software using automatic color balance. Actual color, squares; color for camera, circles. The color space is CIELab or L*a*b*, in which L* is lightness, a* is the red–green axis, and b* is the blue–yellow axis. See the CIE chart in Figure 4.7. From https://www.imatest.com/docs/colorcheck.
The test compares the image of a standard color card (see Figure 5.14A) taken under standard lighting conditions by the camera (see Figure 5.14B, circles) with the actual color (see Figure 5.14B, ideal, squares). Color-based image analysis requires calibration. Section 7.7 discusses separating and measuring objects based on their color.
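Color-checker tests like the one in Figure 5.14 report errors as distances in the L*a*b* color space. A minimal sketch of the CIE 1976 color difference (ΔE*ab) is below; the patch values are hypothetical, not the published ColorChecker references.

import math

def delta_e_ab(lab_reference, lab_measured):
    # CIE 1976 color difference between two (L*, a*, b*) triples
    return math.sqrt(sum((r - m) ** 2 for r, m in zip(lab_reference, lab_measured)))

reference_patch = (65.7, 18.1, 17.8)   # hypothetical published L*a*b* values for one patch
camera_patch    = (63.9, 20.3, 15.2)   # hypothetical values measured from the camera image
print(round(delta_e_ab(reference_patch, camera_patch), 1))   # ~3.9, a small but visible error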
5.11 Camera Noise and Potential Well Size Determine the Sensitivity of the Camera to Detectable Light The sensitivity of the camera chip is very important in biological imaging. There are many reasons to use chips with sensitivity to low light. Decreasing the amount of stain or dye minimizes side effects. Intense illuminating light can change the behavior of the sample, requiring low light. The sample may be moving or changing rapidly, requiring very short exposure times. There are many naturally occurring phototoxic events (e.g., generation of reactive oxygen species), making illumination with low photon doses desirable. The light sensitivity of a digital camera is the number of electrons produced by the light-illuminated chip compared with the number of electrons produced in the dark. The production of electrons by each pixel in the camera chip increases linearly with increasing numbers of photons (Figure 5.15). When the pixel can hold no more electrons, it reaches its FWC, also known as the SAT (Panel 5.3). As in the intensity transfer graph (see Section 2.9, Figure 2.13), the relationship between photon “in” versus electrons “out” is the gamma. CCD and CMOS chips have linear gamma up to SAT. Older tube-based video cameras, film cameras, and some high-sensitivity photo-amplifying cameras, such as the EBCCD (Section 5.12), do not have a linear gamma, making measurements of image intensities difficult. This makes CCD and CMOS chips excellent for measuring light or photometry. However, at very low light intensities, noise interferes with the measurement. The number of “dark” electrons in each potential well that occur during an equivalent exposure time without light is the dark current, Dcurrent (see Panel 5.3). This dark noise, Ndark, in silicon semiconductors depends primarily on the temperature of the chip (see Figure 5.15). Thermionic noise, the production of electrons by heat, is the main contributor to the dark noise in most chips. Other chips, particularly the EMCCD (see later) have a large read-out noise, Nreadout, generated by chip-associated amplifiers during conversion of the photoelectrons to a voltage signal. As described in Panel 5.3 and shown in Figure 5.16, the total camera noise, Ncamera, is the square root of the sum of the squares of Nreadout and Ndark. The amount of signal produced by each pixel is the product of light intensity, quantum efficiency of the chip, and time of exposure (see Panel 5.1). With the signal comes shot noise, Nshot, resulting from the statistical distribution of the incoming photons (see Panel 5.3). It is the square root of the signal. The shot noise and the camera noise combine to make the total noise (see Panel 5.3). At high light intensities, shot noise predominates and the signal-to-noise ratio (SNR) becomes the ratio between the signal and the shot noise (see Figure 5.16). The more electrons that can accumulate per well prior to SAT, the higher the light intensities that can be recorded by each pixel, giving the chip a higher dynamic range (DR) (see Panel 5.3). Ncamera limits the dynamic range at low light intensities. At high light intensities, the ability of the camera to detect changes is limited primarily by the shot noise that occurs at FWC, the square root of SAT (see Panel 5.3). Hence, cameras with low levels of low SAT or FWC have a smaller minimum detectable signal change (DS) (see Panel 5.3). 
For example, a camera with a SAT of 100,000 e-/pixel has a 316:1 minimum DS, or the ability to sense a 0.3% change in intensity with 95% certainty. On the other hand, a camera with a SAT of 20,000 e-/pixel has a 141:1 minimum DS, or the ability to sense a 0.7% change in intensity with 95% certainty. Hence, cameras with higher SAT can detect a smaller percent change in the signal at 95% confidence. At higher levels of intensity, SAT determines the accuracy of signal detection. Cameras with high SAT also have more accuracy at lower levels of illumination. Higher SAT or FWC means not only that the camera has a higher dynamic range but also that more electrons accumulate per gray level in cameras that have equal pixel bit depth, as determined by the chip digitizer (Table 5.5).

Figure 5.15 Linear gamma of the electrons produced by a 40% quantum efficiency charge-coupled device at one wavelength. Calculated value diagram by L. Griffing.
Panel 5.3 Relationships between camera noise, signal, and dynamic range.

N_{dark} = D_{current} \times \text{time}    (5.5)

Besides dark noise, there is read-out noise, the electrons generated as the signal in the detector is converted into a voltage signal. Together they make the camera noise, a constant value, or noise floor, that is independent of the level of illumination of the chip (Figure 5.16):

N_{camera} = \sqrt{N_{dark}^{2} + N_{readout}^{2}}    (5.6)

There is also shot noise, N_shot, which is the statistical noise of an average signal generated by a Poisson distribution of incoming photons that produce electrons (Figure 5.16):

N_{shot} = \sqrt{\text{Signal}}    (5.7)

As the level of illumination increases, the shot noise increases as its square root. Hence, the signal-to-noise ratio (S/N or SNR) is a square root function, as graphed in Figure 5.16. The total noise is the quadrature sum of the shot noise and the camera noise:

N_{total} = \sqrt{N_{shot}^{2} + N_{camera}^{2}}    (5.8)

The number of electrons held by the pixel at saturation (SAT) is also called the full well capacity (FWC). The ratio of this value and the camera noise is the dynamic range (DR) of digital cameras:

DR = SAT / N_{camera}    (5.9)

Dynamic range in decibels (dB) = 20 \log(SAT / N_{camera}).

The minimum detectable signal change, DS, is defined as the square root of SAT, the saturation level of each pixel:

DS = \sqrt{SAT}    (5.10)
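To make the relationships in Panel 5.3 concrete, the short Python sketch below evaluates them for a hypothetical sensor; the dark current, read-out noise, and full well values are illustrative assumptions, not specifications of any camera discussed in this chapter.

```python
import math

def camera_noise(n_dark, n_readout):
    """Camera noise floor: quadrature sum of dark and read-out noise (Eq. 5.6)."""
    return math.sqrt(n_dark**2 + n_readout**2)

def total_noise(signal_e, n_camera):
    """Total noise: shot noise squared is the signal itself (Eqs. 5.7 and 5.8)."""
    return math.sqrt(signal_e + n_camera**2)

# Hypothetical sensor values (assumptions for illustration only)
dark_current = 1.0      # e- per pixel per second
exposure = 0.1          # seconds
n_readout = 10.0        # e- rms
sat = 100_000           # full well capacity (SAT), e-

n_dark = dark_current * exposure            # Eq. 5.5
n_cam = camera_noise(n_dark, n_readout)
signal = 15_000                             # e- collected in this exposure
snr = signal / total_noise(signal, n_cam)
dr = sat / n_cam                            # Eq. 5.9
ds = math.sqrt(sat)                         # Eq. 5.10: 316 e- for SAT = 100,000

print(f"camera noise {n_cam:.1f} e-, SNR {snr:.0f}, DR {dr:.0f}:1 ({20 * math.log10(dr):.0f} dB)")
print(f"minimum detectable change {ds:.0f} e-, or {100 * ds / sat:.2f}% of full well")
```

With these assumed numbers the camera noise floor (about 10 e-) only matters at low signal; at 15,000 e- the shot noise already dominates, as Figure 5.16 illustrates.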
Because more electrons can accumulate, there is more signal relative to noise, and each gray level is more accurate. Increasing the bit depth without increasing SAT diminishes the number of electrons per gray level (see Table 5.5), making each gray level less accurate. For low-light applications, use a camera with a low camera noise level. For example, two cameras with equal dynamic range (1000:1) may have different camera noise levels (e.g., a camera with a 100,000 e- saturated well capacity and a camera noise of 100 e-, and a camera with a 20,000 e- saturated well capacity and a camera noise of 20 e-). Even though the latter has the same dynamic range, its low camera noise makes detecting very low light possible. Scientific CMOS cameras often have lower read-out noise than consumer-grade DSLRs (see Table 5.1). However, some consumer-grade CMOS cameras have low enough noise for low-light applications, while costing about $500–$2000 compared with about $5000–$20,000 for a scientific-grade CMOS camera. ISO refers to the International Organization for Standardization, the body that determined the value of film speed for film cameras (this was also done by the American Standards Association [ASA], so they are also ASA numbers).
Figure 5.16 Total noise (dark green line) in a charge-coupled device is a combination of the thermal dark current (orange line) and the camera read-out noise, which together make the camera noise floor (pale green line), plus the shot noise (dashed green line). Shot noise becomes limiting at levels of illumination above the camera noise level. Adapted from Figures 7–20 and 7–48 in Inoue, S. and Spring, K. 1997. Video Microscopy. Second Edition. Plenum Press, New York, NY.
Table 5.5 Effect of Digitizer and Full Well Capacity and ISO Value on Gray Level Value from an Equivalent Exposure.

| ISO Rating | Full Well Capacity (FWC or SAT), e- | Total Gray Levels – Pixel Depth | e- Per Gray Level | e- Collected in 100-ms Exposure | Gray Level Value |
|---|---|---|---|---|---|
| 100 | 55,000 | 256 (8 bit) | 215 | 15,000 | 69 |
| 200 | 27,500 | 256 (8 bit) | 107 | 15,000 | 140 |
| 100 | 55,000 | 4096 (12 bit) | 13.4 | 15,000 | 1119 |
| 200 | 27,500 | 4096 (12 bit) | 6.7 | 15,000 | 2238 |
High-speed film, with high ISO values (800, 1600, and 3200), was more sensitive to light than low-speed film with low ISO values (50, 100, 200, and 400). In consumer digital cameras, low light is more visible to our eyes at high ISO settings, but instead of actually making the chips more sensitive to light, higher ISO settings simply reduce the SAT or FWC. For example, if an unaltered SAT of around 55,000 has an ISO rating of 100, then a halved SAT has a rating of 200, and so on, doubling the ISO rating and halving the SAT with each step. Because higher ISOs have fewer electrons per gray level, they generate a higher gray level per exposure than that achieved by an equal exposure with a higher SAT (see Table 5.5). Higher bit depth cameras can handle this drop in effective pixel depth better than lower bit depth cameras. In this way, the same exposure looks brighter. Back in the days of film, to get a low-light image, you needed to change to a low-light, high-ISO film that was more sensitive to light. These days, you cannot change the chip to a more sensitive one, but you can make the light that the chip does detect more visible digitally. This comes at a cost. As the ISO increases, the SNR decreases; that is, the images become noisier. The camera noise becomes a relatively larger part of the total noise, and the total noise occupies a relatively larger proportion of the pixel depth. At low ISO settings, the camera produces higher SNRs for a given amount of light (Figure 5.17). Furthermore, the drop in SAT that comes with increased ISO reduces the dynamic range of the camera. This is the reason why the dynamic range of many cell phone cameras varies (see Table 5.1); auto ISO settings (ISOCELL) increase the ISO automatically in low light. However, for scientific-grade cameras, even though the recorded light may not be visible at low ISO settings, measurement of the signal is better at low ISO settings, which provide a larger dynamic range.
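The ISO bookkeeping in Table 5.5 can be reproduced with a few lines of Python. This is a hedged sketch using the table's starting values (55,000 e- SAT at ISO 100 and a 15,000 e- exposure); the results agree with the table to within rounding.

```python
def electrons_per_gray_level(fwc_e, bit_depth):
    """Electrons represented by one gray level for a given full well capacity and digitizer depth."""
    return fwc_e / (2 ** bit_depth)

base_iso, base_fwc = 100, 55_000   # unaltered SAT at ISO 100 (Table 5.5)
collected = 15_000                 # e- collected in a 100-ms exposure

for iso in (100, 200, 400):
    fwc = base_fwc * base_iso / iso            # doubling the ISO halves the effective SAT
    for bits in (8, 12):
        e_per = electrons_per_gray_level(fwc, bits)
        gray_value = collected / e_per         # the same exposure "looks brighter" at high ISO
        print(f"ISO {iso}, {bits}-bit: {e_per:.1f} e-/gray level -> gray value {gray_value:.0f}")
```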
Figure 5.17 Signal-to-noise ratio (SNR) at increasing light intensities for a complementary metal oxide semiconductor chip with a 52,000 e- SAT at different ISOs compared with the SNR of film. These are calculated values assuming an ideal square root function of the linear intensity for the chip and are based on measurements for Fujichrome ISO 50 film. Characteristics of the Fujichrome film SNR are at https://clarkvision.com/articles/digital.signal.to.noise/index.html. Diagram by L. Griffing.
Scientific-grade cameras detect small changes in intensity in a similar way to changing SAT and ISO in consumer-grade cameras. They change the "zero level" of intensity, or offset, to some relatively high value and increase the gain (along with the read-out noise) to amplify the signal and fill the pixel depth. This technique improves the resolution of the microscope from that achieved by eye, the Rayleigh criterion for resolution, to that achieved by electronics, the Sparrow criterion for resolution (see Section 8.4).

Modern digital cameras, consumer- and scientific-grade ones, compare favorably with older film cameras. The CCD has a higher SNR at most ISOs compared with the SNR of low-ISO film (see Figure 5.17). Furthermore, each roll of film had only one ISO (with the exception of push film, which varied the time of development for different speeds), whereas a range of ISOs is available with digital cameras.

Some consumer-grade CMOS cameras and most scientific CCDs and CMOS cameras can further increase their sensitivity, but they do so at a cost in resolution. They do this by combining pixels, pixel binning, making a larger "superpixel" for light collection. For consumer-grade cameras, pixel binning sometimes provides ISO ratings of 3200 and above. Bin size is the number of pixels wide and high that constitute the superpixel bin. A bin size of one is a single pixel. A bin size of two gives a 2 × 2-pixel superpixel. The size of the superpixel increases with the square of the bin size. Changing the read-out of a chip produces binning, as shown in Figure 5.18 for a CCD.
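As a software illustration of what binning does to the data (real binning happens in the read-out electronics, as in Figure 5.18, before read-out noise is added; this sketch only shows the bookkeeping), adjacent pixels can be summed into superpixels:

```python
import numpy as np

def bin_pixels(image, bin_size=2):
    """Sum bin_size x bin_size blocks of pixels into superpixels (dimensions must divide evenly)."""
    h, w = image.shape
    return image.reshape(h // bin_size, bin_size, w // bin_size, bin_size).sum(axis=(1, 3))

frame = np.random.poisson(lam=5, size=(512, 512))  # a dim, shot-noise-limited frame
binned = bin_pixels(frame, bin_size=2)
print(frame.shape, "->", binned.shape)             # (512, 512) -> (256, 256)
print(frame.mean(), "->", binned.mean())           # ~5 e- per pixel -> ~20 e- per superpixel
```

Each 2 × 2 superpixel collects four times the signal, but the linear resolving power of the chip is halved.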
Figure 5.18 The read-out of binned pixels in a charge-coupled device (CCD). (A) CCD without binning. During read-out, the parallel array shifts up 1 pixel, and then the serial array shifts 1 pixel. (B) CCD with 2 × 2 binning. Larger white line divisions show new, larger pixels. Each pixel is now a 2- × 2-pixel superpixel. To read it out, the parallel array shifts up twice to get the entire new superpixel, and the serial array shifts twice for each superpixel. Diagram by L. Griffing.
5.12 Scientific Camera Chips Increase Light Sensitivity and Amplify the Signal

Most digital cameras for the consumer electronics market have a higher camera noise floor than scientific-grade cameras because they operate at ambient temperature. Scientific cameras usually operate at a lower temperature and hence with less noise (see Figure 5.16). For cooling, they often use a Peltier chip, a heat-transfer device that cools on one side and heats on the other (Figure 5.19). A heat sink is present on the hot side of the chip to remove heat build-up. Peltier chips also cool computer CPUs. As in most computer CPUs, the heat sinks are metal fins cooled by a fan, achieving 0°C to –20°C for some air-cooled Peltier chips. Alternatively, room temperature water or ethylene glycol can flow across the hot side of the Peltier chip, providing a better heat sink and cooling it to –40°C (see Figure 5.19). Some super-cooled chips use circulating liquid nitrogen instead of Peltier chips.

Back thinning and backside illumination of CMOS chips also improve the camera SNR (Figure 5.20) and the sensitivity of cameras to low light. These chips are dropping in price as they become common in smartphones. In backside illumination, light shines directly on the p-type silicon semiconductor, so polysilicon gates and other electronics do not absorb the scarce photons. This increases the quantum efficiency of the chip (see Figure 5.13). Signal is a direct function of quantum efficiency, and this type of camera produces more signal. Many cell phone camera chips are back illuminated.

Another approach is to increase the amplification of the signal. The electron multiplication CCDs (EMCCDs) (Figure 5.21) are a modification of the frame-transfer CCD (see Figure 5.5). They also use an unmasked parallel array, but instead of shifting to another masked parallel array, they shift to a masked electron multiplication serial register, whereby every time the electrons move from
Figure 5.19 Construction of a cooled charge-coupled device (CCD) that uses electronic cooling with a Peltier chip that cools on the camera chip side and warms on the side of the heat sink. The efficiency of the heat sink dictates the efficiency of the cooling, so besides air-cooling with fans, some use liquid pumped across the cooling fins. Diagram by L. Griffing.
Figure 5.20 (A) The difference between a back-thinned charge-coupled device (CCD) and a regular CCD. In a back-thinned CCD, the light comes directly into the p-type silicon semiconductor without traversing the gates. Diagram by L. Griffing.
Figure 5.21 The design of an electron multiplication charge-coupled device (EMCCD). This is a form of frame-transfer CCD. The data in the imaging array transfer to a masked storage array, with either a standard serial read-out register or a read-out register that amplifies the signal with electron multiplication. Diagram by L. Griffing.
one potential well to the next, the number of electrons increases. This greatly amplifies the signal on chip but also potentially produces more noise from the process of electron multiplication. Consequently, the camera noise in EMCCDs is greater than in other cooled CCDs. As in the frame-transfer design, the information gates rapidly into the storage serial array, quickly making another exposure on the imaging array possible. Consequently, EMCCDs have faster frame rates than full-frame CCDs. EMCCDs also have larger pixel sizes. The larger pixel size increases the sensitivity by providing a larger area for light collection, giving a larger SAT. However, as discussed earlier, larger pixel sizes also mean reduced resolving power.

Another development that enhances the signal, used only in scientific CCD and CMOS chips, is the electron bombardment CCD (EBCCD) (Figure 5.22), or electron bombardment CMOS (EBCMOS). In EBCCDs, a back-thinned, direct electron-detecting CCD receives photoelectrons produced by a photocathode (also found in photomultiplier tubes; see Figure 6.12, Section 6.4). The kilovolt voltage difference between the photocathode and the CCD accelerates the photoelectrons and amplifies the current produced by light. This makes it possible to detect single photons. However, CCDs have a relatively short lifetime when exposed to direct illumination with electrons. EBCCDs have a finite lifetime of about 10¹² counts/mm², making it important to shield the camera from intense light when powered. The high-voltage electrons do less damage to EBCMOS chips. Modified CMOS chips are a foundational technology for high-resolution cryo-electron microscopy (see Section 5.13).

Another layer of image intensification couples high-speed CMOS cameras to image intensifiers. Commercial scientific-grade cameras that use a sealed photocathode, like the EBCCD, combined with a microchannel plate, provide frame
Figure 5.22 An electron-bombarded charge-coupled device (CCD) or EBCCD. The photocathode makes photoelectrons, which then are transferred to the p-type silicon substrate of a back-thinned CCD. The number of electrons produced by the photocathode depends on the voltage. Higher voltages amplify the number of electrons. Diagram by L. Griffing.
rates of 1–5 MHz. A microchannel plate is a series of microtubes about 6 µm in diameter with an internal coating that can amplify the electrons, firing them into a phosphor screen under high voltage that then emits light captured by the CMOS camera via fiber-optic coupling. These are very sensitive, very fast cameras (HiCam Fluo; see Table 5.1).
5.13 Cameras for Electron Microscopy Use Regular Imaging Chips after Converting Electrons to Photons or Detect the Electron Signal Directly with Modified CMOS

Probably the last imaging technology to abandon film and use silicon chips was transmission electron microscopy because electrons damage CCDs. A solution to this problem is to have electrons produce light in a scintillator, collect the photons produced by the electrons, and project them onto the CCD or CMOS chip (Figure 5.23). There are two commercially available designs for collecting light from the scintillator: lens collection or fiber-optic collection. Fiber-optic collection provides higher resolution (Figure 5.24) but is slightly more expensive. The detective quantum efficiency (DQE), the ratio of the squares of the SNR_out and SNR_in, is a measure of how much these intermediate optics degrade the signal. The DQE changes with spatial frequency; as a convenience, it is reported at half the Nyquist frequency (see Section 1.5) of the chip, a spatial frequency at which the projected image is still adequately sampled by the pixels. At typical transmission electron microscope acceleration voltages for plastic sections (80–100 keV), the DQE of scintillator- or fiber-optic-coupled cameras is only slightly better than that of film. However, at higher voltages (e.g., 300 keV), a common value for single-particle cryoEM (see Section 19.14), in which signal is intrinsically low because of the low-contrast specimens, the DQE of scintillator- or fiber-optic-coupled cameras diminishes from about 35% to 7%–10%, whereas film retains a DQE of 30%–35%. Consequently, single-particle analysis projects used film prior to 2012.
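Written out as an equation (a restatement of the definition in the text, with f denoting spatial frequency):

```latex
\mathrm{DQE}(f) \;=\; \frac{\mathrm{SNR}_{\mathrm{out}}^{2}(f)}{\mathrm{SNR}_{\mathrm{in}}^{2}(f)}
```

A perfect detector has a DQE of 1 at all spatial frequencies; quoting the value at half the Nyquist frequency gives a single number for comparing detectors at a frequency the chip still samples adequately.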
Figure 5.23 Transmission electron microscope digital camera configurations. (A) Phosphor/scintillator-coupled charge-coupled device. The electron beam excites the scintillator and produces light, and either lens coupling or fiber-optic coupling collects the light. Diagram by L. Griffing. (B) Direct electron detector with a modified complementary metal oxide semiconductor chip. Adapted from McMullan, G. et al. 2016. Direct electron detectors. Methods in Enzymology 579: 1–17. doi:10.1016/bs.mie.2016.05.056.
Figure 5.24 Comparison of images of a renal sample taken with two different types of coupling with a scintillator. (A) Fiber-optic coupling. (B) Lens coupling. From Gatan Inc. https://www.gatan.com/techniques/imaging.
Modified CMOS chips are direct electron detectors, the cameras of choice in high-resolution transmission electron microscopy of organic materials. They have DQEs up to double that of film. They typically detect electrons in a modified epilayer of the silicon p-layer residing over the thicker p-type silicon found in a typical MOS chip (see Figure 5.23B). The epilayer has less of the boron or gallium trivalent doping, so there is a higher relative density of tetravalent silicon atoms. When an electron from the electron beam enters, it generates a new electron–hole pair by causing ejection of an electron from silicon, with several such interactions causing electrons to accumulate in the potential well of the p-layers. The potential on the polysilicon gate attracts the new electrons to the n-layer when the chip is read out. These chips, because they are CMOS and independently addressable, read out very fast. This is useful because, during longer exposures, the frozen hydrated samples move under the electron beam. These cameras in non-counting mode capture 17–32 frames per second. The final image is the result of aligning and summing the images taken during the exposure period (Figure 5.25).
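A minimal sketch of the align-and-sum step is shown below. This is not the software used for Figure 5.25; it assumes the scikit-image and SciPy libraries are available and that the beam-induced motion can be approximated as a whole-frame translation.

```python
import numpy as np
from scipy.ndimage import shift as translate
from skimage.registration import phase_cross_correlation

def align_and_sum(frames):
    """Align each frame to the first by phase correlation, then sum them (cf. Figure 5.25B)."""
    reference = frames[0].astype(float)
    total = reference.copy()
    for frame in frames[1:]:
        moving = frame.astype(float)
        drift, _, _ = phase_cross_correlation(reference, moving)  # (row, col) shift in pixels
        total += translate(moving, drift)                         # undo the measured drift
    return total

# frames: a (n_frames, height, width) NumPy stack from a direct electron detector
# summed_image = align_and_sum(frames)
```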
Figure 5.25 (A) Average of frames of rotavirus particles without alignment of the images, which follow the motion of the sample induced by the electron beam. (B) Average of frames with alignment. From Carroni, M. and Saibil, H.R. 2015. Cryo electron microscopy to determine the structure of macromolecular complexes. Methods 95: 78–85. doi: 10.1016/j.ymeth.2015.11.023. CC BY 4.0.
5.14 Camera Lenses Place Additional Constraints on Spatial Resolution

We now move away from just the camera chip and consider other elements in the camera. The spatial frequency of the pixels in a chip sets a limit on the resolving power of the system. All of the elements in the light path, including the imaging chip and lenses, can limit the resolving power. After resolving power is lost, it is not recoverable, making an imaging system only as good as the element in it with the least resolving power. Figure 5.26 shows the lens system and internal optics of a typical consumer-grade DSLR. The way the lens, viewfinder optics, and camera chip are designed provides the photographer with a view through the imaging lens, whether it is a
Figure 5.26 Digital single-lens reflex camera with a zoom lens. Different color ray lines through the lens and body of this camera reveal the optical paths to the eyepiece or viewfinder (yellow), to the autofocus sensor (red), and to the exposure meter (green). When a picture is taken, mirrors 1 and 2 lift out of the path, the shutter opens, and the image chip is exposed with light. The Canon cameras have a signal processing board that includes a DIGIC (digital imaging core) chip that can convert analog to digital signals, compress images, stretch the image histogram, process video, and apply noise reduction. There are separate processing boards or chips for autofocus and aperture adjustment in the lens. Modification of images © Canon Inc.
zoom (variable focal length) lens or a prime (fixed focal length) lens. There are separate imaging pathways to an autofocus sensor (red line to AF in Figure 5.26) and an exposure sensor (green line to meter in Figure 5.26). A set of mirrors and prisms focuses the incoming image through the eyepiece (yellow line in Figure 5.26) or to a camera chip, which previews the scene through the lens on a back-mounted backlit liquid crystal display (LCD). LCDs can preview exposure while also providing greater flexibility of use; the camera does not have to be right in front of the photographer's face. Cell phone cameras use a backlit LCD screen instead of a viewfinder. In some cameras, the preview mode (sometimes to a remote computer rather than the back LCD) provides a high-resolution image produced by the image chip after the shutter opens and mirrors 1 and 2 (see Figure 5.26) flip out of the way.

Diffraction, the scattering of light by an edge, sets an upper limit on the resolving power of imaging lenses (but see superresolution imaging in Chapter 18), whether they are telephoto or wide-angle lenses on a DSLR or the objective and condenser lenses on a research microscope. In addition, particularly for photographic lenses, aberrations, or lens errors, set an upper limit on resolving power. Normally, light travels in straight lines. However, when a photon hits an edge, it scatters, or diffracts, through wide angles. As shown in Figure 5.27, when light hits the edge of a razor blade, it produces fringes of bright and dark lines
Figure 5.27 Diffraction of light detected in the shadow of a razor blade, producing Fresnel fringes. Light hitting the front of the razor reflects from the razor or the razor absorbs it, creating a shadow in the shape of the blade. If light wave 1 interacts with the edge of the blade and is scattered by diffraction, it produces diffracted waves 1ʹ and 1ʺ. If the diffracted wave (light wave 1ʹ) has an optical path difference from the undiffracted wave (light wave 2) of a whole integer multiple of the wavelength plus 1/2 wavelength, then destructive interference takes place, creating a dark band. If the diffracted wave (light wave 1ʺ) has an optical path difference of a whole integer multiple of the wavelength of the undiffracted wave (light wave 3), then constructive interference takes place, creating a bright band. Diagram by L. Griffing.
surrounding the shadow of the blade. These are Fresnel fringes. They occur when the scattered light either constructively or destructively interferes with the illuminating light. Constructive interference occurs when the scattered light wave peak (wave 1ʺ, Figure 5.27) matches the peak of the illuminating light (wave 3, Figure 5.27), producing a bright band a certain distance from the razor's edge. Destructive interference occurs when the diffracted light trough (wave 1ʹ, Figure 5.27) matches the illuminating light peak (wave 2, Figure 5.27), producing a dark band. The bands farther away from the blade are higher orders of diffraction, while those close to the blade are lower orders of diffraction. Sharper blades with finer edges produce higher orders of diffraction.

Light coming from a very small hole produces a central Airy disk with an Airy pattern of ring-shaped bands of diffraction surrounding it. There is more on this in Section 8.4, but for now, it is important to understand that smaller spots and finer edges produce higher orders of diffraction and scatter light more. If a small object produces a lot of light scatter at high angles, then to capture it, a lens requires a wide aperture, the lens opening, expressed as the angle of light acceptance by the lens. Hence, the aperture of the lens system determines resolution. The f-number of a DSLR lens is a function of its aperture. The f-number is the ratio of the focal length (see Section 8.1) of the lens to the diameter of its aperture. There is an inverse relationship between the f-number and aperture diameter (Table 5.6). High
Table 5.6 Relationship Between Aperture Opening, f-Number, the Amount of Light Passing Through the Opening, and the Diffraction Limitations and Aberration Limitations on Resolution and Resolving Power with Green Light (530 nm).a

| f-Number | Light Passed | Diffraction-Limited Resolution (Resolving Power) | Aberration-Limited Resolution (Resolving Power) |
|---|---|---|---|
| f/22 | ×1 | Limiting: 28.5 µm (35.2 lp/mm) | Not limiting |
| f/16 | ×2 | Limiting: 20.7 µm (48.3 lp/mm) | Not limiting |
| f/11 | ×4 | Not limiting: 14.2 µm (70.4 lp/mm) | Limiting: 18 µm (54 lp/mm) |
| f/8 | ×8 | Not limiting: 10.3 µm (97.1 lp/mm) | Limiting: 21 µm (48 lp/mm) |
| f/5.6 | ×16 | Not limiting: 7.2 µm (138.9 lp/mm) | Limiting: 22 µm (45 lp/mm) |
| f/4 | ×32 | Not limiting: 5.2 µm (192.3 lp/mm) | Limiting: 25 µm (40 lp/mm) |
| f/2.8 | ×64 | Not limiting: 3.6 µm (277.8 lp/mm) | Limiting: 33 µm (30 lp/mm) |
| f/2 | ×128 | Not limiting: 2.6 µm (384.6 lp/mm) | Limiting: 40 µm (25 lp/mm) |

a Diffraction-limited resolution is calculated as the diameter of the first-order Airy disk, 2.44 × f-number × 530 nm (see Chapter 8). Aberration-limited resolution is from an average example zoom lens for a digital single-lens reflex camera. The sweet spot of the aperture for this lens is f/11, where the resolving power is highest. (The original table also illustrates the relative size of the aperture opening at each f-number.)
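The diffraction-limited column of Table 5.6 follows directly from the Airy disk formula in the table footnote. A quick check in Python (values reproduce the table to within rounding for 530-nm green light):

```python
wavelength_um = 0.530  # green light, 530 nm

for f_number in (22, 16, 11, 8, 5.6, 4, 2.8, 2):
    airy_um = 2.44 * f_number * wavelength_um   # diameter of the first-order Airy disk
    lp_per_mm = 1000 / airy_um                  # resolving power in line pairs per mm
    print(f"f/{f_number}: {airy_um:.1f} um ({lp_per_mm:.1f} lp/mm)")
```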
f-numbers mean small apertures. Because small things produce high orders of diffraction, they can be resolved better at low f-numbers (see Table 5.6). Each ascending f-number setting, or f-stop, decreases the light by half. However, just as described for the resolution of the eye (see Section 4.1), working against aperture is aberration. At these small f-numbers and wide apertures, lens aberrations put an upper limit on resolution because the image occupies more of the lens, with all of the lens imperfections. Figure 5.28 plots the relationship of lens aperture to resolving power for two typical camera lenses that neither magnify (as a telephoto lens does) nor reduce (as a macro lens does) the image. The point of highest resolution of each of the lenses is where diffraction stops limiting and aberration starts limiting resolution. This is the sweet spot of the lens. The better lens has a wider aperture at its sweet spot, f/8, because it has better correction for aberrations (higher dashed line). At the sweet spot, the better lens has a resolving power of about 60 line pairs per millimeter, or 16-µm resolution, while the good lens has a resolving power of about 50 line pairs per millimeter, or 20-µm resolution. Most smartphones have very large apertures (f/1.8), and their resolution is generally aberration limited.

For the Nyquist criterion (see Section 1.6) to be satisfied while recording the projected image on the camera chip, the pixels on the chip must have twice the spatial frequency of the image produced by the lens (i.e., the pixels must be half the size of the smallest things resolved by the lenses). All of the consumer-grade cameras in Table 5.1 have pixels smaller than 8 µm. This pixel size would not limit the resolution provided by the 16-µm resolution lens in Figure 5.28. Two of the scientific-grade cameras in Table 5.1, the Prime 95B and the ImagEM X2, do not have small enough pixels. However, magnification, which reduces the spatial frequency of the image, overcomes the limitation set by these pixel sizes. Adding an extension ring (see Section 8.5) to the lens magnifies its projected image. Magnification of the smallest 16-µm features to 22 µm would make them large enough for capture by the 11-µm pixel of the Prime 95B. This demonstrates that it is the combined resolution of the lenses and the imaging chip that dictates the theoretical resolution of the imaging system. It also demonstrates an important principle of high-resolution photography: use magnification to match the size of the diffraction-limited spot generated by the imaging system to double (Nyquist criterion!) the size of the pixel in the imaging chip. This is particularly true in microscopy.

Diffraction also limits the resolution of light microscopy. In microscopes, lens aperture is expressed as the numerical aperture (NA), which is based on the half-angle of light acceptance by the lens. Lens design and its immersion medium influence its NA (see Section
Figure 5.28 Comparison of the relationship between photographic lens aperture and resolving power in two typical non-zoom digital single-lens reflex camera lenses, one good and one better. At the higher f-numbers (small apertures), diffraction limits the resolving power. At the lower f-numbers, aberrations limit the resolving power of the lens. Adapted from Williams, J.B. (1990) Image Clarity: High Resolution Photography. Focal Press, Boston, MA. pp. 126, 171.
8.3). High-NA lenses collect higher orders of diffraction and resolve smaller things (see Table 5.7). However, the resolution produced by low-magnification lenses (40× and 10×) cannot be adequately captured by a camera chip with, for example, a 6-µm pixel (see Table 5.7). Higher magnification lenses can usually resolve smaller things because they project a larger diffraction-limited spot on the CCD, thereby matching the size of the smallest resolved point to double the size of the pixel on the imaging chip. Magnification lowers the spatial frequency of the projected image so that the camera can capture it. Any higher magnification would be empty magnification. Nevertheless, the aperture, not the magnification, dictates the resolving power of microscope lenses, as shown by graphing the resolving power of the lens against its ability to hold contrast, its relative modulation (Figure 5.29). The graph of modulation transfer functions (MTFs) shows the tradeoff between contrast and resolving power, as does Figure 4.4 (see Section 4.1) for the eye. However, Figure 5.29 plots the relationship on a linear, rather than log, scale and at frequencies a thousand-fold higher, showing line pairs or cycles per micrometer rather than per millimeter.

Table 5.7 Relationship Between Numerical Aperture, Resolution, and Pixel Size Required by the Recording Imaging Chip in Microscopy.a

| Objective | Resolution (µm) | Size of Smallest Resolved Point on CCD (µm) | Required Camera Pixel Size Satisfying the Nyquist Criterion (µm) |
|---|---|---|---|
| 100×, 1.3 NA | 0.21 | 21 | 10.5 |
| 60×, 1.3 NA | 0.21 | 13.2 | 6.6 |
| 40×, 1.3 NA | 0.21 | 8.4 | 4.2 |
| 40×, 1.3 NA, with 2.5× relay optic | 0.21 | 21 | 10.5 |
| 32×, 0.4 NA | 0.68 | 21.8 | 10.9 |
| 10×, 0.25 NA | 1.1 | 11 | 5.5 |

a In compound microscopes, the relay optic referred to in row 4 is the projection lens. NA, numerical aperture.
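The last two columns of Table 5.7 are just the lens resolution scaled by the total magnification and then halved for the Nyquist criterion. The sketch below shows that arithmetic; the 0.21-µm resolution and the 6-µm pixel are the example values used in the text and table.

```python
def required_pixel_um(resolution_um, magnification):
    """Nyquist criterion: the camera pixel must be half the size of the projected resolved spot."""
    return resolution_um * magnification / 2

def required_magnification(resolution_um, pixel_um):
    """Minimum total magnification for an existing pixel size to satisfy the Nyquist criterion."""
    return 2 * pixel_um / resolution_um

print(required_pixel_um(0.21, 100))       # 10.5 um: a 100x, 1.3-NA objective is well matched
print(required_pixel_um(0.21, 40))        # 4.2 um: the same NA at 40x needs much smaller pixels
print(required_magnification(0.21, 6.0))  # ~57x: minimum magnification for a 6-um-pixel camera
```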
Figure 5.29 A comparison of the resolving power and contrast of microscope lenses at various apertures and a photographic lens at its f/8 sweet spot. This curve is a modulation transfer function. It is the ability to hold contrast: Modulation = (Imax – Imin)/(Imax + Imin), in which Imax is maximum intensity in the image and Imin is the minimum intensity in the image. Adapted from Inoue, S. 1986. Video Microscopy. Plenum Press. New York, NY. p. 124, and from Rosenhauer, K. and Rosenbruch, K.J. 1960. Die optischen Bildfehler und die Ubertragungsfunktion. Optik 17: 249–277.
For comparison purposes, an MTF curve of a prime photographic lens at its f/8 sweet spot is included, a lens with about a fivefold higher resolving power than the human eye. Section 8.9 has a more complete discussion of the MTF.
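The modulation plotted in Figures 5.29 and 5.30 can be measured directly from an intensity profile taken across an image of a line-pair pattern. A brief sketch (the sinusoidal profile is an arbitrary example, not data from either figure):

```python
import numpy as np

def modulation(profile):
    """Contrast of a line-pair profile: (Imax - Imin) / (Imax + Imin)."""
    i_max, i_min = float(np.max(profile)), float(np.min(profile))
    return (i_max - i_min) / (i_max + i_min)

x = np.linspace(0, 4 * np.pi, 400)
profile = 100 + 30 * np.sin(x)     # a partially blurred pattern: Imax = 130, Imin = 70
print(modulation(profile))         # 0.3
```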
5.15 Lens Aperture Controls Resolution, the Amount of Light, the Contrast, and the Depth of Field in a Digital Camera

Reducing the lens aperture on a camera reduces the amount of light reaching the camera chip. Increasing the exposure time or increasing the ISO compensates for aperture reduction. In digital cameras, higher ISOs produce intrinsically more contrast because, with decreased pixel depth, fewer gray values represent an image. Likewise, reducing the lens aperture increases image contrast. This trade-off between contrast and resolution is a hallmark of imaging systems. For example, the MTF for a brightfield microscope changes when closing down the condenser aperture, making the cut-off resolving power lower and the contrast at low resolving power higher (Figure 5.30). See Section 9.9, Figure 9.23, for the parts of upright and inverted microscopes and for a more detailed discussion of contrast and tone in brightfield microscopes.

As the aperture increases, things in the distance become blurry. The range of distance over which objects remain in focus is the depth of field. As shown in Figure 5.31, narrow apertures produce very large depths of field. Even though the example in Figure 5.31 is for a photographic lens, microscope lenses show the same property. Consequently, low-NA lenses have a much larger depth of field than high-NA lenses, and depth of field increases when closing down the condenser aperture. Consult Chapter 9 for the optical principles behind this.

The light field camera, the Lytro, allows focusing after taking an image! It does this by breaking down the original image into thousands of images with a sheet of microlenses between the camera lens and the camera chip. The camera chip records each of these smaller images. Software then combines the images from only the center of the lens (effectively a low aperture) to generate an image with a large depth of field, or from the entire opening of the lens to create an image with a very shallow depth of field.
Figure 5.30 The effect of shutting down the condenser on increasing contrast and decreasing resolving power, as shown with a modulation transfer function plot. Adapted from Inoue, S. 1986. Video Microscopy. Plenum Press, New York, NY. p. 124.
Figure 5.31 A comparison of the depth-of-field produced by different aperture settings on a camera lens. Whereas wide apertures produce shallow depths of field, very narrow apertures produce deep focus to near infinity. Adapted from Davis, P. (1976) Photography. Wm. C. Brown Company Publishers, Dubuque, IA.
5.16 Relative Magnification with a Photographic Lens Depends on Chip Size and Lens Focal Length

As people change digital cameras, they are often surprised that the same lens gives different magnifications with different cameras. The reason for this is that the area of the digital camera chip changes (see Table 5.1). As shown in Figure 5.32, smaller chip sizes have smaller fields of view. A print of the image from a smaller chip has more magnification than a similar size print from a larger chip. The increase in magnification is the ratio of the diagonal of the larger chip to that of the smaller chip. This increase in magnification by the same lens on a different size chip is called the crop factor, because the small chip crops the field of view, or the scale factor, from the ratio of diagonals. It is also a focal length multiplier, as can now be described.

Figure 5.32 Image projected by the same lens on different chip sizes. The blue box is a 36- × 24-mm chip (Nikon D3) or film. The light orange box is a 23- × 15-mm chip (Canon 350D). Photo of a sandhill crane by L. Griffing.

DSLR lenses with focal lengths longer than 50 mm, the long or telephoto lenses, magnify the object, while focal lengths shorter than 50 mm, the macro or wide-angle lenses, reduce the object. The focal length multiplier is the factor by which the effective focal length of a lens changes when switching from a camera with a large chip to a camera with a smaller chip. It is the same number as the ratio of diagonals. For example, if the ratio of diagonals is 1.6, then a 50-mm lens mounted on a camera with the reference chip will effectively be an 80-mm lens when mounted on the camera with the smaller chip. Section 8.5 covers the relationship of focal length and magnification in more detail. Because longer lenses project a smaller part of the field onto the camera chip, they gather less light and have smaller apertures. Consequently, telephoto lenses have a smaller maximum aperture, or base aperture, f/5.6 or above, than do macro lenses, f/2 to f/2.8. The wider the base aperture (lower f-number), the more corrected the lens has to be to avoid aberrations and hence the more expensive it becomes.
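The crop-factor arithmetic in this section is easy to sketch in Python; the smaller chip dimensions below are an assumed APS-C-style size chosen so the factor comes out near the 1.6 used in the example.

```python
import math

def crop_factor(large_chip_mm, small_chip_mm):
    """Ratio of chip diagonals: the crop factor, or focal length multiplier."""
    diagonal = lambda wh: math.hypot(wh[0], wh[1])
    return diagonal(large_chip_mm) / diagonal(small_chip_mm)

factor = crop_factor((36, 24), (22.5, 15))    # full frame vs. an assumed APS-C-sized chip
print(round(factor, 2))                        # ~1.6
print("a 50-mm lens acts like a", round(50 * factor), "mm lens on the smaller chip")
```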
Annotated Images, Video, Web Sites, and References

5.1 Digital Cameras Are Everywhere
A film from 2006, A Scanner Darkly, based on a Philip K. Dick dystopian novel from 1977, is about intrusive high-tech video surveillance during a drug addiction epidemic. Prescient. The director, Richard Linklater, uses rotoscoping to animate the film, removing it one step from the reality of the digital image.

5.2 Light Interacts with Silicon Chips to Produce Electrons
For an explanation of the silicon photodiode and its discovery, see Shockley, W. 1964. Transistor Technology Evokes New Physics. Nobel Lectures, Physics 1942–1962. Elsevier, Amsterdam. Another treatment of silicon p-n junction transistors and CCDs aimed at the audience of microscopists is in Wayne, R. 2009. Light and Video Microscopy. First Edition. Elsevier, Amsterdam, Netherlands. pp. 208–211. Table 5.1 values for some consumer cameras are from http://www.clarkvision.com/articles/digital.signal.to.noise. The Samsung ISOCELL information is from the ISOCELL Wikipedia site: https://en.wikipedia.org/wiki/ISOCELL. Hamamatsu camera specifications are at https://camera.hamamatsu.com/jp/en/product/search/C15440-20UP/index.html, https://www.hamamatsu.com/resources/pdf/sys/SCAS0092E_IMAGEMX2s.pdf, and https://www.hamamatsu.com/resources/pdf/sys/SCAS0134E_C13440-20CU_tec.pdf. Oxford Instruments Andor camera specifications are at https://andor.oxinst.com/products/fast-and-sensitive-scmos-cameras. Teledyne Photometrics camera specifications are at https://www.photometrics.com/products/prime-family/prime95b#p7AP4c1_3 and https://www.photometrics.com/products/retiga-cmos-family/retiga-e7-cmos. There is more on color CMOS sensors at https://micro.magnet.fsu.edu/primer/digitalimaging/cmosimagesensors.html. https://www.microscopyu.com/digital-imaging/introduction-to-charge-coupled-devices-ccds is an excellent resource for information on CCDs. Another site for learning about CCDs is https://hamamatsu.magnet.fsu.edu/articles/microscopyimaging.html. See the construction of a CCD at https://hamamatsu.magnet.fsu.edu/tutorials/flash/digitalimagingtools/buildingccd/index.html.

5.3 The Anatomy of the Camera Chip Limits Its Spatial Resolution
For another explanation of three-phase clocking, with links to four-phase and two-phase clocking, see https://hamamatsu.magnet.fsu.edu/articles/threephase.html. There is an applet animation at https://hamamatsu.magnet.fsu.edu/tutorials/java/threephase.

5.4 Camera Chips Convert Spatial Frequencies to Temporal Frequencies with a Series of Horizontal and Vertical Clocks
For a simple animation of the read-out of CCDs, see https://hamamatsu.magnet.fsu.edu/tutorials/flash/digitalimagingtools/ccdclocking/index.html. For a description of blanking pulses using analog video signal, see https://micro.magnet.fsu.edu/primer/digitalimaging/videobasics.html. Details of the timed read-out that determine frame rates are in Fellers, R. and Davidson, M. Digital camera readout and frame rates. https://micro.magnet.fsu.edu/primer/digitalimaging/concepts/readoutandframerates.html.

5.5 Different Charge-Coupled Device Architectures Have Different Read-out Mechanisms
For more discussion of different architectures, see Inoue, S., and Spring, K. 1997. Video Microscopy. Second Edition. Plenum Press, New York, NY, and Spring, K. and Davidson, M. Electronic imaging detectors. https://micro.magnet.fsu.edu/primer/digitalimaging/digitalimagingdetectors.html.
5.6 The Digital Camera Image Starts Out as an Analog Signal that Becomes Digital
Details of analog-to-digital conversion are in Spring, K. and Davidson, M. Digital image acquisition. https://micro.magnet.fsu.edu/primer/digitalimaging/acquisition.html.

5.7 Video Broadcast Uses Legacy Frequency Standards
Different legacy frequency standards are in Inoue, S. 1985. Video Microscopy. Plenum Press, New York, NY. pp. 164–165.

5.8 Codecs Code and Decode Digital Video
A selection of video containers and codecs is at https://docs.blender.org/manual/en/2.79/data_system/files/media/video_formats.html.

5.9 Digital Video Playback Formats Vary Widely, Reflecting Different Means of Transmission and Display
A discussion of digital video playback formats and standards is in Marques, O. 2011. Practical Image and Video Processing using MATLAB. Wiley, Hoboken, NJ. pp. 521–525.

5.10 The Light Absorption Characteristics of Silicon, Its Filters and Coatings, and Camera Design Determine the Wavelength Sensitivity of the Camera Chip
Figure 5.13. Quantum efficiencies for CCDs are in Inoue, S., and Spring, K. 1997. Video Microscopy. Second Edition. Plenum Press, New York, NY. pp. 310–312. Adapted from Figure 7, SITe 2048 × 4096 Scientific-Grade CCD. 1995. Scientific Technologies Inc., Beaverton, OR. This is at http://instrumentation.obs.carnegiescience.edu/ccd/parts/ST-002a.pdf. Quantum efficiency of vision is in Pelli, D.G. 1990. The quantum efficiency of vision. In Vision: Coding and Efficiency. Ed. by C. Blakemore. Cambridge University Press, Cambridge, UK. Figure 5.14A is © GretagMacbeth. Used with permission. Instructions for basic use of this card are in Chapter 10. Figure 5.14B is at https://www.imatest.com/docs/colorcheck. This web site also gives instructions for using their software to check color accuracy. Another color checking system is at https://www.xrite.com/categories/calibration-profiling/colorchecker-targets.

5.11 Camera Noise and Potential Well Size Determine the Sensitivity of the Camera to Detectable Light
Panel 5.3 and Table 5.5 contain information from Moomaw, B. 2007. Camera technologies for low light imaging: overview and relative advantages. Methods in Cell Biology 81: 251–283. Figure 5.16 combines Figures 7–20 and 7–48 in Inoue, S., and Spring, K. 1997. Video Microscopy. Second Edition. Plenum Press, New York, NY. Figure 5.17 uses the ideal square root relationship between signal and noise. Characteristics of the Fujichrome film SNR are at https://clarkvision.com/articles/digital.signal.to.noise/index.html.

5.12 Scientific Camera Chips Both Increase Sensitivity to Low Light and Amplify the Existing Signal
Further information about EMCCDs and EBCCDs is in Moomaw, B. 2007. Camera technologies for low light imaging: overview and relative advantages. Methods in Cell Biology 81: 251–283. The MTF of EBCCDs is at https://hamamatsu.magnet.fsu.edu/articles/ebccd.html. On-chip electron multiplication and noise in EMCCDs is at https://hamamatsu.magnet.fsu.edu/articles/emccds.html.

5.13 Cameras for Electron Microscopy Use Regular Imaging Chips after Converting Electrons to Photons or Detect the Electron Signal Directly with Modified CMOS
Gatan Inc. https://www.gatan.com/techniques/imaging has more information on the scintillator and fiber-optic camera chip configurations and resolutions.
For the importance of direct electron detectors for single particle analysis, see Cheng, Y. 2015. Single-particle cryo-EM at crystallographic resolution. Cell 161: 450–457.
5.14 Camera Lenses Place Additional Constraints on Spatial Resolution
Figure 5.26 is from the cutaway image of the Canon Rebel xs by Canon Inc. A very cool animation of assembling the Canon 10D is at https://www.youtube.com/watch?v=6-HiBDLVzYw. For more about diffraction, see Chapter 9 and https://micro.magnet.fsu.edu/primer/lightandcolor/diffractionintro.html. An early article on the Fresnel fringes produced by razors is Brush, C.F. 1913. Some diffraction phenomena: superimposed fringes. Proceedings of the American Philosophical Society 52: 1–9. Table 5.2 is derived from basic principles and the MTF curves for typical lenses. A more detailed discussion is in Osuna, R. and Garcia, E. Do sensors outresolve lenses? https://luminous-landscape.com/do-sensors-out-resolve-lenses. Table 5.4 contains information from Inoue, S., and Spring, K. 1997. Video Microscopy. Second Edition. Plenum Press, New York, NY. p. 314. Figure 5.29 is partly from Inoue, S. 1985. Video Microscopy. Plenum Press, New York, NY. p. 124, and the MTF curve at f/8 is from Rosenhauer, K. and Rosenbruch, K.J. 1960. Die optischen Bildfehler und die Ubertragungsfunktion. Optik 17: 249–277.

5.15 Lens Aperture Also Controls the Amount of Light, the Contrast, and the Depth of Field in a Digital Camera
Figure 5.30 is adapted from Inoue, S. 1985. Video Microscopy. Plenum Press, New York, NY. p. 124. Figure 5.31 is adapted from Davis, P. 1975. Photography. Wm. C. Brown Company Publishers, Dubuque, IA.

5.16 Relative Magnification with a Photographic Lens Depends on Chip Size and Lens Focal Length
For a comprehensive list of existing Canon field-of-view crop factors, see https://www.the-digital-picture.com/CanonLenses/Field-of-View-Crop-Factor.aspx.
6 Image Capture by Scanning Systems

6.1 Scanners Build Images Point by Point, Line by Line, and Slice by Slice

Scanning differs from standard photography. Most photographic cameras are widefield. They record the entire image directly on a recording chip. Scanners acquire the final image piece by piece. Scanners use image-ordered acquisition. They acquire the elements of the image in a specific order. Laser-scanning confocal microscopy, scanning electron microscopy, and atomic force microscopy acquire images point by point along a raster line. Flatbed scanners acquire the images raster line by raster line. Computed tomography (CT; also called computer-assisted tomography [CAT]) scanners acquire images from multiple angles and use tomography to make two-dimensional (2D) slices and assemble the slices in three dimensions (3D). This chapter deals with all of these scanners, starting with the scanner that many use on a daily basis, the flatbed scanner (Figure 6.1). A flatbed scanner has many of the acquisition characteristics of all scanners and serves to illustrate important principles. It has advantages for archiving historical biological imaging. Furthermore, flatbed scanners have many applications in biological and biomedical imaging.
6.2 Consumer-Grade Flatbed Scanners Provide Calibrated Color and Relatively High Resolution Over a Wide Field of View

The most common office flatbed scanners are reflective scanners, using reflected light from the same side as the detector, or epi-illumination (see Figure 6.1A). They use a narrow strip charge-coupled device (CCD) or complementary metal oxide semiconductor (CMOS) to acquire each line. A light source beneath the glass plate, or platen, of the flatbed scanner illuminates each line, and the reflected light reflects off mirrors to a lens system that reduces the image of the raster line and focuses it onto a CCD (see Figure 6.1A). The size of the sensor pixels (Table 6.1) and the reducing power of the lens determine the resolution of this type of flatbed scanner.

The pixels in consumer-grade scanners are not usually square. The pixel dimension of some scanners is, for example, 2.7 × 5 μm, width to height. Because they are almost double the size in height, there is always lower resolving power in the vertical dimension. The imaging CCD chip is about an inch (26.5 mm) wide by six rows (about 30 μm) high. The six rows capture color, with red, green, and blue filters over each pair of rows. The reduction from an 8.5-inch-wide (216-mm-wide) platen to a 1.1-inch (26.5-mm) chip is 6.8-fold. Hence, pixel size at the platen is (6.8 × 2.7) or 21 μm. Consulting Table 6.1, this gives the 1200-ppi resolution value used to market many scanners. However, to sample a photograph or object on the platen using the Nyquist criterion, the smallest resolvable object is 42 μm, and the scanner can resolve 600 ppi, or 23.6 pixels per mm, in the horizontal direction. This is usually quite adequate for scanning photos because, as described in Section 1.3, Table 1.4, color prints at 600 ppi have excellent legibility.

To increase the effective resolution of the scanners, manufacturers offset the horizontal rows by half a pixel (or, in some cases with more horizontal rows, a quarter of a pixel), producing a higher spatial frequency if each real pixel contributes to the surrounding four pixels in the read-out. Using this subpixel read-out doubles the horizontal resolving power, according to industry standards, to 2400 ppi and increases the vertical resolving power. In practice, it decreases some aliasing in scans of 600-ppi photos, but in scans of 1200-ppi photos, the 40- to 50-µm objects blur.
Figure 6.1 Internal view from below of a reflective flatbed scanner (A), a contact flatbed scanner (B), and a transparency or transmissive flatbed scanner (C). (A) Design of the color reflective charge-coupled device (CCD) flatbed scanner. A light source illuminates the line to be scanned, and the reflected light is angled by mirrors to a lens system that focuses the image on a linear CCD array and breaks down the light into red, green, and blue (RGB) components. That array is read out prior to the light source moving to the next line for read-out. (B) Design of a contact image sensor scanner. (C) The transparent media adapter for scanning film or other transparencies is basically a scanning light in the lid of the scanner. Scanners used for 35-mm film have a smaller format but also shine light through the film to scan it. The FARE filter is used by Canon to take an infrared image of the transparency, which records the dust and not the transparency image. The image is then digitally subtracted from the transparency image. CMOS, complementary metal oxide semiconductor; LED, light-emitting diode. Diagram by L. Griffing.
Table 6.1 Resolving Power of Scanners in Pixels per Inch (ppi) and the Size of the Pixels in Micrometers.

| Resolving Power (ppi) | Pixel Size (µm) |
|---|---|
| 9600 | 2.6 |
| 4800 | 5.3 |
| 2400 | 10.6 |
| 1200 | 21.2 |
| 600 | 42.4 |
| 300 | 84.8 |
| 200 | 127 |
| 150 | 167.6 |
| 100 | 254 |
| 72 | 335.2 |
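The conversion behind Table 6.1, together with the Nyquist halving used in the text, can be sketched as follows (25,400 µm per inch; results match the table to within rounding):

```python
MICRONS_PER_INCH = 25_400

def pixel_size_um(ppi):
    """Pixel size on the platen for a stated resolving power in pixels per inch."""
    return MICRONS_PER_INCH / ppi

def nyquist_limited_ppi(stated_ppi):
    """Real, adequately sampled detail corresponds to half the stated pixels per inch."""
    return stated_ppi / 2

print(pixel_size_um(1200))          # ~21.2 um pixels at the platen
print(nyquist_limited_ppi(1200))    # 600 ppi of Nyquist-sampled detail
print(pixel_size_um(600))           # ~42.3 um, the smallest adequately sampled object
```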
Newer effective resolutions of up to 9600 ppi are available, theoretically making it possible to resolve 5-µm objects. Pixel interpolation programs that come with these scanners offer 19,200 ppi, but pixel interpolation does not improve resolution.

The application of consumer-grade (or office) reflective scanners in biological and biomedical research includes shape and color or grayscale analysis of content requiring relatively high resolving power over a large field of view. In molecular biology, the use of gold nanoparticle–labeled antibodies or ligands bound to nitrocellulose dot blots of cells or molecules is possible after silver enhancement (Figure 6.2). Likewise, the calibration and representation of color are part of the software associated with the more expensive consumer-grade reflective flatbed scanners. These have a large enough depth of field to focus on objects several millimeters above the platen and can image samples in inverted petri dishes or upright multi-well plates. An example is the colorimetric analyses of signature volatile gases emitted by bacteria in inverted petri dishes (Figure 6.3).

Reflective scanners, however, do not work like a spectrophotometer because they measure reflection, not absorbance, by the sample over a given path length, as required for the Beer's law calculation of concentration (the concentration C equals the absorbance A divided by the path length l times the molar extinction coefficient ε; see Section 9.8, Panel 9.2, Eqn 9.1). For this, light shining through the sample is necessary. Furthermore, consumer-grade scanners may save images in just the lossy JPEG format (see Section 2.14). This is not a problem for recording the position of a protein band on a gel or a spot on a chromatogram, but it is a problem for quantitative densitometry, measuring the intensity of the stain or relative optical densities across the gel (see Section 7.9). In these cases, when densitometry is desired, the scanner needs to use transmitted light and save the image in RAW, TIFF, or some other lossless format.

Some consumer-grade transmitted light scanners have a modified lid for scanning film and other transparencies using transmitted light. The scanning light is in the lid, so the CCD records the light transmitted through the media (see Figure 6.1C). This adapter is necessary for scanning gels, X-ray film, and other objects for densitometry. However, in consumer-grade scanners, the software behind the operation of the scanner may alter the gamma linearity of the CCD and change the read-out between independent scans. Consistency between independent scans is lacking because they often have an automatic gain control that the user cannot adjust. They do not have a digital negative, RAW, read-out, as do most consumer-grade digital single-lens reflex cameras. To access the RAW values reported by the CCD, vendor-independent control of the machine is necessary and is available for some scanners through the open-source Linux package SANE (Scanner Access Now Easy).

A common use for transmitted light scanners is X-ray film densitometry or the dosimetry of film sensitive to ionizing radiation (e.g., Gafchromic EBT film). Ionizing radiation dosimetry is critical to the evaluation of medical procedures such as intensity-modulated radiation therapy using high-energy photon or proton beams to treat patients with cancer tumors. Consequently, evaluations of high-end consumer-grade flatbed scanners (recommended for dosimetry by the film manufacturer) point out that there are problems with the fidelity of the scan near the edges of each raster scan, the lateral response dependence. In addition, polarizing filters in the scanner influence the accurate digitization in each color channel if the film is birefringent (see Sections 9.8 and 16.7).

Figure 6.2 Silver-stained dot blots of gold nanoparticle labels. (A) Dot spots of antigen adsorb to the nitrocellulose paper. (B) After blocking with non-specific protein, incubation with a primary antibody proceeds. (C) After rinsing off the unbound primary antibody, a secondary antibody conjugated to colloidal gold recognizes the tail of the primary antibody. (D) Silver enhancement of the colloidal gold makes the spot visible. (E) A scan of the array of dots with different concentrations of antigen shows different gray levels according to the original concentration of the antigen. A–D by L. Griffing. E is adapted from Yeh, C.-H., Hung, C.Y., Chang, T.C., et al. 2008. An immunoassay using antibody-gold nanoparticle conjugate, silver enhancement and flatbed scanner. Microfluidics and Nanofluidics 6: 85–91.
Figure 6.3 Inverting a petri dish (A) with a lawn of bacteria on media over a colorimetric sensor produces changes in the colors of the sensor array over time when scanned with a color-calibrated scanner (B). A by L. Griffing. B from Carey, J. R. 2011 / American Chemical Society.
Besides scanning in the center of the platen, using transmitted light consumer-grade scanners for densitometry requires internal standards or calibration with known concentrations of a standard. Open-source processing software, such as ImageJ, has several plugins for gel densitometry and documentation on measurement (see Section 7.9).

An alternative design for reflective scanning uses an LED (light-emitting diode) array directly linked to a CMOS chip via a self-focusing (Selfoc), gradient index (GRIN) lens (see Figure 6.1B). Endoscope microscopy also uses these lenses. This configuration is a contact image sensor (CIS) scanner. The reflected image projects directly onto the CMOS chip. If the object is not in direct contact with the platen, it is out of focus. Hence, this technology has a very limited depth of field, and the object needs to be very flat and unwrinkled. However, with no intervening optics or mirrors, it can collect light very efficiently from the sample. CIS scanners weigh less, are thinner, and use less energy (i.e., they can be powered by a USB port for mobile flatbed imaging applications). They can have both uniform lighting and high resolution. Although many have a pixel size of 42 μm, or 600 ppi at the platen, CMOS pixel sizes in the 5- to 6-µm range are available, making sampling of 10-μm, 2400-ppi images possible using the Nyquist criterion. These consumer-grade scanners are excellent for recording the position and area of proteins or nucleic acids in gels or stained molecules in thin-layer or paper chromatograms. However, their design does not accommodate wet objects easily, and extra water with bubbles or dirt will compromise the scan and potentially damage the scanner. Be aware of this caveat when adapting all consumer-grade scanners to objects other than dry papers.

During line scanning, both consumer- and scientific-grade flatbed scanners illuminate an object for a pre-determined time, the dwell time, the exposure time for the linear array of sensors. Most consumer-grade flatbed line scanners have a fixed, invariable dwell time. Scientific-grade scanners have variable rates of scanning, which change the dwell time. For samples with low intrinsic contrast, slow scans with long dwell times give a better signal-to-noise ratio (SNR) (see Section 5.11), as do longer exposures on cameras. Fast scans produce noisier images. Adjusting the number of passes that the scanner uses per scan can also change the amount of radiation used for imaging, with some high-resolution scans requiring more passes.
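For a shot-noise-limited scan, the collected signal grows linearly with dwell time while the shot noise grows as its square root, so the SNR improves roughly as the square root of the dwell time once shot noise dominates the scanner's noise floor. A hedged sketch with arbitrary example numbers (not the specification of any scanner):

```python
import math

def scan_snr(signal_rate_e, dwell_time_s, noise_floor_e=10.0):
    """SNR of one pixel collecting signal_rate_e electrons per second for one dwell time."""
    signal = signal_rate_e * dwell_time_s
    return signal / math.sqrt(signal + noise_floor_e**2)

for dwell_ms in (1, 4, 16):
    print(f"{dwell_ms} ms dwell: SNR {scan_snr(1_000_000, dwell_ms / 1000):.0f}")
# Each 4x increase in dwell time roughly doubles the SNR.
```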
6.3 Scientific-Grade Flatbed Scanners Can Detect Chemiluminescence, Fluorescence, and Phosphorescence

In consumer-grade flatbed scanners, the color and intensity of illumination are usually not adjustable. However, scientific-grade flatbed scanners use many forms and wavelengths of radiation for illumination and can read out signals "in the dark" using only the light produced by the sample, or luminescence. For example, chemiluminescence produces light from a chemical reaction in the sample. An enhanced chemiluminescence (ECL) reporter (Figure 6.4) is a peroxidase-conjugated secondary antibody that recognizes a primary antibody specific for a protein blotted onto nitrocellulose paper in Western blotting. The peroxidase oxidizes luminol, which then gives off light.

Figure 6.4 Peroxidase-conjugated secondary antibodies produce enhanced chemiluminescence (ECL) when incubated in luminol and hydrogen peroxide. As in Section 6.1, Figure 6.2, the first step of a Western blot is binding the antigens in a lysate to nitrocellulose paper. A primary antibody raised against the antigen recognizes the antigen after blocking non-specific labeling with a protein solution. A secondary antibody conjugated to horseradish peroxidase (HRP) generates a luminescent signal when incubated in the ECL reagent. Figure adapted from GE Healthcare. 2018. Western Blotting: Principles and Methods. GE Healthcare Bio-Sciences AB, Uppsala, Sweden.

One way to record ECL is with X-ray film, then scanning the developed film for densitometry. Alternatively, chemiluminescence scanners directly detect the light by reflective or contact scanning the luminescent sample with a CCD strip. The dwell time in this case is the travel time of the CCD strip. Using multiple CCD strips decreases the total time to get an exposure. There are several advantages of recording on a CCD or CMOS strip over X-ray film. They include a much larger dynamic range (up to 3–4 orders of magnitude larger), linearity over the dynamic range up to saturation, a digital read-out that does not rely on secondary densitometry, and elimination of trial-and-error exposure times on expensive and perishable film.

Besides chemiluminescence, two other important forms of luminescence used in scientific-grade scanners are fluorescence and phosphorescence. As shown by the energy diagram in Figure 6.5, the absorption of light or interaction with a charged particle (a femtosecond process) can excite electrons to a higher energy state (see Section 17.2, Figure 17.3, for a more expanded version of the Jablonski energy diagram). As the electrons return to their lower energy level, during de-excitation, they lose some of their energy through vibration and thermal emission but lose most of their energy through fluorescence emission of a longer wavelength photon of lower energy. This process takes nanoseconds. If the electron enters an excited triplet state, de-excitation produces phosphorescent emission of even longer wavelength. Phosphorescence takes milliseconds to minutes, much longer than fluorescence.

Figure 6.5 Jablonski energy diagram showing how absorption of light (green) by a molecule excites electrons to different energy levels. They then de-excite (red) through thermal, fluorescence, and phosphorescence events. Diagram by L. Griffing.

Figure 6.6 Difference gel electrophoresis fluorescence analysis of proteins from tissues expressing different proteins at different times or under different conditions. From Becket P 2012 / Springer Nature.

Fluorescence scanners detect the fluorescence from objects scanned with lasers or LEDs of different wavelengths, which excite fluorescent dyes that label proteins in the object or gel. A filter specific for the emission wavelength of each dye covers the light sensor. An example of the use of this technology is to fluorescently label proteins extracted from tissues at different times with different fluorescent dyes. Mixing them together and separating them on a single gel using 2D gel electrophoresis will show which proteins have changed during the time interval (Figure 6.6). In combination with subcellular fractionation, these scanners can identify the organelles that accumulate or modify fluorescent tracers or fluorescent
reporter proteins, such as green fluorescent protein (see Section 17.2). Some of the scanners use a confocal light sensor, one that only accepts light from one focal plane in the sample (see Section 6.9). This limits the noise from the surrounding field but also produces a very shallow depth of field.

Phosphor-imaging scanners use special phosphorescent imaging plates that are sensitive to high-energy photons (gamma, γ, radiation), alpha (α) particles, and beta (β) particles. Alpha particles are 4He nuclei (two protons and two neutrons). Beta particles are electrons (negatively charged) or positrons (positively charged electrons). They are the products of radioactive decay of radioisotopes (Panel 6.1). X-ray film was the medium of choice in the past for detecting radioactive isotopes in gels and fixed sections of animal or plant material, a process called autoradiography. The advantages of using phosphor-imaging plates over X-ray film–based autoradiography are similar to the advantages of CCD and CMOS scanners for chemiluminescence. The phosphor-imaging plates are reusable, produce a digital read-out, require shorter exposure time than film, and have a larger linear dynamic range.

Panel 6.1 Radioactive decay

When there is an imbalance between the attractive nuclear forces and repelling electromagnetic forces in an atom, there is a fixed probability that the nucleus will lose energy by emitting high-energy radiant particles. The particles produced by such radioactive decay are photons (gamma, γ, radiation), alpha (α) particles, and beta (β) particles. Alpha particles are 4He nuclei (two protons and two neutrons). Beta particles are like electrons but can have either a positive or negative charge. Radioactive atoms, such as uranium and thorium, that have high atomic numbers and large nuclei emit alpha particles. Common radioisotopes used as tracers for molecular biology emit beta particles. Each of these radioisotopes has a particular half-life (fixed probability of exponential decay) and emission at particular energies.

Isotope         Particle    Half-Life
3H (tritium)    β           12.3 years
14C             β           5700 years
32P             β           14.3 days
35S             β           87.37 days
45Ca            β           162.6 days
59Fe            β, γ        45 days
131I            β, γ        8 days
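Because each isotope decays exponentially with its characteristic half-life, the fraction of label left after a given exposure or storage time is a one-line calculation. A minimal Python sketch using the half-lives in Panel 6.1 (the 30-day example is purely illustrative):

    # Exponential decay: N/N0 = (1/2)**(t / t_half).
    half_life_days = {
        "3H": 12.3 * 365.25,     # tritium
        "14C": 5700 * 365.25,
        "32P": 14.3,
        "35S": 87.37,
        "45Ca": 162.6,
        "59Fe": 45.0,
        "131I": 8.0,
    }

    def fraction_remaining(t_days, t_half_days):
        """Fraction of the original activity left after t_days."""
        return 0.5 ** (t_days / t_half_days)

    # Example: about 23% of a 32P label remains after a 30-day experiment.
    print(fraction_remaining(30.0, half_life_days["32P"]))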
Figure 6.7 X-radiography or autoradiography with photostimulable phosphor (PSP) plates. After acquiring the latent image on a PSP plate (step 1), a scanning system reads out the photostimulable luminescence and records the luminesced light (step 2A or 2B). The data are recorded, and the PSP plate can acquire another exposure after flashing it with a bright light (step 3). LED, light-emitting diode. Diagram by L. Griffing.
Phosphor-imaging plates can record both autoradiographs and standard planar X-radiographs (Figure 6.7). The phosphor-imaging process is the same for both. Both use phosphor imaging screens, or photostimulable phosphor (PSP) plates, composed of PSP crystals of the europium-activated barium halide, BaFBr:Eu+2 (barium fluoride bromide: europium). Exposing the screen to ionizing radiation, such as α, β, γ or X-radiation, or wavelengths of light shorter than 380 nm, excites the electrons from Eu+2, trapping them in an “F-center” of the BaFBr − complex; this results in the oxidation of Eu+2 to Eu+3, which forms the latent image on the screen. The latent image is the pattern of stored electrons in the storage phosphor in the phosphor-imaging plate. It can last for several hours until read-out by a laser scanner. Reading out this trapped energy in the storage phosphor by the phosphor-imaging scanner (Figure 6.7) involves stimulating the phosphors with additional light energy from a red-light laser or LED to produce the far blue light of
photostimulated luminescence. During luminescence, Eu+3 reverts back to Eu+2, releasing a photon at 390 nm. A silicon chip or a photomultiplier tube collects the light (Figure 6.7, steps 2A and 2B) from each serial line and constructs a final digital image. Flashes of intense light erase the PSP screen and recycle it for use (Figure 6.7, step 3).
6.4 Scientific-Grade Scanning Systems Often Use Photomultiplier Tubes and Avalanche Photodiodes as the Camera

During point scanning, when a single spot or point of light or electrons scans across an object, a photomultiplier tube (PMT) or an avalanche photodiode detects the reflected, fluoresced, or luminesced light during the dwell time at each point. PMTs and avalanche photodiodes produce an amplified number of electrons when photons strike them. A voltage signal from the amplified electrical current lights up a monitor screen at a point on the monitor that is positionally identical to the point on the sample. As in digital cameras (see Section 5.4, Figure 5.6), electronic imaging converts a spatial light signal into a timed electric signal that then forms an image as light on a display or stored in digital format. Generation of a timed electrical signal is the function of PMTs and avalanche photodiodes in phosphor imaging (see Figure 6.7, step 2B), in fluorescence scanners, and in scanning microscopes. They sample light very quickly, with potential clocking rates that exceed the dwell time of light on each point (an instance of temporal Nyquist sampling; for an instance of spatial Nyquist sampling, see Section 1.6).

PMTs gather light through a side window on a vertical vacuum tube or an end-on window in a horizontal vacuum tube (Figure 6.8). The transmitted photons hit a photocathode coated with metals that produce electrons in response to light. This photoelectric effect is the same effect that produces electrons in response to light in silicon (see Section 5.1), but photocathode chemistry has a shorter lifetime and requires storage in the dark. The electrons produced, photoelectrons, travel down a vacuum tube (see Figure 6.8) accelerated by the voltage difference between dynodes. There are several kinds of coatings for photocathodes (Figure 6.9). Popular coatings such as cesium-activated gallium arsenide [GaAs(Cs)] or phosphorus-activated gallium arsenide [GaAs(P)] have different quantum efficiencies (see Section 5.2, Panel 5.1, for the quantum efficiency calculation) at different wavelengths of light. Their quantum efficiencies are generally lower than those of CCDs (see Figure 6.9).

Secondary electron emission at the dynodes produces the amplification of the initial photoelectric signal in the PMT. A primary electron beam hits the dynode metal, emitting more secondary electrons than in the original beam. Increasing the voltage difference between the dynodes exponentially increases the number of secondary electrons given off per electron hit. The gain is the ratio of the current at the output anode to the current produced by photons at the photocathode. More dynodes at a higher voltage difference produce more amplification and more gain (proportional to V^(αN), in which V is the voltage difference between the dynodes, N is the number of dynodes, and α is a coefficient determined by the geometry and material of the dynodes). For example, a 10-dynode PMT can have a gain of 1000 at 1000 volts but 1.1 million at 2000 volts, whereas a 12-dynode PMT can have a gain of several thousand at 1000 volts and more than 10 million at 2000 volts. With more gain, there is a higher noise floor. Consequently, the PMT has a voltage offset to raise the read-out above the noise floor, as do cameras.
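The V^(αN) relationship explains why a modest change in dynode voltage changes the gain by orders of magnitude. A minimal sketch of the scaling (the value of α is an assumed, illustrative coefficient, not a constant for any particular tube):

    # PMT gain scales roughly as V**(alpha * N), so the ratio of gains at
    # two voltages needs no instrument-specific scale factor.
    def gain_ratio(v_new, v_old, n_dynodes, alpha=1.0):
        """Factor by which gain changes when the dynode voltage changes."""
        return (v_new / v_old) ** (alpha * n_dynodes)

    # Doubling the voltage on a 10-dynode tube (alpha ~ 1) raises the gain
    # about 2**10 = 1024-fold, consistent with the 1000 -> ~10**6 example.
    print(gain_ratio(2000, 1000, n_dynodes=10, alpha=1.0))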
Figure 6.8 Standard end-on or box photomultiplier tube used in imaging technologies such as confocal laser scanning microscopy. It is a vacuum tube (10–4 Pascals). A photon goes through the transparent faceplate and hits a photocathode, which in turn produces an electron, a photoelectron. The photoelectron hits the first in a series of dynodes set with a high voltage difference, driving the electrons from one dynode to the next, making new secondary electrons at each dynode. The number of dynodes and the voltage difference determines the degree of amplification. The last dynode in the chain is the anode. Diagram by L. Griffing.
Figure 6.9 Quantum efficiencies of the standard coatings of the photocathodes for photomultiplier tubes (PMTs). Note that the quantum efficiency rarely gets into the 30% range for these coatings. These are lower quantum efficiencies than charge-coupled devices, with 80% for silicon photodiodes in blue light. Adapted from Inoue, S. and Spring, K. 1995. Video Microscopy. Second Edition. Plenum Press, New York, NY.
Figure 6.10 Avalanche photodiode. Under reverse bias, the neutral depletion region absorbs light and produces photoelectrons. The photoelectrons initially flow to the p-doped silicon and then to an n + -doped silicon, amplifying with each step. Electron holes move through the depletion layer to the p + -type silicon. Diagram by L. Griffing.
Avalanche photodiodes (Figure 6.10) are p-n semiconductor silicon chips that contain a neutral silicon component that produces more electrons with increasing reverse bias voltage (see Section 5.2, Figure 5.1). Again, the gain is the ratio of the anode current to the cathode current. Compared with PMTs, their lifetimes are longer; they have more (thermal) noise at low signals; and the amount of potential amplification is lower, only about a thousandfold. Both PMTs and avalanche photodiodes have good gamma linearity.
6.5 X-ray Planar Radiography Uses Both Scanning and Camera Technologies Standard planar X-radiography starts with the production of beams of X-rays coming from X-ray tubes (Figure 6.11). The tubes produce divergent rays of X-rays emitted from a spinning disk of tungsten, which is an anode target under bombardment from electrons from one or two cathodes. Digital mammography requires a more focused beam, so the anode target is molybdenum rather than tungsten, and the cathode is flat rather than helical. A set of linear lead strips, a collimator, creates parallel (collimated) X-ray beams coming from the X-ray tube. An antiscatter device behind the patient or tissue is another set of lead strips that reduce the scattered X-rays that would produce noise on the imaging device. Medical (Figure 6.12) and veterinary X-radiography use either phosphor imaging or imaging with a silicon-based camera. Computed radiography collects the X-ray with a PSP plate. The term computed radiography comes from the several computational steps performed between the collection of the image and its final presentation (see Figure 6.12, step 3). Direct radiography is similar to line-scan computed radiography but does not use PSP plates. Hence, it does not use the scanning system for read-out employed by phosphor imaging computed radiography. Instead, it uses a CCD; a CMOS chip;
Figure 6.11 A standard X-ray tube. Electrons stream from one or two cathodes toward the rotating tungsten target. When the electrons hit the target, they produce bremsstrahlung, or deceleration, radiation, with a broad spectrum of energies corresponding to the varying amounts of energy lost as the electrons interact with the tungsten atoms. When the electron beam ejects electrons from the tungsten atom, the outer orbital electrons move to lower orbitals, producing X-rays of very narrow energy bands, characteristic radiation. Diagram by L. Griffing.
Figure 6.12 Planar X-radiography with photostimulable phosphor (PSP) plates. After acquiring the latent image on a PSP plate (step 1), a scanning system reads it out using photostimulable luminescence and records the luminesced light (step 2). A computed radiograph results from scaling and enhancing the digital data (step 3). Diagram by L. Griffing. The machine is adapted from Wikimedia Commons: US Navy 030623-N-1126D-002 Hospital Corpsman 3rd Class Tyrone Mones from Chula Vista, Calif. performs a C-Spine table top X-RAY on a patient at medical aboard Naval Air Station, Jacksonville, Fla.jpg. X-ray of human lung is from Lung X-Ray. Sudraben. 2018. Creative Commons Attribution. Share-Alike 4.0.
or for high-resolution mammography, a parallel imaging array of amorphous silicon or selenium thin film transistors (a-Si TFT or a-Se TFT). Some use a TFT array underneath a crystalline cesium iodide–thallium (CsI:Tl) scintillator layer. In the TFT and CMOS arrays, each pixel, called a detector element or del by radiologists, is directly addressable. In CCD systems, read-out occurs with charge transfer through the chip (see Section 5.4).

Rapiscan systems at airports use direct radiography to detect backscattered X-rays. Backscattered X-rays come back from the tissue in the direction of the original X-ray source. The X-ray source and the direct radiography sensor reside side by side and get a full-body scan by rotating around the body. The U.S. Transportation Security Administration states that the
X-rays used in the backscatter machines are soft X-rays, defined as radiation between 0.12 and 1 keV (kilo electron volts). Hard X-rays are higher energy (e.g., >1 keV). Soft tissue or low-density matter absorbs soft X-rays. The softer the X-ray, the more it is absorbed by soft tissues such as skin and the higher the biologically relevant dose. Hence, soft X-ray radiation can be more damaging to soft tissues than a higher-energy medical chest X-ray examination.
6.6 Medical Computed Tomography Scans Rotate the X-ray Source and Sensor in a Helical Fashion Around the Body

Tomography is imaging using slices or sections. CT scanning machines acquire X-ray slices. CT scanning is a popular diagnostic tool; the United States currently performs more than 60 million human CT scans per year. Hounsfield and Cormack invented the technique in 1972, and these inventors received the Nobel Prize in Medicine in 1979. The current medical CT scanners are third- and fourth-generation scanners. A patient or object resides inside a ring containing the X-ray tube, with an opposing array of rows of 512–768 sensors made of a gadolinium scintillator, which converts the X-ray into lower energy light, layered over a silicon photodiode. Each scintillator-photodiode sensor is a pixel. The sensor array varies from about 64 rows wide to modern multi-detector CT scanners with 256–320 rows. One revolution of the ring containing the X-ray tube and sensor array produces multiple one-dimensional (1D) scans from each row in the sensor array. Filtered back-projection converts these 1D scans into a 2D image slice.

To understand back-projection, consider a 2D test object, or phantom. CT scanning routinely uses 3D phantoms to calibrate and analyze instrument performance. This example generates a 2D image from a series of 1D scans of a 2D phantom. X-irradiating the phantom from one angle produces a 1D line of the transmitted X-ray intensity (Figure 6.13A). Scanning from multiple angles results in multiple 1D intensity lines. Plotting the intensity lines along an axis of the angle of acquisition
Figure 6.13 Image acquisition and reconstruction using backprojection in computed tomography scanning. (A) One-dimensional (1D) intensity read-out from 20 and 90 degrees. The green plot is a 20-degree intensity plot versus distance. The orange plot is a 90-degree intensity plot versus distance. (B) Sinogram showing positions of stacked intensity plots from each angle of acquisition. (C) Backprojection of 1D plots into two-dimensional space. (D) Summing the plots of all of the angles as in C, an approximation of the object forms. Diagram by L. Griffing.
results in a sinogram (Figures 6.13B and 6.14). The mathematical treatment that produces a sinogram is a Radon transform. Sinograms are useful in radiology; they show discontinuities or breaks when the patient moves. This example shows 1D recording of the 2D phantom at 20 degrees (green rays) and at 90 degrees (red rays) (Figure 6.13A). To construct an image from back-projection, the intensity plot from one angle fills the back-projected 2D image plane (Figure 6.13C), and filling the plane with the intensity profile from each angle adds more back-projections until an image of the phantom emerges (Figure 6.13D). However, blurring (low spatial frequency) (see Figure 6.14) and streaking (high spatial frequency) occur. Back-projection software usually operates in the Fourier (frequency, k-space) domain and minimizes the blur and streaking using a variety of Fourier filters, the most common of which is the ramp filter, which sharpens the image, minimizing the blur (see Figure 6.14). Section 10.7 describes the tomographic operations in the Fourier domain.

Figure 6.14 Construction of a sinogram over 180-degree rotation of X-ray acquisition. A ramp filter in frequency space produces the filtered back-projection from the original sinogram and improves the contrast of objects within the tissue. Images by Adam Kessner. Available at: https://sites.google.com/a/fulbrightmail.org/kesnersmedicalphysics/home/Education/3d-image-reconstruction-explained-with-animated-gifs. Adapted by L. Griffing.

In a medical CT scanner, the patient is on a table moving relative to the rotating ring of the scanner. The X-ray tube/detector ring rotates about once per second. The ratio of how far the table moves during a single rotation of the scanning ring to the collimated slice thickness is the pitch of the helical scan. The pitch varies between 1 and 2 in clinical CT scanners. In a spiral scanner, an interpolation technique between the back and front part of the spiral produces the image of a single slice. Clinical CT scanners have a spatial resolution of about 0.35 mm (~35 times the size of a typical cell).

Medical CT scanning is a form of hard X-ray imaging (~70 keV). The signal comes from the attenuation of the X-ray by the tissue, bone being quite absorptive and fat much less absorptive. Comparing the attenuation of the X-rays by various tissues with that of water produces the CT number expressed in Hounsfield units (HU). Different tissues have different HU values (Table 6.2). Internal structures such as the gastrointestinal tract have intrinsically low contrast. Ingested barium sulfate increases the contrast of these tissues. Relatively inert iodinated contrast agents, such as iodixanol, injected into the bloodstream provide contrast of the circulatory system with X-rays. For non-medical applications, other staining agents for ex vivo (non-living, fixed) tissue are available (see Section 6.7).

A relatively recent form of X-ray scanning is X-ray tomosynthesis. This is a hybrid technology between planar radiography and CT scanning, with limited X-ray scanning and no movement of the imaging camera. The X-ray source scans a region of the patient while a direct radiography camera takes pictures, providing views from several angles. Diagnosis of small pulmonary or breast lesions uses chest tomosynthesis. Oral tomosynthesis provides a partial 3D volume of the mouth to fit dental implants.

Table 6.2 Different Tissue Computed Tomography Numbers or Hounsfield Units at 70 keV.

Tissue                  Computed Tomography Numbers (HU)
Air                     −1000
Lipid or fat            −50 to −100
Water                   0
Muscle                  10–40
Brain (white matter)    20–30
Brain (gray matter)     35–45
Bone                    1000–3000
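The sinogram-and-back-projection pipeline of Figures 6.13 and 6.14 is easy to reproduce computationally. A minimal sketch with scikit-image, using its built-in Shepp–Logan phantom as the 2D test object (the phantom, the angular sampling, and the filter choice are illustrative assumptions, not the book's own example):

    # Sinogram formation (Radon transform) and filtered back-projection.
    import numpy as np
    from skimage.data import shepp_logan_phantom
    from skimage.transform import radon, iradon

    phantom = shepp_logan_phantom()                    # 2D test object
    angles = np.linspace(0.0, 180.0, 180, endpoint=False)

    # One 1D intensity profile per acquisition angle, stacked into a sinogram.
    sinogram = radon(phantom, theta=angles)

    # Unfiltered back-projection: recognizable but blurred.
    blurred = iradon(sinogram, theta=angles, filter_name=None)

    # Ramp-filtered back-projection sharpens the reconstructed slice
    # (filter_name is the keyword in recent scikit-image releases).
    fbp = iradon(sinogram, theta=angles, filter_name="ramp")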
6.7 Micro-CT and Nano-CT Scanners Use Both Hard and Soft X-Rays and Can Resolve Cellular Features

Micro-CT and nano-CT scanners, sometimes known as industrial CT scanners to distinguish them from medical CT scanners, have resolutions in the micrometer and nanometer range, respectively. Using greater than 1–5 keV X-rays, they can non-destructively image inert, non-living materials, such as mineral composites, making them useful for geological studies. Using softer, lower energy (

7.3 Recorded Paths, Regions of Interest, or Masks Save Selections for Measurement in Separate Images, Channels, and Overlays

Fit Spline. The editable spline curve has multiple vertices or anchor points, shown as movable white squares. Hovering over a selection vertex produces an adjustment finger for adjusting the vertex position (except on multiple selections made by holding down the shift key while making the selections). Image by L. Griffing.

In ImageJ, an ROI manager saves path selections (Figure 7.7A). The ROI manager assigns a number to the editable rounded rectangle selection in Figure 7.7B, which the user can replace with another number or name, such as "band 1." Saving the areas of selections directly creates masks. Masks put into a separate image channel (alpha channel) are overlays (Figure 7.7C), whereas those put into separate images (Figure 7.7D) save as binary images. Binary image masks show the selection as the highest value (255) of an 8-bit image and the unselected region as the lowest value (0). While creating masks in ImageJ, the look-up table (see Section 2.7) sometimes inverts so that the selections appear black on a white background. The binary masks are editable. Morphological image operations described below employ such binary images or masks. The ROI manager can reconvert masks and overlays to selections.

In Photoshop, the pen tool makes vector-graphic editable selections (see Figure 7.5). Adobe Illustrator, a vector graphics illustration application, has identical tools. Points, lines, and shapes drawn by the pen tool (see Table 7.2) create savable paths. These paths are combinations of Bézier curves. They are vector graphic paths that are interconvertible with raster graphic selections (Table 7.2; Figure 7.7 and Figure 7.8). The direct selection tool (white arrow) adjusts the vertex or anchor points in the Bézier curves made by the pen. The direct selection tool also adjusts the Bézier handles that are tangent to the curve, changing the curve of the path on either side of the anchor point (Figure 7.8B).

Photoshop has pixel masks (Figure 7.9) generated from the raster-based selection tools and vector masks from vector-based path or pen tools. Vector masks have the advantage that they are resolution independent. A quick and easy introduction to the use of editable masks is the Quick Mask option in Photoshop, in which paint and erase tools (Figure 7.10) edit a temporary pixel mask created as a semi-transparent ruby overlay. On exiting the Quick Mask, the region(s) without the
Figure 7.7 In ImageJ, the region of interest (ROI) manager saves paths and image files save masks. (A) The ROI manager appears after selecting with the command Edit > Selection > Add to Manager. (B) The rounded rectangle selection (red) is editable; note the handles. (C) Produce overlay masks in ImageJ by using the command Image > Overlay > Add Selection. The overlay can be hidden or shown and saves when saving the image as a TIFF file. Creating a selection from an overlay uses the command Image > Overlay > To ROI Manager. (D) Create binary file masks as a separate image file in ImageJ using the command Edit > Selection > Create Mask. Create a selection from a mask using the command Edit > Selection > Create Selection. ImageJ also has an analyze gel capability that automates some of this with the Analyze > Gel commands. Adapted from Goubet, F et al. 2005.
Figure 7.8 Conversion of selections to paths and paths to selections in Photoshop. (A) Right clicking on a lasso-tool selection provides an option to make a work path. (B) The path is editable with the direct selection tool. The Bézier handles (Ctrl-Alt-Left button click) adjust the curves around each vertex, or anchor point. (C) Reconvert the path to a selection either by right-clicking on the path as in A or by the “load path as selection” option at the bottom of the path window (shown).
Figure 7.9 In Photoshop, a pixel mask comes from a lariat tool selection using the mask window and clicking on the pixel mask option. If using a path for the selection, use the vector mask. This converts the rest of the image outside the mask to a ruby overlay. When opening the layers window, both the background image and the mask lock together in one layer. In the channels window, an extra channel appears for the saved selection (Save > Selection > New Channel). Saving selections in a separate document uses the command Select > Save Selection > Document. Image by L. Griffing. Figure 7.10 In Photoshop, a semi-transparent (medium opacity) ruby overlay indicates the unselected regions of the image when using (highlighting) the Quick Mask option at the bottom of the toolbar. Add more selections by erasing the ruby overlay with the eraser tool. Deselect areas by filling in the ruby overlay with the paintbrush tool. Image by L. Griffing.
ruby overlay in Quick Mask mode become raster-based selections. Even though Photoshop has limited measurement capability, ImageJ routines can measure these selections by saving or copying them to a new image file. An important step in archiving scientific images is to save analyzed or processed selections. Photoshop automatically saves paths as part of the image information in certain file formats, just as it saves layers. Avoid image formats for measurement that do not preserve layers and paths (e.g., JPEG). ImageJ does not recognize paths and layers saved in TIFF formats in Photoshop, so saving them as separate images is the option when switching from some processing programs to some measurement programs. ImageJ saves masks as separate images, and archiving them as a form of metadata is good practice. Once saved, there are many options for measuring ROI or mask selections. Contrast enhancement on parts of images for final presentation in scientific journals constitutes scientific fraud (see Section 3.9). However, non-scientific photographers use masks in Photoshop to adjust contrast within selected regions using the image adjustments menu.
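Once a mask has been exported as a separate binary image, measurement can be scripted as well as done interactively. A minimal sketch of measuring such a mask with scikit-image (the file names are hypothetical stand-ins for an image and its exported ImageJ mask):

    # Label each separate region of a binary mask and measure it against
    # the original grayscale image.
    from skimage import io, measure

    image = io.imread("gel.tif", as_gray=True)       # hypothetical grayscale gel
    mask = io.imread("band_mask.tif") > 0            # exported 255/0 binary mask

    labels = measure.label(mask)                     # number each separate band
    for region in measure.regionprops(labels, intensity_image=image):
        print(region.label, region.area, region.mean_intensity)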
7.4 Stereology and Photoquadrat Sampling Measure Unsegmented Images

The time required to accurately identify and outline each object for direct digital measurement may necessitate less time-intensive approximation techniques, such as ecological photoquadrats and stereology grids. These stereological or quadrat methods are done on whole images rather than on segmented parts of images and are therefore referred to as global image measurements. Ecological photoquadrats are sampling strategies used to measure the abundance and surface cover of different species. Stereology estimates the geometrical properties of an object from a set of sub-samples, usually a series of physical sections through the object. Both are point-counting sampling strategies that use uniform, random, systematic samples generated by randomly selecting a starting point for counting. Counting continues through the sample at regular (uniform, systematic) intervals by means of a grid placed at random over the image (Figure 7.11). Point-counting approximations use the vertices of the grid as points for counting.

Figure 7.11 For counting photoquadrats in Photoshop, place a grid as a separate layer over a photographic field; in this case, a 10-cm grid layer overlays a photo of benthic organisms. Counting the number of vertices of each square grid that the organism occupies estimates the percent area of coverage of each organism. Image by S. Bernhardt. Used with permission.

The percent coverage in photoquadrats, or the percentage of the total area of interest covered by an object, is the area fraction, A_A. A direct measure of the percent coverage selects the objects of interest by segmentation and measures their combined area, then divides by the total area of the ROI. However, approximating this value is possible by counting the number of points that fall on the objects of interest as a percent of the number of points that fall on the ROI. This is the point fraction, P_P. In studies comparing manual segmentation with point counts for percent coverage in which there is a single layer of organisms on a substrate, point counts frequently overestimate the percent coverage (Figure 7.12). When there are several layers of organisms, as in terrestrial assemblages, point counts underestimate the coverage because features higher above ground obscure the lower features. Hence, although P_P = A_A is a frequent statement in this literature, this relationship is an approximation. The approximation
Figure 7.12 Comparison of point counting and manual selection or segmentation to analyze percent cover, or area fraction, of three different intertidal species (represented by different symbols). (A) Segmentation versus 25-point estimate of quadrats. (B) Segmentation versus 100-point estimate of quadrats. The line shows the expected relationship if the techniques are equivalent. The point-count estimates are higher than the value from manual segmentation. From Whorff, J. S et al / With permission of Elsevier.
becomes statistically meaningful when achieving the calculated minimal sample size. Equation 7.1 calculates the sample size required to get a meaningful approximation with point counting objects using uniform, random, systematic samples.

n = (z^2 × p × q) / ME^2    (7.1)
For a confidence level of 95% (α = 0.05), n is the number of samples needed, z is the critical standard score (1.96 for this confidence level), p is the proportion of total counts on the object of interest, q is the proportion of total counts off the object of interest, and ME is the margin of error (acceptable deviation from the mean; 0.05 would be ±5%). This sample size is the total number of point counts; to get the total number of images needed, divide the point counts by the average number of counts per image.

Older stereological methods assume regular shapes or distributions in the volume, giving rise to large errors with complex biological samples. They are assumption-based methods. The stereological methods described here are design-based methods, to distinguish them from the older, mostly out-of-use, methodologies.

Just as P_P can approximate A_A, A_A can approximate the volume fraction, V_V. Hence, in stereology, point counts also estimate volumes. A way to calculate volume from point measurements is the Cavalieri estimator, which works by superimposing a spatially calibrated grid on each section of known thickness of an entire volume (Figure 7.13). Counting the number of vertices on the grid containing the volume of interest estimates the area of the feature in each slice. Estimate the volume by multiplying the total number of points counted in all the slices by the volume associated with each grid point (area of the grid square × the thickness of the slice). FIJI's volume calculator plugin was originally an implementation of the Cavalieri estimator.

The disector (the name comes from counting two adjacent sectors, not to be confused with the word dissector) is a technique in stereology that estimates the number of objects in a volume. It starts by assigning a single point to each object of interest in the volume. This point comes from sectioning through the volume and determining in which section the object becomes visible for the first time. It counts the object if it is in one section but not in the adjacent section. Such a physical sectioning method counts an object only once in the Z dimension. Less obviously, it can count objects in the X and Y dimensions as well. Counting an object in the X dimension records its first occurrence to the right of the border of a grid square, as shown in Figure 7.14. Counting in the Y dimension records its first occurrence above the bottom of the square. The bottom and right-side boundaries of the box are "forbidden lines" that exclude counting any object touching them. Counting includes everything contained in the square and touching the other two lines. However, there is a problem because two diagonal grid squares could count the same object – one in the X dimension and one in the Y
Figure 7.13 The Cavalieri method of volume calculation. Using the distance between each section, h (e.g., 3 µm), and the spacing on the grid, d (e.g., 1 µm), each voxel has a volume of 3 × 1 × 1, or 3 cubic µm. The total number of intersections, or hits, made by the section on the grid is 30 + 18 + 17 + 24 + 26 = 115. The total volume is then 115 × 3, or 345 cubic µm. Adapted from Russ J. C. 2007.
dimension. Extending the forbidden lines in the X dimension as shown (see Figure 7.14) overcomes this problem. Hence, counts of objects in a volume proceed by selecting the correct sub-region of each section to count and applying Equation 7.2.
Figure 7.14 The disector grid. The brown lines are the forbidden lines of the grid, which exclude from counting any object touching the line. Count the objects intersecting the green lines. This produces the same result for the X and Y directions as first appearance does in the Z direction. Adapted from Russ J. C. 2007.
N_obj = (∑Q / ∑V_dis) × V_ref    (7.2)
N_obj is an estimate of the total number of objects, ∑Q is the sum of all objects counted using the disector probes, ∑V_dis is the sum of the volumes of all the disector probes, and V_ref is the volume of the structure estimated with the Cavalieri estimator. Note that ∑Q/∑V_dis gives the number per volume, or N_V, so it is sometimes called an N_V × V_ref method. Alternatively, the fractionator estimates the total number of objects in a volume by multiplying the counts in each dimension by the inverse of the sampling fraction (how much of the total population was sampled) (Equation 7.3).

N_obj = (1/ssf) × (1/asf) × (1/tsf) × ∑Q    (7.3)
N_obj is an estimate of the total number of objects in the structure, ssf is the section sampling fraction, asf is the area sampling fraction, tsf is the thickness sampling fraction, and ∑Q is the sum of all objects counted using the disector probes.
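Equations 7.1–7.3 are all short calculations once the counts are in hand. A minimal Python sketch, using the grid numbers from Figure 7.13 for the Cavalieri estimate; the proportions and sampling fractions are assumed example values:

    # Point-count sample size (Eq. 7.1), Cavalieri volume, and the
    # fractionator estimate (Eq. 7.3).
    def point_count_sample_size(p, z=1.96, me=0.05):
        """n = z**2 * p * q / ME**2, with q = 1 - p."""
        return (z ** 2 * p * (1.0 - p)) / me ** 2

    def cavalieri_volume(hits_per_slice, grid_spacing, slice_thickness):
        """Total hits x (grid square area x slice thickness)."""
        return sum(hits_per_slice) * grid_spacing ** 2 * slice_thickness

    def fractionator_count(total_counted, ssf, asf, tsf):
        """N_obj = (1/ssf) * (1/asf) * (1/tsf) * sum(Q)."""
        return total_counted / (ssf * asf * tsf)

    print(point_count_sample_size(p=0.3))                       # ~323 points
    print(cavalieri_volume([30, 18, 17, 24, 26], 1.0, 3.0))     # 345 cubic um
    print(fractionator_count(42, ssf=0.1, asf=0.25, tsf=0.5))   # 3360 objects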
7.5 Automatic Segmentation of Images Selects Image Features for Measurement Based on Common Feature Properties Besides global image measurements, another way to overcome the burden and uncertainty of manual selection is to have the computer identify features with pixels having similar properties. Different people manually segment images differently. This is a problem for medical and industrial professions. If an algorithmic approach to segmentation can be found, even if it might not be as good as segmentation produced by some people, its consistency makes it highly desirable. Once identified, machine learning algorithms or artificial intelligence can optimize automated segmentation while maintaining reproducibility. Some of the shared properties of segmented regions are gray level, color value, and texture. In fact, any property that can describe or classify different clusters of pixels can segment an image. Automatic segmentation operations can identify spatial patterns of pixels within an image, such as textures and boundaries (e.g., for Watershed segmentation, see Section 7.7; for convolution processing, see Sections 10.4 and 10.8 and see Section 11.7 for how convolutional neural networks can identify objects). Iterative segmentation using multiple feature properties can further parse the image. Gray-level and color value segmentation produces a list of identified objects, and sorting the list based on computationally derived measurements, such as size and shape, can provide more properties of those objects. Hence, measurement itself can be a way of segmenting the image.
7.6 Segmenting by Pixel Intensity Is Thresholding

A convenient way to segment an image based on gray level is to turn all pixels "on" when above a certain threshold gray value and "off" when below. This thresholding operation creates the selection. Thresholding can be done either manually or by algorithms that calculate features of the histogram. In Figure 7.15, manual segmentation thresholds the grayscale at the first peak of the histogram (Figure 7.15B). A variety of automatic procedures choose either the second or third peak as their threshold limit (see Figure 7.15C–P). In this figure, an inverted look-up table shows the selected "on" pixels as black on a white, unselected, "off" pixel background. A black bar in the inset underlies the selected portion of the histogram. These thresholding programs are included in the Auto Threshold plugin in FIJI.

Thresholding can also select a range of gray values starting higher than 0. Figure 7.16A is one panel from the sequence in Figure 7.1. Both the Golgi and endoplasmic reticulum (ER) are red, selected because both lie in the more intense regions
Figure 7.15 Manual and automated thresholding of a picture of mating ladybugs. (A) The original grayscale image and its histogram, with numbered peaks. (B) Manual segmentation, making everything in the peak 1 of the histogram black. This does a fairly good job of isolating the spots of the ladybugs from the rest of the picture. (C–P) Procedures that automatically segment the image. Note that Isodata (D) and Otsu (K) are fairly similar and separate the thoracic spots from the rest of the picture, making a threshold at the beginning of peak 3. The remaining procedures (except I) threshold the image at peak 2 and thereby separate most of the ladybug from the leaf. Image by L. Griffing.
of the histogram. However, the Golgi are much brighter than the ER, so by lowering the upper threshold value (Figure 7.16B), they can be segmented away from the ER, and only the red ER remains. Selecting a band of the grayscale histogram for segmentation is density slicing.
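The same thresholding and density-slicing operations can be scripted. A minimal scikit-image sketch (the file name and the band limits are assumptions for illustration):

    # Automatic (Otsu) thresholding and density slicing of a grayscale image.
    from skimage import io, filters

    gray = io.imread("ladybugs.tif", as_gray=True)   # hypothetical image, 0-1 floats

    t = filters.threshold_otsu(gray)                 # histogram-based threshold
    mask = gray > t                                  # "on" pixels above threshold

    # Density slice: keep only a band of intensities, e.g., the dimmer ER
    # without the brighter Golgi (the band limits here are illustrative).
    lower, upper = 0.35, 0.65
    band = (gray >= lower) & (gray <= upper)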
7.7 Color Segmentation Looks for Similarities in a Three-Dimensional Color Space

Color segmentation is more complicated than the intensity thresholding shown in Figures 7.15 and 7.16. There are many ways to represent color space. A version of the CIE chromaticity diagram (CIELAB) represents the gamut of color seen by humans (see Section 4.3) by defining color as a function of luminance L, a red-green color axis a, and a yellow-blue color
Figure 7.16 Density slicing. (A) Manual thresholding of bright regions of the histogram. This picture shows both Golgi and endoplasmic reticulum (ER) in the network as selections and depicts them in red. (B) By lowering the upper threshold in the histogram to exclude the Golgi, only the ER is a selection and shown as red, with the Golgi shown as white spots. Golgi and ER image in (A) is from Chris Hawes. Used with permission. Image by L. Griffing.
axis b, producing the Lab color space. Other color spaces, such as the RGB (red, green, and blue) and HSV (hue, saturation, and value) color spaces, plot color as a function of other axes (Figure 7.17). There are many ways to use these color spaces. Choosing a region of color in an RGB color space can be as simple as thresholding in the RGB channels, producing a "volume" of related color values (see Figure 7.17).
Figure 7.17 Color spaces. (A) The HSV color space shows hue, saturation, and value of color. It is a cylinder with hue varying around its circumference, saturation varying along its radius, and value (or intensity) varying with the height of the cylinder. Adapted from Wikimedia. (B) The RGB color space shows red, green, and blue values of color. It is a cube with the height, width, and depth each being a separate primary color and the combination of the highest intensities (upper front of cube) producing white. Thresholding with upper and lower values in RGB generates a selection of color range within a three-dimensional space of an RGB cube. Adapted from Wikimedia.
Figure 7.18 Change in the abundance of fire coral, Millepora alcicornis, in photoquadrat 113 (inset) from the Stetson Marine Sanctuary. Point count (solid circles), manual segmentation with outlining (open triangles), automatic color (solid diamonds), and interactive color segmentation (open squares) show similar percent coverage with time of the yellow-colored coral. Image from Bernhardt, S.P., and Griffing, L.R. 2001. An evaluation of image analysis at benthic sites based on color segmentation. Bulletin of Marine Science 69: 639–653. Used with permission.
Alternatively, selecting a color volume can start by choosing a color, which would be a 3D point in the RGB color space, and then expanding the selection to include similar colors by increasing its radius outward. In this case, the color volume would be a sphere. The magic wand color selection tool in Photoshop probably works along similar lines, with the radius of the sphere being the tolerance level set for that tool. The magic wand only selects spatially connected colors; other similar colors in the image can be selected with the Select > Similar command in Photoshop. A scientific application of this approach identifies the different organisms on the seafloor of the Gulf of Mexico. In that study, an analysis of such color segmentation compares favorably with point counting and manual segmentation methods, with similar results for each (Figure 7.18).

ImageJ uses the HSV color space (also called the HSL color space, for hue, saturation, and luminance) for color segmentation. In Figure 7.19A, tan Arabidopsis seeds are on a piece of white paper with a purple ruler. Ranges within the color space isolate the seeds from the surround, with the selection segmentation shown in red in the image (Figure 7.19B). The value, or brightness, slider selects the darker regions of the image. The saturation slider removes the shaded regions caused by the rough paper. The hue slider selects the tan seeds away from the purple ruler. All of these thresholds provide an excellent color segmentation of the final image.

Other common forms of segmentation also require some form of user interaction. A simple interactive object extraction (SIOX) approach (Figure 7.20A and B) segments color objects away from a background of another color. In this case, it adequately segments the colored wing coverings of the ladybugs. Some selection tools interactively "grow" or dilate a selected region, such as the Select > Grow command in Photoshop and balloon segmentation in FIJI (Figure 7.20D). Finally, a sophisticated machine learning segmentation routine uses supplied regions that the program then matches, the WEKA (Waikato Environment for Knowledge Analysis) trainable segmentation tools. This approach is probably the most flexible of the automated segmentation routines. It uses a variety of digital filters such as the Hessian matrix (see Section 7.7) on the image and then matches not only color but also patterns using image or ROI correlation (see Section 11.13). The Hessian matrix provides a form of scale-invariant segmentation, meaning that it segments small and large objects of the same shape (see Section 11.4). Based on user-supplied image regions, these programs use machine learning (see Section 11.7) to calculate the probability of region similarity, provide the probability map (Figure 7.20E), and make a binary image mask of those with the highest probability (Figure 7.20F).

However, as can be seen in the image of seeds in Figure 7.19, after color segmentation, counting the objects would lead to erroneous results if some of the objects touch each other. Morphological image processing of binary images can separate the objects and produce accurate measurements of the separated objects.
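The HSV segmentation of the seed image can be reproduced with a few array comparisons. A minimal scikit-image sketch (the file name and the hue, saturation, and value limits are assumptions for illustration, not the exact slider settings of Figure 7.19):

    # HSV color segmentation: convert RGB to HSV and threshold each channel.
    from skimage import io, color

    rgb = io.imread("seeds.tif")                     # hypothetical RGB image
    hsv = color.rgb2hsv(rgb)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]

    # Keep tan seeds: a narrow hue band, moderate saturation, not too dark.
    seed_mask = (h > 0.05) & (h < 0.15) & (s > 0.2) & (v > 0.3)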
7.8 Morphological Image Processing Separates or Connects Features

Following selection of an object based on gray level or color, there are frequently problems with the selected features that make further computer-based analysis difficult. The features may be touching slightly, which leads to miscounting, or may contain an unselected feature inside them, which leads to inaccurate area measurement. Morphological image processing overcomes some of these problems. It starts out as a way to process a binary mask of the image so that slightly touching features disconnect without changing their area. It can also fill in incomplete features inaccurately captured,
Figure 7.19 Color segmentation of seeds using the hue, saturation, value (HSV) color space in ImageJ. (A) Original image of the seeds. (B) Image showing selected seeds in red. (C) Operation in hue, saturation, and intensity that provide segmentation of the seed image. Image by L. Griffing.
Figure 7.20 User-interactive selection methods. (A) The color image of the ladybugs with their histogram. (B) The selection resulting from a simple interactive object extraction (SIOX), choosing the wing colors as the foreground and the leaf and ladybug spots as the background. (C) Grayscale ladybugs. (D) Balloon segmentation of some of the ladybug spots in C. (E) Probability map from a trainable segmentation program (WEKA [Waikato Environment for Knowledge Analysis] segmentation, FIJI). (F) Binary mask of high probability regions. Image by L. Griffing.
providing continuity of thinly branched features. It develops into more complicated processes that can classify objects based on size, shape, and position in the image.

Morphological processing operates by adding pixels to or removing pixels from segmented features in a way dictated by a structuring element of a certain size and shape, typically a 3 × 3 kernel, or array, of pixels. The basic operations are opening, closing, erosion, and dilation. Dilation adds pixels to features, filling in holes, and erosion subtracts pixels from objects, disconnecting them. The center pixel of the structuring element is applied to each pixel in the selection, each temporarily a target pixel, and pixels are added to or removed from the selection depending on whether the structuring element falls over selected pixels in the neighborhood of the target pixel. The pixels that correspond to the shape of the structuring element add to the selection during dilation. During erosion, the process subtracts the target pixel if there are non-feature pixels detected in the vicinity of the target pixel (Figure 7.21).
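These operations map directly onto standard image-processing libraries. A minimal scikit-image sketch with a 3 × 3 structuring element (the mask file name is a hypothetical binary image, such as one exported from the thresholding steps above):

    # Basic binary morphology with a 3 x 3 square structuring element.
    from skimage import io, morphology

    mask = io.imread("seed_mask.tif") > 0            # hypothetical binary mask
    selem = morphology.square(3)                     # 3 x 3 structuring element

    dilated = morphology.binary_dilation(mask, selem)   # fills holes, connects
    eroded = morphology.binary_erosion(mask, selem)     # shrinks, disconnects
    opened = morphology.binary_opening(mask, selem)     # erosion then dilation
    closed = morphology.binary_closing(mask, selem)     # dilation then erosion
    skeleton = morphology.skeletonize(closed)           # single-pixel-wide paths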
Figure 7.21 Basic morphological image processing operations use a structuring element composed of 9 pixels in a 3 × 3 matrix. Dilation places the center pixel of the structuring element over a “target” pixel in the “on” portion of the binary image and adds pixels to the binary image (turns them “on”) if any of the pixels under the structuring element are “off.” The pattern of pixel addition is the same as the pattern of the structuring element. The operation only works on the “on” pixels. Erosion also places the center pixel of the structuring element over the “target” pixel in the “on” binary image but removes the target pixel if any of the pixels under the structuring element are “off.” If erosion follows dilation, the features are “closed” (i.e., holes tend to fill and separated objects connect, as shown in the figure). If dilation follows erosion, the features are “opened” (i.e., objects become unconnected with the removal of small features while maintaining the area of the larger features). Repeated erosion of the closed structure results in a branched skeleton, a process called skeletonization. Diagram by L. Griffing.
Figure 7.22 Network analysis using morphological image processing: differences between triple branch points, or nodes. Pixel a has three neighbors that are part of the white skeleton and has five neighbors (shown in gray) that are not, so it is a triple branch node. Pixel b has four neighbors that are part of the white skeleton and four neighbors that are not, so it is a quadruple branch node. The branch above pixel b has an endpoint. Image by L. Griffing.
The operations of opening and closing are combinations of erosion and dilation operations. Opening is erosion followed by dilation. Closing is dilation followed by erosion. The opening operation (see Figure 7.21) removes small single-pixel strings that connect larger particles during the erosion operation. However, during the dilation operation, it adds back the removed pixels from the edge of the larger particles, compensating for the area lost by erosion. Closing connects separated objects by dilation and then removes pixels added to the larger objects with erosion, again restoring them to nearly the size they were at the beginning of the closing operation.

Another basic technique in morphological image processing is skeletonization (see Figure 7.21). This is a repeated symmetrical erosion of the object to the point where it is only a connected path of single pixels. Skeletonization illustrates the usefulness of operating on a selection reiteratively. The morphological image operations can be set to operate a certain number of times. Skeletonization produces branched networks. Evaluation of these networks starts by determining the number of neighboring pixels adjacent to each of the pixels in the skeleton (Figure 7.22; see also Figure 7.1) and by using Euler's equation, thereby classifying branching structure topologies (Equation 7.4).
No. of loops = No. of branches − No. of ends − No. of nodes + 1    (7.4)
Branches come together or stop at nodes, the ends of branches, the bends of branches, or triple or quadruple branch junctions, identified by the number of neighboring pixels that form part of the skeleton. The neighborhood is the group of eight pixels surrounding the target pixel in the 3 × 3 kernel. With three neighbors, there is a three-way junction or triple branch node (as long as it is not adjacent to a four-way junction), and with four neighbors, there is a four-way junction, or quadruple branch node. If there is only one other neighbor in the eight-neighbor vicinity, then the pixel is an endpoint on the branch. Dividing the straight-line distance between any two endpoints (the Euclidean distance; see later) by the length of each path gives the path tortuosity. Eroding, or pruning, all of the endpoints back to their nearest junction identifies residual connected networks. Pruning is particularly useful when skeletonizing the background pixels to separate regions containing the objects of interest by making a single pixel boundary, or skiz, between them.

Connected particles, as in Figure 7.23, may not disconnect with reiterative opening operations (Figure 7.23B). Problems with asymmetrical illumination can produce a shadow that interferes with the original segmentation (see Figure 7.20). This illustrates the point that the illumination (in this case, oblique and collimated) should be optimized (changed to axial and diffuse) based on planned subsequent analysis (see Section 9.10). Even though the seeds are connected because the shadow is selected or the seeds are touching, there are binary image operations that can identify separate seeds for counting, some maintaining the boundaries between them, providing approximate areas and shapes.

Another method for counting separated particles is to map the remaining, ultimate point, also called the ultimate eroded point, after reiteratively eroding the edge (Figure 7.23D). Erosion should be uniform in every direction, or isotropic. However, it often is not because the structuring element itself is a particular shape that is not isotropic. Calculating the point in the object that is the farthest away, in a straight line, from the background also produces the ultimate point. The point with the maximum straight-line distance from the edge, or the Euclidean distance, is the ultimate point. Assigning each pixel a gray value proportional to its distance from the edge produces the Euclidean distance map, or just distance map (Figure 7.23E). Distance maps often look like terrain, with the ultimate point resembling the peak in the terrain and the regions in the center of the objects appearing like ridges. Ridges can generate skeletons by a process called the medial-axis transform. The medial-axis transform produces a skeleton that is more isotropic because there is no structuring kernel in this technique. Watershed segmentation, a name also taken from this terrain analogy, also separates particles. It generates a border along a local minimum in the distance map (Figure 7.24A). An opening operation on
the object (or dilation of the background) increases the separation along the edges. After watershed segmentation, partitioning the background with a Voronoi diagram produces a set of boundaries equidistant between the segmented objects (Figure 7.24B). The Hough transform identifies mathematically defined objects in a field and therefore can identify circular objects (as well as other shapes), as shown in Figure 7.25. It is a great example of image segmentation based on shape. It doesn't detect all the circles in Figure 7.25. It detects most of the offspring Volvox colonies but not the outer boundary of some of the parental colonies. In cases in which there is considerable background noise, the Hough transform identifies noisy regions as containing many circles. However, high-throughput cell analysis uses the Hough transform because as cells in culture enter mitosis, they "round up," becoming nearly circular in 2D. Morphological image operations performed on grayscale images use a structuring element that gives all of the pixels under it just one of their grayscale values (rank filter; see Section 10.3). These gray morphology operations produce larger dark domains upon erosion (Figure 7.26B), larger white domains upon dilation (Figure 7.26C), emphasize dark features with opening (Figure 7.26D), and emphasize light features with closing (Figure 7.26E). Gray morphology can have several applications, such as thickening the borders between objects drawn with a Voronoi diagram or increasing the size of ultimate eroded points for visualization.
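Figures 7.23 and 7.24 are produced with ImageJ's binary commands; the following is a minimal Python sketch of the same separation strategy using the Euclidean distance map, its local maxima as ultimate points, and a watershed flood from those points. The scikit-image and SciPy calls and the file name are assumptions, not the book's workflow, and the min_distance value is an arbitrary tuning parameter.

import numpy as np
from scipy import ndimage as ndi
from skimage import feature, io, segmentation

binary = io.imread("seeds_binary.png") > 0               # placeholder binary mask

distance = ndi.distance_transform_edt(binary)            # Euclidean distance map
blobs = ndi.label(binary)[0]                             # connected clumps of seeds
# Ultimate (eroded) points: local maxima of the distance map, at least one per clump.
peaks = feature.peak_local_max(distance, labels=blobs, min_distance=10)
markers = np.zeros(distance.shape, dtype=int)
markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)

# Flood the inverted distance map from the markers; the watershed lines that form
# where neighboring basins meet draw the fine boundary between touching seeds.
labels = segmentation.watershed(-distance, markers, mask=binary)
print("separated objects:", labels.max())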
7.9 Measures of Pixel Intensity Quantify Light Absorption by and Emission from the Sample
After making selections, the measurement of pixel intensities in the selections is straightforward. The average intensity, or gray, value within a selection is the mean gray value. This is the quantity used for optical densitometric evaluation of gels, 96-well plates, X-ray film, and other images in which quantities occupy a selected area. For quantitative evaluation, background intensity subtraction is necessary. In gels, subtracting a lane without sample from lanes with sample provides background subtraction, assuming uniform lanes and an absence of staining artifacts. Alternatively, if there is some background staining of the gel that varies from a high value on one side to a lower value on the other, a "rolling ball" algorithm for background subtraction works. Calibrating optical density measurements requires either a step tablet of known gray levels or a preparation with a standard curve of known amounts. Other common intensity measures within selections are the gray value standard deviation, the maximum and minimum gray values, and the modal gray value, the value of the highest peak in the histogram. Some applications, such as dot blot or
Figure 7.23 Separating touching objects using morphological image processing techniques. (A) Touching and non-touching seeds. (B) Overlay of binary image of color-thresholded seeds, as done in Figure 8.19 but opened iteratively five times. (ImageJ commands: Process > Binary > Options > 5 iterations, Open). (C) Overlay of skeletonization of binary seed image using a square structuring element for erosion. (ImageJ command Process > Binary > Skeletonize). (D) Overlay of ultimate eroded points of the seeds (dilated to be visible). (ImageJ commands: Process > Binary > Ultimate Points) (E) Overlay in cyan of the Euclidean distance map of the binary seed image. (ImageJ commands: Process > Binary > Distance Map). (F) Overlay of watershed segmentation of binary seed image. Note that this process draws a fine line between the connected seeds, disconnecting them. (ImageJ commands: Process > Binary > Watershed). Image by L. Griffing.
Figure 7.24 Voronoi segmentation of the background. (A) Watershed segmentation of touching seeds, as in Figure 7.23F. (B) Voronoi diagram shown as a red line overlay on the separated seeds in A (ImageJ commands: Process > Binary > Voronoi). Image by L. Griffing.
Figure 7.25 Daughter colony identification in Volvox using the Hough transform. (A) Red, green, blue (RGB) image of Volvox. (B) Grayscale image of Volvox after the Process > Find Edges command and histogram stretching. (C) Analysis of the binary image of B using the Hough transform. All of the circles identified are in white. Red arrow: region of noise showing "false positives." Green arrow: periphery of a parental colony showing a "false negative." A is by Michael Clayton, https://search.library.wisc.edu/digital/AP2H3MR3KTAVCY9B used with permission. B and C by L. Griffing.
Figure 7.26 Gray morphology operations with a circular, six-pixel structuring element. (A) Original grayscale image. (B) Original image eroded. (C) Original image dilated. (D) Original image opened. (E) Original image closed. (FIJI commands: Process > Morphology > Gray Morphology > 6 pixel > Circle > Erode/Dilate/Open/Close). Image by L. Griffing.
microarray analysis, require the sum of pixel values in the selection, or raw integrated density. Integrated density is the mean gray value times the area of the selection (equivalent to the sum of the pixel values in an area times the area of one pixel). Appropriate background subtraction and standard curve calibration are necessary before integrated density analysis of dot blots. Densitometric applications require transparency scanners for scanning (see Section 6.2, Figure 6.1) and scanners or cameras that can capture in non-lossy formats. To quantitatively capture the transmittance of light through a microscope or other lens system with a digital camera, it is important to set the camera to a mode in which there is no background processing by the camera, such as amplification (no auto-gain). Consumer-grade digital single-lens reflex cameras (see Section 5.3, Table 5.1 and Section 5.14) can alter pixel values using a picture style setting. Make sure to use "faithful" or "neutral" settings that do not adjust color and intensity (see Section 10.1, Figure 10.1). Use a non-lossy format (TIFF or RAW) for capture. It is important to set the intensity, color temperature (voltage on tungsten bulbs), and white balance correctly (see Sections 9.3–9.6). When comparing specimen intensities quantitatively over time, the camera settings should remain constant. In addition to the image of the specimen, it is important to (1) record a dark image with the light off to show any "hot pixels" or fixed-pattern noise (these are camera chip flaws) and (2) take an image of the incident light with the specimen removed from the field. Panel 7.1 shows the image calculation for 8-bit transmittance (Equation 7.5).
Transmittance image = 255 × (Specimen image − Dark image) / (Incident light image − Dark image).
(7.5)
Panel 7.1 Image ratio operations for quantitative calculation of specimen transmittance and fluorescence ratio imaging.
A) Transmittance
1) Set manual camera settings and a color-balanced light source.
2) Acquire a dark image (no light on).
3) Acquire an incident light image (no specimen in the field).
4) Acquire the specimen image.
Transmittance image = 255 × (Specimen image − Dark image) / (Incident light image − Dark image).
(7.5)
This gives an 8-bit grayscale image. Multiplying by 65,535, instead of 255, gives a 16-bit grayscale image.
B) Ratio imaging
1) Acquire a background fluorescence image in the absence of fluorescent dye at the environment-sensitive wavelength, B(λs), and at the environment-insensitive wavelength, B(λi).
2) Acquire a dye-only image under a known environment for shade correction at both wavelengths, S(λs) and S(λi). This can be saved as a standard reference for future work.
3) Acquire images of the fluorescent dye in cells at both the sensitive, F(λs), and insensitive, F(λi), wavelengths.
Ratio image = (F(λs) − B(λs)) × S(λi) / ((F(λi) − B(λi)) × S(λs)).
(7.6)
The ratio image will contain floating-point values, usually less than 10. Consequently, to visualize the ratio using a 256 gray-level image, the ratios usually have to be multiplied by 25 or more. Calibration and calculation of calcium are in Section 17.4, Figure 17.10, Equation 17.2.
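In the book, image arithmetic like that of Panel 7.1 is done with ImageJ's Image Calculator; the sketch below carries out part A (Equation 7.5) with NumPy instead, which is an assumed alternative. The TIFF file names are placeholders, the captures are assumed to be aligned and taken at identical camera settings, and the small clip on the denominator simply avoids division by zero.

import numpy as np
from skimage import io

dark = io.imread("dark.tif").astype(float)           # light off: hot pixels, fixed-pattern noise
incident = io.imread("incident.tif").astype(float)   # specimen removed from the field
specimen = io.imread("specimen.tif").astype(float)   # the specimen image

denominator = np.clip(incident - dark, 1e-6, None)   # avoid division by zero
transmittance = 255.0 * (specimen - dark) / denominator          # Equation 7.5
transmittance = np.clip(transmittance, 0, 255).astype(np.uint8)  # 8-bit result
# Multiplying by 65535.0 instead of 255.0 (and casting to uint16) gives the 16-bit version.
io.imsave("transmittance.tif", transmittance)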
Note that these operations take place on entire images; image segmentation takes place afterward. Measuring fluoresced light is important for colocalization of fluorescent molecules in cells (see Section 11.10) and for analysis of their environment, such as pH, by ratio imaging. Ratio imaging divides the image intensity values taken at an environment-sensitive wavelength by the image intensity values taken at an environment-insensitive wavelength (Panel 7.1 and Equation 7.6):
Ratio image = (F(λs) − B(λs)) × S(λi) / ((F(λi) − B(λi)) × S(λs)).
(7.6)
in which F(λs) and F(λi) are the image intensity values at the environment-sensitive and environment-insensitive wavelengths, respectively; B(λs) and B(λi) are the intensity values of background images at those wavelengths; and S(λi) and S(λs) are shade-correction images taken of a uniform field of dye under a coverslip. The calculation of free ion concentration involves knowing the dissociation constant of the dye with the ion and the intensity values of images at saturation with the ion (see Section 17.4, Figure 17.10, Equation 17.2). The use of CCDs for image capture provides a quantitative measure of the light emitted from the sample because their electronic output varies linearly with photon input above a certain noise floor (see Section 5.11, Figure 5.15). Quantitation of emitted light in scanning systems, such as confocal microscopes, uses photomultiplier tubes in photon-counting mode (a mode with high linearity between photon input and electron output) and standards (e.g., fluorescent beads) to adjust settings. In all of these approaches, it is necessary to subtract the background intensity prior to quantitation, particularly when there is a relatively low signal-to-noise ratio in any of the images.
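A minimal NumPy sketch of Equation 7.6 follows; it is an assumed implementation, not the author's. The six file names are placeholders for registered images acquired as in Panel 7.1, part B, and the clip again guards against division by zero.

import numpy as np
from skimage import io

F_s = io.imread("dye_sensitive.tif").astype(float)     # F(λs): dye in cells, sensitive wavelength
F_i = io.imread("dye_insensitive.tif").astype(float)   # F(λi): dye in cells, insensitive wavelength
B_s = io.imread("bkg_sensitive.tif").astype(float)     # B(λs): background, no dye
B_i = io.imread("bkg_insensitive.tif").astype(float)   # B(λi): background, no dye
S_s = io.imread("shade_sensitive.tif").astype(float)   # S(λs): uniform dye field, shade correction
S_i = io.imread("shade_insensitive.tif").astype(float) # S(λi): uniform dye field, shade correction

ratio = (F_s - B_s) * S_i / np.clip((F_i - B_i) * S_s, 1e-6, None)   # Equation 7.6
# The ratios are floating point and usually < 10; scale (e.g., x25) to view as 8-bit gray levels.
view = np.clip(ratio * 25, 0, 255).astype(np.uint8)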
7.10 Morphometric Measurements Quantify the Geometric Properties of Selections
Measuring the geometric properties of a selection is one of the most fundamental features of any image analysis. The common ones, such as length, width, and area, do not require definition. Shape descriptors, which describe objects using the dimensions of fitted rectangles or ellipses, are also common (Figure 7.27). The common shape descriptors are aspect ratio (Width/Length of a rectangle; see aspect ratio of video, Table 1.2, Section 1.2), major axis/minor axis of an ellipse,
Figure 7.27 Common measures and shape descriptors of selected objects. Feret's diameter: The longest distance between any two points along the selection boundary, also known as the maximum caliper. Bounding rectangle: The smallest rectangle enclosing the selection. For ImageJ, the results headings are BX, BY, Width, and Height, in which BX and BY are the coordinates of the upper left corner of the rectangle. Fit Ellipse fits an ellipse to the selection. In ImageJ, the results headings are Major, Minor, and Angle. Major and Minor are the primary and secondary axes of the best-fitting ellipse. Angle is the angle between the primary axis and a line parallel to the x-axis of the image. The coordinates of the center of the ellipse are displayed as X and Y if Centroid is checked. Perimeter: the length of the outside boundary of the selection. Image by L. Griffing.
circularity (4π × Area/Perimeter²), roundness (4 × Area/(π × Major axis²)), and solidity (Area/Convex Area). Convex area is the area under a convex hull, or convex perimeter, which is the perimeter without any inward dimples. A Feret's diameter is the length of an object in a specific direction. Other shape descriptors are curl (Length/Fiber length – compare with tortuosity), convexity (Convex perimeter/Perimeter), extent (Area/Area of bounding rectangle), and elongation (Fiber length/Fiber width). Using some of these measurements as filters produces subclasses of selected objects for analysis. For example, automatic selection techniques may exclude small objects from the analysis by simply telling the program not to analyze anything below a certain area. ImageJ uses circularity as one of the default classifiers, or filters, for selections. So, when measuring objects in ImageJ, it is useful to get approximate sizes and circularities of the objects of interest and use them to filter the final results. Spatial moments offer very powerful ways of describing the spatial distribution of values (available in the Set Measurements dialog box in ImageJ). Center of mass, the first spatial moment, is the average value of the X and Y extent of the object; note that the spatial moments defined here are weighted by brightness, or intensity. It is often of interest to describe the way that intensities vary within an object using the spatial moment measures skewness and kurtosis. Skewness describes whether the intensity distribution is symmetrical in the object or skews to the left or right. Kurtosis reports whether the intensity distribution is Gaussian (normal), flatter than normal, or bimodal or multimodal. The eye is only sensitive to about 60 levels of gray (see Sections 2.1, 2.2, and 4.2), so when intensity varies as a gradient, small changes in its distribution are often difficult to see; skewness and kurtosis provide a quantitative measure of such changes.
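The measurements above are available in ImageJ's Analyze > Set Measurements; as a rough cross-check, the sketch below computes several of the same descriptors in Python with scikit-image regionprops. The labeled image is assumed to come from a prior segmentation (for example, the watershed sketch earlier in the chapter), and the formulas follow the definitions given above.

import math
from skimage import io, measure

binary = io.imread("seeds_binary.png") > 0      # placeholder binary mask
labels = measure.label(binary)                  # or reuse the watershed labels

for p in measure.regionprops(labels):
    # Assumes regions larger than a few pixels (nonzero perimeter);
    # feret_diameter_max needs a recent scikit-image release.
    circularity = 4 * math.pi * p.area / p.perimeter ** 2
    roundness = 4 * p.area / (math.pi * p.major_axis_length ** 2)
    aspect_ratio = p.minor_axis_length / p.major_axis_length    # fitted-ellipse axes
    print(p.label, p.area, round(p.feret_diameter_max, 1),
          round(circularity, 3), round(roundness, 3),
          round(p.solidity, 3), round(aspect_ratio, 3))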
7.11 Multi-dimensional Measurements Require Specific Filters
Measurement of objects in images is not just 2D. This chapter focuses on 2D measurements, primarily using ImageJ and FIJI (see the end of the chapter for ways to cite this software in publications). However, the third spatial dimension and the time dimension are very important. The reality of image acquisition is that distortion or aberration often occurs differently
in different dimensions. Images suffering from these problems require specific filters based on our knowledge of the imaging systems. The next several chapters present the theory behind imaging systems and how it can inform pre- and post-processing of images. There are many geometrical, temporal, and Fourier domain operations, or filters, for multi-dimensional images after acquisition.
Annotated Images, Video, Web Sites, and References 7.1 Digital Image Measurements Are Part of the Image Metadata Figure 7.3. Interface from Cell Profiler: https://cellprofiler.org. This program includes several advanced algorithms (colocalization, for example) as well as interesting educational tutorials in high-throughput screening. Example: Tian, R., Gachechiladze, M.A., Ludwig, C.H., et. al. 2019. CRISPR interference-based platform for multimodal genetic screens in human iPSC-derived neurons. Neuron 104(2): 239–255.e12. doi. PMID: 31422865 PMCID: PMC6813890. IPython: https://ipython.org. A python scripting interface that serves as a python kernel for Jupyter notebooks. Jupyter: https://jupyter.org. Open-source scientific, calculating, and imaging notebook. Includes workflows for machine learning. Julia: https://julialang.org. Open-source programing language designed for high-performance computing. Several opensource image processing programs are in Julia. Omero: https://www.openmicroscopy.org/omero. Secure central repository for image data that provides internet access for processing, analyzing, and annotating images. VisTrails: https://www.vistrails.org/index.php/Main_Page. An open-source scientific workflow and provenance management system that supports data exploration and visualization. The latest version of this is from 2016. Paraview: https://www.kitware.com. Open-source data analysis and visualization toolkit. KNIME: https://www.knime.com. Open-source (enterprise versions are proprietary) automated workflow, testing, and validation for scientific data processing and analytics. Provides extensions for image processing and can access many different kinds of databases.
7.2 The Subject Matter Determines the Choice of Image Analysis and Measurement Software Table 7.1: ArcGIS: https://www.arcgis.com/index.html GRASS: https://grass.osgeo.org Photomodeler: https://www.photomodeler.com DLTdv: http://biomech.web.unc.edu/dltdv. Tyson Hedrick’s page. Used for analysis of animal flight. MorphoJ https://morphometrics.uk/MorphoJ_page.html TpsDig and several other statistical analysis software packages by Rohlf are at http://sbmorphometrics.org. Click on Software and then Data Acquisition: These have been around for many years, with updating for new operating systems. The site provides links to other morphometric software. Brainvoyager: https://www.brainvoyager.com. This includes functional and structural MRI analyses. Connectome Workbench: https://www.humanconnectome.org/software/connectome-workbench This software uses Human Connectome Project (HCP) magnetic resonance imaging files. Imaris: https://imaris.oxinst.com. This is the Andor’s image processing and analysis software. They have other image acquisition software for their cameras. ITK: https://itk.org. ITK is an open-source segmentation and registration toolbox. VTK: https://vtk.org. This is a 3D “Visualization Toolkit” that is open source available from the open-source software company Kitware. 3D slicer, Paraview, Tomviz, Resonant, CMB, and Kwiver are also available from Kitware: https:// kitware.com. Kwiver analyzes video (marine, aerial, and so on). Resonant is image data management software. Tomviz is tomographic reconstruction and analysis software. IMOD: https://bio3d.colorado.edu/imod. IMOD is an open-source package for 3D analysis of electron micrographs and electron tomography.
157
158
7 Measuring Selected Image Features
EMAN2: https://blake.bcm.edu/emanwiki/EMAN2. EMAN2 is an image analysis package primarily for single particle cryo-EM reconstruction. vLUME: https://github.com/lumevr/vLume/releases. VR viewing of superresolution data sets. Stereoinvestigator: http://www.mbfbioscience.com/stereo-investigator. Stereological analysis of cell volumes. CellInsight High Content Screening: https://www.thermofisher.com/us/en/home/life-science/cell-analysis/cellularimaging/high-content-screening/high-content-screening-instruments/hcs-studio-2.html Operetta HCS from Perkin Elmer: https://www.perkinelmer.com/category/operetta-cls-high-content-analysis-system. This is hardware with a software package called Harmony. MetaXpress: https://www.moleculardevices.com/products/cellular-imaging-systems/acquisition-and-analysissoftware/metaxpress. High-content imaging (time and z-dimension stack capable) from molecular devices. iGen: https://www.thorlabs.com/newgrouppage9.cfm?objectgroup_id=7476. A software package to analyze cytometric readings from ThorLabs iCyte machine. Metamorph: https://www.moleculardevices.com/products/cellular-imaging-systems/acquisition-and-analysissoftware/metamorph-microscopy. Image acquisition, processing, and analysis for several different makes of microscopes from Molecular Devices. ImageJ: https://imagej.nih.gov/ij. The open-source image analysis package used for examples in this book. Several of the plugins to ImageJ are in a package called FIJI: https://fiji.sc. FIJI stands for FIJI Is Just ImageJ. ImageJ is generally cited using the following reference: Schneider, C.A., Rasband, W.S., and Eliceiri, K.W. 2012. NIH image to ImageJ: 25 years of image analysis. Nature Methods 9(7): 671–675, PMID 22930834. FIJI is cited using the following reference: Schindelin, J., Arganda-Carreras, I., Frise, E., et al. 2012. Fiji: an open-source platform for biological-image analysis. Nature Methods 9(7): 676–682. PMID 22743772. doi:10.1038/nmeth.2019. Icy: http://icy.bioimageanalysis.org. Another open-source image processing and analysis package that offers a variety of tools, including a protocols tool for data management and pipeline construction. Science GL: http://www.sciencegl.com. 3D visualization for the range of scales from atomic force microscopy to LiDAR (light detection and ranging). Image SXM: https://www.liverpool.ac.uk/~sdb/ImageSXM. Image processing for many forms of scanning microscopy (atomic force, laser scanning, scanning near-field optical microscopy, scanning electron micrograph) using the Apple operating systems and based on the original NIH
7.3 Recorded paths, ROIs, or masks save selections for measurement in separate images, channels, and overlays ImageJ User Guide introduces all of the basic functionality of ImageJ. It is at https://imagej.nih.gov/ij/docs/guide/userguide.pdf. Photoshop Essentials, a third-party web tutorial provider, has tutorials on making selections in Photoshop: https://www.photoshopessentials.com/basics/make-selections-photoshop.
7.4 Recorded Paths, Regions of Interest, or Masks Save Selections for Measurement in Separate Images, Channels, and Overlays For a discussion of manual versus automated segmentation in medical image analysis, see Suetens, P. 2009. Fundamentals of Medical Imaging. Cambridge University Press, Cambridge, UK. pp. 159–189. This chapter on medical image analysis also has many medical examples of matching of segmented shapes and analysis of image deformation, along with the math.
7.5 Stereology and Photoquadrat Sampling Measure Unsegmented Images For a good discussion of assumption versus design-based methods and the limitations on sample size and disector analysis, see West, M.J. 1999. Stereological methods for estimating the total number of neurons and synapses: issues of precision and bias. Trends in Neuroscience 22: 51–61.
7.6 Automatic Segmentation of Images Selects Image Features for Measurement Based on Common Feature Properties For some automated segmentation approaches see Prodanov, D. and Verstreken, K. 2012. Automated segmentation and morphometry of cell and tissue structures. Selected algorithms in ImageJ. In Molecular Imaging. Ed. by B. Schaller. IntechOpen. Open Access Publisher, London. pp. 183–207. For a discussion of the problems with manual approaches to segmentation in medicine, see Suetens, P. 2009. Medical image analysis. In Fundamentals of Medical Imaging. Cambridge University Press, Cambridge UK. pp. 160–183.
7.7 Segmenting by Pixel Intensity Is Thresholding The ImageJ wiki has a nice description of all of the automatic segmentation protocols and their references. It is at https://imagej.net/Auto_Threshold. A mathematical discussion of thresholding, automatic gray-level segmentation, and region growing is in Gonzalez, R.C. and Woods, R.E. 2008. Digital Image Processing. Third Edition. Pearson Prentice Hall, Upper Saddle River, NJ. pp. 760–790.
7.8 Color Segmentation Looks for Similarities in a Three-Dimensional Color Space The ImageJ Wiki at https://imagej.net/SIOX:_Simple_Interactive_Object_Extraction describes the Simple Interactive Object Extraction (SIOX) color segmentation plugin for ImageJ and FIJI. A more complete description of the WEKA trainable segmentation is in the ImageJ Wiki at https://imagej.net/Trainable_Weka_Segmentation. It does more than just segment based on color and is at the interface of machine learning and image processing. The publication describing it is Arganda-Carreras, I., Kaynig, V., Rueden, C., et al. 2017. Trainable WEKA segmentation: a machine learning tool for microscopy pixel classification. Bioinformatics 33(15): 2424–2426. PMID 28369169. doi:10.1093/bioinformatics/btx180. A feature of the trainable WEKA segmentation is the use of the Hessian matrix. This is an integral of the derivative values in the user-defined vicinity of the target pixel. It multiplies the target pixel's neighbor values by a value that shrinks (following a Gaussian function) as the distance increases from the target pixel. This not only produces corner points (as does the Harris matrix) but also stable points that do not vary with affine transformations (scale, rotation, translation, skewing). Scale-invariant transforms are an important part of SIFT (scale-invariant feature transform; see Chapter 12) and structure-from-motion algorithms (see Chapter 14).
7.9 Morphological Image Processing Separates or Connects Features A discussion of opening, closing, skeletonization, Euclidean distance mapping, and watershed operations as well as other binary image processing is in Russ, J.C. 2007. Processing digital images. In The Image Processing Handbook. Fifth Edition. CRC Press, Boca Raton, FL. pp. 443–510. See also Bradbury, S. 1989. Micrometry and image analysis. In Light Microscopy in Biology: A Practical Approach. First Edition. Ed. by A.J. Lacey. IRL Press, Oxford, UK. pp. 187–220. A mathematical discussion of morphological image processing is in Gonzalez, R.C. and Woods, R.E. 2008. Digital Image Processing. Third Edition. Pearson Prentice Hall, Upper Saddle River, NJ. pp. 649–702. A description of the implementation of the Hough transform is in the ImageJ Wiki: https://imagej.net/Hough_Circle_Transform.
7.10 Measures of Pixel Intensity Quantify Light Absorption by and Emission from the Sample A description of how to use ImageJ and FIJI to analyze gels in the Analyze>Gels submenu is at https://imagej.nih.gov/ij/docs/menus/analyze.html. This page also has links to other tutorials for different kinds of blots, such as dot blots and Western blots.
7.11 Morphometric Measurements Quantify the Geometric Properties of Selections Features, such as shape and pixel intensity, are in ImageJ and FIJI in the Analyze>Set Measurements command, and its documentation is at https://imagej.nih.gov/ij/docs/menus/analyze.html. A mathematical discussion of boundary values and shape measures is in Gonzalez, R.C. and Woods, R.E. 2008. Digital Image Processing. Third Edition. Pearson Prentice Hall, Upper Saddle River, NJ. pp. 817–879.
7.12 Multi-dimensional Measurements Require Specific Filters See Chapters 10–13 for multi-dimensional filters.
8 Optics and Image Formation
8.1 Optical Mechanics Can Be Well Described Mathematically
Two types of mathematics describe the lens systems considered here: geometry and Fourier analysis. Geometric optics (right panel in Figure 8.1) describes lens systems using idealized rays of light refracting as they enter and exit curved objects (lenses) of different refractive indices. Ray diagrams show how the rays travel, representing the bending of the wave front as angles in the straight lines of light (Figure 8.2). Geometry also describes where important parts of the imaging process reside physically, such as the diffraction pattern in the rear focal plane of the lens, and it explains the role of lens apertures in depth of field, depth of focus, and lens aberrations. Fourier analysis of optics, or Fourier optics (left panel in Figure 8.1), parses the object into a group of spatial frequencies, or sizes. Fourier transforms into frequency space and inverse transforms out of frequency space mathematically model the separate stages of the imaging process (see Figure 8.1). Fourier analysis describes the point spread function (PSF) in a way that will make consideration of superresolution, that is, not diffraction-limited, imaging understandable in Chapter 18. Fourier analysis will also revisit the modulation transfer function (MTF), which is the Fourier transform of the PSF, for determining the quality of an imaging system. Geometric optics and Fourier optics together illuminate final image formation, whereby the final image of the object is the object convolved with the PSF (see Figure 8.1).
8.2 A Lens Divides Space Into Image and Object Spaces
A converging lens has two focal points: parallel rays of light entering one side of the lens converge at the focal point on the other side. Parallel rays of light are collimated rays. Collimated rays focus on the focal point of a converging lens in the focal plane (see Figure 8.2). The distance between the optical "center" of the lens (the lens itself if it is very thin) and the focal point is the focal length of the lens. The lens divides space into object space and image space. Object space contains the object examined, and image space contains the image generated by the optics (Figure 8.3). Image space also contains the rear focal plane of the lens. Although this may seem trivial, it is important to name these spaces; otherwise, concepts such as depth of focus and depth of field are easily confused. In addition, because optics are reversible (e.g., there are two symmetrically positioned focal points and focal planes of the thin lens in Figure 8.3), designating these spaces provides names for the different focal points and planes: the one in image space is the rear focal plane, and the one in object space is the front focal plane. Good focus produces a good image of the object. During focusing, light from a three-dimensional (3D) object is projected onto a two-dimensional plane (e.g., a chip) or onto a curved surface (e.g., the retina). This is the primary image plane (see Figure 8.3). A projectable image is a real image, produced by converging rays of light. A non-projectable image is a virtual image, produced by diverging rays of light, such as the image seen in a mirror.
Figure 8.1 A comparison of mathematical image formation from a point source and geometrical image formation of an object from lenses. In the geometrical pathway, the lens system collects the light from the object. Diffraction of the spot produces a three-dimensional diffraction pattern in the rear focal plane. The ability of the lens to transmit information about the object depends on the number of orders of diffraction the lens collects. Lenses with large apertures can collect many orders and, when the rays of light interfere with each other at the primary image plane, form an image of the object. In the mathematical pathway, the point source is an impulse function of intensity I(x,y,z), a spot of infinitely high intensity with an integrated area of unity. A Fourier transform of the point, F(I(x,y,z)), produces the mathematical correlate of the diffraction pattern and generates a continuous transfer function. Circular apertures limit the number of orders in frequency space. An inverse Fourier transform, F⁻¹(F(I(x,y,z))), generates a point spread function (PSF). Convolution operations describe the limitations of lens apertures and aberrations. The object convolved by the PSF is equivalent to the image in the primary image plane. Diagram by Holly C. Gibbs and adapted by L. Griffing. Used with permission.
Figure 8.2 Ray diagram of a thin lens that focuses collimated light (parallel beams) onto a focal point, F. There is an equidistant point on the opposite side of the lens that is also a focal point. A distance called the focal length of the lens separates them from the optical "center" of the lens. The focal plane of the lens includes all of the points in the plane parallel to the lens, including the focal point. There are a front focal plane and a rear focal plane for every converging lens. Diagram by L. Griffing.
Figure 8.3 The object, in this case a grizzly bear, is in object space. The image that comes to focus in the primary image plane resides in image space. Diagram by L. Griffing.
8.3 The Lens Aperture Determines How Well the Lens Collects Radiation
The light-collecting capacity of the lens, or its aperture, determines its resolving power, with narrow apertures giving less resolution (see Section 5.14). Fish-eye or hemispherical lenses collect light from the extreme periphery of the field and therefore have wide object-side apertures (Figure 8.4). The aperture of a lens can be restricted by placing an opaque material with a hole of variable width in object or image space. The metal apertures used in a transmission electron microscope (TEM) are very important because they limit the resolution of that form of microscopy (see the later discussion of spherical aberration). In photography and light microscopy, variable apertures on a lens system reduce glare (collection of ambient light produced outside the scene of interest) and accommodate the lighting of a scene to the sensitivity of the imaging device (eye, film, video camera). These adjustable apertures are irises or iris diaphragms. (They are often mistakenly called diaphragms, which just means a thin sheet of material but does not indicate that there is a hole in it.) In photography, the apertures are "stops" because they stop light from getting to the lens or imaging chip. Numerical aperture (NA) is the product of the refractive index of the medium in object space and the sine of the half-angle of light acceptance by the lens, n sinα (see Section 5.14). Using immersion media, such as water or oil, produces larger NAs (Figure 8.5). The NAs of the objective and condenser are very important for light microscopy because they set the diffraction limit of resolution, as originally formulated by Ernst Abbe. Most objectives and condensers have their NAs inscribed on them (see Figure 8.29).
Figure 8.4 (A) A thin convex lens does not collect light efficiently from the peripheral field, giving it a low object-side aperture but a less distorted image. (B) A spherical lens has a high object-side aperture. It collects light from the peripheral field, producing not only wide object-side aperture but also severe spherical aberration. Diagram by L. Griffing.
Figure 8.5 (A) A non-immersion medium produces a narrow object-space aperture. (B) An immersion medium of high refractive index increases the effective aperture of the lens. Diagram by L. Griffing.
8.4 The Diffraction Limit and the Contrast between Two Closely Spaced Self-Luminous Spots Give Rise to the Limits of Resolution
Diffraction is the scattering of light by an edge (see Section 5.14). When imaging a small hole with light shining through it, the light diffracts into concentric rings of alternating light and dark intensities (Figure 8.6) caused by alternating bands of destructively interfering and constructively interfering beams of light (see Figure 5.27). The central disk of light is the Airy disk, and the disk plus the diffraction rings is the Airy pattern. The minimum separation between two slits that gives rise to the first order of constructive interference occurs when the distance BC shown in Figure 8.7A, also equivalent to dmin sinα by similar triangles, is one wavelength, λ, of light in the medium of refractive index n. As shown in the diagram, this relationship produces the equation
dmin = λ / (n sinα) or λ / NA.
(8.1)
Extending this treatment to a single lens using collimated (parallel) beams of light produces its limit of resolution, λ/NA. The orders of diffraction collected by a lens are in its rear focal plane (Figure 8.7B). Only large-NA lenses can collect the highly scattered light, the higher orders of diffraction, making them capable of smaller dmin and higher resolution. Likewise, using shorter wavelengths of light produces higher resolution. A small aperture placed in the same plane as the diffraction pattern limits the projection of the higher orders of diffraction, making objects responsible for these higher orders unresolvable in the image plane. A sub-resolution self-luminous spot or point (either a fluorescent bead for fluorescence microscopy or a sub-resolution hole in an opaque film for trans-illuminating microscopy) imaged with a microscope objective with smaller apertures (Figure 8.8) produces more blur or more "spread" to the point and generates larger Airy disks with fewer outer diffraction rings. This image of a point object is the PSF. An intensity plot across the Airy pattern of a sub-resolution point shows that the orders of diffraction decrease in maximum intensity for the zero-order disk, the first-order ring, the second-order ring, and so on (see Figure 8.8). The zero-order maximum has a slope that defines the spread of the point and in two dimensions defines the Airy disk. Larger points have zero-order maxima with more gradual slopes, larger diameter Airy disks, and lower orders of diffraction. Hence, the diffraction pattern is a visual map of the sizes of objects in the sample, or their reciprocal value, the spatial frequencies of the sample. Things with low spatial frequencies (larger objects) have large PSFs and low orders of diffraction, while
Figure 8.6 An Airy disk pattern of diffraction from a self-luminous point. The central disk is the Airy disk, which together with the outer rings makes an Airy pattern. Diagram by L. Griffing.
Figure 8.7 (A) If two slits are separated by a value, dmin, then the light passing through them will diffract and constructively interfere at Z (traveling through refractive index n) if the optical path difference between them (nBC) is an integer multiple of the wavelength (mλ). Because BC = dmin sinα, this gives the relationship of Equation 8.1: dmin = mλ / (n sinα), or λ/NA when m = 1. (B) Diffracted light from an object illuminated with coherent, collimated light forms the first-order maximum ring on the Airy disk in the rear focal plane of the lens. Diagrams by L. Griffing.
Figure 8.8 The point spread function (PSF) is the image of a point object. The PSF is shown here in two dimensions (2D) as the image itself and in one dimension (1D) as the intensity plot across the center of the image (blue line in the top figure). Infinity-corrected objectives require a tube lens, as shown, to form the image. Adapted from Kurt Thorn. The Regents of the University of California. Used with permission.
those with high spatial frequencies (small objects) have small PSFs and higher order rings of diffraction. Reducing apertures limits the number of diffraction rings and increases the diameter of the Airy disk. In a chain of optical elements, such as condenser lenses, objective lenses, projection lenses, and cameras, the element with the most spread limits the spread of the entire chain; it is only as strong as its weakest link. Consequently, the apertures of both the condenser and the objective limit the effective aperture of compound microscopes, as described in Section 5.15, Figure 5.30. The number of orders of diffraction collected with an objective increases by the addition of a condenser (Figure 8.9) because the effective angular aperture doubles (sinα becomes 2 sinα) when the condenser has the same NA as the objective, as originally described by Ernst Abbe. If the condenser has a smaller aperture than the objective, then
dmin = λ / (NA of the condenser + NA of the objective).
(8.2)
For a microscope with objectives and condensers of equal aperture, the Abbe diffraction limit of resolution, or the Abbe criterion for resolution, is
dmin = λ / (2n sinα) or 0.5λ / NA.
(8.3)
Resolving objects requires collection of their orders of diffraction. If the NA is 1, only objects with a size of half the wavelength of the illuminating radiation or larger (e.g., 200-nm objects illuminated by 400-nm blue light) scatter, or diffract, enough of the light to form these higher orders of diffraction. However, depending on the manner of detection of this signal, there are different criteria for resolution. For the eye, we use the Rayleigh criterion, and for electronic devices, we use the Sparrow criterion, as described later. Superresolution methods, however, can resolve objects that are smaller than the limit imposed by diffraction (see Section 18.3, Table 18.1). If there are two self-luminous points, their minimum distance of separation, dmin, in the x-y plane is the lateral resolution of the system. They can only be resolved if there is sufficient contrast between them; in other words, there has to be enough of a dip in intensity between them so that they appear as two separate objects and not just one elongated object. Because eyes and electronic sensors have differing abilities to detect the contrast of this dip, there are different visual and electronic limits of resolution (Figure 8.10). Eyes can detect a 19% intensity drop between the two intensity peaks in the spread function. For this to occur, the zero-order maximum of one point can get no closer than the first-order minimum of the other. The radius of the first dark ring, the first-order minimum, is 0.61λ/NA. Hence, to produce the Rayleigh criterion for visual resolution, multiply the Abbe criterion by 1.22:
dmin = 1.22λ / (2n sinα) = 0.61λ / NA.
(8.4)
Figure 8.9 Adding a condenser produces convergent, coherent light that doubles the angle of light acceptance α and thereby changes the resolvable dmin to that of Equation 8.3, the Abbe criterion of resolution: dmin = λ / (2n sinα) or 0.5λ / NA. Diagram by L. Griffing.
Figure 8.10 (A) Airy patterns of two points that are easily resolvable because their zero-order maxima peaks (spreads) do not overlap. (B) The Rayleigh criterion for the limit of resolution: the zero-order maxima (Airy disks) overlap, creating the 19% intensity drop between them needed for sensing by the eye. (C) The Sparrow criterion for resolution: the zero-order intensity drop can be much smaller for electronic sensors than for the eye. Diagram by L. Griffing.
Electronic sensors can detect intensity drops that approach 0. To achieve the Sparrow criterion for electronic resolution, multiply the Abbe criterion by 0.95:
dmin = 0.95λ / (2n sinα) = 0.47λ / NA.
(8.5)
These are criteria for lateral resolution in the x-y plane. We now consider focus and resolution in the third, z or axial, dimension.
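A short worked example of Equations 8.3 through 8.5 follows; the wavelength and NA are illustrative values only (550-nm green light and a 1.4-NA oil objective, the conditions assumed for Table 8.1 later in the chapter).

wavelength_nm = 550.0   # illustrative green light
NA = 1.4                # illustrative high-NA oil-immersion objective

d_abbe = 0.5 * wavelength_nm / NA        # Equation 8.3: diffraction limit
d_rayleigh = 0.61 * wavelength_nm / NA   # Equation 8.4: visual criterion (19% dip)
d_sparrow = 0.47 * wavelength_nm / NA    # Equation 8.5: electronic-sensor criterion
print(f"Abbe {d_abbe:.0f} nm, Rayleigh {d_rayleigh:.0f} nm, Sparrow {d_sparrow:.0f} nm")
# Prints roughly: Abbe 196 nm, Rayleigh 240 nm, Sparrow 185 nm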
8.5 The Depth of the Three-Dimensional Slice of Object Space Remaining in Focus Is the Depth of Field
Clear focus of a 3D slice of object space can occur over a range of distances in object space that depends on the aperture, or NA, of the lens. The depth of the 3D slice of object space that remains in focus in image space is the depth of field, or in microscopy, the optical section thickness. As described in Section 5.15, Figure 5.31 and as shown with a lens diagram in Figure 8.11, as apertures decrease, the depth of field increases. The diameter of the circle (the so-called circle of confusion) produced by the crossed beams (cone) of light at the planes in object space where the object is in acceptable focus is the same in the high- and low-aperture cases in Figure 8.11; only the distance between the planes, the depth of field, is different. In lenses in which aberrations limit the usable NA, such as those used in photography (see Table 5.6 and Figure 5.28) and electron microscopy, the optical geometry, which includes magnification, determines the axial resolution. It is based on the geometric properties of lenses and the Gaussian lens formula,
1 / f = 1 / u + 1 / v,
(8.6)
in which f is focal length, u is distance to object, and v is distance to image (Figure 8.12). Magnification is the ratio of image size to object size. By similar triangles, it is also the ratio of the image distance to the object distance: M = v / u.
(8.7)
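As a quick numerical illustration of Equations 8.6 and 8.7 (the values below are arbitrary): a 50-mm lens imaging an object 200 mm away forms its image about 67 mm behind the lens at roughly one-third magnification.

def image_distance(f_mm: float, u_mm: float) -> float:
    """Solve the Gaussian lens formula 1/f = 1/u + 1/v for the image distance v."""
    return 1.0 / (1.0 / f_mm - 1.0 / u_mm)

f, u = 50.0, 200.0            # illustrative focal length and object distance
v = image_distance(f, u)      # about 66.7 mm
M = v / u                     # Equation 8.7: magnification, about 0.33x
print(f"v = {v:.1f} mm, M = {M:.2f}")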
Figure 8.11 (A) With a fully open aperture, a lens has a very shallow depth of field. (B) Restricting the aperture with a “stop” or iris diaphragm in front increases the depth of field. Diagram by L. Griffing.
Figure 8.12 The lens equation, Equation 8.6: 1 / f = 1 / u + 1 / v, relates the focal length f (the distance between the focal point F and the lens) to the distances of the object and the image from the lens, u and v. Diagram by L. Griffing.
Figure 8.13 The relationship among the lens-to-object distance, the lens-to-image-plane distance, and magnification. Image by L. Griffing.
In other words, when using a magnifying glass, if you put the glass closer to the object, the object appears bigger, but you have to look at it from farther away. The image gets bigger, and the image plane recedes; this is the concept behind a bellows close-up camera (Figure 8.13) and extension rings for lenses. Most photographic cameras do not have bellows or movable image planes – the distance between the lens and the imaging chip remains approximately the same, whether using a long or short focal length camera lens. So, the important relationship is not how far the image plane is from the lens (this is relatively constant) but how close the object is to the lens relative to the focal length. As the focal length increases, the object can be farther away and appear the same size. In other words, long lenses (long focal length lenses or telephoto lenses) magnify. As image modifiers are added to generate contrast in a light microscope in the rear focal plane of the objective (see Sections 16.7–16.11), they often increase the distance to the primary image plane, thereby increasing the magnification in older objective lenses that have a fixed distance to the primary image plane. This distance is the optical tube length. The optical tube length was conventionally 160 mm. To overcome this magnification change and other problems, newer microscope objectives use infinity-corrected optics, meaning they do not make a primary image without a tube lens. The tube lens can compensate for the addition of other optical elements, keeping the magnification constant. The optical tube length (either infinity or 160 mm) appears on the objective mount (see Section 8.11, Figure 8.29). Magnification also changes the relationship between depth of field and depth of focus. As shown in Figure 8.14, with closer objects (and higher magnification), the depth of field diminishes, while the depth of focus increases. This relationship is
d′ = d × M²,
(8.8)
in which the depth of focus, d’, of a lens is related to its depth of field, d, by the square of the magnification. Hence, at a constant depth of focus, a 20×, f/11 telescope lens has a shallower depth of field than a 5×, f/11 telephoto lens (see Section 9.4, Figure 9.9). A 100× microscope objective with 1.3 NA has a much shallower depth of field than a 0.16 NA, 4× objective, but it also has a much deeper depth of focus. Likewise, electron microscopes with very high magnifications have very deep depths of focus. The camera could be meters below the floor, and the object will still be in focus!
Figure 8.14 Changing the magnification can increase depth of focus. u is the distance to the object, and v is the distance to the primary image plane. Diagram by L. Griffing.
8.6 In Electromagnetic Lenses, Focal Length Produces Focus and Magnification
Magnetic fields can focus electrons. Magnetic fields push (never pull) electrons. Because the electrons can be pushed, they can be focused (i.e., made to converge on a point). Unlike light lenses, which can both focus light with convergent, convex lenses and spread light with divergent, concave lenses, electron lenses only focus electrons by making them converge to a point. An electromagnet, a coil of wire, or solenoid, through which current flows, sets up the magnetic field. The path of the electron in a magnetic field is a spiral. Therefore, to focus an electron, just increase the magnetic field to make the electron move in tighter and tighter spirals. Both the design of the electromagnetic lens and the current through the solenoid can change the magnetic field strength. Soft iron enshrouds electron lenses. The inner rings of the iron shroud, the pole pieces, generate a small but intense magnetic field that focuses the electrons (Equation 8.9):
f = K (V / i²),
(8.9)
in which f is the focal length, K is a constant based on the number of turns in the lens coil and the geometry of the lens, V is the accelerating voltage, and i is the current (in milliamps) passed through the coils. By changing the current flowing through the solenoid, the focal length of the lens changes (Figure 8.15). Changing the focal length changes focus, so unlike light microscopes, electron microscopes do not move the object relative to the lens but rather change the focal length of the lens. As with light microscopes, changing the focal length also changes magnification.
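A tiny sketch of Equation 8.9 follows; the constant K, the voltage, and the currents are arbitrary illustrative numbers, chosen only to show that doubling the coil current quarters the focal length.

K = 5.0e-4          # lens constant (coil turns and geometry); arbitrary units
V = 100_000.0       # accelerating voltage, 100 kV (illustrative)

for i_mA in (100.0, 200.0, 400.0):
    f = K * V / i_mA ** 2        # Equation 8.9
    print(f"{i_mA:.0f} mA -> relative focal length {f:.4f}")
# Doubling the current quarters the focal length, which is how the electron
# microscope focuses (and magnifies) without moving the specimen.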
Figure 8.15 The construction of an electromagnetic lens. (A) An iron-enshrouded solenoid of copper wire windings. (B) The pole pieces produce a more intense electromagnetic field to help focus the electrons. Lenses change focal length with increasing current, Equation 8.9: f = K (V/i²), in which K is a constant based on the number of turns in the lens coil and the geometry of the lens, V is the accelerating voltage, and i is milliamps of current passed through the coils. Diagram by L. Griffing.
8.7 The Axial, Z-Dimensional, Point Spread Function Is a Measure of the Axial Resolution of High Numerical Aperture Lenses
In higher NA lenses, the axial (z-dimension) PSF determines axial resolution (and depth of field), just as the lateral (x-y dimension) PSF does for lateral resolution. Through-focus imaging of a sub-diffraction point produces the axial PSF. To record the axial and lateral PSF, take images of a sub-diffraction (e.g., 150-nm) fluorescent bead, focusing up and down at distances away from the position where you get optimal lateral focus (Figure 8.16A). Figure 8.16B and C show the axial view of the PSF. Axial resolution, or dmin (z), is a quarter of the distance between the two first-order minima, or half the radius to the first-order minimum (Figure 8.16D). This distance exceeds that of the lateral resolution and depends on wavelength, refractive index, and NA:
dmin (axial) = 2λn / NA²,
(8.10)
in which λ is the wavelength of light, n is the refractive index, and NA is numerical aperture. Notice that the NA of the lens has more of an effect on axial than on lateral resolution because axial resolution depends on the square of the NA, while lateral resolution depends on the first power of the NA. In addition, higher values of refractive index increase axial dmin while decreasing lateral dmin.
Figure 8.16 The z-dimensional point spread function (PSF) determines the axial resolution. (A) A 28-frame through-focus, optical section series of a point. (B) The 28 frames are stacked and viewed on their sides. (C) An enlargement of B shows the intensity of the axial, z-dimensional, projection of the PSF. The intensity plot in D is along the center cyan line. (D) The intensity plot of C shows the axial resolution, dmin, as half the radius of the intensity plot to the first-order minimum. Image (A) is adapted from Kurt Thorn. The Regents of the University of California. Used with permission.
Table 8.1 Resolution and Light-Gathering Power of Microscope Objectives with High Numerical Apertures.a

Mag    NA (imm)     dmin x-y (nm)   dmin z (nm)   LGP
10     0.3          860             12,222        9.0
20     0.75         344             1955          14.06
40     0.95         272             1218          5.64
40     1.3 (oil)    198             986           10.05
60     1.2 (H2O)    215             1018          4.09
60     1.4 (oil)    184             850           5.4
100    1.4 (oil)    184             850           1.96

a Using 550-nm light, calculated according to the Sparrow criterion for dmin x-y, the formula for axial resolution for dmin z, and the formula for light-gathering power for transmitted light (LGP) × 10⁻⁴. imm, immersion medium; NA, numerical aperture.
8.8 Numerical Aperture and Magnification Determine the Light-Gathering Properties of the Microscope Objective
Just as the amount of light doubles with each f-stop opening of the photographic lens (see Section 5.14, Table 5.6), increasing the NA of a microscope objective lens increases the amount of light the lens can collect. For trans-illumination, the light-gathering power of the objective is
(NA / M)².
(8.11)
The inverse relation to the magnification results from the magnified image covering a larger area, so there is lower intensity per unit magnified area. In epi-illumination microscopies (see Section 17.5, Figure 17.13), the objective lens also acts as the condenser, so its light-gathering power is
NA⁴ / M².
(8.12)
Table 8.1 shows the NA, lateral and axial resolution, light-gathering power, and magnification of some common microscope objectives. The light-gathering power of low-magnification, high-NA lenses makes them useful for low-light transmission microscopy (e.g., polarized light and differential interference contrast; see Sections 16.8–16.10), in which the increased light signal can generate good contrast against a dark background. They generate even higher contrast for epifluorescence microscopy because the light-gathering power increases to the fourth power of the NA instead of to the second power of the NA for transmitted illumination. Hence, when choosing an objective or any component of an imaging system, there is a trade-off between the contrast and the resolution. The MTF – the relationship between the spatial frequency of the object being imaged (the reciprocal of its size) and its ability to hold contrast in the image – shows the trade-off (see Sections 5.14 and 5.15, Figures 5.29 and 5.30).
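The sketch below recomputes the quantities behind Table 8.1 from Equations 8.5, 8.10, and 8.11 for 550-nm light; the immersion refractive indices (1.0 air, 1.33 water, 1.515 oil) are assumed, so the results match the printed table only to within rounding.

objectives = [  # (magnification, NA, refractive index of the immersion medium)
    (10, 0.30, 1.0), (20, 0.75, 1.0), (40, 0.95, 1.0), (40, 1.30, 1.515),
    (60, 1.20, 1.33), (60, 1.40, 1.515), (100, 1.40, 1.515),
]
wavelength = 550.0  # nm

for M, NA, n in objectives:
    d_xy = 0.47 * wavelength / NA         # lateral resolution, Sparrow criterion (Eq. 8.5)
    d_z = 2 * wavelength * n / NA ** 2    # axial resolution (Eq. 8.10)
    lgp = (NA / M) ** 2 * 1e4             # transmitted-light LGP x 10^-4 (Eq. 8.11)
    print(f"{M:3d}x  NA {NA:.2f}  d_xy {d_xy:4.0f} nm  d_z {d_z:6.0f} nm  LGP {lgp:5.2f}")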
8.9 The Modulation (Contrast) Transfer Function Relates the Relative Contrast to Resolving Power in Fourier, or Frequency, Space
There is a mathematical way to make a spatial frequency map like that of the diffraction pattern of the object and visualize it: the Fourier transform. The Fourier transform is a general mathematical tool used to describe waveforms based on the frequencies that compose them. As shown in Figure 8.17, a simple sine wave contains a single frequency, while a square wave is the sum of many high- and low-frequency waves. We commonly see this in music displays: Fourier transforms decompose music into frequencies to visualize the treble and bass in a piece.
Figure 8.17 Frequencies and their sum approximate a square wave. A Fourier transform of the square wave would give the frequencies that make it up. Diagram by L. Griffing.
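A minimal sketch of the idea in Figure 8.17: summing the odd sine harmonics of the Fourier series approximates a square wave, and the transform of the square wave shows peaks at exactly those frequencies. The 5-Hz fundamental and the number of harmonics are arbitrary choices.

import numpy as np

t = np.linspace(0, 1, 1000, endpoint=False)           # 1 second sampled at 1 kHz
square = np.sign(np.sin(2 * np.pi * 5 * t))            # 5-Hz square wave

partial = np.zeros_like(t)
for k in (1, 3, 5, 7, 9):                              # first five odd harmonics
    partial += (4 / (np.pi * k)) * np.sin(2 * np.pi * 5 * k * t)
# "partial" already looks square-ish; adding more harmonics sharpens the edges.

spectrum = np.abs(np.fft.rfft(square)) / len(t)        # peaks near 5, 15, 25, ... Hz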
The Fourier transform is a read-out of the object in terms of spatial waves, or cycles, of different frequencies. As we know from Section 4.3, light waves have amplitude, direction, and phase. The Fourier domain can also describe these, but we shall graphically leave out phase because representing it requires complex numbers (Figure 8.18). To do this, we map these characteristics into Fourier space or frequency space. It is also reciprocal space because its coordinates are the reciprocals of the sizes of image elements. Finally, it is also k-space. Yes, the Fourier domain, frequency space, reciprocal space, and k-space are all synonyms. K-space describes the Fourier domain in all advanced imaging modalities, from magnetic resonance imaging to superresolution microscopies. A Fourier transform decomposes an image into high and low spatial frequencies. The sum of these frequencies generates an image (Figure 8.19), just like the sum of waves required to make a square wave in Figure 8.17. A Fourier transform of a self-luminous point can also give the spread of the point and the spatial frequencies needed to describe the point, just like a diffraction pattern. In other words, a Fourier transform of the image of a self-luminous point also produces a PSF. Figure 8.1 shows how the PSF and the diffraction image relate. Unlike the Airy pattern, however, the PSF is 3D, giving information not only about the x-y plane but also about the z-dimension (see Figure 8.16). The amazing thing about optics is that with further projection, the light from the diffraction pattern interferes and comes into focus in the image plane as an image. In the mathematical world, getting an image back from the Fourier transform, computed as the fast Fourier transform (FFT), involves an inverse Fourier transform, computed as the inverse fast Fourier transform (IFFT) (Figure 8.20). Note that the diffraction pattern, or Fourier transform, has the coordinates of k-space (see Figures 8.18 and 8.20). It has the "0," or zero-order, value mapped in the center; values to the right of 0 on kx (up on ky) are positive, and values to the left of 0 on kx (down on ky) are negative. The Fourier transform of the PSF generates a new function, called the optical transfer function (OTF) (Figure 8.21). The OTF without the complex-algebra phase information is the now-familiar MTF (also called the contrast transfer function). The MTF represents how much contrast there is with increasing resolving power of optical systems. The plot of
Figure 8.18 Representation of a single spatial frequency in k-space. The two-dimensional (2D) spatial frequency maps directly onto a 2D k-space coordinate system, giving it a unique value. Adapted from Kurt Thorn. The Regents of the University of California. Used with permission.
Figure 8.19 Multiple spatial frequencies sum to generate an image, in this case of Albert Einstein. Library of Congress / Wikimedia commons / Public domain.
Figure 8.20 The Fourier transform (fast Fourier transform [FFT]) of the image of Einstein produces a diffraction pattern. It is in the Fourier domain, or k-space, with the origin, 0, in the center. An inverse Fourier transform (inverse fast Fourier transform [IFFT]) reproduces the image. Library of Congress / Wikimedia commons / Public domain.
the MTF in Figures 4.4, 5.29, and 5.30 shows how contrast varies with spatial frequency (absolute value), while Figure 8.21 shows how the MTF varies in k-space. Figure 5.29 only shows the MTF for trans-illumination microscopy without contrast modifiers. The MTFs of phase contrast microscopy and differential interference contrast microscopy are quite different (see Sections 16.6 and 16.10, Figures 16.18 and 16.38, respectively). In imaging systems, only certain frequencies are observable (i.e., have any contrast and are non-zero on the lateral MTF; see Section 5.15, Figures 5.29 and 5.30). This generates a k-space map of an observable set of spatial frequencies, the observable region in Figure 8.21, that is circular, with equal radii kr in the ky and kx directions. In Figure 8.22, object frequencies 1 and 2 have observable contrast in kx and ky. Object frequency 3 has no contrast in kx, so it cannot be seen. Object frequency 4 has no contrast in either kx or ky, so it cannot be seen. The MTF values along kx map to the MTF graph in
Figure 8.21 The modulation transfer function (MTF; actually the optical transfer function when including phase information) is the Fourier transform of the point spread function (PSF). A two-dimensional (2D) or three-dimensional intensity plot versus kx and ky represents the MTF. Diagram by L. Griffing.
Figure 8.22 The modulation transfer function (MTF) along the kx axis. The frequencies with observable contrast (non-zero in the MTF) are points 1 and 2. Points 3 and 4 have no observed contrast and are invisible. Adapted from Kurt Thorn. The Regents of the University of California. Used with permission.
Figure 8.23 (A) The modulation transfer function (MTF) in the z-dimension is the Fourier transform of the z-dimension point spread function (PSF). (B) The observable areas in the lateral and axial dimensions limit each other so that the three-dimensional MTF becomes essentially the product of the two. FFT, fast Fourier transform; IFFT, inverse fast Fourier transform. Image by L. Griffing.
Figure 8.21. To map it throughout the circle of the observable region, it would need to be plotted as the three-dimensional intensity representation of the two-dimensional MTF shown in Figure 8.21. Just as the Fourier transform of the lateral PSF (see Figure 8.21) generates the lateral MTF, so the Fourier transform of the axial, z-dimensional PSF generates the axial MTF (Figure 8.23). The axial z-dimensional MTF has a zero value (lowest contrast) at the zero order, unlike the lateral MTF, in which the value is 1 (highest contrast) at the zero order. This occurs because the largest objects do not have high contrast in the depth dimension. If, for example, you are looking through a microscope, you can't distinguish large objects lined up in front of each other. When the edges (higher spatial frequency) of the largest objects are invisible, the objects themselves have no contrast. Thus, axial contrast is high for intermediate-sized objects (decreasing up to the diffraction limit) and low for very large structures. The cross-sectional axial MTF gives the z-dimensional theoretical cut-off frequency, which defines the diffraction limit of resolution. As Figure 8.23 shows, the equation for axial resolution used in practice (see earlier) is somewhat different from that used in theory (the reciprocal of the cut-off frequency in Figure 8.23). Combining the observable region from the lateral MTF with the observable region from the axial MTF generates a 3D MTF (see Figure 8.23B). The 3D MTF is important because it provides the observable contrast in three dimensions. When we discuss structured illumination microscopies and light sheet fluorescence microscopy (see Sections 18.4–18.7), it becomes important to include the phase information with the 3D MTF, and we talk about OTFs instead of MTFs.
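The relationship between the PSF and the MTF can also be sketched numerically. The example below approximates the lateral PSF with a Gaussian (an assumption; a measured PSF image could be substituted) and takes the magnitude of its Fourier transform to obtain a lateral MTF with the zero order at the center of k-space.

```python
# The lateral MTF is the (normalized) magnitude of the Fourier transform of the
# lateral PSF. Here a diffraction-limited PSF is approximated by a Gaussian.
import numpy as np

n = 256                                   # image size in pixels
y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
sigma = 3.0                               # PSF width in pixels (assumed)
psf = np.exp(-(x**2 + y**2) / (2 * sigma**2))
psf /= psf.sum()

otf = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(psf)))  # zero order at center
mtf = np.abs(otf) / np.abs(otf).max()     # drop phase, normalize to 1 at k = 0

# Radial profile along kx through the center: contrast falls with spatial
# frequency until it reaches the cut-off (the edge of the observable region).
profile = mtf[n // 2, n // 2:]
print(profile[:8].round(3))
```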
8.10 The Point Spread Function Convolves the Object to Generate the Image

The 3D PSF is the blur introduced by the lens system. In mathematical form, this blur function operates on the object to produce the blurred image formed by the lens system (Figure 8.24). This operation is convolution (see Section 10.8). The PSF convolves the object to form the image. Because we can describe this operation mathematically, we can also remove the blur introduced by the lens system in the deconvolution operation. Deconvolution achieves a potentially more faithful, de-blurred representation of the object.
Figure 8.24 Object convolution in real space and in k-space during image formation. Note that the convolution operation in k-space is simply multiplication. Library of Congress / Wikimedia commons / Public domain.
Computers can do deconvolution in either real space or Fourier space. If done in real space, the PSF is used. If done in Fourier space, the Fourier transform of the PSF, the MTF, is used. The Fourier transform of the image is the product of the MTF and the Fourier transform of the object (see Figure 8.24). Deconvolution algorithms, in their most basic form, simply divide the Fourier transform of the image by the MTF to generate the Fourier transform of the object. An inverse Fourier transform then reproduces the object (see Figure 8.20), now with less blur. Some of the computer programs used in image processing include convolution (near-neighbor) and deconvolution operations (see Sections 10.4, 10.5, and 10.8). Lens aberration limits the resolution of high-aperture photographic lenses (see Section 5.14). Likewise, aberrations limit the resolution of high-NA microscope lenses. The PSF of a lens reveals some of these aberrations. We now turn to the optical strategies used to correct these aberrations in both glass lenses and electromagnetic lenses.
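A minimal sketch of convolution and naive deconvolution in k-space, with a synthetic object and a Gaussian stand-in for the PSF. Real deconvolution software uses Wiener or iterative algorithms; the small regularization constant here simply keeps the division from blowing up where the OTF is nearly zero.

```python
# Image formation and its inverse in k-space: the Fourier transform of the
# image is the Fourier transform of the object multiplied by the OTF, so a
# naive deconvolution divides by the OTF.
import numpy as np

rng = np.random.default_rng(0)
obj = np.zeros((128, 128))
obj[rng.integers(0, 128, 30), rng.integers(0, 128, 30)] = 1.0   # point-like objects

y, x = np.mgrid[-64:64, -64:64]
psf = np.exp(-(x**2 + y**2) / (2 * 2.0**2))                     # Gaussian stand-in PSF
psf /= psf.sum()

otf = np.fft.fft2(np.fft.ifftshift(psf))
image = np.real(np.fft.ifft2(np.fft.fft2(obj) * otf))           # convolution = blur

eps = 1e-3                                                      # regularization (assumed)
estimate = np.real(np.fft.ifft2(np.fft.fft2(image) * np.conj(otf) /
                                (np.abs(otf)**2 + eps)))        # Wiener-like division

# The deconvolved estimate should be closer to the object than the blurred image.
print(float(np.abs(obj - image).mean()), float(np.abs(obj - estimate).mean()))
```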
8.11 Problems with the Focus of the Lens Arise from Lens Aberrations

Spherical aberration occurs when rays from the outer regions of a converging lens have a shorter focal point than rays coming from the center of the lens (Figure 8.25). Sometimes studio photographers want the "blurry" effect provided by "soft-focus" lenses with spherical aberration. As seen in the spot diagram (a map of where rays intersect the focal plane when the lens is illuminated with collimated light), spherical aberration produces a bright central spot surrounded by a less intense halo. While the halo diffuses the image, the central spot still gives adequate resolution for portraiture. Light microscopy needs correction of spherical aberration. Before correction for spherical aberration, there was a theory called "globulism," which held that organs were made of uniform 3-µm globules. These globules were actually the product of spherical aberration in the microscope objectives. In 1830, Joseph Jackson Lister (father of the physician Joseph Lister, known for his work on antisepsis) debunked globulism by producing a doublet objective, which corrected spherical aberration. He combined a more convex, converging lens with a more concave, diverging lens separated by a specific distance in the objective; the diverging element spread the outer rays, thereby bringing all the rays to a single focal point (Figure 8.26). Another way to reduce spherical aberration is simply to eliminate the outer rays with an aperture, which collects only the central, non-aberrated rays. However, as described earlier, shutting down the aperture reduces the resolving power of the lens. Today, all mid- to high-power modern light microscope objectives have some correction for spherical aberration. Doublet lenses and aspheric lenses, in which the lens is more concave near the edge and more convex in the middle, provide the correction. Computer-aided design and fabrication make these lenses commercially feasible. However, a notorious failure of this technology appeared during the initial fabrication of the Hubble telescope mirror using computer-aided design, which produced a mirror with severe spherical aberration. A compensating mirror installed by space shuttle astronauts corrected the aberration. Electromagnetic lenses used in electron microscopes have severe spherical aberration. As mentioned earlier, spherical aberration of electromagnetic lenses is a major limitation on the resolution of the electron microscope
Figure 8.25 Ray diagram of spherical aberration. The inner rays focus long, and the outer rays focus short. Inset: lateral point spread function (PSF) at focus (A), lateral PSF at focus with spherical aberration (B), axial PSF at focus (C), and axial PSF with spherical aberration (D). PSF insets from Kurt Thorn. Used with permission.
Figure 8.26 Correcting for spherical aberration involves combining a more convex lens with a more concave lens. Diagram by L. Griffing.
(Panel 8.1). Because electromagnetic lenses can only be converging, not diverging, lenses (see Section 8.6), compensating the aberration with another lens is impossible. So instead, most electromagnetic lenses correct for spherical aberration by only using electrons from the center of the lens, not the edges, to produce an image. Hence, in routine electron microscopy, but not in high-resolution TEM (see Section 19.8), very small objective apertures are used, thereby decreasing the resolving power of the microscope. Just as in light microscopy, there is a trade-off between contrast and resolution. These narrow apertures in electromagnetic lenses also produce higher contrast because they absorb the electrons scattered by the object, so strongly scattering regions of the specimen become darker in the electron micrograph. In chromatic aberration, lenses act like prisms. Shorter wavelengths have shorter focal lengths. Blue wavelengths "focus short." Hence, a spot diagram of chromatic aberration at the focal point of red light produces a target with outward rainbow rings of decreasing wavelengths (Figure 8.27). As with spherical aberration, a compensating lens corrects for chromatic aberration. Convex lenses focus blue short, whereas concave lenses focus blue long. When combined, equal but opposite chromatic aberrations cancel each other out, making an achromatic lens, or achromat. In addition to modified lens geometry, a corrected doublet has a net converging action because the glass in each of the elements has a different index of refraction (Figure 8.28). The flint glass concave lens (n = 1.94) is glued to a crown glass convex lens (n = 1.54). Because the concave (diverging) lens element refracts the light less than the convex (converging) lens element, the lens is, overall, a converging lens. In 1733, Chester More Hall
Panel 8.1 Electron microscope resolution and wavelength

Equation 8.13: λ = h / (mv)

in which λ is the de Broglie wavelength of the electron, h is Planck's constant, and m and v are the mass and velocity of the electron.

Equation 8.14: (1/2)mv² = Ve

in which V is the accelerating voltage of the electron and e is the charge, or, doing the algebra and substituting in approximations of Planck's constant and charge,

Equation 8.15: λ = 1.23 / V^(1/2) nm

Wavelengths of EM at different voltages:

Voltage (kV) | Wavelength (nm)
60 | 0.005
100 | 0.0037
1000 | 0.00087
Using the Abbe criterion for resolution, we should get 1 million times better resolution with transmission electron microscopy than with light microscopy (LM), but spherical aberration limits the usable apertures so severely that we only get about 1000–10,000 times better resolution. Instead of a 90-degree aperture with an LM lens, an electron microscopy lens may have less than a 1-degree aperture.
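Equation 8.15 as code, using the panel's non-relativistic approximation. The tabulated wavelengths at 100 kV and 1000 kV include relativistic corrections, so the simple formula overestimates them slightly.

```python
# Electron wavelength from accelerating voltage (Panel 8.1):
# lambda ~ 1.23 / sqrt(V) nm, with V in volts (non-relativistic approximation).
import math

def electron_wavelength_nm(volts):
    return 1.23 / math.sqrt(volts)

for kv in (60, 100, 1000):
    print(f"{kv:5d} kV  ->  {electron_wavelength_nm(kv * 1000):.5f} nm")
```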
Figure 8.27 Ray diagram of chromatic aberration. In the red focal plane, a spot diagram shows how different rays intersect the image plane. Diagram by L. Griffing.
discovered how to correct chromatic aberration using a combination of lenses made of different glasses. In 1791, Francois Beeldsnyder first used achromats in microscopy. The lenses Lister used in the 1830s were achromats. Achromat microscope objectives and condensers have an A inscribed on them (Figure 8.29). Apochromats correct for three colors, not just the two of achromats. Apochromatic lenses have an Apo inscription. Superchromats correct for more than three colors. Lenses that are made from mirrors, catoptric lenses, have no chromatic aberration (see Section 16.4) because the light does not travel through media of differing refractive indices. Not all lenses on the modern research microscope are achromats. Abbe condensers (simple high-NA condensers with a hemispherical top lens) usually have chromatic aberration. Chromatic aberration is visible when focusing Abbe condensers during Köhler illumination (see Section 9.9, Figure 9.16). The field iris has an orange-red fringe when the condenser is farther from the slide and a blue fringe when closer to the slide.
Figure 8.28 An achromat lens is made by cementing together two differently ground lenses (one concave and one convex) made of different glasses, flint and crown glass. Diagram by L. Griffing.
Figure 8.29 Typical inscriptions on the objective mount include aberration correction, magnification, numerical aperture, and at the bottom, the distance to the primary image plane (optical tube length) and the coverslip thickness. Most research microscopes have infinity-corrected optics, designated by the infinity symbol for the optical tube length. Infinity-corrected lenses do not produce an image unless a tube lens is present. Diagram by L. Griffing.
In electron microscopy, electrons change their velocity during focusing or as they interact with a specimen. Different electron velocities produce different wavelengths and a blurry image because high-velocity, short-wavelength electrons come into focus at a different point than low-velocity, long-wavelength electrons (Figure 8.30), creating chromatic aberration. Decreasing section thickness and minimizing voltage fluctuations in the electromagnetic lenses reduce chromatic aberration in TEM. In photography, apochromats were rare until the advent of computer-aided design of lenses. Now apochromats and superchromats are commercially available for single-lens reflex cameras. However, even with that correction, transverse chromatic aberration, or lateral color, occurs with magnification and is most severe in long-focal-length lenses. The projection of the red image will be larger than the blue. Both images will be in focus but will be different sizes, giving the effect of a rainbow fringe. Even lenses corrected for spherical aberration do not focus the projected image in a flat plane but in a slightly curved plane. This makes an image with a focused periphery hard to capture on a flat video chip. When the center is in focus, the edges are not, and vice versa. Constructing the lens of multiple elements with specially formulated optical glass and aspheric surfaces corrects for field curvature, producing a flat field. These are Plan or Plano lenses, and flat-field microscope lenses have this inscription (see
Figure 8.30 Chromatic aberration in electron microscopy. Electrons of different velocities (wavelengths) focus in different planes. Diagram by L. Griffing.
Figure 8.29). Those corrected for both chromatic aberration and flatness of field have a Planapo inscription. In photography, Plan lenses are required for flat copy work. Research microscopes use plan lenses, while teaching microscopes often do not. Anyone who has used a cheap magnifying glass in the sun has seen coma. Coma is short for comatic aberration. It occurs when off-axis rays come to focus on the surface of a cone instead of in a plane, as shown by the spot diagram in Figure 8.31. They spread out and defocus in this asymmetrical way, giving rise to a comet-shaped spot diagram. Correcting for coma uses only the center region of the lens or requires careful design of the radii of curvature of the lens. Wide-angle photographic lenses with a large field of view are particularly prone to coma. Astigmatism occurs when a lens has greater strength in one axis than in the other. Different axes of the lens have different focal points, producing a final image of a circular point object that is an ellipse. Not only are the PSFs asymmetric with astigmatism, but the Fourier transform of the image itself is also asymmetrical (Figure 8.32). Astigmatism is generally not a problem in light microscopy but is a common problem in the eye and in electromagnetic lenses used for electron microscopy. A lens with compensating astigmatism corrects for it in the eye. Likewise, in the electron microscope, a set of coils, the stigmators, sits below the lens and produces an asymmetric field that compensates for the astigmatic lens (see Section 19.8).
Figure 8.31 Comatic aberration results from off-axis illumination. Correction requires equal magnification from the different zones of the lens for off-axis points, which is met when the lens satisfies Abbe's sine condition: Equation 8.16: sin α / sin α' = s'/s, in which α is the half-angle of the aperture in object space, α' is the half-angle of the aperture in image space, s' is the distance to the image, and s is the distance to the object. Inset: point spread function (PSF) at focus (A) and PSF at focus with comatic aberration (B). Diagram by L. Griffing. PSF insets from Kurt Thorn. The Regents of the University of California. Used with permission.
Figure 8.32 (A) An image of cellular tissue not showing astigmatism. The fast Fourier transform (FFT) of the image is symmetrical along both the x’ and y’ axes. (B) An image of the same tissue showing astigmatism. The FFT has a narrower profile in the x’ direction than in the y’ direction. Adapted from Russ J. C. 2007.
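The asymmetry in Figure 8.32B suggests a simple numerical check for astigmatism: compare the spread of the image power spectrum along the two frequency axes. The second-moment metric below is an illustrative choice, not a standard criterion.

```python
# A rough check for astigmatism in the spirit of Figure 8.32: a strongly
# unequal spread of the power spectrum along kx and ky suggests an
# asymmetric (astigmatic) transfer of detail.
import numpy as np

def spectrum_anisotropy(img):
    power = np.abs(np.fft.fftshift(np.fft.fft2(img)))**2
    ky, kx = np.indices(power.shape)
    ky = ky - power.shape[0] // 2
    kx = kx - power.shape[1] // 2
    total = power.sum()
    spread_x = np.sqrt((power * kx**2).sum() / total)   # second moment along kx
    spread_y = np.sqrt((power * ky**2).sum() / total)   # second moment along ky
    return spread_x / spread_y                          # ~1.0 for a symmetric FFT

rng = np.random.default_rng(1)
img = rng.random((256, 256))                            # isotropic test image
print(round(spectrum_anisotropy(img), 3))               # close to 1.0
```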
Distortion occurs when the magnification is greater in one region of a lens than in another. The higher magnification occurring at the periphery of the lens produces barrel distortion. Lower magnification at the periphery produces pincushion distortion. As with astigmatism, compensating for distortion uses another lens with counter-distortion. Or, as discussed later, a deformable mirror corrects for distortion in adaptive light optics. Distortion could be a problem in electron optics, but the addition of compensating lens elements eliminates it. There are also digital approaches to correct for distortion (see Section 11.2, Figure 11.1).
8.12 Refractive Index Mismatch in the Sample Produces Spherical Aberration

It is important to realize that all of the elements of the optical train contribute to image formation, including the sample itself. For example, in light microscopy, objective lenses have corrections for the thickness and the composition of coverslips, which lie between the objective and the sample. Coverslips are generally made of glass, but suppliers sell a large number of different thicknesses. For most microscopy, a coverslip thickness of 0.175 mm is optimal and sold as #1.5 by suppliers. The number (usually 0.17) on the objective (see Figure 8.29) is the thickness of the coverslip for which the objective is corrected. Objectives without coverslip correction have a hyphen (-) on the objective. Below the coverslip is the sample, which also has its own optical properties. Matching the refractive indices of the sample, coverslip, and immersion medium for the objective is ideal and minimizes spherical aberration (Figure 8.33). Even in that ideal situation, however, the sample usually contains many different refractive indices. Air bubbles produce severe spherical aberration. For living samples, infusing the sample with an aqueous medium and using a water-immersion objective matches the refractive index of the sample with the immersion medium and minimizes refractive differences produced by air–water interfaces in the sample. Likewise, when viewing living samples with oil-immersion objectives, perfusion with oxygenated perfluorocarbons (e.g., perfluorodecalin, n = 1.3) is desirable. (Note, however, that perfluorinated compounds are on the Stockholm list of persistent organic pollutants.) Achieving deeper epi-illumination penetration of the light into a sample is also possible with silicone-immersion objectives because the refractive index of silicone oil is nearer to that of living tissue than the refractive index of immersion oil. With a mismatched refractive index between the sample and the other optical components, the actual z-dimensional location of the object is the product of the z step size and the ratio of the sample refractive index to the objective immersion medium refractive index.
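The depth correction stated above as a one-line function; the example refractive indices for an aqueous sample under an oil-immersion objective are typical values, not values taken from the text.

```python
# Depth correction for refractive index mismatch: the actual axial position is
# the nominal focus step multiplied by the ratio of the sample refractive index
# to the immersion-medium refractive index (as stated in the text).
def actual_z(nominal_z_um, n_sample, n_immersion):
    return nominal_z_um * (n_sample / n_immersion)

# Example: aqueous sample (n ~ 1.33) imaged with an oil objective (n ~ 1.518).
print(actual_z(10.0, 1.33, 1.518))   # a nominal 10-um step corresponds to ~8.8 um
```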
Figure 8.33 Spherical aberration results when the specimen refractive index does not match the refractive index of the immersion medium. It gets worse when the light comes from deeper in the sample. Point spread function (PSF) insets from Kurt Thorn. Used with permission.
8.13 Adaptive Optics Compensate for Refractive Index Changes and Aberration Introduced by Thick Samples

To correct for a variety of thicknesses of glass between the sample and the objective, objectives have adjustable correction collars (Figure 8.34A). Similar correction collars compensate for sample refractive mismatch with depth. Several microscope companies now offer motorized correction collars that compensate for spherical aberration with depth (Figure 8.34B). These kinds of adaptive optics (AO) operate by feedback strategies similar to autofocus, which maximizes sharpness (pixel standard deviation in the histogram; see Section 3.8, Figure 3.16), brightness, or contrast in the image through focus adjustment. With AO, automatic collar adjustment maximizes image sharpness, brightness, or contrast. Correction collars compensate with an average correction for aberration across the plane of focus. Aberrations, however, occur across the entire field of view, distorting the wave front of light as it penetrates the sample and producing aberrated focal points within the focal plane (Figure 8.35A). Cellular features such as nuclei have much higher refractive indices than their surrounding cytosol. To compensate for the aberrations introduced by examining thick, heterogeneous samples, a deformable mirror can distort the incoming light (Figure 8.35B). Deformable mirrors have reflective membrane surfaces controlled by actuators beneath the mirror membrane. For microscopy AO, there are usually fewer than 100 actuators per mirror. The mirror can distort the light and thereby correct its focus in the sample. Image feature feedback controls the amount of distortion introduced by the deformable mirror. This is a form of indirect wave front sensing. Aberrations produce an altered wave front in the rear exit pupil of the objective (Figure 8.36A). Internal standards, or guide stars (named after the stars used as standards in astronomy), of known size and shape have characteristic wave fronts (Figure 8.36B), with different, known aberrations. Guide stars in light microscopy can be injected fluorescent beads, fluorescent organelles of known size and shape (e.g., peroxisomes), or, in multi-photon microscopy (see Section 17.12), the region of excitation. Using guide stars in fluorescence microscopy, direct wave front sensing analyzes the wave front of light emitted from a fluorescent sample. The Shack–Hartmann (SH) sensor has an array of micro lenses that focus the light rays from different regions of the wave front onto a camera. This assumes that there are sufficient "ballistic" photons from the sample that
Figure 8.34 (A) Olympus 40× objective with high transmission of ultraviolet (UV) light, corrected for flat field (Plan) and chromatic aberration (Apo), with a numerical aperture of 0.85, an optical tube length of 160 mm, and a correction collar for coverslips between 0.11 and 0.23 mm thick. (B) A cut-away of a Leica objective with a motorized correction collar. Examples of aberrated (C) and aberration-corrected (D) images achieved with different settings of the correction collar. Scale bar = 5 µm. A by L. Griffing. B Leica Microsystems CMS GmbH.
Figure 8.35 (A) Comparison of light passing through water and focusing on a focal plane with and without the presence of a sample that produces aberration. There is no coverslip. This is a water-immersion objective. (B) Indirect wave front sensing and correction of the aberration across the focal plane with the feedback into the deformable mirror from the image metrics of brightness, sharpness, or contrast. Diagram by L. Griffing.
are not deviated. Comparison of the displacement of the micro lens focal points from a guide star in the aberrated sample with that of an aberration-free guide star detects the change in the wave front (Figure 8.37). A deformable mirror can compensate for the aberrated wave front detected by the SH wave front sensor (Figure 8.38). In this case, the deformable mirror is initially not deformed, and the aberrated signal is sent to the SH wave front sensor. The wave front sensor then changes the deformable mirror to compensate for the aberration, and the beam splitter passes the corrected wave front to the camera. Figure 8.39 shows an example of AO applied to a cell within a zebrafish brain. Other examples of AO applied to superresolution images are in Section 18.5, Figure 18.8.
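A minimal sketch of the indirect (image-metric) feedback idea: score candidate corrector settings by image sharpness (pixel standard deviation) and keep the best. The apply_correction function is a hypothetical stand-in for driving the correction collar or deformable mirror and grabbing a fresh exposure.

```python
# Indirect wave-front sensing uses an image metric as feedback. Here the metric
# is the pixel standard deviation (sharpness) mentioned in the text.
import numpy as np

def sharpness(img):
    return float(np.std(img))            # higher standard deviation = sharper

def optimize_correction(apply_correction, settings):
    scores = [(sharpness(apply_correction(s)), s) for s in settings]
    return max(scores)                   # (best score, best setting)

# Example with a fake system in which a setting of 0.3 gives the sharpest image:
rng = np.random.default_rng(2)
fake = lambda s: rng.normal(scale=1.0 - abs(s - 0.3), size=(64, 64))
print(optimize_correction(fake, np.linspace(0, 1, 11)))
```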
Figure 8.36 (A) Diagrammatic comparison of an object or a guide star producing a wave front with and without aberration. (B) Wave-front representations of different aberrations. Wave-front diagrams by Kurt Thorn. Used with permission.
Figure 8.37 Sensing the wave front by a Shack-Hartmann sensor. Relatively non-aberrated wave fronts from near the surface of the specimen produce a discrete series of focal points, whereas deviated light from deeper in the sample, with more aberrations, produces a blur. The deformable mirror (DM) compensates using the change in the separation of the focal points. Adapted from Ji, N. 2017. Adaptive optical fluorescence microscopy. Nature Methods 14: 374–380. doi:10.1038/nmeth.4218. Used with permission.
Figure 8.38 Using a Shack-Hartmann (SH) sensor with a deformable mirror. The SH sensor detects the aberrated light coming from a guide star in the specimen. The feedback from the comparison between the aberrated and un-aberrated guide star adjusts the deformable mirror to produce a corrected wave front. Diagram by L. Griffing.
Figure 8.39 A cell from about 150 µm deep in the zebrafish hindbrain. (A) Without adaptive optics (AO). (B) With AO and deconvolution. Mitochondria are magenta, and the plasma membrane is green. Scale bar = 5 µm. From Marx, V. 2017. Microscopy: hello, adaptive optics. Nature Methods 14: 1133–1136. Used with permission.
Annotated Images, Video, Web Sites, and References

8.1 Optical Mechanics Can Be Well Described Mathematically
For a textbook on Fourier optics, see Goodman, J.W. 2005. Introduction to Fourier Optics. Third Edition. Roberts and Company, Englewood, CO. For a textbook on optics, see Born, M. and Wolf, E. 1980. Principles of Optics. Sixth Edition (corrected). Pergamon Press, Oxford, UK.

8.2 A Lens Divides Space Into Image and Object Spaces
Lensless holography, which uses spatial light modulators or photographic plates to form real or virtual images (see Section 4.8), can also produce image space.
8.3 The Lens Aperture Determines How Well the Lens Collects Radiation
In microscopy, the object-side aperture is the NA.

8.4 The Diffraction Limit and the Contrast Between Two Closely Spaced Self-luminous Spots Give Rise to the Limits of Resolution
Figure 8.8 is an adaptation of several slides by Kurt Thorn: http://nic.ucsf.edu/dokuwiki/lib/exe/fetch.php?media=lecture_2_-_kthorn.pptx. Another treatment of the diffraction limit of optical microscopy is found in Inoue, S. and Spring, K.R. 1997. Video Microscopy. Second Edition. Plenum Press, New York, NY. pp. 26–32. Another more geometric discussion is in Slayter, E.M. 1976. Optical Methods in Biology. Robert E. Kreiger Publishing Company, Huntington, NY. pp. 237–248. An extensive discussion of the two-slit experiment is in Sanderson, J. 2018. Understanding Light Microscopy. John Wiley & Sons, Hoboken, NJ. pp. 216–230. There is an iBiology video by Lichtman, J. (2012) that uses Huygens wavelets to describe diffraction from two slits at https://www.ibiology.org/talks/diffraction. Lichtman continues by using wavelets to construct the point spread function at https://www.ibiology.org/talks/point-spread-function.

8.5 The Depth of the Three-Dimensional Slice of Object Space Remaining in Focus Is the Depth of Field
This applet shows how the Gaussian lens formula works: http://graphics.stanford.edu/courses/cs178-11/applets/gaussian.html. This applet shows how it relates to depth of field: http://graphics.stanford.edu/courses/cs178-11/applets/dof.html.

8.6 In Electromagnetic Lenses, Focal Length Produces Focus and Magnification
A good description of electromagnetic lenses and their similarities and differences to light lenses is in Bozzola, J.J. and Russell, L.D. 1999. Electron Microscopy: Principles and Techniques for Biologists. Jones and Bartlett Publishers, Boston, MA. pp. 149–162.

8.7 The Axial, Z-Dimensional, Point Spread Function Is a Measure of the Axial Resolution of High Numerical Aperture Lenses
Figure 8.16 uses a through-focus series of the PSF from Kurt Thorn: http://nic.ucsf.edu/dokuwiki/lib/exe/fetch.php?media=lecture_2_-_kthorn.pptx. In Figure 8.16, the axial resolution limit shown is a quarter of the distance between the first two minima. Keller H.A. 1995. Objective lenses for confocal microscopy. In Handbook of Biological Confocal Microscopy. Second Edition. Ed. by J.B. Pawley. pp. 111–126.

8.8 Numerical Aperture and Magnification Determine the Light-Gathering Properties of the Microscope Objective
Table 8.1 covers calculation of light-gathering power of objectives for representative lenses. For a more complete listing of actual lenses, see Table 3.3 in Inoue, S. and Spring, K.R. 1997. Video Microscopy. Second Edition. Plenum Press, New York, NY. p. 138. Discussion of the light-gathering properties of microscope objectives is often in chapters on fluorescence microscopy, where light is limiting. More information on specialized objectives for fluorescence is in Section 18.9.

8.9 The Modulation (Contrast) Transfer Function Relates the Relative Contrast to Resolving Power in Fourier, or Frequency, Space
Figures 8.18–8.23 are adapted from slides by Kurt Thorn in https://wiki.library.ucsf.edu/download/attachments/517198168/Tues-Kurt-microscopy_ii.pptx.
There is a recommended video introduction to the Fourier transform by Huang, B. 2012. https://www.ibiology.org/talks/fourier-transform.
8.10 The Point Spread Function Convolves the Object to Generate the Image
Convolution is a very important concept. A more complete treatment is in Chapter 10. Non-mathematical descriptions of this are rare, but here is one. Imagine a slide projector with perfect optics projecting a high-resolution input image. If we de-focus the projector optics, then every pixel (point) on the slide becomes a small light disk on the screen. This is the so-called point response (point spread function) of the projector. By producing this disk from a point, the defocused optics of the projector convolve the input image into the projected output image. When the point response is applied to (i.e., convolves) each pixel in the input image, it produces the defocused output image.

8.11 Problems with the Focus of the Lens Arise from Lens Aberrations
A treatment of aberrations in the light microscope is in Chapter 6 of Sanderson, J. 2018. Understanding Light Microscopy. John Wiley & Sons, Hoboken, NJ. pp. 101–125. A good introduction to the design of lenses to compensate for aberration is in A Gentle Guide to Optical Design by Bruce Irving at https://www.synopsys.com/optical-solutions/learn/gentle-intro-to-optical-design.html. The globulist controversy is interesting in the history of science; see Schickore, J. 2009. Error in historiographical challenge: the infamous globule hypothesis. In Going Amiss in Experimental Research. Ed. by G. Hon, J. Schickore, and F. Steinle. Philosophy of Science 267: 27–44. doi: 10.1007/978-1-4020-8893-3_3. See Joseph Jackson Lister's microscope at http://collection.sciencemuseum.org.uk/objects/co440642/joseph-jacksonlisters-microscope-london-england-1826-compound-microscope. Figure 8.32 is adapted from Figures 6.17 and 6.18 in Russ, J. 2007. The Image Processing Handbook. Fifth Edition. CRC Press, Boca Raton, FL. p. 352. For more on objective corrections, see https://www.olympus-lifescience.com/en/microscope-resource/primer/anatomy/objectives.

8.12 Refractive Index Mismatch in the Sample Produces Spherical Aberration
Figure 8.34 C and D are from Kurt Thorn's presentation: http://nic.ucsf.edu/dokuwiki/lib/exe/fetch.php?media=lecture_2_-_kthorn.pptx.

8.13 Adaptive Optics Compensate for Refractive Index Changes and Aberration Introduced by Thick Samples
See Ji, N. 2017. Adaptive optical fluorescence microscopy. Nature Methods 14: 374–380. doi:10.1038/nmeth.4218. Figure 8.39 is from the Betzig lab and published in Marx, V. 2017. Microscopy: hello, adaptive optics. Nature Methods 14: 1133–1136. An amusing quote is "Imaging tissues with many nuclei is like imaging through a bag of marbles."
9 Contrast and Tone Control

9.1 The Subject Determines the Lighting

An outdoor biologist may be recording new macro-level environmental data where artificial lights are not an option. There, diffuser screens, filters, and carefully selected apertures and exposure times modulate the nature of the light. Most indoor biologists use artificial lighting. The influence of light on the subject determines the type and use of lighting. As seen in Figure 9.1, some creatures are phototaxic, either moving toward the light or away from it. Knowing how light affects the subject is imperative. Likewise, knowing what feature in the scene is the object of interest can determine the lighting. Lighting would be very different for photographing the rearing head of a grizzly, its snapping jaws, a single tooth, or a microbe in its gaping maw. Timing is an important key to lighting. Natural lighting varies as clouds cross the sky or as the sun or moon changes position. To photograph moving objects without blur, the intensity of illumination must be high and the exposure brief. Optimizing lighting means providing enough light to capture the feature within the desired exposure time, with good tonal range, while providing excellent contrast of the object against its background. Light quality refers to how well the light generates contrast of the object of interest. To control the light quality, there are several options, such as changing the intensity of the light, its degree of collimation, or its color. All of these interact with the reflectance of the object, its angle of illumination, and its background lighting. There are many options for achieving proper light quality through different adjustments of the optics, light, filters, and camera settings for
Figure 9.1 Phototropism of the sporangiophore (fruiting body) of the zygomycete Phycomyces blakesleeanus. This shows the curvature of the Phycomyces toward a light source on the left. There is a 20-minute interval between each exposure. The stalk is about 150 µm across. The stalk moves in a helical fashion, as seen by the movement of the fiber attached to the tip. From Ortega, J.K.E., Mohan, R.P., Munoz, C.M., et al. 2021. Helical growth during the phototropic response, avoidance response, and in stiff mutants of Phycomyces blakesleeanus. Sci Rep 11 3653. Used with permission.
common macro- and microphotography. To maximize the amount of information in a scene, it is important to illuminate it so that the camera can adequately capture the information and to set the camera to maximize the tonal range within the image without over- or underexposure of the region of interest.
9.2 Light Measurements Use Two Different Standards: Photometric and Radiometric Units

When buying a light bulb, the value used to judge its brightness is the amount of power it consumes in watts: a 100-watt incandescent, tungsten light, a 25-watt compact fluorescent light (CFL), or a 10-watt light-emitting diode (LED). The watt is a radiometric unit. However, the intensity of light seen by the eye is given in lux (or foot candles; see Section 4.1, Figure 4.3). How do these different units relate to the intensity of the light sources? Consider first the radiometric units. The wattage on a commercial light bulb is not a measure of the light intensity it produces but of its operating power. In general, increasing the operating power of a light increases its intensity; a 25-watt CFL is brighter than a 10-watt CFL. However, a 10-watt LED can be as bright as a 25-watt CFL or a 100-watt incandescent light. Confusingly, the watt (W), the International System of Units (SI) unit of power (Table 9.1), is the measure of the amount of energy (joules) per unit time (seconds) of the light produced, its radiant power or radiant flux, while also being the amount of energy per unit time required for its operation. The light produced may have different wavelengths. Different wavelengths have different energies (see Section 4.3, Figure 4.6), so the power of a particular light source depends on the spectral range of wavelengths of light it emits (Figure 9.2). This is not hard to calculate for monochromatic sources, such as lasers, but calculating the power of polychromatic light sources requires knowledge of the spectrum of light emitted. Furthermore, the radiant intensity of light has to take into account the light power emanating from a source into three-dimensional (3D) space, the watts/steradian, in which a steradian is the solid angle that subtends an area that is the square of the radius, r, of the sphere surrounding the source (Figure 9.3). The amount of power per unit area is the irradiance of light. The irradiance of a fairly puny laser can be more than that of the sun. The solar irradiance reaching the outer atmosphere is 1400 W/m². The 0.8-mW HeNe laser in a supermarket bar-code reader has a beam diameter of 0.8 mm and therefore a beam area of 5 × 10⁻⁷ m², thus producing an irradiance of 1600 W/m², 200 W/m² greater than the solar irradiance. Notice that all of these radiometric units have radiant in their names.
Table 9.1 Units of Radiometric and Photometric Measurement.

Radiometric SI Units
Quantity | SI Unit | Notes
Radiant energy | joule (J) | Energy
Radiant flux | watt (W) (= J/sec) | Also called radiant power
Radiant intensity | watt/steradian (W/sr) | Watt per unit solid angle
Spectral radiance | watt/square meter/nm | Intensity as a function of wavelength (see Figure 10.3)
Irradiance | watt/square meter | Power incident on a surface, radiant flux density

Photometric SI Units
Quantity | SI Unit | Notes
Luminous energy | lumen * second | Unit also called a talbot
Luminous flux | lumen (= cd*sr) | Also called luminous power
Luminous intensity | candela (= lm/sr) | An SI base unit (a)
Luminance | cd/m² | Mean luminous density is cd/mm²; also called a nit
Illuminance | lux = lumens/m² | Light incident on a surface; lamberts = lumens/cm²
Luminous emittance | lux | Light emitted from a surface

(a) The candela is the luminous intensity, in a given direction, of a source that emits monochromatic radiation of frequency 540 × 10¹² hertz and that has a radiant intensity in that direction of 1/683 watt per steradian. The luminous intensity in 1/60 cm² projected area of an ideal blackbody radiator, in any direction, at a temperature of 2045 K is also one candela. SI, International System.
Figure 9.2 Wavelength dependence of intensities for common microscopy light sources. The metal halide has a similar spectrum to the mercury arc, except it has a higher intensity in the 450- to 520-nm region. The tungsten–halogen intensity is 10 times the actual value and is for a filament operated at 3200 K. The xenon arc source has the most constant spectral characteristics in the visible range. Adaptation of Figure 5 in https://zeiss-campus.magnet.fsu.edu/articles/lightsources/lightsourcefundamentals.html.
The basis for photometric units of light is the ability of people to detect a certain light intensity. As described in Section 4.3, we can only detect light in the visible spectrum; the relative visibility of light peaks at 555 nm for photopic vision and falls off at lower and higher wavelengths, producing a curve called the luminous efficacy (Figure 9.4). Consequently, the infrared (IR) or ultraviolet (UV) light glowing from a lamp is invisible to our eyes and would have zero luminous efficacy and zero value using photometric units of measure. In photometric units, the luminous intensity of light through the area, r², of the sphere's surface is one candela, cd (from the Latin for candle), the SI unit that replaced the term candle. If the sphere surrounding the light source has a radius of 1 m, then the intensity through the square meter of surface is 1 lux; likewise, a sphere of 1 foot surrounding the source produces 1 foot candle through the square foot of surface area (see Figure 9.3), but the foot candle is not a standard international unit. A candela, originally standardized according to a specific whale-oil candle recipe (but see Table 9.1 for the current definition), gives off a luminous power (or flux) of 4π lumens, the surface area of the sphere, 4πr², divided by r². In other words, a lumen is the photometric power of light in the unit solid angle, the steradian (see Figure 9.3). Because power is energy per unit time, lumens have a "per second" reference value, as do watts. Luminous energy (lumen × seconds) is the talbot, the photometric analogue of the joule. Dividing the candelas, or lumens/steradian, by the spectral luminous efficacy at that wavelength gives the conversion to watts/steradian (see Figure 9.4).

Figure 9.3 Production of one candela of luminous intensity by one lumen of light occurs through a solid angle, the steradian, defined as the area subtended by the square of the radius of a sphere centered on the source. Diagram by L. Griffing.

The moon on a clear night is about 250 cd/ft², which is 0.0026 cd/mm². A tungsten–halogen incandescent light used for microscopy is about 20,000 times brighter, at 45 cd/mm². Photographing the two would require quite different exposure times, ISO settings (see Section 5.11), and apertures (f-stop setting; see Section 5.14)! The challenge with chip-based cameras in low light is to overcome the noise of the chip (see Section 5.11, Panel 5.3) and collect enough light in the low-light regions without reaching saturation level in the high-light regions. It is particularly a challenge for cell phones that have small pixel sizes. There is an outdoor photography rule of thumb for setting exposure times and apertures without an exposure meter, based
Figure 9.4 The spectral luminous efficacy for detecting light with our eyes. The peak is at 555 nm, the same peak as our photopic vision. An example using it would be to calculate the radiant intensity in mW/sr of a red light-emitting diode (LED) that has a luminous intensity of 5000 mcd (or 5000 mlm/sr) and a peak emission wavelength of 625 nm. Using the graph, the spectral luminous efficacy at 625 nm is 200 mlm/mW, so the radiant intensity is 5000 mlm/sr ÷ 200 mlm/mW, or 25 mW/sr. Diagram by L. Griffing.
on the illuminance of the landscape, the sunny 16 rule. It states that "On a sunny day, set the aperture to f/16 and shutter speed to the reciprocal of the ISO setting for a subject in direct sunlight." So, if it were a sunny day and the ISO were 100, then the camera aperture should be f/16, and the exposure should be 1/100 of a second. Variations in this rule are based on the fact that opening the aperture by one stop (e.g., to f/11) doubles the amount of light collected by the camera (see Table 5.6, Section 5.14), so the exposure should then be 1/200 of a second. Table 9.2 supplies aperture settings based on the illuminance of the landscape in lux. These are aperture settings like those given by incident light meters, which detect light shining on the subject. If one were to continue the table into a landscape illuminated with moonlight, 0.1 lux, it would require 18 f-numbers below f/16, clearly an impossibility. So, in low-light scenes, such as the full moon, or in low-light (fluorescent, darkfield, polarized light) microscopy (see Chapters 16 and 17), choosing exposure and choosing ISO settings are the only alternatives. These are best set with exposure meters that are usually part of digital single-lens reflex (DSLR) cameras (see Section 5.14, Figure 5.26), which are reflective light meters, detecting the average light coming from the field of view. Dark objects on a bright background (i.e., backlighting) or bright objects on a dark background, encountered in darkfield and fluorescence microscopy, require metering at specific regions in the field of view (Figure 9.5). Measurement of light relies on having a calibrated system response, which relates output to input across the spectral range of interest. The spectral luminous efficacy curve approximates the system response of the human eye. Photometrically, this is the illuminance (lux, lumens/m²) per nm, and radiometrically, it is irradiance (watts/m²) per nm, or spectral radiance (see Table 9.1). Light meters that life scientists use to measure the radiometric value of incident light often express
Table 9.2 Aperture Settings Based on the Sunny 16 Rule.

Condition | Lux | Aperture | Shadows
Sand or snow | 200,000 | f/22 | Dark and distinct
Sunny | 100,000 | f/16 | Distinct
Slight overcast | 50,000 | f/11 | Soft edges
Overcast | 20,000 | f/8 | Barely visible
Shade | 5–10,000 | f/5.6 | No shadows
Heavy overcast | 2–4000 | f/4 | No shadows
Sunset or sunrise | 500–1000 | f/2.8 or add flash | No shadows without flash
Office space | 300–500 | f/2 or add flash | No shadows without flash
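The sunny 16 rule and its one-stop variations as a small function; the f-stop sequence follows Table 9.2, and the function assumes a sunlit subject metered by rule of thumb rather than by an exposure meter.

```python
# The sunny 16 rule as code: at f/16 the shutter speed is the reciprocal of the
# ISO; each full stop wider doubles the light, so the shutter time halves.
F_STOPS = [22, 16, 11, 8, 5.6, 4, 2.8, 2]

def sunny16_shutter(iso, aperture):
    """Approximate shutter time (seconds) for a sunlit subject."""
    stops_wider_than_f16 = F_STOPS.index(aperture) - F_STOPS.index(16)
    return (1.0 / iso) / (2 ** stops_wider_than_f16)

print(sunny16_shutter(100, 16))    # 0.01  = 1/100 s, the textbook example
print(sunny16_shutter(100, 11))    # 0.005 = 1/200 s, one stop wider
```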
Figure 9.5 Example metering modes for a digital single-lens reflex (DSLR) camera (Canon 500D). The gray area is the metered region of the image. (A) Evaluative metering. The camera sets the exposure based on light detected across the field. (B) Partial metering. Light in a smaller region sets the exposure. (C) Spot metering. Light in a central spot sets the exposure. (D) Center-weighted average metering. Light averaged over the scene and weighted at the center sets the exposure. From Canon Inc.
Figure 9.6 Light intensity imposes limits on detection and resolution. If it is too dark to detect, then, of course, it is invisible! Adapted from Weiss, D.G., Maile, W., and Wick, R.A. 1989. Video microscopy. In: Light Microscopy in Biology: A Practical Approach. Ed. by A.J. Lacey. IRL Press at Oxford University Press. Oxford, New York, NY. p. 222.
their values in einsteins per square centimeter per second, or moles of photons per area per second. For example, this is the common unit of expression for determining the amount of photosynthetically active radiation supplied to a plant. Using them as incident light meters for photography works (Panel 9.1), with knowledge of the spectrum of light detected by the meter. Conversely, photographic meters reporting photometric lux units give values, upon conversion, in einsteins per square centimeter per second, but only for the visible portion of the spectrum. Photometric light meters are only sensitive in the visible range. So, beware: if the light is in the UV or IR range, these meters may not report light that could be harmful to the subject or change its behavior. Using complementary metal oxide semiconductor (CMOS) camera chips as light meters (e.g., the chips in cell phone cameras) is great for visible light because they have a linear gamma (see Section 5.11, Figure 5.15). However, they are much less sensitive to UV and more sensitive to IR than to visible light (see Section 5.10, Figure 5.13). Consequently, they generally have IR-blocking filters.
Panel 9.1 Converting watts to einsteins and lux to gray level
1) Watts to einsteins. The power of one photon of light is very small, so it is customary to talk about the power in 1 mole of photons, also called 1 einstein of quanta. One einstein per second of blue light at 480 nm carries 2.5 × 10⁵ W (J/sec), or 1 W = 4 × 10⁻⁶ einsteins/sec.
2) Lux to gray level. Although the following is an approximation, it shows what is necessary to make this calculation.
a) The spectral luminous efficacy: On a sunny day, there is 100,000 lux, or 100,000 lumens/m². Assuming that 1% of the light is blue light of 480-nm wavelength, or about 1,000 lux, then with an average spectral luminous efficacy of 100 lumens/watt, this converts to 10 W/m², or 1 × 10⁻³ W/cm².
b) Watts to einsteins to photons: There are 4 × 10⁻⁶ einsteins/cm²/sec in 1 W/cm², so there are 4 × 10⁻⁹ einsteins/cm²/sec in 1 mW/cm². There are 6 × 10²³ photons/einstein, so there are about 24 × 10¹⁴ photons/cm²/sec, or 24 × 10⁶ photons/µm²/sec.
c) Pixel size and quantum efficiency: If the pixel size is 2 µm (an area of 4 µm²), then there are about 100 × 10⁶ photons/pixel-sec. At a 40% quantum efficiency, there would be 40 × 10⁶ electrons/pixel-sec.
d) Exposure and aperture: At an exposure of 1/100 of a second, there would be about 4 × 10⁵ electrons/pixel. With the 64-fold reduction in light from an f/16 aperture, there would be about 6250 electrons/pixel.
e) Pixel bit depth and full well capacity: On an 8-bit luminosity scale from a 55,000-electron full-well-capacity pixel, this would be a gray level (at 215 electrons per gray level) of about 30 for that wavelength of light.
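The chain of conversions in Panel 9.1, written as a small function. The constants (pixel area, quantum efficiency, exposure, 64-fold aperture reduction, 55,000-electron full well, 8 bits) are the panel's example values; with the unrounded photon count the result comes out near 28 rather than the panel's rounded 30.

```python
# The Panel 9.1 chain of unit conversions as code, for 480-nm blue light.
AVOGADRO = 6.0e23            # photons per einstein (rounded as in the panel)

def photons_per_um2_per_s(irradiance_w_per_cm2, einsteins_per_joule=4e-6):
    einsteins = irradiance_w_per_cm2 * einsteins_per_joule      # einstein/cm2/s
    return einsteins * AVOGADRO / 1e8                           # 1e8 um2 per cm2

def gray_level(irradiance_w_per_cm2, pixel_area_um2=4.0, qe=0.4,
               exposure_s=0.01, aperture_factor=64, full_well=55_000, bits=8):
    photons = photons_per_um2_per_s(irradiance_w_per_cm2) * pixel_area_um2
    electrons = photons * qe * exposure_s / aperture_factor
    electrons_per_level = full_well / 2**bits
    return electrons / electrons_per_level

# 1,000 lux of 480-nm light ~ 1e-3 W/cm2 (panel step a):
print(round(gray_level(1e-3)))     # about 28; the panel's rounded arithmetic gives ~30
```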
9.3 The Light Emission and Contrast of Small Objects Limits Their Visibility

The nature and intensity of the illumination and the ability of a microscope camera to detect the light coming from the sample place limits on image formation. When comparing the limits of resolution and the limits of illumination (Figure 9.6), conventional brightfield microscopy is limited to the realm of moonlight (reflected light). Conventional darkfield and fluorescence microscopy moves into the realm of starlight (self-luminous objects). Electronically amplified microscopy can go much lower. The high contrast of very small, unresolvable, self-luminous objects identifies them but will not visually separate them when they are closer than the limit of resolution. Superresolution microscopies, such as PALM (photoactivated localization microscopy; see Section 18.9), use the spread of fluoresced light by individual proteins to identify their locations. They mathematically eliminate that spread by assuming that the particle is at the center of the blur and thereby resolve individual proteins.
9.4 Use the Image Histogram to Adjust the Trade-off Between Depth of Field and Motion Blur

The best procedure for optimal exposure is to adjust the aperture f-number of the lens and the shutter speed to within the limits needed to maintain acceptable motion blur, depth of field, and resolution. The reflective light meters in digital cameras most typically use the entire frame for exposure evaluation, called evaluative metering (see Figure 9.5A). To meter the range of light intensities within a scene, there are focus points within the viewfinder. The exposure within a field can use center-weighted average metering, so the area within the center of the field has more influence over the exposure
(see Figure 9.5D) in scenes with medium intrascene dynamic range. Partial metering monitors the field inside a wide-diameter center disk (see Figure 9.5B). Spot metering monitors a smaller-diameter center ring (see Figure 9.5C). Spot metering and partial metering are of use with high intrascene dynamic range, when bright objects such as the moon are on a dark background, or for bright fluorescent or darkfield objects in a microscope. For digital cameras, shutter speeds and aperture work in opposition to each other. The program, or P, mode (Figure 9.7) on most digital cameras provides just that; it chooses a reasonable shutter speed and aperture for good exposure, and then manual adjustment of a single dial changes the aperture and shutter speed in opposition to each other. However, there is more to exposure setting than not over- or underexposing regions of the image.

Figure 9.7 The mode dial on a digital single-lens reflex camera (DSLR; Canon 500D). Most of the imaging for the professional biologist on consumer-grade DSLR cameras is in the creative zone or in the close-up mode of the basic zone. From Canon Inc.

The image histogram is often available as part of the live view in both consumer-grade DSLRs and in scientific-grade cooled cameras. When available, use the histogram for monitoring not only over- and underexposure but also the spread of the tonal range values within the image. Examining and adjusting image tonality with the image histogram, either during or after image acquisition, achieves optimal tonality using the entire pixel depth to capture gray levels or the levels of color in each color channel. Then, while examining the histogram of the image (in live-view mode on the back liquid crystal display (LCD) of a DSLR, if available, or on the remote histogram display on a computer for microscope cameras), change the ISO until the image is bright enough to fill the histogram. How fast the object is moving also determines the exposure. If the shutter is too slow, the objects will have motion blur. A quickly moving subject requires short exposures to diminish motion blur. To create desirable motion blur, as when showing water flow in a mountain waterfall (Figure 9.8), decrease shutter speed to less than 1/15 sec. This adjustment is done with the shutter priority mode, Tv, T, or S on the mode dial (see Figure 9.7). The fastest shutter speed available on consumer-grade cameras is about 1/4000 of a second. It takes longer to read out the frame from the CMOS chip, so multiple sequential exposures at this speed are not possible. To overcome this, use a strobe flash to get multiple exposures within a single frame. There are also specially designed scientific-grade high-speed cameras for examining things such as the waveform of a flagellum or the wingbeat of a bat. Most scientific-grade CCDs have slow frame rates (8–31 frames/sec), while scientific-grade CMOS cameras have much
Figure 9.8 (A) Waterfall taken at 1/320-second exposure. (B) Waterfall taken at 0.3-second exposure. Photo by Peter West Carey. Used with permission.
Figure 9.9 (A) A flower cluster of shepherd's purse taken with aperture priority at the widest aperture. Photo by L. Griffing. (B) Photo of a puffin taken with a telephoto lens and shallow depth of field. Note that the lichens and algae on the rock are out of focus, while the water droplets on the puffin's back are in focus. Also, the background water is completely out of focus. Photo by Charles J. Sharp, https://en.wikipedia.org/wiki/Atlantic_puffin#/media/File:Puffin_(Fratercula_arctica).jpg Creative Commons 4.0.
When using a microscope as a lens system, apertures are usually set for optimal resolution and contrast (see Sections 5.15, 8.3, 8.4, and 9.10 on Köhler illumination), so use shutter priority and the ISO setting for exposure adjustments. If the signal is low, increasing the exposure time increases the signal-to-noise ratio. If the signal is hard to see, increasing the ISO increases the visibility of low-light images by decreasing the saturation level of each pixel (see Section 5.11, Table 5.5). The best approach is to maximize exposure time within the tolerable limits of depth of field and motion blur and use the histogram to make sure there is adequate tonal range. Lens systems with smaller apertures give greater depth of field (see Sections 5.15 and 8.5). On many cameras, there is an automatic depth-of-field mode (A-DEP) that uses the focus points in the viewfinder to get as many features in the field of view in focus as possible, thereby optimizing depth of field (see Figure 9.7). To manually adjust depth of field, use the aperture priority mode (Av or A) (see Figure 9.7). For shallow depth of field in bright images, such as a close-up of a flower, use a wide aperture and short exposure (Figure 9.9A). Likewise, for low-light, deep depth-of-field shots, use a narrow aperture and long exposure. Images of close objects have shallow depths of field, and those of farther objects have deeper depths of field. Magnification alters depth of field and focus (see Section 8.5), with telephoto lenses having shallower depth of field than lower magnification lenses at the same aperture (Figure 9.9B). Conveniently, to assess the depth of field in the live view, some cameras have a button on the side of the lens mount, the depth-of-field preview button, which gives a view of the field with the apertures in place.
9.5 Use the Camera's Light Meter to Detect Intrascene Dynamic Range and Set Exposure Compensation
In most DSLR cameras, set the light level with aperture priority, Av, or with shutter speed priority, Tv. By reading out the change in apertures in shutter priority mode and using partial or spot metering, one can know the range of light intensities in the scene because one change in aperture setting is half (decreased aperture, higher f-number) or double (increased aperture, lower f-number) the light (see Table 9.2). If there is a variation in light intensity across the field, or intrascene dynamic range, of 3 f-stops, then there is about an 8-fold light intensity change. Even though digital cameras have a dynamic range of up to 10,000:1, this does not translate to an ability to detect a 10,000-fold lighting change across a scene. This 8-fold change in intensity is about the limit for cameras recording the 8 bits of gray-level intensity or luminosity in a 24-bit RGB color image. As shown in Figure 9.10, with an 8-bit intensity scale of 256 gray levels, halving the intensity three times (3 f-stops) leaves only 32 gray levels, and 32 gray levels do not look continuous to our eyes, which detect 60 shades of gray (see Section 2.1). For more advanced cameras recording 12 bits/channel in 36-bit red, green, blue (RGB) images or 12 bits of gray-level intensity, 3 f-stops produce a 512-gray-level change, and the fourth f-stop produces a 256-gray-level change. Although this gray-level range of 4 f-stops is small, the gray-level contrast within it still looks continuous to our eyes. Consequently, larger bit-depth cameras can capture a wider range of illuminance in a scene. The caveat is that increasing the bit depth without increasing the saturation, or full well capacity, of the pixels leads to fewer electrons per gray level and, hence, noisier individual gray levels (see Table 5.5, Section 5.11). In low-key images, such as the fluorescence micrograph in Figure 2.8, using the spot meter on the brightest features that have detail renders that intensity as a mid-tone gray, and everything else is darker. It is essentially like putting an additional f-stop (smaller aperture) on the camera.
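The arithmetic behind Figure 9.10 can be checked with a few lines of Python. This sketch is an illustration, not part of the text's workflow; it simply prints how many gray levels remain to encode the darkest region of a scene after each halving of intensity (one f-stop) for 8-bit and 12-bit sensors.

def levels_after_stops(bit_depth, n_stops):
    # Each f-stop halves the light and therefore halves the gray levels
    # available to the darkest part of the scene.
    return (2 ** bit_depth) >> n_stops

for bits in (8, 12):
    remaining = [levels_after_stops(bits, stop) for stop in range(1, 5)]
    print(f"{bits}-bit sensor, gray levels after 1-4 f-stops: {remaining}")
# Prints [128, 64, 32, 16] for 8 bits and [2048, 1024, 512, 256] for 12 bits,
# matching the 32- and 512-gray-level figures quoted above.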
Figure 9.10 Effective dynamic range of camera chips with 8- and 12-bit pixel depth. Diagram by L. Griffing.
Exposure compensation (Figure 9.11) compensates for this and increases the gray-level values of the darker objects. Exposure compensation adjustments are in EV (exposure value) units (see Figure 9.11): +1 EV unit doubles the light, and –1 EV unit halves the light. Note that consumer-grade cameras usually express their dynamic range in EV units (Table 5.1). They are equivalent to f-stops; however, be aware that an exposure compensation of +1 lowers the f-stop by 1, doubling the light. Fractional values can also be set. In the case of a well-metered low-key image, an exposure compensation of +1 will bring out the gray levels of darker objects. Likewise, if the object of interest is dark on a light background in a high-key image and the metered region is larger than the object, an exposure compensation of +1 or +1.5 might be necessary to bring out the detail in the dark region. A note of warning: when using exposure compensation, reset it manually to 0 between shots, or all future exposures will retain it. The simplest way to get good exposures is to bracket exposures, taking images at +0.5 or +1.0 and –0.5 or –1.0 exposure compensation, as well as the original metered image. Taking the same shot at bracketed shutter speeds or bracketed exposure compensation achieves manual bracketing. There is also an auto-bracketing setting on most digital cameras. In auto-bracketing, the camera returns to the setting with no exposure compensation. If the intrascene dynamic range exceeds the ability of your camera (e.g., 4 f-stops at shutter priority for a 12-bit camera), there are many ways to adjust it. Fill lights or flashes can brighten dark areas, while filters can make light regions darker. The following three sections describe these light sources and filters.
Figure 9.11 Exposure compensation setting in the Canon 500D. As shown above the exposure compensation dial, the f-stop changes from F8 to F5.6 with a +1 EV setting and from F8 to F11 with a –1 EV setting. From Canon Inc.
9.6 Light Sources Produce a Variety of Colors and Intensities That Determine the Quality of the Illumination
As shown in Figure 9.2, the spectral irradiances of arc sources and incandescent lights are quite different. These light sources have been the workhorses for lab and microscopy illumination for the past century. However, newer LED and laser technologies are replacing them. These newer technologies are safer, more energy efficient, longer-lived, and less expensive. Incandescent lights have a tungsten filament that glows when heated by an electric current running through it. The filament is in an air-tight glass envelope or bulb containing a non-reactive gas mixture (e.g., 93% argon, 7% nitrogen).
Tungsten–halogen or halogen bulbs used in microscopy and fiber-optic illumination contain, instead, a halogen gas mixture that permits heating of the tungsten to a higher temperature while producing a tungsten–halogen cycle that re-deposits volatilized tungsten back on the filament, thereby increasing its life. As the filament heats up, it gives off light of different wavelengths depending on its temperature, or color temperature. As the voltage of a typical tungsten–halogen bulb increases, its color temperature increases (Figure 9.12). At lower voltages, the color temperature of tungsten is very red, with very little blue intensity. The Commission Internationale de l'Eclairage (CIE) chromaticity diagram for the different color temperatures shows that a white light only appears when the color temperature exceeds 3000 K (see Section 4.3, the region of black-body temperatures in Figure 4.7). Because tungsten melts at 3695 K, tungsten lights usually operate at 3200 K. At this temperature, tungsten–halogen bulbs give off a great deal of IR light (see Figure 9.12), so for most applications, the optical system contains an IR heat filter to protect the sample and other components.
Figure 9.12 Different color temperatures from a tungsten–halogen light result from increasing the voltage across the filament. Each step in color temperature is a "one click" of increase in voltage of a 2- to 6-volt light. UV, ultraviolet. Diagram by L. Griffing.
White balance means that whites in the scene look white in the image. Equalizing the intensities of red, green, and blue in the RGB color space produces gray at low intensities and white at higher intensities (see Section 7.7, Figure 7.17). For example, at 3200 K, the color of a tungsten–halogen light requires a white balance setting in cameras that makes the chips less sensitive to the predominant red colors in the light source and more sensitive to the blue colors (see Figure 9.12). A far-red filter on the camera chip blocks the far-red light, where the quantum efficiency of silicon chips is also highest (see Section 5.13, Figure 5.10). Setting the color sensitivity of the sensor complementary to that of the light source achieves white balance. Most cameras have standard white balance settings for different light sources besides tungsten, including fluorescent lighting, outdoor lighting, and cloudy days, all with different color temperatures (Table 9.3). Instead of using color temperatures, however, the white balance adjustment just uses the names of different standard sources.
Table 9.3 Color Temperatures for Different Light Sources.

Color Temperature (K)    Source
1000                     Candles, some flashlights
2000                     Early dawn
2500                     Old tungsten bulb (used)
3200                     New tungsten–halogen bulb
3000–4000                Sunrise or sunset
4000–5000                Fluorescent cool white, daylight, bulbs
5200                     Direct sun
5000–5500                Camera electronic flash
5500–6000                Studio electronic flash (new bulb)
6000–7000                Sunny or partially cloudy day
7000–8000                Shade
8000                     Most white or blue-white LEDs
8000–9000                Heavy overcast or slight shade
9000–11,000              Rain at lower elevations or clear day at higher elevations (above 8000 feet)
11,000–18,000            Overcast to snowy days at higher elevations (above 8000 feet)

LED, light-emitting diode.
Note that the fluorescent white balance setting is not for fluorescent samples but for fluorescent lamps. When using camera white balance adjustments for fluorescence microscopy, be aware that the tungsten setting will diminish the red relative to the blue fluorescence, and the fluorescent or daylight setting will diminish the blue relative to the red. Electronic flash units are xenon arc units with a color temperature of about 5500 K, close to daylight color temperature. The spectrum of an electronic flash comes from the xenon gas in the tube through which a high-voltage electric discharge passes. The flash occurs during a very brief interval (e.g., 1/10,000 of a second), so while the shutter is open, there are actually two exposures made: a longer ambient light exposure and a shorter flash exposure. There are two color temperatures to deal with here, the ambient light color and the flash color. One of the problems with shooting dark, near-distance scenes with a flash is that the flash illuminates near objects well, but there is a large light fall-off into the darkness. Using longer shutter speeds with a flash can help bring up the background light. Likewise, if the flash unit is detachable, bouncing the flash off a reflector fabric illuminates the scene more uniformly. Finally, to get standard colors, use a standard 18% gray card or a color card either in the field of view or in a calibration image taken with identical lighting. DSLRs include a "custom white balance" setting for using these standards. Use any illumination while taking a photo of the 18% gray card, spot metering on the card. Select the "custom white balance" setting, open your picture, and import the white balance information from the picture. This sets the white balance for that illumination. A variety of software can check the RGB values (e.g., Photoshop, ImageJ, or specialty software such as X-Rite or Imatest; see Section 5.10, Figure 5.14). Software can also adjust color balance after acquiring the image (see Section 2.11, Figures 2.16 and 2.18). However, for reproducible pictures that don't need processing, particularly for images in which colors need to be measured or used for image segmentation, accurate camera white balance is important.
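The correction that a custom (gray-card) white balance performs can be sketched in Python with numpy. This is only an illustration, not the camera's internal algorithm; the image array, RGB channel order, and gray-card coordinates below are hypothetical.

import numpy as np

def gray_card_white_balance(rgb, card_region):
    # Scale the R, G, and B channels so the mean of the gray-card region is neutral.
    img = rgb.astype(np.float64)
    card_means = img[card_region].reshape(-1, 3).mean(axis=0)  # mean R, G, B on the card
    gains = card_means.mean() / card_means                     # per-channel gains
    return np.clip(img * gains, 0, 255).astype(np.uint8)

# Synthetic example: a red-biased "tungsten" image with a gray card at rows 80-119,
# columns 130-169 (hypothetical coordinates).
img = np.full((200, 300, 3), (180, 150, 120), dtype=np.uint8)
card = (slice(80, 120), slice(130, 170))
print(gray_card_white_balance(img, card)[100, 150])   # roughly equal R, G, B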
9.7 Lasers and LEDs Provide Lighting with Specific Color and High Intensity
LED and laser light sources emit specific wavelengths (Figure 9.13). LEDs emit a band of wavelengths, whereas lasers have multiple laser lines. Filters restrict which laser line illuminates the sample. LEDs and laser diodes (solid-state lasers) emit light when current flows through the diode in the forward-bias state (see Section 5.2, Figure 5.1). The color of light emitted depends on the composition of the p (electron hole–accumulating) layer and the n (electron-accumulating) layer (see Figure 9.13). Upon recombination of the electrons and holes, the electron falls to a lower energy level and emits a characteristic wavelength based on that energy drop, the band gap energy. White light LEDs emit a much wider spectrum of light by using a white light–emitting phosphor that absorbs the (usually blue) light from the LED (see Table 9.3 and Section 17.9, Figure 17.21). Lasers require that most of the electrons in the lasing medium be at an excited energy state, as opposed to the ground state; this is a population inversion. Upon achieving a population inversion, when a photon interacts with a high-energy electron, the photon doesn't lose energy but continues on, and in the process, the electron de-excites to a lower energy, emitting a photon of the same phase (coherence) and wavelength as the exciting photon. This "two-for-one" photon production results in amplification. The photons bounce back and forth through the lasing medium in a resonant cavity with a highly reflective (100% reflection) mirror on one side and a less reflective (98% reflection) mirror on the other.
Figure 9.13 Some examples of specific colors emitted by some light-emitting diodes (LEDs), laser diodes, and gas lasers. The LEDs have specific trivalent and pentavalent element combinations that create a fairly narrow band of light as follows: aluminum gallium arsenide, red; aluminum arsenide phosphide, orange; aluminum gallium phosphide, green; and indium gallium nitride, blue. Diagram by L. Griffing.
Figure 9.14 Vertical cavity laser diode. Unlike horizontal laser diodes that produce an elliptical laser from the side, a vertical cavity laser diode emits a circular laser beam from the top. Note the dimensions. It is flatter than it appears in the diagram, being about 80 µm wide and only 8.2 µm tall. Diagram by L. Griffing.
Upon production of enough light, some escapes the resonant cavity through the 98% reflective mirror. In widely used gas lasers, argon, helium–neon, or krypton fills the resonant cavity. In solid-state lasers, solid-state semiconductor material fills the resonant cavity, or multiquantum well (Figure 9.14). Lasers have many desirable features for imaging. They have good focusing ability and low divergence. Lasers, therefore, penetrate into tissues better than conventional sources. They deliver a precise amount of energy to a small area, decreasing the effect on the surrounding tissues. Lasers are also an excellent polarized light source and have found use in polarization-based systems, such as differential interference contrast (DIC) and epi-polarization confocal microscopy.
9.8 Change Light Values with Absorption, Reflectance, Interference, and Polarizing Filters
Absorbance filters are typically tinted glasses or plastics. Absorption is the reduction of the amplitude of light by an object. Recording the image of absorbance at a certain wavelength provides the means to calculate the concentration of colored molecules in the sample (Panel 9.2). Image-based spectrophotometry often employs absorbance filters with a standard, published transmission spectrum of fairly broad bandwidth. The measure of the bandwidth of transmission is full width at half-maximal (FWHM) transmission, the full width of the transmitted spectrum between the points at half of the maximum in the amplitude (intensity) curve. Absorbance filters have many uses. They generate grayscale contrast in pigmented or stained samples if their color is complementary to the object of interest (see Section 2.1, Figure 2.4). Colored filters can correct white balance (e.g., a blue filter on a tungsten bulb for a camera set to daylight mode, or a red filter in daylight for a camera set to tungsten illumination).
Panel 9.2 Absorbance spectrophotometry with digital cameras.
This approach requires that the thickness (z, or axial dimension) of the sample be known and that the distribution of the molecule is uniform within that thickness. For this, use a chamber with a known thickness or measure the thickness of the specimen with optical sectioning.
Light source. The light source can be a line from a laser (appropriately spread), specific emission from a light-emitting diode (LED), or white light with an absorbance or reflectance filter of known transmissivity.
Optical density (OD) or absorbance (A). The measurement of optical density or absorbance is based on Beer's law:
OD = log (I0/Ix) = εCx, (9.1)
in which I0 = intensity at position 0, Ix = intensity at position x, ε = molar extinction coefficient, C = concentration of the molecule, and x = distance across the sample.
Imaging. Use the approach in Panel 7.1 (see Section 7.9) for acquisition.
OD image = 255 × log ((Incident light image − Dark image)/(Specimen image − Dark image)) (9.2)
Concentration image = OD image / εz, in which z is the axial thickness of the sample. (9.3)
Note that there are additional terms to account for light scatter and diffuse reflectance; see the reference at the end of the chapter.
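A minimal numpy sketch of Equations 9.1–9.3, assuming three grayscale frames (specimen, incident light, and dark) acquired with identical settings. The 255× display scaling of Equation 9.2 is omitted, epsilon and z must be supplied by the user, and the extra scatter and diffuse-reflectance terms noted above are not included.

import numpy as np

def od_and_concentration(specimen, incident, dark, epsilon, z):
    # Dark-subtract both images, form the Beer's law ratio (Equation 9.1),
    # and convert optical density to concentration (Equation 9.3).
    s = specimen.astype(np.float64) - dark.astype(np.float64)
    i0 = incident.astype(np.float64) - dark.astype(np.float64)
    ratio = np.clip(i0, 1e-6, None) / np.clip(s, 1e-6, None)  # avoid division by zero
    od = np.log10(ratio)
    return od, od / (epsilon * z)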
They can be the heat filters that absorb IR light from incandescent or arc sources. Absorption neutral density filters cut the intensity of all wavelengths of light by a controlled amount, providing 10%, 25%, 50%, or 90% transmission. A better neutral density filter is a reflectance filter, a partially silvered mirror that reflects all wavelengths of light uniformly. These filters transmit a known percentage of the incident light and reflect the rest. By this means, partially silvered mirrors can channel light to multiple optical components and act as beam splitters. Interference filters take advantage of the phase, ϕ, of light. Wave trains are dynamic; the position of waves of light relative to time and space is their phase. Phase is measured as an angle, with 0 degrees at the start of a wave and 360 degrees being one wavelength. The phase of light is not visible until two waves interfere. Phase retardation is the slowing down of light by the sample so that it emerges with a different phase than it had when it entered. After interference, light of a certain wavelength may lose its amplitude, decrease its intensity, and be eliminated from the spectrum of light passed by the sample. Such phase retardation is responsible for the interference colors of oil slicks, lens and filter coatings, and certain thin crystals in polarized light. Interference colors produced by reflection can come either from constructive interference amplifying light of a certain wavelength or from destructive interference eliminating light of a certain wavelength (Figure 9.15; see Section 16.9, Figure 16.29 for an interference color spectrum).
Figure 9.15 Constructive and destructive reflecting interference filters. A thin film of refractive index n2 and a thickness that introduces a full wavelength, 2 × 1/2 wavelength (constructive interference), or a half wavelength, 2 × 1/4 wavelength (destructive interference), total optical path difference upon reflection. Diagram by L. Griffing.
Blue butterfly and bird wings produce their color through reflection interference, in this case the constructive interference of blue light. The topography of a feather or butterfly wing has a periodic nano-construction. Light reflected off one scale component in the wing or feather constructively interferes with light reflected off another component half a wavelength of blue light away (~200 nm), making the structure bright blue. The coatings on binoculars and other optical systems used for light detection by eye are set up to interfere with reflection of green light, thereby allowing higher transmission of green light, which is the wavelength best detected by eye (see Figure 9.4; see also Section 4.3, Figure 4.8). The lens surface reflects the wavelengths it does not transmit. In the case of green light transmission, the surface reflects a purple (red/blue) color. Thin-film reflectance interference filters transmit light above or below a certain wavelength. If they transmit longer wavelengths of light, they are long-pass filters, while if they transmit shorter wavelengths, they are short-pass filters (Figure 9.16). Because the cut-off is not abrupt, the cutoff value that gives the filter its name (e.g., a 500-nm long-pass filter) is the value at half-maximum transmittance.
Dichroic beam splitters, commonly used in epifluorescence microscopy, take advantage of both the reflective and transmittance properties of these filters, reflecting light below the cut-off wavelength and transmitting light above the cut-off wavelength. Combining a short-pass filter with a long-pass filter creates a bandpass filter, which allows a certain bandwidth of light with a specified FWHM (see Figure 9.16). Filters with a large (>20 nm) FWHM are wide-pass or wide bandpass filters, while those with a narrower FWHM are narrow bandpass filters.
Figure 9.16 Transmission of long-pass, short-pass, and bandpass filters. A dichroic beam splitter reflects the shorter wavelengths while transmitting the longer ones, a special case of a long-pass filter. Diagram by L. Griffing.
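The way a bandpass results from combining a long-pass and a short-pass transmission curve can be illustrated with idealized curves in Python; the cutoff wavelengths and curve shapes here are hypothetical, not measured filter spectra.

import numpy as np

wavelengths = np.arange(400, 701)                        # nm
long_pass = 1 / (1 + np.exp(-(wavelengths - 500) / 5))   # transmits above ~500 nm
short_pass = 1 / (1 + np.exp((wavelengths - 550) / 5))   # transmits below ~550 nm
band_pass = long_pass * short_pass                       # combined transmission

above_half = wavelengths[band_pass >= band_pass.max() / 2]
print(f"FWHM about {above_half[-1] - above_half[0]} nm, "
      f"centered near {above_half.mean():.0f} nm")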
Figure 9.17 Production of plane-polarized light with a polarizing filter and the ability of a rotated polarizer to block that light. The first Polaroid absorbs unpolarized light vibrating in any plane that has a component parallel to its aligned stretched polymers (horizontal lines in the first Polaroid), transmitting only light at an azimuth at right angles to the stretched polymers. A second Polaroid oriented at right angles to the first absorbs that transmitted light with its stretched polymers (vertical lines), so the crossed Polaroids transmit no light. Diagram by L. Griffing.
Figure 10.2 (B) Sharpened with an unsharp mask (Filters>Unsharp Mask, blur = 3 radius, weight = 0.6). (C) Sharpening using a 10-pixel radius high-pass filter (Photoshop command: Filter>Other>High Pass, radius = 10) followed by RGB (red, green, blue) histogram stretching. Figure by L. Griffing.

Table 10.1 Speed and Programming Efficiency of Widely Implemented Open-Source Image Processing Libraries.

                          C/C++               Julia                                       Java                                   Python
CPU speed                 Fastest             Slightly slower than C                      Up to two times slower than C          Slowest
Memory needs              Low                 Higher than C (unclear)                     Up to two times higher than C          Highest
Programmer efficiency     Medium; compiled    Highest; runtime compiled, "just in time"   Higher than C; interpreted, compiled   Highest; interpreted
Image processing library  Open-source OpenCV  Open source, Julia Images                   Open source, Java 2D API               Open source, Python Imaging Library

API, Application Programming Interface; CPU, central processing unit.
Interpreted languages do not require a separate compiling step to run, thereby providing program output in "real" time. However, this increase in programming efficiency comes at the cost of higher memory usage and slower processing times. Julia is a compiled language, but it is compiled "just in time" so that output is also quick, speeding efficiency. To get the best of both worlds, hybrid programs, such as Java and the Visualization Toolkit, use interpreted-language "wrappers" that execute programs "on the fly" from non-interpreted, compiled languages. Python has a wrapper for OpenCV, opencv-python, and uses numpy (written in C) extensively in the Python Imaging Library. The niche of Julia is to avoid the need for these kinds of hybrid programs. Table 2.1 (processing) and Table 7.1 (processing and analysis) list the common programs used for different tasks. Most of the filters described here use the open-source packages ImageJ (FIJI) and Icy (Icy also contains embedded ImageJ).
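As a brief illustration of the wrapper approach (not an example from the text), the opencv-python package calls the compiled C++ OpenCV routines while exchanging images as numpy arrays; the file names below are hypothetical.

import cv2

img = cv2.imread("ladybugs.tif", cv2.IMREAD_GRAYSCALE)   # returns a numpy array
denoised = cv2.medianBlur(img, 3)                        # 3 x 3 median filter run in compiled code
print(type(denoised), denoised.dtype)                    # numpy.ndarray, uint8
cv2.imwrite("ladybugs_median.tif", denoised)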
10.2 Near-Neighbor Operations Modify the Value of a Target Pixel
Similar to the structuring element for morphological image processing (see Section 7.8, Figure 7.21), digital filter operations employ a neighboring range, or radius, of pixels centered on a target pixel. As an example in the following discussion, one of the pixels in a grayscale image of ladybugs has been turned white (given a value of 255) to show it as a target pixel (Figure 10.3). The eight pixels surrounding the white target pixel have different gray values (Figure 10.4A). A set of values in a matrix of pixels surrounding and including a center pixel placed over the target pixel is a near-neighborhood mask. A 3 × 3 near-neighborhood mask (Figure 10.4B) includes the first row of pixels surrounding the target pixel in all directions. The radius of the mask is the distance in pixels from the target pixel to the edge of the mask. A 3 × 3 mask has a radius of 1 pixel. The pixel radius conventions used in ImageJ (Figure 10.5) are circular masks. Circular masks generate fewer artifacts because they are uniform. A morphological image processing example that generates artifact from a square mask is anisotropic skeletonization (see Section 7.7). Each pixel in the image can serve as a target pixel, except the very edge pixels, where the neighborhood is incomplete. This is why, after most filter operations, the edge values usually remain unchanged.
Figure 10.3 Example target pixel and neighborhood from an image of ladybugs mating. (A) Grayscale image. (B) Zoomed-in image showing the pixel. (C) Enlarged pixel and neighborhood. Image by L. Griffing.
Figure 10.4 (A) Near-neighbor values of the white pixel in Figure 10.3. The white center pixel has a value of 255. (B) The near-neighbor mask shows the eight pixels surrounding the pixel placed over the target pixel. (C) Centering the mask on the target pixel. (D) Values for the target pixel following rank filtering operations. Image by L. Griffing.
Figure 10.5 Circular masks of different radii for near neighborhood operations, including rank filters and convolution operations. Image by L. Griffing.
10.3 Rank Filters Identify Noise and Remove It from Images
A rank filter (or order-statistics filter) is a special near-neighbor pixel operation that replaces the value of the target pixel with one of the values of the neighboring pixels based on the statistical variation within the neighborhood. The operation starts by ranking the neighboring pixels in ascending order of gray level. Ranking the pixels that surround the target pixel (see Figure 10.3) from the darkest to the lightest gives 71, 77, 84, 91, 93, 94, and 115. A rank filter of 3 would give the target pixel the value of 84, the third-place rank. The target pixel in Figure 10.3 has a very high value, 255, like that produced by shot noise in video cameras (see Section 5.11, Panel 5.3). Replacing it with one of the values from the neighborhood removes the shot noise – but which of the neighborhood values to use? Typical rank filters include minimum, maximum, mean, median, and remove outliers. Figure 10.6 shows the consequences of using rank filters on the region in Figure 10.4C. Probably the best filters for removing shot noise (see Section 5.11) are the median filter, the remove outliers filter (a special case of the median filter), and the non-local means denoise filter (a plugin in ImageJ). A maximum filter enhances the shot noise, making it the size of the neighborhood mask (see Figure 10.6B). A minimum filter darkens the image (see Figure 10.6C). A mean filter reduces the intensity spike in the original at the expense of increasing the intensity locally because the "noise" is a high pixel value that contributes to the mean of the neighboring pixel values (see Figure 10.6D). The median filter eliminates the spike completely without increasing the local intensities (see Figure 10.6E). The despeckle filter in ImageJ is the same as a 1-pixel-radius median filter; compare Figures 10.6E and 10.7B. However, this filter is not ideal because it changes many pixel values that are not noise in the image (Figure 10.7B). To decrease that likelihood, the remove outliers filter replaces the target pixel with the median value only if the difference between the two is above a certain threshold, thereby maintaining the surrounding values as original (Figure 10.7C). A more complicated noise removal rank filter is the top-hat filter, a kind of morphometric image filter (see Section 7.8). This uses two concentric near-neighbor circular masks, an inner one of smaller radius (e.g., 2 pixels) and an outer one of larger radius (e.g., 3 pixels). The inner circle of pixels has high values and is the crown of the hat, while the pixels in the concentric outer circle have low values and are the brim of the hat. The algorithm identifies the most extreme pixel values in the crown and brim. If the difference is greater than a set threshold, the median value of the brim replaces the extreme pixel values. The operation only works on objects that fit inside the crown, so it only detects certain particle sizes, such as dust. It also does not operate on objects whose spacing is smaller than the width of the brim. For example, the photograph of a sea crab, Platymaia spp., against a background of floating particles in the seawater, or marine snow, in Figure 10.8A produces a gradient of light from the top to the bottom of the image and various out-of-focus blur particles.
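A minimal Python sketch of the remove outliers idea described above, using scipy's median filter with a square mask as a stand-in for ImageJ's circular mask; the radius and threshold values are hypothetical.

import numpy as np
from scipy.ndimage import median_filter

def remove_outliers(image, radius=1, threshold=50):
    # Replace a pixel with the local median only when it differs from that
    # median by more than the threshold; all other pixels are left unchanged.
    med = median_filter(image, size=2 * radius + 1)
    diff = np.abs(image.astype(np.int32) - med.astype(np.int32))
    out = image.copy()
    out[diff > threshold] = med[diff > threshold]
    return out

# A flat 90-gray-level field with one shot-noise pixel at 255.
img = np.full((9, 9), 90, dtype=np.uint8)
img[4, 4] = 255
print(remove_outliers(img, threshold=100)[4, 4])   # 255 is replaced by 90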
Figure 10.6 Consequences of 1-pixel-radius rank filter operations on neighborhood target intensities. (A) Zoomed-in view of the region of interest from Figure 10.3. (B) Maximum rank filter. (C) Minimum rank filter. (D) Mean rank filter. (E) Median rank filter. Image by L. Griffing.
Figure 10.7 Noise filters. (A) A single target pixel set to 255. (B) Despeckle operation on the entire image. The gray values between the dark and light bands change slightly, becoming more uniform. (C) The remove outliers filter changes a pixel value only if its difference from the local median falls above a threshold. In this case, a remove outliers filter set at a difference threshold of 163 changes only the target pixel. Image by L. Griffing.
Figure 10.8 (A) A crab, Platymaia spp., swimming in a sea of sea snow (biological debris), which reflects light and creates noise. (B) Result of top-hat noise removal using an outer disk diameter of 30 pixels using MorphoLibJ plugin in ImageJ and converting to an indexed image pseudocolored with Cyan Hot look-up table. (C) Inverted image of A, revealing the color difference of the crab with surround. (D) Same as B but not pseudocolored, revealing the green color of the crab. (E) Result of rolling ball filter (50 pixel) used for background subtraction in ImageJ (Process>Background subtract). (F) Result of top-hat filter using a 5-pixel-diameter disk, which removes the body of crab, as well as the background snow. (A) Courtesy of M. Wicksten. Used with permission.
A top-hat filter with a large structuring element (30-pixel radius) eliminates many of the blur particles and the gradient while retaining the features of the legs of Platymaia (Figures 10.8B and 10.8D). When not pseudocolored, the top-hat filter reveals the true color of the object, which is green in this lighting (see Figure 10.8D). The inverted color table in Figure 10.8C also reveals the complementary red color of the sea crab. A top-hat filter with a smaller structuring element (Figure 10.8F) does not retain the body of the crab, just the legs. A filter similar to the top-hat filter is the rolling ball filter. The rolling ball filter flattens the field (i.e., adjusts a graded intensity background to a single flat intensity). Such gradients can arise from uneven illumination, uneven reflection against a background of noise (see Figure 10.8A), a gradient of illumination generated by differential interference contrast microscopy, or gradient staining of a polyacrylamide or agarose gel from dye interaction with the gel.
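An operation analogous to the top-hat background removal above is available in scikit-image. This sketch is an illustration rather than the MorphoLibJ workflow used for Figure 10.8, and the file name and disk radius are hypothetical.

import numpy as np
from skimage import io, morphology

img = io.imread("crab.tif", as_gray=True)        # hypothetical file name

# White top-hat: the original minus its morphological opening. Features smaller
# than the structuring element survive; the smooth background gradient does not.
tophat = morphology.white_tophat(img, morphology.disk(15))   # hypothetical radius

io.imsave("crab_tophat.tif", tophat.astype(np.float32))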
Figure 10.9 The variance filter. (A) A phase contrast image of plant cells processed after background subtraction, which removed dirt specks from the camera. (B) Variance filter of A showing bright regions that result from subtraction of dirt specks from the original image. Image by L. Griffing.
It is similar to subtracting a gradient-intensity polynomial-fit curve from the background but considers some local variation (see Figure 10.8E). The result is better than a small-diameter structuring element top-hat filter (see Figure 10.8F) but not as good as a top-hat filter with a larger structuring element (see Figures 10.8B and 10.8D). The background subtraction routine in ImageJ uses the rolling ball filter to level backgrounds. A better way to produce a flat field is to take an image of the field (field image) with and without the sample (see Section 10.5) and then subtract the dark image from both and ratio them (see Section 7.9, Panel 7.1 and Section 9.8, Panel 9.2). Averaging multiple images with low signal also reduces noise. A running average, Kalman averaging, is available on most confocal microscopes and as a plugin in ImageJ. Running averages take the incoming image and average it with the prior average, weighting it toward the most recent image. It is a form of moving average, which averages subsets of the data. The most effective average uses the entire data set, as is done in EM sub-tomography averaging (see Section 19.14). Another complicated rank filter that uses two masks is the maximum-likelihood, or Kuwahara, filter. A smaller square filter (3 × 3) samples a larger square filter (5 × 5). It calculates the variance within each of the nine 3 × 3 areas of the 5 × 5 mask. The mean value from the region with the lowest variance replaces the target pixel of the 5 × 5 mask. This filter is useful for scanned half-tone prints and other images that have periodic noise. The variance filter assesses variance within the mask and can identify regions that have noise or edges. This filter replaces the target pixel with the variance of the pixels within the near-neighborhood mask. It is an edge detector because edges show large pixel value variance at boundaries. It also identifies prior noise suppression through background subtraction (Figure 10.9).
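The running (Kalman-type) average described above amounts to an exponentially weighted average of the incoming frames. The following sketch, with a hypothetical weight of 0.2 and synthetic frames, is an illustration rather than the ImageJ or confocal implementation.

import numpy as np

def running_average(frames, weight=0.2):
    # Blend each incoming frame with the previous average; the newest frame
    # contributes `weight` of the updated average (hypothetical value of 0.2).
    average = None
    for frame in frames:
        frame = frame.astype(np.float64)
        average = frame if average is None else weight * frame + (1 - weight) * average
    return average

# Fifty noisy frames of a constant scene (true value 100, noise sigma 10).
rng = np.random.default_rng(0)
frames = [100 + rng.normal(0, 10, size=(64, 64)) for _ in range(50)]
print(round(running_average(frames).std(), 1))   # well below the single-frame noise of 10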
10.4 Convolution Can Be an Arithmetic Operation with Near Neighbors
The optics of the system convolve the object with a blur to generate the image (see Section 8.10). Digital filters can also convolve the image (e.g., blur or sharpen it) by operating arithmetically on the pixels underlying the near-neighbor mask. This contrasts with rank filters, which rank the pixel values underlying the mask rather than combining them arithmetically. The masks used in convolution operations are convolution masks, or kernels. Convolution operations are linear filters, while rank operations are non-linear filters. Convolution operations rotate the kernel 180 degrees, while correlation operations (see Section 11.13) do not. The values of the kernel differ depending on the function of the filter, whether it is blurring, sharpening, or edge enhancement. After this rotation, which distinguishes convolution from correlation (see Section 11.13), the central value of the convolution mask overlays the target pixel, and all the values in the mask are multiplied by the underlying pixel values (Figure 10.10). The operation then sums the values, substituting the final value for the target pixel. As the operation progresses across the whole image, each pixel in the image takes its turn as the target pixel. Different convolution masks have different effects. The example in Figure 10.10 is a sharpening filter. Sharpening filters, which include Laplacian filters, enhance detail by giving a higher number to the center pixel in the kernel, thereby emphasizing it (Figures 10.11B, 10.11D, 10.12B, and 10.12C).
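A minimal sketch of this operation in Python, using a common 3 × 3 Laplacian-style sharpening kernel (the weights are illustrative, not the exact kernel of Figure 10.10) and a tiny synthetic image.

import numpy as np
from scipy.ndimage import convolve

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])

img = np.array([[90, 90,  90, 90],
                [90, 90, 120, 90],
                [90, 90,  90, 90],
                [90, 90,  90, 90]], dtype=np.int32)

# scipy.ndimage.convolve flips the kernel internally, so this is a true
# convolution rather than a correlation (the flip has no visible effect here
# because the kernel is symmetrical).
print(convolve(img, sharpen, mode="nearest"))   # the 120 pixel is pushed further above its neighbors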
Figure 10.10 Convolution operation that changes the value of a target pixel with a sharpening filter. Actual convolution rotates the kernel 180 degrees, but because most of these are symmetrical, that is not in the diagram. Image adapted from Bradbury, S. 1989. Micrometry and image analysis. In Light Microscopy in Biology: A Practical Approach. First Edition. Ed. by Lacey, A.J. IRL Press, Oxford, UK.
Blurring or smoothing filters, which include Gaussian filters, suppress detail by making pixels more similar to the pixels surrounding them (Figures 10.11C and 10.12D). In a box filter, the values of the convolution kernel are all the same (Figures 10.11C and 10.12D). Box filters de-emphasize the target pixel, smoothing the image. Filters that combine sharpening with smoothing, such as the Laplacian of Gaussian (LoG) filter, use the Gaussian blur to minimize previous artifacts (e.g., jpeg quantization artifacts; see Section 2.14; Figure 10.11C), followed by sharpening to enhance the image (Figure 10.11D). The Gaussian or smoothing filter can be similar to the mean rank filter. In fact, the default smoothing filter in ImageJ simply replaces the value of the target pixel with the average pixel value of the mask (Figure 10.12D). In this case, the convolution mask has values of 1/9 or, as shown, 1/9 times a mask in which all values are 1. Examining the pixels in the neighborhood of the bright target pixel in Figures 10.6 and 10.12 reveals that the mean and smoothing filters are identical in this case.
Figure 10.11 Sharpening, smoothing, and Laplacian of Gaussian. The convolution kernels for the operations are in light red. (A) Original de-interlaced jpg from satellite transmission of a bear image. (B) Sharpened image using a Laplacian filter. The 8 × 8-pixel jpg compression becomes obvious. (C) Smoothed image using a Gaussian filter. (D) Sharpened image of C, known as the Laplacian of the Gaussian, which sharpens the image while not emphasizing the 8 × 8-pixel quadrants seen in B. Image by L. Griffing.
Figure 10.12 Effect of sharpening and smoothing kernels on pixel neighbors. (A) Original from Figure 10.4. (B) Laplacian sharpening using a low-value convolution filter. (C) Laplacian sharpening using a high-value convolution filter. (D) Smoothing using same averaging convolution filter as in Figure 10.11. Image by L. Griffing.
The term Gaussian filter comes from the Gaussian function, a function representing a curve that, when its underlying area (integral) is 1, is the probability density function of a normally distributed random variable (Figure 10.13). With larger standard deviations in the distribution, the curve flattens and gets broader (compare the low-blur curve with the high-blur curve in Figure 10.13B). The blur from a point source, the zero order of a point spread function (PSF), is approximated by a two-dimensional (2D) Gaussian curve (see Section 18.9). The Gaussian filter adds to the blur generated by the PSF. Subtraction of Gaussian blur from an image sharpens it; this is the unsharp mask filter. It applies a Gaussian blur to an image, subtracts the blurred image from the original, and adds the result back to the original (see Figures 10.2B and 10.14B). A sharpening filter for noisy images is the difference of Gaussians (DoG) filter or the Mexican hat filter. It applies one small Gaussian blur to reduce noise in the original image and then subtracts the blur generated by another, larger Gaussian (Figures 10.13 and 10.14C). The Mexican hat name refers to the shape of a sombrero – a tall crown with a depressed rim that turns up at the edges, a shape that is mimicked by the shape of the filter: high numbers in the center, low numbers immediately surrounding the center region, and then slightly higher numbers again (see Figure 10.13B). The unsharp mask filter is very popular. However, over-sharpening produces a bright ring around a dark object and undesirable contouring (see Section 3.9). Because there may be better ways than blurring to remove noise, the DoG or Mexican hat filter may see limited use. The Laplacian filter sharpens an image by increasing image contrast at edges (see Figures 10.11 and 10.12B and C), enhancing places where there is a large change in pixel values over short distances. It is, in fact, an approximation of the second derivative of pixel value in the x and y directions, the definition of the Laplacian operator in Cartesian coordinate systems.
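The unsharp mask and difference-of-Gaussians operations described above can be sketched with scipy's Gaussian filter; the sigma and weight values are hypothetical, in the spirit of the settings quoted for Figure 10.2B.

import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(image, sigma=1.0, weight=0.6):
    # Subtract a Gaussian blur from the original and add the difference back.
    img = image.astype(np.float64)
    return img + weight * (img - gaussian_filter(img, sigma))

def difference_of_gaussians(image, sigma_small=1.0, sigma_large=3.0):
    # Mexican hat: a small blur (noise suppression) minus a larger blur.
    img = image.astype(np.float64)
    return gaussian_filter(img, sigma_small) - gaussian_filter(img, sigma_large)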
Figure 10.13 (A) An 11 × 11 Gaussian convolution mask represented as pixel intensities. (B) High-blur (blue) and low-blur (red) filter values. Subtracting the high blur from the low blur produces a filter called the difference of Gaussians or Mexican hat filter (green). It shows a curve similar to the cross-section of a sombrero. Image by L. Griffing.
Figure 10.14 Sharpening, derivative, and edge filters. (A) Original image (Figure 10.4). (B) Unsharp masking using a 1-pixel-radius unsharp mask with a 0.6 threshold. (C) A Mexican hat or difference of Gaussians edge filter amplifies the edge differences. (D) Low-value north-west shadowing filter. (E) High-value north-west shadowing filter. (F) A Sobel edge detector used in the ImageJ Detect Edges command. (G) A simple derivative filter from FeatureJ (FIJI plugin) flattens the background. (H) The Canny edge detector uses the Gaussian derivative filters in FeatureJ (FIJI plugin). Image by L. Griffing.
Hence, it falls into a more general class of filters, the derivative filters. Derivative filters find edges in a sample in different directions (Figures 10.14D, E, and G). The gradient, or shadowing, filter weights one corner of the kernel with higher values than the diagonally opposite corner of the kernel. The gradient filter is a special diagonal form of the derivative filter (Figure 10.14D and E). The cross filter emphasizes two different directions in the same kernel (Figure 10.14G). The Sobel filter uses the cross filter and takes the square root of the sum of the squares of the derivatives in different directions (Figure 10.14F). The Kirsch filter takes the derivative in all eight directions while keeping the maximum value for the target pixel. The trace edges operation or Canny edge detector (Figure 10.14H) uses two Gaussian derivative filters, one in the horizontal and one in the vertical direction. The larger of the two derivatives sets the new value of the target pixel if it is above or below a threshold value.
10.5 Deblurring and Background Subtraction Remove Out-of-Focus Features from Optical Sections
In the filters discussed so far, the operations are only on 2D images. However, if the sample has any thickness, out-of-focus light from the adjacent optical sections will contribute to the blur of the in-focus image. Nearest-neighbor deblurring uses the two adjacent optical sections as the primary sources of the out-of-focus light. In practice, the nearest-neighbor deblurring algorithm blurs the neighboring two optical sections with a digital blur filter and then subtracts the blurred planes from the in-focus plane. A practical version of this deblurring uses the focus of the microscope to blur the adjacent optical sections or the sample itself and then subtracts the out-of-focus image from the in-focus image. It is a background subtraction approach. This approach can remove out-of-focus dirt and other optical imperfections (Figure 10.15). It can also remove dirt and imperfections by subtracting an image without the sample from the image with the sample. This requires some care removing or introducing the sample to avoid introducing other contaminants into the field of view.
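A minimal sketch of the defocused-background subtraction of Figure 10.15, assuming the in-focus and defocused frames have already been saved under the hypothetical file names below.

import numpy as np
from skimage import io

in_focus = io.imread("muscle_infocus.tif").astype(np.float64)
defocused = io.imread("muscle_defocused.tif").astype(np.float64)

diff = np.clip(in_focus - defocused, 0, None)         # remove the out-of-focus background
display = (255 * diff / diff.max()).astype(np.uint8)  # contrast stretch for display
io.imsave("muscle_subtracted.tif", display)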
Figure 10.15 Out-of-focus background subtraction. (A) A sample of a thin section of skeletal muscle taken with differential interference contrast optics. (B) Contrast-enhanced version of A. (C) An out-offocus image taken after defocusing the microscope. (D) Out-of-focus image subtracted from the in-focus image. Image adapted from Weiss, D.G., Maile, W., and Wick, R.A. 1989. Video microscopy. In Light Microscopy in Biology: A Practical Approach. First Edition. Ed. by Lacey, A.J. IRL Press, Oxford, UK.
10.6 Convolution Operations in Frequency Space Multiply the Fourier Transform of an Image by the Fourier Transform of the Convolution Mask
Each image has fine and gross features that occupy separate parts of the spatial frequency spectrum, or k-space (see Sections 1.2 and 8.9). The fine features have high spatial frequencies, and the gross features have low spatial frequencies. In microscopy, the diffraction pattern of the specimen can be found in the rear focal plane of the objective lens and is the physical correlate of the Fourier transform of the object (see Section 8.1, Figure 8.1 and Section 9.9, Figure 9.19). Placing an iris there and closing it down will eliminate the higher spatial frequencies first, then lower spatial frequencies as it closes. The closed configuration is a low-pass filter because it only transmits the lower spatial frequencies. Alternatively, a center stop that blocks the low spatial frequencies while transmitting the high spatial frequencies is a high-pass filter. To achieve the same thing in frequency space, make a fast Fourier transform of the image followed by masking either the high-frequency components (the low-pass filter) or the low-frequency components (the high-pass filter) (Figure 10.16). An inverse fast Fourier transform produces the final filtered image in real space. The high-pass filter shows the edges of the sample and is an edge filter. High- and low-pass filters are equivalent to some near-neighborhood edge enhancement and smoothing filters. A Fourier filter that cuts out high and some of the low frequencies looks like a donut in Fourier space. It is a bandpass filter, showing only those intermediate frequencies in the inverse Fourier transform.
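A minimal numpy sketch of this masking in frequency space (an illustration, not the routine behind Figure 10.16); the cutoff radius is a hypothetical value, and an ideal sharp-edged circular mask is used for simplicity.

import numpy as np

def fourier_filter(image, cutoff=30, keep="low"):
    # Mask the centered Fourier transform with an ideal circular low- or
    # high-pass filter and transform back to real space.
    f = np.fft.fftshift(np.fft.fft2(image))
    rows, cols = image.shape
    y, x = np.ogrid[:rows, :cols]
    r = np.hypot(y - rows / 2, x - cols / 2)          # distance from the zero-frequency center
    mask = r <= cutoff if keep == "low" else r > cutoff
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))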
Figure 10.16 High- and low-pass filters of a Fourier transform of a Volvox colony. IFFT, inverse fast Fourier transform. Photo by Michael Clayton, https://digital.library.wisc.edu/1711.dl/P2H3MR3KTAVCY9B. Used with permission. The fast Fourier transform (FFT) algorithm produces a Fourier transform of an image. If the low-frequency center region of the Fourier transform is masked (black, high-pass filter), then only the high-frequency components of the image remain. Using an IFFT of the high-pass filter preserves the high-frequency edges of the Volvox colonies. On the other hand, masking the high-frequency, peripheral regions of the FFT (black, low-pass filter) produces a blurred image upon IFFT transformation.
Figure 10.17 The convolution mask is the Fourier transform of the Fourier mask. Convolution does not require Fourier processing. It can be a near-neighbor pixel operation. FFT, fast Fourier transform; IFFT, inverse fast Fourier transform; MTF, modulation transfer function; PSF, point spread function. Library of Congress / Wikimedia Commons / Public domain.
Figure 10.17 (see also Figure 8.24) shows how the PSF convolves the image in real space and how the modulation transfer function (MTF) multiplies the image transform in k-space, or frequency space. Comparing it with the masking operations shown in Figure 10.16 reveals that the MTF acts as a low-pass filter: multiplication by 0, in the black regions of the MTF, sets the peripheral part of the Fourier transform of the image to 0, thereby masking it. The regions inside the MTF Fourier mask in Figure 10.17 and the low-pass filter mask in Figure 10.16 are different, however. The MTF is a cone-shaped function (see Section 8.9, Figure 8.21), with high values at the center, whereas the low-pass mask applied in Figure 10.16 just keeps the values inside the mask the same. Nonetheless, both filters smooth or blur the image because the high-frequency components are lost. The Fourier transform of the real-space convolution mask is the equivalent Fourier mask. Consequently, any filter protocol that uses near-neighbor operations also works in the Fourier domain. Multiplication is a simpler operation than doing the weighted average convolution operations in real space. Consequently, many digital filters use the Fourier domain for processing.
Figure 10.18 Graphical representation of the projection slice theorem. A one-dimensional (1D) Fourier transform of a back-projection profile through a magnetic resonance imaging slice produces equivalent values to a line in a two-dimensional (2D) Fourier transform of the slice. From Taubmann, O., Berger, M., Bögel, M., Xia, Y., Balda, M., and Maier, A. (2018) Computed tomography. In Medical Imaging Systems. Lecture Notes in Computer Science. Volume 11111. Ed. by Maier A., Steidl S., Christlein V., and Hornegger J. Springer, Cham, Switzerland. https://doi.org/10.1007/978-3-319-96520-8_8. https://link.springer.com/book/10.1007%2F978-3-319-96520-8. Creative Commons 4.0.
10.7 Tomographic Operations in Frequency Space Produce Better Back-Projections
Back-projection takes advantage of the projection-slice theorem (or central slice theorem or Fourier slice theorem), which states that the one-dimensional (1D) Fourier transform of a projection is equivalent to a 1D slice of the two-dimensional (2D) Fourier transform of the image (Figure 10.18). However, simple back-projection produces blurring by convolving the object with the function 1/r, in which r is the radial distance in the Fourier domain. Multiplying the Fourier transform of each projection by a Fourier ramp filter or bandpass filter reduces the low-frequency blur and the high-frequency noise (see Section 6.6, Figure 6.14).
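A minimal sketch of ramp filtering a single 1D projection in Python; practical reconstructions roll off the ramp at high frequencies to limit noise, which corresponds to the bandpass variant mentioned above.

import numpy as np

def ramp_filter_projection(projection):
    # Multiply the 1D Fourier transform of a projection by |frequency| (the ramp)
    # and transform back; this counteracts the 1/r blur of simple back-projection.
    freqs = np.fft.fftfreq(projection.shape[0])
    return np.real(np.fft.ifft(np.fft.fft(projection) * np.abs(freqs)))

# A flat-topped projection keeps mostly its edges after ramp filtering.
proj = np.zeros(128)
proj[40:90] = 1.0
print(np.round(ramp_filter_projection(proj)[38:44], 3))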
10.8 Deconvolution in Frequency Space Removes Blur Introduced by the Optical System But Has a Problem with Noise
The convolution operation, or blurring function, operates on a non-blurred object to produce a blurry image:
i = o ⊗ s, (10.1)
in which o is the object, i is the image, and s is the PSF. In optics (see Section 8.10), this is a measure of how much the lens and other factors degrade the image. Convolution in Fourier terms (bold shows the matrix representation of the Fourier transform) is
I = O S, (10.2)
in which I is the Fourier transform of the image, O is the Fourier transform of the object, and S is the MTF. A very powerful approach to restoring object features to the image is to take away the blur with a Fourier filter, which is the inverse of S, or G, which is 1/S:
O = I G, (10.3)
in which G is the inverse transform, or the inverse of the MTF. This approach is deconvolution. Deconvolution uses the inverse transform to deconvolve an original image convolved by a PSF. Example open-source deconvolution programs are plugins to ImageJ (the Deconvolution2 plugin) and Icy (EpiDEMIC blind deconvolution). Besides the blur introduced by the optics and the out-of-focus light in both the lateral and axial dimensions, there is also image noise. Above the additive Gaussian noise floor of the imaging device, noise is shot noise or probabilistic (Poisson) noise arising from the Poisson distribution of photons on the image sensor (see Section 5.11, Panel 5.3). Noise is of particular importance in deconvolution because it occurs at high spatial frequencies (e.g., single pixels). The MTF is like a low-pass filter, so its inverse is like a high-pass filter. Because the key to deconvolution is the use of the inverse of the MTF (more properly, the optical transfer function; see Section 8.9) to deconvolve the image, it has the characteristics of a high-pass filter. High-pass filters emphasize high-frequency components of the image, which include the noise. The Wiener filter implements the inverse filter while minimizing the mean square error between the unblurred (undegraded) image and the blurred (degraded) image using estimates of the noise and the amount of image degradation. This is not the best solution because the values of the noise and the undegraded image are just a guess. Other deconvolution approaches require knowledge of the mean and variance of the noise rather than the noise values themselves. These can be determined rather than guessed. The Tikhonov regularization (TR) algorithm addresses the noise problem through regularization or constrained least-squares filtering, a way to keep the noise from amplifying. It provides an approximation of the input object while reducing the amount of noise. It uses a least-squares approach to minimize a cost function (i.e., the difference between the blurred image and a blurred estimate of the object). In the example comparing the acquired image (Figure 10.19A) with the TR deconvolution (Figure 10.19B), loss of some interior components is evident, like the absence of the lower nucleus in the axial projection of the TR deconvolution. Also, some of the microtubules radiating from the peri-centriolar regions are lost. The Landweber (LW) (Figure 10.19C) and iteratively constrained Tikhonov-Miller (ICTM) (see Figure 10.21C) methods minimize the same least-squares cost function but do so iteratively and with additional constraints, such as non-negativity. In the LW deconvolution in Figure 10.19C, the boundaries between the cells in the embryo have relatively high contrast compared with the other forms of deconvolution. As shown in Table 10.2, the structural similarity index (SSIM; see Section 3.10) is lower for the LW and ICTM algorithms than for the TR algorithm, but they have the advantage of maintaining axial features (the nucleus in ZY). These algorithms assume that the noise is additive (Gaussian) and not shot noise. If just additive noise is considered, then I = O S + n, in which n is the noise, and therefore the problem is a linear one in k-space. However, in low-signal, photon-limited imaging, the Poisson noise component may predominate, requiring statistical methods to treat it. These methods achieve a statistical approximation of the true specimen image using the probability distribution (or density) of photons coming from the object.
The Richardson-Lucy algorithm estimates this probability based on the initial guess of the specimen image. It is constrained to non-negative initial guesses but not regularized with a cost function. Instead, it proceeds until it minimizes the mean standard error between the statistical approximation of the image and the true image. Stopping at this point keeps the algorithm from over-running, which would amplify noise. The Richardson-Lucy deconvolution in Figure 10.19D shows a narrower z-dimensional spread than Figure 10.19C, presumably because the axial PSF is better approximated. It also produces a higher SSIM value (see Table 10.2). A regularized version of Richardson-Lucy deconvolution might be slightly better (see Table 10.2). In this case, calculation of the cost function between the original and estimated images uses the norm of the gradient of the signal. Finally, blind deconvolution, as in, for example, the "learn 2D, apply 3D [three dimensions]" algorithm (Figure 10.20B), is an attractive approach that deals with the uncertainty of achieving an accurate PSF when the sample is in place. Thick samples produce spherical and other aberrations caused by refractive index mismatch that are hard to model without adaptive optics (see Section 8.13, Figures 8.34 and 8.39). Blind deconvolution does not depend on a supplied, previously measured PSF and instead estimates the PSF and the 3D image of the specimen simultaneously from the acquired image. Computing both the PSF and the image takes longer than other deconvolution approaches. It is a reiterative approach in which the recorded image is the object estimate in the first iteration. A theoretical PSF calculated from the system optics convolves this estimate. Comparing the result of this convolution with the raw image results in a correction factor. The next iteration produces a new image and PSF incorporating the correction factor, and so on. Each iteration can include further constraints, such as setting all values above the MTF cutoff to zero (bandwidth constraint). The blind deconvolution available through the Icy interface EpiDEMIC is a version of learn 2D, apply 3D. It involves three steps: (1) deconvolution using total variation regularization, (2) denoising of the deconvolved image using learned sparse coding, and (3) deconvolution using the denoised image as a quadratic prior. The first step uses a maximum a posteriori method. This means that prior knowledge about the true image is a probability density function. The selection of a prior probability density function is generally difficult.
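For comparison with the ImageJ and Icy plugins named above, scikit-image also provides a Richardson-Lucy routine. The sketch below is an illustration on synthetic data, not the workflow used for Figure 10.19; the PSF, photon count, and iteration number are hypothetical choices.

import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.signal import fftconvolve
from skimage import restoration

# Build a small synthetic object, blur it with a Gaussian stand-in for the PSF,
# add Poisson (photon) noise, and deconvolve with Richardson-Lucy.
rng = np.random.default_rng(0)
truth = np.zeros((64, 64))
truth[28:36, 28:36] = 1.0

psf = np.zeros((9, 9))
psf[4, 4] = 1.0
psf = gaussian_filter(psf, sigma=1.5)
psf /= psf.sum()

blurred = fftconvolve(truth, psf, mode="same")
noisy = rng.poisson(blurred * 200) / 200.0               # photon-limited image

restored = restoration.richardson_lucy(noisy, psf, 30)   # 30 iterations

inside = (slice(28, 36), slice(28, 36))
print(f"fraction of intensity inside the true object: "
      f"blurred {noisy[inside].sum() / noisy.sum():.2f}, "
      f"restored {restored[inside].sum() / restored.sum():.2f}")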
Figure 10.19 Comparisons of maximum intensity projections of deconvolved Caenorhabditis elegans embryos. (A) Acquired image. (B) Tikhonov regularized deconvolution. (C) Landweber deconvolution with 200 iterations. (D) Richardson-Lucy deconvolution with 200 iterations. Blue, nuclei; green, microtubules; red, actin. Scale bar = 10 µm. The images have a gamma correction to enhance contrast. From Sage, D., Donati, L., Soulez, F., et al. 2017. DeconvolutionLab2: An open-source software for deconvolution microscopy. Methods 115: 28–41. Used with permission from D. Sage.
Table 10.2 Difference Between a Reference Synthetic Image and the Deconvolved Image Using Different Open-Source Deconvolution Algorithms as Measured by the Structural Similarity Index.

Algorithm                                              Structural Similarity Index
Tikhonov regularization                                0.0248
Landweber                                              0.0206
Iterative constrained Tikhonov-Miller                  0.0205
Richardson-Lucy                                        0.0330
Richardson-Lucy with total variation regularization    0.0334
Learn 2D, apply 3D                                     0.0673

2D, two dimensions; 3D, three dimensions.
prior probability density function is generally difficult. Here, it minimizes a cost function that has a data-fitting, or fidelity, term that measures how well the model represents the data and a regularization function that enforces prior conditions. The regularization used assumes mostly low-frequency (smooth) structures. The second step uses a dictionary of patches of the 2D image to help denoise the 3D PSF using a machine learning algorithm. The third step maintains non-negative results. The user-defined values include the size of the patch used in the library, the number of patches in the library, and a "hypervariable" that is a function of patch size.

The SSIM (see Section 3.10) is a measure of the fidelity of different deconvolution algorithms (see Table 10.2). Blind deconvolution – learn 2D, apply 3D – has the highest SSIM, so it is the most faithful. It compares favorably with superresolution structured illumination microscopy of the same cell (Figure 10.20C). However, it takes about four times longer to compute (2 hours) than the Richardson-Lucy deconvolution (0.5 hours) on the same computer. Commercial blind deconvolution also produces better images than other forms of commercial deconvolution (Figure 10.21).

An open-source 3D blind deconvolution approach, 3Deconv.jl, written in Julia, addresses two areas of weakness in deconvolution software: low signal-to-noise ratio (SNR) and the z-dimensional deformation that results from that dimension's larger PSF (see Section 8.7, Figure 8.16), visible when reconstructing the z dimension of a hollow latex sphere (Figure 10.22). In low-SNR images (e.g., a 20-photon signal), camera read noise makes a significant contribution, so addressing both Gaussian noise and Poisson noise is desirable. In Figure 10.22, the blind version of the Microvolution software (commercial) and the depth-varying version of the Huygens software (commercial) both produce results similar to 3Deconv.jl. As evident in Figure 10.22, the axial, z-dimensional reconstruction is superior using the 3Deconv.jl algorithm. However, the runtime for the program is an order of magnitude longer than for the depth-invariant Huygens and non-blind Microvolution software.

Because deconvolution algorithms and confocal microscopy (see Sections 6.9, 17.8, and 17.11) both result in sharper images, is there any advantage of one over the other? Each has its own advantages and disadvantages. Confocal microscopy discards out-of-focus light by blocking it with a pinhole, whereas widefield deconvolution reassigns out-of-focus light to its original plane. Because confocal microscopy blocks light and widefield deconvolution preserves light, widefield deconvolution is more sensitive and has the potential to be more quantitative at low signal strengths than confocal microscopy. However, accurate quantitation depends on the ability of the
Figure 10.20 Drosophila S2 cell line labeled with Alexa Fluor 488 anti-tubulin. (A) Acquired image without superresolution. (B) Deconvolved image with learn 2D (two dimensions), apply 3D (three dimensions) deconvolution. The widefield data (A) deconvolved by the structured illumination microscopy data provided an estimate of the initial point spread function. (C) Structured illumination microscopy (Zeiss Elyra) of the same cell, a form of superresolution microscopy (see Section 18.7). Adapted from Soulez, F. 2014. A “Learn2D, Apply3D” method for 3D deconvolution microscopy. IEEE Symposium on Biomedical Imaging 2014. IEEE. p. 1075–1078.
Figure 10.21 Madin-Darby canine kidney cell labeled on the plasma membrane with concanavalin A fluorescein isothiocyanate. Each row shows deconvolution of bottom, middle, top, and side of cells. The inset shows the line of the plotted intensity scan. The right-hand side shows the name of the algorithm implemented. (A) Acquired images. (B) Deconvolved images with a deconvolution algorithm by Metamorph (Universal Imaging) software. (C) Deconvolved images with the iterative constrained Tikhonov-Miller algorithm (see Table 10.2 for comparisons) using Huygens (Scientific Volume Imaging) software. (D) Deconvolved images with blind deconvolution by Autoquant (Imaris) software. Adapted from Sibarita, J.-B. 2005. Deconvolution microscopy. Adv Biochem Engin/Biotechnol 95: 201–243. Used with permission.
Figure 10.22 Deconvolution of the XY and XZ midsections of focal stacks containing a 6-μm hollow microsphere. The cross-sections are from 256 × 256 × 57 voxel focal stacks. Scale bar: 10 μm. Low-noise images have high photon counts, while high-noise images have low photon counts. Huygens, ER-Decon 2, and DeconvolutionLab2 deconvolution used a 3.6-GHz Intel Core i7-4790 central processing unit. Huygens and 3Deconv.jl also used the GPU of an NVIDIA Quadro K6000. Microvolution used a 3.4-GHz Intel Xeon E5-2687 v2 and an NVIDIA Quadro K5000. Runtimes for deconvolving an identical data set were 84.2 s for 3Deconv.jl (140 iterations), 2.4 s for depth-invariant Huygens (32 iterations), 4.7 s for non-blind Microvolution (100 iterations), 121.1 s for ER-Decon 2 (100 iterations), and 284.8 s for DeconvolutionLab2 (100 iterations). Adapted from Ikoma et al. 2018. A convex 3D deconvolution algorithm for low photon count fluorescence imaging. Scientific Reports 8, Article number 11489. Creative Commons 4.0.
algorithm to maintain structures and not lose them, as happened with the z-dimensional nucleus in Figure 10.19B. Confocal microscopy, on the other hand, has the advantage of dealing with the light from a single optical section rather than the axial (z-dimensional) stack of images typically used for widefield deconvolution. Processing the images takes much longer for widefield deconvolution. Perhaps the best solution is to deconvolve confocal data. If the SNR is a problem in confocal microscopy, opening the confocal pinhole and collecting more light will give a higher SNR image (see Section 17.11), which deconvolution can then make sharper.
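As a concrete illustration of the iterative methods compared in Table 10.2, the sketch below applies scikit-image's implementation of Richardson-Lucy deconvolution to a widefield image. The file names are placeholders, the iteration count is the stopping choice discussed above, and the keyword is num_iter in recent scikit-image releases (older releases call it iterations).

import numpy as np
from skimage import io, restoration

# Hypothetical file names; any registered 2D image and measured PSF will do.
blurred = io.imread('widefield_slice.tif').astype(float)
psf = io.imread('measured_psf.tif').astype(float)
psf /= psf.sum()                    # normalize the PSF so total flux is conserved

# Iterative Richardson-Lucy estimate; running too many iterations amplifies
# noise, so the iteration count acts as the stopping (regularization) choice.
estimate = restoration.richardson_lucy(blurred, psf, num_iter=30)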
Annotated Images, Video, Web Sites, and References

10.1 Image Processing Occurs Before, During, and After Image Acquisition

Benchmarks for C, Julia, Java, and Python, as well as MATLAB, Octave, and R, are at https://julialang.org/benchmarks. Figure 10.1 is from the Canon 500D user manual.
10.2 Near-Neighbor Operations Modify the Value of a Target Pixel

Near-neighbor 3D operations use pixels or voxels above or below the target pixel specified (see Section 13.3). 3D operations in k-space use Fourier filters, such as some deconvolution filters.
10.3 Rank Filters Identify Noise and Remove It from Images

Rank filters are accessible in Icy through the Filter toolbox and specifying the selective filter type. Order statistics filters are in Gonzalez, R.C. and Woods, R.E. 2002. Digital Image Processing. Second Edition. Prentice Hall, Upper Saddle River, NJ. pp. 119–134 and 230–243. A comparison of the median filter, the smoothing filter, and the non-local means filter is in Aaron, J. and Chew, T.L. 2021. A guide to accurate reporting in digital image processing—can anyone reproduce your quantitative analysis? Journal of Cell Science 134(6): jcs254151. doi: 10.1242/jcs.254151. PMID: 33785609. https://imagej.net/plugins/non-local-means-denoise/.
10.4 Convolution Is an Arithmetic Operation with Near Neighbors

For more detailed information, see the primary references on Erik Meijering's Feature J web page; click the links for each filter at https://imagescience.org/meijering/software/featurej. Smoothing and sharpening filters in the spatial domain are in Gonzalez, R.C. and Woods, R.E. 2002. Digital Image Processing. Second Edition. Prentice Hall, Upper Saddle River, NJ. pp. 119–134.
10.5 Deblurring and Background Subtraction Remove Out-of-Focus Features from Optical Sections

Background subtraction is in Gonzalez, R.C. and Woods, R.E. 2002. Digital Image Processing. Prentice Hall, Upper Saddle River, NJ. pp. 110–112.
10.6 Convolution Operations in Frequency Space Multiply the Fourier Transform of an Image by the Fourier Transform of the Convolution Mask

Low-pass, high-pass, Butterworth, and bandpass filters in the frequency domain are in Efford, N. 2000. Digital Image Processing: A Practical Introduction using Java. Addison-Wesley, Harlow, UK. pp. 212–222. Low-pass, high-pass, and Butterworth filters are in Gonzalez, R.C. and Woods, R.E. 2002. Digital Image Processing. Prentice Hall, Upper Saddle River, NJ. pp. 167–185. The correspondence between filters in the spatial domain and the frequency domain is in Gonzalez, R.C. and Woods, R.E. 2002. Digital Image Processing. Prentice Hall, Upper Saddle River, NJ. pp. 161–166.
10.7 Tomographic Operations in Frequency Space Produce Better Back-projections

Tomographic operations are not just important in scanning computed tomography. Tomographic operations in cryo-electron microscopy are also very important (see Sections 19.13 and 19.14).
10.8 Deconvolution in Frequency Space Removes Blur Introduced by the Optical System But Has a Problem with Noise

The review of the DeconvolutionLab2 plugins is Sage, D., Donati, L., Soulez, F., et al. 2017. DeconvolutionLab2: An open-source software for deconvolution microscopy. Methods 115: 28–41. Figure 10.19 and the data in Table 10.2 based on their reference image are from that review and are gratefully acknowledged. The description of the learn 2D, apply 3D blind deconvolution is from Soulez, F. 2014. A "Learn2D, Apply3D" method for 3D deconvolution microscopy. IEEE Symposium on Biomedical Imaging. pp. 1075–1078. Figure 10.20 is from that paper. Clear mathematical treatment of several types of deconvolution is in Merchant, F.A. 2008. Three-dimensional imaging. In Microscope Image Processing. Ed. by Q. Wu, F. Merchant, and K. Castleman. Elsevier/Academic Press, Amsterdam. pp. 329–399.
11 Spatial Analysis

11.1 Affine Transforms Produce Geometric Transformations

The geometric transformations considered here are transformations of objects based on a coordinate system. Generally, this is a coordinate system that uses Cartesian coordinates of mutually perpendicular (orthogonal) parallel lines. Either orthographic projection of parallel lines or perspective projection of convergent lines (see Section 4.7, Figure 4.14) can represent the coordinate systems in two dimensions (2D) or three dimensions (3D) (see Section 13.1, Figure 13.3). Translating (moving), scaling, skewing (shearing), or rotating an object computationally requires an affine transformation (Figure 11.1). Affine transformations are matrix multiplications. If the image is a 2D array of pixels represented by numbers, or a pixel matrix, then it makes sense to use matrix multiplication to change the relative alignment of the pixels.

The objects subject to affine transformations vary, depending on the software. They can be image layers in, for example, Photoshop or GIMP2. They can be regions within images (see Section 7.3). The Edit>Transform command in Photoshop transforms segmented regions placed into new layers (outline a selection and choose Layer>New>Layer via Copy or Layer via Cut). In ImageJ, the Edit>Transform command produces some of the affine transforms, but others are in plugins (e.g., skew in the TransformJ plugins).
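Because affine transformations are matrix multiplications, the individual translation, rotation, skew, and scaling matrices can simply be multiplied together and applied in one step. The sketch below, in Python with NumPy and SciPy, assumes a 2D array named image; scipy.ndimage.affine_transform uses backward (target-to-source) mapping, like the image resampling described in Section 11.3, so the inverse of the composed matrix is supplied.

import numpy as np
from scipy import ndimage

def affine_matrix(scale=1.0, angle=0.0, shear=0.0, t0=0.0, t1=0.0):
    """Compose scaling, rotation (radians), skew (shear), and translation
    into one 3 x 3 homogeneous matrix by matrix multiplication."""
    S = np.array([[scale, 0, 0], [0, scale, 0], [0, 0, 1.0]])
    R = np.array([[np.cos(angle), -np.sin(angle), 0],
                  [np.sin(angle),  np.cos(angle), 0],
                  [0, 0, 1.0]])
    K = np.array([[1, shear, 0], [0, 1, 0], [0, 0, 1.0]])      # skew (shear)
    T = np.array([[1, 0, t0], [0, 1, t1], [0, 0, 1.0]])        # shifts along the two array axes
    return T @ R @ K @ S

M = affine_matrix(scale=1.2, angle=np.deg2rad(15), t0=10, t1=-5)
# Backward mapping: pass the inverse so the output shows the forward transform.
transformed = ndimage.affine_transform(image, np.linalg.inv(M))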
11.2 Measuring Geometric Distortion Requires Grid Calibration

Image capture and display may produce geometric distortion, or morphing, of an object (see also morphing to fill in time intermediates in Section 12.2). A particularly good example of optically generated geometric distortion is that caused by the inherent spherical aberration of electromagnetic lenses in an electron microscope. Electromagnetic lenses focus the radiation differently depending on where it passes through the lens (see Section 8.11). This magnifies the edges of the image more than the center, producing pincushion distortion. Overcoming this distortion optically involves a de-magnifying projector (or intermediate) lens with compensating barrel distortion. Overcoming it digitally would involve using the barrel distortion matrix transformation (see Figure 11.1).

Geometric decalibration compensates for distortion and can provide a measurement of the geometric distortion of an optical system. It compares the image of a fiducial object, a grid of dots or lines with known spacing, with the image of the sample taken with the same optical system. Plotting the identifiable points in the sample, control points or landmarks, on the grid image produces a measure of the distances between the landmarks. Matching the landmarks to an undistorted image of the fiducial object by stretching or shrinking the intervening regions generates an image of the sample with less distortion. This is rubber sheeting, or unwarping, and it aligns, or registers, two images (Figure 11.2).
11.3 Distortion Compensation Locally Adds and Subtracts Pixels

Distortion compensation adds or subtracts pixels by interpolation. The standard way to interpolate pixels is by backward mapping, or target-to-source mapping. This assigns every pixel in the target, or output image, to a location on the source image, the image resampling operation in Figure 11.1. The pixel values in the output image are often combinations of
Figure 11.1 Geometric operations transform images by changing coordinate orientation through translation and rotation and coordinate shape through scaling, skewing, and curved or linear deformation. Image resampling interpolates new points between others. Resampling also occurs during the measurements of morphometrics in biological research. Statistical analysis of spatially related geometrical points can provide a measure of image similarity or correlation. From Meijering, E. and van Cappellen, G. 2006. Biological Image Analysis Primer. https://imagescience.org/meijering/publications/1009. Used with permission.
pixel values in the source image, or interpolated values. The degree of accuracy of the interpolation depends on the approach used for calculating the new pixel values (Figure 11.3). Nearest-neighbor interpolation uses the single nearest gray value and is the fastest but least accurate choice for interpolation. Bilinear interpolation employs four near-neighbor pixel values and has medium accuracy. Bicubic interpolation employs a 4 × 4 neighborhood of 16 pixel values and has the highest accuracy. With increasing accuracy comes increasing computation time.

In 3D space, the interpolation algorithms differ and have different names reflecting their 3D nature. Trilinear interpolation is the industry standard. Tricubic interpolation takes almost an order of magnitude longer (1.1 frames/sec compared with 7.5 frames/sec on the same computer). Modified trilinear interpolation takes less time (5.5 frames/sec) but can approximate the detail preserved with tricubic interpolation.
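A minimal sketch of backward mapping with bilinear interpolation, written in Python/NumPy for illustration only, shows how a target pixel takes a weighted combination of the four nearest source pixels; production software would vectorize this and handle image edges more carefully.

import numpy as np

def bilinear_sample(img, x, y):
    """Backward-map a target pixel to source coordinates (x, y) and return
    the bilinearly interpolated gray value from the four nearest neighbors."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    fx, fy = x - x0, y - y0                       # fractional distances
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bottom = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bottom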
11.4 Shape Analysis Starts with the Identification of Landmarks, Then Registration

Geometric decalibration is a special case of image registration. Registering images requires identification of features common to all of the compared images, or landmarks. Shape decomposition can identify simple landmarks. Shape decomposition reduces shapes to a series of curves, such as convex curves. Imposing convex curves on complex perimeters produces convex hulls (Figure 11.4; see Section 7.10). Convex curve decomposition in 3D provides smooth surfaces for physical interaction between models. Discrete curve evolution approaches identify these convex curves and lines (skeletons), connecting them even in the presence of some concavities.
Figure 11.2 Aligning two histological sections of a mammary duct using BUnwarpJ. (A) Source image. (B) Target image. (C) Target aligned with source. (D) Source aligned with target. From Arganda-Carreras, I., Sorzano, C.O.S., Marabini, R., et al. 2006. Consistent and elastic registration of histological sections using vector-spline regularization. Lecture Notes in Computer Science, Springer Berlin/Heidelberg, volume 4241/2006. Computer Vision Approaches to Medical Image Analysis. pp. 85–95.
Figure 11.3 Increasing accuracy of interpolation with near-neighbor, bilinear, and bicubic algorithms backward mapping to the original image. Image by L. Griffing.
Landmarks include points other than just the intersections of convex hulls. For biological species representation, they are anatomical loci that all the samples recognizably share. They are often points of homology between species. However, homologous points may be sparse. Semi-landmarks, which could be vertices of complex curves or evenly spaced points along a curve between two landmarks, achieve more comprehensive coverage. Identified landmarks need to be repeatable between users and approaches. Finally, landmarks need to have consistent relative positions (e.g., they cannot swap positions with other landmarks).

Other ways of image and object registration extract features that have a certain amount of uniqueness in the image. A powerful image correspondence routine is SIFT, the scale-invariant feature transform, which detects scale-invariant features and compares them between images. Scale invariance means that a feature looks the same no matter what size it is. It does
Figure 11.4 Convex curve shape decomposition in two and three dimensions. (A) to (D) Shape decomposition of a fish using convex hulls. (A) Outline of piranha. (B) First set of landmarks (black) producing two convex curves. (C) Second set of landmarks (red). (D) Third set of convex curves (green). (E) to (G) Automated convex curve generation of an elephant outline (E) with a specific concavity threshold. (F) Segmented convex regions. (G) Convex curves of an elephant. From Lu et al. 2010 / IEEE. (H) to (I) Hierarchical convex shape decomposition. (H) A three-dimensional (3D) model. (I) The model decomposed into 3D convex shapes. From Khaled Mamou. Used with permission.
Figure 11.5 Registration result from elastic registration from the TrakEM2 plugin in FIJI. (A) The top is the original data set. (B) The bottom is the translated and rotated data set. The realignment of the frames uses the scale-invariant feature transform (SIFT) described by Saalfeld, S., Fetter, R., Cardona, A., and Tomancak, P. 2012. Elastic volume reconstruction from series of ultra-thin microscopy sections. Nature Methods doi:10.1038/nmeth.2072. Images are from a GIF sequence in https://imagej.net/Elastic_Alignment_and_Montage. Used with permission.
the analysis in scale space, the space representing images scaled to different degrees. SIFT finds key points, or landmarks, that are unique image features in all scale spaces. A feature detector inspired by SIFT, speeded-up robust features (SURF), is also scale invariant, but the calculations are several times faster. One of the programs that automatically registers images using SIFT is the elastic registration plugin in FIJI, TrakEM2 (Figure 11.5). It can align complicated thin sections of electron micrographs. The program rotates and translationally shifts the images in a larger canvas to line up correspondences. Image stabilization, digitally steadying a camera, uses similar approaches (see Section 12.2). These routines center a frame on a particular point over time.
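A hedged sketch of SIFT-based registration using OpenCV follows; the arrays img_ref and img_mov, the 0.75 ratio-test threshold, and the choice of a partial affine (rotation plus translation and scale) model are assumptions for illustration, and TrakEM2's elastic alignment is considerably more sophisticated than this rigid fit.

import cv2
import numpy as np

# img_ref and img_mov are assumed 8-bit grayscale arrays (two sections).
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img_ref, None)
kp2, des2 = sift.detectAndCompute(img_mov, None)

# Match descriptors and keep distinctive matches (ratio test).
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Estimate an alignment from the matched landmarks with RANSAC and apply it.
src = np.float32([kp2[m.trainIdx].pt for m in good])
dst = np.float32([kp1[m.queryIdx].pt for m in good])
A, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
aligned = cv2.warpAffine(img_mov, A, (img_ref.shape[1], img_ref.shape[0]))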
11.5 Grid Transformations Are the Basis for Morphometric Examination of Shape Change in Populations

D'Arcy Thompson, in his classic book, On Growth and Form (1917), pioneered shape-based comparison of organisms. He examined different genera based on the relatedness of their adult forms. His thesis was that if an appropriate mapping function could derive a subtype from a "type" or ancestral form, then the subtype evolved from the type. He provided many
examples of these mapping functions (Figure 11.6). The genetic and molecular mechanisms of how shape changes with evolution remain interesting fields of research. Quantifying shape and shape change is the field of morphometrics. Coordinate morphometrics is an outgrowth of Thompson’s analysis and is a common way to describe variation in shape in natural populations, both between species and ontogenically within species as they grow. The definition of shape is the geometry of the object that is translationally, rotationally, and scale invariant. The scale invariance is a little problematic in analysis of natural populations because sometimes shape is a function of size, a principle called allometry. As described earlier, landmarks should be easily identifiable and repeatable. For 2D analysis, they should lie in the same plane. The “name landmarks and register” plugin in FIJI is a useful tool for this.
Figure 11.6 Three examples of morphing operations which show the form relationship between fish genera. Thompson, D / Cambridge University Press / Public Domain.
Figure 11.7 (A) Piranha with landmarks (green) identified. (B) Allometric coefficients (numbers) of different morphometric measurements showing how the landmarks relate to each other during growth. Modified from Zelditch, M.L., Swiderski, D.L., and Sheets, H.D. 2012. Geometric Morphometrics for Biologists: A Primer. Second Edition. Elsevier Science and Technology, Amsterdam. p. 4. Figure 1.2. Used with permission.
Figure 11.8 Ontogenic change in landmarks of a species of piranha. (A) Numbered landmarks on a species of piranha. (B) Vectors associated with the initial landmark showing the change during development. (C) Partial warps of the biorthogonal grid using thin plate splines of the difference between the first and last measured landmark coordinates. Zelditch et al. 2012 / With permission of Elsevier.
Allometric analysis uses changes in the relative positions of the landmarks, with measured distances between them (Figure 11.7), as organisms grow and develop. The relationships are less arbitrary if the structure is considered using an engineering method that calculates the amount of deformation for a given load using finite element analysis, a method usually used for solving partial differential equations. Deformation analysis shows the relationship between landmarks as deformations of the triangles that connect them. Both of these approaches measure shape change, not shape itself.

Another technique in coordinate morphometrics is to superimpose a biorthogonal grid on the specimen and measure the distances and angles between landmarks in different species or growth forms. The alignment of the same landmarks uses translation, scaling, and rotation, processes that are shape invariant. The process minimizes the Procrustes distance, the square root of the sum of the squared distances between the aligned landmarks. The Procrustes distance derives its name from Procrustes, a villain in a Greek myth who would invite people to sleep in his bed but make them fit exactly by either stretching them or trimming (!) them. Although Procrustes changed the shape of his victims, the alignment that minimizes the Procrustes distance does not. With a defined set of landmarks (Figure 11.8A), the Procrustes difference in position of each given landmark provides a measure of the shape change over time during development or between species during evolution (Figure 11.8B). Vectors show the magnitude as well as the directionality of the change during development.

Another way to analyze grid-based changes in object shape is to use thin plate splines. Thin plate splines map non-uniform transformations as a sum of partial warps (also called principal warps), orthogonal displacements of the landmarks (Figure 11.8C). These warps visualize form change as deformation, similar to the differences between species described by the geometric transformations of D'Arcy Thompson.
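SciPy provides a Procrustes superimposition that translates, scales, and rotates one landmark configuration onto another; the sketch below uses hypothetical landmark coordinates only to show the call, and the returned disparity is the sum of squared distances between the standardized, aligned landmarks.

import numpy as np
from scipy.spatial import procrustes

# Two landmark configurations of shape (n_landmarks, 2); rows correspond to
# the same anatomical landmarks in two specimens (hypothetical coordinates).
juvenile = np.array([[10, 12], [34, 15], [52, 30], [40, 55], [15, 48]], float)
adult    = np.array([[12, 10], [40, 16], [60, 36], [44, 62], [14, 52]], float)

# procrustes standardizes and aligns the configurations; disparity is the
# residual sum of squared distances after alignment.
std_ref, std_fit, disparity = procrustes(juvenile, adult)
procrustes_distance = np.sqrt(disparity)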
11.6 Principal Component Analysis and Canonical Variates Analysis Use Measures of Similarity as Coordinates

Shapes change with evolution as well as with development. The difference in shape between species could be a measure of their evolutionary divergence. Historically, one of the motivations behind morphometric analysis was to show that the sizes and shapes of the human skull or brain are morphological evidence for certain behavioral traits, such as intelligence, given as a single value, the intelligence quotient (IQ). However, many of the early studies had flaws stemming from cultural bias. A very accessible discussion of the problems and the foundations of morphometrics is in Stephen Jay Gould's The Mismeasure of Man. Besides revealing the problems of cultural and scientific bias, Gould points out that the pioneers of morphometrics were also the pioneers of statistics. It is enlightening reading because it also explains in relatively nonmathematical terms the method they discovered to compare data sets, principal component analysis (PCA). This, along with canonical variates analysis (CVA), can correlate species shape, finding the morphometric similarity among different species. These forms of statistical analysis are ordination methods; they use a measure of similarity as coordinates.

PCA is a way to express the most similar components of two data sets. Figure 11.9, a 2D representation of PCA, shows that there is some scatter outside of the line of perfect correlation of two variables. If the scatter is a shallow ellipse, there
Figure 11.9 Principal component analysis (PCA). (A) Plotting two perfectly correlated variables, X and Y, against each other produces a line. The data points (cyan) deviate from the line but have high correlation (see Figure 11.21A). (B) The spread broadens as the correlation diminishes (see Figure 11.21B). (C) There is no correlation if they no longer form an ellipse or if they fall along the axes (see Figure 11.21C). (D) PCA finds the least-squares line (magenta), one that minimizes the sum of the squared distances to it. The distances are perpendicular lines from the data points to the least-squares line. (E) Another ordinate, the second principal component (PC2), runs perpendicular to the least-squares line, the first principal component (PC1); PC1 and PC2 are the major and minor axes of an ellipse that defines the spread of the values. The data points have principal component scores that are their values on PC1 and PC2. (F) The principal component lines become the axes upon which similarity is plotted. The relationship of the scores to their original values is a cosine function of the angles α1 and α2. D–F are adapted from Zelditch et al. 2012.
Figure 11.10 Principal components analysis of the shape of eight different species of piranha. Some of them are highly clustered, and others are relatively widely distributed with respect to similarity to the principal warps illustrated on the axes. PC1, first principal component; PC2, second principal component. Adapted from Zelditch et al. 2012.
is high correlation. Minimizing the sum of the squared distances from the points to the line of perfect correlation produces the line along which most of the similarity falls. This line of most similarity is the first principal component (PC1). The second principal component (PC2) is at right angles to PC1 (Figure 11.9D–F). PCA removes the original axes and just plots the principal components, but recovering the relationship to the original axes is possible by knowing the angular relationship of PC1 to the x- and y-axes (Figure 11.9F). This approach also applies to multiple dimensions, not just two, in which case a multi-dimensional matrix represents the covariance values, and the principal component axes are the eigenvectors of the matrix, a description of which requires further mathematical discussion (see the Annotated Images, Video, Web Sites, and References).

Comparing the changes in the landmark coordinates between species reveals that some species share certain mutual changes in landmarks. In other words, in these species, the landmark changes correlate. For example, PCA can compare the relationships between the forms of several different species of piranha (Figure 11.10). In this case, the principal component axes represent a certain degree of deformation of the biorthogonal grid used to plot the landmarks. The axes do not represent an actual trait. Although they are calculated entities showing maximum similarity between two data sets, they have an unknown relationship to actual physical detail in the species. They have value as measures of morphometric similarity between species or other groups.

Another way to show similarity between species is canonical variates analysis (CVA). This approach uses a map that aligns the axes to the principal components of the pooled variances of different sample populations (Figure 11.11B). Rescaling the principal component axes normalizes (makes circular) the within-group variances (Figure 11.11C). The final plot maps the canonical variates to axes where the group means differ most (Figure 11.11C). An implementation of this approach (Figure 11.12) shows more clustering of the different species compared with PCA (see Figure 11.10). References to open-source software for doing morphometric analysis are in Table 7.1, Section 7.2.
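A minimal PCA of aligned landmark coordinates can be written with a singular value decomposition. The sketch below is NumPy only; the input array of flattened, Procrustes-aligned coordinates is assumed, and dedicated morphometrics packages add the warp-based visualizations shown in Figure 11.10.

import numpy as np

def principal_components(shapes):
    """PCA of aligned landmark coordinates.

    shapes : array of shape (n_specimens, n_landmarks * 2), each row the
             flattened, Procrustes-aligned x,y coordinates of one specimen.
    Returns the principal component axes (eigenvectors), the score of each
    specimen on each axis, and the fraction of variance each axis explains.
    """
    centered = shapes - shapes.mean(axis=0)          # remove the mean shape
    # Rows of vt are the eigenvectors (PC1, PC2, ...) of the covariance matrix.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T                         # PC scores per specimen
    explained = s**2 / np.sum(s**2)                  # variance fraction per PC
    return vt, scores, explained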
11.7 Convolutional Neural Networks Can Identify Shapes and Objects Using Deep Learning

A convolutional neural network has multiple layers of mathematical operators, some of which are convolution operations (see Sections 8.9, 10.4, and 10.6; Figure 11.13). The first step performs a certain number of convolution operations on each color channel in the image. If there are four different convolution operations and the
Figure 11.11 A two-dimensional graphical explanation of canonical variates analysis (CVA). (A) Plot of the correlation between attributes X and Y in three different populations. (B) Re-mapping (A) according to the principal components (PCs) of the pooled variances of the populations. (C) Non-linear rescaling of the PC axes normalizes the within-group variances, producing circles. Unlike the situation with the PCA, the CVA axes are recoverable only with a known scaling factor for each axis. CV1 and CV2 (red) are the axes where group means differ most. Diagram by L. Griffing.
Figure 11.12 Canonical variates (CV) analysis of eight populations of piranha plotted against axes that represent a collection of partial warps of shape. Adapted from Zelditch et al. 2012.
neighborhood of the convolution is a 3 × 3 matrix, it is a 3 × 3 × 4 convolution layer for each color channel. Figure 11.13 only shows the blue color channel. The spacing with which the convolution mask steps across the pixels of the input image is its stride. In the cases of convolution described so far (see Sections 10.4 and 10.6), every pixel in the input image serves as a target pixel, giving a stride of 1. If every other pixel in the input image serves as a target pixel, the stride is 2. Sometimes there is padding of the original with a boundary of zeros so that the convolution steps work on the edges of the input image. Also, there is a non-linear adjustment of the convolution output. An example of this adjustment is the replacement of any negative values with zero. Another example is a sigmoidal adjustment of the convolution values.

The next step down-samples the image with a rank filter (see Section 10.3). Figure 11.13 illustrates down-sampling to 1/9th of the size of the original. In this case, the pixel that represents the previous nine is the value of the maximum rank filter. Other rank filters, such as the median or mean, are possible. Reiteration of the first two steps further down-samples the image while increasing the number of images in the stack. These iterations can include variations in the number of convolutions, the non-linear adjustment, and the nature of the rank filter in the third and fourth steps. Downsizing reduces the spatial size of the input image, thereby reducing the number of parameters and computations in the network. These convolution and pooling layers act as feature extractors. The fully connected layers in the neural net act as a classifier. An example of a fully connected neural network with several hidden layers is a multilayer perceptron.

A trainable neural net then processes the scale-invariant stack. It is a multilayer perceptron made up of multiple layers of perceptrons, algorithms for supervised learning of binary classifiers. It is a neural network, each node of which represents an activation function that, for example, sums the input values from the other nodes (Figure 11.14). The activation function for the output layer is a SoftMax function, which takes vectors of arbitrary values and converts them to values between 0 and 1 that sum to 1. Each output value is then its contribution to the probable outcome.

A multilayer perceptron uses backward propagation of errors. In this case, the training supervisor corrects the neural net when it makes a mistake. This only works, of course, when the supervisor knows what the output should be for a given input. The weights, or numerical contributions of the input images to each node, start out as random values. The output nodes then come up with a probable outcome for each output, which is probably initially wrong. The total error then backward propagates through the network by calculating the gradients between the output, hidden, and input layers and using an optimization method, such as gradient descent, to adjust the initial weights. In convolutional neural nets, back-propagation of errors adjusts the values (not the size) of the convolution masks as well as the node weights. The net should now work better, minimizing the error of the predicted output values. With several training sessions, the probability of correct classification improves. This process is an example of machine learning, deep learning, or artificial intelligence.
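The architecture just described maps onto a few lines of Keras (TensorFlow, one of the open-source frameworks mentioned in Figure 11.14). The sketch below is illustrative only: the 128 × 128 input size, the filter counts, and the five output classes (e.g., five mutant lines, as in Figure 11.15) are assumptions, not a tuned design.

import tensorflow as tf

# Minimal convolutional classifier sketch: convolution layers with a ReLU
# non-linearity (negative values replaced by zero), max-pooling as the
# down-sampling maximum rank filter, and fully connected layers ending in a
# SoftMax output of class probabilities.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(4, (3, 3), activation='relu',
                           padding='same', input_shape=(128, 128, 3)),
    tf.keras.layers.MaxPooling2D(pool_size=(3, 3)),   # 3 x 3 max pool: 1/9th the size
    tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same'),
    tf.keras.layers.MaxPooling2D(pool_size=(3, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),     # fully connected (perceptron) layer
    tf.keras.layers.Dense(5, activation='softmax'),   # SoftMax: probabilities summing to 1
])
# Backward propagation of errors during training adjusts both the convolution
# mask values and the node weights.
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_images, train_labels, epochs=10)    # supervised training step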
Image-based phenomics, or the information processing and technology involved in visually discriminating phenotypes, may use convolutional neural networks. For example, it can distinguish between different mutant lines of Arabidopsis thaliana (Figure 11.15). Other kinds of machine learning can distinguish medical phenotypes or implicit human disease states, not only based on imaging but also on other health measurements such as those recorded on a wearable fitness monitor.
11.8 Boundary Morphometrics Analyzes and Mathematically Describes the Edge of the Object

Landmark analysis often uses points of homology, or structural similarity, between species. Boundary morphometrics instead uses the boundary of the object as the reference. Edges or boundaries do not constitute conventional landmarks for species comparison because boundaries may not describe homologous features. Using a boundary, however, provides more semi-landmarks, points along a continuous line instead of discrete, separate points. Chain coding, a form of encoding for image compression, can also identify points along a thresholded edge. Open-source software that automates thresholding and chain coding of edges is available, for example, in the program SHAPE. SHAPE also provides tools for elliptical Fourier approximation of shapes.
Figure 11.13 Convolutional neural network for deep learning object identification. Step 1 divides the color image into three channels. Each channel, in this case, has three convolution operations to generate three separate convolved images. Step 2 employs a rank filter that downsizes the image by replacing the rank mask with a single pixel that is the value of the final rank operation. The one shown is a maximum value rank. Step 3 is another set of convolution operations. Step 4 is another set of downsizing rank operations. Step 5 uses a multilayer perceptron to learn training sets. The multilayer perceptron has multiple layers for supervised learning of binary classifiers. With several rounds of training, the outcome may consistently have a high probability of being correct. Diagram by L. Griffing.
Figure 11.14 A neural network example with a single hidden layer. In step 1, randomly assigned weights for each activation function within a hidden layer contribute to the output, which has a value between 0 and 1 using a SoftMax function. The calculated error in the output term back propagates, providing an error term for the hidden layers. Open-source programs such as Google TensorFlow and Julia Flux calculate the gradient and revise the initial weights in step 2. Step 3 re-runs the programs with the new weights, with an improvement in the correctness of the response. Step 4 and later steps reiterate step 3 with new training information. Diagram by L. Griffing.
Figure 11.15 Example training (red) and testing (blue) curves for classification of five different mutant lines of Arabidopsis thaliana. The testing curve shows lower errors than the training curve at smaller batch numbers. At higher batch numbers, overfitting occurs (the statistical model contains more parameters than justified by the data). Ubbens, J.R. et al. 2017 / Frontiers Media / CC BY 4.0.
Elliptical Fourier functions can approximate complicated boundaries. Fourier analysis can describe wave forms as the sum of different frequencies, with more orders of harmonics producing a more accurate representation, as in the square wave example (see Section 8.9, Figure 8.17). Describing outlines of 2D or 3D forms as ellipses in frequency space requires two transforms, one for the x-axis and one for the y-axis in 2D, and an additional one for the z-axis in 3D, with more orders of harmonics generating more specific outlines. Epicycles, the movement of the center point of a higher-order ellipse around the perimeter of the next lower-order ellipse, generate a more accurate shape contour as the number of smaller, harmonic ellipses increases (Figure 11.16). It is like coupling smaller and smaller gears together in an elliptical Spirograph. Simple Fourier analysis requires sampling along equal intervals on the edge. Elliptical Fourier analysis does not, and it satisfies the Nyquist criterion when the number of harmonics is less than half the number of points identified on the edge by chain coding. Besides SHAPE, other software packages do elliptical Fourier analysis, as described at the end of the chapter.

Another example of boundary morphometrics is automatic karyotyping, the analysis of chromosome spreads. Manual karyotyping uses the relationships between landmarks identified by the user, usually the banding pattern and the position of the centromeres relative to the telomeres. In automated karyotyping, there is no user intervention. One implementation of automatic karyotyping (there are many) calculates a curvature function of each pixel in a contour (black outline) of the chromosome (Figure 11.17). A fast Fourier transform of the curvature function produces a group of elements that define the feature vector for each chromosome (see Figure 11.17C). Matching the patterns of the feature vectors of the chromosomes identifies similar chromosomes.

Early work on the identification of individual dolphins or whales also used pattern matching of algebraic strings that describe the boundary along the trailing edge of a fin or a fluke. Fins and flukes often have scars and indentations that are unique to individuals. After manually tracing the fin (Figure 11.18) or morphologically processing and segmenting the image, matching the string of numbers that describes the curves on the fin or fluke aids in individual dolphin identification. Automated feature extraction using SURF and convolutional neural networks, together with color analysis, overcomes the time bottleneck and potential bias of operator involvement and is available in open-source software.
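A simpler relative of elliptical Fourier analysis, the complex Fourier descriptor, conveys the same idea in a few lines: the boundary is treated as a sequence of complex numbers, transformed with the FFT, truncated to a chosen number of harmonics, and inverted to give a smoothed outline, analogous to the low-order harmonic tracings in Figure 11.16. Unlike elliptical Fourier analysis, this sketch assumes equally spaced, chain-coded boundary points; it is an illustration, not the SHAPE algorithm.

import numpy as np

def fourier_outline(x, y, n_harmonics=20):
    """Smooth a closed boundary by keeping only low-order Fourier harmonics.

    x, y : 1D arrays of equally spaced boundary coordinates (closed contour).
    Returns the reconstructed outline built from n_harmonics harmonics.
    """
    z = x + 1j * y                         # boundary points as complex numbers
    coeffs = np.fft.fft(z)
    keep = np.zeros_like(coeffs)
    keep[0] = coeffs[0]                    # zeroth harmonic: the centroid
    keep[1:n_harmonics + 1] = coeffs[1:n_harmonics + 1]
    keep[-n_harmonics:] = coeffs[-n_harmonics:]
    smoothed = np.fft.ifft(keep)           # outline from the retained harmonics
    return smoothed.real, smoothed.imag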
Figure 11.16 Elliptical Fourier functions of increasing harmonics (i.e., resolution) that describe the shape of a cranium. (A) Two harmonics, with the center of the second harmonic moving along the line of the first harmonic, showing tracing of 850 of 1000 chain-encoded points along the line. (B) Completed second-harmonic tracing with shaded cranium outline superimposed. Inset shows magnification of elliptical harmonics. Phasors are dashed lines; harmonics are gray lines. (C) Three harmonics. (D) Twenty harmonics. Caple, J. et al. 2017 / with permission of Springer Nature.
Figure 11.17 Automated karyotyping using boundary morphometrics. (A) and (B) Homologous chromosomes and the curvature functions of the edge contours that describe them. (C) A comparison of feature vectors derived from the curvature functions after transforming the curves into frequency space. This process is independent of banding pattern, which is often used for manual karyotyping. Garcia, C.U. et al. 2003 / With permission of Springer Nature.
Figure 11.18 Dolphin identification using boundaries. (A) High-contrast photograph of a dusky dolphin dorsal fin. (B) Manual tracing of a fin. A and B mark the two most significant notches. From Araabi, B.N., Kehtarnavaz, N., McKinney, T., et al. 2000. A string matching computer-assisted system for dolphin photoidentification. Annals of Biomedical Engineering 28: 1269–1279. doi: 10.1114/1.1317532.
11.9 Measurement of Object Boundaries Can Reveal Fractal Relationships

Fractal dimension describes how two-dimensional (2D) a one-dimensional (1D) object is or, more generally, how n-dimensional a 1D object is. In the 2D case, it is the ability of a line to fill a plane. In Figure 11.19A–D, Koch snowflakes are objects that converge to a finite area but have different, increasing perimeters. They appear to fill the 2D space more as their perimeter increases. Lines of increasing fractal dimension cross the same beginning-to-end distance but have increasing actual lengths (Figure 11.19E). A natural object has a fractal boundary if the logarithm of the measured perimeter decreases linearly with the increasing logarithm of the size of the measurement tool (Figure 11.19F). This is an example of something that varies linearly with scale, as opposed to scale-invariant features. It is particularly important in biological systems. Mandelbrot wrote an influential work showing that many natural objects have a uniform increase in measured perimeter with increasing measuring resolution.

One algorithmic approach to approximating scaling behavior is box counting, which uses a series of grid sizes (available as a plugin in ImageJ). From a grid overlaying the image, the box-counting routine counts the number of boxes of a given size that contain any object pixels. If the logarithm of the box count increases linearly with the negative logarithm of the grid spacing, the object has a fractal dimension that is the slope of the line. Such approaches provide analysis tools that, for example, quantify rainforest fragmentation from aerial images.
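A minimal box-counting sketch in Python/NumPy follows, assuming a binary boundary image and a hand-picked set of grid sizes; the ImageJ plugin mentioned above performs the same computation with more careful grid placement and statistics.

import numpy as np

def box_count_dimension(mask, sizes=(2, 4, 8, 16, 32, 64)):
    """Estimate the fractal dimension of a binary boundary image by box counting.

    mask  : 2D boolean array, True on the object boundary.
    sizes : box (grid) edge lengths in pixels.
    The fractal dimension is the slope of log(count) versus log(1/size).
    """
    counts = []
    for s in sizes:
        # Trim so the image tiles exactly into s x s boxes, then count the
        # boxes that contain at least one boundary pixel.
        h, w = (mask.shape[0] // s) * s, (mask.shape[1] // s) * s
        tiled = mask[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(np.count_nonzero(tiled.any(axis=(1, 3))))
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope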
Figure 11.19 (A) to (D) Koch snowflakes (Koch curve, Koch star, or Koch island), objects that fill the plane more with each iteration and converge to a finite area but have infinitely increasing perimeters. The large-perimeter objects also have high lacunarity. CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1898291. (E) Fractal dimension of different line shapes. Note that they have the same end-to-end horizontal distance but different lengths. (F) Fractal objects have a linear relationship, in this case with slope –D, between the log size of the measuring tool and the log of the measured perimeter. Diagram by L. Griffing.
Lacunarity is the size distribution of the gaps or “holes” in a fractal dimension. If a fractal has large gaps or holes, it has high lacunarity. Objects with low lacunarity look the same when rotated or displaced; they have translational and rotational invariance. An apple looks the same when rotated around the axis defined by its core, but an apple with a bite out of it (higher lacunarity) looks different.
11.10 Pixel Intensity–Based Colocalization Analysis Reports the Spatial Correlation of Overlapping Signals

Macro-level spatial analysis often relates to shape. It can also be a proximity measure that analyzes the relationship between landmarks. Widely used macro-analysis programs, such as those for facial recognition, use scale-independent proximity measures to discriminate individuals with 2D warping algorithms. Micro-level (1 nm–50 µm) spatial analysis relates to the structure and function of cellular systems. Fluorescent proteins or probes that label a certain cell type or subcellular compartment are fiducial markers for colocalization studies. Colocalization studies ask the question, "Does a protein or probe of unknown location colocalize with a protein or probe of known location?" Alternatively, the question could be one about the possibility of interaction of two proteins: "Where do the fluorescent proteins, or the organelles that they label, colocalize and potentially interact?"

Consider three different scales over which cells or cell components interact (Figure 11.20). At the smallest scale, 1–10 nm, proteins directly interact with and influence each other. Mixtures of indirectly interacting proteins form 10- to 100-nm molecular complexes with specific associations and functions. For example, clathrin-coated vesicles carrying membrane and cargo are between 50 and 100 nm in diameter. Other, larger organelles and liquid-phase aggregates functionally interact over scales of 100–500 nm, with vesicles fusing to form larger membrane systems and transferring material to other regions of the cell. Organelle associations with each other and with motors and cytoskeletal elements that have extended lengths longer than 100 nm reach across the cell, generating an organelle interactome. Approaches to examining the organelle or protein (or, more generally, macromolecule) interactome use colocalization routines that either
Figure 11.20 Three different domains of interactomics. At distances less than 10 nm, there is physical interaction between proteins. At distances between 10 and 100 nm, proteins form molecular complexes. At larger distances, resolvable with conventional light microscopy, organelle and cellular interactions are visible. From Lagache, T., Grassart, A., Dallongeville, S., et al. 2018. Mapping molecular assemblies with fluorescence microscopy and object-based spatial statistics. Nature Communications 9: 698. doi: 10.1038/s41467-018-03053-x. Creative Commons Attribution 4.0 International License.
correlate the intensity of pixels in two different color channels or measure the distance between the image pixels of samples labeled with two or more different labels. This section covers intensity-based colocalization; the next covers distance-based colocalization.

Starting with correlation, if green and red represent the signals of the different proteins, then a plot of the red color channel histogram against the green color channel histogram reveals the extent of their intensity correlation. As in Figure 11.9, clustering along the line shows high correlation (Figure 11.21A). If there is less correlation, there is scatter along the line (Figure 11.21B). In the absence of correlation, the values cluster separately along the red and green axes (Figure 11.21C).

Measurements of the correlation include the Pearson correlation coefficient (PCC) and its Costes probability (CP), the Spearman's ranked correlation coefficient (Sr), and the Manders overlap coefficients (MOC) (Figure 11.22). The MOC has two values, the proportion of red label correlated with green (M1) and the proportion of green label correlated with red (M2). All show very low values when the labels are separate (see Figure 11.22A). They also show similar values in the absence of noise (see Figure 11.22B and H) and at low intensities (see Figure 11.22C). However, the inclusion of noise (see Figure 11.22E and F) or background label (see Figure 11.22D) changes the MOCs considerably but not the PCCs. Furthermore, the PCCs are more sensitive to overall colocalization, producing lower values when there are multiple objects in one of the channels that do not colocalize (see Figure 11.22G).

Noise and bleed-through (see Section 17.10) of fluorescence from one channel (e.g., green) into another (e.g., red) occur primarily at low intensity values along one or both axes. Recognizing this, automatic Costes segmentation isolates the region that has a PCC above a certain value. It does this by calculating the correlation using a linear least-squares fit of the pixel intensity plot (Figure 11.23D). Then, using the slope and intercept of that line, it calculates the PCC for regions of the graph at decreasing values of intensity. The threshold intensity is where the PCC is above a certain value. Reconstructing the image from those pixels that occupy the thresholded region of interest (ROI) (Figure 11.23E) generates an image of the features that colocalize (Figure 11.23F).

Another way to generate an image of colocalized features is to segment the merged color image (Figure 11.23C) in HSV (hue, saturation, and value) color space. Thresholding the hue to a range of strong colocalization (yellow and orange when comparing red and green channels), while thresholding intensities (V values) to values above the noise level, provides a measure of colocalization. Selecting V values above a certain PCC threshold makes this approach similar to Costes segmentation while providing an image of the colocalized features.
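The intensity-based coefficients described above reduce to a few NumPy operations. In the sketch below, the thresholds stand in for the Costes (or user-chosen) cutoffs and are assumptions; dedicated tools such as the Coloc 2 plugin also report the Costes probability and Spearman rank values.

import numpy as np

def colocalization_coefficients(red, green, red_thresh=0, green_thresh=0):
    """Pixel intensity-based colocalization of two channels.

    red, green : 2D arrays of the same shape (background-corrected).
    Thresholds play the role of the Costes (or manual) cutoffs.
    """
    r, g = red.astype(float).ravel(), green.astype(float).ravel()
    # Pearson correlation coefficient: -1 exclusion, 0 random, 1 colocalized.
    pcc = np.corrcoef(r, g)[0, 1]
    # Manders overlap coefficients: fraction of each channel's intensity that
    # coincides with above-threshold signal in the other channel.
    m1 = r[g > green_thresh].sum() / r.sum()   # red coincident with green
    m2 = g[r > red_thresh].sum() / g.sum()     # green coincident with red
    return pcc, m1, m2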
Figure 11.21 Colocalization images and their respective red pixel/green pixel correlation plots. (A) Golgi label in maize root cells duplicated in the red and green channels and superimposed. Insets show magnified regions. Correlation is high. (B) HeLa cells treated with microtubule end-binding protein 1 (EB1) label (green) and microtubule cytoplasmic linker protein–170 (CLIP-170) label (red). Correlation is present but scattered. (C) HeLa cells with mitochondrial label (green) and nuclear label (red). There is exclusion and no colocalization. From Bolte, S. and Cordelières, F.P. 2006. A guided tour into subcellular colocalization analysis in light microscopy. Journal of Microscopy 224: 213–232. Used with permission.
Figure 11.22 Simulated colocalizations with statistical correlation coefficients from the Coloc 2 plugin in ImageJ/FIJI. (A) Separate, no noise. (B) Overlapping, no noise. (C) Overlapping, low intensity. (D) Overlapping, high uniform green background. (E) Overlapping, low green noise. (F) Overlapping, high green noise. (G) Overlapping, multiple non-overlapping objects. (H) More extensive overlap. Bracket values are with the Costes threshold (see Figure 11.23). CP, Costes probability value for the Pearson correlation coefficient, PC, with > 0.95 being probable, lower being improbable; M1, Manders overlap coefficient channel 1 (proportion of red label coincident with green; 0 = no colocalization, 1 = perfect colocalization); M2, Manders overlap coefficient channel 2 (proportion of green label coincident with red); PC, Pearson correlation coefficient (–1 = exclusion, 0 = random, 1 = total colocalization); Sr, Spearman's rank correlation value, in which intensities are replaced by the order in which they occur. Diagram by L. Griffing.
Figure 11.23 Colocalization of brain proteins using Manders overlap. Labeling of cryosections of the rat cerebellum with green fluorescent antibodies to calbindin (A) compared with labeling with red fluorescent antibodies to cystatin B (B). The merged image (C) shows colocalization as a yellow signal in the Purkinje cells (PCs) but not in the granular layer (GL) or molecular layer (ML). This is not, however, a quantitative measure of the colocalization. Scale bar = 40 µm. (D) The double histogram correlation plot, with Costes thresholds (green and red lines), establishing a Costes ROI. (E) The correlation plot of A versus B with the Costes region of interest (ROI) identified. (F) Image of the objects identified in the Costes ROI. D is from Bolte, S. et al. 2006 / With permission of John Wiley & Sons. The other images are from Riccio, M., Dembic, M., Cinti, C., et al. 2004. Multifluorescence labeling and colocalization analyses. In Methods in Molecular Biology. Vol. 285: Cell Cycle Control and Dysregulation Protocols. Ed. by A. Giordano and G. Romano © Humana Press Inc., Totowa, NJ. pp. 171–177. Used with permission.
11.11 Distance-Based Colocalization and Cluster Analysis Analyze the Spatial Proximity of Objects

Besides using the overlap of pixels with different colors and intensities, spatial colocalization also uses the distance between objects to determine their coupling. Coupling occurs if the distance between objects or features identified with various segmentation algorithms is consistently within a certain short range. This dichotomy between intensity-based correlation approaches for colocalization and feature-based coupling is similar to the dichotomy of feature-based and intensity-based motion estimation (see Section 12.3). For measurements of proximity and clustering, the open-source program Icy has some useful plugins that demonstrate these approaches.

The first is statistical object distance analysis (SODA). It first uses a spot detection program that identifies objects and provides their spatial coordinates. A marked point process maps the objects (Figure 11.24A), in which the mark is an ensemble of object attributes (color, shape, and so on), and the point process is a statistical approach to the localization of objects as a subset of points randomly located within the region of interest. A Ripley's K function for the distance parameter r is proportional to the number of red objects within a distance r of green objects. Incremental subtractions of the K function produce a set of rings, the boundaries of which have increasing r values. When red objects are coupled to green objects, they are enriched in a subset of rings; in Figure 11.24B, it is the first two sets of rings around the green objects. This probability analysis identifies coupled objects and allows calculation of the mean distance between them.

A colocalization negative control and a positive control illustrate this approach (Figure 11.25). The negative control is the colocalization of the interleukin-2 receptor (IL-2R) with clathrin-coated pits. Cy3-labeled antibodies (see Section 17.3, Figures 17.5 and 17.8) against IL-2R should not colocalize with green fluorescent protein (GFP)–labeled clathrin pits because IL-2R resides in sterol-enriched lipid rafts not internalized by coated vesicles. As a positive control, Alexa-568–labeled antibodies against transferrin should colocalize extensively with GFP–clathrin pits because transferrin is taken up by coated vesicles via receptor-mediated endocytosis. Comparing the coupling values with the MOC and PCC colocalization measures (Figure 11.25B and D), the coupling analysis does a better job on the negative control than either the MOC or PCC correlation analyses, while producing similar values for the positive control. This brings up the possibility of false positives with MOC and PCC.
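A simplified, distance-based coupling estimate (not the full SODA statistics, which uses Ripley's K rings and boundary corrections) can be sketched with a k-d tree: the observed fraction of red objects lying within a radius r of any green object is compared with the fraction expected if the green objects were randomly (Poisson) distributed over the analyzed region.

import numpy as np
from scipy.spatial import cKDTree

def fraction_coupled(red_xy, green_xy, r, area):
    """Simplified distance-based coupling estimate.

    red_xy, green_xy : (n, 2) arrays of detected object centroids.
    r    : coupling radius, in the same units as the coordinates.
    area : area of the analyzed region, used for the random expectation.
    Returns the observed fraction of red objects within r of a green object
    and the fraction expected if the green objects were randomly placed.
    """
    tree = cKDTree(green_xy)
    nearest, _ = tree.query(red_xy, k=1)       # distance to nearest green object
    observed = np.mean(nearest <= r)
    # Expected fraction under complete spatial randomness of the green
    # objects: 1 - exp(-density * pi * r^2).
    density = len(green_xy) / area
    expected = 1.0 - np.exp(-density * np.pi * r**2)
    return observed, expected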
Figure 11.24 Statistical object distance analysis. example. (A) Cultured mouse hippocampal neurons fixed and labeled with secondary antibodies to primary antibodies against the neuronal proteins Homer 1 (green) and post-synaptic density 95 (PDS95) (red) and observed with a confocal microscope. Marking separate sets of objects with a marked point records point features, including shape and color localized to their center of mass. The gray outline in the detection channel shows the cell boundaries, and spot detection produces maximum intensity mapping of spots marked with a +. (B) Incremental subtraction of the distance parameter r in Ripley’s K function produces a set of concentric rings. It also corrects for boundary conditions. Probability analysis identifies the coupling of spots. From Lagache, T., Grassart, A., Dallongeville, S., et al. 2018. Mapping molecular assemblies with fluorescence microscopy and object-based spatial statistics. Nature Communications 9: 698. doi: 10. 1038/s41467-018-03053-x. Creative Commons Attribution 4.0 International License.
11.11 Distance-Based Colocalization and Cluster Analysis Analyze the Spatial Proximity of Objects
Clathrin IL-2R
% coupling 50 40 30 20 10 0
A
Negative control : Clathrin (GFP) -IL-2R (Cy3)
Clathrin Transferrin
B
SODA
Manders
Pearson
Manders
Pearson
% coupling 50 40 30 20 10
C
Positive control : Clathrin (GFP) - Transferrin (Alexa 568)
0 D
SODA
Figure 11.25 Comparison of statistical object distance analysis (SODA) values with Manders overlap coefficient (MOC) and Pearson correlation coefficient (PCC) in a clathrin–green fluorescent protein (GFP) transformed Hep2beta human adenocarcinoma cell line using total internal reflection fluorescence (TIRF) microscopy. (A) Marked point detection of the negative control of coupling between interleukin-2R (IL-2R; red) and clathrin (green). In the insets, the Spot detector in Icy extracts fluorescent spots of IL-2R labeled with Cy3-labeled antibodies and clathrin–GFP. A white line highlights the cell boundaries. (B) SODA does not detect coupling between IL-2R and clathrin. Percentage = 2.41 ± 0.6% (standard error of the mean) (P = 0.085), in contrast with the overlap reported with PCC, 21.9 ± 5.97%, P with pixel scrambling = 2.8 10−6, and MOC, 12.6 ± 1.04%, P with pixel scrambling = 0.0012, coefficients. (C) Marked point detection of the positive control of coupling between transferrin (red) and clathrin (green). (D) SODA, MOC, and PCC have a statistically significant difference of coupling between transferrin and clathrin; SODA, 36.5 ± 1.49%, P = 1.54 10−16; Manders, 31.7 ± 2.38%, POptic flow). (D) Vector map outcome of integral block matching using PMCC. (FIJI Plugins>Optic flow). (E) Correlation map of displacement vector peaks. (FIJI. Analyze>Optic Flow>PIV. (F) Vector map of cross-correlation analysis (FIJI. Analyze>Optic Flow>PIV). Figure by L. Griffing.
12.5 The Kymograph Uses Time as an Axis to Make a Visual Plot of the Object Motion
Figure 12.14 Two-dimensional (2D) kymograph of moving Golgi bodies. A narrow 2D slice of a region where the Golgi bodies are moving at the first time, t1, is used as the first time point on the x-axis. The x-axis can be calibrated in seconds based on the time duration of each frame from which each slice is taken. The moving Golgi bodies can be seen forming a line with a slope that is its velocity in one direction. Steeper slopes show relatively fast movement (very fast Golgi bodies) compared with shallower slopes (fast Golgi bodies). Figure by L. Griffing.
One of the first uses of a kymograph was a strange form of imaging, taping a pen to a bean leaf and placing it next to a rotating drum. As the leaf rose and fell over the period of a day, it plotted its own circadian rhythm. An adaptation of this approach, monitoring the changes in time-lapse images, can reveal changes in plant growth and rhythm. Large-scale time-lapse phenomic analysis of plant growth can discover the genes or gene networks involved. Kymographs always use time as one of the image dimensions. To make a kymograph showing the movement of Golgi bodies within a defined rectangular region, line up the rectangles along the time, x-axis, with each rectangle representing a region of the cell imaged every 1.54 seconds (Figure 12.14). In this case, two Golgi bodies move within the frame. One shows a shallow constant slope, or velocity of movement, a fast Golgi body. The other shows a slow rate of movement for 40 seconds (flat curve), then very rapid movement, becoming a very fast Golgi body. Although 2D kymographs give the rate of motion in one direction, 3D kymographs give the rate of motion in two directions (see Figures 12.1 and 12.15). This is particularly useful for visualizing motion within a large plane rather than in a small slice of the plane. It uses standard 3D ray tracing or volume rendering (as described in the next chapter) to show the rates of motion. In this case, the axes on the volume would be x-y-t instead of x-y-z for a spatial volume.
12.6 Particle Tracking Is a Form of Feature-Based Motion Estimation
Figure 12.15 Rotation of a three-dimensional (3D) kymograph of moving Golgi bodies (mG1 and mG2) and stationary Golgi bodies (sGs) around the third dimension, time, axis. The last rotation to near 90 degrees shows movement in the x-y plane by a clear slanted line (mG2) and by separated spots (mG1). The sGs make up a straight line throughout the volume. Red is the bounding box. The time dimension is 60 1-second images. Figure made with the 3D Viewer plugin in ImageJ by L. Griffing.
12.6 Particle Tracking Is a Form of Feature-Based Motion Estimation The development of highly fluorescent markers, such as fluorescent proteins (see Section 17.2, Table 17.1) and quantum dots (see Section 17.3, Figure 17.7), provides a gateway to single particle tracking (SPT). Small vesicles, molecular assemblages, or even single molecules appear as small, high-contrast, trackable dots or punctae. Finding corresponding particles between frames is the main challenge and depends on the speed of frame acquisition, particle density, particles merging or splitting, and particles leaving the frame for a period of time with changes in focus of the camera or focal plane of the particle. Manual segmentation or particle identification based on human vision is the basis of a variety of manual tracking programs (e.g., Manual Tracking plugin in ImageJ). Automatic spot detection such as those in Icy’s spot detector and in the ImageJ plugin TrackMate use size and brightness on preprocessed images (Laplacian of Gaussian [LoG] or difference of Gaussian [DoG]; see Section 10.4) to identify punctae. Both can calculate important features of the track, such as instantaneous velocity (the speed from one frame to the next), the standard deviation of the particle velocity (high standard deviations characterize saltatory or “jumping” motion), the mean and median track velocity, and track length. Diluting the probe or improving the signal-to-noise ratio in the acquisition overcomes some of the particle correspondence problems. Other solutions include rigorous algorithms that use information about particles from kymograms or assign tracks to well-characterized particles using temporal optimization such as developing cost matrices for all the linkages of particles from frame 1 to frame 2 to frame n (upper right in Figure 12.1) and for the merging and splitting of particles (lower right in Figure 12.1). The temporal optimization approach uses a mathematical framework called the linear assignment problem (LAP) (Figure 12.16). An assignment problem assigns an agent to a task at a cost. It calculates the maximal number of tasks achieved by unique agent–task pairs at a minimal cost. If the number of agents and tasks is the same, it is a balanced assignment problem. If the total sum of the costs for each agent is the same as the total cost of the assignment, then it is a linear assignment problem. In this case, the assignment is to link particles or spots between frames. Calculation of the cost for linking particles includes the square of the distance between them and their difference in intensity and size. The cost calculation for frame-to-frame particle linking (a balanced linear assignment problem) is separate from the cost calculation for particles merging or splitting (an unbalanced linear assignment problem; see Figure 12.16). The algorithm is set up so that local assignments compete with each other in cost. Although lower local assignments tend to win this competition, other assignments can win to minimize the global cost of the assignment problem. A plugin in ImageJ, TrackMate, implements this approach. To run the program, the spatial and time characteristics of the image sequence needs careful calibration. The program provides options for the preprocessing filters, LoG or DoG filters, an intensity threshold, a size threshold for particle identification, and median filtering to reduce noise. Setting the confidence level of the spot assignment refines the selection process. 
Finally, a spot check based on intensity and confidence level previews the spots prior to tracking them. Manually editing the spots is possible but introduces problems with reproducibility (i.e., different users identifying different spots). TrackMate calculates tracks of identified particles using either a simple LAP algorithm (without merging or branching) or the standard LAP algorithm. In addition, it uses a Kalman filter (running average) and nearest neighbor search
269
12 Temporal Analysis
Image sequence
Step 0
Detection
Particle properties (such as positions and intensities) per frame
Step 1
Frame-to-frame particle linking
Track segments
Gap closing, merging and splitting Step 2
270
Complete tracks
Figure 12.16 The three steps used in the linear assignment problem program. The second step eliminates the computationally “greedy” step of developing a link between all possible identified spots between frames (multiple hypothesis tracking) by replacing it with a cost function for certain linkages. The track segments generated are then analyzed with another to another cost matrix to place gaps and merge or split paths. From Jaqaman et al. 2008 / With permission of Springer Nature.
algorithms. These calculations require manual entry of the maximal distance between linked particles and the maximal number of frames in which the particle disappears. After track assignment, the last calculation step determines possible merging or splitting events of particles. The calculated tracks display as an overlay or layer using either the track index (order of identification of a track in the image sequence) or other track properties (Figure 12.17). Comparing the tracks in Figure 12.17 with the 3D kymograph at near 90 degrees of rotation (see Figure 12.15) and the simple pixel-based DFD (see Figure 12.12) reveals that many of the paths identified in the tracking algorithm are those that show maximal movement in the kymograph and the DFD (green paths). Hence, most, but not all, of the movement visualized in optical flow estimation represents the movement of single particles (Golgi bodies), not movement or flow of pixels in the tubules (ER). Optical flow pixel-based DFD detects nonparticulate flow in tubules independent of the Golgi body. Using the local tracking overlay features (Figure 12.18) to view the same time sequence, as shown in Figure 12.11, reveals that single particle tracks do not occupy all of the green area in Figure 12.12. The large amount of data provided on the tracks by programs such as TrackMate provides in-depth analysis and visualization opportunities. The program saves track statistics in data tables formatted for spreadsheet programs such as Microsoft Excel. The program also directly plots variables such as track length and time of entry into the field of view or plots the relationship between variables such as track duration, track maximum speed, track median speed, or track standard deviation. These tracking and movement programs are 2D. However, all cells are 3D, and complete tracking requires a 3D option. The greatest problem with tracking in 3D is acquisition time. Getting 3D images within the time frame necessary for high temporal resolution is often problematic, particularly for conventional point-scanning confocal microscopy. Spinning disk confocal microscopes acquire images more rapidly (see Sections 6.9 and 17.8). Wide-field non-confocal fluorescence image acquisition is intrinsically more rapid than spinning disk confocal acquisition. With improvements in deconvolution, it may provide similar but more rapidly acquired, data. However, light sheet microscopy (see Section 18.4) is the most rapid, particularly when combined with high-speed cameras (see Section 12.1). Light sheet microscopy also limits the photon dose absorbed by the specimen (see Section 18.3, Table 18.1), which can be toxic to cells over the longer term (hours to days). Using light sheet microscopy of the development of embryos of model organisms, such as zebrafish, fruit flies, and mice (Figure 12.19), the movement of cells (or nuclei in the coenocytic early embryo of fruit flies) shows positional rearrangements that coordinate with cell fate. The amount of data acquired in these experiments is large (10 terabytes per experiment), so data management and image compression are important. A machine learning implementation of tracking with Gaussian mixture models (TGMM 2.0) tracks cells in a tissue using an open-source program for identification of cell volumes using “supervoxels.” Statistical vector flow improves the cell tracking in TGMM by estimating the direction and magnitude of flow in four dimensions (4D;
12.6 Particle Tracking Is a Form of Feature-Based Motion Estimation
Figure 12.17 Assignment and display of tracks by track index in TrackMate. Figure by L. Griffing.
Figure 12.18 The local track display feature in TrackMate showing only the tracks assigned to the first 10 frames. Figure by L. Griffing.
see Figure 12.19F). After combining multiple samples, the technique provides a dynamic image of cellular differentiation (see Figure 12.19H) and maps of morphogenetic flow, a Mercator projection that maps the flow to the surface of the embryo using surface u-v coordinates (instead of object x-y-z coordinates; see Section 13.6).
271
272
12 Temporal Analysis
Figure 12.19 Workflow for cell tracking during mouse embryogenesis. (A) Light sheet acquisition of data sets from developing embryos. (B) Image compression. (C) Synthesis of images from multiple views using two objectives in the light sheet microscope. (D) Use of a neural network for determination of divisions. (E) tracking with Gaussian mixture models (TGMM 2.0) implemented with machine learning to track cells as they move. (F) Statistical vector flow provides a temporal context for determining cell trajectories during tracking. (G) Combining the data sets from multiple embryos requires the TARDIS software (also open source). (H) Final fate map of cells. (I) Mercator projection of cell movements. GPU, graphics processing unit. From McDole, K. et al. (2018) / With permission of Elsevier.
SPT can track particles such as cells, organelles, or fluorescent molecules individually. In the absence of other driving forces, their thermal movement is likely to be a random walk, with their rate of movement determined by their size, their inertia, and the viscosity of the environment. The ratio of the inertial forces to viscous forces is the dimensionless Reynolds number. Particles in a high Reynolds number environment move by turbulent, random flow. Particles in a low Reynolds number environment move by laminar, directional flow. For the small particles of cell and molecular biology, much of the movement occurs in a low Reynolds number environment, where the initial conditions of the particle determine direction of flow, and particle movement becomes reversible. In other words, the probability of returning to the initial condition is high under equal but opposite forces. The turbulent, random flow of particles in high Reynolds number environments is a form of diffusion. There are other, different forms of diffusion. In a synthetic 2D lipid bilayer, molecules move randomly with thermal, or Brownian, forces, producing a random walk. In this case, the 2D distance that a particle diffuses, measured as the mean squared displacement (similar to the MSE discussed earlier), is related linearly to the diffusion coefficient (D; Figure 12.20A). Another
12.7 Fluorescence Recovery After Photobleaching Shows Compartment Connectivity and the Movement of Molecules
form of diffusion is the directional flow down a concentration gradient that is proportional to the magnitude of the concentration gradient (Fick’s law). In natural biological membranes, membrane microdomains, active transport, and interaction with cytoskeleton and other membrane or cytosolic proteins alter the movement of proteins and lipids. This generally slows the effective rate of 2D diffusion within the plane of the membrane down (Figure 12.20B), producing anomalous diffusion. When superimposed forces produce directional bulk movement, or advection, then the movement rate speeds up (Figure 12.20C) by combining advection and diffusion. Bulk flow occurs when some molecules moving by energy expenditure drive the movement of other molecules through viscous or other indirect interactions. At low Reynolds numbers, there is less spreading or diffusion, and advection controls the movement. For example, the interior particles in the vacuole of the giant algal cell move by advection. Fluorescence microscopy SPT can determine the relative rates of motion of membrane components when different fluorophores or fluorescent beads mark them. This approach can detect single molecular motors moving in opposite directions along a single cytoskeletal bundle. It can also detect when advection drives directional movement.
Figure 12.20 Diffusion, anomalous diffusion, and diffusion with advection in two dimensions. (A) Random walk as diffusion. (B) Random walk impeded by objects in anomalous diffusion. (C) Movement of particle with diffusion and advection. (D) Mathematical definition of the mean squared displacement (MSD). (E) The diffusion coefficient of the particle can be directly determined by the mean squared displacement. Knowing the diffusion coefficient provides a means to calculate the velocity of advection and the power coefficient of anomalous diffusion (a).Adapted from Moens, P.D.J (2015).
12.7 Fluorescence Recovery After Photobleaching Shows Compartment Connectivity and the Movement of Molecules A convenient and more common technique to determine the movement of groups of membrane or organelle constituents that is intensity based and not feature based is fluorescence recovery after photobleaching (FRAP). Following a highintensity photobleaching light pulse in a region of interest (ROI), adjacent fluorescent, non-bleached, molecules move into the bleached area at a certain rate. Photobleaching switches the fluorescent molecules from “on” to “off” by irreversibly changing the bonds so that they no longer fluoresce. Consequently, the re-appearance of fluorescence light is not the consequence of bleached molecules becoming fluorescent but of unbleached molecules moving into the ROI. FRAP quantifies the recovery, measuring the rate and amount of return of the fluorescence intensity into the photobleached area (Figure 12.21A). Normalizing the initial intensity in the ROI to 1, a brief photobleaching pulse across the entire ROI drops the intensity to a low fractional value. The half-time of recovery is proportional to the effective diffusion coefficient (Deff) of the fluorescent molecule. Calculation of the half-time of recovery is possible following fitting the recovery to an exponential curve (no binding within the photobleached region and no photobleaching with observation) or a modified exponential curve (some binding within the photobleached region or photobleaching with observation). Curve-fitting programs are available from various software vendors (GraphPad or MATLAB) or using a web-based open-source interface, EasyFRAP (see Annotated Images, Video, Web Sites, and References). Calculations of Deff vary and often use equations based on limited or erroneous assumptions. Calculation of a one-dimensional (1D) Deff for membrane systems over a strip of photobleached region across the width of a cell is one of the better approximations. I(t ) = Ifinal (1 − w ( w 2 + 4πDt )1/2 ),
(12.2)
in which I(t) is the intensity as a function of time (with t zero being the midpoint of the bleach), w is the strip width, t is the time of recovery, D is the Deff for 1D, and Ifinal is the fluorescence intensity corresponding to the asymptote of the mobile fraction. This equation does consider the cell as a 3D structure, but Deff is probably less than the true diffusion coefficient. The term Deff represents movement of molecules into the photobleached ROIs in living systems. Usually, it is not by just simple diffusion. Deff can incorporate anomalous diffusion, including transient binding to regions in the ROI or complexity of
273
274
12 Temporal Analysis
3D structures within the ROI; active diffusion that is faster but non-directed; and energy-expending, movement, advection, and directional, motorized, movement along tracks, filaments, or tubules or through nearly 1D pathways such as the tubules of the ER. Mathematical modeling can distinguish between these contributions to Deff. All these movements can be in 1D to 3D. If the photobleached ROI is an individual organelle, or subcellular compartment examining the rate of movement into the photobleached ROI is a measure of the connectivity between the photobleached and non-photobleached regions.
Figure 12.21 (A) Fluorescence recovery after photobleaching (FRAP). Photobleaching the region is very quick (milliseconds), and the recovery into that region is monitored over time. Fitting the recovery to a first- or second-order exponential allows calculation of the half-time of recovery, as shown in the accompanying graph. If the recovery is not complete, as shown in the graph, then an immobile fraction was photobleached that is not replaced by fluorescent molecules moving into the photobleached area. (B) Fluorescence loss in photobleaching (FLIP). Continuous photobleaching of a region outside of the region of interest (ROI; red circle) that is measured for fluorescence. Because the region in the red circle is connected to the large green shape region, it loses its fluorescence with time as shown by the graph. (C) Inverse FRAP (iFRAP) photobleaches one ROI (the green shape excluding the red circle) while measuring the fluorescence in another ROI (red circle). In the graph, the green shape FRAP and the red circle iFRAP loss of fluorescence are shown. Diagram by L. Griffing.
12.7 Fluorescence Recovery After Photobleaching Shows Compartment Connectivity and the Movement of Molecules
Another way to determine connectivity of organelles containing a fluorescent molecule is to continue to photobleach an ROI and examine the loss over time in surrounding structures, or fluorescence loss in photobleaching (Figure 12.21B). An example is the connectedness of the ER lumen in cells. Photobleaching a single small ROI of the ER over time eventually results in the loss in the fluorescence of the entire ER network. iFRAP or inverse FRAP photobleaches an ROI, then measures the intensity change in another ROI (Figure 12.21C). Differences in the fate of the same fluorescent protein coming from different subcellular compartments can be determined using iFRAP (Figure 12.22). In this example, the nuclear protein green fluorescent protein (GFP)–LC3 moves out of the nucleus and becomes associated with punctate organelles, autophagosomes. However, if the nuclear GFP–LC3 is photobleached, the cytoplasmic GFP–LC3 does not become punctate. Examination of multiple ROIs surrounding the photobleached ROI can reveal the directionality of movement (Figure 12.23). In this case, the movement of the “black,” photobleached protein manifests itself as a lag in recovery from photobleaching or a slower recovery from photobleach in the surrounding regions. The FRAP is done on a central tubule (green curve 12.23B), and
Figure 12.22 Inverse fluorescence recovery after photobleaching (iFRAP) experiments showing that a nuclear and cytoplasmic LC3 redistributes differently after iFRAP. Red outlines show the regions of photobleaching. Under starvation conditions, green fluorescent protein (GFP)–LC3 puncta (autophagosomes) can form in the photobleached cytoplasm, whereas after photobleaching the nucleus, no GFP–LC3 puncta form in the nonphotobleached cytoplasm. Scale bar = 5 µm. From Huang R., Xu, Y., Wan, W., et al. 2015. Deacetylation of nuclear LC3 drives autophagy initiation under starvation. Molecular Cell 57: 456–466. Used with permission.
Figure 12.23 Analysis of movement of photobleached green fluorescent protein (GFP) in endoplasmic reticulum (ER) tubules using fluorescence recovery after photobleaching (FRAP) and inverse FRAP (iFRAP). (A) Persistent tubule network in the cortical cytoplasm of a cell. Scale bar = 2 µm. (B) Regions used for FRAP and iFRAP. FRAP analysis on the main branch, which contained the region of photobleaching (smaller than the region of interest [ROI] shown), precedes iFRAP analysis on two upper branches, 1 and 2, and a lower branch. The background ROI normalizes the data and accounts for any overall photobleaching that results from viewing. (C) Recovery FRAP for main branch and iFRAP for peripheral branches. Note the time lag in the appearance of upper branch 1, indicating slower movement of the photobleached GFP into that tubule, compared with upper branch 2 and lower branch. The very fast recovery in the lower branch is from a very small amount of photobleached GFP entering this branch during photobleaching and quickly flowing away from this branch into the upper branches via the main branch. Data and diagram by L. Griffing.
275
276
12 Temporal Analysis
Figure 12.24 Lamin B receptor–green fluorescent protein fusion protein (LBR–GFP) exhibits differential mobility when localized in the endoplasmic reticulum (ER) versus the nuclear envelope (NE). The left panel shows photobleach recovery within ER membranes, and the right panel shows photobleach recovery within the NE. Bars = 10 µm. Adapted from Ellenberg, J. and Lippincott-Schwartz, J. 1999. Dynamics and mobility of nuclear envelope proteins in interphase and mitotic cells revealed by green fluorescent protein chimeras. Methods 19: 362–372. Used with permission.
iFRAP monitors the fluorescence in the surrounding tubules (purple, red, and blue; see Figure 12.23B). Movement into the upper branch tubule 1 is slowest because the iFRAP curve lags the others, indicating that there is directionality of flow through the network. FRAP can also quantify bound, immobile molecules in the photobleached region because they do not recover but remain dark, bound, and immobile. An excellent example of the use of FRAP to examine the mobility of the same protein in different subcellular compartments is the movement of the lamin B receptors targeted to the nuclear envelope or the ER (Figure 12.24).
12.8 Fluorescence Switching Also Shows Connectivity and Movement Instead of photobleaching a fluorophore, photoactivation by light can convert a non-fluorescent molecule into a fluorescent molecule. Photoactivation is possible for a variety of caged molecules, which are molecules with a lightreactive group that, when illuminated, photoconvert to a biologically active form or become fluorescent. For fluorescence studies, expressing the genes for fluorescent proteins, such as GFP, as fluorescence tags on proteins targeted to certain regions of the cells reports the location of the tagged proteins (see Section 17.2, Table 17.1). Covalently linking GFP to the protein of interest occurs by insertion of the gene sequence of GFP adjacent to or sometimes within the sequence of the protein of interest, thereby making a fusion protein upon translation of the mRNA. Instead of using the fluorescent GFP, however, modified versions of GFP are available that become fluorescent when photoactivated with violet light, photoactivated GFP (PAGFP; Figure 12.25). The photoactivated form behaves the same as the photobleached form (see Figures 12.22 and 12.26). Illuminating the nucleus of the cytoplasm of the cell containing the non-fluorescent PAGFP–LC3 with violet light (see Figure 12.26) produces the same result as selectively photobleaching the cytoplasm of a cell containing GFP–LC3, with nuclear PAGFP–LC3 forming clusters in the cytoplasm and cytoplasmic PAGFP–LC3 moving to the nucleus without clustering.
12.8 Fluorescence Switching Also Shows Connectivity and Movement
Figure 12.25 Fluorescent protein photoactivation, photoconversion, and photoswitching behaviors. (A) Molecular model of green fluorescent protein (GFP) showing the barrel shape and internal structure. (B) The position of the fluorophore in the barrel-shaped structure of green fluorescent protein (GFP). (C) to (E) Genetically modifying the amino acids of the fluorophore region of GFP produces photoactivatable GFP (PAGFP) (C), photoconvertible GFP, Kaede and Eos (D), and photoswitchable GFP (Dronpa) (E). (C) Photoactivatable GFP becomes fluorescent (488-nm excitation, 510-nm emission) when illuminated with violet light and is nonreversible because the violet light breaks a covalent bond. (D) Photoconvertible Eos or Kaede switch from blue excitation, green fluorescence to green excitation, red fluorescence with violet light exposure. It is non-reversible because the violet light breaks a covalent bond. (E) Reversibly photoswitchable Dronpa is green fluorescent but becomes non-fluorescent with continued exposure to blue light. The green fluorescence recovers with exposure to violet light. The isomerization of the fluorophore induced by the different light wavelengths is reversible. Diagram by L. Griffing.
277
278
12 Temporal Analysis
In the case of PAGFP, the ionically neutral conjugated ring system of the phenol in tyrosine 66 of the chromophore becomes an anionic, fluorescent phenolate with the absorption of 405-nm light (see Figure 12.28B) through decarboxylation of an adjacent glutamate. Other examples of photoactivatable proteins are in Table 12.1. All are monomeric, and some can be photoactivated to become red and orange fluorescent. Some of these, such as PA-mKate and PA-TagRFP, do not photobleach as easily; that is, they are more photostable. An interesting set is Phamret, which is a FRET-based (see Section 11.12) photoactivable protein. Like the photoswitchable proteins (Table 12.2). it changes fluorescence color upon photoactivation. However, the size of dimeric structures such as Phamret ortetrameric structures (see Table 12.2) has the potential of changing the properties of the proteins whose location they are reporting. The proteins that can change fluorescence color by photoconversion (but without FRET) are photoswitchable proteins (see Table 12.2). Many were not originally monomeric but now have engineered monomeric
Figure 12.26 Photoactivation of photoactivated green fluorescent protein (PAGFP) to show that movement of nuclear PAGFP into cytosol results in PAGFP–LC3 punctae, but cytoplasmic PAGFP, when activated does not produce PA–LC3 punctae. Scale bar = 5 µm. From Huang, R. Xu, Y., Wan, W., et al. 2015. Deacetylation of nuclear LC3 drives autophagy initiation under starvation. Molecular Cell 57: 456–466. Used with permission. Table 12.1 Photoactivatable Fluorescent Proteins.a
Protein
Excitation Maximum (nm)
Emission Maximum (nm)
Brightness (Quantum yield × Extinction coefficient)
Photostabilityb
PA–GFP (G)
504
517
13.7
+
PA–mKate (R) PA–TagRFP (R) PA–mRFP (R) PA–mCherry (R) Phamret (C) Phamret (G)
586 562 578 564 458 458
628 595 605 595 480 517
4.5 5.3 0.8 8.28 13 13.5
+++ +++ ++ ++ ++ ++
a
All are 405–nm activated and monomeric Photostability is relative. PA-GFP, photoactivatable green fluorescent protein; PA-mKate, photoactivatable modified Kate; PA-TagRFP, photoactivatable TagRFP; PA-mRFP, photoactivatable modified red fluorescent protein; PA-mCherry, photoactivatable modified mCherry; Phamret, photoactivationmediated resonance energy transfer protein. (G) = green emission, (R) = red emission, (C) = cyan emission.
b
Table 12.2 Non-reversibly Photoswitchable, Photoconvertible Fluorescent Proteins.a
a
Proteinb
Excitation Maximum (nm)
Emission Maximum (nm)
Brightnessc
Not PC PS-CFP2
400
Kaede
508
572
Photostabilityd
PC
Not PC
PC
Not PC
PC
Not PC
PC
490
468
511
8.6
10.8
++
++
518
580
86.9
19.9
++
+++ +++
wtKikGR
507
583
517
593
37.6
22.8
++
mKikGR
505
580
515
591
33.8
17.6
+
++
wtEosFP
506
571
516
581
50.4
22.5
++
+++
dEos
506
569
516
581
55.4
19.8
++
+++
tdEos
506
569
516
581
22.4
19.4
++
+++
mEos2
506
573
519
584
41.4
30.4
++
+++
Dendra2
490
553
507
573
22.5
19.3
++
+++
PS mOrange
548
636
565
662
58
+++
+++
9.2
cyan, monomeric; magenta, dimeric; red, tandem dimer; green, tetramer. All except mOrange are photoswitchable at 405 nm. Dendra 2 is also photoswitchable at 488 nm. PSmOrange is photoswitchable at 480 and 540. c Brightness = Quantum yield x Extinction coefficient (see Section 17.2). d Photostability is relative. notPC, nonphotoconverted form; PC, photoconverted form. b
12.8 Fluorescence Switching Also Shows Connectivity and Movement
forms that can more accurately report protein location without interference. In the case of photoconvertible green-to-red fluorescence, the tyrosine and a histidine form a covalent bond in the fluorophore (see Figure 12.25). Absorption of violet or UV light cleaves part of the histidine and the spectra of the fluorophore changes. In the case of Eos2, it changes from excitation at 506 nm and emission at 519 nm (green) to excitation at 573 nm and emission at 584 nm (red) (see Table 12.2). Lasers are not necessary. A standard mercury arc source can photoconvert Eos2 (Figure 12.27). The region of photoconversion can be as small as the diffraction limit using 405-nm lasers, but arc sources photoconvert larger regions limited by the field shutter or iris on the epi-illuminator (see Section 17.5, Figure 17.13). As shown in Figure 12.27, components of clathrin-coated vesicles move and intermix following green-to-red fluorescence photoconversion with an arc source. The time resolution of these studies depends on how fast an existing optical beam-splitter and fast shutter coordinate to expose the
Figure 12.27 Photoconversion of the green fluorescence of an Eos2-labeled component of clathrin coated vesicles in vivo using a standard fluorescence microscope. Panel 1, row1, A portion (circle) of the clathrin-coated vesicles in the cytoplasm of a HeLa cell were photoconverted within about 10 s. Two to five minutes following photoconversion, unconverted vesicles moved laterally (red and green) and intermixed (yellow, arrows). N, cell nucleus. From Baker, S.M. Buckheit, R.W., and Falk, M.M. 2010. Green-to-red photoconvertible fluorescent proteins: tracking cell and protein dynamics on standard widefield mercury arc-based microscopes. BMC Cell Biology 11: 15 http://www.biomedcentral.com/1471-2121/11/15. Creative commons.
279
280
12 Temporal Analysis
sample to the photoconversion beam (e.g.,100 ms) and then expose it to the new photoexcitation wavelength during image acquisition. Often, this is slower than confocal systems. Likewise, a standard mercury arc source can FRAP an ROI, but the time required to reset the optics to follow recovery is generally longer. Both photoactivation and photoconversion are one-way, irreversible operations that result from breaking covalent bonds in the fluorophore. Another important photoswitchable event, albeit somewhat rare in the world of fluorescence, is reversibly photoswitchable. These can be proteins such as Dronpa (Figure 12.28 Figure 12.28 Two time series of single-molecule and Section 18.7, Table 18.2) or organic molecules such as Cy3fluorescence images of Dronpa in a polyvinyl alcohol film. A and B are different molecules shown at about Alexa647 heterodimer (see Section 18.9, Table 18.3). The use of 0.6-s intervals. The images were obtained with 488-nm these reversibly photoswitchable probes would be a disaster in excitation, and the photoconversion was done in images FRAP, which has, as an underlying assumption, the irreversible 5, 9, 13, and 17 with a 405-nm laser. From Habuchi. S., nature of photobleaching. Ando, R., Dedecker, P., et al. 2005. Reversible singlemolecule photoswitching in the green fluorescent Superresolution microscopy uses all of these photoswitchable, phoprotein–like fluorescent protein Dronpa. Proceedings of toconvertible, and photoactivatable probes for single molecule microsthe National Academy of Sciences of the United States of copy (see Chapter 18). Briefly, the idea behind superresolution America 102: 9511–9516. Used with permission. microscopy using these fluorescent protein tags is that the individual molecules can be excited at low light levels and because only a fraction fluoresce at low light, each molecule is visible, if not resolvable. When these molecules photoswitch or photobleach, another low-light pulse produces a different set of visible molecules. The centroid of the blurred fluorescence of each molecule is approximately that molecule’s position. This localizes molecules with 10–20 nm precision, well below the diffraction limit.
12.9 Fluorescence Correlation Spectroscopy and Raster Image Correlation Spectroscopy Can Distinguish between Diffusion and Advection Fluorescence correlation spectroscopy (FCS) can determine the relative contributions of diffusion and advective flow (see Figure 12.20). Panels A and B of Figure 12.29 are time sequences of images of a green fluorescent species diffusing (A) or diffusing and flowing (advection; B) through a tubule. The light blue circle contains the time-sampled region. FCS correlates the signal in this sampled region at time 0 with the signal in the sampled region at subsequent times. G(τ ) is the time correlation, which drops as the fluorescent species moves out of the sampled region. The black line in Figure 12.29C shows the correlation decay curve for diffusion. If there is a velocity field, as in advection, then the fluorescence moves more rapidly out of the sampled region (B), having a different correlation decay (red in C) than diffusion (black in C). The rediscovery of long tubules, or stromules, coming from the surface of chloroplasts occurred with GFP-labeled stromal proteins. FCS can monitor the movement of the GFP-tagged stromal proteins. The results for movement within stromules were intermediate between the cases of pure diffusion and pure active flow (advection), producing results like those in Figure 12.29C. The movement of clusters of GFP was advective and energy dependent and had an average velocity of 0.12 ± 0.06 µm / sec, whereas the 3D diffusion of stromal GFP proteins in the tubules was approximately 1 µm2 / sec. This value for the diffusion in the stromule tubule is the same order of magnitude of the value found by FRAP for glycosylated proteins in the tubules of the ER, 1.6 µm2 / sec. However, non-glycosylated GFP in the tubules of the ER have higher values, 5 − 10 µm2 / sec and 34 µm2 / sec, as reported by different FRAP studies. Raster image correlation spectroscopy (RICS) provides a way to visually map the flows that occur using a standard confocal laser scanning microscope. Determining flow from a confocal scan is possible because there is a “hidden” time structure in the image acquisition resulting from the raster scan of the laser across the sample. Single images have a time component in them. Hence, producing a map of the flow derived from each image involves calculating the correlation of the fluorescence over time that tracks with the raster scan (Figure 12.30A). The correlation happens within a single image. This is different from optic flow methods, which use the correlation between pixel blocks in sequential images. The analysis of flow in RICS is more sensitive to the geometry of the system than FCS (Figures 12.30B–D). Unlike the correlation function for FCS, the correlation function for RICS is different in the x and y directions based on the pixel dwell
12.9 Fluorescence Correlation Spectroscopy and Raster Image Correlation Spectroscopy Can Distinguish between Diffusion and Advection
Figure 12.29 Fluorescence correlation spectroscopy (FCS). (A) The fluorescence in the region sampled at time 0 (light blue circle) is continually sampled, and the correlation between the time 0 sample and the later time sample decays with diffusion. (B) With both diffusion and advection, the correlation between the time 0 sample and the later sample decays more quickly after it starts. (C) Plot of correlation decay with time for diffusion (black) and diffusion + advection (red). Diagram by L. Griffing. A is adapted from Digman, M et al. (2013).
Figure 12.30 Raster image correlation spectroscopy (RICS). (A) The region assessed at time 0 (light blue circle) contains a distribution of fluorescent species. As the raster scans across the image (red circle), the correlation between the fluorescence sampled by the raster and the fluorescence sampled at time 0 (light blue circle at time 0) diminishes. The light blue circle at times 1–8 is only included for reference. (B) If, along with diffusion, plug-flow advection occurs in the direction of the raster, the correlation between the fluorescence sampled by the raster (red circle) and the fluorescence sampled at time 0 (light blue circle at time 0) decays more slowly. (C) If, along with diffusion, advection occurs in the opposite direction of the raster scan (dark blue circle), then the correlation decays more rapidly. (D) Graph of the correlation decay with diffusion (black), diffusion + advection in the direction of the raster (red), and diffusion + advection in the opposite direction of the raster (dark blue). Diagram by L. Griffing. A is adapted from Digman, M et al. (2013).
time and line scan time for the raster in the confocal microscope. When the raster scan is in the same direction as the flow, the correlation decay lengthens (see Figure 12.30B and D). When the raster scan is the opposite direction, the correlation decay shortens (see Figure 12.30C and D). Confirming the flow rate and direction is possible by flipping the horizontal scan direction. Measurements of movement in other directions consider the pixel dwell time and the raster scan time. In Figure 12.31, an example of RICS shows the spatial regulation of diffusion of proteins within the nucleus. The nucleoplasm has a higher rate of diffusion than the nucleolus. The correlation function in this particular procedure is a local autocorrelation function (L-RICS).
281
282
12 Temporal Analysis
Figure 12.31 Raster image correlation spectroscopy (RICS) analysis of diffusion of free green fluorescent protein (GFP) in the nucleus of HeLa cell. (A) Example of a HeLa cell expressing untagged GFP. (B) Enlarged region showing the 256 × 256 pixels area of the RICS. Portions of the nucleolus (no) and the nucleoplasm (np) are visible. (C) Example of one frame of a RICS data set of dimension 64 × 256 pixels. D) Diffusion maps obtained from (C) and computed with 100 frames and E) 400 frames, corresponding to 1.37 min and 5.46 min total acquisition time, respectively. Scale bar in A = 5 µm; other scale bars = 1 µm. From Scipioni, L., Di Bona, M., Vicidomini, G., et al. (2018) Local raster image correlation spectroscopy generates high-resolution intracellular diffusion maps. Nature Communications 1: 10. doi: 10. 1038/s42003-0170010-6. Creative commons 4.0.
12.10 Fluorescent Protein Timers Provide Tracking of Maturing Proteins as They Move through Compartments All fluorescent proteins have a maturation time in the cell. This involves folding into the barrel shape, adding the chromophore, and, if it occurs, multimerization. This occurs over 20–90 minutes depending on the protein. During maturation, they can change color and work as protein timers. Some mCherry (see Section 17.2, Table 17.1) derivatives fluoresce blue early on, then fluoresce red (Table 12.3). The timing of the blue to red transition, for the fast fluorescent timer (fast-FT; Figure 12.32A) converts completely to red from blue in about 9 hours at 37°C. Medium-FT converts completely in about 20 hours, and slow-FT converts completely in 50–60 hours. The timing of the process of interest determines the choice of timers. This assumes that the transcription and translation of the fusion protein containing this reporter starts relatively
Table 12.3 Fluorescent Protein Timers. Protein
Excitation Maximum (nm)
Emission Maximum (nm)
Brightness
Blue Form
Blue Form
Blue Form
Red Form
Red Form
Red Form
Fast-FT
403
583
466
606
14.9
6.7
Medium-FT
401
579
464
600
18.3
5.8
Slow-FT
402
583
465
604
11.7
4.2
FT, fluorescent timer.
Figure 12.32 Fluorescent protein timers. (A) Time course of maturation of fast fluorescent timer (fast-FT). (B) Time course of maturation of a tandem fusion of mCherry (black, red) and superfolder green fluorescent protein (sfGFP) (gray, green). For simplicity, a one-step process represents mCherry maturation. Fluorescence intensity curves are normalized to the brightness of sfGFP. Ratios are normalized to the maximum in each plot. From Khmelinskii, A. (2012) / With permission of Springer Nature.
12.10 Fluorescent Protein Timers Provide Tracking of Maturing Proteins as They Move through Compartments
uniformly upon transfection with the gene construct of interest (or can be “turned on” with a promoter ligand). Also, it assumes that the transcription of the transgene turns off fairly completely after a certain time with a promoter ligand, such as doxycycline in the Tet-Off construct in Figure 12.33. As such, it produces a wave of protein that moves to different compartments as the protein ages, changing color as it does so. For example, the protein lysosome-associated membrane protein type 2A (LAMP2A), starts out in the perinuclear Golgi bodies at 1–6 hours, moves to the more peripherally distributed plasma membrane and endosomes by 12–21 hours, and finally ends up in the lysosomes by 36–63 hours. This visualization confirms an indirect pathway of the protein through the cell indicated by previous biochemical studies.
Figure 12.33 Intracellular localization of the lysosome-associated membrane protein type 2A (LAMP2A)–medium fluorescent timer fusion protein in HeLa Tet-Off cells at different time points after the shutting down of transcription with doxycycline. The blue and red forms of LAMP-2A–medium-FT are shown with green and red pseudocolors, respectively. Scale bar = 10 µm. From Subach, F.V. Subach, O.M., Gundorov, I.S., et al. et al. 2009. Monomeric fluorescent timers that change color from blue to red report on cellular trafficking. Nature Chemical Biology 5: 118. doi:10. 1038/nchembio.138. Used with permission.
283
284
12 Temporal Analysis
Figure 12.34 mCherry/superfolded green fluorescent protein (sfGFP) tandem timer intensity ratios of structures marked with Rax2-mCherrysfGFP. Images of representative diploid yeast cells are ordered according to cell age and cell cycle stage. Different structures are marked with arrowheads according to origin (red, birth; orange, first cell cycle; green, second cell cycle; cyan, third cell cycle). Ratiometric mCherry/sfGFP images are color coded as indicated. Scale bar = 5 µm. From Khmelinskii A., Keller, P.J., Bartosik, A., et al. 2012. Tandem fluorescent protein timers for in vivo analysis of protein dynamics. Nature Biotechnology 30(7): 708–714. doi:10. 1038/nbt.2281. Used with permission.
Tandem fusions of fluorescent proteins mature relatively rapidly compared with the FT fluorescent timers (Figure 12.32B). Tandem fusions of superfolded GFP (sfGFP; see Section 17.2, Table 17.1) and mCherry have two states: a rapidly maturing sfGFP fluorescence, reaching a maximum after 15–20 minutes, and a more slowly maturing mCherry fluorescence, reaching a maximum after about 2 hours. These tandem probes are larger than the monomeric FT fluorescent timers. The experiment in Figure 12.34 shows the time course of maturation of Rax2-mCherry-sfGFP, a reporter of the emerging yeast buds that localizes to the bud neck, the site of cell division, where it forms rings that mark sites of previous division, the bud scars. The mCherry-to-sfGFP intensity ratio correlates positively with the age of labeled structures; it is lower in new, emerging buds or bud necks and higher at bud scars. After first appearing green at the sites of cell division, the intensity ratio increases and becomes redder during cell cycle progression through four generations.
Annotated Images, Video, Web Sites, and References 12.1 Representations of Molecular, Cellular, Tissue, and Organism Dynamics Require Video and Motion Graphics The download site for Blender is https://www.blender.org/download. The video sequence editor is an often-overlooked feature for new users because Blender does so many other things. Because it does so much, the interface can be intimidating but no more so than many other high-end video sequence editors, when within the video editing workspace. The video editing workspace for Blender is available as a menu choice on the top menu bar (see Figure 12.2), but if the screen is small, sometimes this cannot be seen. Although the figures in the section are from Blender 2.9.2, the interface looks the same for Blender 3.0.
12.2 Motion Graphics Editors Use Key Frames to Specify Motion For some consumer slow-motion cameras, see https://www.premiumbeat.com/blog/6-affordable-slow-motion-cameras. For some examples of professional slow-motion cameras, see https://www.mctcameras.com/camera-guideline. Blender has a big advantage over proprietary video sequence editors because it accepts and writes a larger variety of file formats and uses specified, often open-source, containers and codecs. There are many tutorials on YouTube for setting up pan and zoom in Blender using key frames. Unfortunately, many use out-of-date versions. The Blender user manual is current and is at https://docs.blender.org/manual/en/latest. An image stabilization plugin for ImageJ/Fiji is https://imagej.net/Image_Stabilizer. Motion tracking using Adobe After Effects shown in Figure 12.6 is from this report: Koehnsena, A., Kambach, J., and Büssea, S. 2020. Step by step and frame by frame – workflow for efficient motion tracking of high-speed movements in animals. Zoology 141: 125800.
Annotated Images, Video, Web Sites, and References
Idtracker has many versions. Here is a one that uses artificial intelligence: Romero-Ferrero, F., Bergomi, M.G., Hinz, R.C., Heras, F.J.H., and de Polavieja, G.G. 2019. idtracker.ai: tracking all individuals in small or large collectives of unmarked animals. Nature Methods 16: 179–182. PathTrackerR uses the open-source statistics program, R; see Harmer, A.M.T. and Thomas, D.B. 2019. pathtrackr: an R package for video tracking and analyzing animal movement. Methods in Ecology and Evolution 10: 1196–1202. The other references, besides the earlier idtracker citation, cited in Figure 12.7 are Tort, A.B.L., Neto, W.P., Amaral, O.B., et al. 2006. A simple webcam‐based approach for the measurement of rodent locomotion and other behavioural parameters. Journal of Neuroscience Methods 157: 91–97. https://doi.org/10.1016/j.jneum eth.2006.04.005. Branson, K., Robie A.A., Bender, J., et al. 2009. High‐throughput ethomics in large groups of Drosophila. Nature Methods 6: 451. https://doi.org/10.1038/nmeth.1328. Noldus. 2018. EthoVision XT. http://www.noldus.com/EthoVision-XT/New. Rodriguez, A., Zhang, H., Klaminder, J., et al. 2017. ToxTrac: a fast and robust software for tracking organisms. Methods in Ecology and Evolution 9: 460–464. Correll, N., Sempo,G., De Meneses, D.L. et al. 2006. SwisTrack: a tracking tool for multi‐unit robotic and biological systems. 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 2185–2191. doi: 10.1109/IROS.2006.282558.
12.3 Motion Estimation Uses Successive Video Frames to Analyze Motion An introduction to motion estimation and some illuminating routines for it in MATLAB are in Marques, O. 2011. Practical Image and Video Processing Using MATLAB. John Wiley & Sons, Hoboken, NJ.
12.4 Optic Flow Compares the Intensities of Pixels, Pixel Blocks, or Regions Between Frames A discussion of optic flow approaches and their relationship to image compression, with some examples in MATLAB, is in Marques, O. 2011. Practical Image and Video Processing Using Matlab. John Wiley & Sons, Hoboken, NJ. An example of optic flow analyzes intermediate filament movement in cells in Helmke, B.P., Thakker D.B., Goldman R.D., et al. 2001. Spatiotemporal analysis of flow-induced intermediate filament displacement in living endothelial cells. Biophysical Journal 80: 184–194. There is an example of a cross-correlation approach that looks at actin polymerization in relation to migration in PaulGilloteaux, P. Waharte, F., Singh, M.K., et al. 2018. A biologist-friendly method to analyze cross-correlation between protrusion dynamics and membrane recruitment of actin regulators. Ed. by A. Gautreau. Cell Migration: Methods and Protocols, Methods in Molecular Biology. vol. 1749. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7701-7_20.
12.5 The Kymograph Uses Time as an Axis to Make a Visual Plot of the Object Motion An example of the use of a kymograph is the demonstration of altered oscillations during zebrafish development in Soroldoni, D. Jörg, D.J., Morelli, L.G., et al. 2014. A Doppler effect in zebrafish embryonic development. Science 345: 222–225.
12.6 Particle Tracking Is a Form of Feature-Based Motion Estimation LAP strategies for single particle tracking are in Jaqaman, K., Loerke, D., Mettlen, M., et al. 2008. Robust single-particle tracking in live-cell time-lapse sequences. Nature Methods 5: 695–702 Trackmate is a FIJI plugin. Tinevez, J.Y., Perry, N., Schindelin, J., et al. 2017. TrackMate: an open and extensible platform for single-particle tracking. Methods 115: 80–90. PMID 27713081. Sophisticated analyses of cell tracking and dynamics during embryogenesis that uses the TGMM open-source software to determine how cells move in 4D. Amat, F., Lemon, W., Mossing, D., et al. 2014. Fast, accurate reconstruction of cell lineages from large-scale fluorescence microscopy data. Nature Methods 11: 951–958. https://doi.org/10.1038/nmeth.3036. An application of TGMM to zebrafish embryo morphs is Shah, G., Thierbach, K., Schmid, B., et al. 2019. Multiscale imaging and analysis identify pan-embryo cell dynamics of germlayer formation in zebrafish. Nature Communications 10: 5753. https:// www.nature.com/articles/s41467-019-13625-0. The 3D renderings of the cellular trajectories were made with Blender 2.77. An application of TGMM that includes a machine learning algorithm, TGMM 2.0, provides the approach for developing a morphogenetic map of mouse embryogenesis McDole, K., Guignard, L., Amat, F., et al. 2018. In toto imaging and reconstruction of post-implantation mouse development at the single-cell level. Cell 175: 859–876.e33.
285
286
12 Temporal Analysis
A review of different ways to analyze and quantify diffusion and advection is in Owen, D.M., Williamson, D., Rentero, C., et al. 2009. Quantitative microscopy: protein dynamics and membrane organisation. Traffic 10: 962–971. An application of the distinction between diffusion and advection is in Moens, P.D.J., Digman M.A., and Gratton, E. 2015. Modes of diffusion of cholera toxin bound to gm1 on live cell membrane by image mean square displacement analysis. Biophysical Journal 108: 1448–1458. An example of single particle tracking using superresolution microscopy is in Frost, N.A. Shroff, H., Kong, H., et al. 2010. Single-molecule discrimination of discrete perisynaptic and distributed sites of actin filament assembly within dendritic spines. Neuron 67: 86–99.
12.7 Fluorescence Recovery After Photobleaching Shows Compartment Connectivity and the Movement of Molecules Easyfrap is at https://easyfrap.vmnet.upatras.gr. The iBiology video of J. Lippincott-Schwartz describing the dynamics of the endomembrane system using FRAP and photoactivation is at https://www.ibiology.org/talks/photobleaching-and-photoactivation. This also describes the experiments shown in Figures 12.22 and 12.24. A review of FRAP for looking cell dynamics is in Lippincott-Schwartz, J. Snapp, E.L., and Phair, R.D. 2018. The development and enhancement of FRAP as a key tool for investigating protein dynamics. Biophysical Journal 115:1146–1155. The use of FRAP for estimating the effective diffusion coefficient using Equation 12.1 is in Ellenberg, J. and LippincottSchwartz, J. 1999. Dynamics and mobility of nuclear envelope proteins in interphase and mitotic cells revealed by green fluorescent protein chimeras. Methods 19: 362–372. A critical review of FRAP approaches to membrane dynamics, along with protocols for these approaches is in Constantini, L. and Snapp, E. 2013. Probing endoplasmic reticulum dynamics using fluorescence imaging and photobleaching techniques. Current Protocols in Cell Biology 21.7.1–21.7.29. doi: 10.1002/0471143030. cb2107s60.
12.8 Fluorescence Switching Also Shows Connectivity and Movement
A review of fluorescent protein engineering is in Shaner, N.C., Patterson, G.H., and Davidson, M.W. 2007. Advances in fluorescent protein technology. Journal of Cell Science 120: 4247–4260. Experiments with photoconvertible EOS-labeled clathrin coat components are in Baker, S.M., Buckheit, R.W., and Falk, M.M. 2010. Green-to-red photoconvertible fluorescent proteins: tracking cell and protein dynamics on standard wide-field mercury arc-based microscopes. BMC Cell Biology 11: 15. http://www.biomedcentral.com/1471-2121/11/15. The use of Dronpa is in Habuchi, S., Ando, R., Dedecker, P., et al. 2005. Reversible single-molecule photoswitching in the GFP-like fluorescent protein Dronpa. Proceedings of the National Academy of Sciences of the United States of America 102: 9511–9516.
12.9 Fluorescence Correlation Spectroscopy and Raster Image Correlation Spectroscopy Can Distinguish between Diffusion and Advection
The movement of stromal proteins in stromules analyzed by FCS is in Köhler, R.H., Schwille, P., Webb, W.W., et al. 2000. Active protein transport through plastid tubules: velocity quantified by fluorescence correlation spectroscopy. Journal of Cell Science 113: 3921–3930. A review of RICS is in Digman, M., Stakic, M., and Gratton, E. 2013. Raster image correlation spectroscopy and number and brightness analysis. Methods in Enzymology 515: 121. http://dx.doi.org/10.1016/B978-0-12-388422-0.00006-6. The diffusion of nuclear proteins using RICS is in Scipioni, L., Di Bona, M., Vicidomini, G., et al. 2018. Local raster image correlation spectroscopy generates high-resolution intracellular diffusion maps. Nature Communications 1: 10. doi: 10.1038/s42003-017-0010.
12.10 Fluorescent Protein Timers Provide Tracking of Maturing Proteins as They Move through Compartments
The two main references for protein timers are Subach, F.V., Subach, O.M., Gundorov, I.S., et al. 2009. Monomeric fluorescent timers that change color from blue to red report on cellular trafficking. Nature Chemical Biology 5: 118–126 and Khmelinskii, A., Keller, P.J., Bartosik, A., et al. 2012. Tandem fluorescent protein timers for in vivo analysis of protein dynamics. Nature Biotechnology 30: 708–714. doi: 10.1038/nbt.2281.
13 Three-Dimensional Imaging, Modeling, and Analysis
13.1 Three-Dimensional Worlds Are Scalable and Require Both Camera and Actor Views
Three-dimensional (3D) imaging, modeling, and analysis occur at all scales and resolving powers. Going from large to small, several laser-scanning technologies are used for acquiring large-scale 3D images. LiDAR (light detection and ranging) can acquire terrain images (Figure 13.1). Laser range finders or structured illumination can produce 3D images from hand-held (see Section 6.8) or full-body scanners. While these acquire the surface details of macro objects, multiple modalities acquire views of internal structures, including magnetic resonance imaging (MRI) (see Section 15.1, Figure 15.1), ultrasonography (see Section 14.1, Figure 14.1), and computed tomography (CT) scanning (see Section 3.3, Figure 3.6 and Section 6.7, Figure 6.15). MRI and ultrasonography have a limiting resolving power of about 1 voxel, or volume element, per cubic millimeter. Micro- or nano-CT scanning (see Section 6.7) has a higher resolving power of 10⁹ to 10¹² voxels per cubic millimeter, or one voxel per 0.1–1 µm cube. This is about the same resolving power achieved by light microscopy modalities such as confocal laser scanning microscopy (see Sections 6.9, 17.8, and 17.11), light sheet microscopy (see Sections 18.4–18.5), or widefield deconvolution microscopy of reflected or fluoresced light (see Sections 17.5 and 10.8). Light microscopy typically acquires the 3D volume of living tissue with optical sectioning by confocal microscopy (i.e., acquiring serial through-focus images of an object) and so is non-invasive, but it has poor penetration into living tissues, being limited to 50–100 µm of depth, depending on the transparency of the object. Fixation (see Section 16.2), clearing, and adaptive optics (see Section 8.13) increase the depth of optical sectioning. Multiphoton microscopy (see Section 17.12) improves penetration by using longer wavelengths of light that scatter less. Electron microscopy and crystallography achieve several orders of magnitude higher resolving power (i.e., 0.1–10 cubic nm per voxel). In conventional scanning electron microscopy, images often look 3D because they have a high depth of field. Transmission electron microscopy (TEM) tomography of thick sections produces detailed 3D objects at the cubic nanometer scale (see Section 19.13, Figure 13.2A), while single-particle analysis provides 3D objects resolved at the angstrom level (see Section 19.14).
Figure 13.1 Macro scales in three dimensions. (A) Aerial view of the forest surrounding the ruins of Tikal in Guatemala. The visible monument is about 50 m high. Photo by Dr. Francisco Estrada-Belli. Used with permission. (B) LiDAR (light detection and ranging) view of a region containing A, showing how it can penetrate through forest canopies to show ancient Mayan ruins. Image courtesy of Francisco Estrada-Belli/Pacunam Lidar Initiative. (C) Diffusion-weighted magnetic resonance imaging of mouse brain about 508 cubic mm in volume. 16.4 Tesla. Resolution = 0.1 mm in all three dimensions. With permission of Bruker.
Figure 13.2 Micro scales in three dimensions. (A) Three-dimensional reconstruction of Golgi body and endoplasmic reticulum (ER) from transmission electron microscope tomography. The asterisk marks an ER tubule traversing a clear zone between two cis stacks. Scale bar = 200 nm. Martínez-Martínez, N., Martínez-Alonso, E., Tomás, M., et al. 2017 / PLOS / CC BY 4.0. (B) Focused ion beam scanning electron microscope reconstruction of the ER (blue), Golgi body (red to green), and multivesicular body in Micrasterias denticulata, from the forming face to the maturing face. Scale bar = 1 µm. Wanner G et al. 2013 / With permission of Elsevier. (C) Pymol representation of a TATA-box (green and brown) binding protein (blue). Bounding box = 6.7 × 6.7 × 8.6 nm.
Focused ion beam scanning electron microscopy (FIBSEM) (see Section 19.12, Figure 13.2B), serial block face SEM (SBF-SEM) (see Section 19.12), array tomography SEM (AT-SEM) (see Section 19.12), and serial section TEM (see Section 19.10) derive 3D images from two-dimensional (2D) physical sections of embedded or frozen biological material. Crystallized proteins and their ligands produce diffraction patterns that allow visualization at the atomic level in 3D (Figure 13.2C). A variety of software packages generate, manipulate, and position objects within the 3D environment. Some of these packages address specific spatial scales. For example, Jmol and Pymol produce a 3D environment for the visualization of molecules and atoms (Figure 13.2C). On the other hand, geographic information system (GIS) software produces a 3D environment for kilometer-sized objects in large landforms from LiDAR data sets (see Figure 13.1A). There are more than 100 3D computer graphics software packages listed in Wikipedia, not including specialized programs such as Pymol, ImageJ or Icy plugins, and GIS programs. Examples in this chapter use the open-source program Blender along with open-source image analysis plugins to ImageJ (FIJI) and Icy, with the Visualization Toolkit (VTK) for analysis. For electron microscopy (EM) tomography in this chapter and Chapter 20, IMOD is the go-to open-source package. There are proprietary microscopy-related (Amira, MetaMorph, NIS-Elements, v-LUME-subscription), CT scanning (VG Studio Max, Ikeda's TomoShop), and MRI (OsiriX) programs (see Section 7.2, Table 7.1). There are also proprietary programs for 3D instrument design, such as AutoCAD. An intermediate case is the proprietary MATLAB/Simulink language (note that there is a less-developed, open-source version, Octave), in which many scientists write 3D visualization and analysis routines that are available as open-source scripts. Julia is a recently developed, popular, open-source language for scientific applications and has, for example, a program for running 3D visualizations out of web browsers, MeshCat.jl. Open-source programs are powerful enough to provide most of the 3D modeling and analysis capabilities needed in the life sciences. Companies making proprietary software may not maintain archives of older versions, and the actual algorithms used by proprietary software are unknown, so open-source software is preferable for most scientific applications. This book provides examples in the open-source program Blender. Finally, notebooks and applications such as Jupyter and KNIME provide platforms for publishable data analytics using a variety of computer languages (e.g., Python, the language for Blender) and simplify data management while reproducing processing steps. Viewing a 3D world through a computer screen can be confusing. A stationary camera or viewport of the 3D world reduces the imaging complexity of these systems. Such is the case for Jmol and Pymol (see Figure 13.2C). Rotating and positioning the object in 3D is possible, but the viewport or camera generally remains stationary. In Blender, the standard 3D viewport is part of the user interface or workspace and starts with an object on a grid, or ground, with the rendering camera in view (Figure 13.3A). It is possible to add several different rendering cameras, with the active camera being the one from which rendering takes place.
The view from the rendering camera shows its field of view, which includes a ground grid, for rendering the 3D scene as a 2D image (Figure 13.3B). The ground grid is there for placement, not for final rendering. The grid provides a reference scale in 3D space. Each square in the grid can represent a user-defined unit of measure. Setting the right units is important for 3D printing as well as for measurement. There is a 3D printing add-on in Blender that provides volume and surface area measurements. The world displayed in the 3D viewport is a scene; contained within the scene, and listed in the outliner (see Figure 13.3A), are objects or actors. In addition to the 3D models, actors can be cameras, lights, or even empties, placeholders around which something happens. The Blender interface is flexible, and the workspace is a combination of such windows (see Figures 13.3 and 13.4). For example, the workspace for video sequence editing in Blender (see Section 12.1, Figures
Figure 13.3 Blender three-dimensional viewport. (A) User view of ground grid with a cube and a light source and, highlighted in orange, a camera. The outliner list shows all of the actors and the selected actor, the camera. The properties menus show the properties of the camera. (B) Camera view of cube and ground grid. The units are metric with one square equal to 1 m2. A measuring tool measures the distance between any two points. The measured distance is off (it should be 2 m) because the view is a perspective projection, not perpendicular to the measured edge. Orthographic projections work better for measurement. Images by L. Griffing.
Figure 13.4 Different Blender menu windows, highlighted with cyan circles. Menu window 1 is the properties. Below it is the complete list of windows. Selecting a different window icon switches the window in the workspace. Window 2 shows the outliner. Window 3 shows the three-dimensional viewport. Window 4 shows the timeline window. Image by L. Griffing.
Figure 13.5 View frustum of the camera in the three-dimensional environment. The view plane normal is green. Diagram by L. Griffing.
12.2, 12.3, and 12.5) uses the video editing workspace. The workspace for 3D layout in Figure 13.4 is the layout workspace. There are other preset workspaces such as the modeling, sculpting, and animation workspaces, but they are just combinations of the windows. Figure 13.4 shows four different windows in the workspace, determined by choosing the icon circled in cyan. The view from the active camera is the view frustum. It is a rectangular pyramid-shaped space with the apex at the camera and the base at the back-clipping plane (see Section 4.7, Figure 4.14), the plane beyond which the camera does not see (Figure 13.5). There is also a front-clipping plane in front of which the camera cannot see, making the view frustum a truncated pyramid. The view plane normal is the vector at right angles to the focal view plane of the camera (see Figures 13.5 and 13.6A). The angle between the top of the frustum and the view plane normal is the view angle. The camera can rotate in three dimensions, producing pitch (see Figure 13.6A) and roll and yaw (Figure 13.6B). In Figure 13.6, the view plane normal lies along the y-axis; the camera pitch is the angle of rotation around the x-axis; and the roll and yaw are the angles of rotation around the y- and z-axes, respectively.
Figure 13.6 Terms for movement of the focal point of the camera in three-dimensional view. (A) View plane normal. Pitch is the rotation around the x-axis pointing out of the page. (B) Camera yaw is rotation around the z-axis pointing out of the page. Rotation around the y-axis is camera roll. Diagrams by L. Griffing.
Figure 13.7 Parenting a cube to a sphere in Blender. This sets up a parent–child relationship between the two objects (i.e., when one moves, the other moves as well). (Shift-click on both objects, then Ctrl+P for the parenting command.) Diagram by L. Griffing.
Scaling, rotation, and translational movement of actors in the scene are affine transformations (see Section 12.1) in 3D space. Each actor has its own coordinate system, and transforming these into the scene coordinates specifies its relative position in the scene. Setting up a 3D scene by positioning objects within the scene requires switching not only between the rendering camera and the 3D viewport but also between the different views in the 3D viewport. Multiple 3D views showing the front, back, sides, top, and bottom visually position the object in space. After being positioned in 3D space, the actors have fixed or programmable positional relationships relative to each other. These are parent–child relationships, with the child following the parent when the parent moves (see Figure 13.7). Multiple render engines are available in different software packages. In Blender, there are two rendering engines: the default internal render engine (Eevee) and the Cycles render engine. The Blender Eevee engine uses non-photorealistic surface rendering from isosurfaces (see Section 13.5). For scientists, surface rendering is great for generating 3D models of processes. It is also computationally fast. For photorealistic rendering, the Blender Cycles engine is superior because it uses ray tracing (see Section 13.7). Although ray tracing is computationally more demanding, graphics processing units (GPUs) with thousands of core processors working together in parallel can now do it in real time. The company Nvidia produces GPUs programmed through the CUDA (Compute Unified Device Architecture) interface. Consumer-grade CUDA-enabled GPUs are becoming very powerful, driven by the gaming and digital currency industries. Their speed comes not only from parallel processing but also from the on-board GPU memory. Before detailing the difference between ray tracing and surface rendering (and combinations thereof), it is important to briefly detail the nature of the data that are 3D rendered.
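Scene setup like this can also be scripted. The following is a minimal sketch using Blender's Python API (bpy, version 2.8 or later); the object positions, rotation angles, and unit settings are illustrative values, not a prescribed workflow.

```python
# Minimal Blender Python sketch (assumes the Blender 2.8+ bpy API; run in the
# scripting workspace inside Blender). Names and values are illustrative only.
import bpy
from math import radians

scene = bpy.context.scene
scene.unit_settings.system = 'METRIC'           # grid squares in meters

# Add a cube and a sphere, then parent the cube to the sphere so that moving
# the sphere (parent) also moves the cube (child).
bpy.ops.mesh.primitive_cube_add(location=(0.0, 0.0, 1.0))
cube = bpy.context.active_object
bpy.ops.mesh.primitive_uv_sphere_add(location=(3.0, 0.0, 1.0))
sphere = bpy.context.active_object
cube.parent = sphere                            # parent-child relationship

# Add a rendering camera and set its pitch/roll/yaw as Euler rotations.
bpy.ops.object.camera_add(location=(0.0, -10.0, 5.0))
cam = bpy.context.active_object
cam.rotation_euler = (radians(65), 0.0, 0.0)    # pitch about x; roll y; yaw z
scene.camera = cam                              # make it the active camera

# Choose the ray-tracing render engine (Cycles) instead of Eevee.
scene.render.engine = 'CYCLES'
```

Parenting in a script is the equivalent of the Shift-click and Ctrl+P sequence in the 3D viewport.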
13.2 Stacking Multiple Adjacent Slices Can Produce a Three-Dimensional Volume or Surface
To generate a series of 2D images for stacking in 3D, the images can come from the physical sections of sliced objects or, if the object is transparent or translucent, from optical sections or from sections generated from multiple views with tomography. Microscopes with a shallow depth of field produce optical sections from each focal plane. Only light from each thin optical section is in focus, and deconvolution (see Section 10.8) can further remove out-of-focus information. Confocal microscopy (see Sections 6.9, 17.8, and 17.11) and multiphoton microscopy (see Section 17.12) collect only the light from each focal plane, making each image an optical section. Optical sectioning, as in the 200 optical sections in Figure 13.8, can generate 3D objects from living cells and give quick four-dimensional (4D) information as the cells change with time. Physical sectioning (see Sections 16.1–16.2 and Section 19.5), however, is a necessity when the complexity, opacity, and thickness of biological tissues are issues. Tomography (see Sections 6.6–6.7) provides sections non-invasively from thick objects, whether they are organisms or semi-thick EM sections (see Section 19.13). To get 3D images back from sections, stacking the 2D images in 3D reconstructs the volume of the specimen. The stacking process has to be precise. Features in one section have to line up with the same features in the next. If physical deformation of the internal features or markers takes place, rubber sheeting, landmark registration, and the scale-invariant feature transform (SIFT) (see Sections 11.2, 11.4, and Section 19.10) can correct it. The problem of precise stacking occurs in all forms of sectioning but is more difficult for physical sections because the act of cutting the sample can cause deformation. Precise registration uses either internal features or added fiducial markers, such as precisely positioned laser-drilled holes.
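As a minimal sketch of the registration step, the snippet below aligns adjacent sections by translation only, using phase cross-correlation from scikit-image rather than the SIFT or landmark methods discussed above; the file name and upsampling factor are assumed example values.

```python
# Sketch: translational alignment of adjacent sections by phase cross-
# correlation (a simple stand-in for SIFT/landmark registration). Assumes
# scikit-image and SciPy; the TIFF file name is a hypothetical placeholder.
import numpy as np
from skimage import io
from skimage.registration import phase_cross_correlation
from scipy.ndimage import shift as nd_shift

stack = io.imread("sections.tif").astype(float)    # shape: (z, y, x)
aligned = [stack[0]]
for i in range(1, stack.shape[0]):
    # Estimate the (dy, dx) offset of section i relative to the previous one.
    offset, error, _ = phase_cross_correlation(aligned[-1], stack[i],
                                               upsample_factor=10)
    aligned.append(nd_shift(stack[i], offset))      # shift section into register
aligned = np.stack(aligned)
io.imsave("sections_aligned.tif", aligned.astype(np.float32))
```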
Figure 13.8 Reconstruction of tobacco leaf hair from confocal optical sections. (A) Reduced scale images of a series of 200 of the 230 optical sections used to generate the volume in B. (B) Surface-rendered volume of a tobacco leaf hair expressing green fluorescent protein localized to the endoplasmic reticulum. Magenta sphere diameter = 18 µm. Images by L. Griffing. (C) Scanning electron micrograph of leaf hairs, pavement cells, and stomatal complexes on the epidermis of a tobacco leaf. Scale bar = 50 µm. From Goldberg, M. and Richardson, C. 2016. Tobacco leaf: critical point drying protocol for scanning electron micrography. Leica technical note. https://www.leica-microsystems.com/science-lab/tobacco-leaf-critical-point-drying-protocol-for-sem. Used with permission.
Figure 13.9 Aliased isosurface volume reconstruction from the T1 head volume sample in ImageJ. Image reconstruction by L. Griffing.
Certain forms of physical sectioning, such as sectioning using focused ion beam scanning electron microscopy (FIBSEM), do not cause deformation, and re-alignment of images is not necessary. For optical sectioning and tomographic sectioning, it is important that the object remain completely still during the sectioning process. This can be a problem because light microscopy and CT scanning can image living, moving objects. The best approach to overcoming this problem is high-speed optical sectioning, such as that available with light sheet microscopy (see Section 18.4) and high-speed cameras (see Sections 12.1 and 18.1), or high-speed acquisition during tomography. In CT scanning with a controlled rate of movement of the carrier through the X-ray ring, software corrections are possible. Sections have a certain z-dimension or axial thickness. A 2D image, however, is only 1 pixel thick. It is an image of focused light from a certain thickness of tissue and out-of-focus light from tissue above and below the optical section. Since confocal micrographs and deconvolved images have much of this out-of-focus light eliminated, they are very good for 3D reconstruction (see Figure 13.8). Stacking images of sections produces a volume with a z-dimension. The z-dimension of the 3D volume corresponds to the combined thicknesses of the sections. A good starting section thickness is half the resolvable z-dimension distance, to meet the Nyquist sampling criterion. If the resolvable z-dimension, the depth of field or axial resolving power of the lens (for standard light microscopy, ~500 nm) (see Section 8.7), is greater than the resolvable x-y dimensions (for standard light microscopy, ~200 nm), then it is important to adjust (reduce) the z-dimensional size or scale to provide cubic voxels. This means over-sampling in the z-dimension. Oversampling three- to sixfold is often necessary; however, the final z-dimension will need rescaling. Z-sectioning the sample below the Nyquist criterion spatial frequency and below the lateral resolving power produces an aliased volume, with a stair-stepped, jaggy outline (Figure 13.9). 3D interpolation (see Section 11.3) can smooth the edge, just as interpolation can smooth a movie. Remeshing a surface volume can also smooth it (see Section 13.5, Figure 13.23). However, if there are connections between features in the sections occurring at distances smaller than the section thickness, then interpolation will miss them.
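A minimal sketch of the rescaling step, assuming a stack stored as a TIFF with 100-nm lateral and 300-nm axial voxel spacing (example values): interpolating along z by the ratio of the two spacings yields approximately cubic voxels.

```python
# Sketch: interpolate a confocal stack along z so that the voxels become cubic.
# Voxel sizes and the file name are assumed example values.
import numpy as np
from skimage import io
from scipy.ndimage import zoom

xy_nm, z_nm = 100.0, 300.0                   # lateral and axial voxel size (nm)
stack = io.imread("confocal_stack.tif")      # shape: (z, y, x)

# Zoom only the z axis by the ratio of axial to lateral voxel size (here 3x),
# so that one voxel spans the same distance in x, y, and z.
cubic = zoom(stack.astype(float), (z_nm / xy_nm, 1.0, 1.0), order=1)
print(stack.shape, "->", cubic.shape)
```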
13.3 Structure-from-Motion Photogrammetry Reconstructs Three-Dimensional Surfaces Using Multiple Camera Views
Structure-from-motion algorithms can create 3D objects from different 2D views. Calculation of the 3D structure from motion of the camera uses motion parallax, whereby the position of objects in the field of view changes more when close to the camera and less when far from the camera. SIFT or SURF (see Section 11.4) identifies multiple scale-invariant features or keypoints of images of the same object taken from different distances and angles (Figure 13.10). Matching the keypoints shared by different images uses algorithms such as the Lucas-Kanade algorithm for motion stabilization (see Section 12.2). Following elimination of outlier keypoint matches that do not follow certain geometric constraints, a bundle adjustment algorithm uses the remaining matched keypoints to calculate the position and focal length of the camera for each image, as well as the relative positions of the matched keypoints. Multiview stereo matching (MVS) (Figure 13.11) relates the pixels in each view to each other and to the original matched keypoints, creating a dense cloud of points in 3D. Connecting these points with a polygonal mesh creates an isosurface 3D model of the object. Knowing the position of certain keypoints or the position of the camera and its focal length provides distance calibration or rectification. Several open-source programs provide the computation for these transformations. Blender can render a 3D view from Meshroom images saved as Wavefront (.obj, .mtl) files (Figure 13.12). Structure-from-motion software produces digital 3D photogrammetry, the ability to measure 3D objects from photographs. It is useful for archiving, analyzing, and quantitatively comparing species using both prepared and preserved
Figure 13.10 First steps for structure from motion. Matched keypoints on two views of an onion. The scale-invariant feature transform (SIFT) descriptors match. The numbered sequence is the set of next steps for constructing a point cloud for two cameras. Random sample consensus (RANSAC) is an iterative approach to modeling data that contains outliers. Images by L. Griffing.
specimens and living specimens. The challenge with living animals is that they might move. To address this, an array of many cameras can capture simultaneous images of the object from several different angles. However, high-quality reconstructions require 30 or more views and therefore 30 or more cameras (Figure 13.13). A stationary object (e.g., the onion in Figures 13.10–13.12) requires only one camera. Using camera shots from multiple points of view is especially powerful for 3D terrain models. Prior to this approach, stereo photogrammetry required a stable platform of a plane or helicopter at a fixed elevation taking pictures at known incidence angles and orientations, or expensive LiDAR instrumentation. Now, inexpensive drones can map terrain using structure from motion, which compares favorably with LiDAR (Figure 13.14). However, LiDAR has the advantage of being able to penetrate dense canopies (see Section 6.8). Recent LiDAR mapping of the terrain in Central America has given us a much broader view of the structures produced by the Mayan culture (see Figure 13.1A and B). Structure from motion can also track objects that have identifiable keypoints. The same transformations apply to a stationary camera and moving object, giving not only the new camera position with each view but also the new object position with each view. Some of the structure-from-motion algorithms provide open-source motion capture with consumer-grade cameras.
Figure 13.11 The multiview stereo matching from Meshroom of 37 different views of the onion, two of which are in Figure 13.12. The dense cloud of points in the center represents the onion. Image by L. Griffing.
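The sketch below walks through the first structure-from-motion steps for a two-camera case with OpenCV: SIFT keypoints, ratio-test matching, RANSAC estimation of the essential matrix, and triangulation of a sparse point cloud. The image files and camera matrix K are placeholders, and a production pipeline such as Meshroom adds bundle adjustment and multiview stereo on top of this.

```python
# Two-view structure-from-motion sketch with OpenCV. Image names and the
# camera matrix K are hypothetical placeholders.
import cv2
import numpy as np

img1 = cv2.imread("onion_view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("onion_view2.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[3000.0, 0, 2000.0], [0, 3000.0, 1500.0], [0, 0, 1]])  # assumed

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors and keep matches that pass Lowe's ratio test.
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# RANSAC rejects outlier matches while estimating the essential matrix.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# Triangulate the matched keypoints into a sparse 3D point cloud.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
cloud = (pts4d[:3] / pts4d[3]).T
print(cloud.shape[0], "3D points")
```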
Figure 13.12 Three-dimensional reconstructed onion in Blender after structure-from-motion processing in Meshroom. (A) Tilted side to show a similar view to the left panel in Figure 13.10. (B) Tilted side to show a similar view to the right panel in Figure 13.10. Images by L. Griffing.
Figure 13.13 A frog surrounded by digital cameras. The Beastcam ARRAY, consisting of 40 cameras, provides images for three-dimensional models of animals 7.5–25 cm in length. Photo by Christine Shepard. From Bot, J. and Irschick, D.J. 2019. Using 3D photogrammetry to create open-access models of live animals using open-source 2D and 3D software solutions. Ed. by J. Grayburn, Z. Lischer-Katz, K. Golubiewski-Davis, and V. Ikeshoji-Orlati. 3D/VR in the Academic Library: Emerging Practices and Trends. CLIR Reports, CLIR Publication No. 176. pp. 54–72.
Figure 13.14 Comparison of surface topography using structure from motion (SfM; A) and light detection and ranging (LiDAR; B). Although there are some differences, they are slight. Fonstad, M.A., et al. 2013 / With permission of John Wiley & Sons.
13.4 Reconstruction of Aligned Images in Fourier Space Produces Three-Dimensional Volumes or Surfaces
MRI acquires lateral (x-y) information in Fourier space. Signals produced by the x- and y-dimension magnetic gradients are spatial frequencies (see Section 15.10). Inverse Fourier transforms of these signals produce the image slice. By acquiring the z-dimensional information in Fourier space as well, the z-dimensional resolution of the system improves, but this process takes a long time (see Section 15.13). In this case, an inverse Fourier transform of the aligned images in 3D Fourier space can produce a 3D volume in real space. In TEM, 3D Fourier tomography reconstructs single high-resolution images of particles viewed from multiple unique angles (Figure 13.15). When particles such as protein complexes, ribosomes, or viruses settle as a droplet on an electron microscope grid, they assume random orientations. Negatively staining the particles with a heavy metal salt, such as uranyl acetate, produces a dark background, leaving the particles with negative contrast (see Figure 13.15A). Freezing a suspension of particles in vitreous ice and examining them with cryoelectron microscopy produces images of unstained particles with positive, but faint, contrast (see Figure 13.15B). Autocorrelation and K-means clustering analysis places different views of each particle into different classes representing different angles of orientation (see Section 11.13, Figure 11.31). The images are different 2D projections of a 3D object, so with enough views, back-projection of these 2D views can fill a 3D space, but the 2D views require alignment with each other. Finding the line shared by the Fourier transforms of these different 2D views achieves the alignment. This shared line is the common line of the Fourier transforms. It is the axis around which the 2D Fourier transforms pivot (see Section 19.13). The 3D structure of the 30S subunit of the ribosome at 5-Å (0.5-nm) resolution obtained using cryoelectron microscopy compares favorably with the 3D structure obtained by crystallography (Figure 13.16). Another technique in cryoelectron microscopy uses relatively thick preparations, either a semi-thick cryosection or a cryo-preparation of particles in vitreous ice (see Sections 19.13 and 19.14). It provides multiple views of the particle or cellular structure when rotated on a cryostage with a eucentric goniometer, a rotatable stage with a fixed, centered axis of rotation of a centered sample. The preparations are semi-thin (300–500 nm) sections in vitreous ice or plastic-embedded tissue following high-pressure freezing and freeze substitution. Registration of tilted images provides the information needed to produce a virtual tomographic section thickness of 10 nm or less, producing images like those shown in Figure 13.2A of the Golgi body of animal cells and Figure 13.17 of the trans-Golgi network and multivesicular body in plants. Filtered back-projection (see Section 6.6) in real space can generate tomographic slices (see Section 19.13). The final reconstruction emerges from the contoured manual outlines of the cell organelles (Figure 13.17B) used to create organelle surfaces when stacked in real space.
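A minimal sketch of the reconstruction principle, using a simulated k-space data set: the inverse 3D fast Fourier transform of the aligned Fourier-space data returns the real-space volume.

```python
# Sketch: recovering a real-space volume from 3D Fourier-space (k-space) data
# by an inverse FFT, as in MRI reconstruction. The k-space data here are
# simulated from a synthetic volume purely for illustration.
import numpy as np

# Synthetic volume: a bright cube inside a 64^3 field.
vol = np.zeros((64, 64, 64))
vol[24:40, 24:40, 24:40] = 1.0

# Forward transform stands in for acquisition of spatial frequencies.
kspace = np.fft.fftshift(np.fft.fftn(vol))

# Reconstruction: inverse 3D FFT of the (aligned) Fourier-space data.
recon = np.abs(np.fft.ifftn(np.fft.ifftshift(kspace)))
print("max reconstruction error:", np.max(np.abs(recon - vol)))
```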
Figure 13.15 Transmission electron microscopy of the 30S ribosomal subunit of Escherichia coli. (A) Negative stain preparation. Scale bar = 50 nm. From Lake, J.A. 1976. Ribosome structure determined by electron microscopy of Escherichia coli small subunits, large subunits, and monomeric ribosomes. Journal of Molecular Biology 105: 131–159. Used with permission. (B) CryoEM image of a vitreous suspension of 30S ribosomal particles. Scale bar = 50 nm. Inset shows Thon rings (see Section 19.8, Figure 19.28). From Jahagirdar, D., Jha, V., Basu, K., et al. 2020. Alternative conformations and motions adopted by 30S ribosomal subunits visualized by cryo-electron microscopy. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.21.001677. Used with permission. CC BY-NC 4.0.
Figure 13.16 (A) Crystallographic reconstruction of the 30S subunit of a ribosome. From Wimberly, B.T., Brodersen, D., Clemons, W., et al. 2000. Structure of the 30S ribosomal subunit. Nature 407: 327–339. Used with permission. (B) Three-dimensional image from cryoelectron tomography of the 30S ribosome. Both show helix 44 (h44) of the ribosomal RNA. From Jahagirdar, D., Jha, V., Basu, K., et al. 2020. Alternative conformations and motions adopted by 30S ribosomal subunits visualized by cryo-electron microscopy. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.21.001677. Used with permission.
Figure 13.17 Three-dimensional reconstruction of plant Golgi and multivesicular body membranes using transmission electron microscopy tomographic slices. (A) A single tomographic slice of plant cell cytoplasm. (B) Manually drawn contours follow the boundaries of the trans-Golgi network (green) and the multivesicular endosome (yellow). (C) Partial isosurface render superimposed on the tomographic slice. (D) Isosurface render of a more complete organelle after applying an IMOD (image modeling and display) mesh. Scale bars = 50 nm. From Otegui, M.S. 2020. Electron tomography and immunogold labeling of plant cells. Methods in Cell Biology 160. ISSN 0091-679X. https://doi.org/10.1016/bs.mcb.2020.06.005. Used with permission.
13.5 Surface Rendering Produces Isosurface Polygon Meshes Generated from Contoured Intensities
Surface rendering divides the volume up into cubic voxels and makes boundaries through the voxels if the values of voxel vertices in a certain region are below a designated threshold. This produces a surface of constant scalar value in a 3D field called an isosurface. These surfaces are polygons. They combine to make a polygon mesh. Because the vertices of the cubic voxel are used to contour the voxel with a mesh, the polygon meshes are sub-voxel, or within a voxel. Surface rendering renders the mesh. A common approach to deciding how to draw a surface by slicing a voxel with a contour is the marching cubes algorithm. It uses the symmetry of the cubic voxel to determine that there are 15 ways, or cases, used to slice a cubic voxel (Figure 13.18). Cases 3, 6, 7, 10, 11, and 12 have multiple surfaces where the sidedness of each polygon, either inside or outside the contoured volume, is ambiguous. Unambiguous complementary cases can substitute for them. Generating the 3D object from optical sections using marching cubes is possible with several different programs, including the 3D Viewer in ImageJ (Figure 13.8B). The 3D Viewer produces isosurface data sets that export to a standard
Figure 13.18 Marching cubes surface rendering voxel contouring cases. Using symmetry, 256 cases reduce to 15 cases. The isosurface values are less than the values of the dark vertices. Adapted from Schroeder, W., Martin, K., and Lorensen, B. 2018. The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics. Fourth Edition. Kitware Inc., Clifton Park, NY.
set of file formats, such as the .obj/.mtl files (Wavefront open-source file format), which import into Blender. A note of caution, however: the thresholding command in the 3D Viewer that produces an isosurface object from a microscopy, CT, or MRI scan may produce multiple unconnected objects that are segmented together and identified as one object when exported to Blender. There is a Blender command to unlink these unconnected isosurfaces (Edit Mode, Mesh>Separate). The 3D ImageJ Suite (see Section 13.12) also provides segmentation of unconnected 3D objects, as well as iterative segmentation to optimize thresholding of separate objects. These can all be isosurface rendered in the 3D Viewer and exported to Blender as multiple, separate objects. After contouring the voxels, the next step is to render the resulting mesh. Isosurface rendering renders geometric primitives, such as polygons or, more specifically, triangles or triangle strips. In surface rendering, the computer converts the geometric primitive into a raster image, a process called rasterization or scan conversion (Figure 13.19). Clipping the polygon occurs at the image edge or by the front- and back-clipping planes of the camera. After positioning, rotating, or scaling the polygon with the appropriate affine transformation, projection into the image plane starts the scan-line processing. Interpolation of the color data, transparency data, and z-buffer data (depth data along the direction of propagation from the camera) of the vertices of the polygon along its sides generates the projected polygon. The rows of pixels between the sides of the polygon are spans, which have interpolated values based on the side values. The process continues span by span to completion.
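A minimal sketch of this contour-and-export pipeline outside ImageJ, using the marching cubes implementation in scikit-image and writing a Wavefront .obj that Blender can import; the stack name, isosurface level, and voxel spacing are assumed example values.

```python
# Sketch: contour a volume into an isosurface mesh with marching cubes
# (scikit-image) and write it out as a Wavefront .obj that Blender can import.
# The file name, isosurface level, and voxel spacing are assumed values.
import numpy as np
from skimage import io, measure

volume = io.imread("confocal_stack.tif").astype(float)      # (z, y, x)
verts, faces, normals, values = measure.marching_cubes(
    volume, level=50.0, spacing=(0.3, 0.1, 0.1))            # z, y, x spacing

with open("isosurface.obj", "w") as f:
    for v in verts:
        f.write(f"v {v[2]} {v[1]} {v[0]}\n")                 # write x y z
    for tri in faces + 1:                                    # .obj is 1-based
        f.write(f"f {tri[0]} {tri[1]} {tri[2]}\n")
```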
Figure 13.19 Rasterization of a geometric primitive. Interpolation of values between the vertices of the polygon, defining the edges of the polygon, is followed by interpolation of values across the span of pixels across the polygon. Adapted from Schroeder, W., Martin, K., and Lorensen, B. 2018. The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics. Fourth Edition. Kitware Inc., Clifton Park, NY. Used with permission.
Figure 13.20 Color and reflection off a sphere. (A) Red sphere with specular reflection of a white spotlight. (B) Red sphere with diffuse reflection of a white spotlight. (C) White sphere with specular reflection of a red spotlight. Diagram by L. Griffing.
Figure 13.21 Flat vs. smooth face shading in Blender. (A) Flat shading of a mesh sphere. (B) Smooth shading of a mesh sphere. Diagram by L. Griffing.
The lighting of the surface contributes to its color and texture. Objects that reflect light back from shiny smooth surfaces have specular reflection (Figure 13.20A). Objects that reflect light back like a ball covered in fabric show diffuse reflection (Figure 13.20B). Diffuse and specular reflection refer to light reflected from a spot source. Reflected light also gives the object color and contrast (Figure 13.20C). Besides object color and its reflections from a colored spotlight, ambient light reflections and color also influence its appearance. The reflection of ambient light is often the same as the diffuse reflection. The lighting and shading of the surfaces give them a 3D effect. In the simplest form of rendering, the angle of the surface normal (the vector perpendicular to the surface plane) relative to the lighting vectors determines lighting. If the surface normal points away from the light source, the object is dark. If it points toward the light source, it is bright. This is flat shading (Figure 13.21A). A smoother rendering style uses vectors perpendicular to the vertex of each polygon in the surface and interpolates color values along and between the sides of the polygon. This is Gouraud shading, or smooth face shading in Blender (Figure 13.21B). Phong shading combines diffuse, specular, and ambient light separately for all points on a surface and is available through the POV-ray 3.7 add-on in Blender. Blender has a specific workspace for shading that uses nodes to assign values to the surface and interior of the mesh (Figure 13.22A, lower section). Vertex painting, part of the shading workspace in Blender (Figure 13.22B), can color different parts of a reconstructed object without texture mapping or texture painting (see Section 13.6). This operation assigns different vertices different colors with a paint brush. Shading then gives different colors to different faces. This is very useful for identification and separation of individual bones in micro-CT scans of organisms (Figure 13.22C), a feature that aids in species identification. Transparency, or its complement, opacity, is also part of the coloring scheme because the colors of objects behind a semi-transparent polygon will alter its color. When the opacity is 1, the object is completely opaque. When the opacity is 0, it is completely transparent. Opacity values are alpha values in computer graphics (see Section 13.9 for opacity transfer functions), and their calculation for a polygon, like color values, is by interpolation of the polygon edge values. Stacking layers so that their colors and opacity change is alpha compositing. Subdivision (or multiresolution) algorithms can add polygons to make images smoother; adding more polygons to a polygon-generated sphere makes it look smoother. Such smoothing modifiers are available as subdivision surface modifiers in Blender. Also available in Blender are modifiers that reduce the number of polygons, the decimation algorithms. These remove adjacent polygons that show little deformation (the normal vector is about the same). This is particularly good when surface rendering adds polygons between optical sections or slices, producing an aliased, jaggy appearance (see Figure 13.9), such as in cases in which the z-dimension resolution is less than the x-y dimension resolution (micro-CT scanning and standard confocal microscopy). In these cases, decimation smooths the structure but obscures detail.
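A small numerical sketch of the flat-shading rule just described: the brightness of a face is the clamped cosine of the angle between its surface normal and the light direction. Gouraud shading applies the same calculation per vertex and then interpolates across each span.

```python
# Sketch of the flat-shading rule: face brightness is the cosine of the angle
# between the surface normal and the light direction, clamped at zero when the
# normal points away from the light.
import numpy as np

def lambert(normal, light_dir):
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    return max(0.0, float(np.dot(n, l)))       # 0 = dark, 1 = fully lit

light = np.array([0.0, 0.0, 1.0])              # light shining down the z-axis
print(lambert(np.array([0.0, 0.0, 1.0]), light))   # faces the light -> 1.0
print(lambert(np.array([1.0, 0.0, 1.0]), light))   # tilted 45 degrees -> ~0.71
print(lambert(np.array([0.0, 0.0, -1.0]), light))  # faces away -> 0.0
```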
Figure 13.22 Vertex painting of micro computed tomography (CT) reconstructions. (A) Blender interface for vertex shading; the bottom panel shows the connected nodes for instances of vertex painting. The skull is that of the eastern clingfish, Aspasmogaster costata. https://www.morphosource.org/concern/media/000030754. (B) Rendered image from A showing part of one bone vertex-painted in orange and the rest vertex-painted in brown. Images by L. Griffing. (C) Dorsal view of a micro-CT reconstructed skull and anterior body of the blind catfish, Xyliphius sofiae. Scale bar = 2 mm. Photo courtesy of Tiago Carvalho. Used with permission.
Figure 13.23 Quad re-meshing in Blender. (A) Portion of skull of a clingfish reconstruction from micro computed tomography data. The skull data set has about 1.5 million triangular faces. (B) Quad remeshing showing the same region in Blender 2.92. This data set now has about 175,000 quadrilateral faces. Data from Morphosource.org: https://www.morphosource.org/concern/media/000030754. Image by L. Griffing.
Remesh algorithms, also called retopo (retopologizing) algorithms, rebuild the geometry to give a more uniform topology and can reduce mesh sizes (polygon numbers) considerably. They are great for structure-from-motion reconstructions or CT stacks (Figure 13.23). Reducing the number of polygons also makes vertex painting easier. Finally, the projection of 3D objects uses intensity depth cueing. Objects farther from the observer appear less bright than nearer objects due to the atmosphere. The attenuation of intensity projected in the final image is a function of the distance from the camera.
13.6 Texture Maps of Object Isosurfaces Are Images or Movies
Texture mapping superimposes an image on the surface of the polygon. The image is the texture map. The texture coordinates define where the image or image piece resides on the polygon or mesh object. A texture map with one value assigned to each coordinate is an intensity texture map. Assigning an RGB (red, green, blue) triplet value to each coordinate produces a colored texture map. An example is the use of diffusion tensor images to texture map 3D brain reconstructions of MRI scans (see Figure 13.1C). In addition, each pixel in the texture map can have an opacity or alpha value. This is useful for modeling complicated features such as leaves on a tree. Where the opacity is 0, there are no leaves; where it is 1, there are leaves. In this case, there would be no need to model the leaves with a polygon mesh. Movies can also texture map surfaces, animating, for example, the leaves on a tree. Procedural texture mapping calls a procedure to recalculate the texture, also animating the texture. Realistic texture mapping involves UV wrapping. UV stands for the coordinates of the surface, instead of the XYZ coordinates of the model, flattened out in 2D. UV wraps usually correspond to different contours of a 3D object, identified by different convex hulls (see Section 11.4, Figure 11.4). Flattened out, they look like clothing patterns. In Blender, there is an unwrap function that cuts along the edges of the convex hull and flattens out the image (Figure 13.24). These UV wraps
Figure 13.24 The edges of the convex hulls of the turtle, marked in red, are the edges of UV wraps for the different parts of the turtle body. Bot, J., et al. 2019 / Council on Library and Information Resources / CC BY SA 4.0.
can originate from the UV wraps on structure-from-motion meshes made in Meshroom, as illustrated in Figure 13.12. However, auto-generated UV maps are typically small and fragmented, each convex hull being a bump on the object. The Smart UV Project tool in the UV editor in Blender can re-select convex hulls using different angles to generate larger ones. Also, manual selection of groups of hulls in edit mode is possible using the UV project from view option. A UV wrapper is simply an image. After unwrapping the 3D object, a bake process translates mesh data, such as surface normals, to image data. There is a texture painting option in Blender that provides editing from within the 3D Viewport. However, most image painting programs can edit UV wrappers.
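A minimal sketch of unwrapping from a script, assuming the Blender 2.9+ bpy API with a mesh object selected; note that the angle_limit convention (radians versus degrees) has changed between Blender versions.

```python
# Sketch: unwrap the active mesh with Blender's Smart UV Project from Python
# (assumes the Blender 2.9+ bpy API and that a mesh object is active).
import bpy

obj = bpy.context.active_object
bpy.ops.object.mode_set(mode='EDIT')
bpy.ops.mesh.select_all(action='SELECT')
# Larger angle limits merge more faces into each UV island (fewer, larger hulls).
bpy.ops.uv.smart_project(angle_limit=1.15)      # radians, roughly 66 degrees
bpy.ops.object.mode_set(mode='OBJECT')
print("UV layers:", [uv.name for uv in obj.data.uv_layers])
```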
13.7 Ray Tracing Follows a Ray of Light Backward from the Eye or Camera to Its Source
Ray tracing (or light path tracing) calculates the light reflected or transmitted by all of the visible objects in the scene (Figure 13.25). The key word here is visible. If there is an occluded object or region, it does not enter into the calculations. There are four ray types: direct from camera, reflection, transmission, and shadow. Setting the maximum number of bounces a ray can experience determines the realism of the scene. With a high number of bounces, the image is photorealistic, more so than with isosurface rendering. Ray tracing is a form of image-order volume rendering, but it does not necessarily proceed in raster order. It can build up the image, for example, from the center outward. The computer technology for ray tracing is a spin-off of entertainment technology. Ray tracing is a common approach for reconstructing 3D volumes from CT scans and MRI. Several ray-tracing algorithms are available in FIJI (ImageJ) (Figure 13.26), as well as in Blender using the Cycles render engine. The number of calculations required for any given ray can be large if there are large numbers of bounces, so ray-traced render engines speed up the process using CUDA-enabled graphics processing units.
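A toy sketch of the first ray-tracing step, written in plain Python/NumPy: testing whether a camera ray intersects a sphere and returning the distance to the first hit. A full renderer would spawn reflection, transmission, and shadow rays from that hit point up to the bounce limit.

```python
# Toy ray-tracing step: test whether a camera ray hits a sphere and return the
# distance along the ray to the first intersection.
import numpy as np

def ray_sphere(origin, direction, center, radius):
    d = direction / np.linalg.norm(direction)   # unit ray direction
    oc = origin - center
    b = 2.0 * np.dot(d, oc)
    c = np.dot(oc, oc) - radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0:
        return None                              # ray misses the sphere
    t = (-b - np.sqrt(disc)) / 2.0
    return t if t > 0 else None                  # distance to the hit point

camera = np.array([0.0, 0.0, 0.0])
ray = np.array([0.0, 0.0, -1.0])                 # looking down -z
print(ray_sphere(camera, ray, center=np.array([0.0, 0.0, -5.0]), radius=1.0))
# Prints 4.0: the ray first meets the sphere 4 units from the camera.
```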
13.8 Ray Tracing Shows the Object Based on Internal Intensities or Nearness to the Camera
Plotting the intensity sampling along a line through a volume reveals maxima and average values (see Figure 13.26) and, using these values, creates different image projections. The simple ray-tracing interface in ImageJ (see Figure 13.26C) and the 3D Volume Viewer plugin in FIJI have options for an average or mean value projection (see Figure 13.26D), a maximum intensity projection (MIP) (also called brightest point; see Figure 13.26E), or a nearest point
Figure 13.25 Ray tracing a scene. All rays travel in straight lines. Camera rays come from the camera. Shadow rays are transparent and come from a source. Reflection rays reflect off surfaces. Transmission rays transmit through a surface. Rays that do not trace back to the camera do not contribute to the scene. Light path nodes in Blender identify the ray types. Diagram by L. Griffing.
Figure 13.26 Ray tracing for three-dimensional projection in ImageJ. (A) Optical section 30 of the array of optical sections in Figure 13.8 with a magenta “ray” traversing it. (B) Gray scale values along the magenta ray in (A), with the brightest point, nearest point, and mean value marked. (C) Three-dimensional projection dialog box in ImageJ. (D) Mean value projection. (E) Brightest point or maximum intensity projection. (F) Nearest point projection. Images and diagram by L. Griffing.
projection (see Figure 13.26F). The MIP is the most common projection. Its quick calculation is from the maximum of the derivative of the ray-cast line. The simplest way to shoot a ray through a multi-voxel volume is as a line traversing the voxels in the volume (Figure 13.27). Using one of a variety of interpolation methods (see Section 11.3) to calculate the light intensity and color within a voxel, the line samples the intersected constant-value voxels at a certain step size. If the step size is too great, the volume has a banded appearance. Smaller step sizes provide better rendering. Another approach is to consider the line as a set of connected discrete voxels in the volume. Connections can occur through faces as a 6 value-connected line; through edges or faces as an 18 value-connected line; or through edges, faces, or points as a 26 value-connected line (see Figure 13.27). A set of 6-connected voxels samples more of the volume. Selecting voxel values rather than traversal ray values may obtain better accuracy, identifying, for example, the exact location where a ray first encounters a certain value.
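A minimal NumPy sketch of the three projections in the ImageJ dialog, computed along the z axis of a stack; the file name and nearest-point threshold are assumed example values.

```python
# Sketch of the three ray projections described above, computed along the z
# axis of a stack: mean value, brightest point (MIP), and nearest point (first
# voxel along the ray above a threshold). Threshold and file name are assumed.
import numpy as np
from skimage import io

stack = io.imread("confocal_stack.tif").astype(float)   # shape: (z, y, x)

mean_proj = stack.mean(axis=0)                  # average value projection
mip = stack.max(axis=0)                         # maximum intensity projection

# Nearest-point projection: value of the first voxel along each ray (from the
# top of the stack) whose intensity exceeds the threshold.
threshold = 40.0
above = stack > threshold
first = np.argmax(above, axis=0)                # index of first True per ray
nearest = np.take_along_axis(stack, first[None, ...], axis=0)[0]
nearest[~above.any(axis=0)] = 0                 # rays that never cross threshold
```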
13.9 Transfer Functions Discriminate Objects in Ray-Traced Three Dimensions
Different opacities or alpha values, colors, or intensities of objects allow visualization of separate overlapping objects. Changing these values uses transfer functions. (Chapter 8 discusses optical transfer functions [OTFs] and modulation transfer functions [MTFs] as measures of image fidelity.) An opacity transfer function or an alpha transfer function in the 3D Viewer in ImageJ can make objects more visible by increasing their opacity. For multiple channel images, a brightness
Figure 13.27 Ray traversal of a volume. (A) Traversal by a ray sampled at uniform distances (dots). (B) Sampling a ray with faceconnected voxels. (C) Sampling a ray with edge-connected voxels. (D) Sampling a ray with vertex-connected voxels. Diagram by L. Griffing.
Figure 13.28 Ray tracing three-dimensional projections of the endoplasmic reticulum network of a tobacco cell filament from a suspension culture. (A) Opacity gradient transfer function set to maximum where the gradient is highest and colored green. (B) The same data set with a new opacity gradient transfer function at 30% and colored orange, while the interior shows a maximum intensity projection, with different intensities showing colors according to the color chart. The lines between (A) and (B) identify the same regions of the cells in the two views. Scale bar = 10 µm. Images by L. Griffing.
transfer function in, for example, the green channel brightens that channel in relation to the red. This serves the same function as the intensity transfer graph (see Section 2.8), only in 3D. Thresholding objects based on intensity, color, or material transfer functions can visually segment 3D volumes (see Section 13.12). There are atlases of 3D brain anatomy in which specific anatomical features reside at a position within the brain volume with a certain probability. Probabilistic in vivo atlases of brain MRI data can specify transfer functions to these probable positions, identifying these specific features by coloring them or increasing their opacity. These can aid identification of regions that show increased blood flow during functional MRI (see Section 15.17) or MRI angiography (see Section 15.15). Opacity gradient transfer functions, which assign a high opacity to regions where the intensity gradient is greatest, can produce surfaces in ray-cast volumes. These are particularly useful for 3D rendering of membranes in cells using ray casting (Figure 13.28). VTK has algorithms for opacity gradient transfer functions, but they are not currently implemented by default in programs that provide some of VTK, such as Icy.
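A minimal sketch of the idea behind an opacity gradient transfer function: compute the gradient magnitude of the volume and map it to alpha so that steep-gradient (membrane-like) voxels become opaque. The file name, smoothing sigma, and clipping range are assumed example values.

```python
# Sketch of an opacity gradient transfer function: voxels where the intensity
# gradient is steep (e.g., membranes) get high opacity (alpha); smooth regions
# stay transparent. The stack name is a hypothetical placeholder.
import numpy as np
from skimage import io
from scipy.ndimage import gaussian_gradient_magnitude

stack = io.imread("er_network.tif").astype(float)

grad = gaussian_gradient_magnitude(stack, sigma=1.0)   # edge strength per voxel
alpha = grad / grad.max()                              # rescale to 0..1 opacity
# Optionally keep only the steepest gradients (surface-like voxels) opaque.
alpha = np.clip((alpha - 0.3) / 0.7, 0.0, 1.0)
```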
13.10 Four Dimensions, a Time Series of Three-Dimensional Volumes, Can Use Either Ray-Traced or Isosurface Rendering
The 3D Viewer in FIJI provides 4D visualization, projections of the entire volume changing over time, as in the five-dimensional (5D) volume of the mitosis sample in ImageJ (File>Open Samples>Mitosis 5D Stack). As mentioned in Section 13.2, a limitation to this approach is that the capture speed of the entire volume should be much faster than the motions within that volume, so that internal movement does not appreciably occur during acquisition from the first to the last optical section. For light microscopy, spinning-disk confocal microscopy or laser-scanning confocal microscopy with resonant scanning (see Section 17.8) can acquire volumes at 10–30 frames per second (Figure 13.29). Light-sheet imaging (see Section 18.4) can capture volumes even faster (60–100 frames per second). It would take light sheet microscopy about half a second to acquire a 10-µm volume requiring 30 optical sections while exposing the sample to a lower overall photon dose. For more on time-limited acquisition, see Sections 12.1 and 18.2. For surface-rendered objects, mesh interpolation can smooth the 4D animation by interpolating volumes between the captured volumes, making playback smooth, without judder (see Section 12.2). This is like keyframing, only with mesh transformation rather than frame transformation. VTK has a polydata interpolation algorithm (see Figure 13.29). For ray-traced objects, interpolating changes in intensity, color, and opacity provides intermediate volumes between the captured volumes. Figure 13.29 shows both interpolated ray-traced volumes and interpolated isosurface meshes (insets). In Blender, both the Eevee and the Cycles rendering engines handle animation. Object animation uses keyframes (see Section 12.2). Rigging organism models with an armature of bones or joints (Figure 13.30) produces walk or swim cycles that can reproduce realistic motions. Furthermore, cameras can track along certain paths in the animated 4D data set. Shape keys in Blender animate the meshes with interpolation but, unlike VTK, require the same number of polygons in the starting and ending meshes. The strength of the Blender animation tools is that they are programmable and can handle multiple objects with movement constraints (e.g., the parent–child relationship in Figure 13.7) and simulations of the physical environment.
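A minimal sketch of keyframed object animation from Blender Python (bpy, 2.8 or later); the frame range and locations are illustrative. Blender interpolates the in-between frames, much as mesh or volume interpolation fills in between captured volumes.

```python
# Sketch: keyframing object animation from Blender Python (bpy, 2.8+ API).
# Frame range and locations are illustrative values only.
import bpy

obj = bpy.context.active_object
scene = bpy.context.scene
scene.frame_start, scene.frame_end = 1, 48

obj.location = (0.0, 0.0, 0.0)
obj.keyframe_insert(data_path="location", frame=1)      # first keyframe

obj.location = (5.0, 0.0, 2.0)
obj.keyframe_insert(data_path="location", frame=48)     # last keyframe
# Blender generates the interpolated positions for frames 2-47.
```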
Figure 13.29 Interpolation of 10 volumes between two volumes taken 8.57 seconds apart using confocal microscopy with a resonant scanner (see Section 18.11). The images show volumes of endoplasmic reticulum labeled with luminal green fluorescent protein in bright yellow 2 (BY2) tobacco suspension culture cells. Each of the two volumes has 60 optical sections, 0.14 seconds and 100 nm per section. Each image has a low-magnification reconstruction using a ray-cast opacity gradient transfer function to show the membranes (cyan) against a dark blue background and an inset of a region, 5 µm square (white square in the 0 second panel), showing interpolated isosurface mesh (inset) outlining the membranes. Two tubules fuse (arrowhead in inset) at the interpolated 5.45-s time in a manner consistent with natural fusion events. Algorithms in Visualization Toolkit generated the interpolated volumes. Images by L. Griffing.
13.11 Volumes Rendered with Splats and Texture Maps Provide Realistic Object-Ordered Reconstructions
Ray casting gives a more photorealistic image than surface rendering. However, ray-cast, high-resolving-power images require many rays and many ray bounces, increasing the computation time. An object-ordered approach that produces high-quality photorealistic images with splats and texture maps uses less computation. Consider a 3D object that is a series of planes or sheets perpendicular to the viewing plane normal, each having a texture map (see Figure 13.31). Each sheet is an object, and reading out the voxels in each sheet generates an object-rendered image. The voxels in the sheet project their values onto the rendered viewing screen one footprint, or splat, at a time. A kernel (e.g., a Gaussian kernel, or array of
Figure 13.30 An armature attached to vertices in the mesh three-dimensional model of a sea turtle controls mesh deformation to provide motion of the swim cycle. From Bot, J. and Irschick, D.J. 2019. Using 3D photogrammetry to create open-access models of live animals using open-source 2D and 3D software solutions. Ed. by J. Grayburn, Z. Lischer-Katz, K. Golubiewski-Davis, and V. Ikeshoji-Orlati. 3D/VR in the Academic Library: Emerging Practices and Trends. CLIR Reports, CLIR Publication No. 176. pp. 54–72.
values of a specified radius; see Section 10.4, Figure 10.13) surrounds each voxel data sample. Interpolation algorithms produce the sheet values from the texture map values. Integrating the kernel along the viewing direction produces a 2D splat table. The splat is the projected contribution of the data sample blended (alpha composited) onto the rendered viewing screen. A main advantage to using texture mapping of spaced planes in the sample is that it can use the memory and speed of processors in a GPU graphics card for rendering. A simple example of the use of splat-based texture maps is the rendering of the physical environment in computer games. But the use of these for scientific imaging is much more profound. Photorealistic rendering of objects using splats can also take advantage of transfer functions, just like ray tracing. After establishing transfer functions of the splat-based texture maps approximating the contours of an object, minimization functions can further specify contours independently of the user. However, the user-identified contour needs to be close to the optimal one. These are active contours, which can also identify clusters of objects (see Section 11.11). In MRI, these contours and shapes segment regions of suspected pathology, while in light microscopy, they can identify nuclear contours, even if the cell is undergoing division. Active contouring is a form of 3D segmentation, part of the analysis of volumes.
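A toy NumPy sketch of splatting: each projected data sample adds a Gaussian footprint to the screen buffer, and the footprints are summed as a simple stand-in for alpha compositing. The sample positions, values, and kernel radius are made up for illustration.

```python
# Toy splatting sketch: each data sample projects a Gaussian footprint (splat)
# onto the screen buffer, and the footprints are summed (a simple stand-in for
# alpha compositing). Sample positions and values are illustrative only.
import numpy as np

screen = np.zeros((128, 128))
yy, xx = np.mgrid[0:128, 0:128]

def splat(buffer, y, x, value, radius=3.0):
    # Add a Gaussian kernel footprint centered on the projected sample.
    buffer += value * np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * radius ** 2))

# Project a handful of voxel samples (already transformed to screen y, x).
samples = [(40, 40, 1.0), (42, 44, 0.8), (80, 90, 0.5)]
for y, x, v in samples:
    splat(screen, y, x, v)
```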
Figure 13.31 Volume rendering with splats. (A) A kernel of defined radius, in this case a Gaussian kernel, operates on the vertices of a group of voxels. The texture maps of the voxels are sheets that, when operated on by the kernel, project a footprint or splat on the viewing screen. The region of the kernel between sheets is a slab. The slabs explicitly make the splats. If the sheets align with the voxel faces, then two-dimensional interpolation works for the view. (B) If the sheets normal to the viewing direction do not align with the voxel faces, three-dimensional (3D) interpolation (e.g., trilinear interpolation) produces the data samples operated on by the kernel. Each sheet reads out in an object-ordered sequence. The sheet buffers combine separately, each with the compositing buffer using a transfer function, such as the maximum intensity projection or opacity transfer function, to make the final splats that are the projected 3D screen image. Diagram by L. Griffing.
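The footprint table and compositing steps lend themselves to a short sketch. The NumPy code below is only an illustration of the idea, not the VTK or GPU texture-map implementation described in the text; the kernel radius, the handful of voxel samples, and the orthographic screen geometry are all invented for clarity.

```python
import numpy as np

def footprint_table(radius=3, sigma=1.0):
    """Integrate a 3D Gaussian kernel along the view axis to get the 2D splat footprint."""
    ax = np.arange(-radius, radius + 1)
    x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
    kernel = np.exp(-(x**2 + y**2 + z**2) / (2 * sigma**2))
    footprint = kernel.sum(axis=2)          # line integral along the viewing (z) direction
    return footprint / footprint.max()      # normalize so the kernel center contributes fully

def render_splats(samples, shape=(64, 64), radius=3):
    """Front-to-back alpha compositing of (row, col, depth, intensity, opacity) samples."""
    fp = footprint_table(radius)
    color = np.zeros(shape)
    alpha = np.zeros(shape)
    for r, c, depth, value, a in sorted(samples, key=lambda s: s[2]):  # nearest sample first
        rows = slice(r - radius, r + radius + 1)
        cols = slice(c - radius, c + radius + 1)
        splat_alpha = fp * a
        # the "over" operator: deeper splats only fill what is still transparent
        color[rows, cols] += (1 - alpha[rows, cols]) * splat_alpha * value
        alpha[rows, cols] += (1 - alpha[rows, cols]) * splat_alpha
    return color

# three voxel samples: (row, col, depth, intensity, opacity)
screen = render_splats([(20, 20, 0.0, 1.0, 0.4), (22, 21, 1.0, 0.6, 0.4), (40, 30, 0.5, 0.8, 0.7)])
```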
13.12 Analysis of Three-Dimensional Volumes Uses the Same Approaches as Two-Dimensional Area Analysis But Includes Voxel Adjacency and Connectivity As with other analysis tools, this discussion will focus on available plugins to ImageJ in FIJI. The 3D ImageJ Suite provides 3D segmentation tools in FIJI, including, for example, the 3D object counter. The 3D ImageJ Suite requires the Java3D library or the Imagescience library installed with FIJI. The underlying principles behind segmentation of 3D objects are, first, grouping voxels based on connectivity and, then, splitting those groups based on intensity or other correlated features. Connectivity can vary depending on the voxel neighborhood. Voxels connect to others in three ways (see Figure 13.27). Connectivity and the shape of the structuring element (or seed) form the basis for region growing (dilation) and shrinking (erosion) and other morphometric operations, analogous to 2D morphometrics (see Section 7.7). These operations are available with the 3D mathematical morphology tools in FIJI. The 3D mereotopology tools in FIJI provide analysis of the relationships between 3D objects based on parts (mereology) and boundaries. Digital mereotopology has functions that are equivalent to mathematical morphology and uses mathematical morphology to identify some cases of adjacency, such as the neighborhood connection, whereby one object is one dilation operation from another (Figure 13.32). Disconnected objects are more than one dilation operation from each other. The mereotopology tools distinguish partial overlap from external connections and when some internal object touches the edge, making it a tangential proper part. When considering unconnected, but touching or overlapping, 3D structures, it is important to distinguish between shape and topology. Shape refers to the object properties that change by deformation but are independent of rotation, translation, or scaling (see Section 11.4). Topology, on the other hand, doesn't change with deformation. An object with a hole in it, for example, is different topologically from an object without one. No matter how the holey object is deformed, it still has a hole. In 3D, objects have tunnels instead of holes. Enclosed spaces are cavities. Volume estimation is a calculation of the number of contoured voxels and their volume. The 3D Object Counter calculates both the volume and gives the number of voxels in each object. Likewise, it provides a surface area measurement of the selected objects and the number of voxels that the surfaces occupy (Figure 13.33).
Figure 13.32 Mereotopological classification of the three-dimensional relationship between objects. Note that disconnected objects remain separate after one dilation operation (see Section 7.8). Connected objects connect on their surface, externally connect, partially overlap, or occupy equal spaces, or, if occupying the same space, connect to the edge as a tangential proper part or don't connect to an edge as a non-tangential proper part. Adapted from Landini G et al. 2019.
Figure 13.33 Three-dimensional (3D) object counter in ImageJ. (A) Red channel from Organ of Corti sample image. (B) Nuclei in red channel, median filter. (B) Segmented objects after application of 3D object counter are different colors and numbered. (C) Example output from 3D object counter for red channel. (D) Nerve fibers in the green channel (median filter used to suppress noise) of the Organ of Corti sample image. (E) Segmented objects. Blue fibers are one object. (F) Example output for green channel. The values save as a .csv file. (G) Options for measurement in the 3D object counter. Figure by L. Griffing.
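The same two steps, grouping voxels by 3D connectivity and then measuring the groups, can be sketched with SciPy instead of the 3D ImageJ Suite. The synthetic stack, the threshold, and the voxel calibration below are assumptions made only for illustration.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
stack = ndimage.gaussian_filter(rng.random((40, 128, 128)), sigma=3)  # stand-in for a z-stack
binary = stack > stack.mean() + stack.std()                           # simple intensity threshold

# 6-, 18-, or 26-connected voxel neighborhoods correspond to connectivity = 1, 2, or 3
structure = ndimage.generate_binary_structure(rank=3, connectivity=1)
labels, n_objects = ndimage.label(binary, structure=structure)

voxel_volume = 0.2 * 0.1 * 0.1                        # assumed z, y, x calibration, µm^3 per voxel
voxels_per_object = np.bincount(labels.ravel())[1:]   # voxels in each object (skip background, label 0)
volumes = voxels_per_object * voxel_volume
print(f"{n_objects} objects; largest = {volumes.max():.2f} µm^3")
```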
Some 3D measures are direct extensions of 2D analysis. The minimum enclosing rectangle in 2D (see Section 7.10, Figure 7.27) extends to the bounding box of a 3D object. The bounding box option in the 3D object counter (see Figure 13.33G) is the difference between the minimum and maximum coordinates in the x, y, and z directions of an object, just like the bounding rectangle is the difference between the maximum and minimum coordinates in the x and y directions. Likewise, the center of mass and the centroid (same shape with constant mass per volume) in three dimensions extend the 2D approach, with each coordinate of the center of mass determined by the mass-weighted sum of voxel positions in that direction divided by the total mass. Besides the 3D object counter, these measurements are available in the ROI (region of interest) Manager 3D measurement options (Figure 13.34). The advantage of this tool is that it allows choosing specific single segmented objects for analysis. Several 3D plugins produce Euclidean distance maps (see Section 7.8) from the surface or centroid of one object to the surface or centroid of another. The one implemented in the ROI Manager 3D (Figure 13.35) measures center-to-center
Figure 13.34 Measure three-dimensional (3D) in 3D ROI (region of interest) Manager. (A) FIJI Menu bar. (B) Green channel of an organ of Corti sample image, median filter. (C) Segmented objects after 3D segmentation using the ROI Manager. To enter the objects into the ROI Manager, click the Add Image option. Clicking Add Image also adds objects from images segmented by the 3D object counter. The object intensities reflect their object number and lower intensities are hard to see. Therefore, this image uses the Glasbey on dark lookup table. (D) The ROI Manager 3D dialog box. Selecting individual objects in the list (multiple objects selected with Ctrl-click) produces a subset of the objects for measurement. (E) 3D measurement dialog box from four different objects selected in ROI Manager 3D. The results save as a .csv file. (F) Measurement options for measuring objects using the Measure 3D command in the ROI Manager 3D dialog box. Note that the bounding box option combines the bounding boxes in 2D in (C). Figure by L. Griffing.
Figure 13.35 Measuring distance and intensity in three-dimensional (3D) ROI (region of interest) Manager. (A) FIJI Menu bar. (B) Green channel of an organ of Corti sample image, median filter. (C) Two segmented objects after 3D segmentation and image adding using the ROI Manager. (D) The ROI Manager 3D dialog box. Ctrl-clicking on two objects, one with a value 1 and one with a value 41, selects them. (E) Clicking on the Distance button in the ROI Manager produces distances between the centers and borders of the two objects. (F) Clicking on the Quantif 3D button in the ROI Manager 3D dialog box produces measures of the min and max intensities of the two objects. (G) By clicking on the 3D Viewer button in the ROI Manager 3D dialog box, the objects appear separately in the 3D Viewer panel. (H) Log file showing record of number of objects and threshold values. Figure by L. Griffing.
(cen-cen) distances of two chosen objects, or the center-to-border (cen-bor) and border-to-border (bor-bor) distances. The ROI Manager 3D also provides views of the chosen objects in the 3D viewer as surface renderings (see Figure 13.35G). Coloring the individual objects separately distinguishes them and provides a way to bring complex multiple objects into other programs such as Blender using a single File>Export command. Probably the most basic element of 3D analysis is being able to provide scale bars for the objects. Because they are 3D, the scale bars need to be in 3D as well, as scale cubes, spheres, or 3D axes (see Figure 13.8B). There are tools in the 3D ImageJ suite to draw 3D lines and objects of known length, width, or height (Plugins>3D>3D Draw Shape). The measurement tool in Blender (see Figure 13.3) provides manual measurements of objects using line rulers as well as for scale objects. A useful recent feature of Blender, MeasureIt, provides vertex-to-vertex measurement and object-to-object Euclidean distance, either to vertices or object origins. As with most image processing and analysis tasks, recording the process of image creation and analysis is necessary. Timed 4D dynamics generates a huge amount of data that need processing (e.g., contrast enhancement, alignment, deconvolution, and segmentation to measure the volumes of interest; see Section 12.6, Figure 12.19). This requires extensive data management developed through an imaging pipeline. In ImageJ, the macro utility records the steps involved in routine processing. Jupyter notebooks, along with the node-based KNIME analytics platform and Icy protocols, are data recording and management tools that are useful for both large and small data sets.
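A sketch of these measurements outside ImageJ, using SciPy on a toy labeled volume: a per-object bounding box, object centroids, and the calibrated center-to-center (cen-cen) distance between two chosen labels. The label values and voxel spacing are invented for illustration.

```python
import numpy as np
from scipy import ndimage

# two small blobs in a toy labeled volume (z, y, x)
labels = np.zeros((20, 40, 40), dtype=int)
labels[5:10, 5:12, 5:12] = 1
labels[8:14, 25:33, 20:30] = 2

def bounding_box(lab, label_id):
    """Min and max z, y, x indices of one object, the 3D analog of the bounding rectangle."""
    zyx = ndimage.find_objects(lab)[label_id - 1]
    return [(s.start, s.stop - 1) for s in zyx]

def cen_cen_distance(lab, id_a, id_b, spacing=(0.2, 0.1, 0.1)):
    """Euclidean distance between two object centroids in calibrated (z, y, x) units."""
    ca, cb = ndimage.center_of_mass(lab > 0, lab, [id_a, id_b])
    return np.linalg.norm((np.array(ca) - np.array(cb)) * np.array(spacing))

print(bounding_box(labels, 1))                             # [(5, 9), (5, 11), (5, 11)]
print(f"cen-cen distance: {cen_cen_distance(labels, 1, 2):.2f} units")
```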
13.13 Head-Mounted Displays and Holograms Achieve an Immersive Three-Dimensional Experience 3D sensing by the eye uses parallax to provide depth perception (see Sections 4.7 and 4.8). Without parallax, with only one eye (or camera), 3D sensing comes from monocular cues like size, with farther objects appearing smaller, and occlusion, with nearer objects occluding or hiding objects behind them. Anaglyphs, composed of two superimposed images of different color or polarization taken at the angle of parallax, reproduce 3D when viewed with special glasses (see Section 4.8, Figure 4.16). Another way to produce a 3D effect, but without the glasses, is the lenticular display, where a ridged lens over the image provides separate scenes depending on the angle of view. Instead of using superimposed images, a binocular display projects separate views to each eye at the angle of parallax with a split screen. Each 3D view is a set of two images (see Section 4.8, Figure 4.15). When parallax-adjusted views integrate with the motion sensors in a cell phone (e.g., Google Cardboard) or head-mounted displays (HMD; e.g., Oculus VR or HTC Vive VR headsets), the motion sensors can change the camera yaw, pitch, and roll (see Figure 13.6). External settings provide the elevation and side-to-side azimuth of the camera as well as movement in the direction of projection. Separating the views into windows and moving the camera produces motion parallax, the basis for structure from motion (see Section 13.3), which makes the motion appear faster when the object is closer, even though it is moving at a constant speed. The camera changes make it appear as if the viewer is in the 3D space, generating virtual reality (VR) views. Incorporating the physics of object movement and collision detection makes the VR more convincing, particularly when it includes integrated touch feedback (haptics) or sound effects. Blender provides a stereoscopy output feature that produces stereo images in the output camera as anaglyphs, interlaced images for 3D TV, time-sequential images, and side-by-side images for binocular display. Blender also supports HMD VR, primarily for OculusVR and SteamVR using the OpenXR standard. v-Lume, a software package designed to bring VR to superresolution microscopy (see Section 18.9, Figure 18.35), uses a VR interface for the image processing as well as the display. To project 3D images into a space requires holography, viewing the interference pattern produced by collimated and coherent (laser) light under illumination with the same coherent light source (see Section 4.8). One design of virtual holography reproduces a 3D scene from a flat, 2D spatial light modulator (SLM) illuminated with lasers, reconstructing the wave field in 3D (Figure 13.36). The reconstruction projects to the entrance pupil of the eye. Commercial SLMs are 1.5 cm² or less and have a viewing angle of less than 10 degrees (e.g., the HOLOEYE 10 megapixel). Because the viewing angle is limited (see Figure 13.36A), some implementations only have a part of the hologram, a subhologram, active at any one time, controlled by eye-tracking algorithms. A more recent implementation uses a non-periodic set of pinholes, a photon sieve, after the spatial light modulator, which diffracts the light to achieve a wider viewing angle (see Figure 13.36B). With its superior capabilities, real-time holography is the ideal 3D technique; however, it is not yet widely available.
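As one concrete example of the displays described above, a red-cyan anaglyph can be assembled from a left/right pair taken at the angle of parallax. The sketch below assumes the Pillow imaging library and two same-sized RGB views; the file names are placeholders.

```python
import numpy as np
from PIL import Image

left = np.asarray(Image.open("left_view.png").convert("RGB"), dtype=np.uint8)
right = np.asarray(Image.open("right_view.png").convert("RGB"), dtype=np.uint8)

anaglyph = np.empty_like(left)
anaglyph[..., 0] = left[..., 0]      # red channel from the left-eye view
anaglyph[..., 1:] = right[..., 1:]   # green and blue (cyan) from the right-eye view

Image.fromarray(anaglyph).save("anaglyph.png")
```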
Figure 13.36 Hologram imaging. (A) A lensless holographic display that projects three-dimensional (3D) objects in the shaded region. However, the small diffraction angle of the spatial light modulator (SLM) limits the viewing zone. (B) If a set of non-periodic pinholes diffract light through a larger angle, it increases the viewing zone but decreases the shaded region with projected 3D objects. From Park, Y., Lee, K., and Park, Y. 2019. Ultrathin wide-angle large-area digital 3D holographic display using a non-periodic photon sieve. Nature Communications 10: 1304 https://doi.org/10.1038/s41467-019-09126-9. Creative Commons license.
Annotated Images, Video, Web Sites, and References 13.1 Three-Dimensional Worlds Are Scalable and Require Both Camera and Actor Views A foundational treatment of 3D imaging is in Schroeder, W., Martin, K., and Lorensen, B. 2018. The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics. Fourth Edition. Kitware Inc., Clifton Park, NY. This book is written by some of the pioneers of 3D graphics. Another online extensive introduction is in Engel, K. Hadwiger, M., Kniss, J.M., et al. 2006. Real Time Volume Graphics. A.K. Peters, Wellesley, MA. Also consult the book The Visualization Handbook, 2005. Ed. by Hansen, C.D. and Johnson, C.R. Elsevier ButterworthHeinemann, Oxford, UK. This book is at the graduate level. The chapter on volume visualization applies here: Kaufman, A.E. and Mueller, K. Overview of volume rendering. Open-source software with 3D features or VR used in this book: Pymol. https://pymol.org/2 Jmol.http://jmol.sourceforge.net ImageJ. https://imagej.nih.gov/ij/download.html Fiji. https://fiji.sc ICY. http://icy.bioimageanalysis.org/download IMOD. https://bio3d.colorado.edu/imod Meshroom. https://alicevision.org/#meshroom Blender. https://www.blender.org/download VTK. https://vtk.org/download. This also has the VTK textbook. v-Lume, basic version. https://github.com/lumevr/vLume/releases. VR for superresolution microscopy. Julia. https://julialang.org/downloads, MeshCat.jl, https://github.com/rdeits/MeshCat.jl Jupyter notebooks. https://jupyter.org/install Knime analytics platform. https://www.knime.com/downloads Commercial 3D imaging: AutoCad. https://www.autodesk.com/products/autocad/overview?term=1-YEAR MATLAB/Simulink. https://www.mathworks.com/products/simulink.html Commercial microscopy software: Amira. https://www.thermofisher.com/us/en/home/industrial/electron-microscopy/electron-microscopy-instrumentsworkflow-solutions/3d-visualization-analysis-software/amira-life-sciences-biomedical.html MetaMorph. https://www.moleculardevices.com/products/cellular-imaging-systems/acquisition-and-analysis-software/ metamorph-microscopy#gref
NIS-Elements. https://www.microscope.healthcare.nikon.com/products/software v-LUME-subscription, https://www.lumevr.com. VR for superresolution microscopy. Commercial CT-scanning software: VG Studio Max. https://www.volumegraphics.com/en/products/vgstudio-max.html Ikeda’s TomoShop. https://www.ikeda-shoponline.com/engctsoft Commercial MRI and CT scan software: OsiriX MD. https://www.osirix-viewer.com
13.2 Stacking Multiple Adjacent Slices Can Produce a Three-Dimensional Volume or Surface High-speed reconstruction from confocal data sets can employ the light field confocal. Zhang, Z., Bai, L., Cong, L., et al. 2021. Imaging volumetric dynamics at high speed in mouse and zebrafish brains with confocal light field microscopy. Nature Biotechnology 39: 74–83. Section 6.15 discusses light field cameras. A review of light field microscopy is in Levoy, M., Zhang, Z., and McDowall, I. 2009. Recording and controlling the 4D light field in a microscope using microlens arrays. Journal of Microscopy 235: 144–162. Miranda, K. 2015. Three-dimensional reconstruction by electron microscopy in the life sciences: an introduction for cell and tissue biologists. Molecular Reproduction & Development 82: 530–547. This article covers 3D from both TEM and SEM modalities. Chapter 20 has additional coverage. An approach to registering brain slices regardless of the angle of slicing is in Agarwal, N. 2018. Geometry processing of conventionally produced mouse brain slice images. Journal of Neuroscience Methods 306: 45–56. Section 11.8 on deconvolution discusses axial dimension deconvolution. The “missing cone” of low frequencies in the axial OTF compromises reconstruction. The difference in z-dimensional treatment of the OTF from that of x-y dimension is a problem for accurate 3D reconstruction.
13.3 Structure-from-Motion Algorithms Reconstruct Three-Dimensional Surfaces Using Multiple Camera Views One of the most informative approaches to structure from motion is the interface for Meshroom. It requires a compute unified device architecture (CUDA)–enabled GPU. However, it goes through the different software routines stepwise, showing which steps are successful. It provides alternatives to SIFT, such as SURF, and shows whether keypoint matching is successful. Consequently, it is useful for screening multiple views to see if they will likely be successful for multi-view stereo, even though the computer may not have a CUDA-enabled GPU and may not be able to generate the final mesh. Open-source software is available for several of the separate steps for structure from motion such as some software that just implements SIFT, some that bundles, some that does multi-view stereo (e.g., VisualSFM; http://ccwu.me/vsfm/index.html), MeshLab (http://www.meshlab.net), Bundler (http://www.cs.cornell.edu/~snavely/bundler), and OpenMVS (http://cdcseacave.github.io/openMVS). Meshroom conveniently combines these. A review of wildlife photogrammetry for sea turtles using structure from motion is in Irschick, D.J., Bot, J., Brooks, A., et al. 2020. Creating 3D models of several sea turtle species as digital voucher specimens. Herpetological Review 5: 709–715. A general how-to discussion is in Bot, J., and Irschick, D. J. 2019. Using 3D photogrammetry to create open-access models of live animals using open-source 2D and 3D software solutions. Ed. by J. Grayburn, Z. Lischer-Katz, K. Golubiewski-Davis, and V. Ikeshoji-Orlati. 3D/VR in the Academic Library: Emerging Practices and Trends. CLIR Reports, CLIR Publication No. 176. pp. 54–72. Some structure-from-motion algorithms help bundle the views for a motion capture approach. Jackson, B.E., Evangelista, D.J., Ray, D.D., et al. 2016. 3D for the people: multi-camera motion capture in the field with consumer-grade cameras and open-source software. Biology Open 5: 1334–1342. doi:10.1242/bio.018713.
13.4 Reconstruction of Aligned Images in Fourier Space Produces Three-Dimensional Volumes or Surfaces Lake, J.A. 1976. Ribosome structure determined by electron microscopy of Escherichia coli small subunits, large subunits and monomeric ribosomes. Journal of Molecular Biology 105: 131–159. The 30s ribosome from E. coli has been a challenge for single-particle analysis throughout its history. A review of the 30s particle reconstruction is Razi, A. Britton, R.A., and Ortega, J. 2017. The impact of recent improvements in cryo-electron microscopy technology on the understanding of bacterial ribosome assembly. Nucleic Acids Research 45: 1027–1040. For EM tomography, see the review by McIntosh, R. Nicastro, D., and Mastronarde, D. 2005. New views of cells in 3D: an introduction to electron tomography. Trends in Cell Biology 15: 43–51. Also, this is an excellent more recent review: Ercius, P. 2015. Electron tomography: a three-dimensional analytic tool for hard and soft materials research. Advanced Materials 27: 5638–5663. For EM tomography of plant cells, see Otegui, M.S. 2020. Electron tomography and immunogold labeling of plant cells. Methods in Cell Biology 160. ISSN 0091-679X. https://doi.org/10.1016/bs.mcb.2020.06.005.
13.5 Surface Rendering Produces Isosurface Polygon Meshes Generated from Contoured Intensities Besides marching cubes, which is the historical standard for isosurface rendering, there are other algorithms that render isosurface volumes, such as marching tetrahedra, surface nets, and dual contouring. There is a simple tutorial on the difference between marching cubes and dual contouring at https://wordsandbuttons.online/interactive_explanation_of_marching_cubes_and_dual_contouring.html. Besides Wavefront file formats (.obj/mtl), there are also Autocad (.dxf), stereolithography (.stl binary and .stl ASCII), and Universal 3D (.u3d – can be embedded in .pdf document) formats for isosurfaces exported from the ImageJ 3D Viewer plugin. Blender can import Wavefront and stereolithography formats. Other 3D formats importable by Blender include those that provide mesh animation and physics, such as the Universal Scene Description (.usd) file, originally from Pixar; the Collada digital asset exchange (.dae) file format that is a version of XML; the Alembic file format (.abc), which is an open computer graphics interchange format; and the Autodesk filmbox (.fbx) file. There is a wide variety of open micro-CT volumes of animals on morphosource.org. It requires registration and permission to download data sets. The example skull is that of the eastern clingfish, Aspasmogaster costata. https://www.morphosource.org/concern/media/000030754. The use of "automated" vertex painting to identify separate bones in micro-CT scans is in Carvalho, T.P., Reis, R.E., and Sabaj, M.H. 2017. Description of a new blind and rare species of Xyliphius (Siluriformes: Aspredinidae) from the Amazon Basin using high-resolution computed tomography. Copeia 105: 14–28.
13.6 Texture Maps of Object Isosurfaces Are Images or Movies For an excellent tutorial on retrieving texture maps from Meshroom in Blender, see https://www.youtube.com/watch?v=L_SdlR57NtU by Micro Singularity. A standard approach to painting texture maps is to manually unwrap an object by marking a seam in the retopologized structure, unwrapping the area of the object, and using the bake process to translate mesh data to texture images. Mesh data include, for example, ambient occlusion and surface normals. The image editor in Blender can add base colors and, for example, roughness and specular colors. Also, external image painting programs, such as the open-source Krita (https://krita.org), can paint on texture projections from Blender.
13.7 Ray Tracing Follows a Ray of Light Backward from the Eye or Camera to Its Source More information on the ray types in ray tracing and specifying the number of bounces in each ray is in the Blender manual at https://docs.blender.org/manual/en/latest/render/cycles/render_settings/light_paths.htm.
13.8 Ray Tracing Shows the Object Based on Internal Intensities or Nearness to the Camera The VTK book Schroeder, W., Martin, K., and Lorensen, B. 2018. The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics. Fourth Edition. Kitware Inc., Clifton Park, NY discusses ray traversal and ray tracing in Chapter 7 on image-order volume rendering.
13.9 Transfer Functions Discriminate Objects in Ray-Traced Three Dimensions For Blender, the transfer functions are not called transfer functions. Instead, they are volume attributes that are adjusted with ramps (color ramps that have either color or alpha output) using shader nodes. Atlases that use probabilistic transfer functions classify regions within a volume. An AI treatment of this is Soundararajan, K.P. and Schultz, T. 2015. Learning probabilistic transfer functions: a comparative study of classifiers. Computer Graphics Forum. Eurographics Conference on Visualization (EuroVis). Ed. by H. Carr, K.-L. Ma, and G. Santucci. Volume 34.
13.10 Four Dimensions, a Time Series of Three-Dimensional Volumes, Can Use Either Ray-Traced or Isosurface Rendering Figure 13.29 used routines in an older version of VTK (VTK 5.0). These classes still exist in somewhat modified form in VTK 9.0. To interpolate volumes, the vtkInterpolateDataAttribute object works. Opacity transfer function objects include vtkOpenGLVolumeMaskGradientOpacityTransferFunction2D and vtkOpenGLVolumeGradientOpacityTable. Some of these algorithms are now in Paraview and ITK, which are also products of kitware.com.
13.11 Volumes Rendered with Splats and Texture Maps Provide Realistic Object-Ordered Reconstructions with Low Computation Overhead Biomedical applications of layered volume splatting, which is a variation on the splatting described here, are in Schlegel, P. and Pajarola, R. 2009. Layered volume splatting. Ed. by G. Bebis, R. Boyle, B. Parvin, et al. Advances in Visual Computing. ISVC 2009. Lecture Notes in Computer Science, vol 5876. Springer, Berlin. https://doi.org/10.1007/978-3-642-10520-3_1.
13.12 Analysis of Three-Dimensional Volumes Uses the Same Approaches as Two-Dimensional Area Analysis But Includes Voxel Adjacency and Connectivity A tutorial for the 3D object counter by T. Boudier is at https://imagejdocu.tudor.lu/_media/plugin/analysis/3d_object_counter/3d-oc.pdf. A tutorial for using the ImageJ 3D suite by T. Boudier is at https://imagejdocu.tudor.lu/_media/tutorial/working/workshop3d.pdf. These examples are dated but are updated in the figures presented here. Both use the organ of Corti images available through ImageJ (File>Open Samples>Organ of Corti [4D stack]). The reference for Boudier's algorithms is in Ollion, J., Cochennec, J., Loll, F., Escudé, C., and Boudier, T. 2013. TANGO: a generic tool for high-throughput 3D image analysis for studying nuclear organization. Bioinformatics 29(14): 1840–1841. For a sample workflow for time-lapse changes during growth, see Wuyts, N., Palauqui, J.C., Conejero, G., et al. 2010. High contrast three-dimensional imaging of the Arabidopsis leaf enables the analysis of cell dimensions in the epidermis and mesophyll. Plant Methods 6: 17 http://www.plantmethods.com/content/6/1/17.
13.13 Head-Mounted Displays and Holograms Achieve an Immersive Three-Dimensional Experience SeeReal Technologies (https://www.seereal.com) produces a virtual holographic display. See Reichelt, S., Häussler, R., Fütterer, G., and Leister, N. 2010. Depth cues in human visual perception and their realization in 3D displays. Ed. by B. Javidi and J.-Y. Son. Proceedings of SPIE 7690, 76900B, Three‐Dimensional Imaging, Visualization, and Display. doi: 10.1117/12.850094.
Section 3 Image Modalities
14 Ultrasound Imaging 14.1 Ultrasonography Is a Cheap, High-Resolution, Deep-Penetration, Non-invasive Imaging Modality X-ray scanning technologies (see Sections 6.6 and 6.7) are common standard clinical imaging modalities, and when accompanied by computer-aided tomography and three-dimensional (3D) modeling (see Section 13.5), they provide non-invasive diagnosis of calcified tissues, bones, and tissues containing X-ray–absorbing contrast agents (e.g., barium, iodine). However, they use ionizing radiation, which can be harmful. A less harmful non-invasive alternative, magnetic resonance imaging (MRI), is useful for evaluating many of the soft organ tissues that cannot be detected by hard X-rays (>15 kV). High-field MRI employs strong magnetic fields, which are not harmful but require large, expensive machines to produce (see Section 15.1). This chapter introduces a non-electromagnetic image acquisition system, ultrasonography. Ultrasonography uses a small, portable machine to image soft tissues. It is useful not only for imaging but also for treatment of, for example, connective tissue problems and kidney stones. More sophisticated, recent approaches use ultrasound to specifically activate stretch or heat receptors deep inside tissues. It is so harmless that we trust it for digital images and scans, or sonograms, of developing fetuses and fetal organs (Figure 14.1). Its portability, low cost, and lack of harm make it a very attractive alternative to the other non-invasive modalities. Furthermore, refraction and absorbance of tissues limit penetration of optical approaches to about 1 mm (see Sections 8.13 and 17.12). The penetration of ultrasound through soft tissues can be as deep as 10 cm. With new techniques in ultrafast ultrasound Doppler and the development of ultrasound contrast agents that can serve as reporters of gene expression and localization (see Section 14.13), this modality offers non-invasive evaluation of tissue and cellular dynamics deep inside the body.
14.2 Many Species Use Ultrasound and Infrasound for Communication and Detection Many small mammals use ultrasound for echolocation. Ultrasound has a frequency above that detected by humans (20 kHz). Because it is high frequency, small things reflect it, like the insects a bat eats. The direction of the echo and the phase difference detected between the well-separated and enlarged left and right ears of the bat (see Figure 14.1A) guide bats to their prey. They use it instead of sight in the darkest of caves. Echolocation also works under water. The speed of sound is higher in water (1484 m s–1) than in air (343 m s–1), so there is less lag between the action and the sound in water. Furthermore, water does not absorb the ultrasound for echolocation in sonar (an acronym for sound navigation and ranging) as much as it does electromagnetic waves of light and radio. Dolphins and other marine animals use it to find each other and their prey in dark waters. Both bats and dolphins make clicks of shorter and shorter durations (from 2 seconds to 60 microseconds) as the object that they are seeking gets closer, providing us with an important lesson in ultrasound: ultrasonic pulses of shorter duration detect the distance to the object more accurately.
Figure 14.1 Ultrasound analysis of a pregnant bat. (A) Ghost bats use ultrasound, and ultrasound can treat them. Courtesy of the Featherdale Sydney Wildlife Park. (B) Photo of the head of an adult Trident leaf-nosed bat, Asellia tridens. The zygomatic breadth (ZB) is the width of the head between the zygomatic arches. (C) The length of the region that shapes the sound from the nasal passages of the bat is the horseshoe length. The top of the region is the trident, and the bottom of the region is the horseshoe. (D) B-mode ultrasound imaging of an in utero developing bat fetus. (E) Inset from D showing the developing head of the fetus. The horseshoe length increases much more rapidly than ZB during development, making it possible for the bat to emit focused ultrasound at birth. Adapted from Amichai, E., Tal, S., Boonman, A., et al. 2019. Ultrasound imaging reveals accelerated in-utero development of a sensory apparatus in echolocating bats. Scientific Reports 9: 5275 https://doi.org/10.1038/s41598-019-41715-y. CC BY 4.0.
Infrasonics are sounds that humans cannot hear, those with frequencies lower than 20 Hz. Animals also use these very low frequencies to communicate because even large objects do not reflect them. Large mammals, such as elephants and whales, as well as large reptiles, such as alligators, use infrasonics to communicate over large distances and to detect the natural infrasonics of oncoming thunderstorms and earthquakes.
14.3 Sound Is a Compression, or Pressure, Wave Because the displacement of molecules by the force of the sound is parallel to the direction of wave travel, sound is a longitudinal wave. Longitudinal sound waves are unlike transverse ocean waves or electromagnetic waves in which the displacement or field is perpendicular to the direction of travel. Compression waves generate regions of high and low molecular density as they travel. They have high-density, high-pressure, compressed regions separated by low-density, low-pressure, rarefied regions (Figure 14.2). The pressure at any point in the ultrasound wave depends on the speed of the wave, the velocity of the particles in the media, and the density of the medium. High-density media
Figure 14.2 A longitudinal wave of pressure has regions of positive-pressure compression of molecules at the peak of the wave and regions of negative-pressure rarefaction of molecules in the trough of the wave. The direction of propagation is the z or axial direction. The pressure ratio between what we hear and don't hear is the sound level. The particle velocity uz that occurs is the ratio between the pressure and the acoustic impedance of the medium. Acoustic impedance is the product of the medium density and the speed of sound in the medium. Diagram by L. Griffing.
exert high pressures – the principle behind hydraulics. But as you know from listening to loud music in a still room, the actual displacement of the molecules of air by sound is small. Music doesn't create a wind, and the particle velocity is much, much smaller than the speed of the wave (see Figure 14.2, Equation 14.1).
uz = P / Z, (14.1)
in which uz is the particle velocity, P is pressure, and Z is acoustic impedance.
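A quick numerical check of Equation 14.1, using impedance values from Table 14.1 converted to SI units and an assumed 1-Pa pressure amplitude, shows how small the particle motion is compared with the speed of the wave itself (1480 m/s in water).

```python
P = 1.0           # assumed pressure amplitude, Pa (a loud airborne sound, roughly 94 dB)
Z_water = 1.48e6  # acoustic impedance of water in SI units (1.48 x 10^5 g cm^-2 s^-1)
Z_air = 413.0     # approximate acoustic impedance of air, kg m^-2 s^-1

print(P / Z_water)   # ~6.8e-7 m/s particle velocity in water
print(P / Z_air)     # ~2.4e-3 m/s particle velocity in air
```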
14.4 The Measurement of Audible Sound Intensity Is in Decibels As with light (see Section 4.3), we humans do not sense sound intensity accurately over the entire spectrum of frequencies, hearing only sounds between 20 Hz and 20 kHz. We are most sensitive to frequencies around 1 kHz (roughly high C). The pressure of sound near the limit of our hearing, 20 Hz, is 20 μPa (micropascals). The sound intensity, or power per unit area, is 10⁻¹² W/m². That sound level is defined as 0 decibels (dB). The dB measure of what we hear is a log10 ratio of the pressure of sound we hear, P, divided by the pressure we don't hear, P0, 20 μPa (Equation 14.2).
Sound intensity (decibels) = 20 log10(P / P0). (14.2)
Consequently, a 10-fold increase in sound intensity or an increase in 10 dB sounds about twice as loud to our ears. Because we are less sensitive to sounds lower and higher than high C, we need more decibels in these frequencies to hear them, just like we need higher radiometric intensities of red and blue light than of green light (see Section 9.2). When we get to ultrasounds and infrasounds, we cannot hear them even at very high decibels. We need to limit exposure to high intensities of sounds that we can hear. At high C, about 120 dB is the pain threshold, and repeated exposure to sounds above 85 dB can cause noise-induced hearing loss (100 dB is produced by a typical gas lawn mower). Exposure to high intensities of sounds we can’t hear doesn’t damage our hearing.
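A short check of Equation 14.2 with the 20-μPa reference pressure: each 10-fold increase in pressure adds 20 dB, and a 10-fold increase in intensity (about a 3.16-fold increase in pressure) adds 10 dB.

```python
import math

P0 = 20e-6   # reference pressure, Pa (20 µPa)

def sound_level_db(p):
    """Equation 14.2: sound level in dB relative to the 20-µPa reference pressure."""
    return 20 * math.log10(p / P0)

print(sound_level_db(20e-6))                  # 0 dB, the threshold of hearing
print(sound_level_db(1.0))                    # ~94 dB for a 1-Pa sound
print(sound_level_db(10 * 20e-6))             # 20 dB: a 10-fold pressure increase
print(sound_level_db(math.sqrt(10) * 20e-6))  # 10 dB: a 10-fold intensity increase
```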
14.5 A Piezoelectric Transducer Creates the Ultrasound Wave Ultrasound machines produce the acoustic pressure wave with a piezoelectric crystal or material that expands or contracts in response to an electrical pulse, a resin-embedded composite of lead zirconate titanate (PZT) crystalline rods. The same piezoelectric elements detect the echo, which generates a much smaller pressure when arriving back at the PZT, but the PZT transduces that small pressure into an electrical current. The piezoelectric element is therefore a transducer (Figure 14.3A). A transmitting and receiving switch separates the sending and receiving pulses of electricity (Figure 14.3B). To generate images, an array of transducers sends pressure waves across a plane and detects the pressure wave echoes coming from within the tissues. The size of the transducers determines the size of the vertical line in the image (Figures 14.3B and 14.4).
Figure 14.3 (A) Construction elements of a lead zirconate titanate (PZT) transducer. The alternating voltage frequency is the sound frequency of the PZT. The wavelength varies with the frequency, being larger at lower frequencies. As the voltage across the top and bottom electrode varies, the PZT element changes thickness. The optimal starting thickness of the PZT element is half the wavelength of the operating frequency. The optimal thickness of the matching layer is one quarter the wavelength of the operating frequency. (B) A hand-held ultrasound imaging device. The PZTs align in a linear array of 128–512 transducers. The number of vertical lines in the ultrasound image is limited by the number of transducers in the array. Diagram by L. Griffing.
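The thickness rules in Figure 14.3 are easy to work through for an assumed probe. The sound speeds below (roughly 4000 m/s in the PZT composite and 2500 m/s in a typical matching layer) are illustrative values, not manufacturer specifications.

```python
f = 5e6           # assumed operating frequency, Hz
c_pzt = 4000.0    # assumed longitudinal sound speed in the PZT composite, m/s
c_match = 2500.0  # assumed sound speed in the matching layer, m/s

pzt_thickness = (c_pzt / f) / 2         # half-wavelength element: ~0.40 mm
matching_thickness = (c_match / f) / 4  # quarter-wavelength matching layer: ~0.125 mm
print(pzt_thickness * 1e3, matching_thickness * 1e3)   # thicknesses in millimeters
```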
Figure 14.4 A hand-held ultrasound probe with a linear array of transducers can generate an image containing the same number of vertical lines as the number of transducers in the array. The sound is reflected back to the transducer, with stronger reflections in regions of larger acoustic impedance difference. BruceBlaus / Wikimedia Commons / CC BY 4.0.
14.6 Different Tissues Have Different Acoustic Impedances How well the acoustic pressure generated by the piezoelectric crystal produces a change in particle velocity and hence produces these regions of high and low particle density is the acoustic impedance (Equation 14.3) of the medium.
Z = ρc, (14.3)
in which ρ is the density of the medium and c is the speed of sound. At a given pressure, low-impedance media have faster moving particles than high-impedance media. If you've ever tried to shout under water, you know that water has a higher acoustic impedance than air. That is because water is denser, and the speed of sound in water is higher. In sum, the acoustic impedance is the combined effect of medium density and the speed of the sound wave on how much a change in pressure produces a change in particle velocity. It is this difference in acoustic impedance in different tissues that determines how well ultrasound can discriminate them (Table 14.1) because as the sound wave travels between tissues of different acoustic impedance, the intensity of the reflected sound wave changes. Snell's law, which governs the transmission, reflection, and refraction of light (see Section 4.4) also governs the reflection, refraction, and transmission of sound through media of different compositions (Figure 14.5). The angle of the reflected ultrasound wave is equal to the angle of the incident sound wave, but the transmitted wave has a slightly different, refracted, angle determined by the ratios of the speed of sound in the two different media. Usually in ultrasound analysis of soft tissues, the speed of sound is not greatly different (see Table 14.1), but when it is, such as at a calcified interface, it creates artifacts by sound wave refraction. The ratio of the reflected to the incident intensity is the reflection coefficient. The ratio of the transmitted to the incident intensity is the transmission coefficient. In soft tissues, where the acoustic impedance difference is small, the reflection coefficient is around 0.1%. When acoustic impedance is very different between the two tissues (Z1>>Z2 or Z2>>Z1), the reflection coefficient approaches 1, and the transmission coefficient approaches 0 when a sound wave hits an interface between two tissues at right angles. Because this stops the transmission of the sound wave, the underlying tissue lies in a shadow. This is typical in sonograms of gallstones (Figure 14.6). There is high reflectance from the surface of the stone but a shadow behind it. This artifact is useful for diagnosing gallstones, so it is not always a bad thing. Large reflections and shadowing also happen when there is an air pocket in the tissue. Reverberations occur in the signal if there is a very strong reflector close to the surface, appearing as a series of bright lines. The transducer has a Z value of 30 × 10⁵ g cm⁻² s⁻¹, whereas the skin or tissue Z is 1.99–1.7 × 10⁵ g cm⁻² s⁻¹.
Figure 14.5 Snell’s law illustrated for the relationship between the reflected and transmitted wave intensity and angle when sound travels from a medium of one acoustic impedance to another. Diagram by L. Griffing. Table 14.1 Representative Values of Acoustic Properties of Tissues, Air, and Water.
Tissue          Impedance (Z)        Speed of Sound (c)   Density (ρ)   Attenuation (α0)
                ×10⁵ g cm⁻² s⁻¹      m s⁻¹                mg cm⁻³       dB cm⁻¹ MHz⁻¹
Air and lungs   0.00043              343                  1.3           41
Bone            7.8                  4000                 1906          20
Muscle          1.7                  1590                 1075          1.3–3.3
Kidney          1.62                 1560                 1040          1.0
Liver           1.65                 1570                 1050          0.94
Brain           1.58                 1540                 1025          0.85
Fat             1.38                 1450                 925           0.63
Blood           1.59                 1570                 1060          0.18
Water           1.48                 1480                 1000          0.0022
Figure 14.6 Ultrasound image of a gallbladder. (A) The curvilinear array of transducers in a probe, where the ultrasound lines diverge in an arc, produces the image sweep. White arrow points to the gallstone. From Murphy, M.C., Gibney, B., Gillespie, C., et al. 2020. Gallstones top to toe: what the radiologist needs to know. Insights Imaging 11: 13. https://doi.org/10.1186/s13244-019-0825-4. Creative Commons 4. (B) Interpretation of image in A showing the gallbladder, a gallstone, and the shadow from the stone. Diagram by L. Griffing.
Reverberation is overcome by matching the acoustic impedance of the output transducer to that of the tissue (usually skin and underlying fat or muscle) using a matching layer (see Figure 14.3) that is one quarter of the wavelength of the emitted ultrasound thick, together with a coupling gel applied to ensure good contact.
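The effect of an impedance mismatch can be estimated with the standard normal-incidence intensity reflection coefficient, R = ((Z2 − Z1)/(Z2 + Z1))², a formula not written out in the text but consistent with the behavior described here. The values below come from Table 14.1.

```python
def reflection_coefficient(z1, z2):
    """Fraction of the incident intensity reflected at a flat interface hit at right angles."""
    return ((z2 - z1) / (z2 + z1)) ** 2

# impedances from Table 14.1, in units of 10^5 g cm^-2 s^-1
print(reflection_coefficient(1.65, 1.62))     # liver-kidney: ~8e-5, nearly all transmitted
print(reflection_coefficient(1.65, 7.8))      # liver-bone: ~0.42, a strong echo
print(reflection_coefficient(1.65, 0.00043))  # liver-air: ~0.999, near-total reflection and shadowing
```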
14.7 Sonic Wave Scatter Generates Speckle Sometimes other bright or dark artifacts occur. One cause is constructive or destructive interference of sound waves scattered by particles a little smaller than the wavelength of the sound. This is a form of Rayleigh scattering, the same effect that makes the sky blue because blue light is more scattered by particles in the atmosphere than the longer wavelength light. If the particles are widely separated, the scattered sound can constructively or destructively interfere at spots, generating an artifact called speckle. If the objects are very small (5–10 μm in diameter) and close together, like red blood cells moving through an artery, their scattering patterns add together constructively. Although this signal is low compared with highly reflective surfaces, it is the basis for visualizing the blood flowing using Doppler ultrasound (see Section 14.12). Although sometimes considered noise, the speckle produced by the cellular features of each tissue (Figure 14.7) helps distinguish different tissues and contains useful information. Different speckle patterns generate different tissue textures that give contrast to the sonogram. Compound imaging (see Section 14.11) reduces speckle. The fineness of the speckle also reveals the lateral resolution and resolution above and below the plane of the primary longitudinal wave, the elevation resolution. Ultrasound images usually display in the x-z plane; elevation is in the y direction. The finer the speckle pattern, the higher the lateral resolution.
Figure 14.7 Speckle is caused by scatter. (A) Rayleigh scattering from a particle slightly smaller in size than the wavelength of the ultrasound. (B) Groups of particles, such as red blood cells, produce constructive interference when they scatter sound. (C) Separated particles can cause destructive interference with intervening particles. (D) Ultrasound image with speckle. (E) Ultrasound image with speckle filtered out. (F) Zoomed region showing speckle and signal. (G) Despeckled region in (F). (H) Isolated speckle in (F). D–H from L. Zhu, C. Fu, M. S. Brown and P. Heng, "A Non-local Low-Rank Framework for Ultrasound Speckle Reduction," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 493–501, doi: 10.1109/CVPR.2017.60. Used with permission.
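Why sub-resolution scatterers produce speckle can be demonstrated with a random-phasor sum: within each resolution cell, echoes from many randomly placed scatterers add with random phase, so the envelope fluctuates from cell to cell even when the tissue is uniform. The cell grid and scatterer count below are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(1)
cells = (128, 128)           # grid of resolution cells
scatterers_per_cell = 30     # sub-resolution scatterers contributing to each cell

# complex sum of unit-amplitude echoes with uniformly random phase in each cell
phases = rng.uniform(0, 2 * np.pi, size=cells + (scatterers_per_cell,))
envelope = np.abs(np.exp(1j * phases).sum(axis=-1))

speckle_image = envelope / envelope.max()   # bright and dark spots arise from interference alone
print(speckle_image.mean(), speckle_image.std())
```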
14.8 Lateral Resolution Depends on Sound Frequency and the Size and Focal Length of the Transducer Elements Lateral resolution depends on the wavelength of the imaging wave, as it does with light microscopy (see Section 8.4). The frequency of the ultrasound for clinical work varies between 1 and 18 MHz, about 1000 times higher than the frequency we can detect by ear. The wavelength of ultrasound varies with its frequency and the speed of sound in the medium, λ = c/ν, for both sound and light (see Section 4.4). Consequently, for brain tissue, where the speed of sound is about 1540 m/s (see Table 14.1), low-frequency, 1-MHz, ultrasound has a wavelength of 1.5 mm, while at 18 MHz, it has a 0.09-mm wavelength. Lateral resolution also depends on the aperture of the imaging device (see Sections 8.4 and 5.14). The aperture of an ultrasound system is the area of the transducer (Figure 14.8). The focal length is the distance from the transducer to the region where the lateral beam comes to focus (see Figure 14.8). As with light imaging, the f number (f#) of the ultrasound system is ratio of the focal length to the aperture, and as in photography, a large f# (small aperture) produces a large depth of field, and a small f# (large aperture) produces fairly shallow depth of field (see Figures 14.8A and 14.8B; see Section 5.15). The wavelength of the sound times the f# gives the lateral resolution of the transducer. Lateral resolution = λF / D or λf #,
(14.4)
in which λ is wavelength of sound, F is focal length, D is probe diameter (aperture), and f# is focal length/probe diameter (F/D). Typical values for lateral (and elevational) resolution are 1–10 mm. Beam focusing improves the lateral resolution. Curving the linear array of transducers in the ultrasound probe focuses the pressure waves from each transducer element in a narrow waist some distance in front of the probe (see Figure 14.8C). Emitting the beams at different times from each transducer, phasing, achieves focus (Figure 14.9). Phasing causes constructive interference near the center of the beam, producing centralized pulses, thereby achieving closer focus (see Figure 14.9). A phased array starts the timing sequence at one end of a linear array. The composite beam comes to focus near the opposite end of the array. Controlling the timing to sweep the beam across the entire field electronically generates faster scans. Clinical scan rates are 70–80 frames/s for line-by-line acquisition. For planar wave acquisition (see Section 14.11), 20,000 frames/s are possible.
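A worked example of Equation 14.4 for an assumed probe: a 5-MHz beam in soft tissue focused at 40 mm by a 10-mm aperture.

```python
c = 1540.0            # speed of sound in soft tissue, m/s
f = 5e6               # assumed operating frequency, Hz
focal_length = 40e-3  # assumed focal length F, m
aperture = 10e-3      # assumed probe aperture D, m

wavelength = c / f                         # ~0.31 mm
f_number = focal_length / aperture         # f# = 4
lateral_resolution = wavelength * f_number
print(lateral_resolution * 1e3)            # ~1.2 mm, within the typical 1-10 mm range
```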
Figure 14.8 The trade-off between depth of field and lateral resolution for a flat single-element transducer (A), a concave single-element transducer (B), and a phased array of pulses on a multi-element transducer (C). (D) Equation 14.4 for lateral resolution for ultrasound. Diagram by L. Griffing.
Figure 14.9 The focus achieved by phasing. The left-side transducer array produces a pulse by all the elements synchronously, while the right-side array starts the pulse in the outer elements at t1. At t2, the next inward set of elements produces a pulse in the right-side transducer. At t3, the central element in the right-side transducer produces a pulse. By t5, the right-side transducer pulses have nearly come to a very sharp focus, while the left-side has yet to come to focus; when it does, it will be broader. 1D, one-dimensional. Diagram by L. Griffing.
14.9 Axial Resolution Depends on the Duration of the Ultrasound Pulse The axial resolution of ultrasound is 0.1–1 mm, an order of magnitude better than the lateral resolution. The duration of the transmitted pressure pulse generated by the crystal determines axial resolution. It has to be short enough so that it does not interfere with the echo pressure pulse it receives. The pulse must be shorter than the extra round-trip path from the deeper of two reflecting surfaces, which is twice their separation; this sets the limit of axial resolution (Figure 14.10). Improvement in axial resolution can come from transducer design, in which improved damping shortens the pulse. Transducer handsets incorporate a damping agent behind the PZT composite (see Figure 14.3).
Figure 14.10 Axial resolution depends on pulse duration. The amplitudes of the waves have not been adjusted for absorption. When the initial transmitted wave hits the first and second surfaces, it is reflected sequentially, separated by the time difference between these reflections. If the pulse length is less than twice the distance between the surfaces, the two can be accurately discriminated because the returning echoes do not overlap. Diagram by L. Griffing.
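A worked example of the axial limit sketched in Figure 14.10, assuming a two-cycle pulse: the spatial pulse length is the number of cycles times the wavelength, and two interfaces are resolved when they are separated by more than half of it.

```python
c = 1540.0   # speed of sound in soft tissue, m/s
f = 5e6      # assumed operating frequency, Hz
cycles = 2   # assumed pulse length in cycles, set by transducer damping

wavelength = c / f
spatial_pulse_length = cycles * wavelength
axial_resolution = spatial_pulse_length / 2
print(axial_resolution * 1e3)   # ~0.31 mm, an order of magnitude finer than the lateral resolution
```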
14.10 Scatter and Absorption by Tissues Attenuate the Ultrasound Beam Absorption of sound is a complex function of how fast the particles in the tissue relax to a non-compressed state after compression by the sound wave and the friction between the particles during displacement by the sound wave. Different tissues absorb the sound more than others and dissipate the absorbed energy as heat. Attenuation, the loss of sound to absorption and scatter rather than to reflection or transmission, is higher for tissues with higher densities (see Table 14.1). The change in intensity caused by attenuation in soft tissue (see Table 14.1) is about 1 dB cm⁻¹ MHz⁻¹, while in air and bone, it is higher: 41 dB cm⁻¹ MHz⁻¹ and 20 dB cm⁻¹ MHz⁻¹, respectively. Increasing the operating frequency of ultrasound also increases attenuation. There is a near-linear relationship between the absorption of the sound and the ultrasound operating frequency. To compensate for attenuation of the signal, a time-gain compensator is part of the electronics of the ultrasound probe. In deeper tissues, as the time to receive the echo increases and the attenuation increases, the time-gain compensator amplifies the signal. It reduces the dynamic range of the signal but provides better visualization of deep tissues. Also, the probe includes electronic logarithmic histogram stretching (see Section 2.10) of the signal to improve the contrast of fainter features. The heat produced by sound absorption during routine ultrasound imaging is very small and generally quite safe. However, there are power-intensive procedures, such as 3D and Doppler scanning (see Section 14.12), in which safety limits are needed. The upper limit for the temperature increase for a routine diagnostic procedure is 1.5°C. On embryos and fetuses, anything that causes the temperature to rise above 41°C for 5 minutes is unsafe. Also, when using microbubble contrast agents (see Section 14.13), the rarefactional pressure can cause the bubble to burst, so acoustic output is limited by the US Food and Drug Administration to 720 mW/cm². However, sometimes the bubble burst can be used therapeutically (see Section 14.13). Some machines provide a read-out of the mechanical index (MI), which is a function of the rarefactional pressure. The upper limit is an MI of 1.9 for scans of everything except the eye, which has an upper limit of 0.23. Some machines display a thermal index (TI) for soft tissue (TIS), bone (TIB), and cranial bone (TIC). There is an upper limit of 1°C on the TI.
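The loss the time-gain compensator must make up can be estimated from the attenuation coefficient times the frequency times the round-trip depth. The frequency and depth below are assumed values for illustration.

```python
alpha = 1.0   # soft-tissue attenuation, dB cm^-1 MHz^-1 (Table 14.1 order of magnitude)
freq = 5.0    # assumed operating frequency, MHz
depth = 8.0   # assumed reflector depth, cm

round_trip_loss_db = alpha * freq * 2 * depth   # ~80 dB for the echo from 8 cm deep
gain_factor = 10 ** (round_trip_loss_db / 20)   # amplitude gain the TGC applies at that depth
print(round_trip_loss_db, gain_factor)          # 80 dB, i.e., a 10,000-fold amplitude boost
```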
14.11 Amplitude Mode, Motion Mode, Brightness Mode, and Coherent Planar Wave Mode Are the Standard Modes for Clinical Practice An amplitude (A)-mode scan produces a timeline plot of the amplitude of the ultrasound echo. Knowing the speed of sound in the different compartments through which the echo travels, the timeline provides a measure of the distance. The major application is ophthalmic pachymetry, which is a noninvasive technique that measures corneal, lens, and vitreal body thickness. Following topical anesthesia, a small ultrasound probe centered and in contact with the cornea produces high-frequency (10–20 MHz) pulses. Ophthalmic pachymetry can diagnose glaucoma and cataracts (Figure 14.11). It also can evaluate the post-operative success of corneal transplants and other eye surgeries. Outside of diagnostic procedures, A-mode ultrasound can therapeutically break up kidney stones or other calcifications, a process known as lithotripsy, besides giving you a nice white tartar-free smile with ultrasound teeth cleaners. Motion (M)-mode scanning acquires a continuous series of A-mode lines and displays them as a function of time. It is a kymograph (see Section 12.5), showing movement in the axial dimension. The brightness of the displayed M-mode signal is the amplitude of the echo. The lines have an incremental timeline on the horizontal axis, shown in 250-ms increments in Figure 14.12A and 100-ms increments in Figure 14.12B. Several thousands of lines per second produce real-time or slow-motion display of dynamic movement. M-mode scanning is common in cardiac diagnostics. Figure 14.12 shows the difference between the closing cycle of the mitral valve in a normal patient (see Figure 14.12A) and a patient with mitral valve prolapse (see Figure 14.12B). The top of each figure shows a B-scan with an arrow through it showing the line along which the M-Mode is taking place. Besides the expanded horizontal time scale, there is a reduced vertical depth scale in Figure 14.12B, showing the slightly prolonged atrial systole and the initially incomplete closing of the mitral valve. The most common mode for clinical diagnosis, including the popular fetal scans, is brightness (B)-mode (Figure 14.13). The image is two-dimensional (2D), produced by a linear array transducer (Figure 14.4). More
commonly, the transducer is curvilinear (see Figure 14.6). Each line in the image is an A-mode line, with the intensity of each echo represented by the brightness on the 2D scan. In most clinical instruments, phased array transducers with electronically processed signals achieve the scan at faster rates with dynamic focusing (see Figure 14.9). A combination of the linear array image and phased imaging produces multi-angle B-mode imaging, or compound imaging, which produces better images by providing multiple angles of view and reducing speckle. Wobbling or rotating a one-dimensional array or using a 2D matrix array transducer generates 3D B-mode imaging (see Figure 14.14). The hardware complications of using a 2D transducer make it less common. Algorithms to assemble 2D images into 3D volumes use advanced registration programs such as scale-invariant feature transform (SIFT) (see Section 11.4) and some of the same programs used in structure from motion (see Section 13.3). 3D or four-dimensional scans facilitate diagnosis of the extent of tissue damage in mitral valve prolapse (Figure 14.15) and deformities in developing fetuses. Coherent planar wave imaging is a much faster version of B-mode imaging, speeding it up 300-fold (Figure 14.16). Instead of sending a focused transmitted pulse sequentially along the x-z plane, coherent planar wave imaging sends out a coherent longitudinal compression wave across the entire field. Sorting the scan lines from the reflection data occurs by focusing in the receive mode. Receiver-side beam forming comes from a summed set of time delays, with each line calculated from a different set of time delays. This is similar to timed delay sequences that focus the outward beam in standard B-mode imaging (see Figure 14.9) but on the receiving end.
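Receive-side delay-and-sum beamforming can be sketched briefly: for each image point, compute the echo arrival time at every element from the geometry, sample each recorded channel at that delay, and sum. The array geometry, sampling rate, and synthetic channel data below are all invented for illustration; a clinical implementation would add apodization and interpolation.

```python
import numpy as np

c = 1540.0                      # speed of sound, m/s
fs = 40e6                       # sampling rate, Hz
n_elements, n_samples = 64, 2048
element_x = (np.arange(n_elements) - n_elements / 2) * 0.3e-3       # element positions, m
rf = np.random.default_rng(2).normal(size=(n_elements, n_samples))  # stand-in channel data

def beamform_point(x, z):
    """Delay-and-sum the channel signals for an image point at lateral x, depth z."""
    t_transmit = z / c                                      # the plane wave travels straight down
    t_receive = np.sqrt((element_x - x) ** 2 + z ** 2) / c  # echo path back to each element
    idx = np.clip(np.round((t_transmit + t_receive) * fs).astype(int), 0, n_samples - 1)
    return rf[np.arange(n_elements), idx].sum()

image = np.array([[beamform_point(x, z)
                   for x in np.linspace(-5e-3, 5e-3, 32)]
                  for z in np.linspace(5e-3, 45e-3, 64)])
```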
Figure 14.11 Ophthalmic pachymetry uses A-mode ultrasound to measure cornea, lens, and vitreal body thickness. Note that the reflections occur at interfaces, so even though the lens is quite hard, there are few reflections, except where a cataract, in this case, occurs. It employs a small probe in contact with the cornea, which produces a single line of 10- to 20-MHz frequency ultrasound. The time scale at the bottom converts to a distance measure because the speed of the sound in the different tissues is known. Diagram by L. Griffing.
Figure 14.12 M-mode data acquisition. The transducer is placed above the heart and sends out a single line of ultrasound. An A-mode scan is recorded, and as soon as the last echo has been acquired, the A-mode scan is repeated. The horizontal time axis increments for each scan, building up a time series of one-dimensional scans. A straight line represents a stationary structure, whereas the front of the heart shows large changes in position. Note the different time scales in A and B. (A) Normal mitral valve closure. Zorkun / Wikimedia Commons / CC BY-SA 3.0. (B) Prolapsed mitral valve partial closure. Zorkun / Wikimedia Commons / CC BY-SA 3.0.
Figure 14.13 B-mode data acquisition. The image is generated by dynamically phasing the sweep of the ultrasound across the field or with a curvilinear probe. B-scan image of a 12-week fetus; the measured distance between crown and rump is 6.51 cm. Wolfgang Moroder / Wikimedia Commons / CC BY-SA 3.0.
Figure 14.14 Three-dimensional (3D) acquisition of ultrasound. (A) Rotation of a one-dimensional (1D) phased transducer. (B) Wobbling a 1D phased transducer. Diagram by L. Griffing. (C) A two-dimensional matrix transducer directly acquires the 3D volume. From Lindseth, F., et al. 2013. Ultrasound-based guidance and therapy. In Advancements and Breakthroughs in Ultrasound Imaging. http://dx.doi.org/10.5772/55884. CC BY 3.0.
Figure 14.15 Ultrasound three-dimensional renderings of a prolapsing mitral valve. (A) Surface-rendered volume of a prolapsed mitral valve. The posterior leaflet of the valve shows prolapse. (B) Color coding of prolapse relative to the elevation of the annular plane of the valve, with red showing deviation of the posterior leaflet from the green-colored annular plane. (C) Volume rendering of the view from the side of a prolapsed mitral valve superimposed on a Doppler image of the regurgitation caused by prolapsing (tan on blue in the upper part of the scan). A, anterior; AL, anterolateral; Ao, aorta; P, posterior; PM, posteromedial. Jaydev, K.D., et al. 2018 / With permission of Elsevier.
Figure 14.16 Ultrafast planar imaging. (A) Conventional imaging with a focused beam that moves across the sample using a set of time delays for beam forming. (B) Ultrafast imaging sends out a coherent planar wave, and all the reflected signals backscatter to the receiver at once, making a faster image-by-image acquisition rather than line-by-line acquisition. US, ultrasound. Demené, C. et al. 2019 / With permission of Elsevier.
14.12 Doppler Scans of Moving Red Blood Cells Reveal Changes in Vascular Flows with Time and Provide the Basis for Functional Ultrasound Imaging

Doppler analysis measures flow, usually of constructively interfering sub-resolution particles such as red blood cells. A Doppler shift is the familiar change in the frequency of a sound as a loud object moves toward you (the frequency increases, or gets higher) and then away from you (the frequency decreases, or gets lower). There are four kinds of Doppler analysis in ultrasound.

The first is continuous-wave Doppler, which measures the echo from a wave of sound focused at one particular point in the tissue over time. It does not have spatial information. The shift in frequency is a measure of whether the particles are moving toward or away from the probe and at what velocity (Figure 14.17A):

fD = (2ftVcosα)/c,   (14.5)

in which α is the angle of flow toward the probe, the Doppler frequency fD is the difference between the transmitted frequency ft and the reflected frequency (with its accompanying phase change), V is the velocity of the particles (e.g., red blood cells), and c is the speed of sound in the material. Continuous-wave Doppler sonograms, or sound spectrograms, plot velocity in the area of interest over time. In Figure 14.17B, a B-mode scan locates the region for the underlying sonogram of blood velocity in a carotid artery.

The second type of Doppler analysis is pulsed-wave Doppler, which delivers pulses of waves to the region of interest, or range gate. The pulse repetition period changes in the reflected signal when it reflects off moving objects. The reflected signal is a time-varying sine function of the Doppler frequency. As in continuous-wave Doppler, the Doppler frequency provides the velocity of the particles. This technique produces slightly higher resolution sonograms than continuous-wave Doppler.
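Rearranging Eq. 14.5 gives the particle velocity from a measured Doppler shift, V = fD c / (2 ft cos α). The short sketch below is illustrative only; the numerical values are assumptions, not taken from the text.

```python
# Illustrative rearrangement of Eq. 14.5: velocity from the measured Doppler shift.
import math

def doppler_velocity(f_doppler_hz: float, f_transmit_hz: float,
                     angle_deg: float, c: float = 1540.0) -> float:
    """Particle velocity (m/s) along the flow direction, given the Doppler shift fD,
    the transmitted frequency ft, and the beam-to-flow angle alpha."""
    return f_doppler_hz * c / (2.0 * f_transmit_hz * math.cos(math.radians(angle_deg)))

# Example (assumed numbers): a 1.3-kHz shift at 5 MHz with a 60-degree angle -> ~0.4 m/s.
print(round(doppler_velocity(1300.0, 5e6, 60.0), 2))
```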
Figure 14.17 (A) Principle behind Doppler sonography, showing the phase and frequency shift caused by flowing sub-resolution particles. The transmitted (ft) and reflected (fr) frequencies and the phase change as particles flow toward the ultrasound probe. The angle of flow toward the probe is α. The Doppler frequency fD, the difference between the transmitted frequency ft and the reflected frequency fr, is approximately (2ftVcosα)/c. (B) Medical spectral Doppler of the common carotid artery. The top view shows a B-scan of the artery and the line for acquiring the underlying sonogram. Daniel W. Rickey / Wikimedia Commons / CC BY-SA 2.5.
The third type of Doppler analysis is standard color-flow imaging, which uses several pulses (3–7) for each image line. Color-flow imaging uses a color table to represent flow, with red toward the transducer and blue away from the transducer (see Figure 14.15C). Increasing velocities have higher intensities in the color table. Color flow also uses pulsed waves but calculates the phase shift between pulses. Direct calculation of the phase shift requires that the returning waves differ by no more than a quarter wavelength, hence the multiple pulses. Another way to calculate the phase difference is by cross-correlation between the signals (see Section 11.13).

The fourth type of Doppler analysis is power Doppler, which integrates the signal toward the transducer with the signal away from the transducer, producing a positive integrated power. This approach discards the directionality information in the Doppler image but eliminates aliasing and removes signal voids in regions where positive and negative flows cancel out.

An important tool for examining brain activity, functional ultrasound imaging (fUSI), combines power Doppler with coherent planar wave imaging. The power Doppler signal is simply the magnitude of the flow, but it is also proportional to the cerebral blood volume. Hence, bigger vessels make brighter images because they have both more volume and faster flows. Larger cerebral vessels have flows greater than 10 mm/s, whereas microvessels have flows of 0.5–1.5 mm/s. fUSI works through the skull in small rodents and through the fontanel (the nonmineralized region of the skull) in infants (see Figure 14.18). Higher levels of brain activity produce higher ultrafast Doppler (UfD) signals; Figure 14.18 shows the elevated UfD signals during a drug-resistant seizure produced by a congenital disease in a newborn. Overcoming the skull barrier remains a challenge for fUSI. Some approaches are trepanning, thinning, or installing ultrasound-transparent windows in the skull, as well as using high-signal contrast agents (see Section 14.13). When brain surgery removes the skull, fUSI can help identify vascular regions of tumor growth. The time resolution is about 0.4 s, with acquisition using a pulse repetition frequency of 5 kHz, 10 angles (see compound imaging, Section 14.11), and averaging of 200 frames at 500 frames/s. The ability to use fUSI on mobile, awake animals is a huge advantage over functional MRI (see Section 15.17), which confines an unmoving subject to a large, immobile, restrictive chamber. Supra-dural skull implantation of miniaturized transducers is a possible future direction for brain–machine interfaces. As with 3D ultrasound, acquiring a 3D fUSI image without rotating or wobbling the transducer requires a 2D matrix array. A design example is a planar matrix array transducer with a large number (1024) of small acquisition elements. Acquisition with this probe requires changing the wave pulse sequence to compensate for the signal loss from the smaller transducer elements.
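The phase-shift calculation behind color flow and the integrated power behind power Doppler can both be sketched from a small ensemble of pulse echoes. The text does not specify an estimator, so the lag-one autocorrelation ("Kasai") method below is one common choice shown only as an assumed illustration; iq is a hypothetical complex demodulated ensemble for a single pixel, already wall-filtered to suppress tissue motion.

```python
# Illustrative color-flow and power Doppler estimates from one pixel's pulse ensemble.
# The lag-one autocorrelation (Kasai) estimator is one common phase-shift method;
# the text also mentions cross-correlation as an alternative.
import numpy as np

def color_flow_velocity(iq: np.ndarray, prf: float, f0: float, c: float = 1540.0) -> float:
    """Axial velocity (m/s) from the mean pulse-to-pulse phase shift of a complex ensemble."""
    r1 = np.mean(iq[1:] * np.conj(iq[:-1]))      # lag-one autocorrelation across pulses
    phase = np.angle(r1)                         # mean phase shift between successive pulses
    return c * prf * phase / (4.0 * np.pi * f0)  # sign convention depends on demodulation

def power_doppler(iq: np.ndarray) -> float:
    """Direction-free Doppler power: integrated echo energy of the (wall-filtered) ensemble."""
    return float(np.mean(np.abs(iq) ** 2))
```

The ±π limit on the measured phase corresponds to the quarter-wavelength condition mentioned above; exceeding it causes the aliasing that power Doppler avoids.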
Figure 14.18 Functional ultrasound imaging of epileptic seizures in an infant. (A) Positioning the ultrasonic probe on the fontanel of an infant. Demené, C. et al. 2019 / With permission of Elsevier. (B) Representative ultrafast Doppler (UfD) image (colored) overlaid onto an ultrasound B-mode image (black and white) of a patient with drug-resistant seizures from congenital tuberous sclerosis complex. The outlines are three regions of interest (ROIs) in the cortex used for UfD analysis. (C) UfD snapshots during a seizure and between seizures, with color-coded relative changes of the UfD signal superimposed. P < 10−5, right hemisphere ROIs versus left hemisphere ROIs during ictal events, Student's t test. Adapted from Demené, C. et al. 2017. Functional ultrasound imaging of brain activity in human newborns. Science Translational Medicine 9(411): eaah6756.
14.13 Microbubbles and Gas Vesicles Provide Ultrasound Contrast and Have Therapeutic Potential

Microbubbles and gas vesicles injected into tissues or vessels, or gas vesicle gene clusters expressed in tissues, produce a bright, high-contrast signal. Although it is easy to see that the characteristic impedance of a gas-filled microbubble or gas vesicle is very different from that of the surrounding tissue or blood, this is not the principal reason such agents are so effective. They contain a compressible gas and respond to the propagating ultrasound beam by compressing during the compression periods and expanding during the rarefaction periods of the longitudinal sound wave, as shown in Figure 14.19. Compared with the energy absorbed by an incompressible, fluid-filled, equivalently sized cell, microbubbles and gas vesicles absorb much more energy. Re-radiating this absorbed energy during expansion results in a strong echo signal returning to the transducer. Gas vesicles buckle under compression and produce a non-linear echo response that is even easier to separate from the surrounding noise.

Freeze-dried microbubbles are rehydrated in physiological saline solution and immediately injected into the patient's bloodstream. SonoVue and Optison are microbubble preparations approved for worldwide clinical use.
Figure 14.19 Microbubble and gas vesicle ultrasound contrast agents. (A) The diameter of a microbubble is between 2 and 10 µm; it has either a polymer or a lipid coat. (B) Gas vesicles are much smaller, with a maximum length of 800 nm. They have a complex 2-nm coat with two structural proteins, gas vesicle protein A (GvpA) and C (GvpC). When compressed, they buckle, giving rise to a non-linear echo response. (C) The diameter of a microbubble changes as an ultrasound pressure wave passes through the tissue in which the microbubble is located. Some pulses can burst the microbubble. Diagram by L. Griffing.
SonoVue microbubbles contain sulfur hexafluoride gas surrounded by a phospholipid monolayer. Optison microbubbles contain perfluoropropane gas within a cross-linked serum albumin shell. These shells are about 20 nm thick, and the bubbles have diameters between 2 and 10 μm (see Figure 14.19). Monodisperse (all the same size) microbubbles produce less noise. The microbubble diameter and the thickness of the shell determine the ultrasound frequency at which the expansion and contraction are greatest, their resonance frequency. Luckily, the resonance frequency of microbubbles that fit through the 6-μm diameter of human capillaries falls within the frequency range of diagnostic ultrasound. Microbubble contrast agents extend the spatial resolution of ultrasound by an order of magnitude, providing visualization of 10-μm vessels rather than 100-μm vessels (compare Figure 14.20A with B). Resolved vessels can be 16 μm apart (e.g., trace 3 in Figure 14.20C).

Gas vesicles occur naturally in some cyanobacteria, which use them for buoyancy. The main structural proteins in a gas vesicle are gas vesicle proteins A (GvpA) and C (GvpC) (see Figure 14.19B). Besides providing a much better signal than microbubbles, gas vesicles can serve as ultrasound reporters of gene activity and protein localization, becoming the "GFP of ultrasound" (see Sections 11.12, 12.7–12.10, 17.2, and 18.7–18.10 for the use of fluorescent proteins as reporter molecules). Mammalian gene promoters can drive expression of the stretch of the bacterial chromosome that encodes all the proteins (there are 10) needed for gas vesicle production in mammalian cell culture. Using these contrast agents, ultrasound can determine the amount and localization of gene expression driven by a promoter deep inside a tissue. They are also biosensors, revealing specific forms of enzyme activity. For example, engineering the GvpC protein with specific protease cleavage sites reveals the location of tissue proteases because gas vesicles with less GvpC buckle more easily and emit non-linear echoes that differ from other reflected signals (Figure 14.21). A cross-amplitude modulation (x-AM) pulse sequence detects the non-linear scattering from buckling gas vesicles and subtracts the linear B-mode signal from each wave.

A treatment modality under current research targets microbubbles or gas vesicles to tumors or to the blood–brain barrier. Ultrasound frequencies that burst (cavitate) the bubbles open the blood–brain barrier or disrupt the tumor, releasing oxygenated gas or other drugs that can improve or replace radiation therapies.
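The dependence of resonance frequency on bubble size can be illustrated with the Minnaert resonance of a free (uncoated) gas bubble. This is a rough, assumed approximation, not the text's model: the lipid or albumin shell and the specific gas shift the real resonance, but the order of magnitude shows why capillary-sized microbubbles resonate in the diagnostic frequency range.

```python
# Rough estimate: Minnaert resonance of an uncoated gas bubble in water (an assumption;
# shelled clinical microbubbles resonate at somewhat different frequencies).
import math

def minnaert_resonance_hz(radius_m: float, p0: float = 101325.0,
                          rho: float = 1000.0, gamma: float = 1.4) -> float:
    """Resonance frequency (Hz) of a free gas bubble: f0 = sqrt(3*gamma*p0/rho) / (2*pi*R0)."""
    return math.sqrt(3.0 * gamma * p0 / rho) / (2.0 * math.pi * radius_m)

# A bubble small enough to pass a 6-um capillary (say, 1.5-um radius) resonates near 2 MHz,
# squarely within the diagnostic ultrasound range, as the text notes.
print(round(minnaert_resonance_hz(1.5e-6) / 1e6, 1))  # ~2.2 (MHz)
```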
Figure 14.20 Microvasculature of a rat brain cortex following microbubble injection, imaged through a skull window. (A) A microbubble density map with a spatial resolution of λ/10 (pixel size = 8 μm × 10 μm). (B) The same region as in A using conventional power Doppler imaging. (C) Intensity profiles along the lines marked in A, showing 9-μm vessels (2) and resolution of two vessels separated by less than 16 μm (3). au, arbitrary units. Errico, C. et al. 2015 / Springer Nature.
Figure 14.21 Gas vesicle reporter of calpain activity in the presence and absence of calcium. (A) Linear B-scan taken at 132 kPa of an agarose phantom containing gas vesicles (OD500 2.2) engineered with a calpain-sensitive site in gas vesicle protein C (GvpC). Scale bar, 1 μm. (B) Same as A with cross-amplitude modulation (x-AM) images taken at 425 kPa. The signal from the cross-amplitude modulation pulses at the higher acoustic pressure was greater for calpain-treated vesicles in the presence than in the absence of the required calcium cofactor. The color bars represent relative ultrasound signal intensity on the decibel scale. (C) The solid curves represent the mean, and the error bars indicate the standard error, of the signal from the gas vesicles at increasing acoustic pressures. au, arbitrary units. Lakshmanan, A. et al. 2020 / Springer Nature.
Annotated Images, Video, Web Sites, and References

14.1 Ultrasonography Is a Cheap, High-Resolution, Deep-Penetration, Non-invasive Imaging Modality
General treatments of ultrasound: Suetens, P. 2009. Fundamentals of Medical Imaging. Second Edition. Cambridge University Press, Cambridge, UK. doi: 10.1017/CBO9780511596803. Introduces math concepts with some proofs. Prince, J.L. and Links, J.M. 2014. Medical Imaging Signals and Systems. Second Edition. Pearson Education, Upper Saddle River, NJ. An engineering- and physics-oriented text at the graduate level. Smith, N. and Webb, A. 2011. Introduction to Medical Imaging. Physics, Engineering and Clinical Applications. Cambridge University Press, Cambridge, UK. An accessible text for undergraduates. A comprehensive review of ultrasound modalities for analysis of nerve tissues is Rabut, C., Yoo, S., Hurt, R.C., et al. 2020. Ultrasound technologies for imaging and modulating neural activity. Neuron 108: 93–110. New techniques of neuromodulation with ultrasound may give rise to new therapies. Discussion of the therapeutic value of ultrasound is in Miller, D., Smith, N.B., Bailey, M.R., et al. 2012. Overview of therapeutic ultrasound applications and safety considerations. Journal of Ultrasound in Medicine 31: 623–634.
14.2 Many Species Use Ultrasound and Infrasound for Communication and Detection
Over 1000 species echolocate: https://www.nationalgeographic.com/animals/article/echolocation-is-nature-built-in-sonar-here-is-how-it-works.
14.3 Sound Is a Compression, or Pressure, Wave
Pressure waves are traveling waves, moving outward from a source, as distinguished from standing waves.
14.4 Measurement of Sound Intensity Uses Decibels
There are several sound level measurement apps available for both Android and iOS cell phones.
14.5 A Piezoelectric Transducer Is Used to Create the Ultrasound Wave
Besides measuring sound waves with piezoelectric pressure transducers, optical techniques are also available: Wissmeyer, G., Pleitez, M.A., Rosenthal, A., et al. 2018. Looking at sound: opto-acoustics with all-optical ultrasound detection. Light: Science & Applications 7: 53.
14.6 Different Tissues Have Different Acoustic Impedance
The unit for impedance is the rayl. In MKS units, 1 rayl equals 1 pascal-second per meter (Pa·s·m−1), or equivalently 1 newton-second per cubic meter (N·s·m−3). In SI base units, that is kg∙s−1∙m−2. The non-equivalent centimeter-gram-second (CGS) rayl is ten times larger than the MKS rayl, which can be confusing. Matching layers typically have a value between 3 and 15 Mrayls, in between the impedance of the source (33 Mrayls) and the tissue (1.5 Mrayls).
14.7 Sonic Wave Scatter Generates Speckle
Although speckle can help identify tissues, it also produces noisy images. Image processing approaches can reduce speckle: Zhu, L., Fu, C.-W., Brown, M.S., et al. 2017. A non-local low-rank framework for ultrasound speckle reduction. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 493–501. doi: 10.1109/CVPR.2017.60.
14.8 Lateral Resolution Depends on Sound Frequency and the Size and Focal Length of the Transducer Elements
Coherent planar wave imaging (see Section 14.11), with acquisition from multiple angles, has a lateral resolution (without contrast agents) about as good as standard B-mode, roughly 0.7 mm. See Madiena, C., Faurie, J., Poree, J., et al. 2018. Color and vector flow imaging in parallel ultrasound with sub-Nyquist sampling. IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control 65: 795–802.
14.9 Axial Resolution Depends on the Duration of the Ultrasound Pulse
Contrast-to-noise, axial resolution, and lateral resolution analyses of advanced ultrasound imaging, including high-frequency imaging, are in Sassaroli, E., Crake, C., Scorza, A., et al. 2019. Image quality evaluation of ultrasound imaging systems: advanced B-modes. Journal of Applied Clinical Medical Physics 20: 115–124. As with standard B-mode imaging, axial resolution is always better than lateral resolution.
14.10 Scatter and Absorption by Tissues Attenuate the Ultrasound Beam
The Food and Drug Administration has a good web site on ultrasound safety: https://www.fda.gov/radiation-emitting-products/medical-imaging/ultrasound-imaging.
14.11 Amplitude Mode, Motion Mode, Brightness Mode, and Coherent Planar Wave Mode Are the Standard Modes for Clinical Practice
One of the earlier papers on coherent planar wave mode is Montaldo, G., Tanter, M., Bercoff, J., et al. 2009. Coherent plane-wave compounding for very high frame rate ultrasonography and transient elastography. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 56: 489–506. An interesting review of 3D ultrasound diagnosis of fetal abnormalities is Gonçalves, L.F. 2016. Three-dimensional ultrasound of the fetus: how does it help? Pediatric Radiology 46: 177–189.
14.12 Doppler Scans of Moving Red Blood Cells Reveal Changes in Vascular Flows with Time and Provide the Basis for Functional Ultrasound Imaging
A review of fUSI is in Deffieux, T., Demene, C., Pernot, M., et al. 2018. Functional ultrasound neuroimaging: a review of the preclinical and clinical state of the art. Current Opinion in Neurobiology 50: 128–135. Application of fUSI for analysis of pain states in arthritic animals is in Rajul, L., Thibaut, M., Rivals, I., et al. 2020. Ultrafast ultrasound imaging pattern analysis reveals distinctive dynamic brain states and potent sub-network alterations in arthritic animals. Scientific Reports 10: 10485. https://doi.org/10.1038/s41598-020-66967-x.
14.13 Microbubbles and Gas Vesicles Provide Ultrasound Contrast and Have Therapeutic Potential
An example of a clinical application of contrast-enhanced ultrasound imaging is the identification of ectopic pregnancies in cesarean scar tissue: Xiong, X., Yan, P., Gao, C., et al. 2016. The value of contrast-enhanced ultrasound in the diagnosis of cesarean scar pregnancy. BioMed Research International 2016: 4762785. http://dx.doi.org/10.1155/2016/4762785. A review of gas vesicle engineering is in Lakshmanan, A., Farhadi, A., Nety, S.P., et al. 2016. Molecular engineering of acoustic protein nanostructures. ACS Nano 10: 7314–7322. Demonstration of the use of non-linear X-wave detection of gas vesicles is in Maresca, D., Sawyer, D.P., Renaud, G., et al. 2018. Nonlinear X-wave ultrasound imaging of acoustic biomolecules. Physical Review X 8. doi: 10.1103/physrevx.8.041002. Generation of acoustic holograms and microbubble patterning using the acoustic equivalent of a spatial light modulator (see Section 4.8) is in Ma, Z., Melde, K., Athanassiadis, A.G., et al. 2020. Spatial ultrasound modulation by digitally controlling microbubble arrays. Nature Communications 11: 4537. doi: 10.1038/s41467-020-18347-2. This technology will not only read ultrasound biosensors but also position them.
15 Magnetic Resonance Imaging

15.1 Magnetic Resonance Imaging, Like Ultrasound, Performs Non-invasive Analysis without Ionizing Radiation

Magnetic resonance imaging (MRI), like ultrasound imaging, non-invasively images internal organs. Ultrasound brain imaging and functional ultrasound imaging (fUSI; see Section 14.12) are only possible in infants with incompletely formed skulls or following surgery to remove the skull in adults, whereas brain imaging is one of the main uses of MRI. Firm outer coverings, such as the bony cranium of the head, diminish the penetration of an ultrasound signal because much of the signal reflects back at the hard boundary (see Section 14.1). Unlike the longitudinal compression waves of ultrasound, the electromagnetic radiation of MRI easily penetrates bone and hard tissues. Consequently, MRI can image the interior of the skull (Figure 15.1). The spatial resolution of conventional (high-field) MRI is about the same as that of ultrasound without contrast agents, about 1 mm in the axial (z) and lateral (x-y) dimensions. Consequently, neither modality can track single neurons, which are less than 1% of 1 mm in diameter.
High Toxicity (i.e., 20% of the cells die within 1 hour)
Basic fuchsin | 40 | Nuclei – purple
Janus green B | 6 | Mitochondria – green
Methyl violet 10B (crystal violet, gentian violet) | 2 | Active ingredient in Gram stain [gram-positive bacteria take it up, becoming red-purple, and die (except streptococci)]
Safranin | 111 | Counterstain in Gram stain – stains endospores; stains lignified walls bright red in plants; stains cartilage and mucin orange to red and nuclei black
Low Toxicity (