HANDBOOK OF MEDICAL IMAGING
Dr. William Brody President Johns Hopkins University
Dr. Elias Zerhouni Chairman, Department of Radiology and Radiological Science Johns Hopkins Medical Institutions
Dr. Rangaraj M. Rangayyan Department of Electrical and Computer Engineering University of Calgary
Dr. Richard A. Robb Director, Biomedical Imaging Resource Mayo Foundation
Dr. Roger P. Woods Division of Brain Mapping UCLA School of Medicine
Dr. H. K. Huang Department of Radiology Children's Hospital of Los Angeles/ University of Southern California
The focus of this series will be twofold. First, the series will produce a set of core texts/references for biomedical engineering undergraduate and graduate courses. With biomedical engineers coming from a variety of engineering and biomedical backgrounds, it will be necessary to create new cross-disciplinary teaching and self-study books. Second, the series will also develop handbooks for each of the major subject areas of biomedical engineering. Joseph Bronzino, the series editor, is one of the most renowned biomedical engineers in the world. He is the Vernon Roosa Professor of Applied Science at Trinity College in Hartford, Connecticut.
HANDBOOK OF MEDICAL IMAGING
PROCESSING AND ANALYSIS
Editor-in-Chief
Isaac N. Bankman, PhD Applied Physics Laboratory Johns Hopkins University Laurel, Maryland
San Diego / San Francisco / New York / Boston / London / Sydney / Tokyo
This book is printed on acid-free paper.
Copyright © 2000 by Academic Press. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Requests for permission to make copies of any part of the work should be mailed to: Permissions Department, Harcourt, Inc., 6277 Sea Harbor Drive, Orlando, Florida 32887-6777.
ACADEMIC PRESS, 525 B Street, Suite 1900, San Diego, CA 92101-4495, USA (http://www.academicpress.com)
Academic Press, Harcourt Place, 32 Jamestown Road, London NW1 7BY, UK
Library of Congress Catalog Card Number: 00-101315
International Standard Book Number: 0-12-077790-8
Printed in the United States of America
00 01 02 03 04 COB 9 8 7 6 5 4 3 2 1
Contents

Foreword  ix
Preface  xi
Contributors  xiii

I Enhancement
1 Fundamental Enhancement Techniques  Raman B. Paranjape  3
2 Adaptive Image Filtering  Carl-Fredrik Westin, Hans Knutsson, and Ron Kikinis  19
3 Enhancement by Multiscale Nonlinear Operators  Andrew Laine and Walter Huda  33
4 Medical Image Enhancement with Hybrid Filters  Wei Qian  57

II Segmentation
5 Overview and Fundamentals of Medical Image Segmentation  Jadwiga Rogowska  69
6 Image Segmentation by Fuzzy Clustering: Methods and Issues  Melanie A. Sutton, James C. Bezdek, and Tobias C. Cahoon  87
7 Segmentation with Neural Networks  Axel Wismüller, Frank Vietze, and Dominik R. Dersch  107
8 Deformable Models  Tim McInerney and Demetri Terzopoulos  127
9 Shape Constraints in Deformable Models  Lawrence H. Staib, Xiaolan Zeng, James S. Duncan, Robert T. Schultz, and Amit Chakraborty  147
10 Gradient Vector Flow Deformable Models  Chenyang Xu and Jerry L. Prince  159
11 Fully Automated Hybrid Segmentation of the Brain  M. Stella Atkins and Blair T. Mackiewich  171
12 Volumetric Segmentation  Alberto F. Goldszal and Dzung L. Pham  185
13 Partial Volume Segmentation with Voxel Histograms  David H. Laidlaw, Kurt W. Fleischer, and Alan H. Barr  195

III Quantification
14 Two-Dimensional Shape and Texture Quantification  Isaac N. Bankman, Thomas S. Spisz, and Sotiris Pavlopoulos  215
15 Texture Analysis in Three Dimensions as a Cue to Medical Diagnosis  Vassili A. Kovalev and Maria Petrou  231
16 Computational Neuroanatomy Using Shape Transformations  Christos Davatzikos  249
17 Arterial Tree Morphometry  Roger Johnson  261
18 Image-Based Computational Biomechanics of the Musculoskeletal System  Edmund Y. Chao, N. Inoue, J.J. Elias, and F.J. Frassica  285
19 Three-Dimensional Bone Angle Quantification  Jens A. Richolt, Nobuhiko Hata, Ron Kikinis, Jens Kordelle, and Michael B. Millis  299
20 Database Selection and Feature Extraction for Neural Networks  Bin Zheng  311
21 Quantitative Image Analysis for Estimation of Breast Cancer Risk  Martin J. Yaffe, Jeffrey W. Byng, and Norman F. Boyd  323
22 Classification of Breast Lesions in Mammograms  Yulei Jiang  341
23 Quantitative Analysis of Cardiac Function  Osman Ratib  359
24 Image Processing and Analysis in Tagged Cardiac MRI  William S. Kerwin, Nael F. Osman, and Jerry L. Prince  375
25 Image Interpolation and Resampling  Philippe Thévenaz, Thierry Blu, and Michael Unser  393

IV Registration
26 Physical Basis of Spatial Distortions in Magnetic Resonance Images  Peter Jezzard  425
27 Physical and Biological Bases of Spatial Distortions in Positron Emission Tomography Images  Magnus Dahlbom and Sung-Cheng (Henry) Huang  439
28 Biological Underpinnings of Anatomic Consistency and Variability in the Human Brain  N. Tzourio-Mazoyer, F. Crivello, M. Joliot, and B. Mazoyer  449
29 Spatial Transformation Models  Roger P. Woods  465
30 Validation of Registration Accuracy  Roger P. Woods  491
31 Landmark-Based Registration Using Features Identified Through Differential Geometry  Xavier Pennec, Nicholas Ayache, and Jean-Philippe Thirion  499
32 Image Registration Using Chamfer Matching  Marcel Van Herk  515
33 Within-Modality Registration Using Intensity-Based Cost Functions  Roger P. Woods  529
34 Across-Modality Registration Using Intensity-Based Cost Functions  Derek L.G. Hill and David J. Hawkes  537
35 Talairach Space as a Tool for Intersubject Standardization in the Brain  Jack L. Lancaster and Peter T. Fox  555
36 Warping Strategies for Intersubject Registration  Paul M. Thompson and Arthur W. Toga  569
37 Optimizing the Resampling of Registered Images  William F. Eddy and Terence K. Young  603
38 Clinical Applications of Image Registration  Robert Knowlton  613
39 Registration for Image-Guided Surgery  Eric Grimson and Ron Kikinis  623
40 Image Registration and the Construction of Multidimensional Brain Atlases  Arthur W. Toga and Paul M. Thompson  635

V Visualization
41 Visualization Pathways in Biomedicine  Meiyappan Solaiyappan  659
42 Three-Dimensional Visualization in Medicine and Biology  Richard A. Robb  685
43 Volume Visualization in Medicine  Arie E. Kaufman  713
44 Fast Isosurface Extraction Methods for Large Image Data Sets  Yarden Livnat, Steven G. Parker, and Christopher R. Johnson  731
45 Morphometric Methods for Virtual Endoscopy  Ronald M. Summers  747

VI Compression Storage and Communication
46 Fundamentals and Standards of Compression and Communication  Stephen P. Yanek, Quentin E. Dolecek, Robert L. Holland, and Joan E. Fetter  759
47 Medical Image Archive and Retrieval  Albert Wong and Shyh-Liang Lou  771
48 Image Standardization in PACS  Ewa Pietka  783
49 Quality Evaluation for Compressed Medical Images: Fundamentals  Pamela Cosman, Robert Gray, and Richard Olshen  803
50 Quality Evaluation for Compressed Medical Images: Diagnostic Accuracy  Pamela Cosman, Robert Gray, and Richard Olshen  821
51 Quality Evaluation for Compressed Medical Images: Statistical Issues  Pamela Cosman, Robert Gray, and Richard Olshen  841
52 Three-Dimensional Image Compression with Wavelet Transforms  Jun Wang and H.K. Huang  851

53 Medical Image Processing and Analysis Software  Thomas S. Spisz and Isaac N. Bankman  863

Index  895
Foreword

Preface
2 64) can be explained by load imbalances and the time required to synchronize processors at the required frame rate. The ef®ciencies would be higher for a larger image. Table 2 shows the improvements that were obtained through the data bricking and spatial hierarchy optimizations.
5 Example Applications
In this last section, we give examples of the NOISE, view-dependent, and real-time ray tracing algorithms given in the three previous sections. The examples we chose are from large medical imaging data sets, as well as from a large-scale geoscience imaging data set. Further examples of each of the algorithms, along with detailed performance analyses, can be found in our recent papers on isosurface extraction [16, 25, 29, 35, 36]. Figure 15 shows NOISE examples; Fig. 16 shows view-dependent examples; and Figs. 17 and 18 show examples of real-time ray tracing.
TABLE 2  Times in seconds for optimizations for ray tracing the visible human

View               Initial    Bricking    Hierarchy + bricking
Skin: front         1.41       1.27        0.53
Bone: front         2.35       2.07        0.52
Bone: close         3.61       3.52        0.76
Bone: from feet    26.1        5.8         0.62

A 512 × 512 image was generated on 16 processors using a single view of an isosurface.
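The data bricking optimization referred to in Table 2 reorders a volume so that voxels that are close together in 3D are also close together in memory, which improves cache behavior during ray traversal. The sketch below shows one simple way to rearrange a volume into fixed-size bricks; the brick edge length is an illustrative choice, and this is only a schematic of the idea rather than the implementation that was benchmarked above.

```python
import numpy as np

def brick_volume(volume, b=8):
    """Copy a 3D volume into (b x b x b) bricks stored contiguously.
    The result is indexed as [bz, by, bx, z, y, x], so all voxels of one
    brick occupy one contiguous block of memory."""
    # Pad so every dimension is a multiple of the brick size.
    pads = [(0, (-s) % b) for s in volume.shape]
    v = np.pad(volume, pads, mode="edge")
    nz, ny, nx = (s // b for s in v.shape)
    bricks = (v.reshape(nz, b, ny, b, nx, b)
                .transpose(0, 2, 4, 1, 3, 5)
                .copy())              # .copy() makes each brick contiguous
    return bricks

def sample(bricks, z, y, x, b=8):
    """Look up a voxel by its original coordinates in the bricked layout."""
    return bricks[z // b, y // b, x // b, z % b, y % b, x % b]
```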
FIGURE 15 Mantle convection modeling that was done on a Cray T3D at the Advanced Computing Laboratory at Los Alamos National Laboratory using more than 20 million elements. The left image shows a single hot isosurface, while the second adds a cold (blue) transparent isosurface. The third image shows a slice through the mantle with a single isosurface. See also Plate 133.
FIGURE 16 Full vs view-dependent isosurface extraction. The isosurfaces were computed based on a user point of view that was above and behind the skull. These images illustrate the large portions of the isosurface that the view-dependent algorithm was able to avoid. See also Plate 134.
FIGURE 17 Ray tracings of the bone and skin isosurfaces of the visible woman. See also Plate 135.
FIGURE 18 A ray tracing with and without shadows. See also Plate 136.
Acknowledgments
This work was supported in part by awards from the Department of Energy and the National Science Foundation. The authors thank Peter Shirley, Chuck Hansen, Han-Wei Shen, and Peter-Pike Sloan for their significant contributions to the research presented in this chapter. The Visible Woman data set was obtained from the Visible Human Project of the National Library of Medicine.
References
1. B. Wyvill, G. Wyvill, C. McPheeters. Data structures for soft objects. The Visual Computer, 2:227–234, 1986.
2. William E. Lorensen and Harvey E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. Computer Graphics, 21(4):163–169, July 1987. ACM Siggraph '87 Conference Proceedings.
3. J. Wilhelms and A. Van Gelder. Octrees for faster isosurface generation. Computer Graphics, 24(5):57–62, November 1990.
4. J. Wilhelms and A. Van Gelder. Octrees for faster isosurface generation. ACM Transactions on Graphics, 11(3):201–227, July 1992.
5. T. Itoh and K. Koyamada. Isosurface generation by using extrema graphs. In Visualization '94, pages 77–83. IEEE Computer Society Press, Los Alamitos, CA, 1994.
6. T. Itoh, Y. Yamaguchi, and K. Koyamada. Volume thinning for automatic isosurface propagation. In Visualization '96, pages 303–310. IEEE Computer Society Press, Los Alamitos, CA, 1996.
7. R. S. Gallagher. Span filter: An optimization scheme for volume visualization of large finite element models. In Proceedings of Visualization '91, pages 68–75. IEEE Computer Society Press, Los Alamitos, CA, 1991.
8. M. Giles and R. Haimes. Advanced interactive visualization for CFD. Computer Systems in Engineering, 1(1):51–62, 1990.
9. H. Shen and C. R. Johnson. Sweeping simplices: A fast isosurface extraction algorithm for unstructured grids. Proceedings of Visualization '95, pages 143–150. IEEE Computer Society Press, Los Alamitos, CA, 1995.
10. J. L. Bentley. Multidimensional binary search trees used for associative search. Communications of the ACM, 18(9):509–516, 1975.
11. M. Blum, R. W. Floyd, V. Pratt, R. L. Rivest, and R. E. Tarjan. Time bounds for selection. J. Computer and System Science, 7:448–461, 1973.
12. R. Sedgewick. Algorithms in C. Addison-Wesley, Reading, MA, 1992.
13. D. T. Lee and C. K. Wong. Worst-case analysis for region and partial region searches in multidimensional binary search trees and balanced quad trees. Acta Informatica, 9(23):23–29, 1977.
14. J. L. Bentley and D. F. Stanat. Analysis of range searches in quad trees. Info. Proc. Lett., 3(6):170–173, 1975.
15. P. Cignoni, C. Montani, E. Puppo, and R. Scopigno. Optimal isosurface extraction from irregular volume data. In Proceedings of IEEE 1996 Symposium on Volume Visualization. ACM Press, 1996.
16. H. Shen, C. D. Hansen, Y. Livnat, and C. R. Johnson. Isosurfacing in span space with utmost efficiency (ISSUE). In Proceedings of Visualization '96, pages 287–294. IEEE Computer Society Press, Los Alamitos, CA, 1996.
17. Ned Greene. Hierarchical polygon tiling with coverage masks. In Computer Graphics, Annual Conference Series, pages 65–74, August 1996.
18. H. E. Cline, W. E. Lorensen, and S. Ludke. Two algorithms for the three-dimensional reconstruction of tomograms. Medical Physics, 15(3):320–327, 1988.
19. Philippe Lacroute and Mark Levoy. Fast volume rendering using a shear-warp factorization of the viewing transformation. In Computer Graphics, Annual Conference Series, pages 451–458. ACM SIGGRAPH, 1994.
20. Philippe G. Lacroute. Fast volume rendering using shear-warp factorization of the viewing transformation. Technical Report, Stanford University, September 1995.
21. Chyi-Cheng Lin and Yu-Tai Ching. An efficient volume-rendering algorithm with an analytic approach. The Visual Computer, 12(10):515–526, 1996.
22. Stephen Marschner and Richard Lobb. An evaluation of reconstruction filters for volume rendering. In Proceedings of Visualization '94, pages 100–107, October 1994.
23. Milos Sramek. Fast surface rendering from raster data by voxel traversal using chessboard distance. In Proceedings of Visualization '94, pages 188–195, October 1994.
24. James T. Kajiya. An overview and comparison of rendering methods. A Consumer's and Developer's Guide to Image Synthesis, pages 259–263, 1988. ACM Siggraph '88 Course 12 Notes.
25. Y. Livnat and C. D. Hansen. View dependent isosurface extraction. In IEEE Visualization '98, pages 175–180. IEEE Computer Society, Oct. 1998.
26. Mark Levoy. Display of surfaces from volume data. IEEE Computer Graphics and Applications, 8(3):29–37, 1988.
27. Paolo Sabella. A rendering algorithm for visualizing 3D scalar fields. Computer Graphics, 22(4):51–58, July 1988. ACM Siggraph '88 Conference Proceedings.
28. Craig Upson and Michael Keeler. V-buffer: Visible volume rendering. Computer Graphics, 22(4):59–64, July 1988. ACM Siggraph '88 Conference Proceedings.
29. Steven Parker, Peter Shirley, Yarden Livnat, Charles Hansen, and Peter-Pike Sloan. Interactive ray tracing for isosurface rendering. In Proceedings of Visualization '98, October 1998.
30. John Amanatides and Andrew Woo. A fast voxel traversal algorithm for ray tracing. In Eurographics '87, 1987.
31. Michael B. Cox and David Ellsworth. Application-controlled demand paging for out-of-core visualization. In Proceedings of Visualization '97, pages 235–244, October 1997.
32. James Arvo and David Kirk. A survey of ray tracing acceleration techniques. In Andrew S. Glassner, editor, An Introduction to Ray Tracing. Academic Press, San Diego, CA, 1989.
33. Al Globus. Octree optimization. Technical Report RNR-90-011, NASA Ames Research Center, July 1990.
34. Scott Whitman. Multiprocessor Methods for Computer Graphics Rendering. Jones and Bartlett Publishers, Sudbury, MA, 1992.
35. Y. Livnat, H. Shen, and C. R. Johnson. A near optimal isosurface extraction algorithm using the span space. IEEE Trans. Vis. Comp. Graphics, 2(1):73–84, 1996.
36. J. Painter, H. P. Bunge, and Y. Livnat. Case study: Mantle convection visualization on the Cray T3D. In Proceedings of IEEE Visualization '96. IEEE Press, Oct. 1996.
45 Morphometric Methods for Virtual Endoscopy

Ronald M. Summers
National Institutes of Health

1 Overview of Virtual Endoscopy  747
2 Current Problems in Virtual Endoscopy  748
3 Shape-Based Detection of Endoluminal Lesions Using Curvature Analysis  748
   3.1 Clinical Application of Shape-Based Lesion Detection
4 Fractal Measures of Roughness  750
   4.1 Clinical Application of Fractal Measures of Roughness
5 Conclusions  753
References  754
1 Overview of Virtual Endoscopy

Virtual endoscopy (VE) is a novel display method for three-dimensional medical imaging data. It produces endoscope-like displays of the interior of hollow anatomic structures such as airways, the gastrointestinal tract, and blood vessels [1–5]. For example, Fig. 1 shows a virtual bronchoscopy reconstruction of the air passages of a human lung with depiction of bronchi as small as fifth and sixth order. Studies have shown virtual endoscopy to be useful for the visualization of morphologic abnormalities such as aneurysms, tumors, and stenoses [6–8].
VE displays are usually produced from planar 2D computed tomography (CT) or magnetic resonance (MR) images using surface or volume rendering [9, 10]. Surface rendering is generally done by taking an isosurface through the imaging volume at a specified threshold value; this generates contours in three dimensions that are analogous to the two-dimensional isocontours of temperature or pressure (isotherms, isobars) on weather maps. In the case of VE, the contours represent the wall of a hollow anatomic structure, such as airways or blood vessels. One commonly used isosurface algorithm called "marching cubes" generates a triangular tessellation of the isosurface suitable for interactive display on computers equipped with graphics accelerators [11]. Disadvantages of surface rendering are that a complex segmentation must be done as a preprocessing step and only a fraction of the data is retained in the final image.
In contrast to surface rendering, volume rendering is done by considering the imaging volume to be a translucent gelatin
whose optical density and opacity are mapped to each voxel intensity through user-adjustable transfer functions. Volume rendering overcomes a disadvantage of surface rendering in that segmentation is not required. However, volume rendering is typically not interactive unless very expensive computers are used. In addition, the transfer functions can span a wide range of values; this freedom offers considerable flexibility, but it can be difficult to identify the correct choices to produce an accurate image. Some progress has been made toward defining an appropriate transfer function for virtual colonoscopy [12].
Since VE is used to visualize small structures such as airways, blood vessels, and colonic polyps whose size may be 1 cm or less, image data with small voxel dimensions, desirably 1 mm³ or less, is required to generate a VE. The smallest pathologic structure visualized on a VE will be on the order of the voxel resolution, which is as large as or larger than the voxel dimension. The voxel resolution depends on a host of adjustable parameters (e.g., for CT: helical pitch, reconstruction algorithm, section index) [13].
Presently, VE is viewed as either static images or as a movie-like "fly-through" that simulates conventional endoscopy. The lack of physical feedback necessitates new tools to orient the observer. Examples of such tools include navigation aids to integrate cross-sectional images with the VE image, centerline computation for automated flight planning, unraveling of the colon to ease polyp identification, and cockpit displays to provide greater visual coverage of the wall of the lumen and reduce blind spots [14–18].
Clinical uses for VE presently include detection of many of the same abnormalities for which conventional endoscopy is indicated, for example virtual colonoscopy to detect colonic polyps, virtual bronchoscopy to detect bronchial stenoses and to guide biopsies, and virtual angioscopy to detect vascular pathology [19–22]. Relative to conventional endoscopy, VE's main benefits are its noninvasiveness and ability to integrate information about both the lumen and extraluminal structures into a single image. Although VE seeks to emulate conventional endoscopy, which has proven to be a powerful diagnostic and therapeutic tool, VE may surpass conventional endoscopy by solving some of the problems of conventional endoscopy (underutilization due to expense and invasiveness). In this context, VE may play a role in screening a general patient population for the presence of disease and serial evaluation to detect disease recurrence in already affected individuals. Screening would greatly expand the number of VE studies performed.

FIGURE 1  Images derived from CT scans of autopsy human lung specimen. (A) Coronal multiplanar reformatted image shows excellent resolution required to depict small airways. Airways are the black branching structures (arrow). The three lobes of the right lung (left side of image) and two lobes of the left lung are shown. (B) Anteroposterior view of three-dimensional surface reconstruction of airways showing exquisite depiction of branching to fifth and sixth order. (C) Virtual bronchoscopy view of bifurcation of a fifth order bronchus only 2 or 3 mm in diameter, at or near furthest point reachable by a conventional bronchoscope.
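As a concrete illustration of the surface rendering approach described earlier in this overview, the short sketch below extracts an isosurface from a CT volume with a marching cubes implementation from the scikit-image library. The library choice, file name, threshold, and voxel spacing are not part of the chapter and are placeholders that would need to be set for a real data set.

```python
import numpy as np
from skimage import measure  # marching cubes implementation

# Load a CT volume (hypothetical file and Hounsfield-unit threshold).
volume = np.load("ct_volume.npy")      # shape (nz, ny, nx), placeholder file
spacing = (1.0, 0.7, 0.7)              # voxel size in mm (z, y, x), assumed
air_threshold = -500                   # HU level separating airway lumen from wall

# Marching cubes returns a triangular tessellation of the isosurface,
# suitable for interactive display on graphics hardware.
verts, faces, normals, values = measure.marching_cubes(
    volume, level=air_threshold, spacing=spacing)

print(f"{len(verts)} vertices and {len(faces)} triangles on the isosurface")
```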
2 Current Problems in Virtual Endoscopy

Important tasks of clinical diagnosis are to detect lesions and determine their significance. Although a number of studies have shown that lesions can be detected using VE, a number of roadblocks to more widespread acceptance of VE have been identified. Interpretation of a VE can be inefficient, tedious, and time consuming, which can lead to fatigue, misdiagnoses, and decreased throughput; lesion identification can be difficult in some circumstances (for example, on virtual colonoscopy, polyps can be missed if they lie behind haustral folds; retained feces can simulate masses); and the utility of VE appears to be limited to focal abnormalities such as stenoses and tumors.
There are a number of approaches to solving the problem of inefficiency. Automated lesion detection software could direct physicians to sites likely to harbor lesions, thereby facilitating interpretation. Such software may reduce the number of false negatives by identifying lesions that could be missed on an automated fly-through, such as polyps hidden behind folds. The problem of identifying more diffuse abnormalities (e.g., atherosclerosis and inflammation can present as focal or diffuse disease) can be addressed by the development of new algorithms to analyze the VE. These problems fall into the category of morphometric analysis of VE reconstructions. We have previously described simple morphometric methods in VE (such as size measurement) and potential pitfalls (e.g., z-axis broadening, distortions due to perspective rendering) [9, 13]. In this chapter, we review more sophisticated morphometric approaches (curvature and fractal analyses) that address the problems described above [23–25].
3 Shape-Based Detection of Endoluminal Lesions Using Curvature Analysis

Detection of endoluminal lesions (those that distort the wall of a hollow anatomic structure or protrude into its lumen) such as polypoid masses of the airway and colon is one important task of diagnostic imaging [26]. The routine use of thin CT and MR sections allows for detection of small lesions with VE. We
have found that curvature-based shape analysis can detect endobronchial lesions [23, 24]. This analysis uses the principal curvatures of the VE surface to segment it into areas of different shape. Lesions that protrude into the lumen are identified as areas of the luminal surface with "elliptical curvature" of the "peak subtype." We now describe a method we have used for computing curvatures of gray-scale image data.

For a gray-scale image I(x, y, z), the local shape of an isosurface at a point is described by the Gaussian (K) and mean (H) curvatures, which can be computed from

$$ K = \frac{1}{h^2}\Big[ I_x^2\,(I_{yy}I_{zz} - I_{yz}^2) + 2 I_y I_z (I_{xz}I_{xy} - I_{xx}I_{yz}) + I_y^2\,(I_{xx}I_{zz} - I_{xz}^2) + 2 I_x I_z (I_{yz}I_{xy} - I_{yy}I_{xz}) + I_z^2\,(I_{xx}I_{yy} - I_{xy}^2) + 2 I_x I_y (I_{xz}I_{yz} - I_{zz}I_{xy}) \Big] \tag{1} $$

$$ H = \frac{1}{2 h^{3/2}}\Big[ I_x^2\,(I_{yy} + I_{zz}) - 2 I_y I_z I_{yz} + I_y^2\,(I_{xx} + I_{zz}) - 2 I_x I_z I_{xz} + I_z^2\,(I_{xx} + I_{yy}) - 2 I_x I_y I_{xy} \Big] \tag{2} $$

where I_x indicates the partial derivative of the image data with respect to x, I_{xz} indicates the mixed partial derivative with respect to x and z, etc., and h = I_x^2 + I_y^2 + I_z^2 [27]. The maximum (k_max) and minimum (k_min) principal curvatures of a surface at a point P are given by

$$ k_{max} = H + \sqrt{H^2 - K} \tag{3a} $$
$$ k_{min} = H - \sqrt{H^2 - K} \tag{3b} $$

They can be thought of as the reciprocal of the radii of the smallest and largest circles, respectively, that can be placed tangent to P. The greater the curvature, the smaller the osculating circle; the less the curvature, the flatter the surface and the larger the circle.

Equations (1) and (2) require first- and second-order partial derivatives, which are known to be susceptible to noise. To ameliorate this problem, filters can be used to smooth the data and reduce the undesirable effects of noise. Monga and Benayoun used the 3D Deriche filters to both smooth and compute derivatives [28]. The functions f_0, f_1, f_2 are used to smooth and compute first- and second-order partial derivatives, respectively:

$$ f_0(x) = c_0\,(1 + a|x|)\,e^{-a|x|}, \qquad f_1(x) = c_1\,x\,a^2\,e^{-a|x|}, \qquad f_2(x) = c_2\,(1 - c_3\,a|x|)\,e^{-a|x|} \tag{4} $$

These functions are applied to the 3D image data as convolution filters. For efficiency, the functions are applied only to points lying on the desired isosurface within the image. The required partial derivatives are computed using, for example,

$$ I_x = f_1(x) f_0(y) f_0(z) * I(x,y,z), \qquad I_{xx} = f_2(x) f_0(y) f_0(z) * I(x,y,z), \qquad I_{xy} = f_1(x) f_1(y) f_0(z) * I(x,y,z) \tag{5} $$

The parameter a in Eq. (4) is inversely proportional to the width of the filter and the amount of smoothing that is performed. We set a to 1.0 for f_0 and f_1 and to 0.7 for f_2, since the second-order derivatives require greater smoothing to obtain better noise immunity [28]. The coefficients c_0, c_1, c_2, c_3 are chosen to normalize the filters in Eq. (4). For the discrete implementation, the normalization is done using

$$ \sum_{n=-\infty}^{\infty} f_0(n) = 1, \qquad \sum_{n=-\infty}^{\infty} n\,f_1(n) = 1, \qquad \sum_{n=-\infty}^{\infty} f_2(n) = 0, \qquad \sum_{n=-\infty}^{\infty} \frac{n^2}{2}\,f_2(n) = 1 \tag{6} $$

for integer sampling [28]. The coefficients need to be adjusted for the case of noninteger or anisotropic sampling. We use filters of finite width (for example, seven or nine voxels), compute the values of the functions in Eq. (4), and normalize the values using Eq. (6). Since the vertices forming the surface do not necessarily lie on voxel boundaries, we used linear interpolation to compute voxel intensities.

The curvature computation is used to identify surface patches with a common curvature in order to segment the overall surface. Vertices having elliptical curvature of the peak subtype are considered to be within a potential lesion and are colored red to distinguish them from the others and assist visual inspection. Other curvature types describe normal airway surfaces. For example, hyperbolic curvature describes saddle points at bifurcations, cartilaginous rings, and haustral folds; cylindrical curvature and elliptical curvature of the "pit subtype" describe normal airway and colon (ulcerations fit into the latter category but are ignored because of overlap with normal shape). The next step is to cluster these vertices using region growing to achieve a minimum lesion size. Size criteria offer some immunity to noise and excessive false positive detections, for example, by ignoring isolated vertices and "lesions" smaller than some threshold. The minimum size criterion is preferably expressed in millimeters rather than number of vertices, since vertex density can vary depending upon the voxel size.

This method for computing differential geometric quantities for surfaces has a number of features that make it preferable to
an alternative method that fits bicubic spline patches at each surface point on the isosurface [23, 24]. First, it is very fast, since curve fitting is unnecessary. Second, it works well for highly curved surfaces (such as small airways and vessels). Like the patch-fitting method, it can be used to color the original surface to indicate areas of different curvature.
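To make the curvature computation concrete, the sketch below evaluates Eqs. (1)–(3) and a simple peak-type test on a 3D gray-scale volume. It substitutes Gaussian derivative filters from SciPy for the Deriche filters of Eq. (4), so it is a simplified stand-in under stated assumptions rather than a reproduction of the authors' implementation; the smoothing scale, the curvature threshold, and the sign convention for "peaks" are illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def partials(I, sigma=1.0):
    """First- and second-order partial derivatives of a 3D image, axes taken as (x, y, z).
    Gaussian derivatives are used here for smoothing instead of the Deriche filters
    described in the text."""
    orders = {"x": (1, 0, 0), "y": (0, 1, 0), "z": (0, 0, 1),
              "xx": (2, 0, 0), "yy": (0, 2, 0), "zz": (0, 0, 2),
              "xy": (1, 1, 0), "xz": (1, 0, 1), "yz": (0, 1, 1)}
    return {name: gaussian_filter(I, sigma=sigma, order=o) for name, o in orders.items()}

def curvatures(I, sigma=1.0, eps=1e-12):
    """Gaussian (K), mean (H), and principal (kmax, kmin) curvatures of the
    isosurfaces of I, following Eqs. (1)-(3)."""
    p = partials(I, sigma)
    Ix, Iy, Iz = p["x"], p["y"], p["z"]
    Ixx, Iyy, Izz = p["xx"], p["yy"], p["zz"]
    Ixy, Ixz, Iyz = p["xy"], p["xz"], p["yz"]
    h = Ix**2 + Iy**2 + Iz**2 + eps                       # squared gradient magnitude
    K = (Ix**2 * (Iyy*Izz - Iyz**2) + 2*Iy*Iz*(Ixz*Ixy - Ixx*Iyz)
         + Iy**2 * (Ixx*Izz - Ixz**2) + 2*Ix*Iz*(Iyz*Ixy - Iyy*Ixz)
         + Iz**2 * (Ixx*Iyy - Ixy**2) + 2*Ix*Iy*(Ixz*Iyz - Izz*Ixy)) / h**2
    H = (Ix**2 * (Iyy + Izz) - 2*Iy*Iz*Iyz
         + Iy**2 * (Ixx + Izz) - 2*Ix*Iz*Ixz
         + Iz**2 * (Ixx + Iyy) - 2*Ix*Iy*Ixy) / (2.0 * h**1.5)
    root = np.sqrt(np.maximum(H**2 - K, 0.0))             # clamp tiny negatives from noise
    return K, H, H + root, H - root

def peak_candidates(K, H, min_mean_curvature=0.1):
    """Elliptical curvature (K > 0) with mean curvature above an adjustable threshold
    marks candidate peak-type voxels; the sign convention here is an assumption and
    would be fixed by testing against known protruding objects."""
    return (K > 0) & (H > min_mean_curvature)
```

In a full pipeline, these per-voxel labels would be sampled at the isosurface vertices and clustered by region growing, with the minimum cluster size expressed in millimeters as discussed above.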
3.1 Clinical Application of Shape-Based Lesion Detection

We have applied this curvature-based technique to identify lesions of the airway. In a study of 18 virtual bronchoscopy patient examinations, we found sensitivities of 55 to 100% and specificities of 63 to 82% for detecting airway lesions 5 mm in diameter or larger [24]. The results varied depending upon the choice of an adjustable parameter (the mean principal curvature). Other potential applications of this method are to detect pulmonary emboli and colonic lesions (Figs. 2 and 3) [29, 30]. In Fig. 2, a tiny but physiologically significant embolus in a pulmonary artery branch of a pig is detected using curvature analysis. In Fig. 3, nodular lesions of the colonic mucosa are automatically detected. A software application which permits rapid inspection of the potential lesion sites has been shown to improve efficiency of interpretation as much as 77% [31]. Limitations of these methods include an inability to detect stenoses and a large number of false positive detections. Additional criteria (such as wall thickness) may be necessary to reduce the number of false positive detections [32, 33].

4 Fractal Measures of Roughness

Another type of endoluminal surface abnormality potentially detectable with VE is surface roughness. Biomedical surface texture is usually thought of in the context of biomaterials (i.e., orthopedic prostheses, dental implants), but the same concepts can be applied to endoluminal surfaces. This is a new concept, and the "normal" roughness of endoluminal surfaces is not well characterized. Surface texture depends on the choice of scale of observation. Typically, normal endoluminal surfaces are smooth on gross examination. On a microscopic scale the intestinal mucosa is rough, consisting of millions of villi (fine finger-like projections) per square centimeter. We confine ourselves to macroscopic scales on the order of the voxel dimensions. Based on our clinical experience, abnormal endoluminal surface texture (i.e., too smooth or rough) may occur under a variety of circumstances, including inflammation, atherosclerosis, or invasion of the wall by tumor. Hypothetically, swelling (edema) of the wall of a lumen could present as a smoother surface, and atherosclerosis and tumor as a rougher surface, compared to the baseline state. There are a number of ways to measure surface roughness
and develop a numeric index of roughness. These include fractal analysis, Fourier descriptors, variation of the surface normal, and the difference between either a fitted spline patch or a smoothed version of the surface and the original [34–36]. For example, as a surface becomes rougher, the relative weighting of high-frequency to low-frequency components in its Fourier spectrum increases; the variance of the direction of the surface normal increases; and the disparity between the smooth and unsmoothed data increases.

FIGURE 2  Pig pulmonary embolus model. (A) Axial CT image shows pulmonary embolus (arrow) in right pulmonary artery. (B) Anteroposterior view of three-dimensional surface rendering of main pulmonary arteries. (C) Closeup view of right pulmonary artery showing indentation in surface reconstruction due to presence of embolus. (D) Virtual angioscopic view showing narrowing of vessel lumen by embolus. Red coloring indicates area detected by curvature-based algorithm. Dataset kindly provided by Dr. James Brink, Yale University School of Medicine. See also Plate 137.

FIGURE 3  Images derived from CT scans of insufflated human ileocecal autopsy specimen. (A) Three-dimensional surface rendering of terminal ileum and cecum. (B) Virtual colonoscopic view of cecum. Nodular areas shown in red were detected by curvature-based lesion detection algorithm and correspond with foci of lymphoid hyperplasia in patient with aplastic anemia and fungal typhlitis. See also Plate 138.

We chose to investigate fractal methods for quantitating roughness because of their widespread use in a variety of medical settings. For example, fractal analysis has been used to quantitate roughness of dental implants, orthopedic prostheses, and osteoarthritic joints [37–39]. Fractals have the property of being self-similar over a wide range of scale and are a natural way to describe roughness. One accepted method of using fractal analysis to quantitate roughness is to compute the fractal dimension (D), a nonintegral number that lies between the corresponding ideal topological dimension and the Euclidean dimension of the space that contains the structure [40]. Points, curves, surfaces, and volumes have topological dimensions of 0, 1, 2, and 3, respectively. In contrast, fractal curves have a fractal dimension between 1 and 2, and fractal surfaces have a fractal dimension between 2 and 3.

Figure 4 shows a curve with a fractal dimension that increases from left to right. As the fractal dimension increases, the curve becomes more chaotic and fills more space. If one imagines sliding a small disk (for example, of diameter 0.01 unit in this case) along the curve, the area swept out by the disk would be greatest for the most chaotic portion. This idea of measuring the area swept out by a shape or template (such as a disk, sphere, or higher dimensional analogue) forms the foundation for many methods of measuring fractal dimension. Figure 5 shows prototypical fractal surfaces generated using the midpoint displacement method [41]. The surfaces in the figure are similar in texture to anatomic surfaces we have studied. The rougher the surface, the greater the fractal dimension.

There are several methods for computing the fractal dimension of experimental data. Examples include box-counting, Fourier power spectral density, variation, and the Minkowski-Bouligand sausage, of which box-counting is probably the most familiar [42–45]. These are fraught with error and there is some controversy over the best method [46, 47]. We implemented the variation method, which is a modified version of the Minkowski sausage [25, 48]. For a real-valued nonconstant function f(x, y) defined on the interval 0 ≤ x, y ≤ 1, the ε-oscillation of the function f in an ε-neighborhood of the point (x, y) is defined to be

$$ v_f(x, y; \varepsilon) = \sup \big| f(x_1, y_1) - f(x_2, y_2) \big| \tag{7} $$

where the supremum is taken over all data pairs (x_1, y_1), (x_2, y_2) that lie within the grid square (x ± ε, y ± ε). The variation of f is defined as the average of v_f over all (x, y):

$$ V_f(\varepsilon) = \int_0^1 \!\! \int_0^1 v_f(x, y; \varepsilon)\, dx\, dy \tag{8} $$

With discrete data on an N × N grid, the ε-variation of f is

$$ V_f(\varepsilon_n) = \frac{1}{R^2} \sum_{i=0}^{R-1} \sum_{j=0}^{R-1} \big[ u_n(i, j) - b_n(i, j) \big] \tag{9} $$

where u_n and b_n are the maximum and minimum values of the function f within the grid squares bounded by the grid indices (i − k_n, j − k_n) and (i + k_n, j + k_n), where 1 ≤ k_n < R and 0 ≤ i, j < R, and the data are grouped into R² bins, where R < N. ε_n = k_n/R is the scale over which the variation is computed. The fractal dimension D is the slope of the plot of log(V_f(ε)/ε³) vs log(1/ε). We found that the variation method tends to overestimate the fractal dimensions of test surfaces with low fractal
dimensions and underestimate the fractal dimensions of surfaces with high fractal dimensions [25]. However, the computed estimates of the surfaces' fractal dimensions did show a monotonic relationship to the true fractal dimension; this indicates that the relative roughness of two surfaces can be compared using this method. In other words, the method provides an ordinal measure of roughness.
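To make the variation method concrete, here is a small sketch of Eqs. (7)–(9) for a surface sampled on a square grid; it estimates D as the slope of log(V_f(ε)/ε³) against log(1/ε) by linear regression. It is a simplified illustration (the window sizes, boundary handling, and test surface are arbitrary choices made here), not the implementation evaluated in [25].

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def variation_fractal_dimension(f, k_values=(1, 2, 4, 8, 16)):
    """Estimate the fractal dimension of a surface z = f[i, j] on an R x R grid
    using the variation method of Eqs. (7)-(9)."""
    R = f.shape[0]
    eps_list, V_list = [], []
    for k in k_values:
        # u_n and b_n: max and min of f over (2k+1) x (2k+1) neighborhoods,
        # i.e. the grid squares bounded by (i - k, j - k) and (i + k, j + k).
        w = 2 * k + 1
        u = maximum_filter(f, size=w, mode="nearest")
        b = minimum_filter(f, size=w, mode="nearest")
        V_list.append(np.mean(u - b))   # Eq. (9): average oscillation at this scale
        eps_list.append(k / R)          # scale epsilon_n = k_n / R
    # D is the slope of log(V / eps^3) versus log(1 / eps).
    x = np.log(1.0 / np.asarray(eps_list))
    y = np.log(np.asarray(V_list) / np.asarray(eps_list) ** 3)
    D, _ = np.polyfit(x, y, 1)
    return D

# Example on a synthetic rough surface: white noise is extremely rough, so the
# estimate should approach the upper limit of 3 for surfaces.
rng = np.random.default_rng(0)
print(variation_fractal_dimension(rng.standard_normal((257, 257))))
```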
4.1 Clinical Application of Fractal Measures of Roughness

Application of this method to virtual bronchoscopy in a patient with a subcarinal tumor is shown in Fig. 6.

FIGURE 4  Fractal curve. Fractal dimension (D) of this curve varies continuously from its ideal topological dimension (D = 1, left side of curve) to the Euclidean dimension of a surface (D = 2, right side of curve). Amplitude of the curve is arbitrary. As its fractal dimension increases, the curve appears to fill more and more of the adjacent space and appears rougher. Fractal dimension is widely used to quantitate roughness of curves and surfaces. Adapted from [50].

FIGURE 5  Demonstration of synthetic fractal surfaces generated using the midpoint displacement method. Fractal dimensions of the surfaces are (A) D = 2.1 and (B) D = 2.3. Analogous to the situation for fractal curves (Fig. 4), rougher fractal surfaces have greater fractal dimensions. These surfaces, although generated using a mathematical algorithm, have texture that mimics the texture of some abnormal anatomic surfaces at virtual endoscopy.
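Because the test surfaces of Figure 5 are described as midpoint displacement fractals, a brief sketch of one common way to generate such a surface (the diamond-square variant of midpoint displacement) is given below. The grid size, roughness exponent, and seed are illustrative, and this is not necessarily the generator used for the figure.

```python
import numpy as np

def midpoint_displacement_surface(levels=8, H=0.7, seed=0):
    """Square-grid fractal surface via midpoint (diamond-square) displacement.
    H in (0, 1) controls roughness; the nominal fractal dimension is D = 3 - H."""
    rng = np.random.default_rng(seed)
    size = 2 ** levels + 1
    z = np.zeros((size, size))
    z[0, 0], z[0, -1], z[-1, 0], z[-1, -1] = rng.normal(size=4)   # seed the corners
    step, scale = size - 1, 1.0
    while step > 1:
        half = step // 2
        # Diamond step: centers of squares get the mean of four corners plus noise.
        for i in range(0, size - 1, step):
            for j in range(0, size - 1, step):
                center = (z[i, j] + z[i + step, j] +
                          z[i, j + step] + z[i + step, j + step]) / 4.0
                z[i + half, j + half] = center + scale * rng.normal()
        # Square step: edge midpoints get the mean of their available neighbors plus noise.
        for i in range(0, size, half):
            for j in range((i + half) % step, size, step):
                neighbors = [z[i - half, j]] if i - half >= 0 else []
                if i + half < size: neighbors.append(z[i + half, j])
                if j - half >= 0:   neighbors.append(z[i, j - half])
                if j + half < size: neighbors.append(z[i, j + half])
                z[i, j] = np.mean(neighbors) + scale * rng.normal()
        step = half
        scale *= 2.0 ** (-H)   # displacements shrink at each finer scale
    return z

surface = midpoint_displacement_surface(levels=7, H=0.9)   # relatively smooth, nominal D near 2.1
```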
FIGURE 6  Fractal analysis of virtual bronchoscopy reconstruction. Mediastinal melanoma metastasis in a 34-year-old man. (A) Axial CT image shows subcarinal mass ("M"). (B) Three-dimensional reconstruction shows mass effect on carina (large arrow) and inferior wall of right mainstem bronchus. Segment of left mainstem bronchus is occluded (small arrows). Virtual bronchoscopy views of (C) carina and (D) right mainstem bronchus show irregular wall due to mass ("M"). Fractal dimension of carina (D = 2.38 ± 0.05) was greater than that of lateral, smooth wall (*) of right mainstem bronchus (D = 2.26 ± 0.04).
The portion of the airway wall involved by tumor (D = 2.38 ± 0.05) was found to be rougher than an adjacent uninvolved wall (D = 2.26 ± 0.04). The error estimates are computed from the linear regression of the log-log plot and likely underestimate the true error. In a pilot study comparing a magnetic resonance virtual angioscopy of an atherosclerotic aorta to an aorta from a normal volunteer, we found that the (visually) rougher atherosclerotic aorta (D = 2.14 ± 0.03) had a slightly higher fractal dimension than the normal aorta (D = 2.09 ± 0.05) in a region in the ascending aorta [25]. We attributed the small difference between these fractal dimensions to the limited data resolution. Additional studies will be necessary to test the clinical viability of these techniques and to better characterize the error estimates. Fractal dimension measurements are exquisitely sensitive to the noise level and resolution of the data. As CT and MRI
scanners improve and generate datasets with greater resolution and signal-to-noise ratios, VE data will be more amenable to precise fractal analysis.
5 Conclusions

We have shown how morphometric methods can be applied to VE reconstructions to quantitate surface roughness and improve detection of abnormalities. Curvature and roughness analyses can be used to automatically detect potential lesions on VE, thereby improving efficiency and accuracy and enabling detection of other endoluminal abnormalities such as atherosclerosis or invasion by tumor.
The algorithms described in this review are logical approaches to shape and roughness analyses of VE reconstructions. There is little question about the useful role of VE for evaluation of gross lesions. However, rugosity is a much newer application and its value is an open question. As the technology and clinical applications of VE progress, methods such as those described here to improve efficiency and accuracy and facilitate interpretation will likely become more important. Newer and faster scanners are becoming available that will improve the quality (reduced motion artifact) and resolution of the imaging data [49].
Acknowledgments

Andrew Dwyer is thanked for helpful comments and review of the manuscript. Nikos Courcoutsakis and David Kleiner are thanked for assistance with autopsy specimens. Lynne Pusanik provided computer programming support. James Malley is thanked for helpful discussions. James Brink kindly provided the pig pulmonary embolus dataset.
References

1. Vining DJ, Liu K, Choplin RH, Haponik EF. Virtual bronchoscopy. Relationships of virtual reality endobronchial simulations to actual bronchoscopic findings. Chest 1996; 109:549–553.
2. Higgins WE, Ramaswamy K, Swift RD, McLennan G, Hoffman EA. Virtual bronchoscopy for three-dimensional pulmonary image assessment: state of the art and future needs. Radiographics 1998; 18:761–778.
3. Hara AK, Johnson CD, Reed JE, Ahlquist DA, Nelson H, Ehman RL, McCollough CH, Ilstrup DM. Detection of colorectal polyps by computed tomographic colography: feasibility of a novel technique. Gastroenterology 1996; 110:284–290.
4. Davis CP, Ladd ME, Romanowski BJ, Wildermuth S, Knoplioch JF, Debatin JF. Human aorta: Preliminary results with virtual endoscopy based on three-dimensional MR imaging data sets. Radiology 1996; 199:37–40.
5. Robb RA, Aharon S, Cameron BM. Patient-specific anatomic models from three dimensional medical image data for clinical applications in surgery and endoscopy. Journal of Digital Imaging 1997; 10:31–35.
6. Kimura F, Shen Y, Date S, Azemoto S, Mochizuki T. Thoracic aortic aneurysm and aortic dissection: New endoscopic mode for three-dimensional CT display of aorta. Radiology 1996; 198:573–578.
7. Fleiter T, Merkle EM, Aschoff AJ, Lang G, Stein M, Gorich J, Liewald F, Rilinger N, Sokiranski R. Comparison of real-time virtual and fiberoptic bronchoscopy in patients with bronchial carcinoma: opportunities and limitations. Am J Roentgenol 1997; 169:1591–1595.
8. Ferretti GR, Knoplioch J, Bricault I, Brambilla C, Coulomb M. Central airway stenoses: preliminary results of spiral-CT-generated virtual bronchoscopy simulations in 29 patients. Eur Radiol 1997; 7:854–859.
9. Summers RM, Feng DH, Holland SM, Sneller MC, Shelhamer JH. Virtual bronchoscopy: segmentation method for real-time display. Radiology 1996; 200:857–862.
10. Rubin GD, Beaulieu CF, Argiro V, Ringl H, Norbash AM, Feller JF, Dake MD, Jeffrey RB, Napel S. Perspective volume rendering of CT and MR images: applications for endoscopic imaging. Radiology 1996; 199:321–330.
11. Lorensen WE, Cline HE. Marching cubes: A high resolution 3D surface reconstruction algorithm. ACM Computer Graphics 1987; 21:163–169.
12. McFarland EG, Brink JA, Loh J, Wang G, Argiro V, Balfe DM, Heiken JP, Vannier MW. Visualization of colorectal polyps with spiral CT colography: evaluation of processing parameters with perspective volume rendering. Radiology 1997; 205:701–707.
13. Summers RM, Shaw DJ, Shelhamer JH. CT virtual bronchoscopy of simulated endobronchial lesions: effect of scanning, reconstruction, and display settings and potential pitfalls. Am J Roentgenol 1998; 170:947–950.
14. Summers RM. Navigational aids for real-time virtual bronchoscopy. Am J Roentgenol 1997; 168:1165–1170.
15. Paik DS, Beaulieu CF, Jeffrey RB, Napel S. Virtual colonoscopy visualization modes using cylindrical and planar map projections: Technique and evaluation. Radiology 1998; 209P:429.
16. Paik DS, Beaulieu CF, Jeffrey RB, Rubin GD, Napel S. Automated flight path planning for virtual endoscopy. Medical Physics 1998; 25:629–637.
17. Beaulieu CF, Jeffrey RB, Karadi G, Paik DS, Napel S. Visualization modes for CT colonography: Blinded comparison of axial CT, virtual endoscopy, and panoramic-view volume rendering. Radiology 1998; 209P:296–297.
18. Wang G, McFarland EG, Brown BP, Vannier MW. GI tract unraveling with curved cross sections. IEEE Trans Med Imaging 1998; 17:318–322.
19. Hara AK, Johnson CD, Reed JE, Ahlquist DA, Nelson H, MacCarty RL, Harmsen WS, Ilstrup DM. Detection of colorectal polyps with CT colography: initial assessment of sensitivity and specificity. Radiology 1997; 205:59–65.
20. McAdams HP, Goodman PC, Kussin P. Virtual bronchoscopy for directing transbronchial needle aspiration of hilar and mediastinal lymph nodes: a pilot study. Am J Roentgenol 1998; 170:1361–1364.
21. McAdams HP, Palmer SM, Erasmus JJ, Patz EF, Connolly JE, Goodman PC, Delong DM, Tapson VF. Bronchial
anastomotic complications in lung transplant recipients: virtual bronchoscopy for noninvasive assessment. Radiology 1998; 209:689–695.
22. Smith PA, Heath DG, Fishman EK. Virtual angioscopy using spiral CT and real-time interactive volume-rendering techniques. J Comput Assist Tomogr 1998; 22:212–214.
23. Summers RM, Selbie WS, Malley JD, Pusanik LM, Dwyer AJ, Courcoutsakis N, Shaw DJ, Kleiner DE, Sneller MC, Langford CA, Holland SM, Shelhamer JH. Polypoid lesions of airways: early experience with computer-assisted detection by using virtual bronchoscopy and surface curvature. Radiology 1998; 208:331–337.
24. Summers RM, Pusanik LM, Malley JD. Automatic detection of endobronchial lesions with virtual bronchoscopy: comparison of two methods. In: Medical Imaging 1998: Image Processing. San Diego, California: SPIE, 1998; 3338:327–335, http://www.cc.nih.gov/drd/twomethpc.pdf
25. Summers RM, Pusanik LM, Malley JD, Hoeg JM. Fractal analysis of virtual endoscopy reconstructions. In: Medical Imaging 1999: Physiology and Function from Multidimensional Images. San Diego, California: SPIE, 1999; 3660:258–269, http://www.cc.nih.gov/drd/fractal.pdf
26. Shepard JA. The bronchi: an imaging perspective. J Thorac Imaging 1995; 10:236–254.
27. Thirion J-P, Gourdon A. Computing the differential characteristics of isointensity surfaces. Comput Vision Image Understand 1995; 61:190–202.
28. Monga O, Benayoun S. Using partial derivatives of 3D images to extract typical surface features. Comput Vision Image Understand 1995; 61:171–189.
29. Summers RM, Beaulieu CF, Pusanik LM, Malley JD, Jeffrey RB, Glazer DI, Napel S. An automated polyp detector for CT colonography: feasibility study. Radiology 2000; in press.
30. Summers RM, Pusanik LM, Malley JD, Reed JE, Johnson CD. Method of labeling colonic polyps at CT colonography using computer-assisted detection. In: Computer Assisted Radiology and Surgery (CARS). San Francisco, CA: Elsevier Science, 2000: in press.
31. Summers RM. Image gallery: a tool for rapid endobronchial lesion detection and display using virtual bronchoscopy. J Digital Imaging 1998; 11:53–55.
32. Vining D, Ge Y, Ahn D, Stelts D, Pineau B. Enhanced virtual colonoscopy system employing automatic detection of colon polyps. Gastroenterology 1998; 114:A698.
33. Vining DJ, Ahn DK, Ge Y, Stelts DR. Improved computer-assisted colon polyp detection. Radiology 1998; 209P:649.
34. Mandelbrot BB. The Fractal Geometry of Nature. San Francisco: W.H. Freeman, 1982.
35. Zahouani H, Vargiolu R, Loubet JL. Fractal models of surface topography and contact mechanics. Mathematical and Computer Modelling 1998; 28:517–534.
36. Lestrel PE. Fourier Descriptors and Their Applications in Biology. New York: Cambridge University Press, 1997.
37. Oshida Y, Hashem A, Nishihara T, Yapchulay MV. Fractal dimension analysis of mandibular bones: toward a morphological compatibility of implants. Biomed Mater Eng 1994; 4:397–407.
38. Stachowiak GW, Stachowiak GB, Campbell P. Application of numerical descriptors to the characterization of wear particles obtained from joint replacements. Proc Inst Mech Eng [H] 1997; 211:1–10.
39. Fazzalari NL, Parkinson IH. Fractal properties of subchondral cancellous bone in severe osteoarthritis of the hip. J Bone Miner Res 1997; 12:632–640.
40. Talibuddin S, Runt JP. Reliability test of popular fractal techniques applied to small 2-dimensional self-affine data sets. J Appl Phys 1994; 76:5070–5078.
41. Saupe D. Algorithms for random fractals. In: M. F. Barnsley, R. L. Devaney, B. B. Mandelbrot, H.-O. Peitgen, D. Saupe, and R. F. Voss, eds. The Science of Fractal Images. New York: Springer-Verlag, 1988.
42. Biswas MK, Ghose T, Guha S, Biswas PK. Fractal dimension estimation for texture images: A parallel approach. Pattern Recognition Letters 1998; 19:309–313.
43. Milman VY, Stelmashenko NA, Blumenfeld R. Fracture surfaces: a critical review of fractal studies and a novel morphological analysis of scanning-tunneling-microscopy measurements. Prog Mat Sci 1994; 38:425–474.
44. Dubuc B. On estimating fractal dimension. M. Eng. Thesis, Montreal: McGill University, 1988.
45. Dubuc B, Quiniou JF, Roques-Carmes C, Tricot C, Zucker SW. Evaluating the fractal dimension of profiles. Phys Rev A 1989; 39:1500–1512.
46. Huang Q, Lorch JR, Dubes RC. Can the fractal dimension of images be measured? Pattern Recognition 1994; 27:339–349.
47. Kulatilake P, Um J, Pan G. Requirements for accurate estimation of fractal parameters for self-affine roughness profiles using the line scaling method. Rock Mechanics and Rock Engineering 1997; 30:181–206.
48. Dubuc B, Zucker SW, Tricot C, Quiniou JF, Wehbi D. Evaluating the fractal dimension of surfaces. Proc R Soc Lond A 1989; 425:113–127.
49. Summers RM, Sneller MC, Langford CA, Shelhamer JH, Wood BJ. Improved virtual bronchoscopy using a multislice helical CT scanner. In: Medical Imaging 2000: Physiology and Function from Multidimensional Images. San Diego, California: SPIE, 2000; 3978:117–121, http://www.cc.nih.gov/drd/betterVB.pdf
50. Turner MJ, Blackledge JM, Andrews PR. Fractal Geometry in Digital Imaging. San Diego: Academic Press, 1998.
VI Compression Storage and Communication

46 Fundamentals and Standards of Compression and Communication  Stephen P. Yanek, Quentin E. Dolecek, Robert L. Holland, and Joan E. Fetter  759
47 Medical Image Archive and Retrieval  Albert Wong and Shyh-Liang Lou  771
48 Image Standardization in PACS  Ewa Pietka  783
49 Quality Evaluation for Compressed Medical Images: Fundamentals  Pamela Cosman, Robert Gray, and Richard Olshen  803
50 Quality Evaluation for Compressed Medical Images: Diagnostic Accuracy  Pamela Cosman, Robert Gray, and Richard Olshen  821
51 Quality Evaluation for Compressed Medical Images: Statistical Issues  Pamela Cosman, Robert Gray, and Richard Olshen  841
52 Three-Dimensional Image Compression with Wavelet Transforms  Jun Wang and H.K. Huang  851

This section presents the fundamentals and standards that form the basis of medical image archiving and telemedicine systems. Seven chapters are included. The first chapter (46) introduces the basic concepts and standards of image compression and communication, including JPEG and MPEG, as well as the potential contributions of wavelets and fractals. The chapter also describes the essentials of technologies, procedures, and protocols of modern telecommunications systems used in medicine. The second chapter (47) discusses medical image archiving, retrieval, and communication. The structure and function of Picture Archiving and Communication Systems (PACS) are described. The de facto DICOM (digital image communications in medicine) format, communication standards for images, as well as the HL-7 (Health Level 7) format for medical text data, are introduced. The DICOM image query and retrieve service class operations are also addressed. The third chapter (48) presents a suggested standard imposed by users according to their needs in clinical practice. Since the type of users can be diverse, the format and content of the data vary and may include diagnostic images, medical reports, comments, and administrative reports. This chapter focuses on the adjustment of the image content in preparation for medical diagnosis, and provides recommendations for adoption of such image standardization in PACS.
The other four chapters focus on medical image compression with lossy techniques. Chapters 49, 50, and 51 discuss quality evaluation for lossy compressed images, addressing fundamentals, diagnostic accuracy, and statistical issues. Topics that include average distortion, signal-to-noise ratio, subjective ratings, diagnostic accuracy measurements, and the preparation of a gold standard are covered in Chapter 49. Chapter 50 provides examples to illustrate quantitative methods of evaluation of diagnostic accuracy of compressed images. These include lung nodules and mediastinal adenopathy in CT images, aortic aneurysms in MR images, and mammograms with microcalcifications and masses. Chapter 51 addresses the statistical basis of quality evaluation. Topics including differences among radiologists, effectiveness of the experimental design, relationship of diagnostic accuracy to other measures, statistical size and power, effects of learning on the outcomes, relationship between computed measures and perceptual measures, and confidence intervals are discussed. Chapter 52 in this section reviews the methodology used in 3D wavelet transforms. The basic principles of the 3D wavelet transform are first discussed, and the performance of various filter functions is compared using 3D CT and MRI data sets of various parts of the anatomy.
46 Fundamentals and Standards of Compression and Communication

Stephen P. Yanek, Quentin E. Dolecek, Robert L. Holland, and Joan E. Fetter
Johns Hopkins University

1 Introduction  759
2 Compression and Decompression  761
   2.1 Joint Photographic Experts Group (JPEG) Compression · 2.2 Moving Picture Experts Group (MPEG) Compression · 2.3 Wavelet Compression · 2.4 Fractal Compression
3 Telecommunications  766
   3.1 Signal Hierarchy and Transfer Rates · 3.2 Network Interoperability · 3.3 Telemedicine Applications Compatibility
4 Conclusion  769
References  770
The principal objectives of this chapter are to introduce basic concepts and standards for data compression and communications in telemedicine systems, and to lay a foundation for further study in this field. According to the Institute of Medicine, telemedicine is the use of electronic information and communications technology to provide and support health care when distance separates the participants [1]. Viewing a system as a collection of hardware, software, people, facilities, data, and procedures organized to accomplish a common objective offers insight into the wide range of issues that affect the design, deployment, and use of telemedicine systems. The interest in telemedicine systems has created significant demand for standards that ensure compatibility between heterogeneous systems that process and display health care data, improved compression and decompression algorithms, and economical and reliable ways for sending and receiving, storing, and retrieving multimedia data. Multimedia is a general term used for documents, presentations, and other means of disseminating information in the form of text, voice, and graphics, as well as moving and still pictures in color, gray scale, or black and white [2]. Telemedicine applications involve mainly image transmission within and among health care organizations. In an earlier time, the term "telemedicine" was used interchangeably with
video teleconferencing, or specific clinical applications such as teleradiology or telepathology. Then, as now, a significant number of consultations between participants involved audio teleconferencing and exchange of facsimiles containing text, numerical data, or waveforms such as the electrocardiogram. Several steps are typically involved in transferring multimedia data from one site to another, including scanning and digitizing film images, and incorporating demographic and other patient information. These steps may be followed by compression of the data volume, so that images can be sent more economically and quickly, and then by reconstruction of the images at the receiving end for viewing and interpretation. In general, transactions used in telemedicine applications may be placed into three general categories: dynamic (e.g., interactive television); static (e.g., teleradiology); and a combination involving static data and interactive or dynamic features (e.g., telesurgery and telementoring) [3]. As depicted in Fig. 1, telemedicine systems comprise a variety of component technologies and services. Figure 1 illustrates technology and services that perform several basic functions, for example, image or data acquisition, digitization, compression, storage, data manipulation and computation, display, and communication. The software that enables many of the technologies and services is not apparent in the figure. Telemedicine applications have demanding networking requirements when large volumes of data are required.
FIGURE 1 Elements of a typical telemedicine system.
Since use of images by participants in telemedicine applications places great demands on hardware and software technology associated with each telemedicine system, we will focus here on image processing and telecommunications features. In general, an image is a representation of an object, organ, or tissue made visible through physical and computational processes. Images used in telemedicine applications are frequently reproductions of a clinical image initially obtained using various modalities such as conventional projection X-rays, computed radiography (CR), computed tomography (CT), nuclear medicine (NM), magnetic resonance imaging (MRI), or ultrasound (US). A digital image processing system, for the purposes of this discussion, will consist of a set of hardware and software modules that perform basic functions such as acquisition, display, and communication. As an example, a picture archive and communications system (PACS), discussed in Chapters 47 and 48, provides storage, retrieval, manipulation, and communication of digital images and related data. PACS manage image records that come from a variety of imaging sources. A PACS is designed to provide archiving as well as rapid access to images, and to integrate images of different modalities. The large volume of information produced by these modalities requires the use of appropriate storage systems or media for both recent images and archives for older images. Furthermore, reduction in the time and cost of transmission of images and related data requires effective and efficient compression and communications standards. In some imaging modalities such as photography or projection X-ray film, the original image is the first analog counterpart of the imaged object. The photographic camera forms an image on a light-sensitive film or plate, and the radiograph is a negative image on photographic film made by exposure to X-rays or gamma rays that have passed through matter or tissue. For computerized analysis this two-dimensional continuous function f(x, y) has to be discretized in space, and in intensity or color. In the resulting digital image g(i, j) each pixel is represented by one or more bits of data depending on whether the image is binary, gray scale, or color. Typically, gray-scale medical images require 8, 10, or 12 bits per pixel, and full color images require 24 bits per pixel. In other imaging techniques such as MRI, the original image is in digital form. Bits representing pixels are the essential elements that need to be stored, retrieved, transmitted, and displayed. For the
purposes of compression we will consider the initial digital image as the original image. Digital images often require large amounts of storage that depend upon the resolution of the imaging, digitization, or scanning processes. As the resolution or size requirement of the image increases, the amount of data increases. For example, an image that covers an area of 1 square inch with a density or resolution of 400 dots per inch (dpi) consists of 160,000 pixels. Therefore, an 8-by-10 inch image of the same resolution requires 12,800,000 pixels. As the volume of pixel data storage grows, the time required to access, retrieve, and display the data increases. The location (i.e., on-line or off-line), type, and efficiency of storage media affect access and retrieval times. Images are frequently compressed to reduce the file size for efficient storage and transmission. Once compressed, the image requires less space for storage and less time for transmission over a network and between devices. An important question in the maturation of telemedicine concerns the quality and utility of the clinical images provided for interpretation. For example, dermatologists are trained to provide an accurate diagnosis and/or differential diagnosis using photographs and photographic slides during residency training, intramural exams, in-service exams, board certification preparation, and continuing medical education. Although dermatologists are accustomed to evaluating skin conditions using 2-by-2 inch photographic slides, there is reluctance to accept digital images because they inherently provide less visual information (resolution) than conventional clinical photography. In addition, color is an important factor in diagnosis of dermatological images. At present the rendition of colors in a telemedicine image is often determined by adjusting the red–green–blue mix in the display. Table 1 sets the stage for discussions about data compression techniques by specifying the typical size of digital images that are generated by the modalities just cited.
2 Compression and Decompression

Compression and decompression are essential aspects of data management, display, and communication in telemedicine systems. The International Consultative Committee for
Telegraph and Telephone (CCITT) defines the standards for image compression and decompression. Initially defined for facsimile transmissions, the standards now contain recommendations that accommodate higher-resolution images. The main objective of compression is removal of redundancies that occur in three different types: coding redundancy, interpixel redundancy, and psychovisual redundancy. Coding refers to the numerical representation of the intensity or color of each pixel, and several techniques exist to optimize the choice of code, primarily using the histogram of image intensity or color [5]. Interpixel redundancy is related to the correlation among consecutive pixels in the image. If a black pixel is followed by 17 white pixels, the latter can be represented and stored in a more efficient manner than storing 17 pixels with the same values, for example, with a mechanism that indicates the start and length of the white-pixel run, such as in run-length coding [5]. Psychovisual redundancy results from the fact that human perception of image information does not rely on specific analysis of individual pixels. Typically, local edges and texture structures are evaluated and compared to known information to interpret the image visually. The level and location of psychovisual redundancy can be determined only with feedback from a human operator and tends to be relatively subjective. While removal of coding and interpixel redundancies does not eliminate information from the image, the removal of psychovisual redundancies decreases the information in the information theoretic sense. The effect of this reduction depends on the image, observer, and application. Compression techniques that remove only coding and interpixel redundancies are lossless; compression techniques that also remove psychovisual redundancies are lossy. The evaluation of the quality of images compressed with lossy techniques is addressed in Chapters 49, 50, and 51. Compression techniques generally attempt to achieve a compromise between two undesirable outcomes: potentially deleting critical information and insufficient reduction of the image file size. Furthermore, compression and decompression should not introduce errors or artifacts. Performance and cost are important factors in choosing between lossy and lossless compression. Higher performance (i.e., closer to lossless) compression and decompression typically requires more storage. In other words, an identical copy of the original image costs more than a "reasonable facsimile."
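The run-length idea just mentioned can be made concrete with a short sketch. The following example is added here for illustration only (it is not part of the original text); it encodes a row of pixel values as (value, run length) pairs and restores the row exactly, so no information is lost.

```python
def run_length_encode(row):
    """Encode a sequence of pixel values as (value, run_length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1            # extend the current run
        else:
            runs.append([value, 1])     # start a new run
    return [tuple(run) for run in runs]

def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row

row = [0] + [1] * 17                       # a black pixel followed by 17 white pixels
encoded = run_length_encode(row)
assert run_length_decode(encoded) == row   # removing interpixel redundancy loses nothing
print(encoded)                             # [(0, 1), (1, 17)]
```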
TABLE 1 Common resolutions of digital images [4]

Image acquisition modality         Image size (number of pixels)   Pixel value (number of bits)
Scanned conventional radiography   2048 × 2048                     12
Computerized tomography            512 × 512                       16
Magnetic resonance imaging         256 × 256                       12
Ultrasound                         512 × 512                       8
Nuclear medicine                   128 × 128                       8
In general, lossy compression is acceptable for images if further analysis or use of the images will tolerate missing data. The following are common lossy compression techniques [2]:

* Joint Photographic Experts Group (JPEG)
* Moving Picture Experts Group (MPEG)
* Wavelets
* Fractals

The following are representative lossless compression techniques [2]:

* CCITT Group 3 1D
* CCITT Group 3 2D
* CCITT Group 4
* Lempel-Ziv and Welch (LZW) algorithm

Binary images such as black-and-white text and graphics are good candidates for lossless compression. Some color and gray-scale images that contain series of successive identical pixels may also be compressed significantly with lossless compression. Pixels of color images commonly vary in chromatics, since adjacent pixels may have different color values. Frequent change in color requires storing the bits for a large number of pixels because there is little repetition. The same applies to gray-scale images where a large number of shades are present. In these cases, lossless compression techniques may not produce a reduction in size sufficient to be practical. Generally, when lossless compression techniques do not produce acceptable results, lossy approaches are used in a way that minimizes the effects of the loss. The payoff of compression becomes generally questionable with animated images and full motion color video.

All current image compression techniques utilize one or more of the blocks shown in Fig. 2. If the image data is in red–green–blue (RGB) format it may be changed to a luminance–chrominance (YUV) representation. This process is applied to color images to change the relationship between pixels and produce an image that is more appropriate for human perception. MPEG uses this format and reduces the U and V components by a factor of 2, thereby obtaining some image compression at this stage. JPEG and MPEG use the discrete cosine transform (DCT), which is given for an $N \times N$ image $f(x, y)$ by

$$F(u, v) = a(u)\,a(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x, y) \cos\!\left[\frac{(2x+1)u\pi}{2N}\right] \cos\!\left[\frac{(2y+1)v\pi}{2N}\right], \qquad u, v = 0, 1, 2, \ldots, N-1,$$

where

$$a(u) = \begin{cases} \sqrt{1/N} & \text{for } u = 0,\\[2pt] \sqrt{2/N} & \text{for } u = 1, 2, \ldots, N-1, \end{cases}$$

and the inverse transform is

$$f(x, y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} a(u)\,a(v)\,F(u, v) \cos\!\left[\frac{(2x+1)u\pi}{2N}\right] \cos\!\left[\frac{(2y+1)v\pi}{2N}\right], \qquad x, y = 0, 1, 2, \ldots, N-1.$$

The DCT is a reversible transform and retains all the information in the image. Compression in JPEG and MPEG is obtained by discarding some of the DCT components. The principle is that a small section of the image can be represented by the average color intensity and only major differences from the average. Minor differences that are ignored provide the compression and introduce the loss. The wavelet transform described in a subsequent chapter is based on one of many available basis functions, and it is typically used for lossy compression by discarding some of the transform components. The subsequent step, threshold and quantization, reduces the number of levels that represent gray scale or color, producing a reasonable but not accurate representation of the output of the transform step.
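The forward and inverse transforms above can be evaluated directly from their definitions. The sketch below is illustrative only; a practical codec would use a fast 8 × 8 factorization rather than this direct evaluation. It verifies numerically that the transform pair is reversible when no components are discarded.

```python
import numpy as np

def dct2(f):
    """Forward N x N discrete cosine transform, evaluated straight from the definition."""
    N = f.shape[0]
    a = np.full(N, np.sqrt(2.0 / N)); a[0] = np.sqrt(1.0 / N)
    x = np.arange(N)[:, None]
    y = np.arange(N)[None, :]
    F = np.zeros((N, N))
    for u in range(N):
        for v in range(N):
            basis = (np.cos((2 * x + 1) * u * np.pi / (2 * N)) *
                     np.cos((2 * y + 1) * v * np.pi / (2 * N)))
            F[u, v] = a[u] * a[v] * np.sum(f * basis)
    return F

def idct2(F):
    """Inverse transform: weighted sum of the same cosine basis images."""
    N = F.shape[0]
    a = np.full(N, np.sqrt(2.0 / N)); a[0] = np.sqrt(1.0 / N)
    u = np.arange(N)[:, None]
    v = np.arange(N)[None, :]
    f = np.zeros((N, N))
    for x in range(N):
        for y in range(N):
            basis = (np.cos((2 * x + 1) * u * np.pi / (2 * N)) *
                     np.cos((2 * y + 1) * v * np.pi / (2 * N)))
            f[x, y] = np.sum(a[:, None] * a[None, :] * F * basis)
    return f

block = np.random.rand(8, 8)
assert np.allclose(idct2(dct2(block)), block)   # the DCT is reversible when nothing is discarded
```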
FIGURE 2 Block diagram of transform-based image compression.
The quality of reproduction depends on the number of levels used in quantizing color or gray level. Quantization and thresholding are the steps where most of the image loss usually occurs and the quality of the compressed image is determined. Entropy encoding is the final stage, where lossless compression is applied to the quantized data. Several techniques can be used, such as run-length and Huffman coding [5] (CCITT Group 3), and two-dimensional encoding (CCITT Group 4).
2.1 Joint Photographic Experts Group (JPEG) Compression

JPEG, formed as a joint committee of the International Standards Organization (ISO) and CCITT, focuses on standards for still image compression. The JPEG compression standard is designed for still color and gray-scale images, otherwise known as continuous tone images, or images that are not restricted to dual-tone (black and white) only. Emerging technologies such as color fax, scanners, and printers need a compression standard that can be implemented at acceptable price-to-performance ratios. The JPEG standard is published in two parts:

(1) The part that specifies the modes of operation, the interchange formats, and the encoder/decoder specified for these modes, along with implementation guidelines.
(2) The part that describes compliance tests which determine whether the implementation of an encoder or decoder conforms to Part 1 to ensure interoperability of systems.

The JPEG compression standard has three levels of definition:

* Baseline system
* Extended system
* Special lossless function
A coding function performed by a device that converts analog signals to digital codes and digital codes to analog signals is called a codec. Every codec implements a baseline system, also known as the baseline sequential encoding. The codec performs analog sampling, encoding/decoding, and digital compression/decompression. The baseline system must satisfactorily decompress color images and handle resolutions ranging from 4 to 16 bits per pixel. At this level, the JPEG compression standard ensures that software, custom very large scale integration (VLSI), and digital signal processing (DSP) implementations of JPEG produce compatible data. The extended system covers encoding aspects such as variable length encoding, progressive encoding, and the hierarchical mode of encoding. All of these encoding methods are extensions of the baseline sequential encoding. The special lossless function, also known as predictive lossless coding, is used when loss in compressing the digital image is not acceptable. There are four modes in JPEG:

* Sequential encoding
* Progressive encoding
* Hierarchical encoding
* Lossless encoding
JPEG sequential encoding requirements dictate encoding in a left-to-right sequence and top-to-bottom sequence to ensure that each pixel is encoded only once. Progressive encoding is usually achieved by multiple scans. The image is decompressed so that a coarser image is displayed first and is filled in as more components of the image are decompressed. With hierarchical encoding, the image is compressed to multiple resolution levels so that lower resolution levels may be accessed for lower resolution target systems without having to decompress the entire image. With lossless encoding, the image is expected to provide full detail when decompressed. JPEG and wavelet compression are compared in Fig. 3, on 8-bit gray scale images, both at a compression ratio of 60 to 1. The top row shows chest X-ray images and the bottom row presents typical magnified retina images. The image detail is retained better in the wavelet compressed image.
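The trade-off between quality setting and file size, and the choice between baseline and progressive encoding, can be explored with any JPEG codec. The sketch below uses the Pillow imaging library purely as an illustration (it is not part of the chapter); the synthetic test image and the particular quality values are arbitrary choices.

```python
from io import BytesIO

import numpy as np
from PIL import Image

# Synthetic 8-bit gray-scale gradient standing in for a digitized film image.
gradient = np.tile(np.arange(512) // 2, (512, 1)).astype(np.uint8)
img = Image.fromarray(gradient)

for quality in (95, 75, 50, 25):
    buf = BytesIO()
    # progressive=True requests multi-scan (progressive) encoding;
    # omitting it gives baseline sequential encoding.
    img.save(buf, format="JPEG", quality=quality, progressive=True)
    size = buf.getbuffer().nbytes
    print(f"quality={quality:2d}  {size:6d} bytes  ratio {512 * 512 / size:5.1f}:1")
```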
2.2 Moving Picture Experts Group (MPEG) Compression

Standardization of compression algorithms for video was first initiated by CCITT for teleconferencing and video telephony. The digital storage media for the purpose of this standard include digital audio tape (DAT), CD-ROM, writeable optical disks, magnetic tapes, and magnetic disks, as well as communications channels for local and wide area networks, LANs and WANs, respectively. Unlike still image compression, full motion image compression has time and sequence constraints. The compression level is described in terms of a compression rate for a specific resolution. The MPEG standards consist of a number of different standards. The original MPEG standard did not take into account the requirements of high-definition television (HDTV). The MPEG-2 standards, released at the end of 1993, include HDTV requirements in addition to other enhancements. The MPEG-2 suite of standards consists of standards for MPEG-2 audio, MPEG-2 video, and MPEG-2 systems. It is also defined at different levels to accommodate different rates and resolutions, as described in Table 2. Moving pictures consist of sequences of video pictures or frames that are played back at a fixed number of frames per second. Motion compensation is the basis for most compression algorithms for video. In general, motion compensation assumes that the current picture (or frame) is a revision of a previous picture (or frame). Subsequent frames may differ slightly as a result of moving objects or a moving camera, or both. Motion compensation attempts to account for this movement. To make the process of comparison more efficient, a frame is not encoded as a whole.
FIGURE 3 Effect of 60 to 1 compression on 8-bit gray-scale images.
Rather, it is split into blocks, and the blocks are encoded and then compared. Motion compensation is a central part of the MPEG-2 (as well as MPEG-4) standards. It is the most demanding of the computational algorithms of a video encoder. The established standards for image and video compression developed by JPEG and MPEG have been in existence, in one form or another, for over a decade. When first introduced, both processes were implemented via codec engines that were entirely in software and very slow in execution on the computers of that era. Dedicated hardware engines have been developed, and real-time video compression of standard television transmission is
now an everyday process, albeit with hardware costs that range from $10,000 to $100,000, depending on the resolution of the video frame. JPEG compression of fixed or still images can be accomplished with current generation PCs. Both JPEG and MPEG standards are in general usage within the multimedia image compression world. However, it seems that the DCT is reaching the end of its performance potential, since much higher compression capability is needed by most of the users in multimedia applications. The image compression standards are in the process of turning away from DCT toward wavelet compression.
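A minimal sketch of the block-based motion compensation idea described above follows. It is illustrative only and is not the MPEG algorithm itself: for one 16 × 16 block of the current frame it searches a small window of the previous frame for the best match and keeps only the motion vector and the residual.

```python
import numpy as np

def best_motion_vector(prev, cur_block, top, left, search=4):
    """Exhaustive search for the displacement minimizing the sum of absolute differences."""
    h, w = cur_block.shape
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > prev.shape[0] or x + w > prev.shape[1]:
                continue
            sad = np.abs(prev[y:y + h, x:x + w].astype(int) - cur_block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best

# Toy frames: the current frame is the previous one shifted by 2 pixels in each direction.
prev = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(np.roll(prev, 2, axis=0), 2, axis=1)

block = cur[16:32, 16:32]                        # one 16 x 16 block of the current frame
dy, dx = best_motion_vector(prev, block, 16, 16)
residual = block.astype(int) - prev[16 + dy:32 + dy, 16 + dx:32 + dx].astype(int)
print((dy, dx), int(np.abs(residual).sum()))     # expected: (-2, -2) and a zero residual
```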
TABLE 2 MPEG-2 resolutions, rates, and metrics [2]

Level   Pixel to line ratio   Compression and decompression rate   Lines per frame   Frames per second   Pixels per second
High    1920                  Up to 60 Mbits per second            1152              60                  62.7 million
High    1440                  Up to 60 Mbits per second            1152              60                  47 million
Main    720                   Up to 15 Mbits per second            576               30                  10.4 million
Low     352                   Up to 4 Mbits per second             288               30                  2.53 million
2.3 Wavelet Compression

Table 3 presents quantitative information related to the digitization and manipulation of a representative set of film images. As shown, the average medical image from any of several sources translates into large digital image files. The significant payoff achieved with wavelet compression is the capability to send the image or set of images over low-cost telephone lines in a few seconds, rather than tens of minutes to an hour or more if compression is not used. With wavelet compression, on-line medical collaboration can be accomplished almost instantaneously via dial-up telephone circuits. The average compression ratios shown in Table 3 are typically achieved with no loss of diagnostic quality using the wavelet compression process. In many cases, even higher ratios are achievable with retention of diagnostic quality. The effect of compression on storage capability is equally remarkable. For example, a set of six 35-mm slide images scanned at 1200 dpi, producing nearly 34 Mbytes of data, would compress to less than 175 kbytes. A single CD-ROM would hold the equivalent of nearly 24,000 slide images. DCT will give way to wavelet compression simply because the wavelet transform provides 3 to 5 times higher compression ratios for still images than the DCT with an identical image quality. Figure 3 compares JPEG compression with wavelet compression on a chest X-ray and a retina image. The original images shown on the left are compressed with the wavelet transform (middle) and JPEG (right), both with a 60:1 compression ratio. The original chest X-ray is compressed from 1.34 Mbytes to 22 Kbytes while the original retina image is compressed from 300 Kbytes to 5 Kbytes. The ratio for video compression could be as much as 10 times the compression ratio of MPEG-1 or MPEG-2 for identical visual quality video and television applications. A change from the DCT is coming
because of the transmission bandwidth reduction available with wavelets and the capability to store more wavelet-compressed files on a CD-ROM, DVD-ROM, or any medium capable of storing digital files.
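The transmission-time payoff is simple arithmetic. The sketch below, which treats a 56 kbaud modem as a 56,000 bits/sec line and uses the 14 × 17 film entry of Table 3, reproduces the times quoted in the table to within rounding.

```python
def send_time_seconds(megabytes, compression_ratio=1, line_rate_bps=56_000):
    """Time to push an image through a modem line, before or after compression."""
    bits = megabytes * 1_048_576 * 8 / compression_ratio
    return bits / line_rate_bps

film_mbytes = 14.28    # 14 x 17 inch film scanned at 200 dpi (Table 3)
print(send_time_seconds(film_mbytes) / 60)     # about 36 minutes uncompressed
print(send_time_seconds(film_mbytes, 50))      # about 43 seconds after 50:1 wavelet compression
```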
2.4 Fractal Compression

Fractal compression is an approach that applies a mathematical transformation iteratively to a reference image to reproduce the essential elements of the original image. The quality of the decompressed image is a function of the number of iterations that are performed on the reference image and the processing power of the computer. This discussion will focus on the compression of black-and-white images and gray-scale images. For purposes of analysis, black-and-white images are modeled mathematically as point sets (black and white points) in a two-dimensional Euclidean space, and gray-scale images are modeled as point sets in a three-dimensional Euclidean space. Fractal image compression employs a set of functions that are mappings from a two-dimensional Euclidean space onto itself for black-and-white images, or mappings from a three-dimensional Euclidean space onto itself for gray-scale images. The set of mappings is employed recursively beginning with an initial (two- or three-dimensional) point set called the "initial image" to produce the final "target image" (the original image to be compressed); i.e., application of the mappings to the initial image produces a secondary image, to which the mappings are applied to produce a tertiary image, and so on. The resulting sequence of images will converge to an approximation of the original image. The mappings employed are special to the extent that they are chosen to be affine linear contraction (ALC) mappings, generally composed of a simple linear transformation combined with a translation.
TABLE 3 Examples of medical image sizes and transmission times

Format                                                      Image size                     Pixel count    Data per frame   Time to send uncompressed,   Average       Time to send wavelet compressed,
                                                                                                          (Mbytes)         56 kbaud modem (minutes)     compression   56 kbaud modem (seconds)
                                                                                                                                                        ratio
35 mm color slide Kodachrome *1500 DPI (24 bit)             1.378 × 0.944 (inches)         2,926,872      8.75             22                           200           6.6
35 mm color slide 1200 DPI (24 bit)                         1.378 × 0.944 (inches)         1,873,198      5.62             14.14                        200           4.25
Digital color camera AGFA E 1680 (24 bit)                   1600 × 1200 (CCD resolution)   1,920,000      5.760            14.5                         200           4.35
14 × 17 X-ray film scanned at 200 DPI (12-bit gray scale)   14 × 17 (inches)               9,520,000      14.28            35.92                        50            43
4K × 4K imaging sensor (12-bit gray scale)                  Various                        16,000,000     24.0             60.4                         50            72.5
4K × 4K imaging sensor (24-bit color)                       Various                        16,000,000     48.0             120.8                        200           36/23
MRI image (12-bit gray scale), each 480 × 640, set of 24    All on 14 × 17 film (inches)   307,200 each   0.4608 each;     27.82                        50            33.4
                                                                                                          11.060 per set
The fact that these mappings are contractions means that the point sets resulting from the application of these mappings to a portion of an image are diminished in size relative to that original portion of the image. Once a set of ALC mappings has been determined such that its repeated application produces an acceptable approximation to the target image, the target image is said to have been "compressed." The target image is then compressed because, rather than having to store the entire data of the target image, one needs only to store the values of the parameters that define the ALC mappings. Usually the amount of data required to characterize the ALC mappings is significantly less than that required to store the target image point set. The process of iterative application of the ALC mappings described earlier produces the "decompressed" approximation of the target image. An essential feature of the decompression process is that it leads to a unique "limit image," so that a given collection of ALC mappings will always yield a sequence of images that converge to the same limiting image [6]. This limit image is a point set that is approached by the sequence of images as the number of applications of the ALC mappings increases without limit. Hence, in practical applications, one must select an image that results from a finite number of applications of the ALC mappings as the approximation to the target image. Another important feature of the decompression process is that the limit image produced is independent of the initial image with which the process begins, although the rate at which the sequence of target image approximations approaches the limit image is influenced by the choice of initial image. A fractal is a geometric structure obtained as the limit image (or attractor) of a specific set of ALC mappings applied to point sets in Euclidean space [7]. That is why compression based on the ALC iterations is called fractal compression. Consequently, target images that are fractals can always be compressed with a set of ALC mappings and can be decompressed with arbitrary accuracy by simply increasing the number of applications of the ALC mappings. A characteristic feature of fractals is "self-similarity," which means that certain subsets of a fractal are compressed versions of the entire fractal; in other words, certain subsets are versions of the entire fractal image under some ALC mapping [7]. Self-similarity of fractals follows as a direct consequence of the repeated application of ALC mappings that generates the fractals. It is important to note that when decompressing an arbitrary image one does not produce the fractal (limit image), but only an intermediate approximation to the fractal, and self-similarity at all scales is not manifested in fractal compressed images. Once the mappings are specified, it is relatively easy to determine the number of iterations necessary to produce an approximate target image that is within a specified distance of the limit image, according to an appropriately defined metric. The process of decompression of a fractal compressed image is well defined and easily implemented on a computer; one
simply implements the iterated application of the ALC mappings. However, the most significant problem one faces when attempting to apply fractal image compression to target images of practical interest is the target image compression process. Even though many naturally occurring physical objects and their photographic images exhibit a certain degree of self-similarity (over several scales), no real-world objects are fractals [8]. Thus, the process of specifying the set of ALC mappings capable of reproducing an arbitrary target image with acceptable fidelity is a difficult one and constitutes the subject of many research activities at this time. Existing methods are not fully automated, requiring involvement of a human operator using a computer-based decision aid [9]. Hence, the time required to compress an arbitrary target image is a matter of concern. Most methods start by partitioning the target image into subsets and search for ALC mappings that can modify one subset to look like the modified version of another subset. Successful application of this procedure requires the target image to exhibit at least "local" self-similarity so that reasonable fidelity can be achieved when decompressing the image. An arbitrary target image may or may not possess such local self-similarity, although experience has shown that many ordinary images do exhibit enough to allow effective fractal compression. Because a fractal compressed image is decompressed through the repeated application of the ALC mappings, and magnification of that image is easily achieved by increasing the value of multiplicative constants in the ALC mappings, fractal compressed images will display detailed structure at all levels of magnification, unlike standard images in which details are obscured with increased magnification as individual pixels become visible. Since the unlimited detail produced by the fractal image decompression process will eventually have no relation to the target image, care must be exercised when interpreting highly magnified, fractal compressed images.
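The decompression loop described above is easy to sketch. The example below is illustrative only and is not a real fractal codec: it iterates three ALC mappings whose limit image is the classic Sierpinski triangle, starting from an arbitrary initial point set. After a few iterations the result approximates the limit image regardless of the starting set.

```python
import numpy as np

# Three affine linear contraction (ALC) mappings; their limit image (attractor)
# is the Sierpinski triangle with vertices (0, 0), (1, 0), and (0.5, 1).
MAPS = [
    (0.5, np.array([0.00, 0.0])),
    (0.5, np.array([0.50, 0.0])),
    (0.5, np.array([0.25, 0.5])),
]

def apply_maps(points):
    """One decompression step: apply every ALC mapping to the whole point set."""
    return np.vstack([scale * points + shift for scale, shift in MAPS])

points = np.random.rand(50, 2)     # arbitrary initial image; the limit does not depend on it
for _ in range(6):
    points = apply_maps(points)
print(points.shape)                # (50 * 3**6, 2) points approximating the limit image
```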
3 Telecommunications

Telecommunications involve the use of wire, radio, optical, or other electromagnetic channels to transmit or receive signals for voice, data, and video communications. This section briefly describes features of technologies, standards, and procedures or protocols that contribute to the speed and reliability of modern telecommunications systems. The seven-part (a.k.a. level or layer) classification scheme for computer communications known as the Open Systems Interconnection (OSI) model of the International Standards Organization (ISO) provides a useful foundation for the discussion that follows. The model can be applied to the scenario illustrated in Fig. 1, where two users of telemedicine systems are connected via communications services. The seven levels and their objectives are as follows [10]:
(1) Physical. Specifies the physical, electrical, and procedural standards for a network, including voltage levels and circuit impedances. The EIA RS-232 serial interface standard is an example of a Level 1 area of responsibility.
(2) Data Link. Describes means to activate, maintain, and close the telecommunications link.
(3) Network. Specifies standards for dial-up, leased, or packet networks. It also provides rules for building data packets and routing through a network.
(4) Transport. Describes rules for message routing, segmenting, and error recovery.
(5) Session. Starts and stops (i.e., log-on and log-off) network sessions. It determines the type of dialogue that will be used (i.e., simplex, half-duplex, or full-duplex).
(6) Presentation. Converts data to the appropriate syntax for the display devices (character sets and graphics); compresses and decompresses and encrypts and decrypts data.
(7) Application. Provides a log-in procedure, checks passwords, allows file upload or download, and tabulates system resources usage.

The transport layer is the highest level concerned with technological aspects of the telecommunications network. It acts as the interface between the telecommunications network and the telemedicine application. The upper three layers manage the content of the telemedicine application. Furthermore, various equipment, software, communications services, and data acquisition and display components of a telemedicine system can be updated and maintained by replacing or modifying hardware or software at each level instead of replacing the whole system.
3.1 Signal Hierarchy and Transfer Rates

To achieve reliable and effective communications, signals must be accurately generated and propagated. Signals associated with telecommunications systems are usually stated in rather self-descriptive terms, for example, digital signal (DS), optical carrier (OC), or synchronous transport signal (STS). Signals, which contain information, are typically described as frequencies. The bandwidth needed to convey the information is the difference between the highest and lowest frequencies of the signals containing the information. The bandwidth of a communications channel is the difference between the highest and lowest frequencies that the channel can accommodate. The bandwidth of the communications channel must be equal to or greater than the bandwidth of the information-carrying signals. Thus, a communications channel that carries the range of voice frequencies (300 to 3000 Hz) must have a bandwidth of at least 3000 Hz. In contrast, approximately 200 kHz of bandwidth is required for FM transmission of high-fidelity music, and 6 MHz for full-motion, full-color television signals. A digital communications system uses digital pulses rather
than analog signals to encode information. The North American high-speed digital telephone service is referred to as the T carrier system. The T carrier system uses pulse code modulation (PCM) techniques to sample and encode information contained in voice-grade channels, and then time division multiplexing (TDM) techniques to form a DS from a group of 24 voice-grade channels. The information-carrying signal of each channel is sampled 8000 times per second. Each sample is represented or encoded using 8 bits, so each voice-grade channel carries information at a rate of 64 kbps. Details about the T carrier system are summarized in Table 4. Metrics are bit transfer rates, the number of voice frequency analog signals that are multiplexed to form a DS, and the number of 6 MHz TV channels that can be transmitted via a T carrier. A single DS-1 signal is usually transmitted over one pair of twisted wires of either 19 gauge or 22 gauge, known as a T1 line span. Two lines, one for transmit and one for receive, are used in a line span. Repeaters are required about every mile to compensate for power loss. The assemblage of equipment and circuits is known as the T1 carrier system. The lengths of T1 carrier systems range from about 5 miles to 50 miles. The T2 carrier system uses a single 6.312 Mbps DS for transmission up to 500 miles over a low capacitance cable. A T3 carrier moves 672 PCM-encoded voice channels over a single metallic cable. With the development of fiber optic telecommunications systems, signaling standards that were adequate for wire pairs and coaxial cable warranted revision. Fiber-based telecommunications systems are virtually error-free and are capable of reliably moving signals more rapidly than wire systems. The American National Standards Institute (ANSI) published a new standard called Synchronous Optical Network (SONET) in 1988 [10]. It was known as ANSI T1.105 and evolved into an international standard that was adopted by CCITT in 1989. The OC-1 signal is an optical signal that is turned on and off (modulated) by an electrical binary signal that has a signaling rate of 51.84 Mbits/sec, the fundamental line rate from which all other SONET rates are derived. The electrical signal is known as the STS-1 signal (Synchronous Transport Signal, level 1). OC-N signals have data rates of exactly N times the OC-1 rate. Table 5 lists standard transmission bit rates used with SONET and their equivalent STSs.
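The DS-1 rate in Table 4 can be reproduced from the sampling parameters just given. The sketch below is illustrative arithmetic; the single framing bit per 193-bit frame is a standard detail of the T1 format that the text does not spell out.

```python
channels = 24                 # voice-grade channels multiplexed into one DS-1
samples_per_second = 8000     # PCM sampling rate per channel
bits_per_sample = 8

per_channel_bps = samples_per_second * bits_per_sample    # 64,000 bits/sec per channel
frame_bits = channels * bits_per_sample + 1               # 193 bits: 24 x 8 plus 1 framing bit
ds1_bps = frame_bits * samples_per_second                 # 1,544,000 bits/sec

print(per_channel_bps)        # 64000
print(ds1_bps / 1e6)          # 1.544 Mbits/sec, the T1 rate in Table 4
```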
TABLE 4 T-carrier baseband system [10]

T carrier designator   Data rate (Mbits/sec)   Digital signal type   Voice grade channels   TV channels   Medium
T1                     1.544                   DS-1                  24                     —             Wire pair
T2                     6.312                   DS-2                  96                     —             Wire pair
T3                     44.736                  DS-3                  672                    1             Coax, fiber
T4                     274.176                 DS-4                  4032                   6             Coax, fiber
T5                     560.160                 DS-5                  8064                   12            Coax
TABLE 5 SONET signal hierarchy [10]

OC level   Data rate (Mbits/sec)   Synchronous transport signal   Number of DS-1s
OC-1       51.84                   STS-1                          28
OC-3       155.52                  STS-3                          84
OC-9       466.56                  STS-9                          252
OC-12      622.08                  STS-12                         336
OC-18      933.12                  STS-18                         504
OC-24      1244.16                 STS-24                         672
OC-36      1866.24                 STS-36                         1008
OC-48      2488.32                 STS-48                         1344
3.2 Network Interoperability

The telephone industry is moving toward an all-digital network that integrates voice and data over a single telephone line from each user to the telephone company equipment. Integrated services digital networks (ISDNs) are being implemented with the intent of providing worldwide communications for data, voice, video, and facsimile services within the same network. The basic principles and evolution of ISDN are outlined in CCITT recommendation I.120 (1984). One of the principles is that a layered protocol structure should be used to specify the access procedures to an ISDN, and that the structure can be mapped into the OSI model. However, in essence, ISDN ignores levels 4 to 7 of the OSI model. Standards already developed for OSI applications can be used for ISDN, such as X.25 level 3 for access to packet switching networks. In addition, whenever practical, ISDN services should be compatible with 64-kbps switched digital connections. The 64-kbps frame is the basic building block of ISDNs that expect to use the plant and equipment of existing telecommunications systems. Interactive television and telemedicine applications require transfer rates that exceed the original ISDN specifications. Broadband ISDN (BISDN) addresses this need [11]. With the advent of BISDN, the original concept of ISDN is referred to as narrowband ISDN. The new BISDN standards are based on the concept of an asynchronous transfer mode (ATM), which will include optical fiber cable as a transmission medium for data transmission. BISDN standards set a maximum length of 1 km per cable, with expected data rates of 11, 155, or 600 Mbps. ATM is a member of a set of packet technologies that relay traffic through nodes in an ISDN via an address contained within the packet. Unlike packet technology such as X.25 or frame relay, ATM uses short fixed-length packets called cells [11]. This type of service is also known as cell relay. An ATM cell is 53 bytes long, with the first 5 bytes called a header, and the next 48 bytes called an information field. The header contains the address and is sometimes referred to as a label. In contrast, frame relay uses a 2-byte header and a variable-length information field. The X.25 protocol was developed for use over relatively noisy
analog transmission facilities and addresses only the physical, data link, and network layers in the OSI model [11]. It is a protocol that was developed to ensure reasonably reliable transport over copper transmission facilities. To accomplish this, every node in an X.25 network goes through a rigorous procedure to check the validity of the structure of the message and its contents, and to find and recover from detected errors. It can then proceed with or abort communications, acknowledge receipt, or request retransmission, before passing the message along to the next node, where the process is repeated. In total, it can be a relatively slow and time-consuming process, constraining throughput. The extensive checking of X.25 may be replaced by one or two simple checks: address validity and frame integrity. If the frame fails either check, it is discarded, leaving the processors at the ends of the connection to recover from the lost message. Frame relay's strength (i.e., robust error checking and recovery) is ultimately its weakness. It is suited for data communications, but not flexible enough to cope with the variety of multimedia traffic expected to be running on an ISDN. In the future, traffic on an ISDN will be a mixture of transactions devoted exclusively to data, voice, or video and transactions embedded with combinations of data, voice, video, and static images. The following paragraphs compare performance of cell and frame relay schemes in network scenarios. If the interval between voice samples varies too much, problems (called jitter) arise with reconstructing the signal. If the intervals become very large, echo becomes a problem. ATM's cell length is chosen to minimize these problems. It will be necessary to ensure that data messages do not impose overly long delays to voice traffic (or similar services that count on periodic transmission of information). Frame relay is just as likely to insert a several-thousand-byte data frame between voice samples as it is a 64-byte frame, playing havoc with the equipment trying to reconstruct the voice traffic. The short, fixed cell size ensures that the cells carrying voice samples arrive regularly, not sandwiched between data frames of varying, irregular length. The short cell length also minimizes latency in the network, or end-to-end delay. The ATM cell size is a compromise between the long frames generated by data communications applications and the needs of voice. It is also suitable for other isochronous services such as video. Isochronous signals carry embedded timing information or are dependent on uniform timing information. Voice and video are intimately tied to timing. Frame relay also suffers from a limited address space. Initial implementations provide about 1000 addresses per switch port. This is adequate for most corporate networks today, but as needs expand beyond corporate networking to encompass partner corporations and home offices, more flexibility will be needed. Speed and extensibility (expandability) are also issues with frame relay as a solution to network interoperability. As network transmission capabilities grow, so will transmission
speeds, which, in turn, will lead to faster switching speed requirements. As the network expands, the work and functions of the switches will grow. The increased requirements dictate very fast, extensible switches. Network switches, however, do not cope well with the variable-length messages of frame relay. They require identical fixed-length messages, preferably short ones. The ATM standard does not include parameters for rates or physical medium. Thus, different communications networks can transport the same ATM cell. With ATM, a cell generated from a 100 Mbits/sec LAN can be carried over a 45 Mbits/sec T3 carrier system to a central office and switched into a 2.4 Gbits/sec SONET system.
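The fixed 53-byte cell also fixes the header overhead and the per-cell serialization delay. The following is illustrative arithmetic only; the line rates are those quoted in Tables 4 and 5.

```python
CELL_BYTES = 53               # ATM cell: 5-byte header plus 48-byte information field
HEADER_BYTES = 5

print(f"header overhead: {HEADER_BYTES / CELL_BYTES:.1%}")          # about 9.4%

for name, rate_bps in (("T1", 1_544_000), ("T3", 44_736_000), ("OC-3", 155_520_000)):
    cell_time_us = CELL_BYTES * 8 / rate_bps * 1e6
    print(f"{name:5s} {rate_bps / 1e6:8.3f} Mbits/sec -> {cell_time_us:7.1f} microseconds per cell")
```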
3.3 Telemedicine Applications Compatibility

A telemedicine application can be a complex collection of a variety of objects. In general, it can be an integrated session containing text, rich text, binary files, images, bitmaps, voice and sound, and full-motion video. The utility of these applications will in part depend on their ability to accommodate these different types of data. The development and use of standards that allow interoperability between systems and applications developed by different manufacturers is a critical part of this process.

Health Level Seven (HL7)

Health Level Seven (HL7), which dates back to 1987, is a standard for exchanging clinical, administrative, and financial information among hospitals, government agencies, laboratories, and other parties. The HL7 standard covers the interchange of computer data about patient admissions, discharges, transfers, laboratory orders and reports, charges, and other activities. Its purpose is to facilitate communication in health-care settings. The main goal is to provide a standard for exchange of data among health-care computer applications and reduce the amount of customized software development. The HL7 standard focuses primarily on the issues that occur within the seventh level of the OSI model. The standard is organized to address the following issues:

* Patient admission, discharge, transfer, and registration
* Order entry
* Patient accounting (billing) systems
* Clinical observation data such as laboratory results
* Synchronization of common reference files
* Medical information management
* Patient and resource scheduling
* Patient referral messages for referring a patient between two institutions
In the future, the HL7 Working Group expects to undertake development of standards in the following special interest areas:
* Decision support
* Ancillary departments
* Information needs of health-care delivery systems outside of acute care settings
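HL7 version 2 messages are line-oriented text in which each segment is a sequence of fields separated by a delimiter (normally the vertical bar). The fragment below is a hypothetical, heavily abbreviated ADT-style example used only to show that structure; the segment names follow HL7 conventions, but the field contents are invented.

```python
# Hypothetical, abbreviated HL7 v2 style message: one segment per line,
# fields separated by "|", segments terminated by a carriage return.
hl7_message = (
    "MSH|^~\\&|RADIOLOGY|GENHOSP|PACS|GENHOSP|200001011230||ADT^A01|123456|P|2.3\r"
    "PID|1||MRN0001||DOE^JOHN||19501231|M\r"
    "PV1|1|I|WARD1^101^A\r"
)

for segment in filter(None, hl7_message.split("\r")):
    fields = segment.split("|")
    print(fields[0], fields[1:5])   # segment ID and its first few fields
```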
Digital Imaging and Communications in Medicine (DICOM)

The American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA) developed DICOM to meet the needs of manufacturers and users of medical imaging equipment for interconnection of devices on standard networks [12]. DICOM, which is described in detail in the chapter "Medical Image Archive and Retrieval," has multiple parts that facilitate expansion and updating and allow simplified development of medical imaging systems. DICOM also provides a means by which users of imaging equipment are able to exchange information. In addition to specifications for hardware connections, the standard includes a dictionary of the data elements needed for proper image identification, display, and interpretation. The future additions to DICOM include support for creation of files on removable media (e.g., optical disks or high-capacity magnetic tape), new data structures for X-ray angiography, and extended print management. Although the HL7 standard enables disparate text-based systems to communicate and share data, and the DICOM standard does the same for image-based systems, efforts to link compliant systems have met with limited success. The goal of an initiative known as Integrating the Healthcare Enterprise (IHE) is to foster the use of HL7 and DICOM and ensure that they are used in a coordinated manner. The initiative is a joint undertaking of the Healthcare Information and Management Systems Society (HIMSS) and the Radiological Society of North America (RSNA). The major impact of DICOM is expected to be on PACS, because it can serve in many interfacing applications. For example, DICOM may be used as the interface standard among CT and MR imaging units and printer systems.
4 Conclusion

During the early 1900s, the evolution of the radio, which exploited electromagnetic waves, initiated a series of technology developments that form the foundation for modern telemedicine systems. Facsimile first attracted attention in 1924, when a picture was sent from Cleveland to the New York Times. Nevertheless, it came into widespread use in medical applications only during the past 30 years. Several clinical applications have experimented with telemedicine systems that use evolving technologies such as high-resolution monitors, interactive video and audio, cellular telephones, the public switched telephone network, and the Internet.
Clinical applications that have experimented with telemedicine include radiology, pathology, cardiology, orthopedics, dermatology, pediatrics, ophthalmology, and surgery. As telecommunication technologies improve and become available, reliable, easy to use, and affordable, interest in telemedicine systems in clinical applications will increase. Today, few health-care organizations routinely use the entire range of capabilities resident in computer-based telemedicine systems to integrate, correlate, or otherwise manage the variety of multimedia data available for patient care. Use of the wide range of capabilities inherently available in modern telemedicine technology is probably more commonplace in biomedical education and research applications than in clinical applications focusing on patient care. Patient care places emphasis on reliability of data and accuracy and timeliness of diagnosis, decision-making, and treatment. The technical and clinical issues in telemedicine systems include quality of images, videos, and other data, the percentage of the patient examination that can be accomplished using existing telemedicine technologies, and the integration of telemedicine service with current clinical practices. Basic technological components that affect the transmission, storage, and display of multimedia data presented in this chapter indicate the complexity of telemedicine systems. Further research is required to identify performance and interoperability requirements for telemedicine systems that will assist the care provider in achieving better outcomes for the patient.
References

1. Field, Marilyn J., ed., Telemedicine: A Guide to Assessing Telecommunications in Health Care. Institute of Medicine, National Academy Press, Washington, D.C., 1996.
2. Andleigh, Prabhat K., and Thakrar, Kiran, Multimedia Systems Design. Prentice Hall, Englewood Cliffs, NJ, 1996, pp. 52–123.
3. Harris Corporation, Guide to Telemedicine. 1995.
4. Degoulet, Patrice, and Fieschi, Marius, Introduction to Clinical Informatics. Springer-Verlag, 1997, pp. 139–151.
5. Gonzalez, Rafael C., and Woods, Richard E., Digital Image Processing. Addison-Wesley, Reading, MA, 1993.
6. Peitgen, Heinz, Jurgens, Hartmut, and Saupe, Dietmar, Chaos and Fractals, New Frontiers of Science. Springer-Verlag, 1992.
7. Barnsley, Michael, Fractals Everywhere. Academic Press, New York, 1988, pp. 43–113.
8. Ref. 6, pp. 229–282 and 903–918.
9. Barnsley, Michael, and Hurd, Lyman, Fractal Image Compression. A. K. Peters Ltd., Wellesley, MA, 1993, pp. 47–116.
10. Couch, Leon W. II, Modern Communication Systems, Principles and Applications. Prentice Hall, Englewood Cliffs, NJ, 1995, pp. 204–209.
11. Telco Systems, Inc., Asynchronous Transfer Mode: Bandwidth for the Future. Norwood, MA, 1992.
12. Horii, Steven C., et al., DICOM: An Introduction to the Standard. http://www.xray.hmc.psu.edu/dicom/
47 Medical Image Archive, Retrieval, and Communication
Albert Wong and S. L. Lou
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 771
2 Medical Image Information Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 772
3 Medical Image Archive System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 772
  3.1 Archive Server  3.2 Database Management Subsystem  3.3 Storage Subsystem  3.4 Communication Network
4 DICOM Image Communication Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 773
  4.1 Information Objects  4.2 Service Classes  4.3 Example of C-STORE DIMSE Service
5 Archive Software Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 774
  5.1 Image Receiving  5.2 Image Routing  5.3 Image Stacking  5.4 Image Archiving  5.5 Database Updating  5.6 Image Retrieving
6 HIS/RIS Interfacing and Image Prefetching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 777
  6.1 Health Level Seven (HL7) Communication Standard  6.2 Prefetch Mechanism
7 DICOM Image Archive Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 778
  7.1 Image File Format  7.2 Query/Retrieve Service Class Operation  7.3 Q/R Service Class Support Levels
8 PACS Research Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 780
9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 781
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 781
1 Introduction

Archiving medical images for future retrieval allows access to patients' historical images from their previous examinations. These images can be used in clinical review and diagnosis to compare with patients' current examinations, or as a resource in medical imaging related research. This chapter describes the storage and management of medical images using current digital archiving technology, and the hardware and software requirements for the implementation of a medical image archive system to facilitate timely access to medical images in an integrated hospital environment.

Medical images are produced by a wide variety of imaging equipment, such as computed tomography (CT), magnetic resonance imaging (MRI), ultrasound (US), computed radiography (CR), nuclear medicine (NM), digital subtraction angiography (DSA), digital fluoroscopy (DF), and projectional radiography. These images generally are archived digitally or in analog format on different types of storage media such as magnetic disks or tapes, compact discs (CDs), optical disks, videotapes, films, digital versatile discs (DVDs), or digital linear tapes (DLTs). Retrieving these images from their archived media requires certain manually operated procedures, which tend to be tedious and inefficient. Computer-based medical image archiving was initially introduced in the early implementation of the picture archiving and communication systems (PACS) [2, 3]. In these implementations, an image archive subsystem was built into a PACS, providing a reliable and efficient means to store and manage the high-capacity medical images in support of the entire PACS operations.

Portions of this chapter reprinted from Wong A, Huang HK, Arenson RL, Lee JK, "Digital archive system for radiologic images," 14:1119–1126, 1994. Copyright 1994 Radiological Society of North America.
Implementation of an image archive system to support the operation of a PACS requires connectivity and interoperability between the archive system and the individual medical imaging equipment (e.g., CT scanners, MR imagers, CR systems) and PACS components (acquisition computers, archive system, display workstations, etc.). This always has been difficult because of the multiple platforms and vendor-specific communication protocols and file formats [6]. With the introduction of the Digital Imaging and Communications in Medicine (DICOM) standard, data communication among the imaging equipment and PACS components becomes feasible [5, 8]. An image archive system based on DICOM [1, 4] can serve as an image manager that controls the acquisition, archive, retrieval, and distribution of medical images within the entire PACS environment.
2 Medical Image Information Model

A medical imaging examination performed on a patient can produce multiple images. These images, depending on the specific imaging procedure undertaken, are generally organized into studies and series. A study is a collection of one or multiple series of images that are correlated for the purpose of diagnosing a patient. A series, on the other hand, is a set of images that are produced with a single imaging procedure (i.e., a CT scan). Medical images generated by digital imaging equipment of various modalities are stored as information objects composed of pixels. A pixel is a data element that contains the gray level of a gray-scale image, or the RGB (red, green, and blue) value of a color image. The gray level can range from 0 to 255 (8-bit), 0 to 1023 (10-bit), or 0 to 4095 (12-bit), depending on the procedure taken by the imaging equipment. The RGB value is composed of the red, green, and blue elements, each being represented by an 8-bit value ranging from 0 to 255. Most sectional images (CT, MRI, US, etc.) are two-dimensional. The size of such an image can be measured in terms of number of lines (rows) per image, number of pixels (columns) per line, and number of bits (bit depth) per pixel. Thus an image specified at 2495 × 2048 × 8 bits indicates the image is composed of 2495 lines, each line consisting of 2048 pixels and each pixel containing a maximum value of 255. If expressed in bytes, the aforementioned image has a size of 2495 × 2048 × 1 = 5,109,760 bytes, or 4.87 Mbytes, approximately. However, for those images that consist of 10-bit or 12-bit pixel data, each pixel will require 16 bits, or 2 bytes, of computer
storage, as all computers store information in units of 8 bits, or bytes. For this reason, the size of a 512 × 512 × 12-bit CT image is equivalent to 512 × 512 × 2 = 524,288 bytes, or 0.5 Mbyte. Table 1 shows the size of medical images produced by some common modalities.
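As a quick check of these figures, the per-image and per-examination sizes in Table 1 can be reproduced from the image dimensions alone. The short Python sketch below assumes only the convention just described, that pixels deeper than 8 bits occupy 2 bytes of storage; the modality entries mirror Table 1 and are purely illustrative.

```python
# Reproduce the storage estimates of Table 1 from image dimensions alone.
def image_bytes(rows, cols, bits):
    """Bytes needed for one image; pixels deeper than 8 bits occupy 2 bytes."""
    bytes_per_pixel = 1 if bits <= 8 else 2
    return rows * cols * bytes_per_pixel

def exam_mbytes(rows, cols, bits, images_per_exam):
    return image_bytes(rows, cols, bits) * images_per_exam / 2**20

# (rows, cols, bits, average images per examination), mirroring Table 1
modalities = {
    "MRI": (256, 256, 12, 80),
    "CT":  (512, 512, 12, 60),
    "CR":  (2577, 2048, 12, 2),
    "US":  (512, 512, 8, 36),
    "DSA": (1024, 1024, 12, 20),
}

for name, spec in modalities.items():
    print(f"{name}: {exam_mbytes(*spec):.1f} Mbytes per examination")

# A single 512 x 512 x 12-bit CT image: 0.5 Mbyte
print(image_bytes(512, 512, 12) / 2**20)
```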
3 Medical Image Archive System

The three major subsystems constituting a PACS are acquisition, archive, and display. The acquisition system comprises multiple computers to acquire medical images that are generated by the individual image modalities in the clinic. The archive system consists of a host computer equipped with mass storage for archiving the high-volume medical images in support of future retrieval. The display system comprises multiple display workstations, each composed of a control computer and an image display device (e.g., a high-resolution monitor) that allow a clinician to display and manipulate images [7]. The acquisition, archive, and display systems are connected to a communication network [9]. Medical images acquired by the acquisition computers from the image modalities are transmitted to the host computer (archive server) of the archive system, where they are archived to the storage devices and distributed to the appropriate display workstations.

An archive system for PACS consists of four major components: an archive server, a database management system, a storage subsystem, and a communication network. Figure 1 shows the configuration of a centralized archive system widely adopted by most PACS implementations. However, a distributed archive system based on a many-to-many service model is more suitable for use in a health-care enterprise integrated PACS. The implementation of a distributed archive system is more complicated than that of a centralized archive system, in terms of system configuration and software control. The technical details comparing a centralized and a distributed archive system are outside the scope of this chapter. Sections 3.1 to 3.4 describe the four major components (archive server, database management system, storage subsystem, and communication network) of an archive system.
TABLE 1  Size of medical images produced by some common modalities

Modality                                   Image dimension      Average number of images per examination    Mbytes per examination
Magnetic resonance imaging (MRI)           256 × 256 × 12       80                                          10
Computed tomography (CT)                   512 × 512 × 12       60                                          30
Computed radiography (CR)                  2577 × 2048 × 12     2                                           20
Ultrasound (US)                            512 × 512 × 8        36                                          9
Digital subtraction angiography (DSA)      1024 × 1024 × 12     20                                          40
FIGURE 1 Configuration of an image archive system. The system consists of an archive server, a mirrored archive database, and a storage subsystem composed of cache storage (e.g., magnetic disks or high-speed RAID) and long-term storage (e.g., DLT tape library or optical disk library). The archive server is connected to a communication network, over which images are transmitted from the acquisition computers to the archive server and display workstations. A PACS gateway interfacing the Hospital Information System (HIS) and the Radiology Information System (RIS) allows the archive server to receive information from HIS and RIS.

3.1 Archive Server

The archive server is a multitasking computer system that supports multiple processes (computer programs) running simultaneously in its operating system environment. The archive server is configured with high-capacity RAM (random access memory), dual or multiple CPUs (central processing units), and a high-speed network interface for better performance. An integrated archive server runs sophisticated image management software that controls the archival, retrieval, and distribution of medical images for the archive system. Major functions performed by an archive server include (a) accepting images from the acquisition computers; (b) archiving images to the storage subsystem; (c) routing images to the display workstations; (d) updating archive database tables; and (e) handling query and retrieve requests from the display workstations.

3.2 Database Management Subsystem

The archive database is a relational database comprising predefined data tables that store the information necessary for the archive server to perform the individual tasks supporting the image archive system. To ensure data integrity, the archive database is configured to include a mirroring feature that allows data to be automatically duplicated on a separate system disk in the archive server. The archive database does not store any medical images. Instead, it stores the file index leading to access the corresponding images that are physically stored in the storage subsystem.

3.3 Storage Subsystem

The storage subsystem provides high-capacity storage for medical images and supports two levels of image storage: (1) short-term storage for data caching, and (2) long-term storage for permanent archiving. Short-term storage uses fast data access storage devices such as magnetic disks or a high-speed redundant array of inexpensive disks (RAID) that provide immediate access to images. Long-term storage, on the other hand, uses low-cost, high-capacity storage media such as magnetic tapes, optical disks, or digital linear tapes (DLT) that provide access to images at a slower speed.

3.4 Communication Network

The communication network is a digital interface that connects the archive server of the image archive system to other PACS components such as the acquisition computers and display workstations, allowing communication of medical images and relevant information. The low-cost, 10-Mbps 10BaseT Ethernet can be used as the network interface to provide communications between the PACS components. However, high-bandwidth networks such as the 100-Mbps Fast Ethernet, 155-Mbps OC-3 ATM (asynchronous transfer mode), or 622-Mbps OC-12 ATM are more suitable for the communication because of the high-volume data transmission taking place in PACS applications.
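To see why the higher-bandwidth options matter, it helps to estimate how long one examination takes to cross each network. The sketch below uses the examination sizes of Table 1 and nominal link rates; protocol overhead is ignored, so real transfer times would be longer.

```python
# Rough transfer times for one examination over the networks mentioned above
# (nominal bit rates; protocol overhead and contention are ignored).
LINKS_MBPS = {"10BaseT Ethernet": 10, "Fast Ethernet": 100, "OC-3 ATM": 155, "OC-12 ATM": 622}

def transfer_seconds(exam_mbytes, link_mbps):
    return exam_mbytes * 8 * 2**20 / (link_mbps * 1e6)

for link, rate in LINKS_MBPS.items():
    print(f"40-Mbyte DSA examination over {link}: {transfer_seconds(40, rate):.1f} s")
```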
4 DICOM Image Communication Standard

Communication of images between medical imaging systems and among their applications has always been difficult because of the multiple platforms and vendor-specific communication protocols and data formats. The DICOM standard, developed in 1992 by a joint committee formed by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA), is intended to provide connectivity and interoperability for multivendor imaging equipment, allowing communication of images and exchange of information among these individual systems [10]. This section describes two basic DICOM components (information objects and service classes) that are used for the communication of images.
4.1 Information Objects

Medical images are defined in DICOM as information objects or data sets. An information object represents an instance of a real-world information object (i.e., an image) and is composed of multiple data elements that contain the encoded values of attributes of that object. Each data element is made of three fields: the data element tag, the value length, and the value field. The data element tag is a unique identifier consisting of a group number and an element number in hexadecimal notation and is used to identify the specific attribute of the element. For example, the pixel data of an image is stored in the data element with the tag [7FE0, 0010], where 7FE0 represents the group number and 0010 represents the element number. The value length specifies the number of bytes that make up the value field of the element. The value field contains the value(s) of the data element. Figure 2 illustrates the composition of a DICOM image information object.

Image communication between medical imaging systems and among their applications takes place when a system or an application initiates a transfer of images to a designated system or application. The initiator (image sender) then transmits image data, in the form of information objects, to the designated receiver. In an image acquisition process, for example, medical images are transmitted as information objects from an image modality (e.g., a CT scanner) to a PACS acquisition computer. From the acquisition computer, these information objects are routed to their designated workstations for instantaneous display and to the archive server for archiving.
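The tag/length/value layout of a data element can be made concrete with a few lines of code. The sketch below encodes a single element in an implicit-VR, little-endian style (group number, element number, 4-byte value length, then the value); it is a simplified illustration rather than a complete DICOM encoder, and the patient name used is hypothetical.

```python
import struct

def encode_element(group, element, value: bytes) -> bytes:
    """Implicit-VR little-endian data element: tag (2 + 2 bytes), 4-byte length, value."""
    if len(value) % 2:            # DICOM values are padded to even length
        value += b"\x00"
    return struct.pack("<HHI", group, element, len(value)) + value

# (0010,0010) Patient's Name -- hypothetical value
element = encode_element(0x0010, 0x0010, b"DOE^JOHN")
print(element.hex())
# Pixel data would be carried the same way in element (7FE0,0010).
```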
4.2 Service Classes

PACS applications are referred to by DICOM as application entities (AEs). An AE that involves the communication of images is built on top of a set of DICOM services. These
services, performed by the DICOM message service elements (DIMSEs), are categorized into two service classes, the DIMSE-C services and the DIMSE-N services. DIMSE-C services refer to those services that are applicable to composite information objects (i.e., objects that represent several entities in a DICOM information model) and provide only operation services. DIMSE-N services, on the other hand, are the services that are applicable to normalized information objects (i.e., objects that represent a single entity in a DICOM information model) and provide both operation and notification services. The DIMSE-C and DIMSE-N services and their operations are given in Tables 2 and 3, respectively.

A typical DIMSE service involves two AEs: a service class user (SCU) and a service class provider (SCP). An SCU is an AE that requests a specific DIMSE service from another AE (the SCP). An SCP is an AE that performs an appropriate operation to provide a specific service. Operations carried out by the DIMSE services are based on client/server applications, with the SCU being a client and the SCP being a server. Section 4.3 gives an example of the storage service class, which uses the C-STORE DIMSE service for transmitting medical images from a storage SCU to a storage SCP.
4.3 Example of C-STORE DIMSE Service

The following procedures describe the operation of a C-STORE DIMSE service that transmits images from a PACS acquisition computer to an archive server (Fig. 3):

(a) The acquisition computer (storage SCU) issues an ASSOCIATION request to the archive server (storage SCP)
(b) The archive server grants the association
(c) The acquisition computer invokes the C-STORE service and requests the storage of an image in the archive system
(d) The archive server accepts the request
(e) The acquisition computer transmits the image to the archive server
(f) The archive server stores the image in its storage device and acknowledges successful operation
(g) The acquisition computer issues a request to drop the association
(h) The archive server drops the association
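As a present-day illustration of these steps, the sketch below shows how an acquisition computer might act as a storage SCU using the open-source pynetdicom and pydicom packages (which are not part of the systems described in this chapter); the archive server address, port, and file name are placeholders.

```python
# Sketch of a storage SCU pushing one image to an archive server (storage SCP),
# following steps (a)-(h) above. Assumes the pynetdicom and pydicom packages.
from pydicom import dcmread
from pynetdicom import AE
from pynetdicom.sop_class import CTImageStorage

ds = dcmread("image.dcm")                      # placeholder file name

ae = AE(ae_title="ACQ_SCU")
ae.add_requested_context(CTImageStorage)       # propose the CT Image Storage SOP class

assoc = ae.associate("archive.host", 104)      # (a)/(b) association request and grant
if assoc.is_established:
    status = assoc.send_c_store(ds)            # (c)-(f) C-STORE request, transfer, acknowledgment
    print("C-STORE status:", status.Status if status else "no response")
    assoc.release()                            # (g)/(h) drop the association
```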
FIGURE 2 DICOM information object. A DICOM image is an information object consisting of multiple data elements. Each data element is uniquely identified by its corresponding tag composed of a group number and an element number. Pixel data of the image is stored in element 0010 within group 7FE0.

5 Archive Software Components
The software implemented in an archive server controls the archival, retrieval, and distribution of medical images for the archive system. In the archive server, processes of diverse functions run independently and communicate simultaneously
TABLE 2  Composite DICOM message service elements (DIMSE-C)

DIMSE-C service    Operation
C-ECHO             Verification of communication between two peer application entities (AEs)
C-STORE            Transmission of information objects from one AE to another
C-FIND             Querying information about the information objects
C-GET              Retrieval of stored information objects from another AE using the C-STORE operation
C-MOVE             Instructing another AE to transfer stored information objects to a third-party AE using the C-STORE operation
with other processes by using client/server programming, queuing mechanisms, and job prioritizing mechanisms. Figure 4 illustrates the interprocess communication among the major processes running on the archive server, and Table 4 describes the functions of these individual processes. The major tasks performed by the archive server include image receiving, image routing, image stacking, image archiving, database updating, image retrieving, and image pre-fetching. This section and Section 6.2 (Prefetch Mechanism) describe these individual tasks.
5.1 Image Receiving

Images acquired by the acquisition computers from various medical imaging devices are transmitted over the communication network to the archive server using the standard DICOM storage service class via TCP/IP (transmission control protocol/Internet protocol) network protocols. The storage service class is based on client/server applications, in which an acquisition computer (client) serves as a DICOM service class user (SCU) transmitting images to a service class provider (SCP), the archive server. Like most client/server applications, the archive server supports concurrent connections to receive images from multiple acquisition computers.

5.2 Image Routing

Images arriving in the archive server from various acquisition computers are immediately routed to their destination display workstations for instantaneous display. The routing process is a DICOM storage service class, in which the archive server takes on the role of SCU to transmit images to the display workstations (SCP). The routing algorithm is driven by predefined parameters such as examination type, patient type, location of display workstation, section radiologist, and referring physician. These parameters are stored in a routing table managed by the archive database. The routing process performs a table lookup for each individual image based on the Health Level Seven (HL7) message received via the Hospital Information System/Radiology Information System (HIS/RIS) interface (Section 6). Results from the table lookup determine the destination(s) to which an image should be sent.
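A minimal illustration of such a routing table lookup is sketched below. The table contents, field names, and workstation names are hypothetical, and a production implementation would query the routing table in the archive database rather than an in-memory list.

```python
# Hypothetical routing-table lookup: map an incoming image's HL7-derived
# attributes to the display workstations that should receive it.
ROUTING_TABLE = [
    # (exam_type, patient_type, destinations)
    ("CHEST",  "INPATIENT",  ["icu_ws1", "chest_reading_ws"]),
    ("CHEST",  "OUTPATIENT", ["chest_reading_ws"]),
    ("PEDS_*", "ANY",        ["peds_ws1", "peds_ws2"]),
]

def route(exam_type, patient_type):
    destinations = set()
    for table_exam, table_patient, dests in ROUTING_TABLE:
        exam_ok = table_exam == exam_type or (
            table_exam.endswith("*") and exam_type.startswith(table_exam[:-1]))
        patient_ok = table_patient in ("ANY", patient_type)
        if exam_ok and patient_ok:
            destinations.update(dests)
    return sorted(destinations)

print(route("CHEST", "INPATIENT"))   # -> ['chest_reading_ws', 'icu_ws1']
```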
5.3 Image Stacking

Image stacking is a data caching mechanism that stores images temporarily in high-speed storage devices such as magnetic disks and RAIDs for fast retrieval. Images received in the archive server from various acquisition computers are stacked in the archive server's cache storage (magnetic disks or RAID) to allow immediate access. After being successfully archived to the long-term storage, these images remain in the cache storage and are managed by the archive server on a per-patient, per-hospital-stay basis. In this way, all recent images that are not already on a display workstation's local disks can be retrieved from the archive server's cache storage instead of the low-speed long-term storage device such as an optical disk library or a DLT tape library. This timely access to images is particularly convenient for physicians or radiologists retrieving images at a display workstation located in a different radiology section or department.
TABLE 3  Normalized DICOM message service elements (DIMSE-N)

DIMSE-N service    Operation
N-EVENT-REPORT     Reporting an event to a peer AE
N-GET              Retrieval of attribute values from another AE
N-SET              Requesting another AE to modify attribute values
N-ACTION           Requesting another AE to perform an action on its managed DIMSE service
N-CREATE           Requesting another AE to create a new managed DIMSE service
N-DELETE           Requesting another AE to delete a managed DIMSE service
FIGURE 3 DICOM storage service class applied to a PACS image acquisition process. An acquisition computer acts as a storage SCU to transmit images to the archive server (storage SCP).
FIGURE 4 Diagram illustrating interprocess communication among the major processes running on the archive server. The symbols are defined in Table 4.
TABLE 4  Processes in the archive server

Process      Description
arch         Copy images from cache storage to long-term storage; update archive database; notify the … and … processes for successful archiving
arch_ack     Acknowledge acquisition computers for successful archiving; the … process at the acquisition computers deletes images from local storage
image_mgr    Process image information; update archive database; notify the … and … processes
pre_fetch    Select historical images from archive database; notify the … process
dcm_recv     Receive images from acquisition computers; notify the … process (dcm_recv is a DICOM storage SCP)
adt_gw       Receive ADT messages from HIS or RIS; notify the … process
retrv        Retrieve images from cache or long-term storage; notify the … process
dcm_send     Send images to destination display workstations (dcm_send is a DICOM storage SCU)
cache_mgr    Manage cache storage of the archive server
qr_server    Handle query and retrieve requests from the … process at the display workstations (qr_server is a DICOM query/retrieve SCP)
5.4 Image Archiving

Images received in the archive server from various acquisition computers are copied from the archive server's cache storage to the long-term storage device, such as an optical disk library or a DLT tape library, for permanent archiving. These images are stored as standard DICOM image files (see Section 7.1), which can be accessed by display workstations on the PACS communication network.
5.5 Database Updating

Information extracted from the data elements of the images received by the archive server is inserted into the archive database. These data are categorized and stored in different predefined tables, with each table describing only one kind of entity. For example, the patient description table consists of master patient records, which store patient demographics, examination worklists, and correlated studies; the archive index table consists of archive records for individual images; and the study description table consists of study records describing individual medical imaging procedures. These tables provide the information necessary for the archive server to perform individual tasks such as file indexing, query and retrieve key search, image routing, and image prefetching.
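The flavor of these predefined tables can be conveyed with a small relational sketch. The schema below, written against Python's built-in sqlite3 module, is illustrative only; the table and column names are invented for the example and do not reproduce the schema of any particular PACS.

```python
# Illustrative archive-database tables: one row per patient, study, and image.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE patient       (patient_id TEXT PRIMARY KEY, name TEXT, birth_date TEXT);
CREATE TABLE study         (study_uid TEXT PRIMARY KEY, patient_id TEXT, study_date TEXT,
                            modality TEXT, description TEXT);
CREATE TABLE archive_index (sop_uid TEXT PRIMARY KEY, study_uid TEXT,
                            series_uid TEXT, file_path TEXT);   -- file index, not pixel data
""")
db.execute("INSERT INTO patient VALUES ('P0001', 'DOE^JOHN', '19501231')")
db.execute("INSERT INTO study VALUES ('1.2.3.4', 'P0001', '19940107', 'CT', 'CHEST')")
db.execute("INSERT INTO archive_index VALUES ('1.2.3.4.5', '1.2.3.4', '1.2.3.4.1', '/archive/vol3/0001.dcm')")

# Query/retrieve key search and routing use lookups like this one:
print(db.execute("SELECT file_path FROM archive_index WHERE study_uid = '1.2.3.4'").fetchall())
```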
5.6 Image Retrieving

Image retrieval takes place at the display workstations that are connected to the archive system through the communication network. Retrieval requests are issued to the archive server by a display workstation using the DICOM query/retrieve (Q/R) service class via TCP/IP network protocols on a client/server basis. In the retrieval operation, the archive server serves as a Q/R SCP, processing the query and retrieval requests it receives from the display workstations (Q/R SCU). Requested images are retrieved from the storage subsystem by the archive server and distributed to the destination display workstation with use of the DICOM storage service class.

6 HIS/RIS Interfacing and Image Prefetching

A PACS for use in clinical practice cannot operate successfully without an interface to HIS and RIS. This section describes the HIS/RIS interface and the prefetch mechanism performed by an archive system in PACS with the use of this interface.

6.1 Health Level Seven (HL7) Communication Standard

Health Level Seven (HL7) is an industry standard data interface widely used in many healthcare information systems (e.g., HIS, RIS, PACS) for the exchange of textual information [11]. Information exchange among these heterogeneous computer systems takes place over a communication network with the use of TCP/IP protocols on a client/server basis. By utilizing the HL7 interface, messages, or events, such as patient demographics, ADT (admission, discharge, and transfer), examination scheduling, examination description, and diagnostic reports can be acquired by a PACS from HIS or RIS via a hospital-integrated communication network. Information extracted from these messages can be used by the PACS archive subsystem to perform specific tasks such as the routing (Section 5.2) and stacking (Section 5.3) mechanisms, and the image prefetch mechanism (Section 6.2).
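HL7 version 2 messages are plain text: segments are separated by carriage returns, fields by "|", and components by "^". The sketch below parses a simplified, hypothetical ADT message to extract the fields a routing or prefetch process would use; real messages carry many more segments and fields.

```python
# Parse a simplified, hypothetical HL7 v2 ADT message (field positions per HL7 2.3).
sample_adt = "\r".join([
    "MSH|^~\\&|HIS|HOSP|PACS|RAD|19940107083000||ADT^A01|MSG0001|P|2.3",
    "PID|1||P0001||DOE^JOHN||19501231|M",
    "PV1|1|I|WARD3^ROOM12^BED1",
])

def parse_adt(message):
    segments = {}
    for line in message.split("\r"):
        fields = line.split("|")
        segments[fields[0]] = fields
    pid = segments["PID"]
    return {
        "event":      segments["MSH"][8],      # e.g., ADT^A01 (admission)
        "patient_id": pid[3],                  # PID-3, patient identifier
        "name":       pid[5].replace("^", " ") # PID-5, patient name
    }

print(parse_adt(sample_adt))
# -> {'event': 'ADT^A01', 'patient_id': 'P0001', 'name': 'DOE JOHN'}
```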
6.2 Prefetch Mechanism

A PACS operated in a clinical environment requires fast access to patients' previous and current images in support of online diagnosis. The time delay in retrieving historical images from a low-speed long-term archive during a clinical review session will therefore discourage the use of the PACS by the physicians. The implementation of a prefetch mechanism can eliminate such online retrieval delays, thereby increasing the acceptance of a PACS for clinical use.

The prefetch mechanism is triggered when a PACS archive server detects the arrival of a patient via the ADT message from
HIS or RIS. Selected historical images and relevant diagnostic reports are retrieved from the long-term storage and the archive database. These data are then distributed to the destination display workstation(s) prior to the completion of the patient's current examination. The algorithm of the prefetch mechanism is based on a table lookup process driven by some predefined parameters such as examination type, disease category, location of display workstation, section radiologist, referring physician, and the number and age of the patient's archived images. These parameters are stored in a prefetch table managed by the archive database and determine which historical images should be retrieved.

There are several factors affecting the operation of a prefetch mechanism:

(a) Storage space of a display workstation capable of stacking the prefetched historical images
(b) Capability of the storage devices in the archive system to handle multiple concurrent retrieval operations
(c) Network capacity to support concurrent transmission of the high-volume images
(d) Capability of a display workstation to receive real-time prefetched historical images with minimal interference to the end users during a clinical review session

All these factors should be taken into consideration when a prefetch mechanism is implemented.
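A toy version of the prefetch lookup is sketched below: given the examination type announced by the ADT message, a hypothetical prefetch table selects which classes of historical studies to stage from long-term storage and how far back to look. The rules and limits are invented for illustration.

```python
# Hypothetical prefetch table: which historical studies to stage for a scheduled exam.
PREFETCH_TABLE = {
    # scheduled exam type -> (relevant prior exam types, max age in years, max studies)
    "CT CHEST":  (["CT CHEST", "CR CHEST"], 3, 4),
    "CR CHEST":  (["CR CHEST"], 1, 2),
    "MRI BRAIN": (["MRI BRAIN", "CT HEAD"], 5, 4),
}

def select_priors(scheduled_exam, prior_studies, current_year=1994):
    """prior_studies: list of (exam_type, year), newest first."""
    rule = PREFETCH_TABLE.get(scheduled_exam)
    if rule is None:
        return []
    relevant, max_age, max_count = rule
    picked = [s for s in prior_studies
              if s[0] in relevant and current_year - s[1] <= max_age]
    return picked[:max_count]

history = [("CR CHEST", 1993), ("CT CHEST", 1991), ("CT HEAD", 1989)]
print(select_priors("CT CHEST", history))   # -> [('CR CHEST', 1993), ('CT CHEST', 1991)]
```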
7 DICOM Image Archive Standard

This section describes the archive and retrieval of medical images performed by an archive system using the Q/R service class specified by the DICOM standard.
7.1 Image File Format

Medical images are archived to the storage media as DICOM-formatted files. A DICOM file consists of a file meta-information header and an image information object. The file meta-information header is composed of a 128-byte file preamble, a 4-byte DICOM prefix ("DICM"), and the file meta elements (DICOM group 0002). This file meta-information header contains information that identifies the encapsulated DICOM image data set (Fig. 5).

An archive server receives DICOM images as information objects from the acquisition computers. These images are first encapsulated with the corresponding meta-information headers, forming the DICOM files, and then archived. When these archived image files are retrieved from the storage device, only the encapsulated image information object is extracted from the files and transmitted to the destinations as information objects.
FIGURE 5 DICOM file format. A DICOM file consists of a file meta-information header and an information object (image data set). The file meta-information header is made of a file preamble, a DICOM prefix, and multiple file meta elements.
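The file layout of Fig. 5 is easy to verify directly: the first 132 bytes of a DICOM file are the 128-byte preamble followed by the four characters "DICM", after which the group 0002 meta elements begin. A minimal check in Python (the file path is a placeholder):

```python
# Check the DICOM file meta-information header: 128-byte preamble + "DICM" prefix.
def is_dicom_file(path):
    with open(path, "rb") as f:
        header = f.read(132)
    return len(header) == 132 and header[128:132] == b"DICM"

print(is_dicom_file("/archive/vol3/0001.dcm"))   # placeholder path
```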
7.2 Query/Retrieve Service Class Operation

A server process running on an archive server controls query and retrieval of the medical images stored in the archive system. This server process takes on the SCP role of the Q/R service class to communicate with the display workstations (Q/R SCU), allowing the latter to query and retrieve images from the archive database and the storage subsystem, respectively. The Q/R service class is based on the following DIMSE services: C-FIND, C-GET, C-MOVE, and C-STORE. These DIMSE services are described next.

The C-FIND service is invoked by a Q/R SCU to match a series of attribute values, or the search keys (i.e., patient name, examination date, modality type), that the Q/R SCU supplies. The Q/R SCP returns for each match a list of requested attributes and their values. The C-GET service is invoked by a Q/R SCU to retrieve an information object (i.e., a DICOM image) from a Q/R SCP and transmit the object to the invoking Q/R SCU, based upon the attributes supplied by the latter. The C-MOVE service is invoked by a Q/R SCU to retrieve an information object (i.e., a DICOM image) from a Q/R SCP and transmit the object to a third-party DIMSE application (i.e., a storage SCP), based upon the attributes supplied by the invoking Q/R SCU. The C-STORE service is invoked by a storage SCU to request the storage of an information object (i.e., a DICOM image) in the storage SCP computer system. The storage SCP receives the information object from the invoking SCU and stores the object in its storage subsystem.

The following is an example describing how DICOM uses the aforementioned DIMSE services to carry out a Q/R service class operation. Suppose a radiologist at a display workstation queries the image archive system to retrieve a historical MRI examination
FIGURE 6 DICOM query/retrieve operation. The archive server acts as Q/R SCP and storage SCU, whereas the display workstation serves as Q/R SCU and storage SCP. The C-STORE DIMSE service is a suboperation of the C-MOVE DIMSE service in a DICOM query/retrieve operation.
to compare with a current study that is available in the display workstation. To perform this Q/R operation, three DIMSE services, C-FIND, C-MOVE, and C-STORE, are involved. The archive server serves as a Q/R SCP and a storage SCU, whereas the display workstation serves as a Q/R SCU and a storage SCP. The following procedures take place in order to complete the operation (Fig. 6):

(a) The Q/R client process (Q/R SCU) at the display workstation requests the Q/R server process (Q/R SCP) at the archive server to establish an association.
(b) The Q/R server process grants the association.
(c) The Q/R client process issues a C-FIND request to query historical examinations belonging to a given patient.
(d) The Q/R server process returns a list of examinations that match the attribute values supplied by the Q/R client process.
(e) A radiologist at the display workstation selects interesting images from the examination list and issues a C-MOVE service request.
(f) The Q/R server process retrieves the requested images from the storage devices and requests the archive server's
C-STORE client process (storage SCU) to transmit the images to the display workstation's C-STORE server process (storage SCP).
(g) The C-STORE client process requests the C-STORE server process to establish an association, waits for the association to be granted, and transmits the images to the C-STORE server process.
(h) Upon successful transmission of the images, the C-STORE client process terminates the storage service class association.
(i) The C-MOVE client process terminates the Q/R service class association.
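The attribute matching performed at the C-FIND step can be pictured as filtering the archive database with the keys supplied by the Q/R SCU. The sketch below illustrates that matching against an in-memory study list; the records and key names are hypothetical, and the real service operates on DICOM identifiers rather than Python dictionaries.

```python
# Illustration of C-FIND style key matching at the Study level.
STUDY_TABLE = [
    {"PatientName": "DOE^JOHN", "StudyDate": "19930211", "Modality": "MR", "StudyUID": "1.2.3.1"},
    {"PatientName": "DOE^JOHN", "StudyDate": "19940107", "Modality": "CT", "StudyUID": "1.2.3.2"},
    {"PatientName": "ROE^JANE", "StudyDate": "19940105", "Modality": "MR", "StudyUID": "1.2.3.3"},
]

def c_find(identifier):
    """Return studies whose attributes match every non-empty key in the identifier."""
    matches = []
    for record in STUDY_TABLE:
        if all(record.get(key) == value for key, value in identifier.items() if value):
            matches.append(record)
    return matches

# Query: all MR studies of patient DOE^JOHN (empty values act as wildcards).
print(c_find({"PatientName": "DOE^JOHN", "Modality": "MR", "StudyDate": ""}))
```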
7.3 Q/R Service Class Support Levels

The Q/R server process at the archive server takes on the SCP role of the Q/R service class by processing the query and retrieval requests based on the information about the attributes defined in a Q/R information model that the image archive system supports. This Q/R information model may be a standard Q/R information model defined in DICOM, or a private Q/R information model defined in the conformance statement of the implemented archive system. There are three hierarchical Q/R information models defined by DICOM: the patient root, study root, and patient/study only models (Table 5). The following subsections describe these models.

Patient Root Q/R Information Model

The patient root Q/R information model is based upon a four-level hierarchy: Patient, Study, Series, and Image. The Patient level is the top level and contains attributes associated with the patient information entity (IE) of the corresponding image's information object definitions (IODs). Below the Patient level is the Study level, which contains attributes associated with the study IE of the corresponding image's IODs. Below the Study level is the Series level, which contains attributes associated with the series IE of the corresponding image's IODs. The lowest level is the Image level, which contains attributes associated with the image IODs.

In each level of the patient root Q/R information model, one attribute of the IE is defined as a unique key attribute that provides access to the associated IE. A Q/R SCU (i.e., a Q/R
TABLE 5  Q/R service class information models and their support levels

Q/R information model    Hierarchy of Q/R levels
Patient root             Patient, Study, Series, Images
Study root               Study, Series, Images
Patient/study only       Patient, Study
TABLE 6  Key attributes commonly used in Q/R service class operations

Patient name, Patient ID, Study instance UID, Study ID, Study date, Study time, Accession number, Series instance UID, Series ID, Modality, SOP instance UID, Image number

UID, unique identifier; SOP, service–object pair.
client process at a display workstation), therefore, can perform a hierarchical query and retrieve operation to obtain a desired image or set of images within any level of the information model. A list of key attributes, or search keys, that are commonly used in a hierarchical searching algorithm supporting the Q/R service class operations is given in Table 6.

Study Root Q/R Information Model

The study root Q/R information model is similar to the patient root Q/R information model, except that the top level of the hierarchy is the Study level (see Table 5).

Patient/Study Only Q/R Information Model

The patient/study only Q/R information model is similar to the patient root Q/R information model, except that it does not support the Series and Image levels (see Table 5).
8 PACS Research Applications

Image processing applied to medical research has made many clinical diagnosis protocols and treatment plans more efficient and accurate. For example, a sophisticated nodule detection algorithm applied to digital mammogram images can aid in the early detection of breast cancer. However, image processing applications usually require significant implementation and evaluation effort before they can be accepted for clinical use. The common necessities during the implementation and evaluation of these applications are image data and workstations that allow the display and manipulation of the images. For this purpose, PACS can serve as a powerful tool that
provides (a) numerous sample images of statistical significance for testing and debugging the image processing algorithm, and (b) display workstations with built-in image manipulation functions in support of clinical evaluation.
9 Summary

PACS provide a means to acquire, store, and manage medical images that are produced by a wide variety of imaging equipment. These images can be used for clinical review, diagnosis, and medical imaging related research. The introduction of DICOM provides an industry standard for interconnecting multivendor medical imaging equipment and PACS components, allowing communication of images and exchange of information among these individual computer systems. Digital image archiving provides online access to patients' historical images, which facilitates the clinical practice of radiologists and physicians. The archived images can also be used as a resource providing enormous image data in support of medical imaging related research.

When an image archive system is placed in a clinical environment supporting a PACS, reliability and timely access to images become dominant factors in determining whether the archive system operates satisfactorily. The sophisticated image management software implemented in the archive system therefore plays an important role in providing reliable yet efficient operations that support the archive, retrieval, and distribution of images for the entire PACS. Several aspects must be considered when implementing a medical image archive system to support a PACS for clinical use:

(a) Data integrity, which promises no loss of medical images or their relevant data
(b) System efficiency, which provides timely access to both current and historical images
(c) Scalability for future expansion of the archive system
(d) System availability, which provides nearly 100% system uptime for the archive, retrieval, and distribution of images within the entire PACS
References

1. Wong A, Huang HK, "Integrated DICOM-based image archive system for PACS," 205(P):615, 1997.
2. Wong A, Huang HK, Arenson RL, Lee JK, "Digital archive system for radiologic images," 14:1119–1126, 1994.
3. Wong A, Taira RK, Huang HK, "Digital archive center: Implementation for a radiology department," 159:1101–1105, 1992.
4. Wong A, Huang HK, "High-performance image storage and communications with fibre channel technology for PACS," 3662:163–170, 1999.
5. Lou SL, Hoogstrate RD, Huang HK, "An automated PACS image acquisition and recovery scheme for image integrity based on the DICOM standard," 21(4):209–218, 1997.
6. Lou SL, Wang J, Moskowitz M, Bazzill T, Huang HK, "Methods of automatically acquiring images from digital medical systems," 19(4):369–376, 1995.
7. Lou SL, Huang HK, Arenson RL, "Workstation design: Image manipulation, image set handling, and display issues," 34(3):525–544, 1996.
8. Wong A, Huang HK, Arenson RL, "Adaptation of DICOM to an operational PACS," 3035:153–158, 1997.
9. Wong A, Huang HK, Lee JK, Bazzill TM, Zhu X, "High-performance image communication network with asynchronous mode technology," 2711:44–52, 1996.
10. NEMA Standards Publication, "Digital imaging and communications in medicine (DICOM)," National Electrical Manufacturers Association, 1999.
11. Health Level Seven: An application protocol for electronic data exchange in healthcare environments (version 2.3). Ann Arbor, MI: Health Level Seven, 1996.
48
Image Standardization in PACS

Ewa Pietka
Silesian University of Technology

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 783
2 Background Removal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 784
  2.1 Collimator-Caused Background • 2.2 Removal of Background Outside Patient Boundary
3 Improvement of Visual Perception. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 788
  3.1 Window/Level Correction • 3.2 Histogram Modification
4 Image Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789
  4.1 Detection of the Rotation Angle • 4.2 Image Rotation
5 On the Accuracy of Quantitative Measurements in Image Intensifier Systems . . . . . 793
  5.1 Geometrical Description of Pincushion Distortion • 5.2 Surface Area Correction • 5.3 Clinical Implementation
6 Implementation of Image Standardization Functions in HI-PACS . . . . . . . . . . . . . . 797
  6.1 Industrial Standard for Image Format and Communication • 6.2 Image Content Standardization Functions
7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 799
  References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 800
Hospital Integrated Picture Archiving and Communication Systems (HI-PACS) have become multivendor and multiuser medical systems. Generally, there are two levels of standardization to be considered (Fig. 1). One level is required by the multivendor equipment implemented in PACS (i.e., acquisition stations, workstations, archive library). As a result of integration, PACS also receive data from other information systems working in the hospital (i.e., the Radiological Information System (RIS), the Hospital Information System (HIS), or the Digital Voice System (DVS)). Information systems deliver data to be viewed and analyzed at multivendor workstations. The second level is required by the human subjects who address the system according to their professional duties. Among them are radiologists, clinicians, technologists, and admission service managers. The format and content of the data varies and may include diagnostic images, medical reports, comments, or administrative reports. Thus, both levels of standardization have to be considered to make the entire system more effective in the clinical environment. The first level of standardization is required to integrate all pieces of equipment through industrial data and image standards capable of handling information access and intersystem communication needs. The second level of
standardization facilitates the medical requirements and use of the data. In this approach we concentrate on adjustment of the image content to make images more readable and of better quality in preparation for medical diagnosis. This step also makes the advanced image processing phase easier and permits some preprocessing steps to be left out of the development of the methodology that leads to computer-aided diagnosis. Although this group of standardization procedures is related to the image information itself, without referring to image format or intersystem communication, standardization functions are integrated with clinical PACS and installed at various sites of the system.

The first two sections of the chapter discuss improvement of X-ray image quality by the removal of image background caused by blocking of the collimator as well as background outside the patient boundary. Then, selected histogram modification techniques are shown for adjustment of the anatomical structures under consideration. Section 4 discusses the image orientation problem and its correction. Since acquisition stations linked to PACS also deliver images mapped on image intensifier systems, Section 5 discusses the most important sources of distortions in quantitative measurements. Section 6 briefly discusses the industrial standard (yet the reader is asked to refer to other sources for details) and then gives some outlines of how to implement image standardization functions in clinical PACS.
FIGURE 1 PACS integration into a hospital-wide multivendor and multiuser medical system.

2 Background Removal

Various definitions of background have been introduced already. Usually it is described as an area of no importance attached to a region that is to be enhanced; furthermore, it very often affects the visual image quality. In image standardization, three different areas may be referred to as background. First consideration is given to the area outside the radiation field, caused by blocking of the collimator and resulting in white borders surrounding the radiation field. The other two types of background areas are located within the radiation field. Depending on the following image analysis steps and the expected result, the area referred to as background may change from one phase of analysis to another. The first phase of the image analysis (often a part of the preprocessing stage) usually concentrates on detection of the patient contour and its orientation correction. At this point the area outside the patient boundary is referred to as background and set to zero. Later phases of the image analysis, including segmentation and region of interest (ROI) extraction, may consider areas inside the patient boundary as background. For example, bone analysis in orthopedics or phalangeal and epiphyseal analysis in bone age assessment leads to considering soft tissue as background with respect to bony structures. Chest analysis also may require arms, mediastinum, and subdiaphragm to be referred to as background. The goal of many image processing functions is to suppress the background and/or enhance the foreground (i.e., diagnostically important regions) in order to increase the signal-to-noise ratio and/or extract features of high discrimination power.

This section deals only with background to be removed by image standardization (preprocessing) functions. Thus, removal of collimator-caused background and areas outside the patient contour are discussed. In the following sections two different regions are defined and later referred to as background. In Section 2.1 a region outside the radiation field caused by blocking of the collimator is called background, whereas in Section 2.2 an area outside the patient's boundary (within the radiation field) is referred to as background.

2.1 Collimator-Caused Background

Glare in Image Diagnosis
Optimization of the amount of light exposing the viewer during the image reading procedure significantly affects the diagnostic performance, which is a result of information processing based on properties of the human visual system. Several sources of extraneous light may be pointed out, beginning with an increase in light reflection from the surroundings, which in turn increases the reflection of light from the radiograph itself. This phenomenon influences the perception of details in the radiograph. Then, an increase of the background level changes the adaptation level of the photoreceptors in the retina, shifting the visual sensitivity threshold. This process decreases the photoreceptors' response to a given stimulus increment and causes reduced contrast discriminability. Involuntary eye movements have also been implicated as a source of decreased visual sensitivity. Finally, transparent areas within the field of view resulting in excessive light hitting the viewer's eyes increase glare, which causes two principal effects. First, the glare results in significant eyestrain and discomfort. Second, it decreases visual contrast sensitivity, which is proportional to the Weber ratio WR,

$$WR = \frac{BB - RB}{BB}, \qquad (1)$$
where BB is the background brightness and RB is the region of interest brightness.

Glare also causes problems in digital projection radiography. Two conditions create glare in the soft-copy display: first, the surrounding light reflected from the display screen, and second, the light emitted by the screen, which results from parts of the image as well as from its periphery. Extraneous surrounding light is limited by providing low ambient light in reading rooms. This reduces the reflected light to a level that does not shift the visual adaptation levels or alter perception to any appreciable degree. The second source, transparent background in the periphery, can be reduced by implementing software functions (as described later) to black out the borders. In return, this increases the visual quality of the image.

Removal of the Collimator-Caused Background

Whenever a collimator appears in the radiation field, transparent borders are generated within the field of view (Fig. 2). Their removal reduces the amount of unwanted light in the images during the soft-copy display as well as the almost transparent borders on the film. Moreover, the removed
FIGURE 2 CR image with an unexposed background (white frame surrounding the image). (Courtesy of J. Zhang.)
background, without delivering any pertinent information, adversely affects observer performance. Two major advantages are gained by implementing background removal in clinical PACS. First, it immediately provides lossless data compression, an important cost-effective parameter in image archive and communication. On display stations a more representative lookup table (LUT), pertinent only to the range of gray scales in the diagnostically important part of the image, can be assigned. It also shortens the response time and permits a full-range image display on smaller monitors. Second, background removal already performed reduces effort when designing further image standardization steps as well as computer-aided image diagnosis for a certain type of image. It will improve the subsequent automatic image analysis, which can then concentrate on the diagnostically important information with no background rather than the combination of both types of information.

Background removal is a challenging procedure to be incorporated in clinical PACS and performed automatically on all acquired images before sending them to the archive station. In case of failure, the software procedure may irreversibly destroy diagnostically important parts of the image (e.g., turn part of the lung area black). Only a repetition of the radiological procedure is able to undo this failure. All developed procedures decrease the accuracy of full background removal in order to ensure that no diagnostically valid part of the image is destroyed. The accuracy of developed methods ranges from 42 to 91% for full background removal and from 86 to 99% for full and partial background removal.

One of the approaches with high accuracy of background removal has been developed by Zhang. Images may include pure background that does not overlap with any external objects or dense anatomical parts (Fig. 2), prostheses (Fig. 3a), markers projected onto the background due to insufficient thickness of the collimator, anatomical structures visible on the background due to high sensitivity of the image plate (Fig. 4a), and odd-shaped abdominal collimators. The first step of the algorithm analyzes the intensity distribution of the CR image background and determines the probability that a pixel belongs to the image background. The estimation is based on a background characteristic found in 1000 sectors marked on 50 randomly selected clinical radiographs. Within each sector a relationship between the average intensities of consecutive pixels yields a parameter called a background score. Assignment of each sector to background or diagnostic field permits the relation between background score and background probability to be defined.

Recognition of image background is defined by Zhang as the location of background edges. A gradient method is used in which the differentiation of image $I$ yields a vector $V(x, y)$ defined as

$$V(x, y) = r(x, y)\, e^{i\theta(x, y)}, \qquad (2)$$

where

$$r(x, y) = \left[\left(\partial I/\partial x\right)^2 + \left(\partial I/\partial y\right)^2\right]^{1/2} \quad \text{and} \quad \theta(x, y) = \tan^{-1}\!\left[\frac{\partial I/\partial y}{\partial I/\partial x}\right]. \qquad (3)$$

A set of all $V(x, y)$ forms a gradient image. A pixel is selected when its $r(x, y)$ exceeds an empirically determined threshold and the pixel itself is located close to the image border. The selection of pixels to be considered in further analysis is based on the background probability described earlier. Then, each pixel is subjected to the following background edge condition: if the difference between the average intensity of a pixel and its neighbor toward the image center is greater than that of the pixel and its neighbor toward the image edge, then this pixel is assumed to be on the edge. In order to eliminate the failure of background removal, in which the diagnostically important part of the image would be erased, two additional conditions are imposed. First, pixels of low score are excluded. The second condition is based on the assumption that collimator edges are straight lines. Thus, the angle distribution curve of background pixels has four maxima. The locations of these peaks correspond to the four edges.
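The gradient image of Eqs. (2) and (3) is straightforward to compute numerically. The NumPy sketch below obtains r(x, y) and θ(x, y) for a synthetic test image and keeps only strong-gradient pixels near the image border, in the spirit of the selection rule just described; the threshold and border width are arbitrary illustrative values, not those used by Zhang.

```python
# Gradient magnitude r(x, y) and angle theta(x, y) of Eqs. (2)-(3), using NumPy.
import numpy as np

def gradient_image(image):
    dIdy, dIdx = np.gradient(image.astype(float))    # partial derivatives of I
    r = np.hypot(dIdx, dIdy)                          # [(dI/dx)^2 + (dI/dy)^2]^(1/2)
    theta = np.arctan2(dIdy, dIdx)                    # tan^-1[(dI/dy)/(dI/dx)]
    return r, theta

def candidate_edge_pixels(image, r_threshold=50.0, border=64):
    """Strong-gradient pixels lying within `border` pixels of the image edge."""
    r, _ = gradient_image(image)
    near_border = np.zeros_like(r, dtype=bool)
    near_border[:border, :] = near_border[-border:, :] = True
    near_border[:, :border] = near_border[:, -border:] = True
    return (r > r_threshold) & near_border

test = np.zeros((256, 256))
test[32:224, 32:224] = 1000.0             # bright "radiation field" on a dark frame
print(candidate_edge_pixels(test).sum())  # number of candidate background-edge pixels
```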
FIGURE 3 CR image with an unexposed background. (a) Original image with a prosthetic device in the left arm; (b) background-removed image. (Courtesy of J. Zhang.)
Pixels with gradient values that fall within the half-width of the peak location are used to fit lines to be considered as background edges. The contribution of each pixel to the line fit is weighted differently, with the weights defined as
$$w = A\, e^{-k\,\Delta\theta}, \qquad (4)$$
FIGURE 4 CR image with an unexposed background. (a) Original image with anatomical structures visible in the background; (b) background-removed image with enhancement of the diagnostic field. (Courtesy of J. Zhang.)
where A and k are positive constants, and Δθ is the difference between the current pixel gradient angle and the angle at the maximum of the angle distribution curve. A final, yet very important, step in background removal is
the estimation of reliability. The goodness of fit of the already selected points is assessed by applying the chi-square fitting technique. The second step is based on a comparison of two histograms: one obtained from the original image, the second from the image with no background. If the histogram within the diagnostically important part of the image remains unchanged, the background removal is accepted; otherwise, the removal is ignored. Performance of the function is shown in Figs. 3 and 4.

Removal of collimator-caused background is a modality-dependent as well as anatomy-dependent procedure. It is performed only for computed radiography (CR) images and digitized images. However, not all anatomies require its implementation. It is performed in pediatric radiology (for all anatomies) and in limb, hand, and head imaging (for adults). Collimator-caused background should not appear in adult chest and pelvis radiograms. Since only the background (none of the anatomical structures) is subjected to the analysis, one procedure can handle all anatomies.
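The histogram-comparison acceptance test can be stated in a few lines; the sketch below is a simplified reading of it, and the tolerance used is an arbitrary placeholder.

```python
# Accept background removal only if the histogram of the diagnostic field is unchanged.
import numpy as np

def removal_accepted(original, removed, field_mask, tolerance=0.01):
    """field_mask selects the diagnostically important region (True inside)."""
    h_orig, _ = np.histogram(original[field_mask], bins=256, range=(0, 4096))
    h_new, _ = np.histogram(removed[field_mask], bins=256, range=(0, 4096))
    changed_fraction = np.abs(h_orig - h_new).sum() / max(h_orig.sum(), 1)
    return changed_fraction <= tolerance
```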
2.2 Removal of Background Outside Patient Boundary

In this section the term background refers to the image area outside the patient boundary, yet within the radiation field. In this area landmarks or labels with patient demographic data (name, birthdate, ID number, etc.) also may be found. It seems to be the most intuitively accepted and most often used definition of background in image processing systems. Selected approaches to the problem of background analysis, separation from the anatomical boundary, and removal are discussed later. In most cases the procedure is anatomy-dependent and has to be developed separately for each anatomical structure. Mostly it is applied to anatomical structures clearly separated from the background. As examples, CR hand and limb images or mammograms and sectional images may be subjected to such functions. Background removal not only serves image viewing or image preprocessing functions, but is often used as a lossless compression method in order to decrease the size of an archived or transmitted image. Various methods have already been presented, yet none of them yields results satisfying all anatomical structures. Two approaches are suggested in this section. One is based on a histogram analysis, and another uses a dynamic thresholding technique, where the threshold is found by a statistical analysis of the background.

Histogram Approach

An image histogram is a gray-scale value distribution showing the frequency of occurrence of each gray-level value. For an image of size 1024 × 1024 × 8 bits, the abscissa ranges from 0 to 255; the total number of pixels is equal to 1024 × 1024.
Modification of original histograms is very often used in image enhancement procedures. The histogram analysis is based on an assumption that the gray-scale values of the foreground (anatomical structures) and the background (outside the patient boundary) are distinguishable (Fig. 5a). This results in two peaks appearing on the histogram (Fig. 5b). Those peaks usually overlap, yet a minimum in between can be detected in order to separate both objects. After smoothing of the histogram, the threshold value can be determined either by locating the local minimum and maximum, or with statistical methods. This separates the foreground (white region in Fig. 5c) from the background (black region in Fig. 5c). This approach fails in cases of nonuniformity of the background. This very rough assessment of the threshold very often cuts off some parts of the anatomical structures, particularly the border areas between background and foreground. Parts of soft tissue have been cut in Fig. 5c.

Dynamic Thresholding

In the dynamic thresholding approach the threshold is adjusted dynamically and depends on the current background value. In the first stage, a window of fixed size is scanned in the vertical direction and statistical parameters such as mean, variance, maximum gradient, and maximum value are computed. The four windows of lowest variance located in the image corners become candidates for a background area. Then, the window with the highest mean value placed in the central part of the image is referred to as an area within the anatomical structure. Because of various landmarks and labels placed in the image periphery, the search for the highest mean value window is limited to the central part of the image. The ratio of the highest and lowest mean values indicates the image quality and is referred to as the quality control parameter. It may restrict the performance of this function. A local threshold is defined using the mean and variance values of the background windows. Then, a linear interpolation in the vertical direction yields the threshold value for each row. The interpolation is also performed in the horizontal direction, yielding the threshold value for each column. The thresholding procedure, performed separately in both directions, blacks out pixels below the threshold without changing those larger than the local threshold. If necessary, landmarks and labels are detected by searching the densest area of the histogram for a gray-scale value with the least frequency of occurrence. Morphological filtering is used to remove all small noisy elements in the background. An erosion function with a 3 × 3 pixel structuring element turns to zero all background elements equal to or smaller than the structuring element.

Both approaches to patient background removal implemented in clinical images destroy parts of the diagnostic field. The area close to the patient contour is partially removed. This
FIGURE 5 Removal of background outside anatomical structures. (a) Original image; (b) histogram (arrow marks the threshold value); (c) thresholded image: anatomical structures remaining in the image are marked in white.
prevents the methods from being used in an unsupervised implementation for clinical PACS. They can be applied at workstations in order to suppress the background and improve visual perception (as described in Section 3) or as a preprocessing function in computer-aided diagnosis. In both cases they are used interactively and do not irreversibly destroy the diagnostically important image regions.
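For the histogram approach described above, the threshold sits at the valley between the background and foreground peaks of a smoothed histogram. The sketch below is a deliberately crude version that assumes the two peaks fall in opposite halves of the gray scale; the smoothing width is an illustrative choice.

```python
# Histogram-based separation of background and anatomy: smooth the histogram
# and take the minimum between its two main peaks as the threshold.
import numpy as np

def histogram_threshold(image, bins=256, smooth=9):
    hist, edges = np.histogram(image, bins=bins)
    kernel = np.ones(smooth) / smooth
    hist = np.convolve(hist, kernel, mode="same")     # smooth the histogram
    mid = bins // 2
    p1 = int(np.argmax(hist[:mid]))                   # first peak (lower half of gray scale)
    p2 = mid + int(np.argmax(hist[mid:]))             # second peak (upper half)
    valley = p1 + int(np.argmin(hist[p1:p2 + 1]))     # minimum between the two peaks
    return edges[valley]

def remove_background(image, threshold):
    out = image.copy()
    out[out < threshold] = 0                          # black out pixels below the threshold
    return out
```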
3 Improvement of Visual Perception

Visual perception is an essential factor in medical diagnosis performed on soft-copy displays. Monitors do not provide a diagnostically accepted standard of display quality if no brightness and contrast adjustment is performed. This becomes of particular importance when digitized images are read. Image enhancement may be required for various modalities and anatomical structures. The procedure may be preceded by a background removal function (see Section 2). Two approaches are discussed. First, a manual window/level adjustment (Section 3.1) is performed at a workstation, and the user is responsible for the quality and enhancement of the displayed image. Second, image enhancement parameters are found automatically, usually by means of a histogram modification technique (Section 3.2), and stored in the image header. At a workstation a user can easily switch from one set of parameters to another. Each set gives the enhancement of a different anatomical region (bony structure, soft tissue, lungs, etc.).
3.1 Window/Level Correction

In the window/level concept two parameters are defined. The window is the range of gray-scale values that is distributed over the entire dynamic range of the display monitor. A decrease of the window value increases the contrast in the displayed image, yet gray-scale values outside the window range are turned to black or white. The center of the interval is called the level value. The window/level adjustment can be performed manually at the workstation or automatically. A manual shift of the upper and/or lower edge of the gray-scale range changes the window value. A more user-friendly adjustment uses a mouse or trackball: a vertical movement typically changes the window value, whereas a horizontal shift controls the level value. The gray-scale value adjustment can be performed in real time using a lookup table. The mapping is accomplished by defining a curve in which output gray levels are plotted against input gray levels. A computerized window/level adjustment procedure first finds the minimum and maximum of a global image histogram. In order to suppress extraneous gray-level values, 5% of the values are cut off from both sides of the histogram; that is, the minimum gray-scale value is taken at the 5% level of the cumulative histogram, whereas the maximum value is found at the 95% level. The minimum and maximum values define the window range, and their average yields the level. Window and level are then used to generate the default lookup table for the image display. The computerized approach to window/level correction can be applied to single images (e.g., CR or digitized images) as well as to computed tomography (CT) or magnetic resonance (MR) studies. If a single image is analyzed, the histogram of that image is used; for CT/MR studies, the entire set of images is used to calculate the histogram.

3.2 Histogram Modification

The histogram equalization technique described in Chapter 1 can be used to improve the appearance of the image. Figure 6 shows the result of histogram equalization performed on a CR chest image. Another histogram method has been introduced to enhance specific anatomical structures. The goal of this preprocessing function is to create several piecewise-linear lookup tables that adjust the brightness and contrast of different tissue densities. The procedure has been developed for CR chest images but could also be adapted to other anatomical structures. The first step is to analyze the image histogram to find key breakpoints that divide the image into three regions: background (outside the patient boundary, yet within the radiation field), radiographically soft tissue (skin, muscle, fat, overexposed lungs), and radiographically dense tissue (mediastinum, subdiaphragm, underpenetrated lung). By applying different gains, the contrast can be increased or reduced. Based on this approach several lookup tables are created: some enhance (at different levels) the radiographically dense tissue, others the radiographically soft tissue, and one lookup table is created with no enhancement. In clinical PACS the analysis is performed at the acquisition gateway and the parameters are stored in the image header. At display time the enhancement level is selected manually by the user to improve the brightness and contrast of a certain gray-level region.
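The computerized window/level procedure above reduces to reading two points off the cumulative histogram and building a lookup table. The following sketch is our own minimal version (function names, the 8-bit display range, and the simple percentile handling are arbitrary choices, not values from the chapter):

```python
import numpy as np

def auto_window_level(image, cutoff=0.05, display_max=255):
    """Window/level from the 5%/95% points of the cumulative histogram.

    Returns the window width, the level (window center), and a lookup table
    that maps raw gray values to display values; values below the window are
    turned to black and values above it to white.
    """
    values = np.sort(image.ravel())
    lo = values[int(cutoff * (values.size - 1))]          # 5% point  = window minimum
    hi = values[int((1.0 - cutoff) * (values.size - 1))]  # 95% point = window maximum
    window = hi - lo
    level = (hi + lo) / 2.0

    raw_range = np.arange(int(image.max()) + 1)
    lut = np.clip((raw_range - lo) / max(window, 1) * display_max,
                  0, display_max).astype(np.uint8)
    return window, level, lut

# Usage with an integer-valued image:
#   window, level, lut = auto_window_level(img)
#   display = lut[img]
```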
4 Image Orientation

Image orientation becomes a very important issue in radiological systems in which soft-copy display is used in daily clinical procedures. The position in which radiologists routinely view the image is taken as the standard orientation. Typically, two acquisition modalities may yield rotated images. First, a film digitization procedure may result in a nonstandard image orientation when the film is placed in the wrong position; however, clinical investigation has shown that scanned images are mostly in the correct position, so for this modality the orientation problem is not a critical issue. Computed radiography (CR) is another modality that yields misoriented images. A survey in pediatric radiology has found that between 35 and 40% of procedures are not performed with a conventional orientation. This is caused by the patient's condition as well as the clinical environment: the CR cassette can be placed at various orientations to accommodate the examination conditions. Because of the position of the image plate, eight different orientations are considered (Fig. 7). They are divided into two major groups. The first group of orientations includes correct (Fig. 7a), upside-down (Fig. 7c), and rotated 90° clockwise (Fig. 7b) and counterclockwise (Fig. 7d). Flipping around the y-axis of the anteroposterior projection (Fig. 7e) gives an additional three possible orientations: upside-down and flipped (Fig. 7g), rotated 90° clockwise and flipped (Fig. 7f), and rotated 90° counterclockwise and flipped (Fig. 7h). The move to soft-copy display diagnosis makes orientation correction a major problem. Since the orientation correction algorithm refers to the anatomical structure within the image, different functions must be applied for each anatomy (chest, pelvis, hand/wrist, etc.). For chest radiographs, the orientation correction function handles various types of CR chest images: adult posteroanterior (PA) or anteroposterior (AP) and lateral projections, and pediatric PA (or AP) and lateral projections. Although all of these are chest images, different problems must be addressed in each of these types. Adult chest images are much closer to a standard image than
FIGURE 6 Histogram equalization. (a) Original image; (b) histogram of the original image; (c) enhanced image; (d) modified histogram.
FIGURE 7 Eight possible orientations of a chest radiograph.
pediatric images, in the sense that the mediastinum is usually centered within the image and the image itself contains only the thorax. The problem becomes much more difficult for pediatric images because the range of variation is much wider. The image may or may not include all or part of the head, the location of the arms is random, and the area of exposed subdiaphragm differs from one image to another. Very often even the abdominal region is included, which changes the location of the lungs within the image. The image is usually not centered and is positioned at a random angle to the vertical axis of the image.
4.1 Detection of the Rotation Angle

The orientation procedure is anatomy-dependent. It uses anatomical features to determine the current orientation and the necessary rotation angle. Three procedures are described for three anatomical structures: chest, pelvis, and hand/wrist. In chest images the analysis is performed in three steps. First, the mediastinum is located and its orientation is found. This step excludes 90° rotations clockwise and counterclockwise in both groups of images (nonflipped and flipped). Then, a search for the subdiaphragm is performed; it eliminates upside-down images in both groups. Finally, images are tested against the y-axis flip. Detection of anatomical structures (i.e., mediastinum, subdiaphragm, lungs) is based on the average density measured within predefined windows scanned horizontally and vertically. The window size is determined by the width of the subdiaphragm assessed on the basis of clinical images in pediatric radiology. The average density measures yield average profiles reflecting the changes of the average gray-scale values in the horizontal (Fig. 8a) and vertical (Fig. 8b) directions. In the horizontal direction, the mediastinum is marked by a high-value plateau placed between two low-value levels representing the lungs. This high-value plateau corresponds to a high-value average profile found vertically between two lower-value average profiles. One side of these average profiles increases, reflecting the subdiaphragm, which horizontally corresponds to a high-value plateau. Once the mediastinum and subdiaphragm are located, the image can be rotated to the upright position. The final step detects the y-axis flip, recognized either by detection of the local landmarks indicating the left or right image side or by an analysis of the cardiac area.
FIGURE 8 Profile analysis in the detection of the current image orientation. (a) Horizontal profiles; (b) vertical profiles scanned over a CR chest image. The image can be at any of the orientations shown in Fig. 7.
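The plateau search described above can be prototyped with a few lines of array code. The sketch below is our own illustration (the window size, threshold, and function names are arbitrary choices, not values from the chapter): it computes average-density profiles in the horizontal and vertical directions and flags the samples that form high-value plateaus.

```python
import numpy as np

def average_profiles(image, window=32):
    """Average-density profiles of a CR image.

    horizontal[i]: mean gray level of the i-th vertical strip of `window` columns,
                   so the profile runs left to right across the image (Fig. 8a);
    vertical[j]:   mean gray level of the j-th horizontal strip of `window` rows,
                   so the profile runs top to bottom (Fig. 8b).
    """
    rows, cols = image.shape
    horizontal = np.array([image[:, c:c + window].mean()
                           for c in range(0, cols - window + 1, window)])
    vertical = np.array([image[r:r + window, :].mean()
                         for r in range(0, rows - window + 1, window)])
    return horizontal, vertical

def plateau_samples(profile, rel_threshold=0.75):
    """Indices whose value exceeds a fraction of the profile maximum.

    Contiguous runs of these indices are the high-value plateaus used in the
    text to locate the mediastinum, spine, or forearm.
    """
    return np.where(profile >= rel_threshold * profile.max())[0]
```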
Landmarks (usually L for the left or R for the right image side) are placed within the radiation field, sometimes even within the patient contour. They also can be found (entirely or partially) in the area blocked by the collimator, provided a sensitive image plate makes them visible. Their orientation (angle with respect to the image edge) is also random, and their size may differ from one image to another. All this makes their detection and recognition more difficult; standardization of the location and orientation of landmarks would make the task much easier. Another way to detect the y-axis flip is an analysis of the cardiac shadow. Many approaches to the problem of lung segmentation and assessment of heart shadow size have already been published and will not be discussed in detail here; they can be implemented at this stage of the analysis. Alternatively, a simple average profile analysis performed on the lower part of the lungs yields two values, referred to as the right cardiac index (RCI) and left cardiac index (LCI) and defined as (Fig. 9)

\mathrm{RCI} = \frac{b}{a + b} \qquad \mathrm{and} \qquad \mathrm{LCI} = \frac{c}{c + d},    (5)

where a, b, c, and d are shown in Fig. 9. They reflect the size of the heart shadow in comparison with the overall lung size on both sides of the mediastinum. The LCI should be larger than the RCI; if this is not the case, the image is y-axis flipped.

Each anatomical structure requires its own function. For abdominal images, horizontal and vertical average profiles are again scanned. As for the chest, the first stage locates the spine by searching for a high-value plateau. Then, perpendicular to the spine, the average density and uniformity of the upper and lower average profiles are tested; a denser area indicates the subdiaphragm. The location of the spine and abdomen determines the rotation angle. No y-axis flip is considered. For hand images, the analysis is performed on thresholded images (as discussed in Section 2.2). In order to find the correction angle, a pair of average profiles is scanned and shifted toward the image center until one average profile intersects the forearm and the other at least three phalanges (Fig. 10). The forearm is detected if a high-value plateau located in the central part of the image and two neighboring low-value levels cover the entire image width (or height). Three phalanges are detected if three high-value plateaus are located within the average profile. The ranges of width of those plateaus are defined on the basis of the anatomical width of forearms and fingers found in clinical pediatric hand images. The search is completed when a pair of average profiles (scanned vertically or horizontally) meets these criteria.

FIGURE 9 Heart shadow analysis in detection of the y-axis flip.

FIGURE 10 Pair of average profiles scanned over a CR hand/wrist image.
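With the four distances a, b, c, and d of Fig. 9 measured from the lower-lung average profiles, the flip test of Eq. (5) is a one-line comparison. A minimal sketch (the function name is ours):

```python
def is_y_axis_flipped(a: float, b: float, c: float, d: float) -> bool:
    """Eq. (5): RCI = b/(a + b), LCI = c/(c + d).

    In a correctly oriented chest image the left cardiac index exceeds the
    right one; if it does not, the image is assumed to be y-axis flipped.
    """
    rci = b / (a + b)
    lci = c / (c + d)
    return lci <= rci
```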
4.2 Image Rotation

Once the correction angle has been found, a rotation procedure is implemented. Four possible rotation angles are considered. A 0° angle means no rotation. The three other rotation angles (90° clockwise or counterclockwise, and 180°) may be applied with or without a flip. For the rotation procedure, let us first consider a coordinate system with the x1 and y1 axes to be rotated without changing the origin. This yields new coordinates, yet the same origin as in the original system (Fig. 11). For a rotation angle \varphi the old and new coordinates are related by

\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = \begin{bmatrix} \cos\varphi & \sin\varphi \\ -\sin\varphi & \cos\varphi \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}.    (6)

In our approach rotation functions without flip are defined as

x_2 = y_1, \quad y_2 = -x_1 \qquad \mathrm{for}\ \varphi = \pi/2,
x_2 = -x_1, \quad y_2 = -y_1 \qquad \mathrm{for}\ \varphi = \pi,
x_2 = -y_1, \quad y_2 = x_1 \qquad \mathrm{for}\ \varphi = 3\pi/2.    (7)

If the y-axis flip is required, the rotation functions become

x_2 = -x_1, \quad y_2 = y_1 \qquad \mathrm{for}\ \varphi = 0,
x_2 = -y_1, \quad y_2 = -x_1 \qquad \mathrm{for}\ \varphi = \pi/2,
x_2 = x_1, \quad y_2 = -y_1 \qquad \mathrm{for}\ \varphi = \pi,
x_2 = y_1, \quad y_2 = x_1 \qquad \mathrm{for}\ \varphi = 3\pi/2.    (8)
The orientation correction function is applied to CR images only. Other modalities do not require this type of standardization, since the acquisition stations themselves ensure a correct image orientation.
FIGURE 11 Rotation of the coordinate system.
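Equations (7) and (8) amount to a 90°-multiple rotation optionally followed by a flip about the y-axis, which maps directly onto array operations. The sketch below is our own illustration of that mapping, not the clinical implementation; how `quarter_turns` and `y_flip` are obtained is exactly the detection problem of Section 4.1.

```python
import numpy as np

def correct_orientation(image, quarter_turns, y_flip):
    """Bring a misoriented CR image into the standard position.

    quarter_turns: number of counterclockwise 90-degree rotations to apply,
                   i.e., the pure rotations of Eq. (7).
    y_flip:        True if the y-axis flip of Eq. (8) must also be applied;
                   here it follows the rotation, matching the sign pattern
                   of the reconstructed Eq. (8).
    """
    corrected = np.rot90(image, k=quarter_turns)
    if y_flip:
        corrected = np.fliplr(corrected)
    return corrected
```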
5 On the Accuracy of Quantitative Measurements in Image Intensifier Systems

The image intensifier (II) tube is an electrooptical device used to detect, intensify, and shutter optical images. It is a vacuum tube that contains four basic elements: input phosphor screen and photocathode, electrostatic focusing lens, accelerating anode, and output phosphor. In diagnostic radiology, image intensifiers are applied in fluoroscopy and angiography, where the viewing of images in real time is desired. This means that the X-radiation pattern emerging from the patient has to be transformed immediately into an image directly viewable by a radiologist. In angiographic quantitative analysis, measurement of blood vessel diameters plays an important role and often also serves as a basis from which other values or indexes are derived. This requires a technique that minimizes the distortion due to the structure of the II tube and permits a corrected image to be archived. Image-intensifier tubes, built with electron focusing lenses, may exhibit five aberrations: distortion of the image caused by the input phosphor screen, astigmatism, curvature of the image field, spherical aberration, or coma. The aberration caused by the curvature of the input phosphor surface changes the shape of images more than any other type of distortion. The discrepancy in size may reach 20% in the periphery, so the accuracy of measurement of an abnormality depends on its location within the image: pathology viewed in the image periphery appears wider than it would in the central part of the image. The aberration caused by the spherical surface of the input phosphor screen introduces a nonlinear relationship between points in the object plane and the corresponding points in the image. The error, generally referred to as pincushion distortion, depends on the distance from the center of the input phosphor screen (the point at which the plane is perpendicular to the central X-ray), the
radius of the input phosphor screen curvature, and the view angle between the central X-ray and the axis perpendicular to the object plane. There is one more type of distortion (discussed in Section 5.2) that originates from the same source: it results in brightness nonuniformity. The image periphery is denser than the center; the decrease in illumination is caused by the spread of the X-ray beam over a larger surface. Since the light intensity is related to the size of the exposed area, the pixel value depends on the distance from the image center. The increase of the surface area also deteriorates the sharpness in the periphery, so edges viewed off the image center become blurred. After the image is shrunk by the correction function, a sharpening of these edges is obtained.
5.1 Geometrical Description of Pincushion Distortion

Depending on the value of the view angle (the angle between the central X-ray beam and the object plane), two models are discussed. First, it is assumed that the central X-ray beam is perpendicular to the object plane; second, an arbitrary view angle model is analyzed.
Object at 0° View Angle

Let us first consider the case when the X-ray falls perpendicular to the object plane (i.e., the central X-ray beam is at the zenith position with respect to the patient). A point P(x, y) (Fig. 12) is projected onto the input phosphor surface at P_a(x, y). The original distance d_0 is magnified to d_x and then lengthened to d_a (the arc between S and P_a) by the input phosphor spherical surface. Since in this approach no other distortions are considered, a linear magnification is assumed. Magnification along the central beam is

\frac{d_x}{d_0} = \frac{s_2}{s_1},    (9)

where d_0 is the distance from the object point to the central point, d_x is the magnified distance of d_0 at the input phosphor, and s_1 and s_2 are the distances from the focal spot FS to the object plane and the input phosphor screen, respectively. From the geometry of Fig. 12,

\frac{d_c}{\sin\alpha} = \frac{s_2}{\sin(\pi/2 - \alpha - \beta/2)} = \frac{s_2}{\cos(\alpha + \beta/2)}    (10)

and

s_2 = d_x \,\mathrm{ctg}\,\alpha,    (11)

where d_c is the chord between S and P_a, \alpha is the angle between the central beam and the ray from FS to P_a, and \beta is the angle subtended at the center of curvature by the arc d_a. From Eq. (10) and by the use of Eq. (11), the magnified distance d_x is

d_x = \frac{s_2 d_c \cos(\beta/2)}{s_2 + d_c \sin(\beta/2)}.    (12)

With substitution of the relationship between the chord d_c, the arc d_a, and the radius R of the sphere,

d_c = 2R \sin(\beta/2), \qquad \beta = d_a/R,    (13)

in Eq. (12) we obtain

d_x = \frac{2Rs_2\,\mathrm{tg}(d_a/2R)}{s_2 + (s_2 + 2R)\,\mathrm{tg}^2(d_a/2R)}.    (14)

If we substitute Eq. (14) in Eq. (9), the relationship between the original distance d_0 and its projection onto the input phosphor surface d_a is given by

d_0 = \frac{2Rs_1\,\mathrm{tg}(d_a/2R)}{s_2 + (s_2 + 2R)\,\mathrm{tg}^2(d_a/2R)},    (15)

where R is the radius of the input phosphor screen curvature, and d_a is the d_0 distance projected onto the input screen at a 0° view angle. Note that, because of the manufacturing process, there is no one single radius of curvature for II tubes, yet for simplicity we consider only a single radius.

FIGURE 12 Geometrical distortions originated from projecting a flat object plate onto a curved input phosphor surface.

Object at Arbitrary View Angle

The X-ray beam may also fall at an arbitrary angle to the object. This leads to a generalization of the equations just derived. From the geometry of Fig. 13,

\frac{d_c}{\sin\alpha} = \frac{s_2}{\sin(\pi/2 - \alpha - \beta/2)} = \frac{s_2}{\cos(\alpha + \beta/2)}    (16)

and

\frac{d_x}{\sin\alpha} = \frac{s_2}{\sin(\pi/2 - \alpha - \theta)} = \frac{s_2}{\cos(\alpha + \theta)},    (17)

where \theta is the view angle and \alpha is defined as in Fig. 12. Using trigonometric relationships, Eqs. (16) and (17) become

\mathrm{ctg}\,\alpha = \frac{s_2 + d_c \sin(\beta/2)}{d_c \cos(\beta/2)}    (18)

and

\mathrm{ctg}\,\alpha = \frac{s_2 + d_x \sin\theta}{d_x \cos\theta}.    (19)

By combining Eqs. (18) and (19), the magnification in the plane parallel to the object plane becomes

d_x = \frac{s_2 d_c \cos(\beta/2)}{s_2 \cos\theta + d_c\,[\sin(\beta/2)\cos\theta - \sin\theta\cos(\beta/2)]}.    (20)

Substituting Eq. (13) in (20) and combining it with Eq. (9), we are given the correct distance from the central point to the current pixel,

d_0 = \frac{2Rs_1\,\mathrm{tg}(d_a/2R)}{\left[s_2 + (s_2 + 2R)\,\mathrm{tg}^2(d_a/2R)\right]\cos\theta - 2R\sin\theta\,\mathrm{tg}(d_a/2R)},    (21)

where s_1 is the distance from the focal spot FS to the object, s_2 is the distance from the focal spot FS to the input phosphor, R is the radius of the input phosphor curvature, and d_a is the d_0 distance projected onto the input phosphor, which can be measured from the input (uncorrected) image. For a 0° view angle, Eqs. (21) and (15) are equivalent.

FIGURE 13 Geometrical distortions caused by a projection under an arbitrary view angle.

Pixel Coordinate Correction

Once the pixel distance from the central point of the input image has been determined, the correction of the pixel coordinate may be performed. It is again performed with respect to the image center, which is unchanged. All other points are shifted along the radius toward the image center. Figure 14 shows the correction function performed on a grid image.
FIGURE 14 Image of a grid phantom. (a) Uncorrected image: the lines are bowed and the periphery appears denser than the image center. (b) Corrected image: straightening of the lines and a decrease of the brightness nonuniformity are obtained.
The nonlinear magnification effect originates from the projection of a flat plane onto a spherical surface. The size difference is proportional to the ratio of the zone surface area to the base surface area. The uncorrected grid image (Fig. 14a) indicates that the range of correction depends on the distance between the current pixel and the image center: the larger the distance, the larger the correction coefficient has to be. As a result, the final (corrected) image (Fig. 14b) is shrunk along the radius toward the image center. This changes the entire image size.
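Equation (15) maps a distance d_a measured from the center of the uncorrected image back to the object-plane distance d_0, so the coordinate correction amounts to moving each pixel radially by the corresponding ratio. The sketch below covers only the 0° view-angle case; the geometry parameters R, s_1, s_2, the pixel size, and the function names are placeholders for values that would come from the actual image-intensifier setup.

```python
import numpy as np

def corrected_distance(d_a, R, s1, s2):
    """Eq. (15): object-plane distance d0 for a distance d_a measured
    (along the curved input phosphor) from the image center."""
    t = np.tan(d_a / (2.0 * R))
    return 2.0 * R * s1 * t / (s2 + (s2 + 2.0 * R) * t ** 2)

def correct_pixel(x, y, center, R, s1, s2, pixel_size=1.0):
    """Shift one pixel radially toward the image center (0 degree view angle).

    The corrected radius is d0 rescaled by s2/s1 (Eq. (9)) so that the result
    stays in image-plane units; the image center itself is left unchanged.
    """
    xc, yc = center
    d_a = np.hypot(x - xc, y - yc) * pixel_size
    if d_a == 0.0:
        return x, y
    d_corr = corrected_distance(d_a, R, s1, s2) * s2 / s1
    scale = d_corr / d_a
    return xc + (x - xc) * scale, yc + (y - yc) * scale
```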
5.2 Surface Area Correction

Since the light intensity is related to the size of the exposed area, the pixel intensity in the image depends on the distance of the pixel from the image center. In the analysis we assume an X-ray beam to be responsible for the intensity of a unary circular area in the object plane. The analysis is again based on Figs. 12 and 13, in which the unary circular area is marked between the upper solid and dashed lines. Both models (i.e., object at 0° view angle and at arbitrary view angle) are analyzed separately.

Object at 0° View Angle

In order to describe the nonlinear magnification of the area obtained by mapping a flat unary circle onto a spherical surface, Eq. (15) is rewritten as a function of \beta = d_a/R. Thus, Eq. (15) becomes

d_0(\beta) = \frac{2Rs_1\,\mathrm{tg}(\beta/2)}{s_2 + (s_2 + 2R)\,\mathrm{tg}^2(\beta/2)}.    (22)

From the geometry of Fig. 12, the diameter of the unary area d_D is given by

d_D = d_0(\beta + \Delta\beta) - d_0(\beta).    (23)

If we substitute Eq. (22) in Eq. (23), we obtain a quadratic equation with respect to \Delta\beta(\beta),

A = \frac{Bx}{s_2 + Cx^2},    (24)

where x = \mathrm{tg}(\Delta\beta/2), B = 2Rs_1, C = s_2 + 2R, and A groups d_D with the \mathrm{tg}(\beta/2) terms of Eq. (22) evaluated at the current pixel. After solution of Eq. (24), the value \Delta\beta is obtained and the diameter \Delta d_a of the enlarged area is given by

\Delta d_a = R\,\Delta\beta,    (25)

where R is the input phosphor screen curvature.

Object at Arbitrary View Angle

By generalizing Eq. (15), a description for an arbitrary view angle has been derived. Considering d_0 in Eq. (21) as a function of \beta, and substituting it in

d_D(\beta) = d_0(\beta + \Delta\beta) - d_0(\beta),    (26)

derived from the geometry of Fig. 13, a quadratic equation is obtained,

A = \frac{Bx}{s_2 + Cx + Dx^2},    (27)

where x = \mathrm{tg}(\Delta\beta/2), B = 2Rs_1, D = 2R\sin\theta, and A and C additionally depend on R, s_1, s_2, \mathrm{tg}(\beta/2), and the view angle \theta. Solution of Eq. (27) yields \Delta\beta(\beta), with which the diameter

\Delta d_a(\beta) = R\,\Delta\beta(\beta)    (28)

is obtained.

Pixel Value Correction

Because of the quadratic dependency between the brightness and the surface of the exposed area, the pixel value correction is obtained by multiplying the image pixel value by the square of the area enlargement rate. Its value is close to 1 in the image center; therefore, the brightness of the central part remains unchanged. The increase of the area enlargement rate toward the periphery yields a brightening of this area. This results in an improvement of the uniformity of the intensity throughout the entire image (Fig. 14).
5.3 Clinical Implementation

Parameters used in the correction procedure, describing the input phosphor screen curvature, the view angle, and the distances from the focal spot to the object plane and to the input phosphor screen, are fixed at the time of exposure. These parameters make the correction procedure independent of the system: the correction formula is applicable to X-ray systems with different image intensifiers and for different view-angle projections. System calibration is not required, nor must any constant values be determined by empirical methods. Since the view angle is constant for a series of images, only one trigonometric function per pixel needs to be computed. In order to shorten the run time, lookup tables may be used.
A lookup table contains the tangent function values for Eqs. (15) and (21) and the inverse trigonometric function (arctangent) needed for Eqs. (25) and (28). The result of subjecting an angiogram to the correction procedure is shown in Fig. 15. Since the procedure shrank the image along the radius toward the center, the size of the entire image has changed. However, the size of objects placed in the image center remains unchanged, even though in the periphery the decrease is noticeable (see the blood vessels indicated by arrows).
6 Implementation of Image Standardization Functions in HI-PACS

6.1 Industrial Standard for Image Format and Communication

Although DICOM 3.0 has been accepted worldwide since 1992, in clinical procedures equipment as well as databases typically still comply with the ACR-NEMA 2.0 standard. Thus, in most cases an ACR-NEMA 2.0 to DICOM 3.0 conversion is required. The DICOM 3.0 standard provides several major enhancements to the earlier ACR-NEMA version. Two fundamental components of DICOM, described in detail in Chapter 47, are the information object class and the service class. The information objects define the contents of a set of images and their relationships (e.g., patients, modalities, studies). They consist of normalized objects, which include attributes inherent in the real-world entity, and composite objects, which combine normalized object classes. ACR-NEMA, on the other hand, describes images obtained from different modalities based on a composite information concept. The service class describes the action performed on the objects (e.g., image storage, query, retrieval, print). These commands are backward compatible with the earlier ACR-NEMA version. The composite commands are generalized, whereas the normalized commands are more specific. For image transmission, DICOM uses existing network communication standards based on the International Standards Organization Open Systems Interconnection. If an imaging device transmits an image object with a DICOM 3.0 command, the receiver must use a DICOM 3.0 command to receive the information. However, if a DICOM object is transmitted with a TCP/IP communication protocol (without invoking the DICOM communication), any device connected to the network can receive the data with the TCP/IP protocol. An ACR-NEMA to DICOM conversion also becomes a preprocessing function. Usually vendors provide the users with a DICOM interface that generates a new file format and permits the acquisition station (or workstation) to work in a DICOM environment. If users prefer to deal with the conversion problem themselves, the preprocessing function should be installed at the acquisition station or the acquisition gateway.
FIGURE 15 Angiograms. (a) Original image. (b) Processed image, shrunk toward its center to perform the blood-vessel size correction and to sharpen the edges (examples are indicated by arrows).
6.2 Image Content Standardization Functions

In clinical PACS the correction procedures can be implemented at two sites (Fig. 16). If image standardization is performed before archiving, the procedures are installed at the acquisition gateway and performed once. A corrected image is then archived and can be sent to the user with no further preprocessing. However, if rough images are to be
archived and standardization is to be performed at the time of an image request, the standardization procedures are installed at the archive gateway or directly at the workstation. Preprocessing functions are then performed at any time an access to the image is requested. This may slow down the system response time. However, a very important advantage of this option is that it prevents irreversible damage to images in
FIGURE 16 Implementation of standardization functions in clinical PACS.
case there is a failure of the preprocessing functions. With standardization performed on the acquisition gateway, if an undesirable modification of the image occurs, the radiological procedure has to be repeated. A comparison of both installations is made in Table 1. The standardization function to be implemented depends on the modality as well as the anatomy shown in the image. Not all images have to be subjected to all preprocessing functions discussed in this chapter, including correction of image format, background, and orientation, search for window/level values, optimization of brightness/contrast, or correction of image size in order to improve the accuracy of quantitative measurements. Thus, the set of functions performed on a certain image is precisely defined for each type of image. Depending on the application, preprocessing functions can be divided into two categories: only modality-dependent, and both modality- and anatomy-dependent. Clinical experience has shown that each modality presents unique problems that need to be resolved before soft-copy images are suitable for diagnosis. In this chapter, preprocessing techniques for four modalities are considered: angiography, CR, CT, and MR. Quantitative analysis and measurement of recorded blood vessel diameters play an important role in angiography; they serve as a basis for other values or indexes to be derived. This requires implementation of a technique that reduces distortions caused by mapping a flat surface onto a spherical image-intensifier surface. Functions suppressing those distortions are discussed in Section 5 of this chapter. Since no danger of image destruction has been observed, this function can be implemented at the acquisition gateway. CR images require preprocessing functions that remove the unexposed background (Section 2.1), standardize the orientation (Section 4), and improve visual perception by adjusting selected anatomical regions (Section 3). Background removal may not be applied in adult chest or abdomen images, yet it should be performed in hand, limb, and head images. On the other hand, about 70% of pediatric images show a wide unexposed background, which needs removal; therefore, all pediatric images are subjected to the correction procedure. Since the methodology does not depend on the anatomical structure within the image (only the background is subjected to the analysis), one function handles all anatomies. Although this function is installed at the acquisition gateway, a possible image damage issue needs to be considered carefully.
The procedure discussed in Section 2.1 has been clinically tested and no removal of the diagnostic field of view has been reported. Yet the function reaches a certain level of accuracy only in a particular clinical environment and technical setup, including the radiological equipment, type of collimators, sensitivity of image plates, etc. Thus, before being installed in another environment in an unsupervised mode, the function should be tested carefully on a large set of clinical data. Since about 35 to 40% of images acquired in radiology are not in an upright position, standardization of image orientation becomes an important issue. Requiring manual image rotation with a selection of the rotation angle would interrupt medical diagnosis based on soft-copy display, as well as image presentation during seminars, conferences, and teaching. The methods for orientation correction discussed in Section 4 of this chapter do not destroy the image content. They may place the image in the wrong orientation, but this failure is reversible. Thus, the function should be installed at the acquisition gateway. Moreover, if rotation is required, the entire procedure becomes time-consuming; its implementation on the archive gateway or workstation would significantly lengthen the overall system response time, and the delay depends on the image size. For CR as well as CT/MR images, certain LUTs are required to enhance one anatomical region and suppress the others. Fixed LUTs are of particular importance at viewing stations for a preview of images, during conferences, teleconferences, and teaching. They also are used during primary diagnosis to enhance various gray-scale regions (i.e., lungs and mediastinum in chest images) at different levels. Thus, the best place for the implementation of those preprocessing functions is the acquisition gateway. At this time enhancement parameters are stored in the image header. Since no penetration into the image content is performed, no damage can be done to the image information.
TABLE 1 Comparison of standardization performed on acquisition and archive gateways

Standardization on acquisition gateway        | Standardization on archive gateway
Standardized images are archived              | Rough images are archived
Standardization is performed once             | Standardization is performed many times
Does not affect the response time             | Slows down the system response time
Irreversible image damage in case of failure  | In case of failure, the rough image is at the archive station

7 Summary

In this chapter an attempt has been made to provide an overview of the main image standardization functions, their implementation in clinical PACS, and their applications to clinical diagnosis. Integration of standardization functions as an application package in clinical PACS adds a new quality to image diagnosis and image preview during seminars, conferences, and teaching. It improves visual image perception, leading to a more accurate diagnosis, and it provides a shorter system response time. During the development of computer-aided diagnosis systems, the use of preprocessing and standardization of the image content permits efforts to be focused on new image processing procedures that lead to extraction of features, evaluation of their discriminant power, and eventually pattern recognition and image classification.
49 Quality Evaluation for Compressed Medical Images: Fundamentals

Pamela Cosman, Robert Gray, and Richard Olshen
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 803
2 Image Compression . . . . . . . . . . . . . . . . . . . . . . . . 804
3 The Three Data Sets . . . . . . . . . . . . . . . . . . . . . . . 805
  3.1 CT Study  3.2 MR Study  3.3 Mammogram Study
4 Average Distortion and SNR . . . . . . . . . . . . . . . . . . . 807
5 Subjective Ratings . . . . . . . . . . . . . . . . . . . . . . . . 809
  5.1 Mammography Subjective Ratings
6 Diagnostic Accuracy and ROC Methodology . . . . . . . . . . . . 813
7 Determination of a Gold Standard . . . . . . . . . . . . . . . . 816
8 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . 817
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817

Portions reprinted, with permission, from IEEE Trans. Medical Imaging, 12(4): 727–739, Dec. 1993 and Proceedings IEEE, 82(6): 919–932, June 1994.

Copyright © 2000 by Academic Press. All rights of reproduction in any form reserved.

1 Introduction

As radiology becomes increasingly digital and picture archive and communication systems (PACS) move from research to development and practice, the quantity of digital information generated threatens to overwhelm available communication and storage media. Although these media will improve with technology, the need for efficiency will remain for inherently narrowband links such as satellites, wireless, and existing media such as twisted pair that will remain useful for many years. The expected growth in digital data as X-rays become digital will balance much of the expected gain in transmission bandwidth and local storage. Typical high-resolution digital mammograms require tens of megabytes for each image. The transfer of a collection of studies for research or education across the Internet can take hours. Image compression can provide increases in transmission speed and in the quantity of images stored on a given disk. Lossless compression, in which an original image is perfectly
recoverable from the compressed format, can be used without controversy. However, its gains are limited, ranging from a typical 2:1 compression (i.e., producing computer files of half the original size) to an optimistic 4:1. Serious compression of 10:1 or more must be lossy in that the original image cannot be recovered from the compressed format; one can only recover an approximation. How does one evaluate the approximation? Clearly the usefulness of image compression depends critically on the quality of the processed images. Quality is an attribute with many possible definitions and interpretations, depending on the use to which the images will be put. Sometimes it is felt that for a compressed image to be considered ``high quality,'' it should be visually indistinguishable from the original. This is sometimes referred to as ``transparent quality'' or ``perceptually lossless'' since the use of compression on the image is transparent to the viewer. Upon first consideration, this concept of visually indistinguishable would seem to be a simple definition of a perceptually important quality threshold that everyone could agree upon. This is not the case. Two images that are visually indistinguishable when seen by a certain person may be distinguishable when seen by someone else. For example, a pair of medical images viewed by lay people may appear identical, but a radiologist trained in viewing those images might detect differences. Similarly, a pair of images seen
by the same person under certain viewing conditions may appear identical, but when seen under different conditions of ambient lighting, viewing distance, or display characteristics might be easily seen to differ. A third issue is that a compressed image might differ from the original without necessarily being worse. To hold up transparent quality as the ultimate quality goal is to ignore the possibility that certain types of computer processing, including compression, can in some cases make images more pleasing perceptually than the originals. A fourth issue is that images have different applications, and the term ``high quality'' may be used to denote usefulness for a specific application rather than to indicate that an image is perceptually pleasing or visually identical to an original. For all of these reasons, the measurement of image quality is a difficult task, and only a few researchers consider quality as a binary quantity that either meets the transparent quality standard or does not. No single approach to quality measurement has gained universal acceptance. The various approaches can be categorized into the following three groups:

* Computable objective distortion measures such as squared error or signal-to-noise ratio.
* Subjective quality as measured by psychophysical tests or questionnaires with numerical ratings.
* Simulation and statistical analysis of a specific application of the images, e.g., diagnostic accuracy in medical images measured by clinical simulation and statistical analysis.

Within this latter category of evaluation methods, the methodology of receiver operating characteristic (ROC) curves has dominated historically, but a variety of other approaches have been used in which radiologists may be called upon to perform various interpretive tasks. Radiologists detect and localize the disease, make measurements of various structures, and make recommendations for patient management. The utility of a medical image can be evaluated in terms of how well it contributes to these functions. In this chapter, we begin with a brief introduction to image compression, and to the three different sets of medical images that form the basis of our studies. We discuss signal-to-noise ratios and subjective quality ratings in the context of these data sets, as well as ROC methodology. In the next chapter, we present the clinical studies including detection, measurement, and management tasks, and in the following chapter, we discuss a number of statistical issues that arise in this sort of clinical experiment.

2 Image Compression

Image compression seeks to reduce the number of bits involved in representing an image. Most compression algorithms in practice are digital, beginning with an information source that is discrete in time and amplitude. If an image is initially analog in space and amplitude, one must first render it discrete in both space and amplitude before compression. Discretization in space is generally called sampling; this consists of examining the intensity of the analog image on a regular grid of points called picture elements or pixels. Discretization in amplitude is simply scalar quantization: a mapping from a continuous range of possible values into a finite set of approximating values. The term analog-to-digital (A/D) conversion is often used to mean both sampling and quantization, that is, the conversion of a signal that is analog in both space and amplitude to a signal that is discrete in both space and amplitude. Such a conversion is by itself an example of lossy compression. A general system for digital image compression is depicted in Fig. 1. It consists of one or more of the following operations, which may be combined with each other or with additional signal processing:

* Signal decomposition: The image is decomposed into several images for separate processing. The most popular signal decompositions for image processing are linear transformations of the Fourier family, especially the discrete cosine transform (DCT), and filtering with a subband or wavelet filter bank. Both methods can be viewed as transforms of the original images into coefficients with respect to some set of basis functions. There are many motivations behind such decompositions. Transforms tend to ``mash up'' the data so that the effects of quantization error are spread out and ultimately invisible. Good transforms concentrate the data in the lower order transform coefficients so that the higher order coefficients can be coded with few or no bits. Good transforms tend to decorrelate the data with the intention of rendering simple scalar quantization more efficient. The eye and ear are generally considered to operate in the transform domain, so that it is natural to focus on coding in that domain, where psychophysical effects such as masking can be easily incorporated into frequency-dependent measures of distortion. Lastly, the transformed data may provide a useful data structure, as do the multiresolution representations of wavelet analysis.
* Quantization: High-rate digital pixel intensities are converted into relatively small numbers of bits. This operation is nonlinear and noninvertible; it is ``lossy.'' The conversion can operate on individual pixels (scalar quantization) or groups of pixels (vector quantization). Quantization can include discarding some of the components of the signal decomposition step. Our emphasis is on quantizer design.
* Lossless compression: Further compression is achieved by an invertible (lossless, entropy) code such as run-length, Huffman, Lempel–Ziv, or arithmetic code.
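To make the three stages of Fig. 1 concrete, the sketch below runs a blockwise DCT (signal decomposition), uniform scalar quantization, and an entropy estimate standing in for the lossless coder. It is a toy chain for illustration only, not the predictive VQ or SPIHT coders used in the studies reported here; the block size and quantizer step are arbitrary.

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block, step=16.0):
    """Signal decomposition (2D DCT) followed by uniform scalar quantization."""
    coeffs = dctn(block.astype(float), norm='ortho')
    return np.round(coeffs / step)

def reconstruct_block(qcoeffs, step=16.0):
    """Dequantize and invert the transform (the lossy step has already happened)."""
    return idctn(qcoeffs * step, norm='ortho')

def entropy_bits_per_coefficient(qcoeffs):
    """First-order entropy of the quantizer output: a proxy for the rate that an
    ideal lossless (entropy) coder such as Huffman or arithmetic coding would need."""
    _, counts = np.unique(qcoeffs, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```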
FIGURE 1 Image compression system.
Many approaches to systems for image compression have been proposed in the literature and incorporated into standards and products, both software and hardware. We note that the methods discussed in this chapter for evaluating the quality and utility of lossy compressed medical images do not depend on the compression algorithm at all. The reader is referred to the literature on the subject for more information on image compression [23, 50].
3 The Three Data Sets

In this chapter and the following two chapters, results are presented for three data sets: computerized tomography (CT), magnetic resonance (MR), and mammographic images. As will be seen later, these three studies provide examples of the detection, localization, measurement, and management aspects of a radiologist's interpretative functions.
3.1 CT Study

The CT study involved two different sets of chest images. In one, the diagnostic task was the detection of abnormally enlarged lymph nodes, and in the other, the task was to detect lung nodules. Thirty patient studies were used for each task. The CT images were compressed using pruned predictive vector quantization [23] applied to 2 × 2 pixel blocks [15]. This method involves no transform of the data. Vector quantizers are often designed for a training set of representative images that can provide information about statistics such as the spatial correlations that are typically found in those images. In such a situation, the compression algorithm will perform best for images that are similar to those used in the training set. For this study, twenty CT images of the mediastinum were used in the training set for detecting enlarged lymph nodes, and 20 CT lung images were used in the training set for detecting lung nodules. All 512 × 512 pixel images were obtained using a GE 9800 scanner (120 kV, 140 mA, scan time 2 seconds per slice, bore size 38 cm, field of view 32–34 cm). Although no formal research was undertaken to determine accurately what constitutes ``representative'' CT images, two radiologists were consulted concerning the typical range of appearance of adenopathy and nodules that occurs in daily clinical practice. The training and test images were chosen to be approximately representative of this range, and included images of both
normal and abnormal chests. The lung nodules ranged in size from 0.4 to 3.0 cm, with almost all nodules between 0.4 and 1.5 cm, and the abnormal lymph nodes were between 0.6 and 3.5 cm. The study also had a lower percentage of normal chest images than would be encountered in daily practice. For each study (lymph nodes, lung nodules), the original 30 test images were encoded at six compression levels: 0.57, 1.18, 1.33, 1.79, 2.19, and 2.63 bits per pixel (bpp). The original test images are considered to be 11-bit data. Figure 2 shows an original 11-bpp CT lung image to which the ``windows and levels'' contrast adjustment has been applied. Although the scanner was capable of producing 12-bit data, it was found for this data set that the 12th bit was never used. Patient studies represented in the training set were not used as test images, and the results reported on SNR, subjective quality, and diagnostic accuracy are based only on test images.
3.2 MR Study

In the MR study, the diagnostic task was to measure the size of blood vessels in MR chest scans, as would be done in evaluating
FIGURE 2 Original 11.0 bpp CT chest scan.
aortic aneurysms. The study had as its goal to quantify the effects of lossy compression on the accuracy of these measurements [46, 47]. As in the CT study, the image compression scheme was predictive pruned tree-structured vector quantization, although in this case it was applied to blocks of 2 × 4 pixels. The training data of 20 MR chest scans were chosen to include a wide range of both aneurysms and normal vessel structures. An additional 30 scans were chosen as test images. All images were obtained using a 1.5-T whole body imager (Signa, GE Medical Systems, Milwaukee, WI), a body coil, and an axial cardiac-gated T1-weighted spin echo pulse sequence with the following parameters: cardiac gating with repetition time (TR) of 1 R-R interval, echo time (TE) of 15–20 msec, respiratory compensation, number of excitations (NEX) of 2, 256 × 192 matrix, and slice thickness of 7 mm with a 3-mm interslice gap. The compression rates for this study were 0.36, 0.55, 0.82, 1.14, and 1.70 bpp on the 30 test images. These bit rates are represented by compression levels 1–5. The original scans at 9.0 bpp are represented by level 6. Figure 3a shows an original 9.0 bpp MR chest scan. Figure 3b shows the same image compressed to 1.14 bpp, and Fig. 3c shows the image compressed to 0.36 bpp.
3.3 Mammogram Study

The mammography study involved a variety of tasks: detection, localization, measurement, and management decisions. This work has been reported upon in [2, 24, 45] as well as in the recent Stanford Ph.D. thesis of Bradley J. Betts [8], which also includes detailed analyses of a much larger trial. The image database was generated in the Department of Radiology of the University of Virginia School of Medicine and is summarized in Table 1. The 57 studies included a variety of normal images and images containing benign and malignant objects.
TABLE 1 Data test set: 57 studies, 4 views per study

 6  benign mass
 6  benign calcifications
 5  malignant mass
 6  malignant calcifications
 3  malignant combination of mass and calcifications
 3  benign combination of mass and calcifications
 4  breast edema
 4  malignant architectural distortion
 2  malignant focal asymmetry
 3  benign asymmetric density
15  normals

Reprinted with permission from S.M. Perlmutter, P.C. Cosman, R.M. Gray, R.A. Olshen, D. Ikeda, C.N. Adams, B.J. Betts, M. Williams, K.O. Perlmutter, J. Li, A. Aiyer, L. Fajardo, R. Birdwell, and B.L. Daniel, Image Quality in Lossy Compressed Digital Mammograms, Signal Processing, 59:189–210, 1997. © Elsevier.
Corroborative biopsy information was available on at least 31 of the test subjects. The images were compressed using Set Partitioning in Hierarchical Trees (SPIHT) [54], an algorithm in the subband/wavelet/pyramid coding class. These codes typically decompose the image using an octave subband, critically sampled pyramid, or complete wavelet transformation, and then code the resulting transform coefficients in an efficient way. The decomposition is typically produced by an analysis filter bank followed by downsampling. The most efficient wavelet coding techniques exploit both the spatial and frequency localization of wavelets. The idea is to group coefficients of comparable significance across scales by spatial location in bands oriented in the same direction. The early approach of Lewis and Knowles [31] was extended by Shapiro in his landmark paper on embedded zerotree wavelet coding [57], and the best performing schemes are descendants or variations on this theme. The approach provides codes with excellent rate–distortion trade-offs, modest complexity to implement, and an embedded bit stream, which makes the codes useful for applications where scalability or progressive coding are important. Scalability implies there is a ``successive approximation'' property to the bit stream. This feature is particularly attractive for a number of applications, especially those where one wishes to view an image as soon as bits begin to arrive, and where the image improves as further bits accumulate. With scalable coding, a single encoder can provide a variety of rates to customers with different capabilities. Images can be reconstructed to increasing quality as additional bits arrive. After experimenting with a variety of algorithms, we chose Said and Pearlman's variation [54] of Shapiro's EZW algorithm because of its good performance and the availability of working software for 12-bpp originals. We used the default filters (9–7 biorthogonal filter) in the software compression package of Said and Pearlman [54]. The system incorporates the adaptive arithmetic coding algorithm considered in Witten, Neal, and Cleary [66]. For our experiment, additional compression was achieved by a simple segmentation of the image using a thresholding rule. This segmented the image into a rectangular portion containing the breast (the region of interest, or ROI) and a background portion containing the dark area and any alphanumeric data. The background/label portion of the image was coded using the same algorithm, but at only 0.07 bpp, resulting in higher distortion there. We report here SNRs and bit rates both for the full image and for the ROI. The image test set was compressed in this manner to three bit rates: 1.75, 0.4, and 0.15 bpp, where the bit rates refer to rates in the ROI. The average bit rates for the full image thus depended on the size of the ROI. An example of the Said–Pearlman algorithm with a 12-bpp original and 0.15-bpp reproduction is given in Fig. 4.
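The subband/wavelet decomposition that SPIHT operates on can be reproduced with an off-the-shelf library. The sketch below uses PyWavelets' 'bior4.4' filters (the 9–7 biorthogonal pair) and keeps only the largest coefficients; this crude thresholding merely stands in for SPIHT's zerotree-based, embedded bit allocation and is not the coder used in the study.

```python
import numpy as np
import pywt

def wavelet_approximation(image, keep_fraction=0.05, levels=5, wavelet='bior4.4'):
    """Octave-band wavelet decomposition, crude coefficient selection, reconstruction.

    keep_fraction: fraction of the largest-magnitude coefficients retained;
                   all others are zeroed (a stand-in for real bit allocation).
    """
    coeffs = pywt.wavedec2(image.astype(float), wavelet, level=levels)
    arr, slices = pywt.coeffs_to_array(coeffs)            # flatten the pyramid
    threshold = np.quantile(np.abs(arr), 1.0 - keep_fraction)
    arr[np.abs(arr) < threshold] = 0.0                    # discard small coefficients
    kept = pywt.array_to_coeffs(arr, slices, output_format='wavedec2')
    return pywt.waverec2(kept, wavelet)
```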
FIGURE 3 (a) Original 9.0 bpp MR chest scan; (b) MR chest scan compressed to 1.14 bpp; and (c) MR chest scan compressed to 0.36 bpp.
4 Average Distortion and SNR

By far the most common computable objective measures of image quality are mean squared error (MSE) and signal-to-noise ratio (SNR). Suppose that one has a system in which an input pixel block or vector X = (X_0, X_1, \ldots, X_{k-1}) is reproduced as \hat{X} = (\hat{X}_0, \hat{X}_1, \ldots, \hat{X}_{k-1}), and that one has a measure d(X, \hat{X}) of the distortion or cost resulting when X is reproduced as \hat{X}. A natural measure of the quality or fidelity (actually the lack of quality or fidelity) of the system is the average of the distortions for all the vectors input to that system, denoted by D = E[d(X, \hat{X})]. The average might be with respect to a probability model for the images or, more commonly, a sample or time-average distortion. It is common to normalize the distortion in some fashion to produce a dimensionless quantity D/D_0, to form the inverse D_0/D as a measure of quality rather than distortion, and to describe the result in decibels. A common normalization is the minimum average distortion achievable if no bits are sent, D_0 = \min_y E[d(X, y)]. When the ubiquitous squared-error distortion given by d(X, Y) = \|X - Y\|^2 = \sum_{i=0}^{k-1} (X_i - Y_i)^2 is used, then D_0 is simply the variance of the process, D_0 = E\|X - E(X)\|^2 = \sigma_X^2.
FIGURE 4 Original image and compressed image at 0.15 bpp in the ROI.
Using this as a normalization factor produces the signal-to-noise ratio

\mathrm{SNR} = 10 \log_{10} \frac{D_0}{D} = 10 \log_{10} \frac{\sigma_X^2}{E\|X - \hat{X}\|^2}.    (1)
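In practice the expectations in Eq. (1) are replaced by sample averages over the image. A minimal sketch of the computation (our own; the PSNR variant introduced in the next paragraph simply swaps the variance for the peak symbol energy):

```python
import numpy as np

def mse(original, reproduction):
    """Sample mean squared error between an image and its reproduction."""
    diff = original.astype(float) - reproduction.astype(float)
    return float(np.mean(diff ** 2))

def snr_db(original, reproduction):
    """Eq. (1): 10 log10 (variance of the original / MSE), in decibels."""
    return 10.0 * np.log10(np.var(original.astype(float)) / mse(original, reproduction))

def psnr_db(original, reproduction, bits=12):
    """Peak SNR: normalize by the maximum symbol energy (2^r - 1)^2."""
    return 10.0 * np.log10((2 ** bits - 1) ** 2 / mse(original, reproduction))
```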
A common alternative normalization when the input is itself an r-bit discrete variable is to replace the variance or energy by the maximum input symbol energy (2^r - 1)^2, yielding the so-called peak signal-to-noise ratio (PSNR). A key attribute of useful distortion measures is ease of computation, but other properties are also important. Ideally a distortion measure should reflect perceptual quality or usefulness in a particular application. No easily computable distortion measure such as squared error is generally agreed to have this property. Common faults of squared error are that a slight spatial shift of an image causes a large numerical distortion but no visual distortion and, conversely, a small average distortion can result in a damaging visual artifact if all the error is concentrated in a small important region. It is because of such shortcomings that many other quality measures have been studied. The pioneering work of Budrikus [10], Stockham [60], and Mannos and Sakrison [36] was aimed at developing computable measures of distortion that emphasize perceptually important attributes of an image by incorporating knowledge of human vision. Theirs and subsequent work has provided a bewildering variety of candidate measures of image quality or distortion [3–5, 7, 16, 17, 19, 20, 25, 27, 29, 32–34, 37, 40–44, 53, 55, 58, 63, 67]. Similar studies have been carried out for speech compression and other digital speech processing [49]. Examples are general l_p norms such as the absolute error (l_1),
the cube root of the sum of the cubed errors (l3 ), and maximum error (l? ), as well as variations on such error measures that incorporate linear weighting. A popular form is weighted quadratic distortion that attempts to incorporate properties of the human visual system such as sensitivity to edges, insensitivity to textures, and other masking effects. The image and the original can be transformed prior to computing distortion, providing a wide family of spectral distortions, which can also incorporate weighting in the transform domain to re¯ect perceptual importance. Alternatively, one can capture the perceptual aspects by linearly ®ltering the original and reproduction images prior to forming a distortion, which is equivalent to weighting the distortion in the transform domain. A simple variation of SNR that has proved popular in the speech and audio ®eld is the segmental SNR, which is an average of local SNRs in a log scale [28, 49], effectively replacing the arithmetic average of distortion by a geometric average. In addition to easing computation and re¯ecting perceptual quality, a third desirable property of a distortion measure is tractability in analysis. The popularity of squared error is partly owed to the wealth of theory and numerical methods available for the analysis and synthesis of systems that are optimal in the sense of minimizing mean squared error. One might design a system to minimize mean squared error because it is a straightforward optimization, but then use a different, more complicated measure to evaluate quality because it does better at predicting subjective quality. Ideally, one would like to have a subjectively meaningful distortion measure that could be incorporated into the system design. There are techniques for incorporating subjective criteria into compression system design, but these tend to be somewhat indirect. For example,
"#
one can transform the image and assign bits to transform coefficients according to their perceptual importance or use postfiltering to emphasize important subbands before compression [51, 52, 60].

The traditional manner for comparing the performance of different lossy compression systems is to plot distortion rate or SNR vs bit rate curves. Figure 5a shows a scatter plot of the rate–SNR pairs for 24 images in the lung CT study. Only the compressed images can be shown on this plot, as the original images have by definition no noise and therefore infinite SNR. The plot includes a quadratic spline fit with a single knot at 1.5 bpp. Regression splines [48] are simple and flexible models for tracking data that can be fit by least squares. The fitting tends to be ``local'' in that the fitted average value at a particular bit rate is influenced primarily by observed data at nearby bit rates. The curve has four unknown parameters and can be expressed as

$$y = a_0 + a_1 x + a_2 x^2 + b_2 \left(\max(0,\, x - 1.5)\right)^2. \qquad (2)$$
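A minimal sketch of fitting the knotted quadratic spline of Eq. (2) by least squares is given below. The bit rates and SNR values are invented for illustration, and the helper names are ours; the design-matrix columns simply mirror the four unknown parameters.

```python
import numpy as np

def spline_design_matrix(bit_rates, knot=1.5):
    """Design matrix for y = a0 + a1*x + a2*x**2 + b2*max(0, x - knot)**2."""
    x = np.asarray(bit_rates, dtype=np.float64)
    return np.column_stack([
        np.ones_like(x),                  # multiplies a0
        x,                                # multiplies a1
        x ** 2,                           # multiplies a2
        np.maximum(0.0, x - knot) ** 2,   # multiplies b2
    ])

def fit_knotted_quadratic_spline(bit_rates, y, knot=1.5):
    """Least-squares estimates (a0, a1, a2, b2) of the spline coefficients."""
    D = spline_design_matrix(bit_rates, knot)
    coef, *_ = np.linalg.lstsq(D, np.asarray(y, dtype=np.float64), rcond=None)
    return coef

# Made-up (bit rate, SNR) pairs, not the study's measurements.
rates = np.array([0.15, 0.4, 0.7, 1.0, 1.75, 2.2])
snrs = np.array([35.0, 40.0, 43.0, 45.0, 49.0, 51.0])
print(fit_knotted_quadratic_spline(rates, snrs))
```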
It is quadratic ``by region'' and is continuous with a continuous first derivative across the knot, where the functional form of the quadratic changes. Quadratic spline fits provide good indications of the overall distortion-rate performance of the code family on the test data. In this case, the location of the knot was chosen arbitrarily to be near the center of the data set. It would have been possible to allow the data themselves to guide the choice of knot location. The SNR results for the CT mediastinal images were very similar to those for the lung task. For the MR study, Fig. 5b shows SNR versus bit rate for the 30 test images compressed to the five bit rates. The knot is at 1.0 bpp. For the mammography study, the SNRs are summarized in Tables 2 and 3. The overall averages are reported as well as the averages broken out by image type or view (left and right breast, CC and MLO view). This demonstrates the variability among various image types as well as the overall performance. Two sets of SNRs and bit rates are reported: ROI only and full image. For the ROI SNR the rates are identical and correspond to the nominal rate of the code used in the ROI. For the full images the rates vary since the ROI code is used in one portion of the image and a much lower rate code is used in the remaining background and the average depends on the size of the ROI, which varies among the images. A scatter plot of the ROI SNRs is presented in Fig. 6. It should be emphasized that this is the SNR comparing the digital original with the lossy compressed versions.

5 Subjective Ratings
Subjective quality of a reconstructed image can be judged in many ways. A suitably randomized set of images can be presented to experts or typical users who rate them, often on a scale of 1 to 5. Subsequent statistical analysis can then highlight averages, variability, and other trends in the data. Such formalized subjective testing is common in speech and audio compression systems as in the Mean Opinion Score (MOS) and the descriptive rating called the diagnostic acceptability measure (DAM) [1, 49, 62]. There has been no standardization for rating still images. A useful attribute of an objective quality measure such as SNR would be the ability to predict subjective quality. For medical images, it may be more important that a computable objective measure be able to predict diagnostic accuracy rather than subjective quality. A potential pitfall in relating objective
FIGURE 5 SNR as a function of bit rate for (a) CT lung images and (b) MR images. The x's indicate data points for all images, judges and compression levels.
TABLE 2 Average SNR: ROI, wavelet coding

View                        0.15 bpp ROI    0.4 bpp ROI    1.75 bpp ROI
Left CC                     45.93 dB        47.55 dB       55.30 dB
Right CC                    45.93 dB        47.47 dB       55.40 dB
Left MLO                    46.65 dB        48.49 dB       56.53 dB
Right MLO                   46.61 dB        48.35 dB       56.46 dB
Left side (MLO and CC)      46.29 dB        48.02 dB       55.92 dB
Right side (MLO and CC)     46.27 dB        47.91 dB       55.93 dB
Overall                     46.28 dB        47.97 dB       55.92 dB

Reprinted with permission from S.M. Perlmutter, P.C. Cosman, R.M. Gray, R.A. Olshen, D. Ikeda, C.N. Adams, B.J. Betts, M. Williams, K.O. Perlmutter, J. Li, A. Aiyer, L. Fajardo, R. Birdwell, and B.L. Daniel, Image Quality in Lossy Compressed Digital Mammograms, Signal Processing, 59:189–210, 1997. © Elsevier.
TABLE 3 Average SNR: Full image, wavelet coding (each entry gives SNR, average full-image bit rate)

View                        0.15 bpp ROI           0.4 bpp ROI            1.75 bpp ROI
Left CC                     44.30 dB, 0.11 bpp     45.03 dB, 0.24 bpp     46.44 dB, 0.91 bpp
Right CC                    44.53 dB, 0.11 bpp     45.21 dB, 0.22 bpp     46.88 dB, 0.85 bpp
Left MLO                    44.91 dB, 0.11 bpp     45.73 dB, 0.25 bpp     47.28 dB, 1.00 bpp
Right MLO                   45.22 dB, 0.11 bpp     46.06 dB, 0.25 bpp     47.96 dB, 0.96 bpp
Left side (MLO and CC)      44.60 dB, 0.11 bpp     45.38 dB, 0.24 bpp     46.89 dB, 0.96 bpp
Right side (MLO and CC)     44.88 dB, 0.11 bpp     45.63 dB, 0.24 bpp     47.41 dB, 0.92 bpp
Overall                     44.74 dB, 0.11 bpp     45.51 dB, 0.24 bpp     47.14 dB, 0.93 bpp

Reprinted with permission from S.M. Perlmutter, P.C. Cosman, R.M. Gray, R.A. Olshen, D. Ikeda, C.N. Adams, B.J. Betts, M. Williams, K.O. Perlmutter, J. Li, A. Aiyer, L. Fajardo, R. Birdwell, and B.L. Daniel, Image Quality in Lossy Compressed Digital Mammograms, Signal Processing, 59:189–210, 1997. © Elsevier.
FIGURE 6 Scatter plot of ROI SNR: Wavelet coding. Reprinted with permission from S.M. Perlmutter, P.C. Cosman, R.M. Gray, R.A. Olshen, D. Ikeda, C.N. Adams, B.J. Betts, M. Williams, K.O. Perlmutter, J. Li, A. Aiyer, L. Fajardo, R. Birdwell, and B.L. Daniel, Image Quality in Lossy Compressed Digital Mammograms, Signal Processing, 59:189–210, 1997. © Elsevier.
distortion measures to subjective quality is the choice of image distortions used in the tests. Some of the literature on the subject has considered signal-independent distortions such as additive noise and blurring, yet it has been implied that the results were relevant for strongly signal dependent distortions such as quantization error. Experiments should imitate closely the actual distortions to be encountered. The assessment of subjective quality attempted to relate subjective image quality to diagnostic utility. For the MR study, each radiologist was asked at the time of measuring the vessels to ``assign a score of 1 (least) to 5 (most) to each image based on its usefulness for the measurement task.'' The term ``usefulness'' was de®ned as ``your opinion of whether the edges used for measurements were blurry or distorted, and your con®dence concerning the measurement you took.'' The question was phrased in this way because our concern is whether measurement accuracy is in fact maintained even when the radiologist perceives the image quality as degraded and may have lost some con®dence in the utility of the image for the task at hand. It is not clear to us whether radiologists are inculcated during their training to assess quality visually based on the entire image, or whether they rapidly focus on the medically relevant areas of the image. Indeed, one might
reasonably expect that radiologists would differ on this point, and a question that addressed overall subjective quality would therefore produce a variety of interpretations from the judges. By focusing the question on the speci®c measurement and the radiologists' con®dence in it regardless of what portion of the image contributed to that con®dence level, and then by examining the relationship between actual measurement error and these subjective opinions, we hoped to obtain data relevant to the question of whether radiologists can be asked to trust their diagnoses made on processed images in which they may lack full con®dence. No attempt was made to link the ®ve possible scores to speci®c descriptive phrases, as is done with the mean opinion score rating system for speech. However, the radiologists were asked to try to use the whole scale. The CT subjective assessment was performed separately from the diagnostic task by three different radiologists. The phrasing of the question was very similar. Images compressed to lower bit rates received worse quality scores as was expected. Figure 7 shows subjective score vs bit rate for the CT mediastinum study. The data are ®t with a quadratic spline with a single knot. Figure 8 shows the general trend of mean subjective score vs mean bit rate for the MR study. A spline-like function that is quadratic from 0 to 2.0 bpp and linear from 2.0 to 9.0 bpp was ®t to the data. The splines have knots at 0.6, 1.2, and 2.0 bpp. Figure 9 shows a spline ®t of subjective score plotted against actual bit rate for the compressed levels only for the MR study. The general conclusion from the plots is that the subjective scores at the higher levels were quite close to the subjective scores on the originals, but at lower levels there was a steep drop-off of scores with decreasing bit rate. These scores can also be analyzed by the Wilcoxon signed rank test. The paired t-test may be slightly less applicable since the subjective scores, which are integers over a very limited
FIGURE 7 Subjective ratings vs bit rate for the CT mediastinum study.
FIGURE 8 Mean subjective score vs mean bit rate for the MR study. The dotted, dashed, and dash–dot curves are splines fit to the data points for judges 1, 2, and 3, respectively. The solid curve is a spline fit to the data points for all judges pooled. Reprinted with permission from Proceedings First International Conference on Image Processing, ICIP '94, 2:861–865, Austin, Texas, Nov. 1994.
range, clearly fail to fit a Gaussian model. We note that scores are assigned to the entire image rather than to any subsection of an image, such as each of the blood vessels in that image. More detailed information would be needed for a more thorough analysis since subjective score in the MR experiment is meant to reflect the quality of the image for vessel measurement, and this may differ for the different blood vessels. The Wilcoxon signed rank test showed that the subjective scores for the MR study at all of the six compression levels differ significantly from the subjective scores of the
FIGURE 9 Subjective score vs bit rate for the MR study. The x's indicate data points for all images, pooled across judges and compression levels.
originals at p < 0.05 for a two-tailed test. The subjective scores at all the compression levels also differ significantly from each other. As discussed later, it appears that a radiologist's subjective perception of quality changes more rapidly and drastically with decreasing bit rate than does the actual measurement error.
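For readers who wish to reproduce this style of paired comparison, a minimal sketch using SciPy's Wilcoxon signed rank test is given below. The paired scores are fabricated examples, not data from the MR study, and SciPy is assumed to be available.

```python
# Minimal sketch: Wilcoxon signed rank test on paired subjective scores.
# The scores below are made-up illustrations, not the study's data.
from scipy.stats import wilcoxon

scores_original = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4]    # per-image scores on originals
scores_compressed = [4, 4, 3, 3, 4, 4, 3, 4, 3, 4]  # same images at one bit rate

stat, p_value = wilcoxon(scores_original, scores_compressed)
print(stat, p_value)  # two-sided test by default; p < 0.05 suggests a difference
```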
5.1 Mammography Subjective Ratings

For the mammography study, Table 4 provides the means and standard deviations for the subjective scores for each radiologist separately and for the radiologists pooled. The distribution of these subjective scores is displayed in Figs. 10–12. Level 1 refers to the original analog images, level 2 to the uncompressed digital, level 3 to those images where the breast section was compressed to 0.15 bpp and the label to 0.07 bpp, level 4 to those images where the breast section was compressed to 0.4 bpp and the label to 0.07 bpp, and level 5 to those images where the breast section was compressed to 1.75 bpp and the label to 0.07 bpp. Figure 10 displays the frequency for each of the subjective scores obtained with the analog gold standard. Figure 11 displays the frequency for each of the subjective scores obtained
FIGURE 10 Subjective scores: Analog gold standard. Reprinted with permission from S.M. Perlmutter, P.C. Cosman, R.M. Gray, R.A. Olshen, D. Ikeda, C.N. Adams, B.J. Betts, M. Williams, K.O. Perlmutter, J. Li, A. Aiyer, L. Fajardo, R. Birdwell, and B.L. Daniel, Image Quality in Lossy Compressed Digital Mammograms, Signal Processing, 59:189–210, 1997. © Elsevier.
with the uncompressed digital images (judges pooled), and Fig. 12 displays the frequency for each of the subjective scores obtained with the digital images at Level 3.
TABLE 4 Subjective scores for the mammography study: means and standard deviations by level and judge

Level   Judge           Mean     St. dev.
1       Gold standard   3.6441   0.5539
1       A               3.90     0.97
1       B               4.52     0.75
1       C               4.59     0.79
2       A               3.91     0.41
2       B               3.85     0.53
2       C               3.67     0.65
3       A               3.82     0.39
3       B               4.27     0.93
3       C               3.49     0.64
4       A               3.91     0.39
4       B               3.93     0.55
4       C               3.82     0.50
5       A               3.92     0.42
5       B               3.66     0.57
5       C               3.82     0.55
1       Pooled          4.33     0.89
2       Pooled          3.81     0.55
3       Pooled          3.86     0.76
4       Pooled          3.88     0.49
5       Pooled          3.80     0.57
Reprinted with permission from S.M. Perlmutter, P.C. Cosman, R.M. Gray, R.A. Olshen, D. Ikeda, C.N. Adams, B.J. Betts, M. Williams, K.O. Perlmutter, J. Li, A. Aiyer, L. Fajardo, R. Birdwell, and B.L. Daniel, Image Quality in Lossy Compressed Digital Mammograms, Signal Processing, 59:189–210, 1997. © Elsevier.
FIGURE 11 Subjective scores: Original digital. Reprinted with permission from S.M. Perlmutter, P.C. Cosman, R.M. Gray, R.A. Olshen, D. Ikeda, C.N. Adams, B.J. Betts, M. Williams, K.O. Perlmutter, J. Li, A. Aiyer, L. Fajardo, R. Birdwell, and B.L. Daniel, Image Quality in Lossy Compressed Digital Mammograms, Signal Processing, 59:189–210, 1997. © Elsevier.
FIGURE 12 Subjective scores: Lossy compressed digital at 0.15 bpp. Reprinted with permission from S.M. Perlmutter, P.C. Cosman, R.M. Gray, R.A. Olshen, D. Ikeda, C.N. Adams, B.J. Betts, M. Williams, K.O. Perlmutter, J. Li, A. Aiyer, L. Fajardo, R. Birdwell, and B.L. Daniel, Image Quality in Lossy Compressed Digital Mammograms, Signal Processing, 59:189–210, 1997. © Elsevier.

Using the Wilcoxon signed rank test, the results were as follows. Judge A: All levels were significantly different from each other except the digital to 0.4 bpp, digital to 1.75 bpp, and 0.4 to 1.75 bpp. Judge B: The only differences that were significant were 0.15 bpp to 0.4 bpp and 0.15 bpp to digital. Judge C: All differences were significant. Judges pooled: All differences were significant except digital to 0.15 bpp, digital to 1.75 bpp, 0.15 to 0.4 bpp, and 0.15 to 1.75 bpp. Comparing differences from the independent gold standard, for judge A all were significant except digital uncompressed; for judge B all were significant; and for judge C all were significant except 1.75 bpp. When the judges were pooled, all differences were significant. There were many statistically significant differences in subjective ratings between the analog and the various digital modalities, but some of these may have been a result of the different printing processes used to create the original analog films and the films printed from digital files. The films were clearly different in size and in background intensity. The judges in particular expressed dissatisfaction with the fact that the background in the digitally produced films was not as dark as that of the photographic films, even though this ideally had nothing to do with their diagnostic and management decisions.

6 Diagnostic Accuracy and ROC Methodology
Diagnostic ``accuracy'' is often used to mean the fraction of cases on which a physician is ``correct,'' where correctness is determined by comparing the diagnostic decision to some definition of ``truth.'' There are many different ways that ``truth'' can be determined, and this issue is discussed in Section 7. Apart from this issue, this simple definition of accuracy is flawed in two ways. First, it is strongly affected by disease prevalence. For a disease that appears in less than 1% of the population, a screening test could trivially be more than 99% accurate simply by ignoring all evidence and declaring the disease to be absent. Second, the notion of ``correctness'' does not distinguish between the two major types of errors, calling positive a case that is actually negative, and calling negative a case that is actually positive. The relative costs of these two types of errors are generally not equal. These can be differentiated by measuring diagnostic performance using a pair of statistics reflecting the relative frequencies of the two error types. Toward this end suppose for the moment that there exists a ``gold standard'' defining the ``truth'' of existence and locations of all lesions in a set of images. With each lesion identified in the gold standard, a radiologist either gets it correct (true positive or TP) or misses it (false negative or FN). For each lesion identified by the radiologist, either it agrees with the gold standard (TP as above) or it does not (false positive or FP). The sensitivity or true positive rate (or true positive fraction (TPF)) is the probability $p_{TP}$ that a lesion is said to be there given that it is there. This can be estimated by relative frequency

$$\mathrm{Sensitivity} = \frac{\#TP}{\#TP + \#FN}. \qquad (3)$$
The complement of sensitivity is the false negative rate (or fraction) $p_{FN} = 1 - p_{TP}$, the probability that a lesion is said to not be there given that it is there. In an apparently similar vein, the false positive rate $p_{FP}$ (or false positive fraction (FPF)) is the probability that a lesion is said to be there given that it is not there, and the true negative rate $p_{TN}$ or specificity is its complement. Here, however, it is not possible to define a meaningful relative frequency estimate of these probabilities except when the detection problem is binary, that is, each image can have only one lesion of a single type or no lesions at all. In this case, exactly one lesion is not there if and only if 0 lesions are present, and one can define a true negative TN as an image that is not a true positive. Hence if there are N images, the relative frequency becomes

$$\mathrm{Specificity} = \frac{\#TN}{N - \#TP}. \qquad (4)$$
As discussed later, in the nonbinary case, however, specificity
cannot be defined in a meaningful fashion on an image-by-image basis. In the binary case, specificity shares importance with sensitivity because perfect sensitivity alone does not preclude numerous false alarms, while specificity near 1 ensures that missing no tumors does not come at the expense of calling false ones. An alternative statistic that is well defined in the nonbinary case and also penalizes false alarms is the predictive value positive (PVP), also known as positive predicted value (PPV) [64]. This is the probability that a lesion is there given that it is said to be there:

$$\mathrm{PVP} = \frac{\text{\# of abnormalities correctly marked}}{\text{Total \# of abnormalities marked}}. \qquad (5)$$

PVP is easily estimated by relative frequencies as

$$\mathrm{PVP} = \frac{\#TP}{\#TP + \#FP}. \qquad (6)$$
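The estimates in Eqs. (3), (4), and (6) are simple relative frequencies, as the following sketch shows; the counts are made up for illustration only.

```python
def sensitivity(tp, fn):
    """Eq. (3): fraction of gold-standard lesions that were found."""
    return tp / (tp + fn)

def specificity(tn, n_images, tp):
    """Eq. (4), binary detection task only: TN images over images with no lesion."""
    return tn / (n_images - tp)

def pvp(tp, fp):
    """Eq. (6): fraction of marked abnormalities that were correct."""
    return tp / (tp + fp)

# Made-up counts for illustration, not study data.
print(sensitivity(tp=27, fn=3), specificity(tn=10, n_images=40, tp=27), pvp(tp=27, fp=5))
```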
Sensitivity, PVP, and, when it makes sense, specificity can be estimated from clinical trial data and provide indication of quality of detection. The next issues are these: (1) How does one design and conduct clinical experiments to estimate these statistics? (2) How are these statistics used in order to make judgments about diagnostic accuracy? Together, the responses to these questions form a protocol for evaluating diagnostic accuracy and drawing conclusions on the relative merits of competing image processing techniques. Before describing the dominant methodology used, it is useful to formulate several attributes that a protocol might reasonably be expected to have:

* The protocol should simulate ordinary clinical practice as closely as possible. Participating radiologists should perform in a manner that mimics their ordinary practice. The trials should require little or no special training of their clinical participants.
* The clinical trials should include examples of images containing the full range of possible anomalies, all but extremely rare conditions.
* The findings should be reportable using the American College of Radiology (ACR) Standardized Lexicon.
* Statistical analyses of the trial outcomes should be based on assumptions as to the outcomes and sources of error that are faithful to the clinical scenario and tasks.
* The number of patients should be sufficient to ensure satisfactory size and power for the principal statistical tests of interest.
* ``Gold standards'' for evaluation of equivalence or superiority of algorithms must be clearly defined and consistent with experimental hypotheses.
* Careful experimental design should eliminate or minimize any sources of bias in the data that are due to differences between the experimental situation and ordinary clinical practice, e.g., learning effects that might accrue if a similar image is seen using separate imaging modalities.
Receiver operating characteristic (ROC) analysis is the dominant technique for evaluating the suitability of radiologic techniques for real applications [26, 38, 39, 61]. ROC analysis has its origins in signal detection theory. A ®ltered version of a signal plus Gaussian noise is sampled and compared to a threshold. If the sample is greater than the threshold, the signal is declared to be there; otherwise, it is declared absent. As the threshold is varied in one direction, the probability of erroneously declaring a signal absent when it is there (a false dismissal) goes down, but the probability of erroneously declaring a signal there when it is not (a false alarm) goes up. Suppose one has a large database of waveforms, some of which actually contain a signal, and some of which do not. Suppose further that for each waveform, the ``truth'' is known of whether a signal is present or not. One can set a value of the threshold and examine whether the test declares a signal present or not for each waveform. Each value of the threshold will give rise to a pair (TPF, FPF), and these points can be plotted for many different values of the threshold. The ROC curve is a smooth curve ®tted through these points. The ROC curve always passes through the point (1,1) because if the threshold is taken to be lower than the lowest value of any waveform, then all samples will be above the threshold, and the signal will be declared present for all waveforms. In that case, the true positive fraction is 1. The false positive fraction is also equal to 1, since there are no true negative decisions. Similar reasoning shows that the ROC curve must also always pass through the point (0, 0), because the threshold can be set very large, and all cases will be declared negative. A variety of summary statistics such as the area under the ROC curve can be computed and interpreted to compare the quality of different detection techniques. In general, larger area under the ROC curve is better. ROC analysis has a natural application to some problems in medical diagnosis. For example, in a blood serum assay of carbohydrate antigens (e.g., CA 125 or CA 19-9) to detect the presence of certain types of cancer, a single number results from the diagnostic test. The distributions of result values in actually positive and actually negative patients overlap. So no single threshold or decision criterion can be found that separates the populations cleanly. If the distributions did not overlap, then such a threshold would exist, and the test would be perfect. In the usual case of overlapping distributions, a threshold must be chosen, and each possible choice of threshold will yield different frequencies of the two types of errors. By varying the threshold and calculating the false alarm
rate and false dismissal rate for each value of the threshold, an ROC curve is obtained. Transferring this type of analysis to radiological applications requires the creation of some form of threshold whose variation allows a similar trade-off. For studies of the diagnostic accuracy of processed images, this is accomplished by asking radiologists to provide a subjective con®dence rating of their diagnoses (typically on a scale of 1±5) [39, 61]. An example of such ratings is shown in Table 5. First, only those responses in the category of highest certainty of a positive case are considered positive. This yields a pair (TPF, FPF) that can be plotted in ROC space and corresponds to a stringent threshold for detection. Next, those cases in either of the highest two categories of certainty of a positive decision are counted positive. Another (TPF, FPF) point is obtained, and so forth. The last nontrivial point is obtained by scoring any case as positive if it corresponds to any of the highest four categories of certainty for being positive. This corresponds to a very lax threshold for detection of disease. There are also two trivial (TPF, FPF) points that can be obtained, as discussed previously: All cases can be declared negative (TPF 0, FPF 0) or all cases can be declared positive (TPF 1, FPF 1). This type of analysis has been used extensively to examine the effects of computer processing on the diagnostic utility of medical images. Types of processing that have been evaluated include compression [6, 9, 14, 21, 22, 30, 35, 56, 65], and enhancement (unsharp masking, histogram equalization, and noise reduction). Although by far the dominant technique for quantifying diagnostic accuracy in radiology, ROC analysis possesses several shortcomings for this application. In particular, it violates several of the stated goals for a clinical protocol. By and large, the necessity for the radiologists to choose 1 of 5 speci®c values to indicate con®dence departs from ordinary clinical practice. Although radiologists are generally cognizant of differing levels of con®dence in their ®ndings, this uncertainty is often represented in a variety of qualitative ways, rather than with a numerical ranking. Further, as image data are nonGaussian, methods that rely on Gaussian assumptions are suspect. Modern computer-intensive statistical sample reuse techniques can help get around the failures of Gaussian assumptions. Classical ROC analysis is not location speci®c. The case in which an observer misses the lesion that is present in an image but mistakenly identi®es some noise feature as a
TABLE 5 Subjective confidence ratings used in ROC analysis

1   Definitely or almost definitely negative
2   Probably negative
3   Possibly negative
4   Probably positive
5   Definitely or almost definitely positive
lesion in that image would be scored as a true-positive event. Most importantly, many clinical detection tasks are nonbinary, in which case sensitivity can be suitably redefined, but specificity cannot. That is, sensitivity as defined in Eq. (3) yields a fractional number for the whole data set. But for any one image, sensitivity takes on only the values 0 and 1. The sensitivity for the whole data set is then the average value of these binary-valued sensitivities defined for individual images. When the detection task for each image becomes nonbinary, it is possible to redefine sensitivity for an individual image:

$$\mathrm{Sensitivity} = \frac{\text{\# of true positive decisions within 1 image}}{\text{\# of actually positive items in that 1 image}}. \qquad (7)$$

Or, changing the language slightly,

$$\mathrm{Sensitivity} = \frac{\text{\# of abnormalities correctly found}}{\text{\# of abnormalities actually there}}. \qquad (8)$$
In this case, the sensitivity for each individual image becomes a fractional number between 0 and 1, and the sensitivity for the entire data set is still the average of these sensitivities defined for individual images. A similar attempt to redefine the specificity leads to

$$\mathrm{Specificity} = \frac{\text{\# of abnormalities correctly said not to be there}}{\text{\# of abnormalities actually not there}}. \qquad (9)$$
This does not make sense because it has no natural or sensible denominator, as it is not possible to say how many abnormalities are absent. This de®nition is ®ne for a truly binary diagnostic task such as detection of a pneumothorax, for if the image is normal, then exactly one abnormality is absent. Early studies were able to use ROC analysis by focusing on detection tasks that were either truly binary or that could be rendered binary. For example, a nonbinary detection task such as ``locating any and all abnormalities that are present'' can be rendered binary simply by rephrasing the task as one of ``declaring whether or not disease is present.'' Otherwise, such a nonbinary task is not amenable to traditional ROC analysis techniques. Extensions to ROC to permit consideration of multiple abnormalities have been developed [11±13, 18, 59]. For example, the free-response receiver operating characteristic (FROC) observer performance experiment allows an arbitrary number of abnormalities per image, and the observer indicates their perceived locations and a con®dence rating for each one. While FROC resolves the binary task limitations and location insensitivity of traditional ROC, FROC does retain the constrained 5-point integer rating system for observer con®dence, and makes certain normality assumptions about the resultant data. Finally, ROC analysis has no natural extension to the evaluation of measurement accuracy in compressed medical images. By means of speci®c examples we describe an approach that closely simulates ordinary clinical practice,
applies to nonbinary and non-Gaussian data, and extends naturally to measurement data. The recent Stanford Ph.D. thesis by Bradley J. Betts, mentioned earlier, includes new technologies for analyses of ROC curves. His focus is on regions of interest of the curve, that is, on the intersection of the area under the curve with rectangles determined by explicit lower bounds on sensitivity and specificity. He has developed sample reuse techniques for making inferences concerning the areas enclosed and also for constructing rectangular confidence regions for points on the curve.
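Before turning to gold standards, the following sketch illustrates the ROC construction described earlier in this section: a threshold is swept over 5-point confidence ratings to obtain (FPF, TPF) operating points, and the area under the piecewise-linear curve is computed by the trapezoidal rule. The ratings and truth labels are fabricated, and the helper names are ours, not part of any standard library.

```python
import numpy as np

def roc_points(ratings, truth):
    """(FPF, TPF) operating points from 5-point confidence ratings.

    ratings: per-case scores, 1 (definitely negative) .. 5 (definitely positive).
    truth:   1 if the case is actually positive, 0 otherwise.
    """
    ratings = np.asarray(ratings)
    truth = np.asarray(truth)
    n_pos = truth.sum()
    n_neg = len(truth) - n_pos
    points = [(0.0, 0.0)]  # threshold above the top category: everything called negative
    for threshold in (5, 4, 3, 2, 1):  # progressively laxer criteria for "positive"
        called_positive = ratings >= threshold
        tpf = np.logical_and(called_positive, truth == 1).sum() / n_pos
        fpf = np.logical_and(called_positive, truth == 0).sum() / n_neg
        points.append((float(fpf), float(tpf)))
    return points  # ends at (1, 1): everything called positive

def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Made-up ratings for 10 cases, 5 of which are actually positive.
ratings = [5, 4, 4, 3, 2, 3, 2, 1, 1, 2]
truth   = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(auc(roc_points(ratings, truth)))
```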
7 Determination of a Gold Standard

The typical scenario for evaluating diagnostic accuracy of computer processed medical images involves taking some database of original unprocessed images, applying some processing to them, and having the entire set of images judged in some specified fashion by radiologists. Whether the subsequent analyses will be done by ROC or some other means, it is necessary to determine a ``gold standard'' that can represent the diagnostic truth of each original image and can serve as a basis of comparison for the diagnoses on all the processed versions of that image. There are many possible choices for the gold standard:

* A consensus gold standard is determined by the consensus of the judging radiologists on the original.
* A personal gold standard uses each judge's readings on an original (uncompressed) image as the gold standard for the readings of that same judge on the compressed versions of that same image.
* An independent gold standard is formed by the agreement of the members of an independent panel of particularly expert radiologists.
* A separate gold standard is produced by the results of autopsy, surgical biopsy, reading of images from a different imaging modality, or subsequent clinical or imaging studies.
The consensus method has the advantage of simplicity, but the judging radiologists may not agree on the exact diagnosis, even on the original image. Of course, this may happen among the members of the independent panel as well, but in that case an image can be removed from the trial or additional experts called upon to assist. Either case may entail the introduction of concerns as to generalizability of subsequent results. In the CT study, in an effort to achieve consensus for those cases where the initial CT readings disagreed in the number or location of abnormalities, the judges were asked separately to review their readings of that original. If this did not produce agreement, the judges discussed the image together. Six images
in each CT study could not be assigned a consensus gold standard because of irreconcilable disagreement. This was a fundamental drawback of the consensus gold standard, and our subsequent studies did not use this method. Although those images eliminated were clearly more controversial and dif®cult to diagnose than the others, it cannot be said whether the removal of diagnostically controversial images from the study biases the results in favor of compression or against it. Their failure to have a consensus gold standard de®ned was based only on the uncompressed versions, and it cannot be said a priori that the compression algorithm would have a harder time compressing such images. The consensus, when achieved, could be attained either by initial concordance among the readings of the three radiologists, or by subsequent discussion of the readings, during which one or more judges might change their decisions. The consensus was clearly more likely to be attained for those original images where the judges were in perfect agreement initially and thus where the original images would have perfect diagnostic accuracy relative to that gold standard. Therefore, this gold standard has a slight bias favoring the originals, which is thought to help make the study safely conservative, and not unduly promotional of speci®c compression techniques. The personal gold standard is even more strongly biased against compression. It de®nes a judge's reading on an original image to be perfect, and uses that reading as the basis of comparison for the compressed versions of that image. If there is any component of random error in the measurement process, since the personal gold standard de®nes the diagnoses on the originals to be correct (for that image and that judge), the compressed images cannot possibly perform as well as the originals according to this standard. That there is a substantial component of random error in these studies is suggested by the fact that there were several images during our CT tests on which judges changed their diagnoses back and forth, marking, for example, one lesion on the original image as well as on compressed levels E and B, and marking two lesions on the inbetween compressed levels F, D, and A. With a consensus gold standard, such changes tend to balance out. With a personal gold standard, the original is always right, and the changes count against compression. Because the compressed levels have this severe disadvantage, the personal gold standard is useful primarily for comparing the compressed levels among themselves. Comparisons of the original images with the compressed ones are conservative. The personal gold standard has, however, the advantage that all images can be used in the study. We no longer have to be concerned with the possible bias from the images eliminated due to failure to achieve consensus. One argument for the personal standard is that in some clinical settings a fundamental question is how the reports of a radiologist whose information is gathered from compressed images compare to what they would have been on the originals, the assumption being that systematic biases of a radiologist are well recognized and corrected for by the referring physicians
who regularly send cases to that radiologist. The personal gold standard thus concentrates on consistency of individual judges. The independent gold standard is what many studies use, and would seem to be a good choice. However, it is not without drawbacks. First of all, there is the danger of a systematic bias appearing in the diagnoses of a judge in comparison to the gold standard. For example, a judge who consistently chooses to diagnose tiny equivocal dots as abnormalities when the members of the independent panel choose to ignore such occurrences would have a high false positive rate relative to that independent gold standard. The computer processing may have some actual effect on this judge's diagnoses, but this effect might be swamped in comparison to this baseline high false positive rate. This is an argument for using the personal gold standard as well as the independent gold standard. The other drawback of an independent gold standard is somewhat more subtle and is discussed later. In the MR study, the independent panel was composed of two senior radiologists who ®rst measured the blood vessels separately and then discussed and remeasured in those cases where there was initial disagreement. A separate standard would seem to be the best choice, but it is generally not available. With phantom studies, there is of course a ``diagnostic truth'' that is established and known entirely separately from the diagnostic process. But with actual clinical images, there is often no autopsy or biopsy, as the patient may be alive and not operated upon. There are unlikely to be any images from other imaging modalities that can add to the information available from the modality under test, since there is typically one best way for imaging a given pathology in a given part of the body. And the image data set for the clinical study may be very dif®cult to gather if one wishes to restrict the image set to those patients for whom there are follow-up procedures or imaging studies that can be used to establish a gold standard. In any case, limiting the images to those patients who have subsequent studies done would introduce obvious bias into the study. In summary, the consensus method of achieving a gold standard has a major drawback together with the minor advantage of ease of availability. The other three methods for achieving a gold standard all have both signi®cant advantages and disadvantages, and perhaps the best solution is to analyze the data against more than one de®nition of diagnostic truth.
8 Concluding Remarks

We have surveyed several key components required for evaluating the quality of compressed images: the compression itself; three data sets to be considered in depth in subsequent chapters; quantitative measures of quality involving measures of pixel intensity distortion, observer-judged subjective distortion, and diagnostic accuracy; and, lastly, several notions of ``gold standard'' with respect to which quality can be compared. In the next chapter these ideas provide a context and a collection of tools for a detailed analysis of three specific medical image modalities and tasks.
Acknowledgments

The authors gratefully acknowledge the essential assistance of many colleagues who participated in and contributed to both the performance of the research described here and the writing of the papers and reports on which these chapters are based. In particular we acknowledge and thank C. N. Adams, A. Aiyer, C. Bergin, B. J. Betts, R. Birdwell, B. L. Daniel, H. C. Davidson, L. Fajardo, D. Ikeda, J. Li, K. C. P. Li, L. Moses, K. O. Perlmutter, S. M. Perlmutter, C. Tseng, and M. B. Williams.
References 1. IEEE recommended practice for speech quality measurements. IEEE Trans. Audio and Electroacoustics, pages 227± 246, Sep. 1969. 2. C. N. Adams, A. Aiyer, B. J. Betts, J. Li, P. C. Cosman, S. M. Perlmutter, M. Williams, K. O. Perlmutter, D. Ikeda, L. Fajardo, R. Birdwell, B. L. Daniel, S. Rossiter, R. A. Olshen, and R. M. Gray. Evaluating quality and utility of digital mammograms and lossy compressed digital mammograms. In Proceedings 3rd Intl. Workshop on Digital Mammography, Chicago, IL, June 1996. 3. A. J. Ahumada, Jr. Computational image-quality metrics: a review. In SID '93 Digest of Technical Papers, pages 305± 308, Seattle, Wa, May 1993. Society for Information Display. 4. V. R. Algazi, Y. Kato, M. Miyahara, and K. Kotani. Comparison of image coding techniques with a picture quality scale. In Proc. SPIE Applications of Digital Image Processing XV, volume 1771, pages 396±405, San Diego, CA, July 1992. 5. H. Barrett. Evaluation of image quality through linear discriminant models. In SID '92 Digest of Technical Papers, volume 23, pages 871±873. Society for Information Display, 1992. 6. H. H. Barrett, T. Gooley, K. Girodias, J. Rolland, T. White, and J. Yao. Linear discriminants and image quality. In Proceedings of the 1991 International Conference on Information Processing in Medical Imaging (IPMI '91), pages 458±473, Wye, United Kingdom, July 1991. Springer-Verlag. 7. P. Barten. Evaluation of subjective image quality with the square foot integral method. JOSA A, 7:2024±2031, 1990.
8. B. J. Betts. 0 Statistical Analysis of Digital Mammography. Ph.D. thesis, Stanford University, Department of Electrical Engineering, 1999. 9. J. M. Bramble, L. T. Cook, M. D. Murphey, N. L. Martin, W. H. Anderson, and K. S. Hensley. Image data compression in magni®cation hand radiographs. Radiology, 170:133±136, 1989. 10. Z. L. Budrikus. Visual ®delity criteria and modeling. Proc. IEEE, 60:771±779, July 1972. 11. P. C. Bunch, J. F. Hamilton, G. K. Sanderson, and A. H. Simmons. A free-response approach to the measurement and characterization of radiographic observer performance. J. Appl. Photogr. Engr., 4:166±171, 1978. 12. D. P. Chakraborty. Maximum likelihood analysis of freeresponse receiver operating characteristic (FROC) data. Med. Phys., 16:561±568, 1989. 13. D. P. Chakraborty and L. H. L. Winter. Free-response methodology: alternate analysis and a new observerperformance experiment. Radiology, 174(3):873±881, 1990. 14. J. Chen, M. J. Flynn, B. Gross, and D. Spizarny. Observer detection of image degradation caused by irreversible data compression processes. In Proceedings of Medical Imaging V: Image Capture, Formatting, and Display, volume 1444, pages 256±264. SPIE, 1991. 15. P. C. Cosman, C. Tseng, R. M. Gray, R. A. Olshen, L. E. Moses, H. C. Davidson, C. J. Bergin, and E. A. Riskin. Treestructured vector quantization of CT chest scans: image quality and diagnostic accuracy. IEEE Trans. Medical Imaging, 12(4):727±739. Dec. 1993. 16. S. Daly. Visible differences predictor: an algorithm for the assessment of image ®delity. In SPIE Proceedings, volume 1666, pages 2±14, 1992. 17. M. Duval-Destin. A spatio-temporal complete description of contrast. In SID '91 Digest of Technical Papers, volume 22, pages 615±618. Society for Information Display, 1991. 18. J. P. Egan, G. Z. Greenberg, and A. I. Schulman. Operating characteristics, signal detectability, and the method of free response. J. Acoust. Soc. Am., 33:993±1007, 1961. 19. A. M. Eskioglu and P. S. Fisher. A survey of quality measures for gray scale image compression. In Computing in Aerospace 9, pages 304±313, San Diego, CA, Oct. 1993. AIAA. 20. J. Farrell, H. Trontelj, C. Rosenberg, and J. Wiseman. Perceptual metrics for monochrome image compression. In SID '91 Digest of Technical Papers, volume 22, pages 631±634. Society for Information Display, 1991. 21. R. D. Fiete, H. H. Barrett, W. E. Smith, and K. J. Meyers. The Hotelling trace criterion and its correlation with human observer performance. J. Optical Soc. Amer. A, 4:945±953, 1987. 22. R. D. Fiete, H. H. Barrett, E. B. Cargill, K. J. Myers, and W. E. Smith. Psychophysical validation of the Hotelling trace criterion as a metric for system performance. In Proceedings SPIE Medical Imaging, volume 767, pages 298±305, 1987.
23. A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. Kluwer Academic Publishers, Boston, 1992. 24. R. M. Gray, R. A. Olshen, D. Ikeda, P. C. Cosman, S. Perlmutter, C. Nash, and K. Perlmutter. Evaluating quality and utility in digital mammography. In Proceedings ICIP95, volume II, pages 5±8, Washington, D.C., October 1995. IEEE, IEEE Computer Society Press. 25. T. Grogan and D. Keene. Image quality evaluation with a contour-based perceptual model. In SPIE Proceedings, volume 1666, pages 188±197, 1992. 26. J. A. Hanley. Receiver operating characteristic (ROC) methodology: The state of the art. Critical Reviews in Diagnostic Imaging, 29:307±335, 1989. 27. N. Jayant, J. Johnston, and R. Safranek. Signal compression based on models of human perception. Proc. IEEE, 81:1385±1422, Oct. 1993. 28. N. S. Jayant and P. Noll. Digital Coding of Waveforms. Prentice-Hall, Englewood Cliffs, NJ, 1984. 29. S. Klein, A. Silverstein, and T. Carney. Relevance of human vision to JPEG±DCT compression. In SPIE Proceedings, volume 1666, pages 200±215, 1992. 30. H. Lee, A. H. Rowberg, M. S. Frank, H. S. Choi, and Y. Kim. Subjective evaluation of compressed image quality. In Proceedings of Medical Imaging VI: Image Capture, Formatting, and Display, volume 1653, pages 241±251. SPIE, Feb. 1992. 31. A. S. Lewis and G. Knowles. Image compression using the 2-D wavelet transform. IEEE Trans. Image Processing, 1(2):244±250, April 1992. 32. J. Limb. Distortion criteria of the human viewer. IEEE Trans. on Systems, Man, and Cybernetics, SMC-9:778±793, 1979. 33. J. Lubin. The use of psychovisual data and models in the analysis of display system performance. In A. Watson, editor, Visual Factors in Electronic Image Communications. MIT Press, Cambridge, MA, 1993. 34. F. X. J. Lukas and Z. L. Budrikis. Picture quality prediction based on a visual model. IEEE Trans. Comm., COM30(7):1679±1692, July 1982. 35. H. MacMahon, K. Doi, S. Sanada, S. M. Montner, M. L. Giger, C. E. Metz, N. Nakamori, F. Yin, X. Xu, H. Yonekawa, and H. Takeuchi. Data compression: effect on diagnostic accuracy in digital chest radiographs. Radiology, 178:175±179, 1991. 36. J. L. Mannos and D. J. Sakrison. The effects of a visual ®delity criterion of the encoding of images. IEEE Trans. Inform. Theory, 20:525±536, July 1974. 37. H. Marmolin. Subjective mse measures. IEEE Trans. on Systems, Man, and Cybernetics, SMC-16(3):486±489, May/ June 1986. 38. B. J. McNeil and J. A. Hanley. Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Medical Decision Making, 4:137±150, 1984.
"'
39. C. E. Metz. Basic principles of ROC analysis. Seminars in Nuclear Medicine, VIII(4):282–298, Oct. 1978. 40. A. Netravali and B. Prasada. Adaptive quantization of picture signals using spatial masking. Proc. IEEE, 65:536–548, 1977. 41. K. Ngan, K. Leong, and H. Singh. Cosine transform coding incorporating human visual system model. In SPIE Proceedings, volume 707, pages 165–171, 1986. 42. N. Nill. A visual model weighted cosine transform for image compression and quality assessment. IEEE Trans. Comm., COM-33:551–557, 1985. 43. N. B. Nill and B. H. Bouzas. Objective image quality measure derived from digital image power spectra. Optical Engineering, 31(4):813–825, April 1992. 44. T. Pappas. Perceptual coding and printing of gray-scale and color images. In SID Digest, volume 23, pages 689–692, 1992. 45. S. M. Perlmutter, P. C. Cosman, R. M. Gray, R. A. Olshen, D. Ikeda, C. N. Adams, B. J. Betts, M. Williams, K. O. Perlmutter, J. Li, A. Aiyer, L. Fajardo, R. Birdwell, and B. L. Daniel. Image quality in lossy compressed digital mammograms. Signal Processing, 59:189–210, June 1997. 46. S. M. Perlmutter, P. C. Cosman, C. Tseng, R. A. Olshen, R. M. Gray, K. C. P. Li, and C. J. Bergin. Medical image compression and vector quantization. Statistical Science, 13(1):30–53, Jan. 1998. 47. S. M. Perlmutter, C. Tseng, P. C. Cosman, K. C. P. Li, R. A. Olshen, and R. M. Gray. Measurement accuracy as a measure of image quality in compressed MR chest scans. In Proceedings ICIP-94, volume 1, pages 861–865, Austin, TX, Nov. 1994. IEEE Computer Society Press. 48. M. J. D. Powell. Approximation Theory and Methods. Cambridge University Press, Cambridge, U.K., 1981. 49. S. R. Quackenbush, T. P. Barnwell III, and M. A. Clements. Objective Measures of Speech Quality. Prentice Hall Signal Processing Series. Prentice-Hall, Englewood Cliffs, NJ, 1988. 50. M. Rabbani and P. W. Jones. Digital Image Compression Techniques, volume TT7 of Tutorial Texts in Optical Engineering. SPIE Optical Engineering Press, Bellingham, WA, 1991. 51. R. J. Safranek and J. D. Johnston. A perceptually tuned sub-band image coder with image dependent quantization and post-quantization data compression. In Proceedings ICASSP, pages 1945–1948, Glasgow, U.K., 1989. 52. R. J. Safranek, J. D. Johnston, and R. E. Rosenholtz. A perceptually tuned sub-band image coder. In Proceedings of the SPIE – The International Society for Optical Engineering, pages 284–293, Santa Clara, Feb. 1990. IEEE. 53. J. A. Saghri, P. S. Cheatham, and A. Habibi. Image quality measure based on a human visual system model. Optical Engineering, 28(7):813–818, July 1989. 54. A. Said and W. A. Pearlman. A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans. on Circuits and Systems for Video Technology, 6(3):243–250, June 1996. 55. D. J. Sakrison. On the role of the observer and a distortion measure in image transmission. IEEE Trans. Comm., 25:1251–1267, 1977. 56. J. Sayre, D. R. Aberle, M. I. Boechat, T. R. Hall, H. K. Huang, B. K. Ho, P. Kashfian, and G. Rahbar. Effect of data compression on diagnostic accuracy in digital hand and chest radiography. In Proceedings of Medical Imaging VI: Image Capture, Formatting, and Display, volume 1653, pages 232–240. SPIE, Feb. 1992. 57. J. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. IEEE Transactions on Signal Processing, 41(12):3445–3462, December 1993. 58. E. Shlomot, Y. Zeevi, and W. Pearlman. The importance of spatial frequency and orientation in image decomposition and coding. In SPIE Proceedings, volume 845, pages 152–158, 1987. 59. W. R. Steinbach and K. Richter. Multiple classification and receiver operating characteristic (ROC) analysis. Medical Decision Making, 7:234–237, 1995. 60. T. G. Stockham, Jr. Image processing in the context of a visual model. Proc. IEEE, 60:828–842, July 1972. 61. J. A. Swets. ROC analysis applied to the evaluation of medical imaging techniques. Investigative Radiology, 14:109–121, March–April 1979. 62. W. D. Voiers. Diagnostic acceptability measure for speech communication systems. In Proceedings ICASSP, pages 204–207, 1977. 63. A. Watson. Efficiency of an image code based on human vision. JOSA A, 4:2401–2417, 1987. 64. M. C. Weinstein and H. V. Fineberg. Clinical Decision Analysis. W. B. Saunders Company, Philadelphia, 1980. 65. P. Wilhelm, D. R. Haynor, Y. Kim, and E. A. Riskin. Lossy image compression for digital medical imaging system. Optical Engineering, 30:1479–1485, Oct. 1991. 66. I. H. Witten, R. M. Neal, and J. G. Cleary. Arithmetic coding for data compression. Communications of the ACM, 30:520–540, 1987. 67. C. Zetzsche and G. Hauske. Multiple channel model for the prediction of subjective image quality. In SPIE Proceedings, volume 1077, pages 209–216, 1989.
50 Quality Evaluation for Compressed Medical Images: Diagnostic Accuracy

Pamela Cosman, University of California, San Diego
Robert Gray, Stanford University
Richard Olshen, Stanford University

1 Introduction 821
2 CT Study: Example of Detection Accuracy 821
  2.1 Behrens–Fisher–Welch t-statistic
3 MR Study: Example of Measurement Accuracy 826
  3.1 Study Design and Statistical Analysis
  3.2 Discussion
4 Mammography Study: Example of Management Accuracy 832
  4.1 Statistical Analysis
  4.2 Results and Discussion
5 Concluding Remarks 838
References 838
1 Introduction

We examined in the previous chapter several common computable measures of image quality, as well as subjective quality ratings. Although these quality measures are useful in many ways, for medical images one wishes a quality measure to take proper account of the diagnostic purpose of the image. The ROC methodology discussed in the previous chapter is one approach to this. In this chapter, we present several studies that attempt to evaluate diagnostic utility of the images more directly. The radiologists were not specially trained or calibrated in any way for these judging tasks, as the goal of these studies was specifically to evaluate compression performance in the context of radiologists carrying out tasks that resembled their everyday work. No constraints were placed on the viewing time, the viewing distance, or the lighting conditions. The judges were encouraged to simulate the conditions they would use in everyday work. The tasks were detection of lung nodules and mediastinal adenopathy in CT images, measurement of blood vessels in MR chest scans, and detection and management tasks in mammography. As we shall see, the results indicate that when these images are used in situations resembling everyday work, substantial compression can be applied without affecting the interpretation of the radiologist. Portions reprinted, with permission, from IEEE Trans. Medical Imaging, 12(4):727–739, Dec. 1993 and Proceedings IEEE, 82(6):919–932, June 1994.
2 CT Study: Example of Detection Accuracy

The detection of abnormal lymphoid tissue is an important aspect of chest imaging. This is true especially for the mediastinum, the central portion of the chest that contains the heart, major blood vessels, and other structures. Abnormally enlarged lymph nodes, or lymphadenopathy, in the mediastinum can be caused by primary malignancy such as lymphoma, metastatic disease that results from the spread of breast or lung cancer through the lymphatics, tuberculosis, or non-infectious inflammatory diseases such as sarcoidosis. Typically radiologists can easily locate lymph nodes in a chest scan. The detection task is therefore to determine which of the located lymph nodes are enlarged. The detection of lung nodules is another major objective in diagnostic imaging of the chest. A common cause of these nodules is malignancy, primary or metastatic. The latter, which spreads through the blood stream from a primary cancer almost anywhere in the body, can cause multiple nodules in one or both lungs. Other causes include fungal and bacterial infections, and noninfectious inflammatory conditions. Nodules range in size from undetectably small to large enough to fill an entire segment of the lung. The compressed and original images were viewed by three radiologists. For each of the 30 images in a study, each radiologist viewed the original and 5 of the 6 compressed levels,
and thus 360 images were seen by each judge. The judges were blinded in that no information concerning patient study or compression level was indicated on the film. Images were viewed on hardcopy film on a lightbox, the usual way in which radiologists view images. The ``windows and levels'' adjustment to the dynamic range of the image was applied to each image before filming. This simple contrast adjustment technique sets maximum and minimum intensities for the image. All intensities above the maximum are thresholded to equal that maximum value. All intensities below the minimum are thresholded to equal that minimum value. This minimum value will be displayed on the output device as black, and the maximum value will be displayed as white. All intensity values lying in between the minimum and maximum are linearly rescaled to lie between black and white. This process allows for more of the dynamic range of the display device (in this case, the film) to be used for the features of interest. A radiologist who was not involved in the judging applied standard settings for windows and levels for the mediastinal images, and different standard settings for the lung nodule images. The compressed and original images were filmed in standard 12-on-1 format on 14 x 17 in. film using the scanner that produced the original images. The viewings were divided into 3 sessions during which the judges independently viewed 10 pages, each with 6 lung nodule images and 6 mediastinal images. The judges marked abnormalities directly on the films with a grease pencil, although mediastinal lymph nodes were not marked unless their smallest cross-sectional diameter measured 10 mm or greater. All judges were provided with their own copy of the films for marking. No constraints were placed on the viewing time, the viewing distance, or the lighting conditions; the judges were encouraged to simulate the conditions they would use in everyday work. They were, however, constrained to view the 10 pages in the predetermined order, and could not go back to review earlier pages. At each session, each judge saw each image at 2 of the 7 levels of compression (7 levels includes the original). The two levels never appeared on the same film, and the ordering of the pages ensured that they never appeared with fewer than 3 pages separating them. This was intended to reduce learning effects. Learning effects will be discussed in the next chapter. A given image at a given level was never seen more than once by any one judge, and so intraobserver variability was not explicitly measured. Of the 6 images in one study on any one page, only one image was shown as the original, and exactly 5 of the 6 compressed levels were represented. The original versions of the images are denoted ``g.'' The compressed versions are ``a'' through ``f.'' The randomization follows what is known as a ``Latin square'' arrangement. The consensus gold standard for the lung determined that there were, respectively, 4 images with 0 nodules, 9 with 1, 4 with 2, 5 with 3, and 2 with 4 among those images retained. For the mediastinum, there were 3 images with 0 abnormal nodes, 17 with 1, 2 with 2, and 2 with 3.
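A minimal sketch of the ``windows and levels'' adjustment described above is given below. The window settings and the synthetic CT block are hypothetical stand-ins, not the standard settings applied in the study.

```python
import numpy as np

def window_level(image, minimum, maximum):
    """Simple contrast adjustment: clip to [minimum, maximum], then rescale linearly.

    Intensities below `minimum` map to black (0.0), intensities above `maximum`
    map to white (1.0), and values in between are rescaled linearly.
    """
    img = np.asarray(image, dtype=np.float64)
    clipped = np.clip(img, minimum, maximum)
    return (clipped - minimum) / (maximum - minimum)

# Made-up example settings for a small synthetic 12-bit image block.
ct = np.random.default_rng(1).integers(0, 4096, size=(8, 8))
display = window_level(ct, minimum=800, maximum=2400)
print(display.min(), display.max())
```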
Once a gold standard is established, a value can be assigned to the sensitivity and the predictive value positive (PVP). The sensitivity and PVP results are shown graphically using scatter plots, spline fits, and associated confidence regions. The spline fits are quadratic splines with a single knot at 1.5 bits per pixel (bpp), as given in the previous chapter. The underlying probability model that governs the 450 observed values of y (sensitivity or PVP) is taken to be as follows. The random vector of quadratic spline coefficients (a_0, a_1, a_2, b_2) has a single realization for each (judge, image) pair. What is observed as the bit rate varies is the value of y for the chosen five compression levels plus independent mean-zero noise. The expected value of y is

E(y) = E(a_0) + E(a_1)\,x + E(a_2)\,x^2 + E(b_2)\,[\max(0,\, x - 1.5)]^2,

where the expectation is with respect to the unconditional distribution of the random vector (a_0, a_1, a_2, b_2). Associated with each spline fit is the residual root mean square (RMS), an estimate of the standard deviation of the individual measurements from an analysis of variance of the spline fits. The standard method for computing simultaneous confidence regions for such curves is the "S" (or "Scheffé") method [20], which is valid under certain Gaussian assumptions that do not hold for our data. Therefore we use the statistical technique called "the bootstrap" [4, 10-12], specifically a variation of the "correlation model" [13] that is related to the bootstrap-based prediction regions of Olshen et al. [22]. We denote the estimate of PVP for the lung study at a bit rate bpp by Ê(y)(bpp).

(1) A quadratic spline equation can be written as

\hat{E}(y)(\mathrm{bpp}) = a_0 + a_1 x + a_2 x^2 + b_2\,[\max(0,\, x - x_0)]^2,

where x_0 is the "knot" (in this study, x = bit rate and x_0 = 1.5 bpp). This equation comes from the linear model Y = D\beta + \epsilon with one entry of Y (and corresponding row of D) per observation. D is the "design matrix" of size 450 × 4. It has four columns, the first having the multiple of a_0 (always 1), the second the multiple of a_1 (that is, the bit rate), and so on. We use Ê(a) to denote the four-dimensional vector of estimated least-squares coefficients:

\hat{E}(a) = (\hat{a}_0,\, \hat{a}_1,\, \hat{a}_2,\, \hat{b}_2)^t.

(2) For a given bit rate b, write the row vector dictated by the spline as d^t = d^t(b). Thus, Ê(y)(bpp) = d^t Ê(a).

(3) The confidence region will be of the form

d^t\hat{E}(a) - S\sqrt{F}\,\sqrt{d^t(D^tD)^{-1}d} \;\le\; y \;\le\; d^t\hat{E}(a) + S\sqrt{F}\,\sqrt{d^t(D^tD)^{-1}d},

where S is the square root of the residual mean square from an analysis of variance of the data. So, if Y is n × 1 and \beta is k × 1, then

S = \sqrt{\frac{1}{n-k}\,\lVert Y - D\hat{E}(a)\rVert^2}.

The region will be truncated, if necessary, so that always 0 ≤ y ≤ 1.

(4) The bootstrapping is conducted by first drawing a sample of size 3 with replacement from our group of three judges. This bootstrap sample may include one, two, or all three of the judges.

(5) For each chosen judge (including multiplicities, if any), we draw a sample of size 30 with replacement from the set of 30 original images. It can be shown that typically about 63% (that is, 100(1 - e^{-1})%) of the images will appear at least once in each bootstrap sample of images.

(6) For each chosen judge and original image, we include in the bootstrap sample all five of the observed values of y. The motivation for this bootstrap sampling is simple: the bootstrap sample bears the same relationship to the original sample that the original sample bears to "nature." We do not know the real relationship between the true data and nature; if we did, we would use it in judging coverage probabilities in steps 7 and 8. However, we do know the data themselves, and so we can imitate the relationship between nature and the data by examining the observed relationship between the data and a sample from them [10, 12].

(7) Each bootstrap sample Y* entails a bootstrap design matrix D*, as well as corresponding Ê*(a) and S*. This bootstrap process will be carried out n_b = 1000 times. For the jth bootstrap sample compute the four new bootstrap quantities (Y*, D*, Ê*(a), and S*) as above, and for each √F compute

\hat{G}_B(\sqrt{F}) = n_b^{-1}\,\#\{\,j : d^t\hat{E}(a) - S^{*}\sqrt{F}\,\sqrt{d^t(D^{*t}D^{*})^{-1}d} \le d^t\hat{E}^{*}(a) \le d^t\hat{E}(a) + S^{*}\sqrt{F}\,\sqrt{d^t(D^{*t}D^{*})^{-1}d}\ \ \forall\, d\,\}
= n_b^{-1}\,\#\{\,j : (\hat{E}^{*}(a) - \hat{E}(a))^t (D^{*t}D^{*}) (\hat{E}^{*}(a) - \hat{E}(a)) \le F\,(S^{*})^2\,\}.

Note that the latter expression is what is used in the computation. This is the standard Scheffé method, as described in [20].
(8) For a 100(1 - \alpha)% confidence region, compute

\sqrt{F}_{\alpha} = \min\{\sqrt{F} : \hat{G}_B(\sqrt{F}) \ge 1 - \alpha\}

and use that value in the confidence-region equation above. In our case, we are interested in obtaining a 95% confidence region, so √F is chosen so that for 95% of the bootstrap samples

(\hat{E}^{*}(a) - \hat{E}(a))^t (D^{*t}D^{*}) (\hat{E}^{*}(a) - \hat{E}(a)) \le F\,(S^{*})^2.
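To make the procedure above concrete, the following sketch builds the quadratic-spline design matrix with a knot at 1.5 bpp, obtains the least-squares coefficients and residual S, and calibrates the Scheffé-style constant F by bootstrap resampling. It is a simplified illustration: rows are resampled directly rather than resampling judges and then images, and all names are ours, not those of the study's software.

```python
import numpy as np

KNOT = 1.5  # bpp

def design_matrix(bpp):
    """Rows of the form [1, x, x^2, max(0, x - knot)^2] for the quadratic spline."""
    x = np.asarray(bpp, dtype=float)
    hinge = np.maximum(0.0, x - KNOT)
    return np.column_stack([np.ones_like(x), x, x**2, hinge**2])

def fit_spline(bpp, y):
    """Least-squares coefficients, residual root mean square S, and D^T D."""
    D = design_matrix(bpp)
    coef, _, _, _ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    n, k = D.shape
    S = np.sqrt(np.sum(resid**2) / (n - k))
    return coef, S, D.T @ D

def bootstrap_F(bpp, y, n_boot=1000, level=0.95, seed=0):
    """Choose F so that the quadratic-form criterion of step 7 holds for ~95% of refits."""
    rng = np.random.default_rng(seed)
    coef, _, _ = fit_spline(bpp, y)
    n = len(y)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)            # simplified bootstrap resampling
        c_b, S_b, DtD_b = fit_spline(bpp[idx], y[idx])
        diff = c_b - coef
        stats.append(diff @ DtD_b @ diff / S_b**2)  # (E*-E)^t (D*^t D*) (E*-E) / (S*)^2
    return np.quantile(stats, level)
```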
In this model, the bit rate is treated as a nonrandom predictor that we control, and the judges and images are "random effects" because our three judges and 30 images have been sampled from arbitrarily large numbers of possibilities. Figure 1 displays all data for lung sensitivity and lung PVP (calculated relative to the consensus gold standard) for all 24 images, judges, and compressed levels for which there was a consensus gold standard. There are 360 x's: 360 = 3 judges × 24 images × 5 compressed levels seen for each image. Figure 2 is the corresponding figure for the mediastinum relative to the personal gold standard. The o's mark the average of the x's for each bit rate. The values of the sensitivity and PVP are simple fractions such as 1/2 and 2/3 because there are at most a few abnormalities in each image. The curves are least squares quadratic spline fits to the data with a single knot at 1.5 bpp, together with the two-sided 95% confidence regions. Since the sensitivity and PVP cannot exceed 1, the upper confidence curve was thresholded at 1. The residual RMS is the square root of the residual mean square from an analysis of variance of the spline fits. Sensitivity for the lung seems to be nearly as good at low rates of compression as at high rates, but sensitivity for the mediastinum drops off at the lower bit rates, driven primarily by the results for one judge. PVP for the lung is roughly constant across the bit rates, and the same is true for the mediastinum.

Table 1 shows the numbers of original test images (out of 30 total) that contain the listed number of abnormalities for each disease type according to each judge. Also, the rows marked All show the number of original test images (out of 24 total) that contain the listed number of abnormalities according to the consensus gold standard.
TABLE 1 Number of test images that contain the listed number of abnormalities (Mdst = mediastinum)

                       Number of abnormalities
Type   Judge    0    1    2    3    4    5    6    7    8
Lung   1        3   11    7    6    2    -    1    -    -
Lung   2        4    9   10    4    2    -    1    -    -
Lung   3        3    8    8    5    2    2    1    -    1
Lung   All      4    9    4    5    2    -    -    -    -
Mdst   1        3   14    7    6    -    -    -    -    -
Mdst   2        2   22    2    4    -    -    -    -    -
Mdst   3        3   22    4    1    -    -    -    -    -
Mdst   All      3   17    2    2    -    -    -    -    -
FIGURE 1 Relative to the consensus gold standard: (a) lung sensitivity (RMS = 0.177), (b) lung PVP (RMS = 0.215).
We examine this table to determine whether or not it is valid to pool the sensitivity and PVP results across judges. Simple chi-square tests for homogeneity show that for both the lung and the mediastinum, judges do not differ beyond chance from equality in the numbers of abnormalities they found. In particular, if for the lung we categorize abnormalities found as 0, 1, 2, 3, or at least 4, then the chi-square statistic is 3.16 (on 8 degrees of freedom). Six cells have expectations below 5, a traditional concern, but an exact test would not reach a different conclusion. Similar comments apply to the mediastinum, where the chi-square value (on 6 degrees of freedom) is 8.83. However, Table 1 does not fully indicate the variability among the judges. For example, the table shows that each judge found six lung nodules in an original test image only once; however, it was not the same test image for all three judges.
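For illustration, the homogeneity check can be run directly on the lung rows of Table 1, grouping four or more abnormalities into one category; a quick sketch (with scipy assumed available) reproduces the chi-square value of 3.16 on 8 degrees of freedom quoted above.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Lung rows of Table 1, with the categories 0, 1, 2, 3, and >= 4 abnormalities.
counts = np.array([
    [3, 11,  7, 6, 3],   # judge 1
    [4,  9, 10, 4, 3],   # judge 2
    [3,  8,  8, 5, 6],   # judge 3
])
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi-square = {chi2:.2f} on {dof} degrees of freedom (p = {p:.2f})")
```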
2.1 Behrens–Fisher–Welch t-statistic

The comparison of sensitivity and PVP at different bit rates was carried out using a permutation distribution of a two-sample t-test that is sometimes called the Behrens–Fisher–Welch test [3, 18].
FIGURE 2 Relative to the personal gold standard: (a) mediastinum sensitivity (RMS = 0.243), (b) mediastinum PVP (RMS = 0.245).
The statistic takes account of the fact that the within-group variances are different. In the standard paired t-test, where we have n pairs of observations, let m_D denote the true, and unknown, average difference between the members of a pair. If we denote the sample mean difference between the members of the pairs by D̄ and the estimate of the standard deviation of these differences by s_D̄, then the quantity

t = \frac{\bar{D} - m_D}{s_{\bar{D}}}

follows (under certain normality assumptions) Student's t distribution with n - 1 degrees of freedom, and this may be used to test the null hypothesis that m_D = 0, that is, that there is no difference between the members of a pair [27]. Now, with our sensitivity and PVP data, there is no single estimate s_D̄ of the standard deviation that can be made. For an image I_1 that has only one abnormality according to the consensus gold standard, the judges can have sensitivity equal to either 0 or 1, but for an image I_2 with three abnormalities the sensitivity can equal 0, 0.33, 0.67, or 1. So, in comparing bit rates b_1 and b_2, when we form a pair out of image I_1 seen at bit rates b_1 and b_2, and we form another pair out of image I_2 seen at bit rates b_1 and b_2, we see that the variance associated with some pairs is larger than that associated with other pairs. The Behrens–Fisher–Welch test takes account of this inequality of variances. The test is exact and does not rely on Gaussian assumptions that would be patently false for this data set.

The use of this statistic is illustrated by the following example. Suppose Judge 1 has judged N lung images at both levels A and B. These images can be divided into 5 groups, according to whether the consensus gold standard for the image contained 0, 1, 2, 3, or 4 abnormalities. Let N_i be the number of images in the ith group. Let Δ_ij represent the difference in sensitivities (or PVP) for the jth image in the ith group seen at level A and at level B. Let Δ̄_i be the average difference:

\bar{\Delta}_i = \frac{1}{N_i}\sum_j \Delta_{ij}.

We define

S_i^2 = \frac{1}{N_i - 1}\sum_j (\Delta_{ij} - \bar{\Delta}_i)^2,

and then the Behrens–Fisher–Welch t statistic is given by

t_{BFW} = \frac{\sum_i \bar{\Delta}_i}{\sqrt{\sum_i S_i^2 / N_i}}.
In the consensus gold standard, there were never more than four abnormalities found, so the Δ_ij are fractions with denominators not more than 4 and are utterly non-Gaussian. (For the personal gold standard, the denominator could be as large as 8.) Therefore, computations of attained significance (p-values) are based on the restricted permutation distribution of t_BFW. For each of the N images, we can permute the results from the two levels (A, B) → (B, A) or not. There are 2^N points possible in the full permutation distribution, and we calculate t_BFW for each one. The motivation for the permutation distribution is that if there were no difference between the bit rates, then in computing the differences Δ_ij it should not matter whether we compute level A − level B or vice versa, and we would not expect the "real" t_BFW to be an extreme value among the 2^N values. If k is the number of permuted t_BFW values that exceed the "real" one, then (k + 1)/2^N is the attained one-sided significance level for the test of the null hypothesis that the lower bit rate performs at least as well as the higher one. As discussed later, the one-sided test of significance is chosen to be conservative and to argue most strongly against compression.
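A small sketch of the statistic and its permutation distribution follows; the difference values and grouping are synthetic placeholders, and for large N one would sample sign flips rather than enumerate all 2^N of them.

```python
import itertools
import numpy as np

def t_bfw(diffs, group_ids):
    """Behrens-Fisher-Welch statistic from per-image differences grouped by the
    number of abnormalities in the consensus gold standard."""
    d = np.asarray(diffs, dtype=float)
    ids = np.asarray(group_ids)
    num, denom = 0.0, 0.0
    for g in np.unique(ids):
        grp = d[ids == g]
        num += grp.mean()
        denom += grp.var(ddof=1) / len(grp)
    return num / np.sqrt(denom)

def one_sided_p(diffs, group_ids):
    """Attained one-sided significance (k + 1) / 2^N over all sign-flip permutations."""
    diffs = np.asarray(diffs, dtype=float)
    observed = t_bfw(diffs, group_ids)
    n = len(diffs)
    k = sum(1 for signs in itertools.product([1.0, -1.0], repeat=n)
            if t_bfw(diffs * np.array(signs), group_ids) > observed)
    return (k + 1) / 2**n

# Toy data: sensitivity differences (level A - level B) for eight images.
diffs = [0.0, 1.0, 0.0, -0.5, 0.5, 0.33, -0.33, 0.67]
groups = [1, 1, 1, 2, 2, 3, 3, 3]
print(one_sided_p(diffs, groups))
```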
k 1=2N is the attained one-sided signi®cance level for the test of the null hypothesis that the lower bit rate performs at least as well as the higher one. As discussed later, the one-sided test of signi®cance is chosen to be conservative and to argue most strongly against compression. When the judges were evaluated separately, level A (the lowest bit rate) was found to be signi®cantly different at the 5% level against most of the other levels for two of the judges, for both lung and mediastinum sensitivity. No differences were found among levels B through G. There were no signi®cant differences found between any pair of levels for PVP. When judges were pooled, more signi®cant differences were found. Level A was generally inferior to the other levels for both lung and mediastinal sensitivity. Also, levels B and C differed from level G for lung sensitivity (p 0:016 for both) and levels B and C differed from level G for mediastinal sensitivity (p 0:008 and 0.016, respectively). For PVP, no differences were found against level A with the exception of A vs E and F for the lungs (p 0:039 and 0.012, respectively), but B was somewhat different from C for the lungs
p 0:031, and C was different from E, F, and G for the mediastinum (p 0:016, 0.048, and 0.027, respectively). Using the consensus gold standard, the results indicate that level A (0.56 bpp) is unacceptable for diagnostic use. Since the blocking and prediction artifacts became quite noticeable at level A, the judges tended not to attempt to mark any abnormality unless they were quite sure it was there. This explains the initially surprising result that level A did well for PVP, but very poorly for sensitivity. Since no differences were found among levels D (1.8 bpp), E (2.2 bpp), F (2.64 bpp), and G (original images at 12 bpp), despite the biases against compression contained in our analysis methods, these three compressed levels are clearly acceptable for diagnostic use in our applications. The decision concerning levels B (1.18 bpp) and C (1.34 bpp) is less clear, and would require further tests involving a larger number of detection tasks, more judges, or use of a different gold standard that in principle could remove at least one of the biases against compression that are present in this study. Since the personal gold standard has the advantage of using all the images in the study, and the consensus gold standard has the advantage of having little bias between original and compressed images, we can capitalize on both sets of
advantages with a two-step comparison. Sensitivity and PVP values relative to the consensus gold standard show there to be no significant differences between the slightly compressed images (levels D, E, and F) and the originals. This is true for both disease categories, for judges evaluated separately and pooled, both when using the Behrens–Fisher–Welch test to examine the sensitivity and PVP separately and when using the McNemar test (discussed in the next chapter) to examine them in combination. With this assurance, the personal gold standard can then be used to look for differences between the more compressed levels (A, B, C) and the less compressed ones (D, E, F). The most compressed level A (0.56 bpp, 21:1 compression ratio) is unacceptable, as observations made on these images were significantly different from those on less compressed images for two judges. Level B (1.18 bpp) is also unacceptable, although barely so, because the only significant difference was between the sensitivities at levels B and F for a single disease category and a single judge. No differences were found between level C and the less compressed levels, nor were there any significant differences between levels D, E, and F.

In summary, using the consensus gold standard alone, the results indicate that levels D, E, and F are clearly acceptable for diagnostic use, level A is clearly unacceptable, and levels B and C are marginally unacceptable. Using the personal and consensus gold standard data jointly, the results indicate that levels C, D, E, and F are clearly acceptable for diagnostic use, level A is clearly unacceptable, and level B is marginally unacceptable.

We would like to conclude that there are some compression schemes whose implementation would not degrade clinical practice. To make this point, we must either use tests that are unbiased, or, acting as our own devil's advocates, use tests that are biased against compression. This criterion is met by the fact that the statistical approach described here contains four identifiable biases, none of which favors compression. The biases are as follows. (1) As discussed in the previous chapter, the gold standard confers an advantage upon the original images relative to the compressed levels. This bias is mild in the case of the consensus gold standard, but severe in the case of the personal gold standard. (2) There is a bias introduced by multiple comparisons [20]. Since (for each gold standard) we perform comparisons for all possible pairs out of the 7 levels, for both sensitivity and PVP, for both lung and mediastinal images, and both for 3 judges separately and for judges pooled, we are reporting on 21 × 2 × 2 × 4 = 336 tests for each gold standard. One would expect that, even if there were no effect of compression upon diagnosis, 5% of these comparisons would show significant differences at the 5% significance level.
(3) A third element that argues against compression is the use of a one-sided test instead of a two-sided test. In most contexts, for example when a new and an old treatment are being compared and subjects on the new treatment do better than those on the old, we do a two-sided test of significance. Such two-sided tests implicitly account for both possibilities: that new interventions may make for better or worse outcomes than standard ones. For us, a two-sided test would implicitly recognize the possibility that compression improves, not degrades, clinical practice. In fact, we believe this can happen, but to incorporate such beliefs in our formulation of a test would make us less our own devil's advocates than would our use of a one-sided test. Our task is to find when compression might be used with clinical impunity, not when it might enhance images. (4) The fourth bias stems from the fact that the summands in the numerator of t_BFW may well be positively correlated (in the statistical sense), though we have no way to estimate this positive dependence from our data. If we did, the denominator of t_BFW would typically be smaller, and such incorporation would make finding "significant" differences between compression levels more difficult. For all of these reasons, we believe that the stated conclusions are conservative.
3 MR Study: Example of Measurement Accuracy

Previous studies of the effects of lossy compression on diagnostic accuracy have focused on the detection of structures [5, 7, 8, 15, 19, 26]. However, measurement tasks also play a crucial role in diagnostic radiology. Measurements on structures such as blood vessels, other organs, and tumors take a central role in the identification of abnormalities and in following therapeutic response. Radiologists routinely measure almost everything they detect. For example, while diagnosing a fractured bone, they might measure the displacement between the two pieces of bone, or when reporting the presence of metastatic lesions in the liver, they might measure the size of the largest one. Often such measurements are not of great importance to clinical decision making, but in some instances they are of extreme significance. In vascular surgery, for example, precise measurements taken on angiograms of the distance from an area of stenosis to the nearest bifurcation in the vascular structure are needed to reduce surgical exposure. In the evaluation of aneurysms, size is an important prognostic feature in any presurgical assessment. Precise measurements on images are increasingly important in those areas where 3D
stereotactic localization can lead to less invasive surgical biopsy methods. For example, in mammography, fine-needle biopsy techniques require careful distance measurements in order to place the needle correctly. For our study of the effects of compression on measurement accuracy, we chose to look at measurements of aortic aneurysms, one of the most common areas where size measurements radically affect clinical decision making. Abdominal aortic aneurysms are usually evaluated with ultrasound, and thoracic aortic aneurysms are evaluated by CT or MRI. In the latter case, the aortic diameter is usually measured manually with calipers. If the aorta exceeds 4 cm in diameter, an aneurysm is diagnosed. A larger aneurysm carries a greater risk of rupture, with approximately 10% risk of rupture for aneurysms between 5 and 10 cm in diameter, and about 50% for aneurysms greater than 10 cm [17]. Rupture is invariably fatal, and so when the aorta measures more than about 5 or 6 cm in diameter, operative repair is usually recommended [6, 28]. The clinical decision depends not only on the size of the aneurysm but also on the clinical status of the patient (issues of pain and hemodynamic instability). Dilation of less than 5 cm in diameter may be followed conservatively by serial MR imaging studies at 6-month intervals. Observing an increase in the aortic diameter of 0.5 cm over the course of a 6-month interval would be an indication for surgical repair.

The study described here had as its goal to quantify the effects of lossy compression on measurement accuracy through experiments that follow closely the clinical tasks of radiologists evaluating aortic aneurysms [24, 25]. We wished to examine whether compression maintains the information required for accurate measurements, or whether it leads to inaccuracies by blurring edges or distorting structures. If compression at a certain bit rate caused a 0.5-cm error in the aortic measurement, that would have an impact on the clinical decision, and the compression would be unacceptable. Although we focused on the medical problem of thoracic aortic aneurysms as seen on MR scans, the methodology developed in this research is applicable to any medical task requiring the measurement of structures.

The task studied was the measurement of four primary blood vessels in the mediastinum: the ascending aorta, descending aorta, right pulmonary artery (RPA), and superior vena cava (SVC). A set of 9-bit original MR chest images containing aneurysms and normal vessels was compressed to five bit rates between 0.36 and 1.7 bpp. Radiologists measured the four vessels on each image. In our statistical analyses, we set two gold standards, "personal" and "independent." As discussed in the previous chapter, these represent two methods of establishing the correct size of each blood vessel, that is, the underlying diagnostic "truth" of each image. For each of these gold standards, we quantify the accuracy of the measurements at each compression level by taking the percent measurement error for each image, defined to be the difference between a
radiologist's measurement and the gold standard, scaled by the gold standard measurement. This error is examined as a function of bit rate by using the t-test and a nonparametric competitor, the Wilcoxon signed rank test.
3.1 Study Design and Statistical Analysis

To simulate normal clinical practice, test images were selected from 30 sequential thoracic MR examinations of diagnostic quality obtained after February 1, 1991. The patients studied included 16 females and 14 males, with ages ranging from 1 to 93 years and an average age of 48.0 ± 24.7 years (mean ± s.d.). Clinical diagnoses included aortic aneurysm (n = 11), thoracic tumors (n = 11), pre- or post-lung transplant (n = 5), constrictive pericarditis (n = 1), and subclavian artery rupture (n = 1). From each examination, one image that best demonstrated all four major vessels of interest was selected. The training images were selected similarly from different examinations. All analyses were based solely on measurements made on the test images.

The 30 test scans compressed to 5 bit rates plus the originals give rise to a total of 180 images. These images were arranged in a randomized sequence and presented on separate hardcopy films to three radiologists. The viewing protocol consisted of 3 sessions held at least 2 weeks apart. Each session included 10 films viewed in a predetermined order with 6 scans on each film. The 3 radiologists began viewing the films at different starting points in the randomized sequence. To minimize the probability of remembering measurements from past images, a radiologist saw only 2 of the 6 levels of each image in each session, with the second occurrence of each image spaced at least 4 films after the first occurrence of that image.

Following standard clinical methods for detecting aneurysms, the radiologists used calipers and a millimeter scale available on each image to measure the four blood vessels appearing on each scan. Although the use of digital calipers might have allowed more accurate measurements, this would have violated one of our principal goals, namely to follow as closely as possible actual clinical practice. It is the standard practice of almost all radiologists to measure with manual calipers, largely because they lack the equipment, or because they would prefer not to take the time to bring up the relevant image on the terminal and then perform the measurements with electronic calipers. We asked the radiologists to make all measurements between the outer walls of the vessels along the axis of maximum diameter; it is this maximum diameter measurement that is used to make clinical decisions. Both the measurements and axes were marked on the film with a grease pencil.

The independent gold standard was set by having two radiologists come to an agreement on vessel sizes on the original scans. They first independently measured the vessels on each scan and then remeasured only those vessels on which they initially differed, until an exact agreement on the
number of millimeters was reached. These two radiologists are different from the three radiologists whose judgments are used to determine diagnostic accuracy. A personal standard was also derived for each of the three judging radiologists by taking their own measurements on the original images.

Once the gold standard measurement for each vessel in each image was assigned, measurement error can be quantified in a variety of ways. If z is the radiologist's measurement and g represents the gold standard measurement, then some potential summary statistics are

z - g, \qquad \frac{z}{g}, \qquad \log\frac{z}{g}, \qquad \frac{z - g}{g}.

These statistics have invariance properties that bear upon understanding the data. For example, z - g is invariant to the same additive constant (that is, to a change in origin), log(z/g) is invariant to the same multiplicative constant (that is, to a change in scale), and (z - g)/g is invariant to the same multiplicative constant and to the same sign changes. For simplicity and appropriateness in the statistical tests carried out, the error parameters chosen for this study are percent measurement error (pme),

\mathrm{pme} = \frac{z - g}{g} \times 100\%,

and absolute percent measurement error (apme),

\mathrm{apme} = \frac{|z - g|}{g} \times 100\%,

both of which scale the error by the gold standard measurement to give a concept of error relative to the size of the vessel being measured.

The differences in error achieved at each bit rate can be quantified as statistically significant by many tests. Each should respect the pairing of the measurements being compared and the multiplicity of comparisons being made. In order to ensure that our conclusions are not governed by the test being used, we chose to use two of the most common, the t and Wilcoxon tests. We also employed statistical techniques that account for this multiplicity of tests. The measurements are considered paired in a comparison of two bit rates since the same vessel in the same image is measured by the same radiologist at both bit rates. For instance, let x_1 be the measurement of a vessel at bit rate 1, x_2 be its measurement at bit rate 2, and g be the vessel's gold standard measurement. Then the pme at bit rates 1 and 2 are

\mathrm{pme}_1 = \frac{x_1 - g}{g} \times 100\% \qquad \text{and} \qquad \mathrm{pme}_2 = \frac{x_2 - g}{g} \times 100\%,

and their difference is

\mathrm{pme}_D = \frac{x_1 - x_2}{g} \times 100\%.
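In code, the error summaries above are one-liners; the vessel measurements below are invented for illustration.

```python
import numpy as np

def pme(z, g):
    """Percent measurement error: signed error relative to the gold standard."""
    return (np.asarray(z, dtype=float) - g) / g * 100.0

def apme(z, g):
    """Absolute percent measurement error."""
    return np.abs(pme(z, g))

# Hypothetical vessel diameters (mm): gold standard and measurements at two bit rates.
g  = np.array([32.0, 41.0, 18.0, 24.0])
x1 = np.array([33.0, 40.0, 18.5, 25.0])   # bit rate 1
x2 = np.array([31.0, 42.0, 18.0, 23.5])   # bit rate 2
pme_diff = pme(x1, g) - pme(x2, g)        # equals (x1 - x2)/g * 100%, the paired difference
```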
In such a two-level comparison, pme more accurately preserves the difference between two errors than does apme. A vessel that is overmeasured by some percentage (positive error) on bit rate 1 and undermeasured by the same percentage (negative error) on bit rate 2 will have an error distance of twice that percentage if pme is used, but a distance of zero if apme is used. Therefore, both the t-test and the Wilcoxon signed rank test are carried out using only pme. Apme is used later to present a more accurate picture of error when we plot an average of apme across the 30 test images vs bit rate.

The t-statistic quantifies the statistical significance of the observed difference between two data sets in which the data can be paired. Unlike the CT study, in which the Behrens–Fisher–Welch t-test was used because of the obviously different variances present for different images, here the ordinary t-test was applicable. The difference in error for two bit rates is calculated for all the vessels measured at both bit rates. If the radiologists made greater errors at bit rate 1 than at bit rate 2, the average difference in error over all the data will be positive. If bit rate 1 is no more or less likely to cause error than bit rate 2, the average difference in error is zero. The t-test assumes that the sample average difference in error between two bit rates varies in a Gaussian manner about the real average difference [27]. If the data are Gaussian, which they clearly cannot exactly be in our application, the paired t-test is an exact test. Quantile-quantile plots of pme differences for comparing levels vary from linear to S-shaped; in general, the Q-Q plots indicate a moderate fit to the Gaussian model. The size of our data set (4 vessels × 30 images × 6 levels × 3 judges = 2160 data points) makes a formal test for normality nearly irrelevant. The large number of data points serves to guarantee failure of even fairly Gaussian data at conventional levels of significance. (That is, the generating distribution is likely not to be exactly Gaussian, and with enough data, even a tiny discrepancy from Gaussian will be apparent.) Even if the data are non-Gaussian, however, the central limit theorem renders the t-test approximately valid. With the Wilcoxon signed rank test [27], the significance of the difference between the bit rates is obtained by comparing a standardized value of the Wilcoxon statistic against the normal standard deviate at the 95% two-tail confidence level. The distribution of this standardized Wilcoxon is nearly exactly Gaussian if the null hypothesis is true for samples as small as 20.

Results Using the Independent Gold Standard

Plots of trends in measurement error as a function of bit rate are presented in Figs 3–6. In all cases, the general trend of the data is indicated by fitting the data points with a quadratic spline having one knot at 1.0 bpp. Figure 3 gives average pme against the mean bit rate for all radiologists pooled (i.e., the data for all radiologists, images, levels, and structures, with each radiologist's measurements compared to the independent gold standard) and for each of the three radiologists separately. In Fig. 4, the pme vs actual achieved bit rate is plotted for all data points.
FIGURE 3 Mean pme vs mean bit rate using the independent gold standard. The dotted, dashed, and dash-dot curves are quadratic splines fit to the data points for judges 1, 2, and 3, respectively. The solid curve is a quadratic spline fit to the data points for all judges pooled. The splines have a single knot at 1.0 bpp.
The relatively flat curve begins to increase slightly at the lowest bit rates, levels 1 and 2 (0.36, 0.55 bpp). It is apparent from an initial observation of these plots that, except for measurement at the lowest bit rates, accuracy does not vary greatly with lossy compression.
FIGURE 4 Percent measurement error vs actual bit rate using the independent gold standard. The x's indicate data points for all images, pooled across judges and compression levels. The solid curve is a quadratic spline fit to the data with a single knot at 1.0 bpp. Reprinted with permission from Proceedings, First International Conference on Image Processing, ICIP '94, I: 861–865, Austin, Texas, Nov. 1994.
FIGURE 5 Mean apme vs mean bit rate using the independent gold standard. The dotted, dashed, and dash-dot curves are quadratic splines fit to the data points for judges 1, 2, and 3, respectively. The solid curve is a quadratic spline fit to the data points for all judges pooled.
Possibly significant increases in error appear only at the lowest bit rates, whereas at the remaining bit rates measurement accuracy is similar to that obtained with the originals. The average performance on images compressed to level 5 (1.7 bpp) is actually better than performance on the originals. Although the trends in pme vs bit rate are useful, overmeasurement (positive error) can cancel undermeasurement (negative error) when these errors are being averaged or fitted with a spline.
FIGURE 6 Apme vs actual bit rate using the independent gold standard. The x's indicate data points for all images, pooled across judges and compression levels. The solid curve is a quadratic spline fit to the data.
For this reason, we turn to apme, which measures the error made by a radiologist regardless of whether it originated from overmeasurement or undermeasurement. Figure 5 plots average apme vs average bit rate for each radiologist and for all radiologists pooled. Figure 6 shows actual apme vs actual bit rate achieved. These plots show trends similar to those observed before. The original level contains more or less the same apme as compression levels 3, 4, and 5 (0.82, 1.14, 1.7 bpp). Levels 1 and 2 (0.36, 0.55 bpp) show slightly higher error.

These plots provide only approximate visual trends in the data. The t-test was used to test the null hypothesis that the "true" pme between two bit rates is zero. The standardized average difference is compared with the "null" value of zero by comparing with standard normal tables. None of the compressed images down to the lowest bit rate of 0.36 bpp was found to have a significantly higher pme when compared to the error made on the originals. Among the compressed levels, however, level 1 (0.36 bpp) was found to be significantly different from level 5 (1.7 bpp). As was mentioned, the performance on level 5 was better than that on all other levels, including the uncompressed level. When using the Wilcoxon signed rank test to compare compressed images against the originals, only level 1 (0.36 bpp) differed significantly in the distribution of pme. Within the levels representing the compressed images, levels 1, 3, and 4 (0.36, 0.82, 1.14 bpp) had significantly different pme than those at level 5 (1.7 bpp). Since measurement accuracy is determined from the differences with respect to the originals only, a conservative view of the results of the analyses using the independent gold standard is that accuracy is retained down to 0.55 bpp (level 2).
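The two paired comparisons can be sketched with scipy's standard implementations; the arrays below are placeholder pme values for the same vessels at a compressed level and at the original.

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

# Placeholder pme values for the same vessels/images/judges at two levels.
pme_compressed = np.array([4.2, -1.0, 3.1, 0.2, 5.5, -2.3, 1.8, 2.9])
pme_original   = np.array([1.1,  0.5, 2.0, -0.4, 1.9, -1.0, 0.7, 1.5])

t_stat, t_p = ttest_rel(pme_compressed, pme_original)       # ordinary paired t-test
w_stat, w_p = wilcoxon(pme_compressed - pme_original)       # Wilcoxon signed rank test
print(f"t = {t_stat:.2f} (p = {t_p:.3f}); W = {w_stat:.1f} (p = {w_p:.3f})")
```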
FIGURE 7 Mean pme vs mean bit rate using the personal gold standard. The dotted, dashed, and dash-dot curves are quadratic splines fit to the data points for judges 1, 2, and 3, respectively. The solid curve is a quadratic spline fit to the data points for all judges pooled.
Results Using the Personal Gold Standard

As was discussed previously, the personal gold standard was set by taking a radiologist's recorded vessel size on the uncompressed image to be the correct measurement for judging her or his performance on the compressed images. Using a personal gold standard in general accounts for a measurement bias attributed to an individual radiologist, thereby providing a more consistent result among the measurements of each judge at the different compression levels. The personal gold standard thus eliminates the interobserver variability present with the independent gold standard. However, it does not allow us to compare performance at compressed bit rates to performance at the original bit rates, since the standard is determined from the original bit rates, thereby giving the original images zero error. As before, we first consider visual trends and then quantify differences between levels by statistical tests.

Figure 7 shows average pme vs mean bit rate for the five compressed levels for each judge separately and for the judges pooled, whereas Fig. 8 is a display of the actual pme vs actual achieved bit rate for all the data points. The data for the judges pooled are the measurements from all judges, images, levels, and vessels, with each judge's measurements compared to her or his personal gold standard. In each case, quadratic splines with a single knot at 1.0 bpp were fit to the data. Figures 9 and 10 are the corresponding figures for the apme. As expected, with the personal gold standard the pme and the apme are less than those obtained with the independent gold standard.
FIGURE 8 Pme vs actual bit rate using the personal gold standard. The x's indicate data points for all images, pooled across judges and compression levels. The solid curve is a quadratic spline fit to the data.
FIGURE 9 Mean apme vs mean bit rate using the personal gold standard. The dotted, dashed, and dash-dot curves are quadratic splines fit to the data points for judges 1, 2, and 3, respectively. The solid curve is a quadratic spline fit to the data points for all judges pooled.
The graphs indicate that whereas both judges 2 and 3 overmeasured at all bit rates with respect to the independent gold standard, only judge 3 overmeasured at the compressed bit rates with respect to the personal gold standard. The t-test results indicate that levels 1 (0.36 bpp) and 4 (1.14 bpp) have significantly different pme associated with them than does the personal gold standard. The results of the Wilcoxon signed rank test on percent measurement error using the personal gold standard are similar to those obtained with the independent gold standard. In particular, only level 1 at 0.36 bpp differed significantly from the originals. Furthermore, levels 1, 3, and 4 were significantly different from level 5.
FIGURE 10 Apme vs actual bit rate using the personal gold standard. The x's indicate data points for all images, pooled across judges and compression levels. The solid curve is a quadratic spline fit to the data.
Since the t-test indicates that some results are marginally significant when the Wilcoxon signed rank test indicates the results are not significant, a Bonferroni simultaneous test (union bound) was constructed. This technique uses the significance levels of two different tests to obtain a significance level that is simultaneously applicable for both. For example, in order to obtain a simultaneous significance level of α% with two tests, we could have the significance level of each test be α/2%. With the simultaneous test, the pme at level 4 (1.14 bpp) is not significantly different from the uncompressed level. As such, the simultaneous test indicates that only level 1 (0.36 bpp) has significantly different pme from the uncompressed level. This agrees with the corresponding result using the independent gold standard. Thus, pme at compression levels down to 0.55 bpp does not seem to differ significantly from the pme at the 9.0 bpp original.

In summary, with both the independent and personal gold standards, the t-test and the Wilcoxon signed rank test indicate that pme at compression levels down to 0.55 bpp did not differ significantly from the pme at the 9.0 bpp original. This was shown to be true for the independent gold standard by a direct application of the tests. For the personal gold standard, this was resolved by using the Bonferroni test for simultaneous validity of multiple analyses. The status of measurement accuracy at 0.36 bpp remains unclear, with the t-test concluding no difference and the Wilcoxon indicating significant difference in pme from the original with the independent gold standard, and both tests indicating significant difference in pme from the original with the personal gold standard. Since the model for the t-test is fitted only fairly to moderately well by the data, we lean towards the more conservative conclusion that lossy compression by our vector quantization compression method is not a cause of significant measurement error at bit rates ranging from 9.0 bpp down to 0.55 bpp, but it does introduce error at 0.36 bpp.

A radiologist's subjective perception of quality changes more rapidly and drastically with decreasing bit rate than does the actual measurement error. Radiologists evidently believe that the usefulness of images for measurement tasks degrades rapidly with decreasing bit rate. However, their actual measurement performance on the images was shown by both the t-test and the Wilcoxon signed rank test (with the Bonferroni simultaneous test used to resolve differences between the two) to remain consistently high down to 0.55 bpp. Thus, the radiologist's opinion of an image's diagnostic utility seems not to coincide with its utility for the clinical purpose for which the image was taken. The radiologist's subjective opinion of an image's usefulness for diagnosis should not be used as the sole predictor of actual usefulness.
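The union-bound adjustment amounts to requiring each of the two tests to reach half of the desired level before a difference is declared; a minimal sketch with placeholder p-values:

```python
def bonferroni_significant(p_ttest, p_wilcoxon, alpha=0.05):
    """Declare a difference only if either test is significant at alpha/2, so that
    the chance of a false declaration from the pair of tests is at most alpha."""
    return p_ttest < alpha / 2 or p_wilcoxon < alpha / 2

# Example: a t-test p-value of 0.04 alone is not enough at the simultaneous 5% level.
print(bonferroni_significant(0.04, 0.20))   # False
print(bonferroni_significant(0.01, 0.20))   # True
```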
3.2 Discussion

There are issues of bias and variability to consider in comparing and contrasting gold standards. One disadvantage
of an independent gold standard is that since it is determined by the measurements of radiologists who do not judge the compressed images, significant differences between a compressed level and the originals may be due to differences between judges. For example, a biased judge who tends to overmeasure at all bit rates may have high pme that will not be entirely reflective of the effects of compression. In our study, we determined that two judges consistently overmeasured relative to the independent gold standard. The personal gold standard, however, overcomes this difficulty. A personal gold standard also has the advantage of reducing pme and apme at the compressed levels. This will result in a clarification of trends in a judge's performance across different compression levels. Differences will be based solely on compression level and not on differences between judges. Another argument in favor of a personal gold standard is that in some clinical settings a fundamental question is how the reports of a radiologist whose information is gathered from compressed images compare to what they would have been on the originals. Indeed, systematic biases of a radiologist are sometimes well recognized and corrected for by the referring physicians.

One disadvantage with the personal gold standard, however, is that by defining the measurements on the original images to be "correct," we are not accounting for the inherent variability of a judge's measurement on an uncompressed image. For example, if a judge makes an inaccurate measurement on the original and accurate measurements on the compressed images, these correct measurements will be interpreted as incorrect. Thus the method is biased against compression. An independent gold standard reduces the possibility of this situation occurring since we need an agreement by two independent radiologists on the "correct" measurement.

The analysis previously presented was based on judges, vessels, and images pooled. Other analyses in which the performances of judges on particular vessels and images are separated demonstrate additional variability. Judges seem to have performed significantly differently from each other. Judges 2 and 3 consistently overmeasured. As a result, the Wilcoxon signed rank test using the independent gold standard indicates significant differences between the gold standard and the measurements of judges 2 and 3 at all compression levels, including the original. Judge 1, however, does not have any significant performance differences between the gold standard and any compression levels. In addition, certain vessels and images had greater variability in pme than others. To examine the validity of pooling the results of all judges, vessels, and images, an analysis of variance (ANOVA) [21] was used to assess whether this variability is significant. The ANOVA took the judges, vessels, and images to be random effects and the levels to be fixed effects, and separated out the variance due to each effect. For technical reasons it is not feasible here to use direct F-tests on each of the variances estimated. Thus, we obtained confidence regions for each component of variance using a jackknife technique [21]. In particular, if zero falls
within the 95% confidence interval of a certain effect, then the effect is not considered significant at the 5% level. Using the jackknife technique, the ANOVA indicates that the variability in judges, vessels, images, and levels was not significantly different from zero, thereby validating the pooling.
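The jackknife interval can be sketched generically: delete one judge's data at a time, recompute the variance-component estimate, and form a normal-approximation interval from the pseudo-values. The estimator below is a simple stand-in, not the ANOVA decomposition actually used, and all data are placeholders.

```python
import numpy as np
from scipy.stats import norm

def jackknife_ci(groups, estimator, level=0.95):
    """Delete-one-group jackknife confidence interval for a scalar estimator."""
    n = len(groups)
    theta_full = estimator(groups)
    theta_del = np.array([estimator(groups[:i] + groups[i + 1:]) for i in range(n)])
    pseudo = n * theta_full - (n - 1) * theta_del
    se = pseudo.std(ddof=1) / np.sqrt(n)
    z = norm.ppf(0.5 + level / 2)
    return pseudo.mean() - z * se, pseudo.mean() + z * se

# Stand-in "variance component": variance of the per-judge mean pme (placeholder data).
def judge_variance(groups):
    return np.var([np.mean(g) for g in groups], ddof=1)

judges = [np.array([1.2, 0.8, 2.0]), np.array([3.1, 2.2, 2.7]), np.array([0.5, 1.4, 0.9])]
lo, hi = jackknife_ci(judges, judge_variance)
print(f"If 0 lies in ({lo:.2f}, {hi:.2f}), the judge effect is not significant at the 5% level.")
```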
4 Mammography Study: Example of Management Accuracy

X-ray mammography is the most sensitive technique for detecting breast cancer [2], with a reported sensitivity of 85–95% for detecting small lesions. Most noninvasive ductal carcinomas, or DCIS, are characterized by tiny nonpalpable calcifications detected at screening mammography [9, 16, 29]. Traditional mammography is essentially analog photography using X-rays in place of light and analog film for display. Mammography machines based on direct digital acquisition exist, and were in the process of FDA review for market approval at the time of this study. The study discussed here, however, employed only digitized analog films. The studies were digitized using a Lumisys Lumiscan 150 at 12 bpp with a spot size of 50 microns. After compression, the images were put on hardcopy film. The films were printed using a Kodak 2180 X-ray film printer, a 79-micron, 12-bit gray-scale printer that writes with a laser diode of 680-nm bandwidth.

Images were viewed on hardcopy film on an alternator by judges in a manner that simulates ordinary screening and diagnostic practice as closely as possible, although patient histories and other image modalities were not provided. Two views were provided of each breast (CC and MLO), so four views were seen simultaneously for each patient. Each of the judges viewed all the images in an appropriately randomized order over the course of nine sessions. Two sessions were held every other week, with a week off in between. A clear overlay was provided for the judge to mark on the image without leaving a visible trace. For each image, the judge either indicated that the image was normal or, if something was detected, had an assistant fill out the Observer Form using the American College of Radiology (ACR) Standardized Lexicon by circling the appropriate answers or filling in blanks as directed. The Observer Form is given in Figs 11–13 below. The instructions for assistants and radiologists, along with suggestions for prompting and a CGI Web data entry form, may be found at the project Web site http://www-isl.stanford.edu/~gray/army.html. The judges used a grease pencil to circle the detected item. The instructions to the judges specified that ellipses drawn around clusters should include all microcalcifications seen, as if making a recommendation for surgery, and outlines drawn around masses should include the main tumor, as if grading for clinical staging, without including the spicules (if any) that extend outward from the mass. This corresponds to what is done in clinical practice except for the requirement that the markings be made on copies.
FIGURE 11 Observer form for mammograms: this part is completed for each case. The form records the ID, session, case number, and reader initials; which breast(s) the mammograms show; a subjective rating of diagnostic quality (bad 1–5 good) for each of the left and right CC and MLO views, with the reason for any low rating (sharpness, contrast, position, breast compression, noise, artifact, penetration) and whether a repeat is recommended; breast density (almost entirely fat, scattered fibroglandular densities, heterogeneously dense, extremely dense); and findings. (A note on the form specifies the assessment to use when there are no findings.)
The judges were allowed to use a magnifying glass to examine the films. Although the judging form is not standard, the ACR Lexicon is used to report findings, and hence the judging requires no special training. The reported findings permit subsequent analysis of the quality of an image in the context of its true use, finding and describing anomalies and using them to assess and manage patients. To confirm that each radiologist identifies and judges a specific finding, the location of each lesion is confirmed both on the clear overlay and the judging form. Many of these lesions were judged as "A" (assessment incomplete), since it is often the practice of radiologists to obtain additional views in two distinct scenarios: (1) to confirm or exclude the presence of a finding, that is, a finding that may or may not represent a true lesion, or (2) to further characterize a true lesion, that is, to say a lesion clearly exists but is incompletely evaluated. The judging form allows for two meanings of the "A" code. If the judge believes that the finding is a possible lesion, this is indicated by answering "yes" to the question "are you uncertain if the finding exists?" Otherwise, if the lesion is definite, the judges should give their best management decision based on the standard two-view mammogram.
The initial question requesting a subjective rating of diagnostic utility on a scale of 1–5 is intended for a separate evaluation of the general subjective opinion of the radiologists of the images. The degree of suspicion registered in the Management portion also provides a subjective rating, but this one is geared towards the strength of the opinion of the reader regarding the cause of the management decision. It is desirable that obviously malignant lesions in a gold standard should also be obviously malignant in the alternative method.
4.1 Statistical Analysis

We focus here on patient management, the decisions that are made based on the radiologists' readings of the image [1, 14, 23]. Management is a key issue in digital mammography. There is concern that artifacts could be introduced, leading to an increase in false positives and hence in unnecessary biopsies. The management categories we emphasize are the following four, given in order of increasing seriousness:

RTS: incidental, negative, or benign with return to screening
F/U: probably benign but requiring 6-month follow-up
C/B: call back for more information, additional assessment needed
BX: immediate biopsy
FIGURE 12 Observer form for mammograms (continued): this part is completed for each individual finding. It records the finding number and side, the finding type, the location, the view(s) in which the finding is seen (CC and/or MLO), and associated findings, each marked as possible (p) or definite (d).
These categories are formed by combining categories from the basic form of Fig. 13: RTS is any study that had assessment 1 or 2; F/U is assessment 3; C/B is assessment indeterminate/incomplete with best guess either "unsure it exists," 2, or 3; and BX is assessment indeterminate/incomplete with best guess either 4L, 4M, 4H, or 5, or assessment 4L, 4M, 4H, or 5. We also consider the binarization of these four categories
into two groups: Normal and Not Normal. But there is controversy as to where the F/U category belongs, so we make its placement optional with either group. The point is to see if lossy compression makes any difference to the fundamental decision made in screening: Does the patient return to ordinary screening as normal, or is there suspicion of a problem and hence the demand for further work? Truth is determined by agreement with a gold standard. The raw results are plotted as a collection of 2 × 2 tables, one for each category or group of categories of interest and for each radiologist. As will be discussed, the differences among radiologists prove to be so large an effect that extreme care
FIGURE 13 Observer form for mammograms (continued): assessment (including the indeterminate/incomplete option) and management.