The Rational Clinical Examination: Evidence-Based Clinical Diagnosis eBook 0071590315, 9780071590310

The ultimate guide to the evidence-based clinical encounter In the tradition of the famous "Users' Guides to


English Pages 500 [940] Year 2008


Table of contents:
Contents
Contributors
Foreword
Preface
Acknowledgments
1. A Primer on the Precision and Accuracy of the Clinical Examination
Update: Primer on Precision and Accuracy
2. Does This Patient Have Abdominal Aortic Aneurysm?
Update: Abdominal Aortic Aneurysm
3. Is Listening for Abdominal Bruits Useful in the Evaluation of Hypertension?
Update: Abdominal Bruits
4. Does This Patient Have an Alcohol Problem?
Update: Problem Alcohol Drinking
5. Does This Adult Patient Have Appendicitis?
Update: Appendicitis, Adult
6. Does This Patient Have Ascites?
Update: Ascites
7. What Can the Medical History and Physical Examination Tell Us About Low Back Pain?
Update: Low Back Pain
8. Does This Patient Have Breast Cancer?
Update: Breast Cancer
9. Does This Patient Have a Clinically Important Carotid Bruit?
Update: Carotid Bruit
10. Does This Patient Have Carpal Tunnel Syndrome?
Update: Carpal Tunnel
11. Does This Patient Have Abnormal Central Venous Pressure?
Update: Central Venous Pressure
12. Does This Patient Have Acute Cholecystitis?
Update: Cholecystitis
13. Does the Clinical Examination Predict Airflow Limitation?
Update: Chronic Obstructive Airways Disease
14. Does This Patient Have Clubbing?
Update: Clubbing
15. Is This Patient Taking the Treatment as Prescribed?
Update: Compliance and Medication Adherence
16(A). Can the Clinical Examination Diagnose Left-Sided Heart Failure in Adults?
16(B). Does This Dyspneic Patient in the Emergency Department Have Congestive Heart Failure?
Update: Congestive Heart Failure
17. Is This Patient Dead, Vegetative, or Severely Neurologically Impaired? Assessing Outcome for Comatose Survivors of Cardiac Arrest
Update: Death
18(A). Does This Patient Have Deep Vein Thrombosis? (1998)
18(B). Does This Patient Have Deep Vein Thrombosis? (2006)
Update: Deep Vein Thrombosis
19. Is This Patient Clinically Depressed?
Update: Depression
20. Does This Patient Have a Family History of Cancer?
Update: Family History of Cancer
21. Does This Patient Have a Goiter?
Update: Goiter
22. Does This Patient Have Hepatomegaly?
Update: Hepatomegaly
23. Does This Patient Have Hypertension?
Update: Hypertension
24. Is This Adult Patient Hypovolemic?
Update: Hypovolemia, Adult
25. Is This Child Dehydrated?
Update: Hypovolemia, Child
26. Does This Patient Have Influenza?
Update: Influenza
27. Does This Patient Have a Torn Meniscus or Ligament of the Knee?
Update: Knee Ligaments and Menisci
28. Is This Adult Patient Malnourished?
Update: Malnourishment, Adult
29. Does This Patient Have a Mole or a Melanoma?
Update: Melanoma
30. Does This Adult Patient Have Acute Meningitis?
Update: Meningitis, Adult
31. Is This Woman Perimenopausal?
Update: Menopause
32. Does This Patient Have Aortic Regurgitation?
Update: Murmur, Diastolic
33. Does This Patient Have an Abnormal Systolic Murmur?
Update: Murmur, Systolic
34. Does This Patient Have Myasthenia Gravis?
Update: Myasthenia Gravis
35. Is This Patient Having a Myocardial Infarction?
Update: Myocardial Infarction
36. Does This Woman Have Osteoporosis?
Update: Osteoporosis
37. Does This Child Have Acute Otitis Media?
Update: Otitis Media, Child
38. Does This Patient Have Parkinson Disease?
Update: Parkinsonism
39. Is This Patient Allergic to Penicillin?
Update: Penicillin Allergy
40. Does This Adult Patient Have Community-Acquired Pneumonia?
Update: Community-Acquired Pneumonia, Adult
41. Does This Infant Have Pneumonia?
Update: Pneumonia, Infant and Child
42. Is This Patient Pregnant?
Update: Early Pregnancy
43. Does This Patient Have Pulmonary Embolism?
Update: Pulmonary Embolus
44. Does This Patient Have an Instability of the Shoulder or a Labrum Lesion?
Update: Shoulder Instability
45. Does This Patient Have Sinusitis?
Update: Sinusitis
46. Does This Patient Have Splenomegaly?
Update: Splenomegaly
47. Does This Patient Have Strep Throat?
Update: Streptococcal Pharyngitis
48. Is This Patient Having a Stroke?
Update: Stroke
49. Does This Patient Have Temporal Arteritis?
Update: Temporal Arteritis
50. Does This Patient Have an Acute Thoracic Aortic Dissection?
Update: Thoracic Aortic Dissection
51. Does This Woman Have an Acute Uncomplicated Urinary Tract Infection?
Update: Urinary Tract Infection, Women
52. What Is Causing This Patient's Vaginal Symptoms?
Update: Vaginitis
53. Does This Dizzy Patient Have a Serious Form of Vertigo?
Update: Vertigo
Index

The Rational Clinical Examination

NOTICE Medicine is an ever-changing science. As new research and clinical experience broaden our knowledge, changes in treatment and drug therapy are required. The authors and the publisher of this work have checked with sources believed to be reliable in their efforts to provide information that is complete and generally in accord with the standards accepted at the time of publication. However, in view of the possibility of human error or changes in medical sciences, neither the authors nor the publisher nor any other party who has been involved in the preparation or publication of this work warrants that the information contained herein is in every respect accurate or complete, and they disclaim all responsibility for any errors or omissions or for the results obtained from use of the information contained in this work. Readers are encouraged to confirm the information contained herein with other sources. For example and in particular, readers are advised to check the product information sheet included in the package of each drug they plan to administer to be certain that the information contained in this work is accurate and that changes have not been made in the recommended dose or in the contraindications for administration. This recommendation is of particular importance in connection with new or infrequently used drugs.

The Rational Clinical Examination Evidence-Based Clinical Diagnosis

Editors David L. Simel, MD, MHS Chief, Medicine Service Durham Veterans Affairs Medical Center Professor of Medicine Duke University School of Medicine Durham, North Carolina

Drummond Rennie, MD JAMA Chicago, Illinois Philip R. Lee Institute for Health Policy Studies University of California, San Francisco San Francisco, California

Editor, Education Guides Sheri A. Keitz, MD, PhD Chief, Medicine Service Miami Veterans Affairs Medical Center Associate Dean for Faculty Diversity and Development University of Miami Miller School of Medicine Miami, Florida

New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto

Copyright © 2009 by the American Medical Association. All rights reserved. Manufactured in the United States of America. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher.

0-07-159031-5

The material in this eBook also appears in the print version of this title: 0-07-159030-7.

All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. Where such designations appear in this book, they have been printed with initial caps.

McGraw-Hill eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. For more information, please contact George Hoare, Special Sales, at [email protected] or (212) 904-4069.

TERMS OF USE

This is a copyrighted work and The McGraw-Hill Companies, Inc. (“McGraw-Hill”) and its licensors reserve all rights in and to the work. Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill’s prior consent. You may use the work for your own noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if you fail to comply with these terms.

THE WORK IS PROVIDED “AS IS.” McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill and its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be uninterrupted or error free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inaccuracy, error or omission, regardless of cause, in the work or for any damages resulting therefrom. McGraw-Hill has no responsibility for the content of any information accessed through the work. Under no circumstances shall McGraw-Hill and/or its licensors be liable for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inability to use the work, even if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or otherwise.

DOI: 10.1036/0071590307



JAMAevidence: Using Evidence to Improve Care Founded around the Users’ Guides to the Medical Literature and The Rational Clinical Examination: Evidence-Based Clinical Diagnosis, JAMAevidence offers an invaluable online resource for learning, teaching, and practicing evidence-based medicine (EBM). Updated regularly, the site includes fully searchable content of the Users’ Guides to the Medical Literature and The Rational Clinical Examination and features podcasts from the leading minds in EBM, interactive worksheets, question wizards, functional calculators, and a comprehensive collection of PowerPoint slides for educators and students. www.JAMAevidence.com Please visit www.JAMAevidence.com for subscription information.


CONTRIBUTORS

John Attia, MD, MSc, PhD Academic Director of General Medicine University of Newcastle John Hunter Hospital New Lambton, New South Wales, Australia Meningitis, adult

Robert G. Badgett, MD Clinical Professor of Medicine University of Texas, San Antonio San Antonio, Texas, USA Congestive heart failure

Allan N. Barkun, MD, MSc Professor of Medicine and Epidemiology, Biostatistics and Occupational Health McGill University Montreal General Hospital Montreal, Quebec, Canada Splenomegaly

Mary B. Barton, MD Scientific Director, US Preventive Services Task Force Agency for Healthcare Research and Quality Washington, DC, USA Breast cancer

Lori A. Bastian, MD, MPH Associate Professor of Medicine Duke University Durham Veterans Affairs Medical Center Durham, North Carolina, USA Early pregnancy

David Bates, MD Professor of Medicine Harvard University Brigham and Women’s Hospital Boston, Massachusetts, USA Knee ligaments and menisci

Stephen Bent, MD Assistant Adjunct Professor University of California, San Francisco San Francisco Veterans Affairs Medical Center San Francisco, California, USA Urinary tract infection, women

Christopher M. Booth, MD Medical Oncologist and Research Fellow Queen’s University Kingston, Ontario, Canada Death

Hayden Bosworth, PhD Research Professor Duke University Durham Veterans Affairs Medical Center Durham, North Carolina, USA Compliance and medication adherence

David Cescon, MD Resident in Medicine University of Toronto Toronto, Ontario, Canada Murmur, diastolic; Murmur, systolic

Sanjeev D. Chunilal, MB, ChB North Shore Hospital Takapuna, Auckland, New Zealand Pulmonary embolus

Cathleen Cólon-Emeric, MD Assistant Professor of Medicine Duke University Durham, North Carolina, USA Osteoporosis

Richard A. Deyo, MD Professor of Medicine University of Washington, Seattle Harborview Medical Center Seattle, Washington, USA Low back pain

Edward Etchells, MD Associate Professor University of Toronto Sunnybrook and Women’s College Health Sciences Centre Toronto, Ontario, Canada Murmur, diastolic; Murmur, systolic

Jeff Ginsberg, MD Professor of Medicine Faculty of Health Sciences McMaster University Hamilton, Ontario, Canada Pulmonary embolus

James Grichnik, MD, PhD Associate Professor of Medicine Duke University Durham, North Carolina, USA Melanoma


Steven A. Grover, MD, MPA Professor of Medicine and Epidemiology and Biostatistics McGill University Montreal General Hospital Montreal, Quebec, Canada Splenomegaly

Rose Hatala, MD Clinical Associate Professor University of British Columbia Vancouver, British Columbia, Canada Meningitis, adult

Catherine P. Kaminetzky, MD, MPH Associate in Medicine Duke University Durham Veterans Affairs Medical Center Durham, North Carolina, USA Shoulder instability

Jeffrey N. Katz, MD Associate Professor of Medicine Harvard University Brigham and Women’s Hospital Boston, Massachusetts, USA Knee ligaments and menisci

Sheri A. Keitz, MD, PhD Chief, Medicine Service Miami Veterans Affairs Medical Center Associate Dean for Faculty Diversity and Development University of Miami Miller School of Medicine Miami, Florida, USA The Rational Clinical Examination Education Guides

Michael Klompas, MD Research Fellow in Medicine Harvard University Massachusetts General Hospital and Brigham and Women’s Hospital Boston, Massachusetts, USA Thoracic aortic dissection

Frank A. Lederle, MD Professor of Medicine Minneapolis Veterans Affairs Medical Center Minneapolis, Minnesota, USA Abdominal aortic aneurysm

Catherine R. Lucey, MD Clinical Professor of Medicine Ohio State University Columbus, Ohio, USA Congestive heart failure

Kathryn Myers, MD University of Western Ontario St. Joseph’s Health Care London, Ontario, Canada Clubbing

Daniel A. Ostrovsky, MD Clinical Associate in Pediatrics Duke University Durham, North Carolina, USA Pneumonia, infant and child

Joanne T. Piscitelli, MD Associate Clinical Professor of Obstetrics and Gynecology Duke University Durham, North Carolina, USA Vaginitis

James Rainville, MD Assistant Clinical Professor of Physical Medicine and Rehabilitation Harvard Medical School New England Baptist Hospital Boston, Massachusetts, USA Low back pain

Goutham Rao, MD Associate Professor, Pediatrics University of Pittsburgh School of Medicine Pittsburgh, Pennsylvania, USA Parkinsonism

Drummond Rennie, MD JAMA Chicago, Illinois Philip R. Lee Institute for Health Policy Studies University of California, San Francisco San Francisco, California, USA Editor, The Rational Clinical Examination

Russell Rothman, MD Assistant Professor, Internal Medicine and Pediatrics Vanderbilt University Nashville, Tennessee, USA Otitis media, child

Jonathan L. Schaffer, MD, MBA Advanced Operative Technology Group Orthopaedic Research Center Cleveland Clinic Cleveland, Ohio, USA Knee ligaments and menisci

Katalin Scherer, MD Assistant Professor of Neurology University of Arizona Tucson, Arizona, USA Myasthenia gravis

Robert H. Shmerling, MD Associate Professor of Medicine Harvard Medical School Beth Israel Deaconess Medical Center Boston, Massachusetts, USA Temporal arteritis

Kaveh G. Shojania, MD Assistant Professor of Medicine University of Ottawa Ottawa, Ontario, Canada Cholecystitis

David L. Simel, MD, MHS Chief, Medicine Service Durham Veterans Affairs Medical Center Professor of Medicine Duke University School of Medicine Durham, North Carolina, USA Editor, The Rational Clinical Examination Abdominal bruits; Ascites; Carotid bruit; Carpal tunnel; Central venous pressure; Chronic obstructive airways disease; Community-acquired pneumonia, adult; Deep vein thrombosis; Family history of cancer; Goiter; Hepatomegaly; Hypertension; Hypovolemia, adult; Influenza; Malnourishment, adult; Melanoma; Myocardial infarction; Otitis media, child; Penicillin allergy; Primer on precision and accuracy; Problem alcohol drinking; Sinusitis; Streptococcal pharyngitis; Stroke; Vaginitis; Vertigo

Gerald W. Smetana, MD Associate Professor of Medicine Harvard Medical School Beth Israel Deaconess Medical Center Boston, Massachusetts, USA Temporal arteritis

Daniel H. Solomon, MD, MPH Associate Professor of Medicine Harvard Medical School Brigham and Women’s Hospital Boston, Massachusetts, USA Knee ligaments and menisci

Michael J. Steiner, MD Assistant Professor of Pediatrics and Medicine University of North Carolina, Chapel Hill Chapel Hill, North Carolina, USA Hypovolemia, child

Ben Stern, MS, DPT CEO and Director of Development Integrated Golf Schools Scottsdale, Arizona, USA Low back pain

Robert Trowbridge, MD Assistant Professor of Medicine University of Vermont College of Medicine Maine Medical Center Portland, Maine, USA Cholecystitis

James Wagner, MD Associate Professor University of Texas, Southwestern Medical Center Dallas, Texas, USA Appendicitis, adult

John W. Williams, Jr, MD, MHS Professor of Medicine Duke University Durham Veterans Affairs Medical Center Durham, North Carolina, USA Depression; Sinusitis

Jeffrey G. Wong, MD Professor of Medicine Associate Dean for Medical Education Medical University of South Carolina Charleston, South Carolina, USA Meningitis, adult


FOREWORD

I remember my introduction to the medical history and clinical examination as the most exciting moments of my early career. As each item in the history and physical examination was explained and given meaning and significance, I believed that after the long preclinical years I had at last reached the threshold of becoming a physician. I could begin to hold more than a comforting conversation with a patient. I could use my ears, eyes, and hands to disclose the patient’s problem and so begin to be of actual use to a real patient. As I polished my skills, it did not occur to me that the divination of all those signs and symptoms was anything but an art: the epitome of the art of medicine. But, with time, I realized that many of the so-called pathognomonic symptoms and signs were so merely because someone, often the person whose name was attached to them, had declared that they were. Doubt started to overtake accepted wisdom as it became clear to me that little worthwhile evidence supported the artist’s tools I thought I had mastered. Towards the end of the 1980s, my friend David Sackett, then chief of medicine and clinical epidemiology and biostatistics at McMaster University, showed me a new way of thinking about all this. He equated items in the history and the physical examination with traditional diagnostic laboratory tests, each susceptible to evidentiary testing. So he and I began planning 2 series of articles on evidence-based medicine to appear in JAMA. One of these, the Users’ Guides to the Medical Literature, was soon placed into the capable hands of Gordon Guyatt, also of McMaster University, and articles began to appear in JAMA in 1993. By 2002, they were printed in updated form in 2 books, an Essentials and a fuller Manual,1,2 both of which have been so successful that second editions3,4 have just been published. The other series consisted of The Rational Clinical Examination articles and started appearing in 1992.
With the first article, Sackett and I published an editorial.5 We reminded our readers of studies that showed that primary care providers usually establish the correct diagnosis at the end of a brief history and some subroutine of the physical examination. So on practical grounds alone, it made sense to improve our understanding of the parts of the history and examination that were useful, or useless, in pinning down, usually at an early stage of the disease, one diagnosis and ruling out others. We contrasted symptoms and signs with laboratory tests, which were subjected to rigorous testing before adoption, but which might have far less ability to narrow the diagnostic possibilities. As an example, we observed the overwhelming probability of coronary stenosis in a 65-year-old man who has smoked all his life when he tells you that he gets central chest tightness regularly on exertion, which forces him to stop and which disappears when he rests.6,7 Perhaps most important, by encouraging research into the history and physical examination, we wanted to restore
respectability to a part of medicine that seemed to have been eroding as academic and financial rewards went to those who most resembled scientists relying on expensive diagnostic tests and least behaved as physicians relating to patients. It is no coincidence that both Sackett and I, authors of the editorial launching the series, have served roles in the Cochrane Collaboration, an initiative that has had a massive effect on the way we see evidence and a profound influence on the methods and popularity of systematic review and metaanalysis. These sciences, as well as that of decision making, had grown up and spread to medicine during the 1970s and 1980s. Without them, both the Cochrane Collaboration and The Rational Clinical Examination series would have been impossible undertakings; indeed, the entire evidence-based movement would have grown far more slowly. At the same time, because of the unfamiliarity of these techniques and the revolutionary approach we were taking, namely, a scientific examination of what most clinicians considered to be an ineffable art not susceptible to dissection, we published a primer on the precision and accuracy of the clinical examination. This laid out the approach to be taken and took the reader through the terms, methods, and calculations underpinning clinical diagnosis.8 Although each article’s purpose could be worked out from its title, the full meaning of the concepts took time to sink in, as I discovered from comments sent in by many of the expert specialty peer reviewers to whom I sent the manuscripts as they came in to JAMA. Indeed, it was unfamiliar even to some prospective authors. David Sackett had a firm belief that the reviews would be done best by generalist physicians who had learned basic critical appraisal skills. As the editor, I learned that these generalist physicians were often speaking a different language from our specialist reviewers. 
Sackett was clearly correct, and it remains commonplace for specialty reviewers to ask that specialists be added to the writing team because, well, they are specialists. What has happened in our process is that both authors and reviewers learn from the editorial review process, with specialty reviewers ensuring that authors interpret the data in the proper context. In return, the specialists often learn that much of what they took for granted has no basis in evidence. The Rational Clinical Examination book should not replace books on clinical diagnosis. But, somewhat as the Cochrane Database of Systematic Reviews provides a systematic evaluation of all studies on a particular intervention without becoming prescriptive, so articles in The Rational Clinical Examination series are careful systematic efforts to assess the accuracy of items from the patient’s medical history and the clinical examination. In this sense, they are a revolutionary departure from what we have regarded as books on physical diagnosis, which, until the first articles in The Rational Clinical Examination series appeared,
had never taken that approach. Since then, however, such books have already started using the evidence as summarized in articles in the series. In his preface to the eighth edition of DeGowin’s Diagnostic Examination, Richard LeBlond writes: References to articles from the medical literature are included in the body of the text. We have chosen articles which provide useful clinical information including excellent descriptions of disease and syndromes and, in some cases, photographs illustrating key findings. Evidence-based articles on the utility of the physical exam are included, mostly from The Rational Clinical Examination series published over the last decade in the Journal of the American Medical Association. They are included with the caveat that they evaluate the physical exam as a hypothesis-testing tool, not as a hypothesis generating task. …9 Our series is indeed about testing tests (symptoms, signs) to separate the useful from the useless and so is about testing hypotheses. Books on physical diagnosis are hypothesis generating in that they are a compendium of instructions on how to elicit all symptoms and signs, typically presented in the absence of any certain disease consideration or context, typically organized by organ system (eg, “the cardiovascular examination”). In contrast, our articles are usually organized by a certain condition (eg, “Does this patient have systolic dysfunction?”). And, although there are a few articles in which the authors take a more hypothesis-generating tack (eg, those on splenomegaly and hepatomegaly), we always frame them in a clinical context. An issue all along has been whether, and how much, to integrate the evidence on symptoms and signs with that provided by diagnostic tests. In general, we have had so much material to deal with, and there are so many good texts on diagnostic tests, that we have limited our approach as much as common sense would allow.
Some articles do include assessments of a few basic laboratory and radiologic studies that are commonly available to the clinician and that can be interpreted only by the physician in the clinical context (eg, the sedimentation rate for temporal arteritis or vascular congestion on a chest radiograph for systolic dysfunction). Recently, we expanded the series to include “rational clinical procedures,” because many procedures are actually part of the clinical examination and tightly linked to the presence of the history and physical examination findings.10 David Simel of Duke University had been immediately excited by the concept and was a coauthor of the first article in the series, “Does This Patient Have Ascites? How to Divine Fluid in the Abdomen.”11 At that time, 1992, Simel made it clear that he intended to devote his research career to investigating this crucial area of medicine, and soon after he took over as primary editor of the series. Since then, he has stimulated large numbers of authors to complete these systematic reviews. His personal involvement with authors has brought us many more articles than we could otherwise have expected and ensured a uniform presentation. He also made certain that every manuscript had been through review before submission
to JAMA, where I put each manuscript through rigorous external peer review, just as with all original submissions to JAMA. Each review is a considerable undertaking, often requiring more than a year of unpaid and often unappreciated work, which explains why it has taken 15 years to produce what is now more than 70 articles in JAMA. As news of the series spread, volunteer authors suggested their own topics of interest. The appearance of fully fledged review articles depended on the skills and persistence of the authors and on the persuasive powers and analytic assistance of David Simel. Even then, more than a fifth of the proposed topics failed to result in publishable manuscripts, usually because the authors found insufficient evidence. It is for that reason that Simel and I published in 1995 a plea for support for a wide research agenda and the formation of collaborations to ensure that the wide gaps in our knowledge were filled.12 With the publication of this book, Simel has updated the first 51 published articles either alone or with the original authors. In addition, he has updated the primer8—essential for all readers of this book. David Simel’s contributions to this series, and the transformation he has wrought in how we think about the clinical examination, have been immense, and working with him has been a privilege and a delight. This is the first book in The Rational Clinical Examination series. Our plan is to keep soliciting and publishing in JAMA articles on fresh Rational Clinical Examination topics. We welcome volunteers with good ideas who are prepared to undertake the work. We will accumulate these articles, keeping them current with updates, and publish them as new chapters online and in succeeding editions of The Rational Clinical Examination book. 
The Rational Clinical Examination will be published online with a set of teaching/learning slides for each chapter and will be integrated with the Users’ Guides to the Medical Literature and other online-only content and features in an extensive evidence-based medicine Web site called JAMAevidence (http://www.JAMAevidence.com). David Simel and I welcome Sheri Keitz (recently of the Durham Veterans Affairs Medical Center and Duke University, who has now moved to the University of Miami) as editor of The Rational Clinical Examination Education Guides. Sheri has many talents, including a fine critical eye. She has prepared or supervised development of all the teaching slides, and she has reviewed most of the Updates to the original manuscripts. The series started with the encouragement of George Lundberg, then editor-in-chief of JAMA and the Archives journals. His successor, Cathy DeAngelis, has consistently and very strongly supported us, helping negotiate the complex path to publication. Annette Flanagin has been a tireless worker in this, as in so many other JAMA causes. This book would not have been possible without her. We are grateful to Barry Bowlus for directing the publishing of this book and to Richard Newman for his advice and support. We are also grateful for the expertise of Jim Shanahan, Robert Pancotti, Helen Parr, and others at McGraw-Hill, as well as Peter Compitello at NewGen, and Holly Auten and her colleagues at Silverchair.

Publishing, like medicine, moves forward. During the last few years, the illustrations in JAMA have come under the care of Ronna Siegel and 2 medical illustrators, Cassio Lynm and Alison Burke. The series articles have benefited from their extraordinary skills, and improvements continue with the introduction of video images, as well as teaching clips. We also thank Cara Wallace and Angela Grayson for their expert editing and support. The response to the articles published in JAMA tells us that this book will be useful. We also hope that readers will be stimulated to conduct research on aspects of the clinical examination. Perhaps readers will contact us if they believe they can undertake the sort of review that could constitute future articles in JAMA and chapters in the next book.

Drummond Rennie, MD, FRCP, MACP

REFERENCES
1. Guyatt G, Rennie D, eds. Users’ Guides to the Medical Literature: A Manual for Evidence-Based Practice. Chicago, IL: AMA Press; 2001.
2. Guyatt G, Rennie D, eds. Users’ Guides to the Medical Literature: Essentials of Evidence-Based Practice. Chicago, IL: AMA Press; 2001.
3. Guyatt G, Rennie D, Meade MO, Cook DJ, eds. Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. 2nd ed. New York, NY: McGraw-Hill; 2008.
4. Guyatt G, Rennie D, Meade MO, Cook DJ, eds. Users’ Guides to the Medical Literature: Essentials of Evidence-Based Clinical Practice. 2nd ed. New York, NY: McGraw-Hill; 2008.
5. Sackett DL, Rennie D. The science of the art of the clinical examination. JAMA. 1992;267(19):2650-2652.
6. Diamond GA, Forrester JS. Analysis of probability as an aid in the clinical diagnosis of coronary artery disease. N Engl J Med. 1979;300(24):1350-1358.
7. Sox HC, Hickam DH, Marton KI, et al. Using the patient’s history to estimate the probability of coronary artery disease: a comparison of primary and referral practices. Am J Med. 1990;89(1):7-14.
8. Sackett DL. The rational clinical examination: a primer on the precision and accuracy of the clinical examination. JAMA. 1992;267(19):2638-2644.
9. LeBlond RF. Preface. In: LeBlond RF, DeGowin RL, Brown DD, eds. DeGowin’s Diagnostic Examination. 8th ed. New York, NY: McGraw-Hill; 2004.
10. Straus SE, Thorpe KE, Holroyd-Leduc J. How do I perform a lumbar puncture and analyze the results to diagnose bacterial meningitis? JAMA. 2006;296(16):2012-2022.
11. Williams JW Jr, Simel DL. The rational clinical examination: does this patient have ascites? how to divine fluid in the abdomen. JAMA. 1992;267(19):2645-2648.
12. Simel DL, Rennie D. The clinical examination: an agenda to make it more rational. JAMA. 1997;277(7):572-574.


PREFACE

I’ve never met a medical student who lacked passion for making a diagnosis. And, among all the diagnoses a student might make, clinching the case right at the bedside is the most treasured. The same holds true not only for physicians in practice but also for all those involved in caring for patients—physician assistants, nurses, and physical therapists must each constantly assess their patient and consider what’s wrong. The Rational Clinical Examination series, published in JAMA since 1992 and collected in this book, should appeal to anyone who wonders about the meaning of a patient’s symptoms and signs. Many indispensable textbooks instruct learners on “how” to elicit the medical history and perform the physical examination, but we suspect that, once the “how” is learned, clinicians only infrequently return to what was one of their favored textbooks during their training years. When I ask clinicians to recall the book they used for physical diagnosis class in medical school, there is no pause before they state DeGowin and DeGowin, Bates, Mosby, Schwartz, or another of a select few. We see The Rational Clinical Examination as an essential companion to, and not a replacement for, these time-honored texts of the “complete” medical history and physical examination. Although standard textbooks might clearly describe several maneuvers for detecting ascites, for example, we identify those findings that work best. Although textbooks typically march from “head to toe” without regard to diagnoses when describing the complete physical examination, we start with clinical diagnostic questions and provide data that identify the most relevant symptoms and signs. Unlike physical examination textbooks, we also provide data on what does not work, derived from a thorough review of the literature that backs up our recommendations.
Please recognize that we can never replace a great textbook on the complete medical history and physical examination because we will never be complete in describing the rational clinical examination. There are many diagnoses we have not yet reviewed and many more to come. After more than 15 years of producing systematic reviews in JAMA, which included the article that launched the evidence-based medicine movement,1 it was time for us to update and combine our work in one resource for learners and clinicians to enjoy. Accordingly, this book is evidence based. We present the original Rational Clinical Examination article, followed by an Update. For each topic, we recreated the original literature search and evaluated the new literature dating from 1 year before the publication of the original article to the time we prepared the Update. If anything, we tried to be even more restrictive in applying our quality measures for including new research in the Updates. The Updates follow a format similar to that of the original articles: they open with a clinical scenario, present the results of the literature search, and summarize new information. Sometimes we discovered that we had
not reviewed the topic as thoroughly as we thought, so we also recount any improvements we made when we reanalyzed data. Simple tables display the new findings that we incorporate with the previously published data. Because evidence-based guidelines for most diseases did not exist when we launched The Rational Clinical Examination series, we review the recommendations of the major federal agencies for each of the topics and highlight how our information supports or differs from those recommendations. Finally, we include a Make the Diagnosis section that gives a summary of the prior probability of the target disorder, the population for whom the target disorder should be considered, a table of likelihood ratio data for the best clinical findings, and a list of the accepted reference standards. Some readers will want more data, so we provide a structured review of every article identified in our Update that met our inclusion criteria. These reviews are available online in an Evidence to Support the Update section, available at http://www.JAMAevidence.com. JAMAevidence is a Web site resource for learning, teaching, and practicing evidence-based medicine that includes the complete online content of The Rational Clinical Examination and the Users’ Guides to the Medical Literature, along with other features, such as downloadable projection slides to enhance classroom or conference teaching and learning experience, an extensive evidence-based medicine glossary, functional calculators, question wizards, customizable worksheets, podcasts, and regular updates. We hope that long-time readers of The Rational Clinical Examination series will recognize the painstaking care and preparation taken during the review of each topic. Every Update was reviewed by an author of the original article or a clinician who had no involvement with the original publication. Although this alone might seem reassuring and unlike typical medical textbooks, we went a step further.
For each topic, a slide presentation, called an Education Guide, has been prepared, primarily by Duke University Department of Medicine residents, or in a few cases by young clinical Duke University faculty members, all supervised by Sheri A. Keitz, MD, PhD. The Education Guides follow a similar format and have been “field-tested” among learners. The goal in preparing the Education Guides was to have the learners create a set of materials for their instructors that match how they, the learners, hope the topic would be taught. Just like the Updates themselves, the slides have also been reviewed. From this, we learned that trainees are among our most critical readers—they expect careful, accurate, and thoughtful presentation and exposition. The Education Guides slides are available online at http://www.JAMAevidence.com. For current students, The Rational Clinical Examination demonstrates the correct way to learn the medical history and physical examination, giving direction in interpreting
the results and answering questions that typical physical examination textbooks do not systematically address. For teachers, the Education Guides, amply supplemented with teacher’s notes, allow you to teach physical diagnosis with an evidence-based approach. For established practitioners, perhaps far removed from their introductory physical examination course, we hope to challenge any cynicism that clinical examination is all “art.” There is a science behind the art of clinical examination. We hope you discover that learning this science not only validates your role as a clinician and improves your skills but also is fun.

David L. Simel, MD, MHS

REFERENCE
1. Evidence-Based Medicine Working Group. Evidence-based medicine: a new approach to teaching the practice of medicine. JAMA. 1992;268(17):2420-2425.

ACKNOWLEDGMENTS

To all those who supported me at the Durham Veterans Affairs Medical Center and Duke University, and to the many learners, trainees, and colleagues around the world who have encouraged this work and contributed to it, I give my thanks. Jack Feussner and Drummond Rennie have been my most important career mentors by allowing me to dabble in this seldom funded, but purely pleasurable, line of investigative work on the clinical examination. Sheri Keitz and her mentorship of the Duke University Medical house staff in preparing the Education Guides that accompany the book have kept the writing very honest, elevating everything to a higher standard than I could have accomplished alone. Barry Bowlus, Annette Flanagin, and Cathy DeAngelis took charge of all the arrangements with our publisher, freeing me to concentrate on content because there are no shortcuts when the imprimatur of JAMA is attached to The Rational Clinical Examination. Cara Wallace and Angela Grayson helped with the editing, and Cassio Lynm, Alison Burke, and Ronna Siegel have provided us with illustrations that convey more information than words sometimes allow. Pete Compitello from NewGen, Helen Parr, Jim Shanahan, Robert Pancotti, and others from McGraw-Hill, and Holly Auten and her colleagues at Silverchair have taken what we put on paper and transformed the information to print and online formats in a way that met and then surpassed our vision. Kenneth Goldberg and Rob Minton covered all my clinical and hospital tasks when I took time off, and David Matchar and Eugene Oddone made possible a 6-month escape from work. I especially thank Joanne, Lauren, Michael, and Brian for their love and for their continued forgiveness for the many times I am physically present while working on The Rational Clinical Examination though my mind is elsewhere. I owe Joanne a lot of Sunday morning bike rides.

David L. Simel, MD, MHS

Having never worked on a book before, I had no way of knowing how it would affect my life or the lives of those around me. I was deeply embedded in editing for The Rational Clinical Examination when my 11-year-old son entered our study at home and said, “Mom, I think you need to take a break. Look,” he declared, pointing to my hands; I was shaking out the cramps after hours at the keyboard. “Isn’t that flick sign?” My poor children had spent so much time during the past year watching me display signs as I worked on each chapter that one of them could correctly identify flick sign. I responded, “See! Everyone flicks their wrists, whether or not they have carpal tunnel syndrome!” I am particularly indebted to my family, who supported my work on many nights, weekends, and holidays in order to complete my portions of the book. I am also thankful for the great number of people who have been critical to the success of the project at the Durham Veterans Affairs Medical Center and Duke University. Beth Weast, Sarah Williams, and Phillis Scott were a tremendous help in holding things together at work and helping me find time to move the book forward. My personal and professional mentors, Gene Oddone and Dave Simel, helped me develop my passion for including best evidence in my clinical practice and teaching. I want to give special thanks to Dave. As he has done for so many, he created this opportunity for me and provided steady support, guidance, and tolerance as I made my way through each chapter. Finally, this work was only possible through collaboration with many Duke house staff and faculty members whose constant drive for understanding improved the quality of our work and elevated our standards to the highest level.

Sheri A. Keitz, MD, PhD


CHAPTER 1

A Primer on the Precision and Accuracy of the Clinical Examination

David L. Sackett, MD, MSc Epid, FRCPC

This background article will introduce and explain the terms and concepts that are being used in the series of overviews on the rational clinical examination that begins in this issue of THE JOURNAL. It includes definitions and explanations of certain key concepts, clinical examples, guides for reading clinical journals about a diagnostic test, and a blank “working table” that you can use to apply the concepts on your own. Background articles in this series will discuss selected issues in the precision and accuracy of the clinical examination in greater detail or extend them to more complex diagnostic situations. Some of these issues are also discussed in clinical epidemiology textbooks.1 Of course, the precision and accuracy of the clinical examination are not the only concerns in the clinical encounter, and their proper application provides only the starting point for decisions about how certain we need to be about a diagnosis before we act on it (the decision threshold) and how we ought to incorporate the concerns of both patients and society in deciding whether and how to act. Later background articles will discuss these additional considerations; this one will be confined to precision and accuracy. Like others in the series, this background article will be introduced with a patient.

THE PATIENT

One of your patients, whom you have not seen for several years, is admitted to the orthopedic service after a packing crate has tipped over onto his leg, producing an unstable fracture of his distal tibia and fibula. You stop by to see him as he is being prepared for surgery. He is alert and hemodynamically stable but smells of alcohol (at 10 AM) and has 3 spider nevi on his upper chest (but no gynecomastia or asterixis). He is obese, and his belly is prominent. Among the questions that are raised in your mind, the following are of special significance:
1. Is this man an alcoholic? You would place the odds for this disorder at 50-50 (and the science of the art of how clinicians generate these odds will be the subject of a later background article). The answer to this diagnostic question is important in the long run and in protecting him from the complications of acute withdrawal during and after his operation.
2. Does he have ascites? You are much less sure here, but if he is alcohol dependent you would place the odds that the prominence of his belly represents ascites also at 50-50. Again, it would be important to know whether he has this manifestation of advanced alcoholic liver damage.
Your options for answering these questions are several. To explore his possible alcohol abuse or dependency, (1) you could take the time required for a thorough confrontation and


CHAPTER 1

The Rational Clinical Examination

No. of Positive Answers to the 4 CAGE Questions | Alcohol Abuse or Dependency: Yes | Alcohol Abuse or Dependency: No | Total
3 or 4 | a = 60 (True +) | b = 1 (False +) | a + b = 61
2, 1, or none | c = 57 (False –) | d = 400 (True –) | c + d = 457
Total | a + c = 117 | b + d = 401 | a + b + c + d = 518

Figure 1-1 The CAGE Questions for Alcohol Abuse or Dependency
Characteristics: sensitivity, a/(a + c) = 60/117 = 0.51, or 51%; specificity, d/(b + d) = 400/401 = 0.998, or 99.8%.
Predictions: positive predictive value or posttest probability of having the target disorder (alcohol abuse or dependency) for patients with 3 or 4 positive responses, a/(a + b) = 60/61 = 0.98, or 98%; negative predictive value or posttest probability of not having the target disorder for patients with 2 or fewer positive responses, d/(c + d) = 400/457 = 0.88, or 88%; posttest probability of having the target disorder for patients with 2 or fewer positive responses, c/(c + d) = 57/457 = 0.12, or 12%.
Prevalence or pretest probability of having the target disorder, (a + c)/(a + b + c + d) = 117/518 = 0.23, or 23%.
Abbreviation: CAGE, cut down, annoyed, guilty, eye opener.
Adapted from Bush et al.5
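If you would like to check the arithmetic in the Figure 1-1 legend yourself, the characteristics and predictions can be recomputed from the 4 cell counts. The following is only a minimal sketch in Python and is not part of the original article; the class name `TwoByTwo` and its method names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of a 2 x 2 table, labeled as in Figure 1-1 (hypothetical helper)."""
    a: int  # positive test result, target disorder present (true positives)
    b: int  # positive test result, target disorder absent (false positives)
    c: int  # negative test result, target disorder present (false negatives)
    d: int  # negative test result, target disorder absent (true negatives)

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def positive_predictive_value(self) -> float:
        return self.a / (self.a + self.b)

    def negative_predictive_value(self) -> float:
        return self.d / (self.c + self.d)

    def prevalence(self) -> float:
        return (self.a + self.c) / (self.a + self.b + self.c + self.d)


# Cell counts taken from Figure 1-1 (3 or 4 positive CAGE answers = positive test).
cage = TwoByTwo(a=60, b=1, c=57, d=400)

print(f"sensitivity               {cage.sensitivity():.3f}")
print(f"specificity               {cage.specificity():.3f}")
print(f"positive predictive value {cage.positive_predictive_value():.3f}")
print(f"negative predictive value {cage.negative_predictive_value():.3f}")
print(f"prevalence                {cage.prevalence():.3f}")
```

The same helper can be pointed at any other 2 × 2 table of test results against a criterion standard by changing the 4 counts.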

interrogation about the amount of alcohol he consumes (and, in the process, risk alienating him, estranging the nursing staff, and exasperating yourself); (2) you could order 1 or more liver function tests; (3) you could even request one of the new, “hot” tests for platelet enzyme activity, reported to be elevated in persons with alcoholism2; or (4) you could ask him the 4 quick “CAGE” questions: Have you ever felt you should cut down on your drinking? Have people annoyed you by criticizing your drinking? Have you ever felt bad or guilty about your drinking? Have you ever had a drink first thing in the morning to steady your nerves or to get rid of a hangover (eye-opener)? This opening example in the series is all the more appropriate when we observe that the first report on the CAGE questionnaire in a general medical journal was by John Ewing3 and that it was accompanied by an editorial from a major supporter of this series, George Lundberg.4

To explore his possible ascites, (1) you could check him for shifting dullness, fluid wave, or even the puddle sign; (2) you could order an abdominal ultrasonographic examination; or (3) you could simply ask him whether he has ever had swollen ankles.

Stop for a moment and consider the implications, in terms of your time and somebody’s money, of the alternative ways of answering these 2 questions. Would it not be better if you could answer them both with just 5 quick questions (4 for CAGE and 1 about ankle swelling)? As it happens, you might be able to do just that. If he answers yes to 3 or 4 of the CAGE questions, he is an alcohol-abusing or alcohol-dependent man (and this medical history is far more powerful than any laboratory tests you can

order). If he answers no to ankle swelling, you have pretty well ruled out clinically important ascites (you could double check the latter by testing for shifting dullness; like most such patients, he did not have a fluid wave, and as you will learn in a forthcoming overview on ascites, the puddle sign is not useful in him or anybody else). Thus, for both questions, a quick bedside examination has provided definitive diagnostic information, without the need for laboratory testing or diagnostic imaging. How can we make such a bold statement about the power of these simple elements of the clinical history and physical examination? The answer lies in the science of the art of clinical diagnosis that underpins this series of overviews on the rational clinical examination. This first background article will introduce and illustrate the key elements of this science (and readers who want a more detailed discussion of what follows can consult a step-by-step discussion published elsewhere1). The background articles also are intended to convey the fun and gratification physicians derive from making correct diagnoses with crispness and dispatch.

TAKING AN ALTERNATIVE HISTORY FOR ALCOHOLISM Examine Figure 1-1. In it are shown the number of positive answers to the CAGE questions from 2 groups of patients admitted to the orthopedic or medical services of a community-based teaching hospital in Boston, Massachusetts.5 In the left-hand column are the responses from patients whose extensive evaluations (including, where indicated, detailed social histories, follow-ups, and liver biopsies) provided acceptable “proof ” that they were alcohol abusers or alcohol dependent. In the right-hand column are patients whose evaluations showed that they were not alcohol abusers or dependent. These extensive confirmatory investigations often are referred to as criterion standards of diagnosis and typically consist of definitive findings at angiography, operation, autopsy, and the like. This study is useful to clinicians because the CAGE history and the extensive (reference or criterion standard) investigations were carried out independently among a wide spectrum of well-described patients in whom it was clinically reasonable to inquire about alcohol abuse. It thus satisfies the first criterion of a valid, clinically useful article on diagnostic strategies that appears in Table 1-1 (has there been an independent, “blind” comparison with a criterion standard of diagnosis?). The readers’ guides in Table 1-1 have been used by the authors of this series on the rational clinical examination to “screen” articles for inclusion in their overviews of diagnostic approaches to specific clinical problems. 
Table 1-1 can be clipped and carried for easy reference when reading clinical articles that make claims about the usefulness of (especially new) diagnostic tests, and the reasoning behind its elements is described in detail elsewhere.1 The study that generated Figure 1-1 also satisfied the second, commonsense guide, for it was carried out in a patient sample that included an appropriate spectrum of mild and severe, treated and untreated alcoholism, plus individuals with different but commonly confused disorders. The setting for the study (a large, urban, general hospital) was described, satisfying the third readers’ guide and permitting us to determine the applicability of the results to our own setting, and the term normal (the fifth guide) was clearly and sensibly defined as the absence of alcohol abuse or dependency (we shall return to the fourth guide of reproducibility later). The authors of the CAGE study were not proposing that their questions be used as part of an extensive series (“cluster”) of diagnostic tests (so the sixth guide does not apply), and the questions were presented with their exact wording in the article, satisfying the seventh guide and permitting their exact application in the reader’s own practice. The final readers’ guide (has the utility of the test been determined?) is satisfied to the extent that the CAGE questions recognized far more persons with alcoholism, especially alcohol abusers, than routine clinical diagnosis and made them candidates for treatment and counseling. In summary, the CAGE study observed the methodologic standards required for a valid and clinically useful description of the clinical applicability of any diagnostic information, whether it comes from the clinical history, the physical examination, or the diagnostic laboratory.

Table 1-1 Readers’ Guides for an Article About a Diagnostic Test
1. Has there been an independent, “blind” comparison with a criterion standard of diagnosis?
2. Has the diagnostic test been evaluated in a patient sample that included an appropriate spectrum of mild and severe, treated and untreated disease, plus individuals with different but commonly confused disorders?
3. Was the setting for this evaluation, as well as the filter through which study patients passed, adequately described?
4. Have the reproducibility of the test result (precision) and its interpretation (observer variation) been determined?
5. Has the term normal been defined sensibly as it applies to this test?
6. If the test is advocated as part of a cluster or sequence of tests, has its individual contribution to the overall validity of the cluster and sequence been determined?
7. Have the tactics for carrying out the test been described in sufficient detail to permit their exact replication?
8. Has the utility of the test been determined?


THE PRECISION OF THE CLINICAL EXAMINATION For an item of the clinical history or physical examination to be accurate, it first must be precise. That is, we need to have some confidence that 2 clinicians examining the same, unchanged patient would agree with each other on the presence or absence of the symptom (such as our patient’s answer to one of the CAGE questions) or sign (such as the presence of spider nevi on our patient’s chest). The precision (often appearing under the name of “observer variation” in the clinical literature) of such clinical findings can be quantitated.6 Suppose 2 clinicians recorded whether they found spider nevi when they independently examined the same 100 patients suspected of having liver disease and generated the data shown in Figure 1-2. The 2 clinicians agreed that 23 of the patients (cell a) had spider nevi and that 66 patients (cell d) did not; thus, they agreed on (23 + 66)/100 = 89% of the patients they examined. However, 6 patients (cell c) judged to have spider nevi by the first clinician were judged not to have nevi by the second, and 5 patients (cell b) judged to have spider nevi by the second clinician were judged not to have nevi by the first. How should we interpret this precision? Is this degree of clinical agreement good, or should we expect better? We might begin by recognizing that some clinical agreement would occur by chance alone. For example, if the second clinician merely tossed a coin for each patient instead of carrying out an examination, reporting nevi if the coin came up “heads” and no nevi if it came up “tails,” agreement would be 50%. We should begin, then, by determining how much of the observed agreement of 89% was because of chance, so that we can find out how much real clinical skill (agreement beyond chance) was being displayed by these clinicians.

Primer on Precision and Accuracy

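Second Clinician’s Examination for Spider Nevi | First Clinician’s Examination for Spider Nevi: Positive | First Clinician’s Examination for Spider Nevi: Negative | Total
Positive | a = 23 (Expect 8) | b = 5 | a + b = 28
Negative | c = 6 | d = 66 (Expect 51) | c + d = 72
Total | a + c = 29 | b + d = 71 | a + b + c + d = 100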

Figure 1-2 The Precision of the Clinical Examination for Spider Nevi
Observed agreement: (a + d)/(a + b + c + d) = (23 + 66)/100 = 89%.
Expected agreement: for cell a, ([a + b] × [a + c])/(a + b + c + d) = (28 × 29)/100 = 8; for cell d, ([c + d] × [b + d])/(a + b + c + d) = (72 × 71)/100 = 51. Calculate expected agreement as (expected a + expected d)/(a + b + c + d) = (8 + 51)/100 = 59%.
Agreement beyond chance = κ = (observed agreement – expected agreement)/(100% – expected agreement) = (89% – 59%)/(100% – 59%) = 0.73.
Conventional levels of κ: slight, 0.0-0.2; fair, 0.2-0.4; moderate, 0.4-0.6; substantial, 0.6-0.8; almost perfect, 0.8-1.0.
Adapted from Lundberg.4
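The calculation in the Figure 1-2 legend can also be sketched in a few lines of Python. This is an illustration rather than part of the original article: the function name `kappa` and its argument order are assumptions, and the expected agreement is computed without the intermediate rounding to whole patients used in the legend.

```python
def kappa(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Return (observed agreement, expected agreement, kappa) for a 2 x 2
    agreement table whose cells are labeled as in Figure 1-2."""
    n = a + b + c + d
    observed = (a + d) / n
    # Marginal cross-products: chance-expected counts for cells a and d.
    expected_a = (a + b) * (a + c) / n
    expected_d = (c + d) * (b + d) / n
    expected = (expected_a + expected_d) / n
    # Achieved agreement beyond chance as a share of the potential agreement beyond chance.
    return observed, expected, (observed - expected) / (1 - expected)


# Counts from Figure 1-2: both positive, second only, first only, both negative.
observed, expected, k = kappa(23, 5, 6, 66)
print(f"observed {observed:.2f}, expected {expected:.2f}, kappa {k:.2f}")
```

With the spider nevi data this gives an observed agreement of 0.89, an expected agreement of about 0.59, and a κ of about 0.73, in line with the legend.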

Chance agreement can be calculated by the formal process of “marginal cross-products” shown in Figure 1-2, but it also can be thought of as a coin toss in which, for example, the first clinician’s coin came up heads 29% of the time (based on [a + c]/[a + b + c + d]). Thus, 29% of the 28 patients judged to have spider nevi by the second clinician (a + b) would also be judged to have them by the first clinician, and 29% of 28 is 8 (the number of patients we would expect to

find in cell a by chance alone). Similarly, the first clinician’s coin came up tails 71% of the time ([b + d]/[a + b + c + d]), and 71% of the 72 patients judged to be free of spider nevi by the second clinician (c + d) is 51 (the expected value for cell d). As a result, we would expect the 2 clinicians to agree (8 + 51)/100, or 59% of the time, on the basis of chance alone, and the remaining potential agreement beyond chance is therefore 100% – 59%, or 41%. How much of this 41% potential agreement beyond chance was achieved? This is determined by comparing it with the actual agreement beyond chance of 89% – 59%, or 30%, and 30%/41% comes to 0.73, which means that about three-fourths of the potential agreement beyond chance was achieved by our 2 clinicians. This measure of agreement goes by the name κ and is rather like a correlation coefficient.1 It ranges from –1.0 (where 2 clinicians would be in perfect disagreement), through 0.0 (where only chance agreement was accomplished), to +1.0 (where 2 clinicians would be in perfect agreement). As you can see in the listing of “conventional levels of κ” that appears in the legend for Figure 1-2, the agreement between our 2 clinicians is considered “substantial,” and this is the case for many “present/ absent” aspects of the physical examination. As you might imagine, agreement is greater still when the 2 examinations are carried out by the same clinician. Other items on the clinical examination do not fare as well. For example, in one study of the chest examination, the κ for cyanosis, tachypnea, and whispered pectoriloquy was 0.36, 0.25, and 0.11, respectively.7 No measure of clinical agreement is ideal, and κ is no exception. Its size is slightly affected by the frequency of the abnormal finding in the group of patients being examined (it is highest when half of the patients have the finding and tails off a bit when the finding is extremely common or uncommon). 
If your and our interests warrant, we shall come back to this in a subsequent background article. But, of course, high precision is not enough, for examiners may be consistent but wrong in their assessments. All 5 members of my clinical team occasionally fail to detect a big liver or hear an important diastolic murmur. In other cases, clinicians may be neither precise nor accurate. For example, a group of iridologists was asked to examine the irises of a series of patients and distinguish those with gallstones from those who had sonographically empty gallbladders.8 Their clinical agreement was only “slight,” with an average κ of 0.18 (about like whispered pectoriloquy). More important, however, their diagnostic accuracy was no better than chance: they missed about half the patients with gallstones (sensitivity, 54%) and diagnosed gallstones in about half the patients with negative sonogram results (specificity, 52%). To understand sensitivity and specificity, we must now shift from determining the precision of the clinical examination to defining the characteristics of its accuracy.

THE CHARACTERISTICS OF THE ACCURACY OF DIAGNOSTIC TESTS

Returning our attention to Figure 1-1, we can examine the accuracy characteristics of the CAGE questions. The 60

patients in cell a of Figure 1-1 answered yes to 3 or 4 of the CAGE questions and constitute 51%, or 0.51, of all the 117 patients (a + c) with a positive diagnosis of alcohol dependency or abuse. The shorthand term for this proportion of 0.51, or a/(a + c), is sensitivity, and it is a useful measure of how well a diagnostic test (whether a symptom, sign, or laboratory test) detects a target disorder when it is present. The closer the sensitivity to 100%, the more “sensitive” the clinical or laboratory finding. In the right-hand column are the responses from patients for whom the criterion standard ruled out the diagnosis of problem drinking. The 400 patients in cell d answered yes to 2, only 1, or none of the CAGE questions and constitute 99.8%, or 0.998, of all the 401 patients (b + d) who did not have alcohol dependency or abuse. The shorthand term for this proportion of 0.998, or d/(b + d), is specificity, and it is a useful measure of how often a symptom, sign, or other diagnostic test is absent when the target disorder is not present. The closer the specificity to 100%, the more “specific” the clinical or laboratory finding. (Of course, clinicians are not interested in sensitivity and specificity as such but in their effect on the interpretation of positive and negative findings, and we shall get to that shortly. Sensitivity and specificity are properties that must be established beforehand, and that is why they are presented here.) You will observe that the sensitivity of the CAGE questions is not impressive. The number of “true positives” in cell a is almost equaled by the number of “false negatives” in cell c, and the sensitivity of only 51% confirms that it “misses” about half the problem drinkers. On the other hand, the specificity of the CAGE questions is outstanding. 
The number of “true negatives” in cell d vastly outnumbers the number of “false positives” in cell b, and the specificity of 99.8% confirms that it almost never labels a patient as a problem drinker when this disorder is absent. Now we can consider the “predictions” we make about our patient according to the foregoing characteristics. Because of the high specificity, virtually every patient in cell a who answered yes to 3 or 4 of the CAGE questions (a + b) has the target disorder, alcohol abuse or dependency, and the shorthand term for this proportion a/(a + b), which is 60/61, or 98%, is the positive predictive value or posttest probability of having the target disorder (among patients with 3 or more positive answers). Moreover, despite the rather unimpressive sensitivity, most of the patients in cells c and d who answered yes to none, just 1, or 2 of the CAGE questions were in cell d and did not have the target disorder. The shorthand term for this proportion d/(c + d), which is 400/457, or 88%, is the negative predictive value or posttest probability of not having the target disorder among those patients with 2 or fewer positive answers. The complement of this negative predictive value, or c/(c + d), describes the posttest probability of having the disorder among those patients with 2 or fewer positive answers, and this other way of saying the same thing is found useful by some clinicians. The reason that the negative predictive value looks relatively high, despite the low sensitivity, lies in the fact that the proportion of all patients in this study who had alcohol

dependency or abuse, (a + c)/(a + b + c + d), or 117/518, was only 23% to begin with. That is, 100% – 23%, or 77%, of the patients were not alcohol dependent before they were asked any questions. The shorthand term for the previous knowledge contained in this (a + c)/(a + b + c + d) is prevalence or, more usefully, the pretest probability of the target disorder (because this pretest probability is the starting point for making clinical use of the test characteristics, we will place it above the “predictions” entries in subsequent figures). In contrast to this pretest probability of 23% in the clinical article describing the CAGE questions, in our patient, we judged that the pretest probability of alcohol abuse or dependency was 50%. How would the CAGE questions perform in patients like ours? If the patients in the study summarized in Figure 1-1 were like our own patient, we would expect the result shown in Figure 1-3. As long as the patient “mix” and severity of disease in the CAGE study summarized in Figure 1-1 are similar to the patient mix and severity of disease in our practice, we would expect sensitivity and specificity to remain constant, despite changes from the study’s to our patient’s pretest probability of the target disorder. Thus, the sensitivity (51%) and specificity (99.8%) in Figure 1-3 are the same as those in Figure 1-1. Notice, however, that the negative predictive value has decreased from 88% to 67% because predictive values must change with changes in the prevalence of the target disorder. One useful way to think about this is to carry through this concept of prevalence. After all, the predictive value of a positive test result is simply the prevalence of the target disorder among those patients with positive test results. Similarly, the negative predictive value is the prevalence of not having the target disorder among patients with a negative test result. 
No wonder, then, that predictive values must change with a change in the overall prevalence of the target disorder.
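One way to see this dependence on prevalence is to hold sensitivity and specificity fixed and recompute the predictive values for a different pretest probability. The Python sketch below is an illustration only and is not part of the original article; the function name `predictive_values` is an assumption, and it uses the rounded sensitivity (0.51) and specificity (0.998) that underlie Figure 1-3.

```python
def predictive_values(sensitivity: float, specificity: float,
                      pretest_probability: float) -> tuple[float, float]:
    """Return (positive predictive value, negative predictive value) for a
    test applied at the given pretest probability of the target disorder."""
    p = pretest_probability
    # Proportions of all patients falling in each cell of the 2 x 2 table.
    true_pos = sensitivity * p
    false_neg = (1 - sensitivity) * p
    false_pos = (1 - specificity) * (1 - p)
    true_neg = specificity * (1 - p)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pretest probability of 50%, as judged for the patient in the text.
ppv, npv = predictive_values(0.51, 0.998, 0.50)
print(f"positive predictive value {ppv:.3f}")  # about 0.996
print(f"negative predictive value {npv:.2f}")  # about 0.67
```

Calling the same function with other pretest probabilities shows how far the predictive values move while sensitivity and specificity stay where they are.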

BACK TO THE PATIENT Your patient readily admitted that he had cut down on his drinking, that his spouse and workmates had annoyed him by complaining about his drinking, and that he often needed an “eye opener” to get going in the morning. According to this quick medical history, and given your previous judgment (before you had any knowledge of his responses to any of these questions) that his chances of being alcohol dependent were 50-50 (ie, a pretest probability of 50%), you can follow his response through Figure 1-3 and conclude that his posttest probability of alcohol dependency is 99.6%, or about as certain as you ever can be about any diagnosis. Your patient helps us make another general point: because he gave a positive response to a diagnostic history whose specificity was extremely high (99.8%), you “ruled in” the target disorder. A simple way of remembering this property of a powerful diagnostic test is the acronym SpPin: when specificity is extremely high, a positive test result rules in the target disorder. Would the laboratory tests you were considering ordering have saved you some time and done a better job of determining this diagnosis? In fact, and in addition to delaying the diagnosis,

No. of Positive Answers to the 4 CAGE Questions | Alcohol Abuse or Dependency: Yes | Alcohol Abuse or Dependency: No | Total
3 or 4 | a = 510 (True +) | b = 2 (False +) | a + b = 512
2, 1, or none | c = 490 (False –) | d = 998 (True –) | c + d = 1488
Total | a + c = 1000 | b + d = 1000 | a + b + c + d = 2000

Figure 1-3 The CAGE Questions for Alcohol Abuse or Dependency When the Pretest Probability Is 50% Characteristics: sensitivity, a/(a + c) = 510/1000 = 0.51, or 51%; specificity, d/(b + d ) = 998/1000 = 0.998, or 99.8%. Prevalence or pretest probability of having the target disorder, (a + c)/(a + b + c + d ) = 1000/2000 = 50%. Predictions: positive predictive value or posttest probability of having the target disorder for patients with 3 or 4 positive responses, a/(a + b) = 510/512 = 0.996, or 99.6%; negative predictive value or posttest probability of not having the target disorder for patients with 2 or fewer positive responses, d/(c + d ) = 998/1488 = 0.67, or 67%; posttest probability of having the target disorder for patients with 2 or fewer positive responses, c/(c + d ) = 490/1488 = 0.33, or 33%. Abbreviation: CAGE, cut down, annoyed, guilty, eye opener.

their accuracy is much worse. In the same investigation that studied the CAGE questions, the specificities for γ-glutamyl transpeptidase, mean corpuscular volume, and an entire liver function battery were only 76%, 64%, and 81%, respectively.3 Moreover, the hot new test of platelet enzyme activity has a specificity of only 73%.2 Thus, in your patient, a simple medical history was not only quicker and easier but also far more specific.

What about his possible ascites? Given that you have established the diagnosis of alcohol dependency, you already can plan his perioperative and postoperative management to prevent, detect, and treat alcohol withdrawal syndromes. Nonetheless, you would like to know whether he has sufficient liver damage to affect his handling of the sorts of drugs he is likely to receive. Given his fractured ankle, the kneeling position required for eliciting the puddle sign is out of the question, and even a test for shifting dullness will cause him considerable pain. He has already been to radiology, and you do not want him to make the trip again for an abdominal ultrasonographic examination if you can avoid it. His uninvolved ankle is not swollen now, and he tells you he has never had ankle swelling in the past. Would this simple medical history for previous ankle swelling be of any use? Figure 1-4 summarizes a study of 63 patients admitted to a general medical service in Durham, North Carolina.9 Of 15 patients with ascites on abdominal ultrasonographic examination (the criterion standard), 14 had a history of ankle swelling, for an impressive sensitivity of 93%. If we applied this sensitivity (93%) and specificity (66%) to our pretest probability for ascites of 50%, the result (shown in Figure 1-5) suggests

History of Ankle Swelling | Ascites on Abdominal Ultrasonography: Present | Ascites on Abdominal Ultrasonography: Absent | Total
Yes | a = 14 | b = 16 | a + b = 30
No | c = 1 | d = 32 | c + d = 33
Total | a + c = 15 | b + d = 48 | a + b + c + d = 63

Figure 1-4 Relationship Between a History of Ankle Swelling and Ascites Characteristics: sensitivity, a/(a + c) = 14/15 = 0.93, or 93%; specificity, d/(b + d ) = 32/48 = 0.67, or 67%. Prevalence or pretest probability of having the target disorder, (a + c)/(a + b + c + d ) = 15/63 = 24%. Predictions: positive predictive value or posttest probability of having the target disorder for patients with a history of ankle swelling, a/(a + b) = 14/30 = 0.47, or 47%; negative predictive value or posttest probability of not having the target disorder for patients with a negative history for ankle swelling, d/(c + d ) = 32/33 = 0.97, or 97%; posttest probability of having the target disorder for patients with a negative history for ankle swelling (adapted from Simel et al9 ), c/(c + d ) = 1/33 = 0.03, or 3%.

History of Ankle Swelling | Ascites on Abdominal Ultrasonography: Present | Ascites on Abdominal Ultrasonography: Absent | Total
Yes | a = 93 | b = 34 | a + b = 127
No | c = 7 | d = 66 | c + d = 73
Total | a + c = 100 | b + d = 100 | a + b + c + d = 200

Figure 1-5 Relationship Between a History of Ankle Swelling and Ascites When the Pretest Probability Is 50% Characteristics: sensitivity, a/(a + c) = 93/100, or 93%; specificity, d/(b + d ) = 66/100 = 0.66, or 66%. Prevalence or pretest probability of having the target disorder, (a + c)/(a + b + c + d ) = 100/200 = 0.5, or 50%. Predictions: positive predictive value or posttest probability of having the target disorder for patients with a history of ankle swelling, a/(a + b) = 93/127 = 0.73, or 73%; negative predictive value or posttest probability of not having the target disorder for patients with a negative history for ankle swelling, d/(c + d ) = 66/73 = 0.90, or 90%; posttest probability of having the target disorder for patients with a negative history for ankle swelling, c/(c + d ) = 7/73 = 0.10, or 10%.


that the posttest probability of not having ascites is 90% when the patient denies ankle swelling. Again, this simple element of the clinical history provides powerful diagnostic information: when the sensitivity of a symptom or sign is high, a negative response rules out the target disorder, and the acronym for this property is SnNout. However, you may have observed that this study included only 15 patients with ascites, and you may well inquire how confident we should feel about this sensitivity of 0.93. As it happens, the degree of confidence we ought to place in this (or any other) estimate of sensitivity (or specificity) can be calculated and expressed as a confidence interval, within which you can be confident that the true sensitivity resides, say, 95% of the time.1 In this case, the 95% confidence interval on this sensitivity of 0.93 based on 15 patients runs all the way from 0.81 (not terribly sensitive) to 1.00 (or perfect sensitivity). If, on the other hand, this sensitivity of 0.93 were based on 100 patients with ascites, the 95% confidence interval would run from 0.88 to 0.98, and you would be justified in being more confident that a negative medical history rules out ascites. Thus, you should look for information on the 95% confidence interval for measures of accuracy such as sensitivity and specificity when you read about them.
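The text does not say which method produced these confidence intervals. As a rough check, the simple normal approximation for a proportion, truncated at 1.00, happens to reproduce both quoted intervals; the Python sketch below rests on that assumption, and the function name `approximate_interval` is hypothetical. Other methods (for example, exact binomial or Wilson intervals) give somewhat different limits, especially with only 15 patients.

```python
import math


def approximate_interval(positives: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval for a proportion,
    truncated to the range 0 to 1 (assumed method, not stated in the text)."""
    p = positives / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Sensitivity of 14/15 observed in the ascites study.
low, high = approximate_interval(14, 15)
print(f"15 patients:  {low:.2f} to {high:.2f}")

# The same sensitivity if it had been based on 100 patients with ascites.
low_100, high_100 = approximate_interval(93, 100)
print(f"100 patients: {low_100:.2f} to {high_100:.2f}")
```

The interval narrows as the number of patients with the target disorder grows, which is the point of the comparison between 15 and 100 patients.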

A FASTER AND MORE POWERFUL APPROACH: THE LIKELIHOOD RATIO

Many of the overviews in this series will describe not only the sensitivity and specificity of specific symptoms and signs but also their likelihood ratios (LRs). This method of describing the accuracy of diagnostic information, once mastered, is much faster and more powerful than the sensitivity and specificity approach.1 It is shown in Figure 1-6 for ankle swelling and ascites. In brief, an LR expresses the odds that a given finding on the medical history or physical examination would occur in a patient with, as opposed to a patient without, the target disorder. When a finding’s LR is above 1.0, the probability of disease increases (because the finding is more likely among patients with than without the disorder); when the LR is below 1.0, the probability of disease decreases (because the finding is less likely among patients with than without the disorder); finally, when the LR is close to 1.0, the probability of disease is unchanged (because the finding is equally likely in patients with and without the disorder). LRs are related to sensitivity and specificity but possess some advantages for clinicians. In a 2 × 2 table such as Figure 1-6, the LR for a positive history of ankle swelling is equal to sensitivity/(1 – specificity) or 0.93/0.33, or 2.8, indicating that a positive history is almost 3 times as likely to be obtained from a patient with, as opposed to a patient without, ascites. The LR for a negative history of ankle swelling is equal to (1 – sensitivity)/specificity or 0.07/0.67, or 0.10, indicating that a negative history is only one-tenth as likely to be obtained from a patient with, as opposed to a patient without, ascites (and confirming our earlier conclusion that this negative history permitted us to SnNout this diagnosis). The first advantage of LRs is that the LR for a given finding, when applied to the pretest odds of the target disorder, generates the posttest odds for that disorder. 
Because the LR is expressed as an odds, this may at first appear cumbersome, for it means that the pretest probability must also be expressed as an odds (although this is tedious to do by hand, later, we will show you how to avoid the calculations by using the nomogram shown in Figure 1-7). When done by hand, the pretest probability of the target disorder is converted into pretest odds by the formula:

Pretest odds = Probability of having the target disorder/Probability of not having the target disorder

In Figure 1-6, the pretest probability of ascites is 0.24, and the pretest probability of not having ascites is 1.00 – 0.24, or 0.76. Therefore, the pretest odds of ascites are 0.24/0.76, or 0.32, and this can be multiplied by 2.8 (generating a posttest odds of ascites of 0.90) when the history is positive for ankle swelling and by 0.10 (generating a posttest odds of 0.03) when this history is negative. These posttest odds can then be converted back to probabilities by the formula:

Posttest probability of the target disorder = Posttest odds/(Posttest odds + 1)

Thus, the posttest odds of 0.90 following from a positive history of ankle swelling converts (by 0.90/1.90) to 47%, and the posttest odds of ascites of 0.03 following from a negative history converts (by 0.03/1.03) to 3%, and you will observe that these are the same values for the posttest probability of having ascites that we generated in Figure 1-4. The necessity for converting probability to odds and back again can be obviated by using the nomogram shown in Figure 1-7, which has already carried out the conversions for us.1 You can prove this to yourself as follows: anchor a straightedge at the left margin of the nomogram, at the pretest probability of 24%, and rotate the straightedge until it intersects the middle line of the nomogram at an LR of 2.8, corresponding to a positive history of ankle swelling. It will intersect the right margin of the nomogram at just below 50%. 
Similarly, rotate the straightedge until it intersects an LR of 0.10 for the negative history and observe that the posttest probability of ascites decreases to 3%. The second advantage of LRs becomes apparent when we see that the nomogram permits us to determine the probability of ascites when the pretest probability changes from 24% in Figure 1-4 to 50% in Figure 1-5 without having to construct the latter. We can simply reanchor the straightedge at 50% and run it across the LRs of 2.8 and 0.10 as before, intersecting the posttest probability line at about 73% and 10%. The third advantage of LRs is that, unlike sensitivity and specificity (which limit the number of test results to just 2 levels, “positive” and “negative”), they can be generated for multiple levels of the diagnostic test result. At each level, the proportion of patients with the target disorder whose result falls at this level is divided by the proportion of patients without the target disorder whose result falls at this same level; the result is the LR for this level. This is shown in Table 1-2, in which LRs for 4, 3, 2, and 1 and no positive responses to the CAGE questionnaire are shown (the awkward, infinitely high LR for 4 positive answers can be avoided if 3 and 4 positive answers are combined, generating an LR of 206 for the combination).

Primer on Precision and Accuracy

                             Presence of Ascites on Abdominal Ultrasonography
History of Ankle Swelling    Present              Absent               Total
Yes                          a = 14 (0.93)        b = 16 (0.33)        a + b = 30
No                           c = 1 (0.07)         d = 32 (0.67)        c + d = 33
Total                        a + c = 15 (1.00)    b + d = 48 (1.00)    a + b + c + d = 63

Figure 1-6 Likelihood Ratios for a History of Ankle Swelling in Diagnosing Ascites
Characteristics: sensitivity/(1 – specificity) = likelihood ratio (LR) (of having the target disorder) for a positive test result = (a/[a + c])/(b/[b + d]) = 0.93/0.33 = 2.8; (1 – sensitivity)/specificity = LR (of having the target disorder) for a negative test result = (c/[a + c])/(d/[b + d]) = 0.07/0.67 = 0.10.
Pretest probability: prevalence or pretest probability of having the target disorder, (a + c)/(a + b + c + d) = 15/63 = 24%.
Predictions: posttest probability of the target disorder (expressed as odds) = pretest probability of the target disorder (expressed as odds) × LR for the test result.
Positive history: pretest odds, 0.24/0.76 = 0.32; posttest odds, 0.32 × 2.8 = 0.90; posttest probability, 0.90/1.90 = 47%.
Negative history: pretest odds, 0.24/0.76 = 0.32; posttest odds, 0.32 × 0.10 = 0.03; posttest probability, 0.03/1.03 = 3%.
Adapted from Simel et al.9

Figure 1-7 A Nomogram for Applying Likelihood Ratios Adapted from Sackett et al.1


Table 1-2 Multiple Levels of Responses to the CAGE Questions for Alcohol Abuse or Dependency a,b

No. of Positive Answers        Alcohol Abuse or Dependency
to the 4 CAGE Questions        Yes            No             Likelihood Ratio
4                              23 (0.20)      0              ∞
3                              37 (0.32)      1 (0.002)      127
2                              28 (0.24)      14 (0.03)      6.8
1                              11 (0.09)      28 (0.07)      1.3
0                              18 (0.15)      358 (0.89)     0.17
Total                          117            401

Abbreviation: CAGE, cut down, annoyed, guilty, eye opener.
a Adapted from Bush et al.5
b Numbers in parentheses are proportions of the respective columns.
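The level-by-level calculation behind Table 1-2 can be reproduced directly from the counts. The sketch below is illustrative only (the helper name `level_lr` is ours); because it divides unrounded proportions, a value may differ in the last digit from the table, which reports rounded figures.

```python
# Counts from Table 1-2: positive CAGE answers -> (with disorder, without disorder).
cage_counts = {4: (23, 0), 3: (37, 1), 2: (28, 14), 1: (11, 28), 0: (18, 358)}


def level_lr(with_disorder: int, without_disorder: int,
             total_with: int, total_without: int) -> float:
    """LR for one result level; infinite when no unaffected patient has it."""
    if without_disorder == 0:
        return float("inf")
    return (with_disorder / total_with) / (without_disorder / total_without)


total_with = sum(w for w, _ in cage_counts.values())        # 117
total_without = sum(wo for _, wo in cage_counts.values())   # 401

lrs = {level: level_lr(w, wo, total_with, total_without)
       for level, (w, wo) in cage_counts.items()}

# Pooling 3 and 4 positive answers avoids the infinite LR.
lr_3_or_4 = level_lr(23 + 37, 0 + 1, total_with, total_without)
```

Pooling adjacent levels trades some resolution for a finite, usable LR, which is the choice the text describes for 3 or 4 positive answers.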

The fourth advantage of the LR strategy is that the posttest probability of the target disorder obtained from the first item of diagnostic information (say, a history of ankle swelling) is the pretest probability of that diagnosis for the next item of diagnostic information (say, the physical examination for ankle edema). This example also identifies the problem we always face when we combine diagnostic information from the medical history and physical examination (and chemistry laboratory, and radiology suite!): the results of the medical history and physical examination are not independent of each other. Thus, a patient with a positive history of swollen ankles is far more likely to have pedal edema than a patient with a negative history, and we must either use an LR that considers the 2 items as a pair or modify the LR for the second according to the results of the first. This issue of independence, along with the consideration of the site (primary care or a tertiary hospital) where the examination is carried out, will be taken up in a subsequent background article in this series.

CONCLUSION

This first background article has described readers’ guides for articles about diagnostic information and has shown how diagnostic data derived from the medical history and physical examination can be assessed for their precision and accuracy. It concludes with a working table (Figure 1-8) and glossary that can be photocopied or clipped. Kept handy, they can help readers study and understand the overviews published in this and subsequent issues of the series on the rational clinical examination.

Author Affiliations at the Time of the Original Publication

Departments of Medicine and Clinical Epidemiology and Biostatistics, Faculty of Medicine, McMaster University, and the Hamilton Civic Hospitals, Hamilton, Ontario, Canada.

Acknowledgments

I wish to thank Andreas Laupacis, MD, David Simel, MD, Richard Deyo, MD, Drummond Rennie, MD, James Nishikawa, MD, Charles Goldsmith, PhD, and the JAMA reviewers for their kind and helpful responses to earlier versions of this background article.

                          Target Disorder
Diagnostic Test Result    Present      Absent       Total
Positive                  a            b            a + b
Negative                  c            d            c + d
Total                     a + c        b + d        a + b + c + d

Figure 1-8 Working Table for the Reader’s Use

For accuracy
Sensitivity = a/(a + c); SnNout: when sensitivity is high, a negative test result rules out the target disorder.
Specificity = d/(b + d); SpPin: when specificity is high, a positive test result rules in the target disorder.
Positive predictive value or posttest probability of having the target disorder among patients with positive test results, a/(a + b).
Negative predictive value or posttest probability of not having the target disorder among patients with negative test results, d/(c + d).
Posttest probability of having the target disorder for patients with negative test results, c/(c + d).
Prevalence or pretest probability of having the target disorder, (a + c)/(a + b + c + d).
Sensitivity/(1 – specificity) = likelihood ratio (LR) (of having the target disorder) for a positive test result = (a/[a + c])/(b/[b + d]).
(1 – sensitivity)/specificity = LR (of having the target disorder) for a negative test result = (c/[a + c])/(d/[b + d]).
Posttest probability of the target disorder (expressed as odds) = pretest probability of the target disorder (expressed as odds) × LR for the test result.

For precision (and κ)
Observed agreement: (a + d)/(a + b + c + d)
Expected agreement:
Expected cell a, ([a + b] × [a + c])/(a + b + c + d)
Expected cell d, ([c + d] × [b + d])/(a + b + c + d)
Calculate expected agreement as (expected a + expected d)/(a + b + c + d).
Agreement beyond chance = κ = (observed agreement – expected agreement)/(100% – expected agreement)
Conventional levels of κ: slight, 0.0-0.2; fair, 0.2-0.4; moderate, 0.4-0.6; substantial, 0.6-0.8; almost perfect, 0.8-1.0.
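The formulas in the working table (Figure 1-8) translate directly into code. The following sketch is ours, not the author's; the function and key names are assumptions, and it applies the accuracy formulas to the ascites counts from Figure 1-6. It does not guard against division by zero (for example, a specificity of 100% makes the positive LR undefined).

```python
def accuracy_measures(a: int, b: int, c: int, d: int) -> dict:
    """Accuracy measures for a 2 x 2 table laid out as in Figure 1-8."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "prevalence": (a + c) / (a + b + c + d),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


def kappa(a: int, b: int, c: int, d: int) -> float:
    """Agreement beyond chance when the table cross-classifies 2 observers."""
    n = a + b + c + d
    observed = (a + d) / n
    expected_a = (a + b) * (a + c) / n
    expected_d = (c + d) * (b + d) / n
    expected = (expected_a + expected_d) / n
    return (observed - expected) / (1 - expected)


ascites = accuracy_measures(a=14, b=16, c=1, d=32)  # Figure 1-6 counts
```

For κ, the same a, b, c, d layout is reused with rows and columns standing for 2 observers rather than test result and target disorder.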

REFERENCES

1. Haynes RB, Sackett DL, Guyatt GH, Tugwell P. Clinical Epidemiology: How to Do Clinical Practice Research. 3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2005.
2. Tabakoff B, Hoffman PL, Lee JM, Saito T, Willard B, de Leon-Jones F. Differences in platelet enzyme activity between alcoholics and nonalcoholics. N Engl J Med. 1988;318(3):134-139.
3. Ewing JA. Detecting alcoholism: the CAGE questionnaire. JAMA. 1984;252(14):1905-1907.
4. Lundberg GD. Ethyl alcohol: ancient plague and modern poison. JAMA. 1984;252(14):1911-1912.
5. Bush B, Shaw S, Cleary P, Delbanco TL, Aronson MD. Screening for alcohol abuse using the CAGE questionnaire. Am J Med. 1987;82(2):231-235.
6. Theodossi A, Knill-Jones RP, Skene A, et al. Inter-observer variation of symptoms and signs in jaundice. Liver. 1981;1(1):21-32.
7. Spiteri MA, Cook DG, Clarke SW. Reliability of eliciting physical signs in examination of the chest. Lancet. 1988;331(8590):873-875.
8. Knipschild P. Looking for gall bladder disease in the patient’s iris. BMJ. 1988;297(6663):1578-1581.
9. Simel DL, Halvorsen RA Jr, Feussner JR. Quantitating bedside diagnosis: clinical evaluation of ascites. J Gen Intern Med. 1988;3(5):423-428.

UPDATE: Primer on Precision and Accuracy

Prepared by David L. Simel, MD, MHS
Reviewed by Sheri Keitz, MD, PhD

UPDATED SUMMARY ON PRECISION AND ACCURACY OF THE CLINICAL EXAMINATION

Original Review
Sackett DL. A primer on the precision and accuracy of the clinical examination. JAMA. 1992;267(19):2638-2644.

WHAT IS THERE TO UPDATE?

Each of the updates in The Rational Clinical Examination systematically evaluates the newly published literature on the topic, except this one. Updating the Primer requires a different approach to fulfill the original promise that the series would address methodologic concerns beyond precision and accuracy. We take a utilitarian approach, driven by the topic updates themselves. The updates and our own lectures on the rational clinical examination unearthed topics that we need to address. Rather than conducting a systematic review of quality measures, sensitivity, specificity, likelihood ratios (LRs), and a plethora of related topics, we provide background information and answers to questions that our own authors required when preparing their reviews and updates.

Of course, the basic premise for diagnosis has not changed since the Primer (or since Thomas Bayes figured it out more than 2 centuries ago):

Prior odds × LR = Posterior odds

For the clinical examination, this means we (1) use information about the probability of a target disorder (frequently taken as the prevalence, which is then converted to the prior odds) and then (2) apply the results of symptoms or signs (in the form of an LR). After applying the LR associated with various symptoms and signs, we get the posterior odds of disease. The probability of disease increases when a clinical finding is more likely in a patient with the target disorder (reflected by an LR > 1). The probability of disease decreases when a clinical finding is more likely to occur in a patient without the target disorder (reflected by an LR < 1). The resultant probability becomes the “posterior” probability because the prior probability is established first and then modified with information from the medical history and physical examination quantitatively expressed in the form of the LR.* Keeping the simple equation in mind focuses the goal of The Rational Clinical Examination series articles on providing all the data needed to solve the posterior odds equation.

*Do not be confused by the transition between odds in the equation and our discussion of probability. The equation requires that we use odds, but clinicians find it easier to think in terms of probability. We can convert any probability of disease to odds by the equation odds = probability of disease/probability of no disease. After we convert the prior probability to odds and multiply it by the LR to get the posterior odds, we convert the result back to the probability of disease by the equation probability = odds/(1 + odds).

Why LRs?

In the Primer, we emphasized the role of the univariate LR for clinicians. The term univariate means the results for 1 finding, without regard to the findings of other historical or clinical features. We chose this route for a variety of reasons, the most important being its fundamental property that allows clinicians to apply the values to individual patients in a consistent pattern. LRs always convey the same information: they quantify the change in odds of disease for a particular test result. By tradition for dichotomous test results, we call the LR associated with a positive test the LR+ (positive LR), whereas the LR associated with a negative test is the LR– (negative LR). In either case, the actual LR value is related to the change in likelihood that the patient has the disease of interest. Thus, there can be no confusion, as is sometimes the case when physicians become overwhelmed with how to translate positive predictive value, true-positive rate, false-positive rate, negative predictive value, true-negative rate, or false-negative rate into a change in the likelihood of disease for an individual patient.

Many clinicians feel more comfortable with the terms sensitivity and specificity. However, these values in and of themselves have little application to the clinical setting. Sensitivity and specificity are values that apply to a screening test result before we know whether the patient has the target disorder. So which result do we use at the bedside? Sensitivity applies only to patients with disease, whereas specificity applies only to patients without disease. Because we use screening tests precisely because we do not know about the presence or absence of disease, how do we decide whether the value of sensitivity or the value of specificity applies to our patient? The simple answer is that we do not know. If we do know which result applies to our patient, then, by definition, we know the disease status, and the results of screening tests lose relevance. The true value of an LR comes from its mathematical definition that combines the values of sensitivity and specificity, making it applicable to each patient before we know whether disease is present or absent.

When evaluated in combination, the sensitivity and specificity are the building blocks of the LR for tests that are dichotomous (eg, “positive” or “negative,” “present” or “absent”). The LR for a positive result is sensitivity/(1 – specificity), whereas the LR for a negative result is (1 – sensitivity)/specificity. But what happens when a screening test has more than 2 outcomes (Table 1-3)? Traditional laboratory tests are measured on continuous scales, where the result intervals have a mathematical meaning, but the clinician could not possibly know the LR for every outcome. A clinical laboratory reports the raw result, along with a designator for whether the result is “high,” “normal,” or “low.” The report takes the raw value and transforms it to an ordinal scale, making it easier for clinicians to review a large amount of data. When there are more than 2 outcomes of a screening test, sensitivity and specificity cannot be directly calculated, so the clinician must rely on LRs that are usually given for ordinal results.

Table 1-3 Examples of Symptoms or Signs That Have Results Other Than Just “Present” or “Absent”

Example                                 Screening Test                                           Multilevel Outcome
A symptom reported by the patient       “Do you have trouble initiating your urine stream?”      “Always,” “Frequently,” “Sometimes,” “Never”
A sign on the physical examination      Is a third heart sound present?                          Abnormal, Uncertain, Normal
Ordinal a valued findings               Deep tendon reflexes                                     4+, 3+, 2+, 1+, 0

a Ordinal means “ordered.” The results can be ranked, although the incremental value has no quantitative meaning. For example, deep tendon reflexes of 2+ are more pronounced but not twice as prominent as 1+ reflexes.

A simple quantitative explanation helps explain why the sensitivity and specificity lose meaning when there are more than 2 screening test results. The presence of a third heart sound (S3) suggests left ventricular (LV) systolic dysfunction. Sometimes, the clinician is uncertain whether the sound is present. To illustrate this point, we can make up some data that might apply to the clinician’s interpretation of the S3 compared with a reference standard echocardiogram that quantified the LV function (Table 1-4).

Table 1-4 Hypothetical Data to Demonstrate How to Describe the Results for a Finding With 3 Possible Outcomes

                         LV Systolic Dysfunction Present    Normal LV Function
S3 definitely present    30                                 5
Uncertain                5                                  10
S3 definitely absent     10                                 50

Abbreviation: LV, left ventricular.

We can describe the sensitivity of the S3 as 30/(30 + 5 + 10) = 0.67 and the specificity as 50/(5 + 10 + 50) = 0.77. Although this may seem straightforward, closer inspection reveals some problems with that interpretation. First, the treatment of the “uncertain” results lacks consistency. For calculating the sensitivity, we “count” an uncertain S3 as if it were actually absent. But the clinical reality was that the physician could not state with certainty whether it was present or absent. When we calculate the specificity, we do the exact opposite and count the “uncertain” outcomes as if they were “positive.” How can one “uncertain” finding be considered “negative” for sensitivity but “positive” for specificity? This dual treatment creates problems that become even more pronounced as the number of results increases beyond 3 outcomes.

Second, even if we believed that the sensitivity and specificity captured the meaning of an S3 that is either present or absent, how do we describe the results for “uncertain”? Sensitivity provides an inadequate definition because sensitivity is the value that describes the percentage of patients with an abnormal result among all those with disease, and “uncertain” is neither abnormal nor normal. A similar argument applies to the specificity, so that neither sensitivity nor specificity offers a reasonable description of the value of an uncertain result. The constructs just do not apply to a test result that is neither completely normal nor completely abnormal.

The LR provides a way to describe not only the positive and negative results but also those that are uncertain. At a fundamental level, the LR takes a given screening test result and for that outcome tells us the ratio of those with disease to those without disease. So once we know which row of the table a patient belongs in according to their test result (S3 present, S3 uncertain, or S3 absent), the LR tells us the likelihood that the patient will come from the first column vs the second column. We can calculate an LR for every row of an r × 2 table (where r represents the number of rows) (Table 1-5). Thus, when we hear an S3 in the patient, we apply the value 8.7, which makes LV systolic dysfunction much more likely. When we feel confident that an S3 is absent, the likelihood of LV systolic dysfunction decreases. However, when we are “uncertain,” the LR we apply is 0.72, a value that approaches 1 and suggests that the “uncertain” result should not have a large effect on our estimate of the likelihood of disease. Oftentimes, it is useful to know that “uncertain” really means “not much information,” with an LR approaching 1.

Isn’t All the Information in the Patient’s Medical History?

We now need to address a common belief that the physical examination is not particularly helpful and, at best, only confirms the historical findings and symptoms. Oftentimes, a clinician takes a patient’s medical history and makes a diagnosis before performing a physical examination. This process, although sometimes successful, leads to the inference that the physical examination was unnecessary. For a simple reason, the inference is not true: the physical examination begins from the moment the clinician meets a patient and before the patient utters a word! We observe body language, the patient’s gait, vital signs (eg, tachypnea), and physical deformities, and we judge the acuity of illness. These findings derived from visual observations may be hard to quantify (eg, a sense that the quiet, sullen patient might be depressed), although most clinicians recognize the huge amount of information they collect in the first few moments of a patient interaction. Because describing and measuring the influence of our overall observations is difficult, researchers often overlook the clinical gestalt.

One way of isolating the clinical gestalt is to evaluate whether we can make a diagnosis in the absence of directly observing a patient. A symptom checklist (but not the patient’s medical history) can be obtained through a completed patient self-administered questionnaire. Sometimes, we can infer a diagnosis from such questionnaires with our impression uncontaminated by physical findings, but the diagnosis typically requires confirmation obtained through a patient interview or physical examination. The ability to disentangle the history from the physical examination findings is often an illusion, leading to the inference that the patient’s medical history (symptoms) dominates the clinical diagnostic process over the physical examination (signs).

Table 1-5 A Likelihood Ratio Can Be Calculated for Each Row of an r × 2 Table as Shown With These Hypothetical Data

                LV Systolic Dysfunction Present    Normal LV Function    LR a
S3 present      30                                 5                     (30/45)/(5/65) = 8.7
S3 uncertain    5                                  10                    (5/45)/(10/65) = 0.72
S3 absent       10                                 50                    (10/45)/(50/65) = 0.29
Total           45                                 65

Abbreviations: LR, likelihood ratio; LV, left ventricular.
a By convention, for LR values 0-1, we round off to the hundredths; for LR values 1-10, we round off to the tenths; and for LR > 10, we round off to the nearest integer.

The Pretest Probability

The most important part of the clinical examination and the resulting diagnosis is typically not the symptoms or signs: it is the pretest probability, transformed to the prior odds, that dominates the equation. Simply put, if a condition is highly unlikely (or vice versa), then the presence or absence of any additional findings will typically not change things. As a corollary, when the probability of a target condition is not so certain, the signs and symptoms can have a potentially bigger effect on the prior probability.

So, where does the pretest probability come from? We establish the pretest probability in the course of our clinical examination, and that creates a bit of a problem (for both researchers and clinicians). In other words, as we learn more about the patient’s medical history, symptoms, and signs, we orient our approach to a narrower spectrum of disease possibilities. This approach requires that we “waste” a few findings to establish the pretest probability. For example, most patients we examine do not have sinusitis, and we do not ask questions about symptoms related to sinusitis, nor do we transilluminate the sinuses during the course of a clinical examination unless we have a suspicion of the disease. We might constrain our evaluation for sinusitis to patients who claim nasal stuffiness, nasal discharge, or maxillary facial discomfort or who come right out and state, “I think I have a sinus infection.” Each of these findings would prompt an appropriate evaluation for sinusitis and, in a research study, would form the “entrance criteria.” Thus, when we refer to the pretest probability of sinusitis, we most likely are referring to the prevalence of sinusitis among patients with any of those findings rather than to the prevalence of sinusitis among all patients in general. This pretest probability becomes the value we use in the equation and the anchor for applying other symptoms and signs we uncover during our clinical examination.

The establishment of the pretest probability is the problem most learners fear, representing their main “excuse” for not using the concepts in The Rational Clinical Examination. Frequently, learners claim “lack of experience.” When existing studies adequately describe their study population, the pretest probability is not difficult to understand. Experience becomes more valuable when the literature is less clear, and perhaps this is part of the “art” of the clinical examination. Trainees may be quite good at estimating the pretest probability of common conditions. However, both trainees and experienced clinicians tend to overestimate the prior probabilities of less common diseases. Trainees express discomfort when estimating the prior probability because (1) they do not practice quantifying and then validating their clinical impression and (2) they may recall their own cases in which they pursued an unlikely diagnosis for a seemingly “classic” presentation, only to find that the disease was not present. Although the second reason emanates from overlooking the importance of prior probability, it requires a reassessment of the role of symptoms and signs.

What Is a “Good” Symptom or Sign?

The presence of a “good” symptom or sign creates a large effect on the probability, convincing the clinician that the target condition is much more likely to be present than the prior probability suggests. The suggestion that some prespecified LR threshold defines a good clinical finding for all diseases is a myth so persistent that it represents a medical urban legend. Some researchers and clinicians define a “good” test result as that associated with an LR greater than 10 or an LR less than 0.1, but these results do not have intrinsic properties that are the sine qua non of high value. For example, a pretest probability of 10% and a positive test with an LR = 10 generates a posttest probability of 53%; this is a big increase in the probability of disease but hardly an increase that clinches the diagnosis. Furthermore, this is similar to the posttest probability that follows from a disease with a pretest probability of 20% and a positive test with an LR = 5. Thus, although positive test results are increasingly powerful as the LR increases and negative results are increasingly valuable as the LR decreases, the efficiency of the finding in making a diagnosis depends on the pretest probability. When considering that multiple symptoms and signs are interpreted together, individual findings with much less impressive LRs alone (eg, LR+, 2-5; or LR–, 0.25-0.50) could prove useful when used in combination.

If no LR threshold automatically qualifies a result as good, is there a way to compare the efficiency of different clinical findings? A positive clinical finding with the highest LR+ or a negative finding with the lowest LR– will always have the greatest effect on posttest probability. Unfortunately, clinicians discover that a list of symptoms and signs for an individual patient sometimes simultaneously yields outcomes both suggesting (positive results) and pointing away from (negative findings) a target disorder. There is a way, though, to make sense of this. Rank ordering the LR+ associated with each result, along with the reciprocal of the LR– (1/LR–), reveals the single “best” clinical finding for a target condition. The value with the highest LR+ or 1/LR– is the single best symptom or sign result. A single symptom or sign may be useful when present (high LR+) or absent (small LR–). Unfortunately, most symptoms and signs will not produce both the best result when positive and the best result when negative. For example, a clinical sign may have a low LR– when negative, whereas a positive result may have an LR+ that approaches 1. Creating a mental list of LR+ and 1/LR– for a variety of symptoms and signs is not easy.
Some clinicians want to identify the single finding that overall is the most likely to give them the right answer (ie, positive when the patient has disease and negative when the patient is not affected). The diagnostic odds ratio (DOR) creates a single measure of accuracy that tells us which symptom or sign is most likely to correctly classify a patient as having the target disorder or not.1 The DOR is not difficult to calculate, as the DOR = LR+/LR–. The more accurate the symptom or sign, the higher the DOR. So when faced with a table of data on many clinical findings in which none distinguishes itself as the overwhelming favorite, the clinician should choose the finding with the highest DOR. Unfortunately, the DOR cannot be used like the LR for estimating the probability of a diagnosis, but it can help us choose the symptoms and signs of higher utility so that we can ignore those of lesser value. At this point, the skeptical reader might accept that there is a method for identifying better symptoms and signs in terms of their overall measurement properties (through the DOR) and better results applicable to individual patients (through the LR). However, a remaining question might be, How confident can I be that the symptoms and signs I think are the best really are the best?
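The two rankings described above (individual results by LR+ or 1/LR–, and whole findings by DOR) can be sketched as follows. The finding names and LR values are hypothetical and chosen only for illustration; they do not come from any study.

```python
# Hypothetical findings: name -> (LR+, LR-). Values are illustrative only.
findings = {
    "finding_A": (8.0, 0.60),
    "finding_B": (2.5, 0.10),
    "finding_C": (4.0, 0.50),
}


def diagnostic_odds_ratio(lr_positive: float, lr_negative: float) -> float:
    """DOR = LR+ / LR-."""
    return lr_positive / lr_negative


# Rank individual results: a positive result by LR+, a negative result by 1/LR-.
results = []
for name, (lr_pos, lr_neg) in findings.items():
    results.append((lr_pos, f"{name} present"))
    results.append((1.0 / lr_neg, f"{name} absent"))
best_result = max(results)[1]

# Rank whole findings by DOR.
dors = {name: diagnostic_odds_ratio(*lrs) for name, lrs in findings.items()}
best_finding = max(dors, key=dors.get)
```

In this made-up example the single most informative result is the absence of finding B (1/LR– = 10), even though finding A has the highest LR+, which shows why both LR+ and 1/LR– need to be ranked together.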

The Confidence Interval

When The Rational Clinical Examination series began, we presented likelihood results as single point values as if they completely described a clinical finding; they do not. Like all statistical parameters, an LR has an associated confidence interval (CI) that helps us decide whether the data are sufficient for us to infer usefulness. These CIs are important because they provide transparency. An optimistic LR suggests a promising clinical finding, but a broad CI dampens the enthusiasm by implying that a small sample size accounts for some uncertainty. We are particularly cautious when the 95% CI includes 1 because LR values of 1 add no information to the pretest probability. Broad CIs around LR–, even when they do not include 1, are a particular problem. Because the LR– values are constrained between 0 and 1, a broad CI seems less of a problem than the broad CI around a high LR+. To compare the relative findings, the clinical reader can use the technique we described above (ie, taking the value 1/LR–) for comparing the breadth of the CIs of negative to positive LRs.

Some readers will be surprised that there are different methods that yield slight (but clinically unimportant) differences in CIs. We prefer the easiest computational method that also works well in spreadsheets.2 One situation presents problems for researchers and clinical readers alike: what do we do when one cell of the 2 × 2 table is 0? When any single cell has a 0 value (typically, the cells for false positives or false negatives), adding 0.5 to each cell of the 2 × 2 table allows calculation of useful CIs.3 A sensitivity of 100% yields an LR– of 0, with the LR upper 95% CI obtained after adding 0.5 to each cell. A specificity of 100% yields an LR+ that is not calculable (∞), so we report both the LR+ and CI obtained after adding 0.5 to each cell. Although high-quality studies report both the sensitivity and specificity of clinical findings, not all of them calculate the LRs for us.
When researchers provide the actual numbers of affected and unaffected patients, together with the sensitivity and specificity, we can generate the LRs and 95% CIs. Although it is sometimes easy to calculate CIs from individual research reports, meta-analysis offers us an even better way of describing the LRs of findings evaluated across several studies.
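As a rough sketch of how LRs and 95% CIs can be generated from the raw counts, the code below uses the common log-based approximation for the standard error of ln(LR). That choice is our assumption; the text does not specify the authors' preferred formula. The function name is also ours. The 0.5 correction follows the rule described above.

```python
import math


def lr_with_ci(a: float, b: float, c: float, d: float, z: float = 1.96):
    """LR+ and LR- with approximate 95% CIs from a 2 x 2 table.

    Layout follows Figure 1-8 (a, c: disorder present; b, d: disorder absent).
    Uses the log method, which is an assumption here, not necessarily the
    authors' preferred method.
    """
    if 0 in (a, b, c, d):
        # Correction described in the text: add 0.5 to every cell.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    n_present, n_absent = a + c, b + d

    def ratio_ci(x_present, x_absent):
        lr = (x_present / n_present) / (x_absent / n_absent)
        se = math.sqrt(1 / x_present - 1 / n_present
                       + 1 / x_absent - 1 / n_absent)
        return lr, lr * math.exp(-z * se), lr * math.exp(z * se)

    return {"lr_positive": ratio_ci(a, b), "lr_negative": ratio_ci(c, d)}


ascites = lr_with_ci(14, 16, 1, 32)  # Figure 1-6 counts
```

Each entry of the result is a (point estimate, lower limit, upper limit) tuple, so a reader can check at a glance whether the interval includes 1.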

Meta-analysis

Meta-analysis of symptoms and signs combines the results described across several studies and summarizes them to get a single estimate and CI. Although some statisticians have a high degree of skepticism about the appropriateness of combining LRs, we take the position that summarizing results provides clarity for clinicians that at the very least allows them to assimilate data and decide whether a symptom or sign is useful, useless, or uncertain. An important part of meta-analysis requires the investigator to make decisions about the appropriateness of combining data. Although statisticians often suggest a purely statistical approach (ie, studies that have statistically heterogeneous results should not be combined), we take a more pragmatic approach similar to that espoused by other clinical diagnosticians.4

First, we evaluate whether the universe of published studies represents the universe of patients for whom the target condition might be considered. When the studies reflect the population of patients for whom the symptoms and signs apply, we prefer to try combining the LRs. On the other hand, when studies use various definitions of disease or different thresholds for the symptoms and signs, we cannot combine the results in a meaningful way. When we cannot combine the results, we present ranges for the LRs.

Second, we consider our target audience to be clinical readers. For a condition that might have a very different LR among different populations of patients (eg, findings for appendicitis among children vs geriatric patients), we avoid combining results or we at least show how they vary. Part of this approach requires common sense, and part of this is statistical, in which we examine the outlier results to deduce whether there is anything recognizable that accounts for the variant LR findings.

Third, we examine the actual results with their CIs after we combine the data. We always use random-effects measures for generating the LR and CIs, rather than the fixed-effects approach. Random-effects measures generate broader CIs than the fixed effects, providing at least some assurance that we are not overstating the importance and confidence in our findings. If a study is a statistical LR outlier, we still include it in the combined data if it does not make a large clinical difference in the LRs. We suggest that the clinician use clinical judgment when deciding whether 2 LRs yield clinically important differences in the posttest probability. For example, for a pretest probability of 30%, an LR of 5.4 produces a posttest probability of 70%, whereas an LR of 3.5 produces a posttest probability of 60%. These LRs “look” different, but a clinician might take a similar action for a posttest probability of 70% vs 60%. Thus, the 2 LRs could be statistically different but provide clinically similar results.
We always provide the results from each study, and astute readers can decide from the point estimates and CIs whether they believe a finding is useful or useless. More statistically experienced readers may recognize that meta-analysis of LRs differs from what they expect. Statisticians, when they accept meta-analysis of diagnostic tests at all, prefer summarizing the DOR as a global measure of test performance. We take a different approach because summarizing the DOR gives clinicians a value that they cannot use for individual patients. Although we do sometimes provide summary measures of the DOR, the summary measures of the prevalence of disease (pretest probability) and the LR are the values needed for solving the equation for posttest probability. Sometimes, we encounter studies that only provide sensitivity data. What do we do with studies that are case series of patients with disease and that do not have specificity values?
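The text says only that random-effects measures are used; it does not name the estimator. As an illustration, the sketch below pools log LRs with the DerSimonian-Laird method, one widely used random-effects estimator. Treat the choice of estimator, the function name, and the study values as our assumptions rather than the authors' procedure.

```python
import math


def pool_random_effects(log_lrs, variances):
    """DerSimonian-Laird random-effects pooling of per-study log LRs.

    log_lrs   : natural log of each study's LR
    variances : variance of each study's log LR
    Returns (pooled LR, (lower 95% limit, upper 95% limit), tau squared).
    """
    k = len(log_lrs)
    # Fixed-effect (inverse-variance) weights and mean
    fixed_weights = [1.0 / v for v in variances]
    sum_w = sum(fixed_weights)
    fixed_mean = sum(w * y for w, y in zip(fixed_weights, log_lrs)) / sum_w
    # Cochran Q measures heterogeneity around the fixed-effect mean
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(fixed_weights, log_lrs))
    c = sum_w - sum(w * w for w in fixed_weights) / sum_w
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau squared to each study's variance
    random_weights = [1.0 / (v + tau2) for v in variances]
    sum_rw = sum(random_weights)
    pooled = sum(w * y for w, y in zip(random_weights, log_lrs)) / sum_rw
    se = math.sqrt(1.0 / sum_rw)
    low, high = pooled - 1.96 * se, pooled + 1.96 * se
    return math.exp(pooled), (math.exp(low), math.exp(high)), tau2


# Hypothetical studies (not from any chapter): LRs of 2, 4, and 8
pooled_lr, ci, tau2 = pool_random_effects(
    [math.log(2.0), math.log(4.0), math.log(8.0)], [0.04, 0.04, 0.04]
)
print(pooled_lr, ci, tau2)
```

When the studies disagree, tau squared is greater than zero and the interval is wider than the fixed-effect interval, which is the conservative behavior the text describes.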

“Sensitivity-Only” Studies

When conditions are less common, investigators recognize that enrolling consecutive patients at risk for the target disorder creates a study population overwhelmed by those without disease. This approach is costly and takes time, and the small number of patients with disease leads to broad CIs around the sensitivity and LR–. The alternate approach of studying only patients with disease so that sensitivity can be defined is pragmatic, and it may be the best the investigator can do. These studies typically come from a narrow spectrum of diseased patients, and often, the clinical finding is recorded among patients when the clinician knows that disease is present. In addition to understanding the potential biases in the data, we must understand the inferences made from the sensitivity of symptoms and signs without specificity values. The goal of sensitivity studies is to identify a group of symptoms and signs that are unlikely to all be negative in a patient with the target condition. Symptoms and signs with high sensitivity are less likely to be negative in patients with disease. When presented with sensitivity data alone, clinicians will count the number of absent findings in their patients and deduce that those with normal findings on multiple high-sensitivity symptoms and signs will be unlikely to have disease. For example, suppose we identify 2 symptoms and 1 sign, each of which has a sensitivity of 85% for the target condition. That means that each finding would be absent in 15% of patients with disease; if the findings are independent of one another, all 3 would be absent in fewer than 1% of patients (0.15 × 0.15 × 0.15 ≈ 0.003).
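The arithmetic in this example can be checked with a few lines of Python. The sketch assumes the findings are independent of one another, which is what makes multiplying the miss rates valid; the function name is ours.

```python
import math


def probability_all_absent(sensitivities):
    """Chance that every finding is absent in a patient WITH disease.

    Assumes the findings are independent, so the individual
    miss rates (1 - sensitivity) can simply be multiplied.
    """
    return math.prod(1.0 - s for s in sensitivities)


# Two symptoms and one sign, each with a sensitivity of 85%
print(f"{probability_all_absent([0.85, 0.85, 0.85]):.4f}")
```

The result is about 0.0034, or roughly 0.3% of patients with disease, consistent with the "fewer than 1%" stated above.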

How Do We Use All the Symptoms and Signs?

Among several reasons for preferring LRs as our common statistical parameter, rather than the individual sensitivity and specificity values, the ability to multiply likelihood results from several findings is the most alluring. Unfortunately, a crucial assumption is not often fully addressed: sequentially multiplying LRs requires that the symptoms and signs be independent of one another. Let us explain the independence concept with a simple example. Suppose you conduct a study of chest pain symptoms as a predictor of acute ischemia and you categorize words as having “physical” or “emotional” connotations. Words that describe location and radiation would be physical (eg, “center of the chest,” “in the neck”), whereas words that describe the interpretation of pain would be emotional (eg, “suffocating,” “crushing”). You decide to record any reference to an “elephant” in a patient’s description of discomfort as emotional, as in, “It felt like an elephant stepped on my chest.” We suspect it is obvious that a patient who is “elephant-positive” is experiencing crushing pain, but if they report they are having “crushing pain that feels like an elephant on my chest,” should we report the findings separately for “crushing positive” and “elephant positive?” Multiplying the LRs together for “crushing,” “elephant-like” discomfort probably overstates the importance, producing posttest odds that are too high because elephant-like pain is not independent of crushing pain. Although common sense might work as an initial judge of independence, common sense should not be the only arbiter of independence. What should you do when presented with an array of findings for many symptoms and signs without any assessment of independence? To make teaching and performing the medical history and physical examination more efficient and accurate, we want parsimony.
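To see how much nonindependent findings can inflate the estimate, the sketch below multiplies LRs sequentially, which is valid only under independence. The pretest probability and both LR values are hypothetical numbers chosen for illustration; the text does not report LRs for “crushing” or “elephant-like” pain.

```python
def posttest_probability_from_lrs(pretest_probability, likelihood_ratios):
    """Sequentially multiply LRs (valid only if findings are independent)."""
    odds = pretest_probability / (1.0 - pretest_probability)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


PRETEST = 0.30      # hypothetical pretest probability
LR_CRUSHING = 2.0   # hypothetical LR for "crushing" pain
LR_ELEPHANT = 2.0   # hypothetical LR for "elephant-like" pain

# Naive approach: treat the two descriptions as independent findings
naive = posttest_probability_from_lrs(PRETEST, [LR_CRUSHING, LR_ELEPHANT])
# If "elephant" adds nothing beyond "crushing," only one LR should count
single = posttest_probability_from_lrs(PRETEST, [LR_CRUSHING])

print(f"naive: {naive:.2f}, single finding: {single:.2f}")
```

With these made-up values, counting both descriptions raises the estimate from about 0.46 to about 0.63 even though the second description carries no new information.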
By “parsimony,” we mean the fewest number of symptoms and signs that yield the most accurate information. Parsimonious examinations force teachers to teach only
the most relevant parts of the examination, allowing students to spend more time learning what is important while eliminating wasteful maneuvers. Of course, some of this waste is in eliminating maneuvers that do not work well. For example, a Rinne test is interesting to teach, but it does not add useful diagnostic information to the symptom of “decreased hearing” reported by the patient.5 We eliminate additional wasted effort when we discard nonindependent findings. A parsimonious examination should mathematically make us more accurate because a “complete” medical history and physical examination almost certainly produces nonindependent findings. “Positive” nonindependent findings confuse us and distort our probability estimates, typically making us infer a higher probability of disease than is justified. Most authors of The Rational Clinical Examination articles emphasize no more than 3 to 4 findings, even when additional symptoms and signs have useful LRs. Narrowing down the number of recommended findings requires “face validity,” by which we mean using common sense to recommend the items with the best, seemingly independent LRs. When we take this approach, experienced clinicians then use semiquantitative reasoning and deduce that the more findings present, the more likely the patient has disease (or vice versa). When clinicians want to incorporate the results of diagnostic studies into their decision making, they can take 3 approaches to prevent errors created by lack of independence.6 Performing the clinical examination and then using only one single history or physical examination finding to adjust the prior odds will guarantee there is no problem with independence. (Of course, it also guarantees that the clinician might be ignoring a lot of useful clinical information!) Typically, the clinician will want to use the single finding that has the greatest effect on the prior odds, or the “best” finding that we described earlier. 
The approach is not difficult since simple math allows you to rank the findings in order from most useful to least useful. Suppose you have 3 findings (A, B, and C) that can each be positive or negative, with the LRs associated with each result shown in Table 1-6. Is the finding that “A” is present more diagnostically useful than “C’s” absence? To determine this, you can rank order these by comparing the LR for the positive results to 1/LR for the negative results. Table 1-6 shows the relative value of each of the findings. If your patient had “A” absent, “C” present, and “B” present, then you would multiply the prior odds by the LR associated with the outcome for test “B” (LR = 5.0) because it had the most useful outcome for that individual.

Table 1-6 The Findings With the Biggest Influence Can Be Found by Rank Ordering the LR+ and LR– (a)

Finding      LR     LR for Values > 1 and 1/LR for Values < 1 (b)
A present    15     15
C absent     0.1    10
B present    5.0    5.0
C present    2.0    2.0
B absent     0.6    1.7
A absent     0.9    1.1

Abbreviations: LR, likelihood ratio; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
(a) Adapted from Holleman and Simel.6
(b) For LRs < 1.0 (usually the LR–), the reciprocal (1/LR) is used.

Although this approach removes any concerns with independence, the clinician must collect many data that ultimately are discarded. At the very least, it is not efficient, and at the worst, important information could be ignored. Not surprisingly, this approach lacks appeal because it ignores the way most clinicians incorporate many bits of information into their decision making. Clinical researchers must analyze their data in a multivariate way to help clinicians. By “multivariate,” we mean that they must analyze combinations of findings so that there is less concern about independence. This can involve one of 2 general approaches. The easiest approach is to take the medical history and physical examination findings and perform logistic regression. Logistic regression takes a number of individual variables and determines their importance in predicting whether disease is present or absent. In the first strategy for assessing independence, logistic regression identifies variables that lack independence and that can be eliminated as redundant. In our example above, if all patients with wheezing were also dyspneic, then the finding on the “variable” dyspnea might be unimportant once we know the wheezing status. The logistic regression approach would identify this as being nonsignificant, and the investigator would suggest we concentrate our efforts at assessing for wheezing.
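The rank-ordering strategy described with Table 1-6 can be sketched in Python. The LR values are those in the table; the function and variable names are ours and purely illustrative.

```python
# LRs from Table 1-6, keyed by (finding, result)
FINDINGS = {
    ("A", "present"): 15.0, ("A", "absent"): 0.9,
    ("B", "present"): 5.0, ("B", "absent"): 0.6,
    ("C", "present"): 2.0, ("C", "absent"): 0.1,
}


def usefulness(lr):
    """Use the LR itself when it is at least 1, otherwise its reciprocal."""
    return lr if lr >= 1.0 else 1.0 / lr


def rank_findings(lrs):
    """Order results from most to least influential on the prior odds."""
    return sorted(lrs, key=lambda result: usefulness(lrs[result]), reverse=True)


def most_useful_result(patient):
    """Pick the single observed result with the largest effect on the odds."""
    observed = {(name, state): FINDINGS[(name, state)] for name, state in patient.items()}
    best = rank_findings(observed)[0]
    return best, observed[best]


print(rank_findings(FINDINGS))
print(most_useful_result({"A": "absent", "B": "present", "C": "present"}))
```

The first line of output reproduces the row order of Table 1-6, and the second selects “B present” (LR = 5.0) for the patient described in the text.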
When logistic regression is used as a “data-reduction” step to achieve parsimony, the clinician would use the simple, univariate LRs for any finding identified as being independently useful in the logistic model. This approach has a lot of appeal because it identifies the important and useful variables for clinicians, and it does not require that they understand the logistic model itself, because the univariate LRs are used. However, in using the simple, unadjusted LRs, we ignore the relationship between the various clinical findings in favor of simplicity. The β parameters of a multivariate logistic analysis describe the relative importance of symptoms and signs. From algebra, you might remember the equation for a straight line is y = mx + b. The m in the equation is the slope, and it quantifies how a change in x affects y.* A logistic model works similarly, except that now, rather than having 1 x, we have several symptoms and signs that we evaluate all at once. The equivalent of m in the logistic model is the β parameter, which is the natural logarithm of the odds ratio associated with each symptom or sign (the odds ratio itself is e^β); the higher the β parameter, the more important the finding. When investigators provide us the actual multivariate models, we can put the results of our own patient’s clinical examination into the model, and the outcome is the individual patient’s estimated probability of disease.

*For those who just cannot remember b, it is the intercept where the line crosses the y-axis.
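As a minimal sketch of how a published logistic model turns a patient's findings into a probability, the code below uses an intercept and β values that are entirely hypothetical; no model in this chapter supplies them. It also shows that exponentiating a β parameter gives the corresponding adjusted odds ratio.

```python
import math

# Hypothetical model, for illustration only (not from any study)
INTERCEPT = -2.0
BETAS = {"finding_1": 1.2, "finding_2": 0.7}


def predicted_probability(findings):
    """Probability of disease from a logistic model.

    findings maps each finding name to True (present) or False (absent).
    """
    # Linear predictor: intercept plus the beta of every finding that is present
    linear = INTERCEPT + sum(BETAS[name] * float(present) for name, present in findings.items())
    # The logistic function converts log odds to a probability
    return 1.0 / (1.0 + math.exp(-linear))


def odds_ratio(name):
    """Adjusted odds ratio for a finding is e raised to its beta."""
    return math.exp(BETAS[name])


print(predicted_probability({"finding_1": True, "finding_2": True}))
print(odds_ratio("finding_1"))
```

Because the model evaluates all findings at once, overlap between findings is already reflected in the β values, which is why this route avoids the independence problem of multiplying univariate LRs.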

The Fuss About Precision

The Primer states, “for an item of the clinical history or physical examination to be accurate, it first must be precise.” By precision, we mean that 2 or more observers agree on the presence or absence of a finding in a patient who experienced no clinical changes.* When we measure precision, describing the percentage of time that 2 observers agree on a symptom or sign fails to consider simple luck. Instead of reporting simple agreement, investigators report precision as the agreement beyond that attributable to chance. For dichotomous findings (“yes” vs “no” or “present” vs “absent”) compared between 2 observers, we quantify this agreement beyond chance with the κ statistic.† The κ statistic varies from –1 (perfect disagreement) to 0 (chance agreement) to +1 (perfect agreement). Suppose we are interested in whether a third heart sound identifies patients with LV systolic dysfunction. It is easy to imagine that a cardiologist might be better at identifying this correctly than a general internist, suggesting that a κ statistic might show lower agreement beyond chance than if we were comparing 2 general physicians. Should we conclude that a third heart sound is not a good test from the precision between a cardiologist and a general internist? The answer, of course, is no because test accuracy depends on the quality of the observation: the cardiologist might be a better observer than a less experienced clinician. These seemingly imprecise symptoms and signs are potentially useful when certain providers get consistently good results because they represent opportunities for improved performance and accuracy. A second type of precision is more important for identifying inaccurate findings. Although a low κ between observers points to opportunities for improving, poor intraobserver agreement precludes high accuracy unless the problem can be eliminated.
Intraobserver agreement describes whether a clinician gets the same result when assessing a symptom or sign on a patient who is clinically unchanged. For example, when a clinician inquires about unilateral headaches as a symptom for migraines but the patient changes his or her answer, the finding can never be accurate or precise. Although the natural assumption might be to blame the patient for inconsistency, part of poor intraobserver agreement may be attributable to poor technique that can be improved. This is true even when applied to symptoms as reported by the patient because different answers follow when the information is solicited differently (eg, asking the patient a leading question about unilateral headaches vs an open-ended question). But if clinicians cannot assure reliability on their own findings, they will never use the symptoms and signs accurately. If you cannot agree with yourself, the LR results will be random.

A Brief Word About Quality

Every article in The Rational Clinical Examination series and the updates in this book use a standard process for assessing the quality of data. Although the Primer focuses mostly on the sensitivity, specificity, and LR results, it should be clear that narrow CIs around the results do not assure methodologic rigor of the studies that generated the results. At the inception of The Rational Clinical Examination series, the evidence-based medicine movement was in its infancy. An early article in the series heralded its entry into the mainstream thoughts of clinical educators and investigators.7 Because standardized approaches had not been developed for assessing the quality of the medical history and physical examination, David L. Sackett, MD, and Charles H. Goldsmith, PhD, agreed on certain characteristics that they asked their reviewers to use when judging quality. The criteria were simplified and summarized in an early article of the series.8 Subsequently, several groups have published their criteria for the review of diagnostic accuracy studies, although none address the particular nuances of symptoms and signs.9-11 Perhaps it is not surprising that many clinical investigators and epidemiologists have reported on a large number of quality measures that describe what seem like innumerable potential biases in diagnostic test studies. Despite the increasing complexity of rating systems and quality measures, the original criteria for reviewing articles have stood the test of time and pragmatism. If anything, we made the process easier and reduced the number of quality levels a reviewer might assign an article. We reviewed the recommendations for diagnostic test studies9,10 and adapted them specifically for studies of the clinical examination.12 In the early articles appearing in The Rational Clinical Examination series, we assigned Grades for levels of evidence. However, this blurred the distinction between Levels 3, 4, and 5. Because no study accepts Level 5 evidence in making recommendations, we dropped the Grade designation and now report only the Levels as shown in Table 1-7.13

*To clarify further, some researchers use the word reliability or the term observer variability instead of precision. These are all terms that imply the same concept of similar results on repeated examinations, so we use them interchangeably. †We use the weighted κ when we have findings that are not dichotomous. For example, a sign graded as 0, 1, or 2+ would have a disagreement between observers of “grade 1 and 2” weighted as less than a discrepancy between “grade 0 and 2.” When we have multiple observers, we use regression techniques to generate the intraclass correlation coefficient for describing the interobserver variability.
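For 2 observers and a dichotomous finding, the κ statistic discussed under The Fuss About Precision can be computed from the 4 cells of an agreement table. The sketch below is the standard unweighted Cohen κ; the counts in the example are hypothetical, and the function does not handle the degenerate case in which chance agreement equals 1.

```python
def cohen_kappa(both_present, only_first, only_second, both_absent):
    """Unweighted kappa for 2 observers rating a present/absent finding.

    both_present : both observers record the finding
    only_first   : only the first observer records it
    only_second  : only the second observer records it
    both_absent  : neither observer records it
    """
    n = both_present + only_first + only_second + both_absent
    # Simple (observed) agreement
    observed = (both_present + both_absent) / n
    # Agreement expected by chance from each observer's marginal rates
    first_yes = (both_present + only_first) / n
    second_yes = (both_present + only_second) / n
    expected = first_yes * second_yes + (1 - first_yes) * (1 - second_yes)
    # Agreement beyond chance, scaled by the maximum possible beyond chance
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 100 patients examined by 2 clinicians
print(cohen_kappa(40, 10, 5, 45))
```

In this made-up example the observers agree on 85% of patients, but half of that agreement is expected by chance alone, so κ is about 0.7 rather than 0.85.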

Table 1-7 Levels of Evidence (a)

Level of Evidence   Grade   Definition
1                   A       Independent blinded comparison of sign or symptom results with a criterion standard of diagnosis among a large number of consecutive patients suspected of having the target condition
2                   B       Independent blinded comparison of sign or symptom with a criterion standard of diagnosis among a small number of consecutive patients suspected of having the target condition
3                   C       Independent blinded comparison of sign or symptom with a criterion standard of diagnosis among nonconsecutive patients suspected of having the target condition
4                   C       Nonindependent comparison of sign or symptom with a criterion standard of diagnosis among samples of patients who obviously have the target condition plus, perhaps, normal individuals
5                   C       Nonindependent comparison of sign or symptom with a standard of uncertain validity

(a) Modified from Holleman and Simel.13

Most of the important biases that compromise a study’s results follow from the study population not being consecutive, prospective, or independently assessed with an appropriate blindly applied reference standard. By consecutive, we mean that the authors enrolled all patients for whom the target disorder was a reasonable consideration. Independent means that the symptom or sign under study was not used to select patients for the study. Blind means that the symptoms and signs were assessed without knowledge of the presence of disease determined by the reference standard, but also that the reference standard was interpreted without knowledge of the study questions. The size of a study (level 1 vs level 2) for quality assessment depends on the disease under consideration. The authors of The Rational Clinical Examination evaluate sample sizes according to their review of the literature because there is no uniform number that determines quality; for example, a large study of thoracic aortic aneurysms would likely not have as many patients as a large study of urinary tract infection in women. One particular bias, verification bias, deserves special consideration because it can be insidious and have a big effect on the LR. Verification bias occurs when not all of the potentially eligible patients undergo confirmation of their disease status. Often, this is done for pragmatic reasons. An example might be a study of headache patients that seeks to describe whether asymmetric neurologic findings (eg, weakness) indicate serious intracranial abnormalities discovered through neuroimaging. Because it would be expensive and impractical to have every patient with headaches undergo imaging, an investigator typically chooses to maximize the chance of finding something by including all patients with asymmetric muscle strength but only a sample of those who are normal.
We can highlight the effect of verification bias on the sensitivity, specificity, and LRs through examining tables of example data. Suppose an investigator reports the findings displayed in Table 1-8. In the example, the finding looks excellent, with a sensitivity and specificity of 90%. However, because the investigator could not justify the reference standard (eg, neuroimaging) for every patient with a headache, the investigative team referred all patients with positive clinical findings but only a sample of those with normal findings (for illustrative purposes, 10%). Had the investigator evaluated every patient, the findings might have been as shown in Table 1-9. The data demonstrate that verification bias tends to overestimate sensitivity while underestimating specificity.* When the bias is left unadjusted, the investigator will not recognize that the presence of the finding is actually better than suggested (the adjusted LR+ should be higher), whereas the absence of the finding is not as good as suggested (the adjusted LR– should be closer to 1). Astute investigators will recognize that if they collect complete data on all the potentially eligible patients, the bias is one of the few in diagnostic test research that can be mathematically corrected.

Table 1-8 Hypothetical Data in Which Only the Patients Who Received Neuroimaging Appear in the Published Report

                   Target Condition
Finding            Present    Absent
Present            90         10
Absent             10         90

Sensitivity = 0.90; Specificity = 0.90; LR+ = 9.0; LR– = 0.11
Abbreviations: LR+, positive likelihood ratio; LR–, negative likelihood ratio.

Table 1-9 Hypothetical Data, Adjusted for the Patients Who Did Not Receive Neuroimaging

                   Target Condition
Finding            Present            Absent
Present            90                 10
Absent             10/0.10 = 100      90/0.10 = 900

Sensitivity = 0.47; Specificity = 0.99; LR+ = 43; LR– = 0.53
Abbreviations: LR+, positive likelihood ratio; LR–, negative likelihood ratio.
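The adjustment from Table 1-8 to Table 1-9 amounts to dividing each row of the published table by the fraction of patients in that row who were verified. The sketch below reproduces the numbers in both tables; the function names and the two sampling-fraction parameters are ours, and the approach assumes the verified patients are a random sample of those with the same finding.

```python
def summarize_2x2(tp, fp, fn, tn):
    """Sensitivity, specificity, LR+, and LR- from a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


def correct_verification_bias(tp, fp, fn, tn,
                              fraction_positive_verified=1.0,
                              fraction_negative_verified=1.0):
    """Scale each row up to the number of patients it represents.

    Assumes verified patients are a random sample of all patients
    with the same clinical finding.
    """
    return summarize_2x2(
        tp / fraction_positive_verified,
        fp / fraction_positive_verified,
        fn / fraction_negative_verified,
        tn / fraction_negative_verified,
    )


published = summarize_2x2(90, 10, 10, 90)  # Table 1-8
adjusted = correct_verification_bias(90, 10, 10, 90,
                                     fraction_negative_verified=0.10)  # Table 1-9
print(published)
print(adjusted)
```

The adjusted values round to a sensitivity of 0.47, a specificity of 0.99, an LR+ of 43, and an LR– of 0.53, matching Table 1-9.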

REFERENCES FOR THE UPDATE
1. Glas AF, Ligmer JG, Prins MH, Bonsel GJ, Bossuyt PMM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol. 2003;56(11):1129-1135.
2. Simel DL, Samsa GP, Matchar DB. Likelihood ratios with confidence: sample size estimation for diagnostic test studies. J Clin Epidemiol. 1991;44(8):763-770.
3. Hasselblad V, Hedges LV. Meta-analysis of screening and diagnostic tests. Psychol Bull. 1995;117(1):167-178.
4. Devillé WL, Buntix F, de Vet R, Ligmer J, Montori V. Guidelines for conducting systematic reviews of studies evaluating the accuracy of diagnostic tests. In: Knotterus JA, ed. The Evidence Base of Clinical Diagnosis. London, England: BMJ Books; 2002.
5. Bagai A, Thavendiranathan P, Detsky AS, Simel DL, Rennie D, eds. Does this patient have hearing impairment? JAMA. 2006;295(4):416-428.
6. Holleman DR, Simel DL. Quantitative assessments from the clinical examination: how should clinicians integrate the numerous results? J Gen Intern Med. 1997;12(3):165-171.
7. Guyatt G, Cairns J, Churchill D, et al; Evidence-Based Medicine Working Group. Evidence-based medicine: a new approach to teaching the practice of medicine. JAMA. 1992;268(17):2420-2425.
8. Holleman DR, Simel DL. Does the clinical examination predict airflow limitation? JAMA. 1995;273(4):313-319.
9. Bossuyt PMM, Reitsma JB, Bruns DE, et al; for the STARD Group. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Clin Chem. 2003;49:1-6.
10. Bossuyt PMM, Reitsma JB, Bruns DE, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem. 2003;49(1):7-18.
11. Whiting P, Rutjes AWS, Reitsma JB, et al. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Tech. 2003;3:25.
12. Simel DL, Rennie D, Bossuyt PM. The STARD statement for reporting diagnostic accuracy studies: application to the history and physical examination. J Gen Intern Med. 2008;23(6):768-774.
13. Holleman DR Jr, Simel DL. Does the clinical examination predict airflow limitation? JAMA. 1995;273(4):313-319.

*Verification bias can work in the opposite direction, although that is not usually the case.

CHAPTER 2

Does This Patient Have Abdominal Aortic Aneurysm?

Frank A. Lederle, MD
David L. Simel, MD, MHS

CLINICAL SCENARIOS

CASE 1 A 60-year-old man requests a physical examination because a friend recently died suddenly from a ruptured abdominal aortic aneurysm (AAA). Your examination reveals nothing abnormal. After reassuring the patient, you are left wondering whether you might have missed an AAA large enough to warrant surgical repair.

CASE 2 A thin 80-year-old woman observes that she can feel her abdomen pulsating against her belt. While examining her abdomen, you find an easily palpable, strongly pulsating aorta that you measure to be about 2 cm wide. You wonder whether you should order an ultrasonographic examination.

CASE 3 You are asked to see a 75-year-old man with 12 hours of right flank and abdominal pain, constipation, urinary frequency, urgency, dysuria, and leukocytosis and who is about to be sent home on treatment for pyelonephritis. Deep palpation of the abdomen is difficult, but you faintly discern a large pulsatile mass. You order computed tomography, which confirms an AAA with bleeding into the retroperitoneum, and the patient is taken to the operating room.

Copyright © 2009 by the American Medical Association.

WHY IS PHYSICAL DIAGNOSIS OF AAA IMPORTANT?

Abdominal aortic aneurysms cause more than 10 000 deaths each year in the United States,1 and many of these deaths should be preventable through timely diagnosis and treatment. AAAs usually remain asymptomatic while slowly enlarging during a period of years or even decades. About a third will eventually rupture, an event associated with a mortality rate of 80%.2 Important risk factors for AAA include age, male sex, and smoking.3 Abdominal palpation was the original method of AAA detection. When ultrasonography and computed tomography became available, it was clear that they were more accurate than palpation, and these became the procedures of choice for confirming the diagnosis of AAA and for measurement of AAA diameter. A variety of studies have shown the sensitivity and specificity of ultrasonography and computed tomography to be close to 100%.4-8 Since then, the importance of abdominal palpation has been limited to identifying patients who should have confirmatory imaging studies. In one recent report, 31% of all AAAs diagnosed at a university hospital were originally detected by routine physical examination.9

The first scenario addresses the issues of screening (or case finding) to detect AAA and the subsequent management of asymptomatic AAA, 2 subjects of considerable debate in recent literature. Although most of the discussion of screening has focused on the use of ultrasonography, the only study to consider both methods found screening with abdominal palpation to be more cost-effective.10 In a review of the periodic physical examination, abdominal palpation for AAA was one of the few maneuvers recommended for older men.11 The Canadian Task Force on the Periodic Health Examination observed that abdominal palpation of men older than 60 years was prudent,12 but both the Canadian and the US Preventive Services Task Forces gave each AAA screening method a C rating (poor evidence to include or exclude from the periodic health examination), and some authors have judged the accuracy of abdominal palpation for AAA to be insufficient for screening.13 Management is based on observations that the risk of AAA rupture (and hence the need for elective repair) increases with the diameter of the aneurysm. The diameter of asymptomatic AAA above which repair should be offered to good surgical candidates is the topic of ongoing clinical trials,14 and current recommendations range from 4.0 to 6.0 cm, with 5.0 cm as the cutoff point most commonly used.15 Patients with AAAs that do not yet warrant repair are followed up with ultrasonography once or twice a year to detect enlargement that might warrant repair.

The second scenario represents what has been termed the students’ aneurysm.16 Realization that these symptoms and physical findings are normal allows the physician to provide immediate reassurance to the patient and makes further testing unnecessary.

In the third scenario, abdominal palpation may have been lifesaving. Physical examination should not be relied on to rule out the diagnosis of ruptured AAA, and any patient in whom the diagnosis is considered should undergo ultrasonography or computed tomography. However, there are patients whose clinical likelihood of having a ruptured AAA lies below the physician’s threshold for obtaining an imaging study and for whom physical examination may therefore be decisive.
Many physicians are unfamiliar with the varied presentations of ruptured AAAs, so palpation of a widened aorta may be the first suggestion of the diagnosis.17 The importance of the physical examination in these settings depends largely on its accuracy. In this article, the accuracy of physical diagnosis of an AAA is assessed by review and analysis of the available literature.

In 1905, Osler18 observed that “no pulsation, however forcible, no thrill, however intense, no bruit, however loud—singly or together—justify [sic] the diagnosis of an aneurysm of the abdominal aorta, only the presence of a palpable expansile tumour.” Accordingly, most of the literature on physical examination to detect AAA has dealt with abdominal palpation to measure the width of the pulsatile mass representing the aneurysmal aorta, but several other physical signs have been considered. In one study, abdominal and femoral bruits and absent femoral pulses had no predictive value.8 Another study found that location of the pulsation more than 3.0 cm caudad of the umbilicus was not predictive of AAA.19 In 1975, Guarino20 stated that the pulsatile mass of AAA could be distinguished by its being moveable laterally but not cephalad or caudad. This observation was not studied, however, and in the current era of readily available ultrasonography, there may be little value in further increasing the specificity of physical examination once a widened aorta is felt. We are aware of no other putative signs of AAA for which published information is available, so the remainder of this article will be limited to the consideration of abdominal palpation in detecting a widened aorta. Attempts to measure precisely the AAA diameter by abdominal palpation (as opposed to simply differentiating abnormal from normal) have also been studied4,5,21-23 but are of limited importance now that AAA measurements are routinely obtained more accurately from follow-up imaging studies and so will not be considered further.

METHODS

We searched MEDLINE for articles from 1966 to August 1998, using a search strategy previously developed for The Rational Clinical Examination series that combined 10 exploded MeSH headings (“physical examination,” “medical history taking,” “professional competence,” “sensitivity and specificity,” “reproducibility of results,” “observer variation,” “diagnostic tests, routine,” “decision support techniques,” “Bayes theorem,” “mass screening”) and 2 text word categories (“physical exam$” and “sensitivity and specificity”), and then we took the intersection of this set with aortic aneurysm (exploded). The resulting set, plus articles in our files, references cited by these articles, and references in textbooks, was reviewed for information pertinent to the clinical examination of AAA. Unpublished information was obtained from the authors of some studies. Series with fewer than 10 patients and those published before 1966 were not considered. No other exclusions (eg, language, publication type) were applied.

We assigned each study to a level of evidence according to a system previously developed for this series.24 Level 1 studies are independent, blind comparisons of sign or symptom results with a criterion standard among a large number (sufficient to have narrow confidence limits on the resulting sensitivity, specificity, or likelihood ratio) of consecutive patients suspected of having the target condition. Level 2 studies are independent, blind comparisons of sign or symptom results with a criterion standard among a small number of consecutive patients suspected of having the target condition. Level 3 studies are independent, blind comparisons of signs and symptoms with a criterion standard among nonconsecutive patients suspected of having the target condition.
Level 4 studies are nonindependent comparisons of signs and symptoms with a criterion standard among convenience samples of patients who obviously have the target condition plus, perhaps, healthy individuals. Level 5 studies are nonindependent comparisons of signs and symptoms with a standard of uncertain validity (which may even incorporate the sign or symptom result in its definition) among convenience samples of patients and, perhaps, healthy patients. Abdominal aortic aneurysm, to provide consistency in data extraction, was defined as an abdominal aortic diameter of 3.0 cm or greater. There is no widely accepted method of defining

the cutoff point between a normal aorta and an AAA. Imaging studies done in clinical practice are often interpreted according to arterial shape (eg, distal widening), but epidemiologic studies have generally used the simpler measure of unadjusted infrarenal aortic diameter, which has been shown to be associated with rupture risk.25 An infrarenal aortic diameter of 3.0 cm is a commonly used but somewhat controversial cutoff point in published articles, whereas a diameter of 4.0 cm or larger is clearly diagnostic of an AAA. Adjustment of the cutoff point for such factors as age, sex, and body size has been suggested but appears to have little practical value.26 An a priori decision was made to consider intermediate findings on palpation as negative when the uncertainty was due to the aorta’s being impalpable27-30 and positive when the findings were considered suggestive of an AAA (as opposed to definite).8,31 Sensitivity was calculated as the proportion of affected patients with positive findings, specificity as the proportion of unaffected patients with negative findings, and a positive predictive value as the proportion of patients with positive findings who were affected. Likelihood ratios were also calculated; the positive likelihood ratio (LR+) is defined as sensitivity/(1 – specificity) and expresses the increase in the odds of having the disease when the finding is positive (LR+ values are ≥ 1), and the LR– is defined as (1 – sensitivity)/specificity and expresses the decrease in the odds of having the disease when the finding is negative (LR– values are 0-1). Values for true positives, false positives, true negatives, and false negatives were increased by 0.5 when likelihood ratios were computed to avoid division by 0.32 CIs for likelihood ratios from individual studies were computed using the method of Simel et al.33 The studies of AAA screening were judged to be of sufficient quality and similarity of design to assess for statistical similarity.
The χ2 tests for heterogeneity of the sensitivity data were not significant (all P > .10), supporting the decision to pool these data.34 However, assessments of heterogeneity of the effectiveness scores (a measure of the effect size of a diagnostic test result) were of borderline significance (pooled effectiveness, 1.7; P = .04 using a cutoff of 3.0 cm; pooled effectiveness, 2.1; P = .06 using a cutoff of 4.0 cm).32 Therefore, a random-effects measure was used as a conservative method for pooling the results of these studies, and CIs for the pooled likelihood ratios were calculated by using the method of Eddy and Hasselblad.34
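The likelihood ratio definitions in the Methods can be checked with a short calculation. The sketch below is illustrative only. The function name is hypothetical, the 2 × 2 counts are back-calculated (an assumption) from one Table 2-2 row (89 subjects, 9 AAAs, sensitivity 100%, positive predictive value 82%), and the CI uses the common log-scale approximation, which is assumed here to correspond to the method cited from Simel et al.

```python
import math


def likelihood_ratios(tp, fp, fn, tn, z=1.96):
    """Return ((LR+, CI), (LR-, CI)) for a 2 x 2 table.

    Each cell is increased by 0.5, as described in the Methods, so that
    a study with an empty cell does not cause division by 0.
    """
    tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    affected = tp + fn
    unaffected = fp + tn
    sensitivity = tp / affected
    specificity = tn / unaffected

    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity

    # Standard errors of ln(LR); assumed log-scale approximation.
    se_pos = math.sqrt(1 / tp - 1 / affected + 1 / fp - 1 / unaffected)
    se_neg = math.sqrt(1 / fn - 1 / affected + 1 / tn - 1 / unaffected)

    def ci(lr, se):
        return (lr * math.exp(-z * se), lr * math.exp(z * se))

    return (lr_pos, ci(lr_pos, se_pos)), (lr_neg, ci(lr_neg, se_neg))


# Assumed counts: 9 true positives, 2 false positives,
# 0 false negatives, 78 true negatives.
(lr_pos, ci_pos), (lr_neg, ci_neg) = likelihood_ratios(tp=9, fp=2, fn=0, tn=78)
print(f"LR+ {lr_pos:.0f} ({ci_pos[0]:.1f}-{ci_pos[1]:.0f})")
print(f"LR- {lr_neg:.2f} ({ci_neg[0]:.2f}-{ci_neg[1]:.2f})")
```

With these assumed counts the results land close to the values reported for that row, an LR+ of about 31 and an LR– of about 0.05. The 0.5 correction is what keeps the LR– finite even though no aneurysm was missed.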

RESULTS Abdominal Palpation for Ruptured AAA Several studies have reported the sensitivity of abdominal palpation in patients with ruptured AAA (Table 2-1).17,35-42 In these studies, it is not clear how often the physical findings suggested the diagnosis of AAA as opposed to being elicited after the diagnosis was made by other methods. The sensitivities tended to be higher when patient selection was limited to those diagnosed antemortem (including operative series). Three series included masses that were described as not pulsatile, and sensitivities with these masses included are reported separately in Table 2-1. Compared with asymptomatic AAAs, ruptured AAAs tend to be larger, which would be expected to increase sensitivity,43 but rupture may also be associated with guarding, intestinal distention caused by compromised circulation, and loss of integrity of the AAA, which could have the opposite effect.

Abdominal Palpation for Asymptomatic AAA Some studies have reported the sensitivity of abdominal palpation in patients with known asymptomatic AAAs (range of sensitivities, 65%-100%).4-7,22,23,36,39,44-49 Most of these studies involved patients undergoing preoperative evaluation for elective repair of large AAAs, and many patients were originally identified by physical examination before referral to the study group. The lack of blinding and the preponderance of large AAAs likely resulted in higher sensitivities than would be achieved in most clinical settings.

Table 2-1 Sensitivity of Abdominal Palpation in Series of Patients With Ruptured Abdominal Aortic Aneurysm^a

| Source, y | No. of AAAs | Sensitivity of Palpation, %^b | Patient Selection |
| --- | --- | --- | --- |
| Pryor,35 1972 | 44 | 45 (82) | All |
| Williams et al,36 1972 | 79 | 97 | Operated on |
| Ottinger,37 1975 | 40 | 75 (100) | Diagnosed antemortem |
| McGregor,38 1976 | 41 | 44 (51) | Unoperated on at autopsy |
| Gordon-Smith et al,39 1978 | 83 | 90 | Operated on |
| Gaylis and Kessler,40 1980 | 105 | 87 | Diagnosed antemortem |
| Donaldson et al,41 1985 | 81 | 91 | Not stated |
| Walsh et al,42 1992 | 55 | 64 | All |
| Lederle et al,17 1994 | 23 | 52 | Presented to internist |

Abbreviation: AAA, abdominal aortic aneurysm.
^a All studies provide level 4 evidence (see “Methods” section).
^b Numbers in parentheses represent the sensitivity if nonpulsatile masses are included.


Other studies have reported the positive predictive value of clinical suspicion for AAAs in a series of patients referred for imaging studies (range of positive predictive values, 15%-91%).6,13,21,31,48-53 The wide range of values may reflect possible inclusion in some studies of patients with previous diagnostic imaging studies before their referral to the study group (falsely increasing positive predictive value) and of patients referred for ruling out AAA according to indications other than palpation of a widened aorta (potentially falsely increasing or decreasing positive predictive value). Two studies provide results by age and sex, indicating that the highest positive predictive values are obtained in men older than 60 years, with low values (

Table 2-2

| Source, y | Age, y | Women, % | No. of Subjects | AAA ≥ 3.0 cm, No. | AAA ≥ 3.0 cm, Sensitivity, % | AAA 3.0-3.9 cm, No. | AAA 3.0-3.9 cm, Sensitivity, % | AAA 4.0-4.9 cm, No. | AAA 4.0-4.9 cm, Sensitivity, % | AAA ≥ 5.0 cm, No. | AAA ≥ 5.0 cm, Sensitivity, % | Positive Predictive Value, % | LR+ (95% CI), AAA ≥ 3.0 cm | LR– (95% CI), AAA ≥ 3.0 cm | LR+ (95% CI), AAA ≥ 4.0 cm | LR– (95% CI), AAA ≥ 4.0 cm |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | | 0 | 50 | 3 | 0 | 2 | 0 | 1 | 0 | 0 | … | … | 12 (0.3-528) | 0.88 (0.61-1.3) | 25 (0.6-968) | 0.76 (0.34-1.7) |
| | 50 | 0 | 200 | 14 | 64 | 7 | 43 | 3 | 100 | 4 | 75 | 64 | 21 (8.7-53) | 0.38 (0.19-0.74) | 18 (8.9-39) | 0.20 (0.05-0.83) |
| | >65 | 43 | 168 | 3 | 0 | 2 | 0 | 0 | … | 1 | 0 | 0 | 1.6 (0.1-23) | 0.95 (0.65-1.4) | 3.3 (0.3-39) | 0.81 (0.36-1.8) |
| | 39-90 | 25 | 100 | 15 | 33 | 10 | 0 | 3 | 100 | 2 | 100 | 100 | 59 (3.4-1018) | 0.66 (0.46-0.94) | 176 (11-2823) | 0.08 (0.01-1.2) |
| | 60-75 | 0 | 201 | 20 | 45 | 10 | 40 | 5 | 20 | 5 | 80 | 35 | 4.7 (2.5-9.0) | 0.61 (0.41-0.90) | 4.5 (2.2-9.1) | 0.56 (0.31-1.0) |
| | 65-74 | 0 | 426 | 23^d | 35 | NA | NA | NA | … | NA | … | 36 | 9.9 (4.7-21) | 0.67 (0.50-0.90) | … | … |
| | 31-83 | 36 | 101 | 4 | 0 | 2 | 0 | 0 | … | 2 | 0 | … | 20 (0.4-890) | 0.90 (0.68-1.2) | 33 (0.8-1415) | 0.84 (0.50-1.4) |
| | 38-86 | 42 | 288 | 14 | 29 | NA | NA | NA | … | NA | … | 31 | 8.7 (3.2-23) | 0.73 (0.52-1.0) | … | … |
| | 17-67 | 13 | 163 | 10 | 70 | 3 | 0 | 4 | 100 | 3 | 100 | 26 | 5.1 (2.9-9.1) | 0.37 (0.15-0.87) | 7.2 (4.6-11) | 0.07 (0-1.0) |
| | NA | 36 | 200 | 55 | 24 | 33 | 0 | 16 | 44 | 6 | 100 | 72 | 6.4 (2.5-16) | 0.79 (0.68-0.92) | 19 (7.8-47) | 0.43 (0.26-0.69) |
| | 55-82 | 41 | 89 | 9 | 100 | 2 | 100 | 5 | 100 | 2 | 100 | 82 | 31 (9.0-105) | 0.05 (0-0.77) | 17 (6.9-43) | 0.07 (0-0.97) |
| | 65-83 | 53 | 411 | 7 | 43 | 2 | 50 | 3 | 33 | 2 | 50 | 33 | 27 (9.1-81) | 0.57 (0.31-1.0) | 23 (6.9-74) | 0.59 (0.30-1.2) |
| | 60-80 | 29 | 392 | 7 | 57 | 1 | 0 | 4 | 50 | 2 | 100 | 57 | 62 (18-208) | 0.44 (0.20-3.0) | 71 (22-231) | 0.36 (0.13-0.97) |
| | 55-81 | 0 | 96 | 1 | 100 | 1 | 100 | 0 | … | 0 | … | 14 | 11 (3.7-33) | 0.27 (0.02-3.0) | 6.5 (0.8-52) | 0.54 (0.08-3.8) |
| Pooled | … | 26 | 2955 | 194 | 39 | 75 | 29 | 44 | 50 | 29 | 76 | 43 | 12 (7.4-19) | 0.72 (0.65-0.81) | 16 (8.6-28) | 0.51 (0.38-0.67) |

Abbreviations: AAA, abdominal aortic aneurysm; CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio; NA, data not available.
^a Includes unpublished information received from authors. All studies used ultrasonography and provide level 2 evidence. The pooled results for numbers are sums and for functions are from a random-effects measure and provide level 1 evidence (see “Methods” section). Abdominal aortic aneurysm is defined as at least 3.0 cm by ultrasonography.
^b No information was given on AAA diameter.
^c Ellipses indicate values cannot be calculated.
^d Abdominal aneurysms less than 3 cm are included.


tion of any of several thousand AAAs seen over four decades.”65 We are aware of no educational studies examining methods of learning AAA palpation. In our experience, however, accurate palpation is readily learned through practice and feedback. We have found that physicians can become proficient after comparing their findings with ultrasonographic measurements in a few patients with AAAs and a few controls.

Bottom Line The only physical examination maneuver of demonstrated value for the diagnosis of an AAA is abdominal palpation to detect a widened aorta. Palpation of AAA appears to be safe and has not been reported to precipitate rupture. Positive findings on abdominal palpation greatly increase the likelihood that an AAA, particularly a large AAA, is present. Even so, the positive predictive value of 43% (Table 2-2) indicates that less than half of all high-risk patients (and fewer low-risk patients, such as most women and young men) suspected of having an enlarged aorta on abdominal palpation will be found to have an AAA. However, this may not be of great concern because ultrasonography provides a safe and relatively inexpensive confirmatory test. Abdominal palpation will detect most AAAs large enough to warrant surgery, but it cannot be relied on to rule out the diagnosis. The sensitivity of palpation appears to be reduced by abdominal obesity and by routine abdominal examination not specifically directed at measuring aortic width. When a ruptured AAA is suspected, imaging studies such as ultrasonography or computed tomography should be performed regardless of physical findings.

Author Affiliations at the Time of the Original Publication Departments of Medicine, Minneapolis Veterans Affairs Medical Center, University of Minnesota, Minneapolis (Dr Lederle), and Durham Veterans Affairs Medical Center, Duke University, Durham, North Carolina (Dr Simel)

Acknowledgments The authors thank Andreas Laupacis, MD, and Kavita Nanda, MD, for their helpful reviews of the article.

REFERENCES
1. Gillum RF. Epidemiology of aortic aneurysm in the United States. J Clin Epidemiol. 1995;48(11):1289-1298.
2. Ingoldby CJH, Wujanto R, Mitchell JE. Impact of vascular surgery on community mortality from ruptured aortic aneurysms. Br J Surg. 1986;73(7):551-553.
3. Lederle FA, Johnson GR, Wilson SE, et al; Aneurysm Detection and Management (ADAM) Veterans Affairs Cooperative Study Group. Prevalence and associations of abdominal aortic aneurysm detected through screening. Ann Intern Med. 1997;126(6):441-449.
4. Hertzer NR, Beven EG. Ultrasound aortic measurement and elective aneurysmectomy. JAMA. 1978;240(18):1966-1968.
5. Graeve AH, Carpenter CM, Wicks JD, Edwards WS. Discordance in the sizing of abdominal aortic aneurysm and its significance. Am J Surg. 1982;144(6):627-634.
6. Nusbaum JW, Freimanis AK, Thomford NR. Echography in the diagnosis of abdominal aortic aneurysm. Arch Surg. 1971;102(4):385-388.
7. Lee KR, Walls WJ, Martin NL, Templeton AW. A practical approach to the diagnosis of abdominal aortic aneurysms. Surgery. 1975;78:195-201.
8. Lederle FA, Walker JM, Reinke DB. Selective screening for abdominal aortic aneurysms with physical examination and ultrasound. Arch Intern Med. 1988;148(8):1753-1756.
9. Kiev J, Eckhardt A, Kerstein MD. Reliability and accuracy of physical examination in detection of abdominal aortic aneurysms. Vasc Surg. 1997;31(2):143-146.
10. Frame PS, Fryback DG, Patterson C. Screening for abdominal aortic aneurysm in men ages 60 to 80 years. Ann Intern Med. 1993;119(5):411-416.
11. Oboler SK, LaForce FM. The periodic physical examination in asymptomatic adults. Ann Intern Med. 1989;110(3):214-226.
12. Canadian Task Force on the Periodic Health Examination. Periodic health examination, 1991 update, 5. CMAJ. 1991;145(7):783-789.
13. Beede SD, Ballard DJ, James EM, Ilstrup DM, Hallet JW. Positive predictive value of clinical suspicion of abdominal aortic aneurysm. Arch Intern Med. 1990;150(3):549-551.
14. Lederle FA, Wilson SE, Johnson GR, et al; ADAM VA Cooperative Study Group. Design of the Abdominal Aortic Aneurysm Detection and Management (ADAM) Study. J Vasc Surg. 1994;20(2):296-303.
15. Ballard DJ, Etchason JA, Hilborne LH, et al. Abdominal Aortic Aneurysm Surgery: A Literature Review and Ratings of Appropriateness and Necessity. Santa Monica, CA: RAND; 1992.
16. Fowler NO. Diseases of the aorta. In: Wyngaarden JB, Smith LH, eds. Cecil Textbook of Medicine. 17th ed. Philadelphia, PA: WB Saunders Co; 1985:345-353.
17. Lederle FA, Parenti CM, Chute EP. Ruptured abdominal aortic aneurysm: the internist as diagnostician. Am J Med. 1994;96(2):163-167.
18. Osler W. Aneurysm of the abdominal aorta. Lancet. 1905;2:1089-1096.
19. Collin J, Araujo L, Walton J, Lindsell D. Oxford screening programme for abdominal aortic aneurysm in men aged 65 to 74 years. Lancet. 1988;332(8611):613-615.
20. Guarino JR. Abdominal aortic aneurysm. J Kans Med Soc. 1975;76(5):108, 15A.
21. McGregor JC, Pollock JG, Anton HC. The value of ultrasonography in the diagnosis of abdominal aortic aneurysm. Scott Med J. 1975;20(3):133-137.
22. Brewster DC, Darling RC, Raines JK, et al. Assessment of abdominal aortic aneurysm size. Circulation. 1977;56(suppl 2):164-169.
23. Buxton B, Buttery B, Buckley J. The measurement of abdominal aortic aneurysms. Aust N Z J Surg. 1978;48(4):387-389.
24. Holleman DR, Simel DL. Does the clinical examination predict airflow limitation? JAMA. 1995;273(4):313-319.
25. Nevitt MP, Ballard DJ, Hallett JW. Prognosis of abdominal aortic aneurysms: a population-based study. N Engl J Med. 1989;321(15):1009-1014.
26. Lederle FA, Johnson GR, Wilson SE, et al; ADAM VA Cooperative Study Investigators. Relationship of age, gender, race, and body size to infrarenal aortic diameter. J Vasc Surg. 1997;26(4):595-601.
27. Cabellon S, Moncrief CL, Pierre DR, Cavanaugh DG. Incidence of abdominal aortic aneurysms in patients with atheromatous arterial disease. Am J Surg. 1983;146(5):575-576.
28. MacSweeney ST, O’Meara M, Alexander C, O’Malley MK, Powell JT, Greenhalgh RM. High prevalence of unsuspected abdominal aortic aneurysm in patients with confirmed symptomatic peripheral or cerebral arterial disease. Br J Surg. 1993;80(5):582-584.
29. al Zahrani HA, Rawas M, Maimani A, Gasab M, Aba al Khail BA. Screening for abdominal aortic aneurysm in the Jeddah area, western Saudi Arabia. Cardiovasc Surg. 1996;4(1):87-92.
30. Arnell TD, de Virgilio C, Donayre C, Grant E, Baker JD, White R. Abdominal aortic aneurysm screening in elderly males with atherosclerosis: the value of physical exam. Am Surg. 1996;62(10):861-864.
31. Robicsek F, Daugherty HK, Mullen DC, Tam W, Scott WP. The value of angiography in the diagnosis of unruptured aneurysms of the abdominal aorta. Ann Thorac Surg. 1971;11(6):538-550.
32. Hasselblad V, Hedges LV. Meta-analysis of screening and diagnostic tests. Psychol Bull. 1995;117(1):167-178.
33. Simel DL, Samsa GP, Matchar DB. Likelihood ratios with confidence: sample size estimation for diagnostic test studies. J Clin Epidemiol. 1991;44(8):763-770.
34. Eddy DM, Hasselblad V. Fast*Pro v1.8: Software for Meta-analysis by the Confidence Profile Method. San Diego, CA: Academic Press; 1992:91-92.
35. Pryor JP. Diagnosis of ruptured aneurysm of abdominal aorta. BMJ. 1972;3(5829):735-736.
36. Williams RD, Fisher FW, Dickey JW Jr. Problems in the diagnosis and treatment of abdominal aortic aneurysms. Am J Surg. 1972;123(6):698-701.
37. Ottinger LW. Ruptured arteriosclerotic aneurysms of the abdominal aorta: reducing mortality. JAMA. 1975;233(2):147-150.
38. McGregor JC. Unoperated ruptured abdominal aortic aneurysms. Br J Surg. 1976;63(2):113-116.
39. Gordon-Smith IC, Taylor EW, Nicolaides AN, et al. Management of abdominal aortic aneurysm. Br J Surg. 1978;65(12):834-838.
40. Gaylis H, Kessler E. Ruptured aortic aneurysms. Surgery. 1980;87(3):300-304.
41. Donaldson MC, Rosenberg JM, Bucknam CA. Diagnosis of ruptured abdominal aortic aneurysm. Conn Med. 1985;49(1):3-6.
42. Walsh JA, Dohnalek JA, Doley AJ, Wiadrowski TP. Ruptured abdominal aortic aneurysms. Med J Aust. 1992;156(2):138.
43. Lofgren RP. The dynamic nature of sensitivity and specificity. J Gen Intern Med. 1987;2(6):452-453.
44. Friedman SA, Hufnagel CA, Conrad PW, Simmons EM, Weintraub A. Abdominal aortic aneurysms: clinical status and results of surgery in 100 consecutive cases. JAMA. 1967;200(13):1147-1151.
45. Bergan JJ, Yao JST, Henkin RE, Quinn JL. Radionuclide aortography in detection of arterial aneurysms. Arch Surg. 1974;109(1):80-83.
46. Volpetti G, Barker CF, Berkowitz H, Roberts B. A twenty-two year review of elective resection of abdominal aortic aneurysms. Surg Gynecol Obstet. 1976;142(3):321-324.
47. Chervu A, Clagett GP, Valentine RJ, Myers SI, Rossi PJ. Role of physical examination in detection of abdominal aortic aneurysms. Surgery. 1995;117(4):454-457.
48. Robicsek F. The diagnosis of abdominal aneurysms. Surgery. 1981;89(2):275-276.
49. Roberts A, Johnson N, Royle J, Buttery B, Buxton B. The diagnosis of abdominal aortic aneurysms. Aust N Z J Surg. 1974;44(4):360-362.
50. Brewster DC, Retana A, Waltman AC, Darling RC. Angiography in the management of aneurysms of the abdominal aorta: its value and safety. N Engl J Med. 1975;292(16):822-825.
51. Lee TG, Henderson SC. Ultrasonic aortography: unexpected findings. Am J Roentgenol. 1977;128(2):273-276.
52. Karp W, Eklof B. Ultrasonography and angiography in the diagnosis of abdominal aortic aneurysm. Acta Radiol. 1978;19(6):955-960.
53. Kahn CE Jr, Quiroz FA. Positive predictive value of clinical suspicion for abdominal aortic aneurysm. J Gen Intern Med. 1996;11(12):756-758.
54. Ohman EM, Fitzsimons P, Butler F, Bouchier-Hayes D. The value of ultrasonography in the screening for asymptomatic abdominal aortic aneurysm. Ir Med J. 1985;78(5):127-129.
55. Twomey A, Twomey E, Wilkins RA, Lewis JD. Unrecognised aneurysmal disease in male hypertensive patients. Int Angiol. 1986;5(4):269-273.
56. Allen PI, Gourevitch D, McKinley J, Tudway D, Goldman M. Population screening for aortic aneurysms [letter]. Lancet. 1987;2(8561):736.
57. Allardice JT, Allwright GJ, Wafula JM, Wyatt AP. High prevalence of abdominal aortic aneurysm in men with peripheral vascular disease: screening by ultrasonography. Br J Surg. 1988;75(3):240-242.
58. Shapira OM, Pasik S, Wassermann JP, Barzilai N, Mashiah A. Ultrasound screening for abdominal aortic aneurysms in patients with atherosclerotic peripheral vascular disease. J Cardiovasc Surg (Torino). 1990;31(2):170-172.
59. Andersson AP, Ellitsgaard N, Jorgensen B, et al. Screening for abdominal aortic aneurysm in 295 outpatients with intermittent claudication. Vasc Surg. 1991;25(7):516-520.
60. Spiridonov AA, Omirov ShR. Selective screening for abdominal aortic aneurysms through clinical examination and ultrasonic scanning [in Russian]. Grud Serdechnososudistaia Khir. 1992;9-10:33-36.
61. Karanjia PN, Madden KP, Lobner S. Coexistence of abdominal aortic aneurysm in patients with carotid stenosis. Stroke. 1994;25(3):627-630.
62. Molnar LJ, Langer B, Serro-Azul J, Wanjgarten M, Cerri GG, Lucarelli CL. Prevalence of abdominal aneurysm in the elderly [in Portuguese]. Rev Assoc Med Bras. 1995;41(1):43-46.
63. Craig SR, Wilson RG, Walker AJ, Murie JA. Abdominal aortic aneurysm: still missing the message. Br J Surg. 1993;80(4):450-452.
64. Cronenwett JL, Sargent SK, Wall WH, et al. Variables that affect the expansion rate and outcome of small abdominal aortic aneurysms. J Vasc Surg. 1990;11(2):260-269.
65. Joyce JW. Examination of the patient with vascular disease. In: Loscalzo J, Creager MA, Dzau VJ, eds. Vascular Medicine. Boston, MA: Little Brown & Co; 1992:401-418.


UPDATE: Abdominal Aortic Aneurysm

Prepared by Frank Lederle, MD
Reviewed by Ed Etchells, MD

CLINICAL SCENARIO You are performing a physical examination on an obese 65-year-old man. You have been thorough with abdominal palpation and allowed the abdominal muscles to relax enough so that you can feel the aortic pulsation. You estimate the aorta to be 2 cm wide, which is normal. Because you have heard that abdominal palpation is less accurate in obese patients, you wonder whether the examination findings exclude abdominal aortic aneurysm (AAA).

UPDATED SUMMARY ON ABDOMINAL AORTIC ANEURYSM Original Review Lederle FA, Simel DL. Does this patient have abdominal aortic aneurysm? JAMA. 1999;281(1):77-82.

UPDATED LITERATURE SEARCH We reviewed all citations listed under “exp aortic aneurysm” in MEDLINE, from 1998 to July 2004. The search yielded 7590 titles. We also searched personal files maintained on the topic since the original publication. We reviewed titles and abstracts to identify new studies that met the original inclusion and exclusion criteria, focusing on large studies that included information on the sensitivity or specificity of the physical examination for abdominal aneurysms in the general population. The review identified only 1 article that met our inclusion criteria.

NEW FINDINGS
• The interobserver variability for detecting aneurysms is good.
• The sensitivity of the examination is better for smaller patients than for larger patients. However, the sensitivity in larger patients is still good when the aorta can be palpated.
• When the patient cannot “relax” the abdomen, clinicians should be aware that they are more likely to “miss” an aneurysm.

Details of the Update Abdominal palpation continues to be an important method for diagnosing AAA. In a recent study from a UK district general hospital, 48% of all AAAs were diagnosed by physical examination1 compared with 31% in reference 9 of the original Rational Clinical Examination article.

A study published after the original review evaluated patient factors that might affect the accuracy of the clinical evaluation, such as abdominal obesity, girth, and tightness, as well as the effect of a palpable aorta. In addition, the investigators provided information on interobserver variability in abdominal palpation for AAA.2 The only pragmatic way to conduct such an evaluation is to examine patients with and without an aneurysm. In this study of 200 subjects, 99 with and 101 without AAA, the interobserver pair agreement for AAA vs no AAA between the first and second examination was 77% (κ = 0.53). The sensitivity of the examination improves with increasing size of the aneurysm. For aneurysms 5 cm or larger, the sensitivity was 82%. Not surprisingly, the examiners also had better sensitivity in thinner subjects (abdominal girth less than 100 cm [40-in waistline]) than in more obese subjects (sensitivity, 91% vs 53% for girth of 100 cm or more). Even when girth was 100 cm or more, if the aorta was palpable, sensitivity was 82%. Physicians sometimes have trouble palpating the abdominal aorta when patients cannot “relax” their abdomen. This study confirmed that the examiners’ assessment that the abdomen was not tight improved their accuracy in detecting aneurysms (odds ratio, 2.7; 95% confidence interval, 1.2-6.1).

In another study, 125 subjects with AAA and 39 without AAA underwent abdominal palpation by a vascular surgeon, a nurse, and the patients themselves.3 The vascular surgeon and nurse knew of the high prevalence of AAA in the sample, but they did not know an individual patient’s diagnosis. For vascular surgeons, sensitivity was 57% for AAAs less than 4.0 cm but more than 97% for AAAs larger than 4.0 cm. The accuracy of nurses and patients was similar to that of the surgeons, which is surprising because the patients used palpable pulsation as the only criterion for diagnosing AAA. The κ value for agreement between surgeons and nurses was high, at 0.92, and agreement of either with the patient was nearly as high. Factors independently associated with false negatives were smaller AAA diameter and higher body mass index. The extremely high sensitivities, presumably related to the examiners’ knowledge of the high prevalence of AAA, raise questions about the study’s generalizability.

Table 2-3 The More Certain the Examiner Feels About the Findings, the More Likely They Are Correct

| Clinical Impression | LR+ (95% CI) |
| --- | --- |
| Examination “definite” for aneurysm | 4.8 (2.7-8.8) |
| Examination “suggestive” | 1.4 (0.92-2.1) |
| Examination “normal” | 0.43 (0.35-0.54) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.

The largest sensitivity study to date was recently reported from Brazil.4 The first 3000 subjects to call in response to an advertising campaign were scheduled for screening. The study group consisted of 2756 subjects who responded to an advertising campaign, were older than 50 years, had no previous diagnosis of AAA, and had an adequate ultrasonographic examination. Each subject underwent abdominal palpation by a vascular surgeon and ultrasonography. It is unclear whether palpation was blinded to ultrasonographic findings. There were 64 AAAs 3.0 cm or larger identified by ultrasonography. Sensitivity and positive predictive value of a positive abdominal palpation result were 31% and 33%, respectively. This sensitivity was somewhat lower than in previous studies, possibly reflecting reduced examiner vigilance resulting from the size of the study.

Several other studies since the original review added useful information but did not meet our inclusion criteria. A pulsatile mass may be present after endovascular repair of AAA, potentially leading to diagnostic confusion.5 A cohort study from the Medical Research Council Thrombosis Prevention Trial examined the result of abdominal palpation of the aorta by general practitioners in 4171 men from 1992 to 1994.6 Abdominal aortic aneurysm was suspected in 60 men and confirmed in 25 (positive predictive value, 42%). By mid-1996, 6 men who had not been suspected of having AAA on palpation had died of ruptured AAA, suggesting that sensitivity of palpation to detect clinically important AAA was less than 81%. In an older study addressing predictive value, only 1 of 29 consecutive patients presenting to the Massachusetts General Hospital emergency department in the 1970s with tender pulsatile mass without hypovolemia actually had AAA.7
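The arithmetic behind the Thrombosis Prevention Trial figures quoted above can be made explicit. This is a minimal sketch using only the counts stated in the text. The variable names are illustrative, and the sensitivity figure is an upper bound because men with unsuspected AAAs that did not rupture are not counted.

```python
suspected = 60        # men in whom AAA was suspected on palpation
confirmed = 25        # suspected AAAs that were confirmed (true positives)
missed_ruptures = 6   # deaths from ruptured AAA not suspected on palpation

# Positive predictive value: confirmed cases among all suspected cases.
ppv = confirmed / suspected

# The 6 missed ruptures are a minimum count of false negatives, so this
# ratio is the highest sensitivity compatible with the reported data.
sensitivity_upper_bound = confirmed / (confirmed + missed_ruptures)

print(f"PPV: {ppv:.0%}")
print(f"Sensitivity: at most {sensitivity_upper_bound:.0%}")
```

Any additional undetected aneurysm in the cohort would enlarge the denominator and push the true sensitivity further below this bound.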


IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION A new study allows us to assess the likelihood of an aneurysm according to clinicians’ confidence in their examination findings and the accuracy of the examination related to various patient factors such as obesity (see Table 2-3). Whereas the original publication observed that 5 cm was the threshold most commonly used for considering surgery, 2 large randomized trials show no benefit of repair for aneurysms with a diameter of less than 5.5 cm.8,9

CHANGES IN THE REFERENCE STANDARD There are no changes in the reference standard.

RESULTS OF LITERATURE REVIEW Univariate Findings for AAA The efficiency of the examination depends on the confidence in your findings.

EVIDENCE FROM GUIDELINES Four trials of screening for abdominal aneurysms with ultrasonography have been conducted since the original US Preventive Services Task Force and Canadian Task Force recommendations. The US Preventive Services Task Force now recommends one-time screening for AAA by ultrasonography in men aged 65 to 75 years who have ever smoked.10

CLINICAL SCENARIO—RESOLUTION Although it is true that abdominal palpation is less accurate in obese patients (roughly those with a waist circumference of more than 40 in), the fact that you could palpate the aorta improves the accuracy. The sensitivity for detecting an AAA 3.0 cm or larger is 82%, and your finding that the aorta was normal confers a negative likelihood ratio of 0.30. You are able to reassure the patient that, given your examination findings, the likelihood that he has an AAA is low.


ABDOMINAL AORTIC ANEURYSM—MAKE THE DIAGNOSIS

PRIOR PROBABILITY Abdominal aortic aneurysms occur in 4% to 8% of older men. The prevalence in older women is less than 2%.

POPULATION FOR WHOM AAA SHOULD BE CONSIDERED
• Age older than 50 years
• History of ever smoking
• Male sex
• White race
• Family history of AAA

DETECTING AN ABDOMINAL AORTIC ANEURYSM The size of an aneurysm affects the clinician’s ability to detect it (Table 2-4).

Table 2-4 Likelihood Ratios Vary With the Size of the Aneurysm

| Ability to Detect an Asymptomatic Aneurysm According to Size | LR+ (95% CI) | LR– (95% CI) |
| --- | --- | --- |
| Aneurysm > 4.0 cm (n = 12 studies) | 16 (8.6-29) | 0.51 (0.38-0.67) |
| Aneurysm > 3.0 cm (n = 15 studies) | 12 (7.4-20) | 0.72 (0.65-0.81) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

Clinicians can detect asymptomatic AAAs. The ability to detect the aneurysm relates, in part, to patient characteristics. The examination should focus on the width of the palpated abdominal aorta. Fortunately, the examination results are just as good for the obese as for the nonobese patient when the clinician detects an aneurysm. However, the examination is not as efficient at ruling out an aneurysm in obese patients or in those who cannot relax their abdomen to facilitate the examination.
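To see how the prior probability and the likelihood ratios combine, the following sketch converts a pretest probability to a posttest probability through odds. It is an illustration only. The function name is hypothetical, the pretest values are the ends of the 4% to 8% range given above for older men, and the likelihood ratios are the Table 2-4 point estimates for aneurysms larger than 3.0 cm.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


LR_POSITIVE = 12    # Table 2-4, aneurysm > 3.0 cm, point estimate
LR_NEGATIVE = 0.72  # Table 2-4, aneurysm > 3.0 cm, point estimate

for pretest in (0.04, 0.08):
    after_positive = posttest_probability(pretest, LR_POSITIVE)
    after_negative = posttest_probability(pretest, LR_NEGATIVE)
    print(f"pretest {pretest:.0%}: "
          f"positive palpation {after_positive:.0%}, "
          f"negative palpation {after_negative:.1%}")
```

Under these assumptions a positive examination raises the probability to roughly one-third to one-half, whereas a negative examination lowers it only slightly, which is consistent with the statement that palpation cannot be relied on to rule out an AAA.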

REFERENCE STANDARD TESTS Imaging studies (ultrasonography or computed tomography).

REFERENCES FOR THE UPDATE
1. Karkos CD, Mukhopadhyay U, Papakostas I, Ghosh J, Thomson GJ, Hughes R. Abdominal aortic aneurysm: the role of clinical examination and opportunistic detection. Eur J Vasc Endovasc Surg. 2000;19(3):299-303.
2. Fink HA, Lederle FA, Roth CS, Bowles CA, Nelson DB, Haas MA. The accuracy of physical examination to detect abdominal aortic aneurysm. Arch Intern Med. 2000;160(6):833-836.^a
3. Venkatasubramaniam AK, Mehta T, Chetter IC, et al. The value of abdominal examination in the diagnosis of abdominal aortic aneurysm. Eur J Vasc Endovasc Surg. 2004;27(1):56-60.
4. Puech-Leao P, Molnar LJ, Oliveira IR, Cerri GG. Prevalence of abdominal aortic aneurysms: a screening program in Sao Paulo, Brazil. Sao Paulo Med J. 2004;122(4):158-160.^a
5. Lachat M, Pfammatter T, Moehrlen U, Hilfiker P, Hoerstrup SP, Turina MI. Abdominal pulsatile tumor after endovascular abdominal aortic aneurysm repair. Vasa. 1999;28(1):55-57.
6. Zuhrie SR, Brennan PJ, Meade TW, Vickers M. Clinical examination for abdominal aortic aneurysm in general practice: report from the Medical Research Council’s General Practice Research Framework. Br J Gen Pract. 1999;49(446):731-732.
7. Kadir S, Athanasoulis CA, Brewster DC, Moncure AC. Tender pulsatile abdominal mass: abdominal aortic aneurysm or not? Arch Surg. 1980;115(5):631-633.
8. Lederle FA, Wilson SE, Johnson GR, et al. Immediate repair compared with surveillance of small abdominal aortic aneurysms. N Engl J Med. 2002;346(19):1437-1444.
9. The United Kingdom Small Aneurysm Trial Participants. Long-term outcomes of immediate repair compared with surveillance of small abdominal aortic aneurysms. N Engl J Med. 2002;346(19):1445-1452.
10. US Preventive Services Task Force. Screening for abdominal aortic aneurysm: recommendation statement. Ann Intern Med. 2005;142(3):198-202.

^a For the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.


EVIDENCE TO SUPPORT THE UPDATE: Abdominal Aortic Aneurysm

TITLE The Accuracy of Physical Examination to Detect Abdominal Aortic Aneurysm.

AUTHORS Fink HA, Lederle FA, Roth CS, Bowles CA, Nelson DB, Haas MA.

CITATION Arch Intern Med. 2000;160(6):833-836.

QUESTION How well do commonly used maneuvers work for detecting abdominal aortic aneurysm (AAA)?

DESIGN Each participant underwent physical examination of the abdomen by 2 internists.

SETTING Minneapolis Veterans Affairs Medical Center.

PATIENTS Two hundred participants (aged 51-88 years), 99 with and 101 without AAA as determined by previous ultrasonography.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The internists were blinded to one another’s findings and to the ultrasonographic diagnosis.

MAIN OUTCOME MEASURES κ, mean pair agreement, sensitivity, specificity, likelihood ratios, independent predictors of correct diagnosis. The unit of analysis was the examination.

MAIN RESULTS Interobserver pair agreement for AAA vs no AAA between the first and second examinations was 77% (κ = 0.53). Sensitivity increased with AAA diameter, from 61% for AAAs of 3.0 to 3.9 cm, to 69% for AAAs of 4.0 to 4.9 cm, 72% for AAAs of 4.0 cm or larger, and 82% for AAAs of 5.0 cm or larger. Sensitivity in subjects with an abdominal girth less than 100 cm (40-in waistline) was 91% vs 53% for girth of 100 cm or greater (P < .001). When girth was 100 cm or greater and the aorta was palpable, sensitivity was 82%. When girth was less than 100 cm and the AAA was 5.0 cm or larger, sensitivity was 100% (12 examinations). Factors independently associated with correct examination findings included AAA diameter (odds ratio [OR], 1.95 per centimeter increase; 95% confidence interval [CI], 1.1-3.6), abdominal girth (OR, 0.90 per centimeter increase; 95% CI, 0.87-0.94), and the examiner’s assessment that the abdomen was not tight (OR, 2.7; 95% CI, 1.2-6.7). The authors provided us data for each examiner according to their degree of confidence in their examination. As expected, these data indicate that an examination “suggestive” of aneurysm conveys considerably less certainty than an examination “definite” for aneurysm (see Table 2-5).

Table 2-5 The Efficiency of the Examination Depends on the Confidence in Your Findings (n = 3 Examiners)

Level of Certainty in Findings           LR+ (95% CI)
Examination “definite” for aneurysm      4.8 (2.7-8.8)
Examination “suggestive”                 1.4 (0.92-2.1)
Examination “normal”                     0.43 (0.35-0.54)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.

CONCLUSION

LEVEL OF EVIDENCE Level 3.

STRENGTHS This study was the first to involve sufficient numbers of AAA to examine the effect of patient factors such as obesity, girth, and abdominal tightness and the effect of a palpable aorta. Because previous work indicated that abdominal palpation was insensitive when girth was 100 cm or greater, the authors sought to determine whether subgroups of patients with large girth could be identified in whom abdominal palpation might be reliable. Those with a palpable aorta and large girth had sensitivity of 82%.

LIMITATIONS One likely reason for the increased sensitivities was increased diagnostic vigilance owing to the high prevalence of AAA. Unlike previous studies that used consecutive patients with relatively low prevalence of AAA, this study included a large number of patients with AAA to provide power to look at the value of various patient and examination factors. It was also the first study to look at interobserver variability in abdominal palpation for AAA. The mean pair agreement (77%) and κ (0.53) for AAA vs no AAA are considered moderate.

Abdominal palpation has only moderate overall sensitivity for detecting AAA but appears to be sensitive for diagnosis of AAAs large enough to warrant elective intervention in patients who do not have a large girth. Abdominal palpation has good sensitivity, even in patients with a large girth, when the aorta is palpable.
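The pair agreement (77%) and κ (0.53) reported for this study are linked by the definition of Cohen's κ, which rescales observed agreement by the agreement expected from chance alone. The following is a minimal sketch of that relationship. The chance agreement is not reported in the summary, so the value computed here is only what the two reported figures imply, and the function names are illustrative.

```python
def cohen_kappa(observed_agreement: float, chance_agreement: float) -> float:
    """Cohen's kappa: agreement beyond chance as a share of the maximum possible."""
    return (observed_agreement - chance_agreement) / (1.0 - chance_agreement)


def implied_chance_agreement(observed_agreement: float, kappa: float) -> float:
    """Solve kappa = (po - pe) / (1 - pe) for pe, the chance-expected agreement."""
    return (observed_agreement - kappa) / (1.0 - kappa)


# Figures reported for the 2 examiners: 77% pair agreement, kappa = 0.53.
po, kappa = 0.77, 0.53
pe = implied_chance_agreement(po, kappa)  # implied, not reported in the summary
print(f"implied chance agreement: {pe:.2f}")
print(f"kappa recomputed from po and pe: {cohen_kappa(po, pe):.2f}")
```

With roughly half of all paired examinations expected to agree by chance, 77% raw agreement corresponds to only moderate agreement beyond chance, which is how the summary characterizes it.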

Reviewed by Frank A. Lederle, MD

TITLE Prevalence of Abdominal Aortic Aneurysms: A Screening Program in São Paulo, Brazil.

AUTHORS Puech-Leao P, Molnar LJ, Oliveira IR, Cerri GG.

CITATION Sao Paulo Med J. 2004;122(4):158-160.

QUESTION How accurate is abdominal palpation for detecting abdominal aortic aneurysm (AAA)?

DESIGN Each subject underwent abdominal palpation with a vascular surgeon and ultrasonography.

SETTING University Hospital, São Paulo, Brazil.

PATIENTS The first 3000 subjects to call in response to an advertising campaign were scheduled for the screening clinic. The study group consisted of 2756 subjects who were older than 50 years, without previous diagnosis of AAA, and for whom ultrasonography was adequate.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The description of palpation precedes that of ultrasonography in the “Methods,” but we are not told explicitly that palpation was performed before, or blinded to, ultrasonography.

MAIN OUTCOME MEASURES Palpation result was recorded as positive, negative, or impossible. AAA was defined as aortic diameter of 3.0 cm or more by ultrasonography. See Table 2-6 for the results of palpation for this study.

MAIN RESULTS

Table 2-6 Results of Palpation in a Large Screening Setting

Palpation      N        No. of AAAs by Ultrasonography
Positive       60       20
Negative       2398     41
Impossible     298      3

Abbreviation: AAA, abdominal aortic aneurysm.
Sensitivity: 20/64 = 31%. Specificity: 2652/2692 = 98%. Positive predictive value of positive examination result: 20/60 = 33%.

CONCLUSION

LEVEL OF EVIDENCE Level 3.

STRENGTHS This is by far the largest study of the sensitivity of palpation to date, comprising nearly as many patients as all previous studies combined. The sensitivity of 31% is somewhat lower than the pooled sensitivity of 39% reported in our original Rational Clinical Examination article, which could result from a greater attenuation of any increased examiner vigilance resulting from study participation.

LIMITATIONS It is not clear from the article that examiners were blinded to the ultrasonographic results, though the low sensitivity would suggest that they were. Although the authors have information on age, sex, and AAA diameter, the effect of these factors on palpation is not described.
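The accuracy figures quoted with Table 2-6 can be reproduced from the table's counts. The sketch below assumes, as the reported denominators (64 and 2692) imply, that "impossible" examinations are counted as not positive. Variable names are illustrative.

```python
# Counts from Table 2-6: palpation result -> (subjects examined, AAAs by ultrasonography).
table_2_6 = {
    "positive": (60, 20),
    "negative": (2398, 41),
    "impossible": (298, 3),
}

total_subjects = sum(n for n, _ in table_2_6.values())
total_aaa = sum(aaa for _, aaa in table_2_6.values())
total_no_aaa = total_subjects - total_aaa

positive_exams, true_pos = table_2_6["positive"]
false_pos = positive_exams - true_pos
# Assumption: "impossible" examinations are grouped with negative ones.
true_neg = total_no_aaa - false_pos

sensitivity = true_pos / total_aaa
specificity = true_neg / total_no_aaa
ppv = true_pos / positive_exams

print(f"sensitivity {true_pos}/{total_aaa} = {sensitivity:.1%}")
print(f"specificity {true_neg}/{total_no_aaa} = {specificity:.1%}")
print(f"positive predictive value {true_pos}/{positive_exams} = {ppv:.1%}")
```

The fractions match those in the table note. The specificity works out to about 98.5%, which the source reports as 98%.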

Reviewed by Frank A. Lederle, MD

CHAPTER 3

Is Listening for Abdominal Bruits Useful in the Evaluation of Hypertension?

Jeffrey M. Turnbull, MD, FRCP

Toward the end of an unusually busy clinic, a clinical clerk greets the final patient of the day, a man with a recently documented increase of blood pressure. With all the enthusiasm that remains after 4 years of medical training, she compulsively listens for abdominal bruits. Almost surprised, she hears a soft systolic-diastolic epigastric bruit and is faced with the inevitable question: so what?

WHY IS THIS AN IMPORTANT QUESTION TO ANSWER WITH A CLINICAL EXAMINATION? As we have gained insight into the origin and meaning of vascular bruits, detailed auscultation of the abdomen has become more common. Once detected, an abdominal bruit often is characterized according to pitch, timing, amplitude, and location in an effort to detect and document pathologic states, such as renovascular disease, splenic enlargement, hepatic cirrhosis, carcinoma of the pancreas and liver, splenic and hepatic vascular abnormalities, intestinal vascular insufficiency, and aortic disease. More recently, abdominal bruits have been documented in a substantial percentage of healthy individuals. Although the search for an abdominal bruit has become part of the general physical examination, it also has been recommended as a key element of the examination of the hypertensive patient, in whom the presence of an abdominal bruit is considered to be an important feature of renovascular hypertension.1-3 To be of value, a diagnostic investigation (such as eliciting an abdominal bruit in the setting of hypertension) must reliably predict the presence or absence of disease (in this case, renovascular hypertension). This process should influence the course of management or prognosis. With this in mind, the reliability and accuracy of auscultating for an abdominal bruit in a patient with hypertension will be examined.

THE ANATOMIC AND PHYSIOLOGIC ORIGIN OF THE ABDOMINAL BRUIT Whereas turbulent flow within a vessel is the physiologic basis for a bruit, the pitch and radiation are a function of the flow and direction of the turbulent stream. Intrinsic or extrinsic abnormalities can produce turbulence, and although these abnormalities usually arise from within the abdomen, they can also arise from the inguinal area, retroperitoneum, or thorax.

PREVALENCE OF ABDOMINAL BRUITS The prevalence of bruits in different groups is summarized in Table 3-1. In “normal” populations (individuals without hypertension), the presence of any abdominal bruit has been detected in 6.5% to 31% of patients, with a predilection for the younger age groups (Figure 3-1). Among normal individuals older than 55 years, the prevalence was 4.9%. It is generally believed that the short, faint, and midsystolic bruit heard in these asymptomatic patients is “innocent.”7 In patients with angiographically proven renal artery stenosis, bruits have been documented in 77.7% to 86.9% of cases, with higher prevalence than the 28% observed among unselected patients referred for hypertension.5,8,9 In a study by Grim et al,10 the systolic-diastolic bruit was never detected in 379 normal subjects and was found in 1 of 199 patients with essential hypertension. Eppier et al11 distinguished the presence of abdominal bruits in fibromuscular hyperplasia of the renal artery from that in atherosclerotic lesions. Their retrospective medical record review of 87 patients with surgically treated renal artery stenosis revealed a bruit in 77% of patients with fibromuscular disease and in 35% of patients with atherosclerotic disease.

Copyright © 2009 by the American Medical Association.

Table 3-1 The Prevalence of Abdominal Bruits

Reference, y                  Age, y    No. and Study Group                                               Prevalence, %
General Population
Edwards et al,4 1970          17-30     200 healthy volunteers                                            6.5
Julius and Stewart,5 1967     Unknown   170 volunteers                                                    16
Rivin,6 1972                  16-85     426 patients without cardiovascular or intraabdominal disease     18
Watson and Williams,7 1973    13-71     161 psychiatric patients                                          31
                              13-78     200 patients referred with gastrointestinal complaints            27
Patients With Hypertension
Julius and Stewart,5 1967               155 patients referred with hypertension                           28
Patients With Angiographically Proven Renal Stenosis
Hunt et al,8 1974             6-63      100 patients referred for investigation of hypertension           87
Perloff et al,9 1961          17-72     54 patients referred with sustained hypertension                  78

Figure 3-1 The Prevalence of Bruits Varies With Age in Normal Populations (x-axis: age, y; y-axis: percentage with abdominal murmurs; data from Julius and Stewart,5 1967, and Rivin,6 1972)

HOW TO EXAMINE FOR ABDOMINAL BRUITS The patient should be relaxed in a supine position, with the room quiet and with the examiner initially auscultating in the epigastrium, with moderate pressure applied to the diaphragm of the stethoscope. All 4 quadrants should be auscultated anteriorly. The auscultation should continue over the spine and flanks in the areas between T12 and L2 to rule out bruits that may be heard best posteriorly. However, no data exist that would support the routine auscultation of the back for abdominal or retroperitoneal bruits. Once detected, bruits can be correlated to the cardiac cycle by palpation of the carotid upstroke, with the systolic-diastolic bruit being more prolonged and extending into diastole. Because the kidneys lie retroperitoneally and the renal arteries leave the aorta in the area cephalad to the umbilicus, attention should be given to auscultation in the epigastric area for the bruit of renovascular disease, a pancreatic neoplasm, or an innocent bruit (Figure 3-2). The bruit of a hepatic carcinoma has been heard in the right upper quadrant, whereas that of a splenic arteriovenous fistula has been described in the left upper quadrant. Periumbilical bruits are at times heard in the setting of mesenteric ischemia, and venous hums are from portosystemic hypertension. Finally, in the older population, an abdominal bruit may be associated with an abdominal aortic aneurysm. Estes,12 in a study of 102 patients with abdominal aortic aneurysms, demonstrated the presence of an associated bruit in 28% of cases.

Figure 3-2 Appropriate Areas of Auscultation (labeled areas: epigastric area, upper right quadrant, upper left quadrant, lower right quadrant, lower left quadrant)


THE PRECISION OF ABDOMINAL AUSCULTATION FOR BRUITS Neither intraobserver nor interobserver variations in the way we elicit this sign have been evaluated in detail. However, Watson and Williams7 reported 92% (149/161) agreement when patients with celiac artery compression were prospectively examined by 2 examiners for the presence of an abdominal bruit. With standardization, auscultation of the abdomen can be performed with the appropriate degree of precision.

THE ACCURACY OF ABDOMINAL AUSCULTATION IN RENOVASCULAR HYPERTENSION This discussion will concentrate on abdominal bruits in fibromuscular and atherosclerotic renovascular disease. Because abdominal bruits occur in healthy individuals and in those with the nonrenovascular conditions listed in Table 3-2, they may occasionally yield false-positive findings in hypertensive patients. Many studies describe the accuracy of the abdominal bruit in detecting renovascular disease in patients referred for hypertension, but only 3 demonstrate sufficient methodologic rigor (Table 3-3). These reports were of sufficient size and uniform clinical assessment, and the angiogram was the criterion standard. A further study by Julius and Stewart5 reported a sensitivity of 20%; however, specificity could not be estimated.
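Table 3-3 notes that its confidence intervals were obtained with the normal approximation method. As a minimal sketch of that calculation, the function below applies it to the counts reported for the Grim et al and Fenton et al studies. The function name, the z value of 1.96, and the clipping of the interval to the range 0 to 1 are assumptions made for illustration.

```python
import math


def wald_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return a proportion and its normal-approximation (Wald) CI, clipped to [0, 1]."""
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts as reported in Table 3-3.
reported_counts = {
    "Grim et al, sensitivity": (25, 64),
    "Grim et al, specificity": (197, 199),
    "Fenton et al, sensitivity": (17, 27),
    "Fenton et al, specificity": (82, 91),
}

for label, (successes, total) in reported_counts.items():
    p, low, high = wald_ci(successes, total)
    print(f"{label}: {successes}/{total} = {p:.1%} ({low:.1%} to {high:.1%})")
```

The normal approximation is least reliable for small samples and for proportions near 0 or 1, as with the 197/199 specificity, where the unclipped upper limit would exceed 100%.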

Table 3-2 Reported Nonrenovascular Causes of an Abdominal Bruitᵃ

Reference, y                            Condition
Arida,13 1977                           Splenic arteriovenous fistula
Bloom,14 1950                           Hepatic cirrhosis
Clain et al,15 1966                     Alcoholic hepatitis, hepatoma
Estes,12 1950                           Abdominal aortic aneurysm
Goldstein,16 1968                       Celiac artery compression syndrome
Lee,17 1967                             Bacterial gastroenteritis
Matz and Spear,18 1969                  Unilateral renal hypertrophy
McLoughlin et al,19 1975                Celiac artery stenosis
Sarr et al,20 1980                      Chronic intestinal ischemia
Serebro and W’srand,21 1965             Pancreatic neoplasia
Shumaker and Waldhausen,22 1961         Hepatic arteriovenous fistula
Smythe and Gibson,23 1963               Tortuous splenic arteries

ᵃ No data exist that would permit the listing of these disorders by prevalence.

PRESENCE OF ABDOMINAL BRUITS The most useful study10 of the accuracy of abdominal auscultation assembled a consecutive series of patients referred to a university medical center for hypertension. All patients healthy enough for surgery underwent careful abdominal auscultation, with positive findings confirmed by a second examiner, plus other tests for renovascular hypertension, including arteriography. Of 64 patients with renovascular hypertension (an abnormal angiogram result and a renal vein renin ratio >1.5), 25 had combined systolic-diastolic abdominal bruits, for a sensitivity of 39% (95% confidence interval [CI], 27%-51%). Of 199 hypertensive patients with normal arteriogram results, 2 had systolic-diastolic bruits, for a specificity of 99% (95% CI, 98%-100%). Thus, although the absence of a systolic-diastolic bruit did not rule out renovascular hypertension, the presence of a systolic-diastolic bruit helped to rule it in, with a likelihood ratio (LR) of 39 (95% CI, 9.4-160).

A second study recorded any epigastric or flank bruits in a series of hypertensive patients undergoing arteriography.24 Not surprisingly, the sensitivity of 63% (95% CI, 45%-81%) for any bruit was higher than in the previous study, whereas the specificity for any bruit was somewhat lower, at 90% (95% CI, 84%-96%). Consequently, the presence of any systolic bruit confers a lower LR for renovascular hypertension (LR = 6.4; 95% CI, 3.2-13). Thus, the systolic-diastolic abdominal bruit is less sensitive (P = .04; χ²₁ = 4.36) and more specific (P < .01; χ²₁ = 13.5) than the combination of both isolated systolic and combined systolic-diastolic bruits. Other than these studies and that by Perloff et al,9 additional studies of the accuracy of abdominal bruits in patients with hypertension are less rigorous and are not reported.

In summary, there is a substantial prevalence of systolic bruits in young, healthy patients, which increases in hypertensive patients, especially those with documented renovascular disease. In instances when the accuracy of the abdominal bruit has been rigorously assessed in evaluating patients with renovascular disease, the sensitivity has been reported to be between 20% and 78%, whereas the specificity has been between 64% and 90%. Systolic-diastolic bruits are seldom heard in healthy people or in patients with essential hypertension, but they are more common in individuals with renovascular disease. In patients with fibromuscular disease, there is an increased prevalence for all types of bruits.

Table 3-3 Accuracy of the Abdominal Bruit in Renovascular Hypertension

Reference, y            Type of Bruit                                                       Sensitivity, % (95% CIᵃ)   Specificity, %           LR If Present   LR If Absent
Grim et al,10 1979      Systolic and diastolic abdominal bruit                              25/64 = 39 (27-51)         197/199 = 99 (98-100)    39              0.6
Fenton et al,24 1966    Any epigastric or flank bruit, including isolated systolic bruit    17/27 = 63 (45-81)         82/91 = 90 (84-96)       6.4             0.4
Perloff et al,9 1961    Systolic bruit                                                      78                         64                       2.1             0.35

Abbreviations: CI, confidence interval; LR, likelihood ratio.
ᵃ CI obtained with the use of normal approximation method.
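The likelihood ratios quoted for the Grim et al and Fenton et al studies follow directly from the reported counts: the LR for a present bruit is sensitivity divided by (1 − specificity), and the LR for an absent bruit is (1 − sensitivity) divided by specificity. The sketch below reproduces them. The confidence interval on the log scale is an assumed method, since the source does not say how the LR intervals were derived, and the function names are illustrative.

```python
import math


def likelihood_ratios(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (LR if finding present, LR if finding absent) from 2x2 counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity / (1.0 - specificity), (1.0 - sensitivity) / specificity


def lr_present_ci(tp: int, fn: int, fp: int, tn: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for the positive LR on the log scale (assumed method)."""
    lr_present, _ = likelihood_ratios(tp, fn, fp, tn)
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    return lr_present * math.exp(-z * se_log), lr_present * math.exp(z * se_log)


# Counts reported in the text: bruit present or absent, by arteriographic result.
studies = {
    "Grim et al (systolic-diastolic bruit)": dict(tp=25, fn=39, fp=2, tn=197),
    "Fenton et al (any bruit)": dict(tp=17, fn=10, fp=9, tn=82),
}

for name, counts in studies.items():
    lr_present, lr_absent = likelihood_ratios(**counts)
    low, high = lr_present_ci(**counts)
    print(f"{name}: LR present {lr_present:.1f} ({low:.1f}-{high:.0f}), LR absent {lr_absent:.2f}")
```

The point estimates agree with the reported values (39 and 0.6; 6.4 and 0.4), and the log-scale intervals are close to those quoted in the text. The wide interval around 39 reflects that only 2 patients without renovascular hypertension had the finding.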

AUSCULTATORY CHARACTERISTICS OF BRUITS Although many bruits have been characteristically described as having a certain pitch, intensity, and location, the data to support this have been questioned.11,19 Moser and Caldwell25 demonstrated a slightly increased prevalence of high-pitched bruits in association with renal artery disease (87%) when compared with the prevalence of medium-pitched or low-pitched bruits (57%). This finding supports the results of Julius and Stewart,5 who reported an increased prevalence (64%) of high-pitched bruits in these patients. In the study by Moser and Caldwell,25 the intensity of the bruit described in patients with renovascular disease was less discriminatory, with 80% (17/21) of cases having loud bruits and 55% (16/29) having quiet bruits. These same authors described their results in predicting the localization of the stenosis. In their study, of the 13 patients in whom renovascular disease was isolated to 1 vessel, stenosis was correctly localized beforehand in 6 (46%). Eppier et al11 reported slightly better results because the site of the renovascular lesion was correctly localized in 70% of patients with fibromuscular disease and 43% of patients with atherosclerotic renovascular disease. Julius and Stewart5 directly auscultated the renal artery by using a sterile stethoscope at the time of renovascular surgery, demonstrating that, of 18 patients with bruits, in 9 the bruits were confined to the correct renal artery and in 7 the renal artery bruits were combined with additional vascular bruits. In 2 patients (11%), the bruits heard before surgery were secondary to other vascular abnormalities, and there were no bruits associated with the renal artery.

PROGNOSIS OF PATIENTS WITH HYPERTENSION AND BRUITS Finally, the importance of identifying the location, pitch, and intensity of a bruit is questionable, and this issue awaits further clarification with larger prospective studies. Two reports have linked the presence of bruits to the outcome of renovascular surgery but with conflicting results. Eppier et al11 found that 84% of patients with systolic-diastolic bruits had favorable surgical results, compared with 55% of patients with only systolic bruits or no bruits. This result was replicated in patients whose renal artery stenoses were due to atherosclerosis, but the presence of diastolic bruits and the recent onset of hypertension correlated with favorable surgical outcomes in patients with both fibromuscular and atherosclerotic vascular disease. In contrast, Simon et al26 were unable to attach prognostic importance to abdominal bruits in patients with fibromuscular or atherosclerotic renovascular disease.

THE BOTTOM LINE In view of the high prevalence (7%-31%) of innocent abdominal bruits in the younger age groups, if a systolic abdominal bruit is detected in a young, normotensive, asymptomatic individual, no further investigations are warranted.

In view of the low sensitivity, the absence of a systolic bruit is not sufficient to rule out the diagnosis of renovascular hypertension.

In view of the high specificity, the presence of a systolic bruit (in particular a systolic-diastolic bruit) in a hypertensive patient is suggestive of renovascular hypertension. Subsequent investigation should take into consideration the pretest likelihood of renovascular disease and full cost and potential benefits of any management decision.

In view of the lack of evidence to support characterizing bruits as to pitch, intensity, and location, bruits should be reported only as systolic or systolic/diastolic. Existing information does not permit a definitive statement pertaining to the prognostic implication of a renal bruit.

In summary, the critical review of the literature pertaining to the abdominal bruit would suggest that the routine auscultation of the abdomen for the presence or absence of an abdominal bruit in the healthy asymptomatic population is of little value in view of the high prevalence of benign bruits. However, for our troubled clinical clerk, the presence of a systolic-diastolic bruit would provide supportive evidence of an underlying diagnosis of renovascular disease and should lead her to more aggressive investigation for this disorder.

Author Affiliation at the Time of the Original Publication

From the Department of Medicine, Ottawa (Ontario) Civic Hospital, Ottawa, Ontario, Canada.

Acknowledgments

The author and editors thank E. K. M. Smith, MD, for his helpful review of the manuscript.

REFERENCES
1. Gifford RW, Poutasse EF. Renal artery disease: diagnosis and therapy. Geriatrics. 1963;8:761.
2. Maronde RF. The hypertensive patient. JAMA. 1975;233(9):997-1000.
3. Havey RJ, Krumlovsky F, delGreco F, Martin HG. Screening for renovascular hypertension. JAMA. 1985;254(3):388-393.
4. Edwards AJ, Hamilton JD, Nichol WD, et al. Experience with coeliac axis compression syndrome. BMJ. 1970;1(5692):342-345.
5. Julius S, Stewart BH. Diagnostic significance of abdominal murmurs. N Engl J Med. 1967;276(21):1175-1178.
6. Rivin AU. Abdominal vascular sounds. JAMA. 1972;221(7):688-690.
7. Watson WC, Williams PB. Epigastric bruits in patients without celiac axis compression. Ann Intern Med. 1973;79(2):211-215.
8. Hunt JC, Sheps SG, Harrison EG Jr, Strong CG, Bernatz PE. Renal and renovascular hypertension: a reasoned approach to diagnosis and management. Arch Intern Med. 1974;133(6):988-999.
9. Perloff D, Sokolow M, Wylie EJ, et al. Hypertension secondary to renal artery occlusive disease. Circulation. 1961;24:1286-1304.
10. Grim CE, Luft FC, Myron H, et al. Sensitivity and specificity of screening tests for renal vascular hypertension. Ann Intern Med. 1979;91(4):617-622.
11. Eppier DF, Gifford RW, Stewart BH, et al. Abdominal bruits in renovascular hypertension. Am J Cardiol. 1976;37(1):48-52.
12. Estes E. Abdominal aortic aneurysm. Circulation. 1950;2(2):258-264.
13. Arida EJ. Splenic arteriovenous fistula with hypertension, varices, and ascites. N Y State J Med. 1977;77(6):987-990.
14. Bloom HJG. Venous hums in hepatic cirrhosis. Br Heart J. 1950;12(4):343-350.
15. Clain D, Wartnaby K, Sherlock S. Abdominal arterial murmurs in liver disease. Lancet. 1966;2(7462):516-519.
16. Goldstein LT. Enlarged, tortuous arteries and hepatic bruit. JAMA. 1968;206(11):2518-2520.
17. Lee R. Provision of health services: past, present, future. N Engl J Med. 1967;277(13):682-686.
18. Matz R, Spear P. Abdominal bruit due to unilateral renal hypertrophy. Lancet. 1969;1(7589):310-311.
19. McLoughlin MJ, Colapinto RJ, Hobbs BB. Abdominal bruits. JAMA. 1975;232(12):1238-1242.
20. Sarr MG, Dickson ER, Newcomer AD. Diastolic bruit in chronic intestinal ischemia. Dig Dis Sci. 1980;25(10):761-762.
21. Serebro H, W’srand MB. Diagnostic sign of carcinoma of the body of the pancreas. Lancet. 1965;1:85-86.
22. Shumaker HB, Waldhausen JA. Intrahepatic arteriovenous fistula of hepatic artery and portal vein. Surg Gynecol Obstet. 1961;112:497-501.
23. Smythe C, Gibson DB. Upper-quadrant bruit due to tortuous artery. N Engl J Med. 1963;2:1308-1309.
24. Fenton S, Lyttle JA, Pantridge JF. Diagnosis and results of surgery in renovascular hypertension. Lancet. 1966;2(7455):117-121.
25. Moser RJ, Caldwell JR. Abdominal murmurs. Ann Intern Med. 1962;56:471-483.
26. Simon N, Franklin SS, Bleifer KH, Maxwell MH. Clinical characteristics of renovascular hypertension. JAMA. 1972;220(9):1209-1218.

UPDATE: Abdominal Bruits

Prepared by David L. Simel, MD, MHS
Reviewed by Lori Orlando, MD

CLINICAL SCENARIO A 55-year-old, white, male smoker has had hypertension for 10 years. It has always been well controlled, with systolic measures of lower than 135 mm Hg. He is receiving a diuretic and a β-blocker. Recently, the systolic pressure has typically been 140 to 150 mm Hg. He is a bit overweight (body mass index, 26.5). There has been no evidence for atherosclerotic disease. His serum creatinine level is unchanged, at 0.11 mmol/L. The serum cholesterol level is 5.95 mmol/L. Your suspicion is that the increased blood pressure is a manifestation of essential hypertension, but you decide to auscultate for an abdominal bruit. You hear none. You would like to add an angiotensin-converting enzyme inhibitor, but you wonder whether you have ruled out renal artery stenosis as a cause of the recent upward trend in his pressure.

UPDATED SUMMARY ON ABDOMINAL BRUITS Original Review Turnbull JM. Is listening for abdominal bruits useful in the evaluation of hypertension? JAMA. 1995;274(16):1299-1301.

UPDATED LITERATURE SEARCH Our literature search crossed the text words “renal artery,” “auscultation,” “bruit,” and “hypertension,” published in English from 1994 to 2004. We also searched on the subject heading “renal artery obstruction/di.” The search yielded 86 articles for which the titles and abstracts were reviewed. One article that included sensitivity and specificity data on the abdominal bruit as a sign for renal artery stenosis was retrieved.

NEW FINDINGS
• A large study of patients with hypertension that is difficult to control confirmed the usefulness of finding an abdominal bruit, even those heard only during systole.
• Available data do not allow us to make conclusions about the prevalence or importance of finding an abdominal bruit in black patients.

Details of the Update Many normal individuals have abdominal bruits. The presence of an abdominal bruit becomes potentially important in hypertensive patients, especially those with certain characteristics. Abdominal bruits may be the harbinger of renal artery stenosis, and the diagnosis should be suspected in hypertensive patients who had their disease onset at a young age or who have blood pressures that are seemingly resistant to medical treatment. It may be therapeutically useful to identify patients with renal artery stenosis because balloon angioplasty may be a useful treatment intervention for controlling blood pressure, especially when medications fail.1 One study, identified in the original Rational Clinical Examination article, found the highest diagnostic utility for an abdominal bruit that had both a systolic and diastolic component. The effect of an abdominal bruit with both components compared with an abdominal bruit with only a single systolic component has not been evaluated. In our updated literature review, we found 1 large, prospective cohort study of patients with hypertension that is difficult to control who were systematically evaluated for renal artery stenosis. The importance of a systolic bruit in this population of patients (predominantly white) was similar to that found in previous work that we reviewed in the original publication.2 A study of 85 consecutive patients with hypertension, diabetes, and normal renal function provides useful information about ethnicity and renal artery stenosis as it includes a higher proportion of black patients than previous studies.3 The odds ratio for Afro-Caribbean patients vs other patients (white or Asian) was 0.70 (95% confidence interval [CI], 0.19-2.5). We can combine the data with those from Krijnen et al2 to find a summary odds ratio of 0.37 (95% CI, 0.12-1.1) for black ethnicity, suggesting that perhaps black patients are less likely than other patients to get renal artery stenosis. 
However, the broad CIs suggest that the currently available data do not allow us to conclude this with certainty. Unfortunately, data were not provided on the frequency of abdominal bruits, so we do not know whether the finding of an abdominal bruit in black patients has the same significance as in other patients.
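The caution about the broad CIs can be made concrete by back-calculating an approximate test statistic from a reported odds ratio and its 95% CI. This is a rough sketch only: it assumes the interval is symmetric on the log scale and was built with z = 1.96, and it works from rounded published limits, so the result is approximate. The function name is illustrative.

```python
import math


def z_and_p_from_ratio_ci(estimate: float, lower: float, upper: float,
                          z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate z statistic and 2-sided P value from a ratio and its 95% CI."""
    # On the log scale the CI is log(estimate) +/- z_crit * SE, so its width gives SE.
    se_log = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(estimate) / se_log
    # Two-sided P value under a standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Odds ratios for black ethnicity and renal artery stenosis, as reported above.
for label, (odds_ratio, low, high) in {
    "single study": (0.70, 0.19, 2.5),
    "summary": (0.37, 0.12, 1.1),
}.items():
    z, p = z_and_p_from_ratio_ci(odds_ratio, low, high)
    print(f"{label}: OR {odds_ratio} ({low}-{high}), z = {z:.2f}, P = {p:.2f}")
```

Both intervals include 1, and neither approximate P value falls below .05, which is consistent with the statement that the data do not support a firm conclusion.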


IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION CIs were not provided in the original publication. A typographic error in the negative likelihood ratio (LR–) for a bruit was found for Table 3-3. The LR– for the study by Perloff et al4 should have been 0.35, as is now shown. We reconfigured Table 3-3 from the original publication, providing the CIs and summary estimates for the presence of a bruit (Table 3-4).

CHANGES IN THE REFERENCE STANDARD The reference standard remains arteriography. However, noninvasive tests have replaced arteriography in offering a less risky screening approach for appropriate patients.5 At possible treatment (ie, balloon angioplasty), all patients undergo arteriography to ensure proper technique.

RESULTS OF LITERATURE REVIEW

Multivariate Findings for Renal Artery Stenosis A clinical prediction model can be used in white patients with hypertension that is difficult to control.2 The model can be downloaded to a computer (the DRASTIC [Dutch Renal Artery Stenosis Intervention Cooperative] spreadsheet; http://www2.eur.nl/fgg/mgz/software.html, accessed May 16, 2008). The model has not been validated prospectively or in a population of black patients.

Table 3-4 Univariate Findings for Renal Artery Stenosis

Finding                                          LR+ (95% CI)      LR– (95% CI)
Bruit
  Systolic and diastolic
    Grim et al6                                  39 (10-145)       0.62 (0.49-0.73)
  Systolic with/without diastolic componentᵃ
    Krijnen et al2                               6.7 (3.7-12)      0.76 (0.66-0.84)
    Fenton et al7                                6.4 (3.2-12)      0.41 (0.24-0.62)
    Perloff et al4                               2.2 (1.5-3.2)     0.35 (0.20-0.57)
    Summary systolic bruit                       4.3 (2.3-8.0)     0.52 (0.34-0.78)
History of atherosclerotic disease2              2.2 (1.8-2.8)     0.52 (0.40-0.66)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
ᵃ Did not distinguish between individuals with systolic-only bruits vs systolic and diastolic.

EVIDENCE FROM GUIDELINES The Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure (JNC 7) suggests that physicians auscultate for abdominal bruits in patients with hypertension.8 The suggestion is not accompanied by data but reflects expert opinion. The report specifically recommends considering renal artery stenosis for certain hypertensive patients.

CLINICAL SCENARIO: RESOLUTION Patients with hypertension frequently need treatment with additional medications as they get older. The patient has none of the more obvious findings to suggest renovascular hypertension from renal artery stenosis. According to expert recommendations, you listened for abdominal bruits and heard none. The proper technique must be used, and you must be listening in a quiet room. Often, physicians do not apply enough pressure with the diaphragm of the stethoscope. Had you heard a bruit, you would have attempted to see whether the bruit extends into diastole. This can be done by palpating the carotid while listening to see whether the bruit prolongs beyond the carotid upstroke. The LR data for the presence or absence of systolic bruits apply only to patients with resistant hypertension. With just 2 medications, you should not assume that he has resistant hypertension. Thus, the LR for the absence of bruit cannot be applied to this patient. You might resort to a clinical decision model (referenced above). Given his age, smoking status, sex, body weight, absence of a bruit, long history of hypertension, and normal creatinine and cholesterol levels, you would find that his predicted probability of renovascular stenosis is 10%. Two caveats apply to this model. First, it was also developed with data from patients with resistant hypertension, so his probability of renal artery stenosis is probably even lower. Second, had your patient been black, you would have needed to recognize that the accuracy of the model would be unknown.


RENAL ARTERY STENOSIS—MAKE THE DIAGNOSIS
Patients without hypertension should not have auscultation for asymptomatic renal artery bruits because bruits frequently are a normal finding. The search for renal artery stenosis should be confined to certain patient populations (see below). When present in these populations, an abdominal bruit is the most useful physical examination finding for assessment of renal artery stenosis.

PRIOR PROBABILITY OF RENOVASCULAR DISEASE
Approximately 1% to 5% of the general population has renovascular disease. Approximately 20% of white patients with medically refractory hypertension have renal artery stenosis.

POPULATION FOR WHOM RENAL ARTERY STENOSIS SHOULD BE CONSIDERED
• Onset of hypertension before 30 years of age
• Patients with an arterial bruit and hypertension, especially if there is a diastolic component
• Accelerated hypertension
• Hypertension that becomes resistant to medication
• Flash pulmonary edema
• Renal failure, especially in the absence of proteinuria or an abnormal urine sediment result
• Acute renal failure precipitated by angiotensin-converting enzyme inhibitors or angiotensin-receptor blockers

DETECTING THE LIKELIHOOD OF RENAL ARTERY STENOSIS IN PATIENTS WITH REFRACTORY HYPERTENSION
See Table 3-5.

Table 3-5 Clinical Examination Findings for Renal Artery Stenosis
Finding (No. of Studies) | LR+ (95% CI) | LR– (95% CI)
Systolic-diastolic bruit (n = 1) | 39 (10-145) | 0.62 (0.49-0.73)
Systolic bruit (n = 3) | 4.3 (2.3-8.0) | 0.52 (0.34-0.78)
History of atherosclerotic disease (n = 1) | 2.2 (1.8-2.8) | 0.52 (0.40-0.66)
Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

REFERENCE STANDARD TESTS
Moderate-risk and high-risk patients are subjected to a noninvasive screening test (ultrasonography, computed tomography, magnetic resonance imaging). The type of imaging modality for screening (eg, contrast-enhanced ultrasonography vs gadolinium-enhanced computed tomography or magnetic resonance angiography) may be operator dependent, and physicians will need to rely on their local radiologists’ expertise. All patients have their disease status confirmed with arteriography as part of a therapeutic procedure.
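The likelihood ratios in Table 3-5 can be combined with the prior probability given above by converting probability to odds, multiplying by the LR, and converting back. The following is a minimal sketch of that arithmetic, not a validated clinical tool; the function name and the choice of example inputs are illustrative assumptions.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability using an LR."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Prior probability of about 20% in medically refractory hypertension (from the text).
PRIOR = 0.20

# LR+ of 39 for a systolic-diastolic bruit and LR- of 0.52 for an absent
# systolic bruit are the point estimates from Table 3-5.
bruit_present = posttest_probability(PRIOR, 39)
bruit_absent = posttest_probability(PRIOR, 0.52)

print(f"Systolic-diastolic bruit present: {bruit_present:.0%}")
print(f"Systolic bruit absent: {bruit_absent:.0%}")
```

With a 20% prior, a systolic-diastolic bruit raises the probability to roughly 91%, whereas the absence of a systolic bruit lowers it only to about 12%. The wide confidence intervals in Table 3-5 mean these point estimates should be read as approximate.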

REFERENCES FOR THE UPDATE
1. Nordmann AJ, Woo K, Parkes R, Logan AG. Balloon angioplasty or medical therapy for hypertensive patients with atherosclerotic renal artery stenosis? a meta-analysis of randomized controlled trials. Am J Med. 2003;114(1):44-50.
2. Krijnen P, van Jaarsveld BC, Steyerberg EW, Man in 't Veld AJ, Schalekamp MA, Habbema JD. A clinical prediction rule for renal artery stenosis. Ann Intern Med. 1998;129(9):705-711.ᵃ
3. Valabhji J, Robinson S, Poulter C, et al. Prevalence of renal artery stenosis in subjects with type 2 diabetes and coexistent hypertension. Diabetes Care. 2000;23(4):539-543.
4. Perloff D, Sokolow M, Wylie EJ, et al. Hypertension secondary to renal artery occlusive disease. Circulation. 1961;24:1286-1304.
5. Vasbinder GB, Nelemans PJ, Kessels AG, Kroon AA, de Leeuw PW, van Engelshoven JM. Diagnostic tests for renal artery stenosis in patients suspected of having renovascular hypertension: a meta-analysis. Ann Intern Med. 2001;135(6):401-411.
6. Grim CE, Luft FC, Myron H, et al. Sensitivity and specificity of screening tests for renal vascular hypertension. Ann Intern Med. 1979;91(4):617-622.
7. Fenton S, Lyttle JA, Pantridge JF. Diagnosis and results of surgery in renovascular hypertension. Lancet. 1966;2(7455):117-121.
8. Chobanian AV, Bakris GL, Black HR, et al. The seventh report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. JAMA. 2003;289(19):2560-2572.

ᵃFor the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.


EVIDENCE TO SUPPORT THE UPDATE: Abdominal Bruits

TITLE A Clinical Prediction Rule for Renal Artery Stenosis.

AUTHORS Krijnen P, van Jaarsveld BC, Steyerberg EW, Man in 't Veld AJ, Schalekamp MA, Habbema JD.

CITATION Ann Intern Med. 1998;129(9):705-711.

QUESTION Do clinical data identify patients likely to have renal artery stenosis?

DESIGN Prospective data collected as part of a cohort study.

SETTING Multiple internal medicine departments in the Netherlands.

PATIENTS One thousand one hundred thirty-three patients, aged 18 to 75 years, with normal serum creatinine levels and referred for hypertension evaluations. Most patients had hypertension that was difficult to control.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Patients were assigned to 1 of 2 treatment protocols. Those who had a mean diastolic blood pressure of 95 mm Hg or higher at follow-up, or those who experienced an increase in serum creatinine level when treated with angiotensin-converting enzyme inhibitor, underwent digital subtraction angiography, and underwent other noninvasive tests of the renal arteries. The clinical data were collected prospectively. The presence of “abdominal bruit” was recorded before the reference standard tests.

MAIN OUTCOME MEASURES Renal artery stenosis (≥50%) identified by arteriography.

MAIN RESULTS From a population of 1133 patients, 477 required renal artery stenosis evaluation for either blood pressure that is difficult to control or an increase in serum creatinine level when treated with an angiotensin-converting enzyme inhibitor. One hundred seven patients had renal artery stenosis (22%).

Abdominal bruit or atherosclerotic disease (femoral or carotid bruit, angina, claudication, myocardial infarction, cerebrovascular accident, or vascular surgery) were the variables with the best accuracy (Table 3-6). A clinical prediction model included the additional terms of age, smoking history, recent onset of hypertension, obesity, hypercholesterolemia, and the serum creatinine level. The model can be downloaded via the Internet (the DRASTIC [Dutch Renal Artery Stenosis Intervention Cooperative] spreadsheet; http://www2.eur.nl/fgg/mgz/software.html, accessed May 16, 2008). The model had an area under the receiver operating characteristic curve (a measure of accuracy) of 0.84 (95% confidence interval, 0.79-0.89).

Table 3-6 Likelihood Ratio of Findings for Renal Artery Stenosis
Test | Sensitivity | Specificity | LR+ (95% CI) | LR– (95% CI)
Abdominal bruit | 0.27 | 0.96 | 6.7 (3.7-12) | 0.76 (0.66-0.84)
Atherosclerotic disease | 0.63 | 0.72 | 2.2 (1.8-2.8) | 0.52 (0.40-0.66)
Abbreviations: CI, confidence interval; LR, likelihood ratio.

CONCLUSIONS

LEVEL OF EVIDENCE Level 1.

STRENGTHS Prospective data collection in the relevant population of patients with hypertension that is difficult to control. The prediction model was subjected to internal validation.

LIMITATIONS “Abdominal bruit” is not defined. The study population had almost no patients of black ethnicity. The prediction rule was not externally validated in a separate population of patients.

This is a large study in the population of patients for whom renovascular hypertension and renal artery stenosis might be considered. The presence of any abdominal bruit was recorded by examiners and showed excellent specificity with a sufficiently high positive likelihood ratio. A patient’s history that indicates previous atherosclerotic vascular disease also has diagnostic utility. A problem for some clinicians is that the patients were almost all whites.1 Given the low prevalence of renovascular hypertension in blacks, US physicians cannot be certain that the results will generalize well.

Reviewed by David L. Simel, MD, MHS
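The likelihood ratios in Table 3-6 follow directly from the reported sensitivity and specificity: LR+ is sensitivity divided by (1 − specificity), and LR– is (1 − sensitivity) divided by specificity. The sketch below recomputes them from the rounded table values; the function and dictionary names are illustrative assumptions.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a dichotomous finding."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Sensitivity and specificity as reported in Table 3-6.
TABLE_3_6 = {
    "Abdominal bruit": (0.27, 0.96),
    "Atherosclerotic disease": (0.63, 0.72),
}

for finding, (sens, spec) in TABLE_3_6.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{finding}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Because the inputs are rounded to 2 decimal places, the recomputed values agree with the published LRs only to within rounding (for example, the recomputed LR– for atherosclerotic disease is about 0.51 compared with the published 0.52).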

REFERENCE FOR THE EVIDENCE 1. Wilcox C. Screening for renal artery stenosis: are scans more accurate than clinical criteria? Ann Intern Med. 1998;129(9):738-740.

CHAPTER 4

Does This Patient Have an Alcohol Problem?

James M. Kitchens, MD, FRCPC

CLINICAL SCENARIO A 58-year-old man was admitted to the hospital for an elective cholecystectomy. At the time of admission, he smelled of alcohol, although he was not obviously intoxicated. On questioning, he said that he had come from a business lunch where he had “a drink.” When questioned about his alcohol history, he became angry and defensive. He said that he was “offended by the implications of these questions.” On the day after the surgery, he was found to be diaphoretic, tremulous, and hallucinating and was judged to be in alcohol withdrawal. Could other interviewing techniques have identified this man as one who was alcohol dependent and at risk of withdrawal?

WHY IS THIS AN IMPORTANT QUESTION TO ANSWER WITH A CLINICAL EXAMINATION? It is estimated that more than 100 million Americans drink alcohol and that about 10% of those who drink have alcohol problems that adversely affect their lives and the lives of their families.1 Alcohol is involved in 10% of all deaths in the United States. The mortality rate in those who drink 6 or more drinks per day is 50% higher than the rate in matched controls.2 Alcohol is a major factor in suicides, homicides, violent crimes, and fatal motor vehicle crashes. Alcohol abuse and dependence are common in both partners where spouse and child abuse occur.3,4 There is a 4-fold increased risk of alcohol dependence in the children of alcohol-dependent parents.5 Alcohol is primarily or secondarily implicated in a large number of medical problems such as cirrhosis, alcoholic hepatitis, portal hypertension, gastritis, nutritional deficiencies, cardiomyopathy, dysrhythmias, cognitive dysfunction, seizures, neuropathies, myopathies, low birth weight, fetal alcohol syndrome, and a variety of head and neck cancers.1 Alcohol abuse and alcohol dependence are common problems. A history of alcohol abuse has been found in one-fifth to one-third of patients attending inner-city ambulatory medical clinics, and one-third of these patients report an active drinking problem. In some of these settings, the prevalence of abuse has been as high as two-thirds in men.6-8 Unfortunately, physicians recognize only about half of the problem drinkers that they encounter, and they are even less likely to identify problems in women and elderly people.8-13

Copyright © 2009 by the American Medical Association.

DIAGNOSTIC STANDARDS FOR ALCOHOL ABUSE AND DEPENDENCY
Alcohol-related problems provide many diagnostic problems for clinicians. In our society, drinking is a common and socially complex behavior. At one end of the drinking spectrum, alcohol is used in moderation without adverse consequences to the drinkers or those around them. At the other end of the spectrum are those drinkers who have adverse effects medically, economically, and psychosocially from repeated abuse of alcohol. Between those who occasionally use alcohol in moderation and those who are frankly alcohol dependent lies a continuum of drinkers with varying consumption patterns and risks of alcohol-related problems. The rational use of diagnostic tests to identify problem drinking or alcohol dependence demands a clear understanding of the definitions of the disorder being diagnosed. It will also become clear that diagnostic test characteristics, such as sensitivity, specificity, and likelihood ratios (LRs), vary considerably, depending on the definition of problem drinking or alcohol dependence.

Table 4-1 Diagnostic and Statistical Manual of Mental Disorders, Revised Third Edition (DSM-III-R) and International Statistical Classification of Diseases, 10th Revision (ICD-10) Diagnostic Criteria for Substance Abuse, Harmful Use, and Substance Dependence

DSM-III-R Dependence (3 Items Required)
1. Substance often taken in larger amounts or during a longer period than the person intended
2. Persistent desire or 1 or more unsuccessful efforts to cut down or control substance use
3. A great deal of time spent in activities necessary to get substance, taking substance, or recovering from its effects
4. (a) Recurrent use when substance use is physically hazardous (eg, drives while intoxicated) or (b) frequent intoxication or withdrawal symptoms when expected to perform major role obligations at work, school, or home
5. Important social, occupational, or recreational activities given up or reduced because of substance use
6. Continual substance use despite knowledge of having persistent or recurrent social, psychological, or physical problem that is caused or exacerbated by the use of substance
7. Marked tolerance: need for markedly increased amounts of substance (at least a 50% increase) to achieve intoxication or desired effect or markedly diminished effect with continued use of the same amount
8. Characteristic withdrawal symptoms
9. Substance often taken to relieve or avoid withdrawal symptoms

DSM-III-R Abuse
1. Continued use despite knowledge of having persistent or recurrent social, occupational, psychological, or physical problem that is caused or exacerbated by the use of substance
2. Recurrent use in situations in which use is physically hazardous

ICD-10 Dependence (3 Items Required)
1. A strong desire or sense of compulsion to use a substance
2. Evidence of impaired capacity to control the use of a substance. This may relate to difficulties in avoiding initial use, difficulties in terminating use, or problems controlling levels of use.
3. A withdrawal state or use of the substance to relieve or avoid withdrawal symptoms and subjective awareness of the effectiveness of such behavior
4. Evidence of tolerance of the effects of the substance
5. Progressive neglect of alternative pleasures, behaviors, or interests in favor of substance use
6. Persisting with substance use despite clear evidence of harmful consequences

ICD-10 Harmful Use
1. Clear evidence that the use of a substance was responsible for causing actual psychological or physical harm to the user

The International Statistical Classification of Diseases, 10th Revision (ICD-10) and the Diagnostic and Statistical Manual of Mental Disorders, Revised Third Edition (DSM-III-R) of the American Psychiatric Association present guidelines for the diagnosis of substance abuse disorders.14,15 The ICD-10 recognizes 2 categories: harmful use and alcohol dependence. The DSM-III-R recognizes 2 categories: alcohol abuse and alcohol dependence. The diagnostic criteria for DSM-III-R and ICD-10 are found in Table 4-1. There is another edition of the DSM, the DSM-IV. It is not significantly different from DSM-III-R with regard to the diagnosis of alcohol abuse and alcohol dependence. The following discussion refers to DSM-III-R because it has been used as a diagnostic standard for comparison with other diagnostic questionnaires. Alcohol dependence represents a syndrome as diagnosed by DSM-III-R and ICD-10. The syndrome criteria of the 2 systems overlap considerably, but there are differences between DSM-III-R and ICD-10. The ICD-10 does not include items that address the social or legal consequences of dependence, nor does it have criteria that assess dangerous use (eg, driving or working while intoxicated). The ICD-10 criteria are restricted to the medical and psychological consequences of abuse and dependence. Despite these differences, there is excellent concordance between DSM-III-R and ICD-10 in the diagnosis of alcohol dependence.16 This high degree of concordance illustrates the fact that dependence most commonly affects medical, psychological, and social aspects of life. Rarely are the consequences restricted to one sphere of life. The ICD-10 and DSM-III-R have separate categories of harmful or abusive drinking that do not meet the criteria for dependence.
However, there is poor concordance between the 2 systems for these diagnostic categories.16 Because it does not include criteria for social/legal consequences of drinking, ICD-10 makes fewer diagnoses than DSM-III-R does. For example, an individual who repeatedly drives while intoxicated would not be assigned a diagnosis under ICD-10 but would be assigned a diagnosis as an alcohol abuser under DSM-III-R. The DSM-III-R is the most widely used diagnostic framework for alcohol-related disorders, and it has been used as the diagnostic standard for comparison of other diagnostic questionnaires.6,7,15 The DSM-III-R criteria for alcohol abuse or dependence are structured to detect alcohol problems at any time in the life of the patient. This lifetime prevalence of alcohol problems may not represent an individual’s current drinking status.8 Most studies that use the DSM-III-R criteria as the diagnostic standard for the identification of alcohol abuse or dependence also use a published structured interview, such as the Structured Clinical Interview for DSM-III-R (SCID), that asks specific interview questions that relate to the DSM-III-R diagnostic criteria.17 Other studies have used alcohol consumption questionnaires and interviews to define a level of “problem drinking” and then examined the diagnostic accuracy of screening questionnaires to separate problem drinkers from nonproblem drinkers.7,18-21 However, in the following section, it will be seen that the sensitivities of screening questionnaires decrease as the definition of problem drinking is changed to include a greater proportion of at-risk drinkers.

It is clear that excessive alcohol consumption may be detrimental to medical and social health. The dangers associated with alcohol consumption represent a continuum of risk that makes it difficult to define “safe levels” of alcohol consumption. Some authors contend that ingestion of 4 or more drinks per day in men and 2 or more drinks per day in women constitutes a “hazardous” consumption level that increases the risk of alcohol dependence and medical problems.4,22,23 A “drink” is defined as equivalent volume amounts that have an ethanol content of 0.6 oz. Twelve ounces of beer, 5 oz of wine, and 1.5 oz of liquor all contain 0.6 oz of ethanol. However, safe levels of consumption vary considerably, depending on the clinical or social context of drinking. One and one-half drinks per day may constitute at-risk drinking for pregnant women and represent a health threat to the developing child.18,24 The World Health Organization (WHO) has developed a questionnaire, the Alcohol Use Disorders Identification Test (AUDIT), to identify persons with “hazardous” and “harmful” alcohol consumption who may not be captured by DSM-III-R or ICD-10 diagnostic criteria.25 WHO recognizes the following disorders of alcohol use: “Hazardous drinking” is use that increases the risk of subsequent psychological or medical harm and is judged to be 4 or more drinks per day in men and 2 or more drinks per day in women. “Harmful drinking” occurs in the person who has psychological or medical complications as defined in ICD-10. The WHO classification system attempts to identify persons who drink quantities that will increase their risk of subsequent problems.
This modification is driven by concerns about the cost and effectiveness of treating alcohol dependence.25 A review of alcohol treatment programs and their effectiveness is beyond the scope of this article, but there is a substantial body of evidence that brief, ambulatory interventions targeted to persons with hazardous drinking can decrease levels of consumption and, it is hoped, decrease the likelihood of subsequent harm and dependence.26 However, diagnosis must precede treatment. It is the diagnosis of alcohol disorders in the context of the medical history that is the subject of the remainder of this article.
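The definition of a “drink” as 0.6 oz of ethanol, and the hazardous-drinking thresholds of 4 or more drinks per day in men and 2 or more in women, can be expressed as simple arithmetic. The sketch below is illustrative only: the alcohol-by-volume figures (5% for beer, 12% for wine, 40% for liquor) are assumptions chosen because they reproduce the 0.6-oz equivalence stated in the text, and the function names are hypothetical.

```python
STANDARD_DRINK_OZ_ETHANOL = 0.6  # ethanol content of 1 "drink", from the text

# Assumed typical alcohol by volume; not stated in the source.
ASSUMED_ABV = {"beer": 0.05, "wine": 0.12, "liquor": 0.40}


def standard_drinks(volume_oz: float, abv: float) -> float:
    """Number of standard drinks in a beverage of the given volume and strength."""
    return (volume_oz * abv) / STANDARD_DRINK_OZ_ETHANOL


def is_hazardous(drinks_per_day: float, sex: str) -> bool:
    """Apply the thresholds quoted in the text: >=4/d in men, >=2/d in women."""
    thresholds = {"male": 4.0, "female": 2.0}
    return drinks_per_day >= thresholds[sex]


for beverage, volume in (("beer", 12.0), ("wine", 5.0), ("liquor", 1.5)):
    drinks = standard_drinks(volume, ASSUMED_ABV[beverage])
    print(f"{volume} oz of {beverage}: {drinks:.2f} standard drinks")
```

As the text notes, these thresholds do not apply in every clinical context; for example, lower levels of consumption may constitute at-risk drinking in pregnancy.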

DIAGNOSTIC TESTS OF ALCOHOL ABUSE AND DEPENDENCY Several questionnaires have been developed for the detection of alcohol disorders, including the cut down, annoyed by criticism, guilty about drinking, eye-opener drinks (CAGE) questionnaire, the Michigan Alcoholism Screening Test (MAST), and the AUDIT. The most widely used are the CAGE questionnaire and the MAST. Of these, the MAST has been more thoroughly studied in terms of reliability and accuracy. However, the MAST and its shortened versions are more complicated than the CAGE questionnaire. The CAGE questionnaire is short, easily memorized, and reasonably accurate, making it the screening test of choice for busy house officers and practitioners.

CAGE Questionnaire In 1968, Ewing27 developed the CAGE questionnaire for the detection of alcoholism. CAGE is a mnemonic for these 4 questions: (1) Have you ever felt you ought to cut down on your drinking? (2) Have people annoyed you by criticizing your drinking? (3) Have you ever felt bad or guilty about your drinking? (4) Have you ever had a drink first thing in the morning to steady your nerves or get rid of a hangover (eye opener)? Some investigators have reasoned that alcohol abusers are more likely to give accurate responses to the CAGE questions if they are part of a series of questions on lifestyle that include drinking, smoking, diet, and exercise habits.7,28 The rationale behind this approach is that it may be less likely to trigger defensiveness and denial in people who are alcohol dependent. Other studies do not attempt to disguise the CAGE questionnaire. No studies that examine differences between CAGE interviews and written CAGE questionnaires were identified. There are no comparative studies of reliability or accuracy for the different modes of administering the CAGE questions. It seems reasonable to ask these questions in a frank, nonjudgmental manner as part of the medical history or review of symptoms.
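Scoring the CAGE questionnaire amounts to counting affirmative answers to the 4 questions. The sketch below assumes the threshold most often used in the accuracy studies discussed later in this chapter (2 or more affirmative answers); the dictionary keys and function names are hypothetical labels, not part of the instrument.

```python
CAGE_ITEMS = ("cut_down", "annoyed", "guilty", "eye_opener")


def cage_score(answers: dict[str, bool]) -> int:
    """Count affirmative answers to the 4 CAGE questions."""
    return sum(1 for item in CAGE_ITEMS if answers[item])


def cage_positive(answers: dict[str, bool], threshold: int = 2) -> bool:
    """A result is positive when the score meets the chosen threshold."""
    return cage_score(answers) >= threshold


example = {"cut_down": True, "annoyed": False, "guilty": True, "eye_opener": False}
print(cage_score(example), cage_positive(example))
```

Lowering the threshold argument to 1 corresponds to the more sensitive but less specific criterion examined in some studies.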

MAST The MAST was originally reported on by Selzer29 in 1971. The MAST consists of 24 yes/no questions, with the “alcohol dependent” responses being scored as 1, 2, or 5 points. The MAST questions are listed in Table 4-2. The most common scoring for the MAST has 0 to 3 points as “non–alcohol dependent,” 4 or 5 as “probably alcohol dependent,” and greater than 5 as “definitely alcohol dependent.” Two modified, shortened versions of the MAST have been developed to make it a less time-consuming screening instrument for alcohol dependence. A 10-question version, the Brief MAST (BMAST), and a 13-question version, the Short MAST (SMAST), are available.30,31
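The most common MAST scoring bands described above can be written as a small classification function. This sketch takes the total weighted score as input and does not compute the item weights; the function name is a hypothetical label.

```python
def classify_mast(total_points: int) -> str:
    """Map a total MAST score to the most common interpretation bands."""
    if total_points < 0:
        raise ValueError("MAST total cannot be negative")
    if total_points <= 3:
        return "non-alcohol dependent"
    if total_points <= 5:
        return "probably alcohol dependent"
    return "definitely alcohol dependent"


for score in (2, 4, 9):
    print(score, classify_mast(score))
```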

AUDIT WHO sponsored a collaborative project to develop a screening test that would be able to detect persons with hazardous levels of consumption and those with harmful use and dependence. The AUDIT questions are listed in Table 4-3. Answers are scored from 0 to 4, for a maximum score of 40 points, with scores of 8 or more considered diagnostic of an alcohol use disorder.25,32

Biochemical and Hematologic Tests Increases in liver enzyme concentrations (aspartate aminotransferase, alanine aminotransferase, and γ-glutamyltransferase) and mean corpuscular volume have been investigated as biological markers of alcohol abuse. All of these tests are insensitive in detecting alcohol abusers. None of these tests, alone or in combination, perform as well as the MAST or the CAGE questionnaire in detecting alcohol abuse.19,22,33-35

Table 4-2 Michigan Alcoholism Screening Test (MAST)29
Points | Question
2 | 1. Do you feel you are a normal drinker?ᵃ
2 | 2. Have you ever awakened the morning after some drinking the night before and found that you could not remember a part of the evening before?
1 | 3. Does your spouse or parents ever worry or complain about your drinking?ᵃ
2 | 4. Can you stop drinking without a struggle after 1 or 2 drinks?
1 | 5. Do you ever feel bad about your drinking?ᵃ
2 | 6. Do friends or relatives think you are a normal drinker?ᵃ
2 | 7. Are you always able to stop drinking when you want to?ᵃ
5 | 8. Have you ever attended a meeting of Alcoholics Anonymous?ᵃ
1 | 9. Have you gotten into fights when drinking?
2 | 10. Has drinking ever created problems with you and your spouse?ᵃ
2 | 11. Has your spouse or other family member ever gone to anyone for help about your drinking?
2 | 12. Have you ever lost friends or girlfriends/boyfriends because of your drinking?
2 | 13. Have you ever gotten into trouble at work because of drinking?ᵃ
2 | 14. Have you ever lost a job because of drinking?
2 | 15. Have you ever neglected your obligations, your family, or your work for 2 or more days in a row because you were drinking?ᵃ
1 | 16. Do you ever drink before noon?
2 | 17. Have you ever been told you have liver trouble? Cirrhosis?
2 | 18. Have you ever had delirium tremens (DTs), severe shaking, heard voices, or seen things that weren’t there after heavy drinking?
5 | 19. Have you ever gone to anyone for help about your drinking?ᵃ
5 | 20. Have you ever been in a hospital because of your drinking?ᵃ
2 | 21. Have you ever been a patient in a psychiatric hospital or on a psychiatric ward of a general hospital when drinking was part of the problem?
2 | 22. Have you ever been treated at a psychiatric or mental health clinic or gone to a doctor, social worker, or clergyman for help with an emotional problem in which drinking had played a part?
2 | 23. Have you ever been arrested, even for a few hours, because of drunk behavior?ᵃ
2 | 24. Have you ever been arrested for drunk driving or driving after drinking?ᵃ
ᵃIncluded in the short version of the MAST.

RELIABILITY OF THE MAST, CAGE, AND AUDIT QUESTIONNAIRES Gibbs36 reviewed the internal consistency (α) reliability coefficient of the MAST reported in 6 studies and found it to vary from .83 to .93. The α values in 6 studies of the SMAST or BMAST ranged from .75 to .81. Skinner and Sheu37 reported the test-retest reliability of the MAST at .84. Reliability coefficients of 1.0 represent perfect test precision (perfect interobserver or intraobserver precision), and values close to 1.0 are highly precise. No reports measuring the reliability of the CAGE and AUDIT questionnaires were identified.

ACCURACY OF THE MAST, CAGE, AND AUDIT QUESTIONNAIRES Determining the test accuracy of all questionnaires for alcohol use disorders presents some methodologic problems. The questions in the CAGE, MAST, and AUDIT questionnaires are embodied within the commonly used reference standards, DSM-III-R and ICD-10, which may result in inflated estimates of test accuracy. The advantage of the CAGE and AUDIT questionnaires over the much longer questionnaires is their brevity, which would allow them to be used as a screening or case-finding tool by busy clinicians. The diagnostic accuracy of the MAST and its shorter versions has been reported, with sensitivities of 71% to 100% and specificities of 81% to 96%.8,19,38 The MAST can be criticized as a screening tool because of its length; it requires about 20 minutes to administer, making it less likely to be used by a busy clinician. In most studies of the diagnostic accuracy of the CAGE questionnaire, a positive test result has been defined as 2 or more affirmative answers to the questions. The CAGE questionnaire has been validated in several environments, including psychiatric inpatients, medical and orthopedic inpatients, and ambulatory medical patients in the United States and Great Britain.6,7,18-21,28 Table 4-4 lists studies in which the diagnostic accuracy of the CAGE questionnaire has been reported and in which the authors specify the “diagnostic standard” used to define the patient’s alcohol status. In all these studies, changing the criterion of a positive CAGE test result from a score of 2 to 1 results in greater test sensitivity but lower specificity. In other words, the test will identify more problem drinkers, but it will also misclassify more nonproblem patients as problem drinkers. 
Note that as the definition of problem drinking is lowered, for example, from 16 to 8 drinks per day or from 2 drinks to 1 drink per day in pregnant women, the sensitivity of the test decreases and the specificity increases for the same CAGE threshold. The CAGE questionnaire is reasonably accurate at identifying those individuals who are alcohol dependent or heavy drinkers (>8 drinks/d). However, it is not at all sensitive at detecting the lower levels of consumption that may be dangerous, especially in pregnant women. It has not been tested as a tool to detect hazardous or at-risk drinking on the order of 4 drinks per day. It will be less sensitive in that situation. There is no difference in the diagnostic accuracy of the CAGE questionnaire when used in men or women, and it is equally effective in elderly people.6,39 However, there is a marked difference in the prevalence of alcohol disease in men and women. The prevalence of alcohol dependence in women is about one-third that in men. The predictive values for CAGE responses reflect the lower prevalence figures for women, with lower positive predictive values and higher negative predictive values.6,39
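The dependence of predictive values on prevalence can be shown with Bayes theorem. The sketch below holds sensitivity and specificity fixed at 73% and 91% (values reported for a CAGE score of 2 or more in Table 4-4) and compares an illustrative prevalence of 36% with one-third of that figure (12%); the two prevalence values are assumptions chosen only to mirror the "one-third" relationship described in the text.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENSITIVITY, SPECIFICITY = 0.73, 0.91

# Illustrative prevalences: a higher value and one-third of it.
for label, prevalence in (("higher prevalence", 0.36), ("one-third prevalence", 0.12)):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"{label}: PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

With identical test accuracy, the lower-prevalence group has a lower positive predictive value (about 53% vs 82%) and a higher negative predictive value (about 96% vs 86%), which is the pattern described above for women compared with men.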

The AUDIT is a newly developed tool, and only 1 validation study was identified. When a positive test result is considered to be a score of 8 or more points, the sensitivity of the AUDIT in detecting hazardous or harmful use is 92% and the specificity is 94%.32 However, as noted above, there are methodologic reasons to believe that these estimates are inflated and may not be reliably testable. The 10 AUDIT questions were culled from a 150-item assessment of alcohol use. The AUDIT has not been

Table 4-3 Alcohol Use Disorders Identification Test (AUDIT) Questions25,32
1. How Often Do You Have a Drink Containing Alcohol?
   Never; Monthly or less; 2 to 4 times a month; 2 or 3 times a week; 4 or more times a week
2. How Many Drinks Containing Alcohol Do You Have on a Typical Day When You Are Drinking?
   1 or 2; 3 or 4; 5 or 6; 7 to 9; 10 or more
3. How Often Do You Have 6 or More Drinks on 1 Occasion?
   Never; Less than monthly; Monthly; Weekly; Daily or almost daily
4. How Often During the Last Year Have You Found That You Were Not Able to Stop Drinking Once You Had Started?
   Never; Less than monthly; Monthly; Weekly; Daily or almost daily
5. How Often During the Last Year Have You Failed to Do What Was Expected From You Because of Drinking?
   Never; Less than monthly; Monthly; Weekly; Daily or almost daily
6. How Often During the Last Year Have You Needed a First Drink in the Morning to Get Yourself Going After a Heavy Drinking Session?
   Never; Less than monthly; Monthly; Weekly; Daily or almost daily
7. How Often in the Last Year Have You Had a Feeling of Guilt or Remorse After Drinking?
   Never; Less than monthly; Monthly; Weekly; Daily or almost daily
8. How Often During the Last Year Have You Been Unable to Remember What Happened the Night Before Because You Had Been Drinking?
   Never; Less than monthly; Monthly; Weekly; Daily or almost daily
9. Have You or Someone Else Been Injured as a Result of Your Drinking?
   No; Yes, but not in the last year; Yes, during the last year
10. Has a Relative or Friend or a Doctor or Other Health Worker Been Concerned About Your Drinking or Suggested You Cut Down?
   No; Yes, but not in the last year; Yes, during the last year
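The AUDIT total is the sum of the item scores, with a maximum of 40 points and a cutoff of 8 or more. The sketch below assumes that the 5 response options for items 1 to 8 are scored 0 through 4 in the order listed, and that the 3 response options for items 9 and 10 are scored 0, 2, and 4, as in the published WHO instrument; the table above does not state the per-option points, so this mapping and the function names should be read as assumptions.

```python
FIVE_OPTION_POINTS = (0, 1, 2, 3, 4)  # items 1-8, options in the order listed
THREE_OPTION_POINTS = (0, 2, 4)       # items 9-10 (assumed from the WHO AUDIT)


def audit_score(option_indexes: list[int]) -> int:
    """Sum AUDIT points given the 0-based index of the chosen option per item."""
    if len(option_indexes) != 10:
        raise ValueError("the AUDIT has exactly 10 items")
    total = 0
    for item_number, choice in enumerate(option_indexes, start=1):
        points = FIVE_OPTION_POINTS if item_number <= 8 else THREE_OPTION_POINTS
        if not 0 <= choice < len(points):
            raise ValueError(f"invalid option index for item {item_number}")
        total += points[choice]
    return total


def audit_positive(option_indexes: list[int], cutoff: int = 8) -> bool:
    """Scores at or above the cutoff are treated as a positive result."""
    return audit_score(option_indexes) >= cutoff


responses = [2, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(audit_score(responses), audit_positive(responses))
```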

Table 4-4 Diagnostic Standards and Diagnostic Accuracy for the CAGE Questionnaire
Source, y | Patient Type | No. of Patients | Diagnostic Standard | Positive CAGE Result | Sensitivity, % | Specificity, % | Prevalence of Alcohol Disease, %
Bernadt et al,19 1982 | Psychiatric inpatients | 385 | Ethyl alcohol intake interview plus ≥ 16 drinks/d or medical record review diagnosis of alcoholism | ≥2 | 97 | 76 | 17
Buchsbaum et al,6 1991 | Ambulatory medical patients | 821 | DSM-III-R with SCID | ≥2 | 73 | 91 | 36
 | | | | ≥1 | 89 | 81 |
Bush et al,7 1987 | Medical and orthopedic inpatients | 521 | DSM-III with MAST, NIAAA intake questionnaire | ≥2 | 75 | 96 | 2
 | | | | ≥1 | 85 | 89 |
King,20 1986 | Ambulatory general patients | 407 | Ethyl alcohol intake interview plus ≥ 8 drinks/d | ≥2 | 82 | 95 | 4
 | | | | ≥1 | 0 | 84 |
Mayfield et al,28 1974 | Veterans Affairs hospital inpatients | 366 | Multidisciplinary team diagnosis | ≥2 | 81 | 89 | 39
 | | | | ≥1 | 90 | 72 |
Sokol et al,18 1989 | Prenatal clinic | 971 | Periconceptual ethyl alcohol intake interview plus ≥ 2 drinks/d | ≥2 | 38 | 92 | 4
 | | | | ≥1 | 59 | 82 |
Waterson and Murray-Lyon,21 1989 | Prenatal clinic | 893 | Periconceptual ethyl alcohol intake interview plus ≥ 2 (top row) vs ≥ 1 (bottom row) drink/d | ≥2 | 33 | 95 | 2
 | | | | ≥2 | 20 | 96 | 20
Abbreviations: CAGE, cut down, annoyed, guilty, eye opener; DSM-III, Diagnostic and Statistical Manual of Mental Disorders, Third Edition; DSM-III-R, Diagnostic and Statistical Manual of Mental Disorders, Revised Third Edition; MAST, Michigan Alcoholism Screening Test; NIAAA, National Institute on Alcohol Abuse and Alcoholism; SCID, Structured Clinical Interview for DSM-III-R.


CHAPTER 4

The Rational Clinical Examination

tested as a discrete group of questions against an accepted diagnostic standard. As with the other questionnaires for alcohol disorders, items in the AUDIT are represented in the commonly used reference standards, DSM-III-R and ICD-10. This likely inflates the estimates of reliability coefficients and test accuracy. The AUDIT attempts to identify drinkers whose consumption places them at risk of harmful or dependent alcohol use before dependence has occurred. Three AUDIT questions relate to amounts and frequency of consumption. There is no reliable way to test the accuracy of patient responses concerning consumption. If heavy drinkers are defensive about their drinking and tend to underreport consumption, the AUDIT estimate of hazardous drinking may be conservative.

PREDICTIVE ACCURACY OF THE CAGE QUESTIONNAIRE

There are 2 ways for clinicians to calculate predictive value or posterior probability of disease.40,41 The first approach uses test sensitivity, specificity, and estimates of disease prevalence in Bayes' theorem. The second approach multiplies the LR by the pretest odds of disease to obtain the posttest odds of disease. The 2 methods are equivalent when the diagnostic test

Table 4-5 Likelihood Ratios of CAGE Questions for the Diagnosis of Alcohol Abuse and Alcohol Dependence

 | Buchsbaum et al,6 1991 | Bush et al,7 1987 | Mayfield et al,28 1974
Prevalence of alcohol disease | 0.36 | 0.20 | 0.39
LR by CAGE score
0 | 0.14 | 0.18 | 0.13
1 | 1.5 | 1.4 | 0.90
2 | 4.5 | 6.8 | 1.6
3 | 13 | 158 | 15
4 | 101 | ∞ | ∞

Abbreviations: CAGE, cut down, annoyed, guilty, eye opener; LR, likelihood ratio.

Table 4-6 Posterior Probability of Alcohol Abuse or Alcohol Dependence Calculated With Likelihood Ratiosa

CAGE Score | LR | Posterior Probability, Prevalence of Alcohol Disease of 10% | Posterior Probability, Prevalence of Alcohol Disease of 36%
0 | 0.14 | .02 | .07
1 | 1.5 | .14 | .46
2 | 4.5 | .33 | .72
3 | 13 | .59 | .88
4 | 101 | .92 | .98

Abbreviations: CAGE, cut down, annoyed, guilty, eye opener; LR, likelihood ratio.
aLRs are based on data from Buchsbaum et al.6


used gives dichotomous results. However, if the test results are not dichotomous, and most are not, these 2 methods may give surprisingly different results. The insistence that a given cut point be assigned to continuous data or multiple categorical levels can result in a loss of diagnostic power and even erroneous diagnostic conclusions. In the introductory article to this series, Sackett42 introduced the concept of LRs for diagnostic tests with multiple levels of response. If you are not familiar with LRs, I encourage you to review that article. If one wishes to avoid some of the pitfalls that may occur when interpreting the results of questionnaires, it is important to be able to interpret the results with LRs. Table 4-5 lists 3 studies of the CAGE questionnaire in which LRs can be calculated.6,7,28 These studies have low LRs for CAGE scores of 0 (0.13-0.18), high LRs for CAGE scores of 3 (13-158), and very high LRs for CAGE scores of 4 (101 to infinity). Table 4-6 shows the posterior probability of alcohol abuse or dependence for each CAGE score according to the Buchsbaum et al6 data and prevalences of 10% and 36%. Alcohol abuse or dependence is unlikely in persons with a score of 0. With a score of 3, the diagnosis is likely, and a score of 4 is virtually diagnostic of alcohol abuse or dependence in the higher-prevalence group. However, more caution needs to be exercised when interpreting CAGE scores of 1 or 2. The likelihood of alcohol abuse or dependence is increased in persons with scores of 2, but one might want to administer other confirmatory tests before the patient is given a diagnosis. A score of 1 has an LR of 1.5, and the posttest probability of disease is only marginally higher than the pretest probability of disease.
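The two calculations described above can be sketched in a few lines of Python. This is only an illustration and not part of the original article: the function names are hypothetical, and the inputs are the Buchsbaum et al6 figures reported in Tables 4-4 through 4-6.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """LR approach: pretest odds x LR = posttest odds, then back to a probability."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


def bayes_positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' theorem approach for a positive result on a dichotomous test."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# LRs by CAGE score from Buchsbaum et al (Table 4-5).
CAGE_LR = {0: 0.14, 1: 1.5, 2: 4.5, 3: 13, 4: 101}

if __name__ == "__main__":
    for prevalence in (0.10, 0.36):
        for score, lr in CAGE_LR.items():
            probability = posttest_probability(prevalence, lr)
            print(f"prevalence {prevalence:.2f}, CAGE score {score}: {probability:.2f}")
```

Rounded to 2 decimal places, the loop reproduces the posterior probabilities in Table 4-6. For a single dichotomous cut point (for example, CAGE ≥ 2 with sensitivity 73%, specificity 91%, and prevalence 36%), the two functions agree when the LR is computed as sensitivity/(1 - specificity).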

PROBLEMS IN THE IDENTIFICATION OF AT-RISK DRINKING IN PREGNANT WOMEN

Pregnant women who drink 2 or more drinks per day may expose the fetus to an increased risk of developmental delay, growth retardation, cardiac defects, and craniofacial abnormalities.18,24 Women drinking enough to expose the fetus to a teratogenic risk may underreport their consumption. This is most pronounced among those women with high MAST scores who are drinking heavily.43,44 It has also been shown that the BMAST and CAGE questionnaires are insensitive instruments for identifying pregnant drinkers who consume 2 or more drinks per day.18,21

Sokol et al18 modified the CAGE questionnaire by replacing the question on “guilt” with one on alcohol tolerance: “How many drinks does it take to make you high?” The patient was considered tolerant if it took more than 2 drinks to make her feel high. The authors claim that this question is not likely to generate defensiveness and denial. This modified questionnaire, T-ACE (tolerance, annoyed, cut down, eye opener), was administered to 1065 women attending an inner-city obstetric clinic. The prevalence of at-risk drinking in this study was judged to be 4.3%. The T-ACE questionnaire was found to be more sensitive than the CAGE questionnaire (76% vs 59%) and equivalent to the MAST in identifying pregnant women drinking more than 2 drinks per day when the cut point for a positive test result was a score of 1 or higher. Unfortunately, 40% of the women judged to be at-risk drinkers scored 0 on the CAGE questionnaire. Although the T-ACE questionnaire was more sensitive, 25% of at-risk drinkers still had a score of 0. In this setting, the specificities of the T-ACE, CAGE, and MAST questionnaires were similar (76%-82%) and the positive predictive values were 13% to 14%.

Given the low sensitivity of these tests, a substantial proportion of pregnant drinkers will go undetected. The low prevalence of at-risk drinking in this population and the moderate specificity of these tests result in low positive predictive values. Consequently, these questionnaires cannot be expected to reliably identify problem pregnant drinkers.
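The effect of low prevalence on predictive value can be checked with Bayes' theorem. The sketch below is illustrative only: the function name is hypothetical, and it assumes the CAGE ≥ 1 figures reported for the Sokol et al18 study in Table 4-4 (sensitivity 59%, specificity 82%) together with the 4.3% prevalence quoted above.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive result (Bayes' theorem)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Same test characteristics, two different prevalences.
    for prevalence in (0.043, 0.36):
        ppv = positive_predictive_value(0.59, 0.82, prevalence)
        print(f"prevalence {prevalence:.3f}: positive predictive value {ppv:.2f}")
```

At a prevalence of 4.3% the result rounds to 0.13, in line with the 13% to 14% quoted in the text; the same sensitivity and specificity applied to the 36% prevalence of the Buchsbaum et al6 clinic population would give a much higher predictive value.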

THE BOTTOM LINE

In summary, the CAGE questionnaire can be a useful tool in the diagnosis of DSM-III-R–defined alcohol abuse and dependence and very heavy drinking (>8 drinks/d). A CAGE score of 0 has a good negative predictive value at a lower prevalence of disease. Scores of 3 or 4 strongly support the diagnosis of alcohol abuse. However, scores of 1 or 2 must be interpreted with caution, and one should use the LR approach to accurately interpret these intermediate scores. The CAGE questionnaire has not been tested as a tool for identifying persons who may be engaged in hazardous drinking of lesser amounts of alcohol; for example, 4 drinks per day. It is likely that the test will be insensitive in detecting these individuals.

The AUDIT was recently developed to identify these hazardous drinkers. It has not been thoroughly tested, but the initial report suggests that it is reasonably accurate. Because 7 of the AUDIT questions are almost identical to questions in the MAST or CAGE, it should be good at identifying alcohol abuse and alcohol dependence. The other 3 AUDIT questions relate to consumption and constitute an attempt to identify hazardous drinkers. It may not be possible to determine the accuracy of these questions in the absence of a reliable, socially acceptable diagnostic standard for consumption. However, if heavy drinkers are defensive about their levels of consumption, the AUDIT may underestimate levels of consumption.

The CAGE questionnaire is short and can be easily memorized. It has been field tested and shown to be a useful tool. The busy clinician could use the CAGE questionnaire to find unrecognized patients who are abusing or dependent on alcohol. The first 3 questions of the AUDIT are also easily memorized and can provide an estimate of the patient’s typical alcohol consumption. The busy clinician could use these questions as a form of targeted preventive medicine. Men drinking more than 4 drinks per day and women drinking more than 2 drinks per day should be counseled about the risks of drinking.

Identifying pregnant women engaged in at-risk drinking is problematic. The prevalence of at-risk drinking among pregnant women is low, and the screening questionnaires to identify problem drinkers have relatively low sensitivities. Because none of these instruments is sufficiently reliable to use for case finding in pregnant women, all pregnant women should be counseled about the risks of drinking while pregnant. Abstinence from alcohol would be the safest option, but women who choose to drink while pregnant should be strongly advised to avoid binge drinking and to drink fewer than 2 drinks per day.

Author Affiliation at the Time of the Original Publication
From the Department of Internal Medicine, University of Toronto, and the Division of General Internal Medicine, St Michael’s Hospital, Toronto, Ontario, Canada.

Acknowledgments
The author thanks James Rankin, MD, for his critical review of the manuscript.

REFERENCES
1. Eckardt MJ, Harford TC, Kaelber CT, et al. Health hazards associated with alcohol consumption. JAMA. 1981;246(6):648-666.
2. Klatsky AL, Armstrong MA, Friedman GD. Alcohol and mortality. Ann Intern Med. 1992;117(8):646-654.
3. West LJ, Cohen S. Provisions for dependency disorders. In: Holland WW, Detels R, Knox G, eds. Oxford Textbook of Public Health. New York, NY: Oxford University Press; 1985:vol 2; chap 9:106-114.
4. Rankin JG, Ashley MJ. Alcohol-related health problems. In: Last JM, Wallace RB, eds. Maxcy-Rosenau-Last Public Health and Preventive Medicine. 13th ed. East Norwalk, CT: Appleton & Lange; 1992:chap 43:741-767.
5. Schuckit MA. Genetics and the risk for alcoholism. JAMA. 1985;254(18):2614-2617.
6. Buchsbaum DG, Buchanan RG, Centor RM, Schnoll SH, Lawton MJ. Screening for alcohol abuse using CAGE scores and likelihood ratios. Ann Intern Med. 1991;115(10):774-777.
7. Bush B, Shaw S, O’Leary P, Delbanco T, Aronson MD. Screening for alcohol abuse using the CAGE questionnaire. Am J Med. 1987;82(2):231-235.
8. Cleary PD, Miller M, Bush BT, Warburg MM, Delbanco TL, Aronson MD. Prevalence and recognition of alcohol abuse in a primary care population. Am J Med. 1988;85(4):466-471.
9. Buchsbaum DG, Buchanan RG, Poses RM, Schnoll SH, Lawton MJ. Physician detection of drinking problems in patients attending a general medicine practice. J Gen Intern Med. 1992;7(5):517-521.
10. Curtis JR, Geller G, Stokes EJ, Levine DM, Moore RD. Characteristics, diagnosis, and treatment of alcoholism in elderly patients. J Am Geriatr Soc. 1989;37(4):310-316.
11. Marsh CM, Vaughn DA. Do MAST and CAGE scores help detect alcoholism? Presented at: 142nd Annual Meeting of the American Psychiatric Association; May 8, 1989; San Francisco, CA.
12. Moore RD, Bone LR, Geller G, Mamon JA, Stokes EJ, Levine DM. Prevalence, detection, and treatment of alcoholism in hospitalized patients. JAMA. 1989;261(3):403-407.
13. Dawson NV, Dadheech G, Speroff T, Smith RL, Schubert DSP. The effect of patient gender on the prevalence and recognition of alcoholism on a general medical service. J Gen Intern Med. 1992;7(1):38-45.
14. World Health Organization. Mental and behavioral disorders: diagnostic criteria for research. In: International Statistical Classification of Diseases, 10th Revision (ICD-10). Geneva, Switzerland: World Health Organization; 1992.
15. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, Revised Third Edition. Washington, DC: American Psychiatric Association; 1987.
16. Rounsaville BJ, Bryant K, Babor T, Kranzler H, Kadden R. Cross system agreement for substance use disorders: DSM-III-R, DSM-IV and ICD-10. Addiction. 1993;88(3):337-348.
17. Spitzer RL, Williams JBW, Gibbon M, First MB. Structured Clinical Interview for DSM-III-R: Non-patient Edition (SCID-NP, Version 1.0). Washington, DC: American Psychiatric Press; 1990.
18. Sokol RJ, Martier SS, Ager JW. The T-ACE questions: practical prenatal detection of risk drinking. Am J Obstet Gynecol. 1989;160(4):863-868.


19. Bernadt MW, Mumford J, Taylor C, Smith B, Murray RM. Comparison of questionnaire and laboratory tests in the detection of excessive drinking and alcoholism. Lancet. 1982;1(8267):325-328.
20. King M. At risk drinking among general practice attenders: validation of the CAGE questionnaire. Psychol Med. 1986;16(1):213-217.
21. Waterson EJ, Murray-Lyon IM. Screening for alcohol related problems in the antenatal clinic: an assessment of different methods. Alcohol Alcohol. 1989;24(1):21-30.
22. Skinner HA, Schuller R, Roy J, Israel Y. Identification of alcohol abuse using laboratory tests and a history of trauma. Ann Intern Med. 1984;101(6):847-851.
23. Pequignot G, Tuyns AJ, Berta JL. Ascitic cirrhosis in relation to alcohol consumption. Int J Epidemiol. 1978;7(2):113-120.
24. Ernhart CB, Sokol RJ, Martier S, et al. Alcohol teratogenicity in the human: a detailed assessment of specificity, critical period, and threshold. Am J Obstet Gynecol. 1987;156(1):33-39.
25. Saunders JB, Aasland OG, Amundsen A, Grant M. Alcohol consumption and related problems among primary health care patients: WHO collaborative project on early detection of persons with harmful alcohol consumption, I. Addiction. 1993;88(3):349-362.
26. Bien TH, Miller WR, Tonigan JS. Brief interventions for alcohol problems: a review. Addiction. 1993;88(3):315-335.
27. Ewing JA. Detecting alcoholism: the CAGE questionnaire. JAMA. 1984;252(14):1905-1907.
28. Mayfield D, McLeod G, Hall P. The CAGE questionnaire: validation of a new alcohol screening instrument. Am J Psychiatry. 1974;131:1121-1123.
29. Selzer ML. The Michigan Alcoholism Screening Test: the quest for a new diagnostic instrument. Am J Psychiatry. 1971;127(12):1653-1658.
30. Pokorny AD, Miller BA, Kaplan HB. The brief MAST: a shortened version of the Michigan Alcoholism Screening Test. Am J Psychiatry. 1972;129(3):342-345.
31. Selzer ML, Vinokur A, van Rooijen MA. A self-administered Short Michigan Alcoholism Screening Test (SMAST). J Stud Alcohol. 1975;36(1):117-126.
32. Saunders JB, Aasland OG, Babor TF, De La Fuente JR, Grant M. Development of the Alcohol Use Disorders Identification Test (AUDIT): WHO collaborative project on early detection of persons with harmful alcohol consumption, II. Addiction. 1993;88(6):791-804.
33. Beresford T, Adduci R, Low D, Goggans F, Hall RCW. A computerized biochemical profile for detection of alcoholism. Psychosomatics. 1982;23:713-720.
34. Clark PMS, Kricka LJ. Biochemical tests for alcohol abuse. Br J Alcohol Alcohol. 1986;16:11-26.
35. Chick J, Kreitman N, Plant M. Mean cell volume and gamma-glutamyltranspeptidase as markers of drinking in working men. Lancet. 1981;1(8232):1249-1251.
36. Gibbs E. Validity and reliability of the Michigan Alcoholism Screening Test. Drug Alcohol Depend. 1983;12:279-285.
37. Skinner HA, Sheu WJ. Reliability of alcohol use indices. J Stud Alcohol. 1982;43(11):1157-1170.
38. Ross HE, Gavin DR, Skinner HA. Diagnostic validity of the MAST and the Alcohol Dependence Scale in the assessment of DSM-III alcohol disorders. J Stud Alcohol. 1990;51(6):506-513.
39. Buchsbaum DG, Buchanan RG, Welsh J, Centor RM, Schnoll SH. Screening for drinking disorders in the elderly using the CAGE questionnaire. J Am Geriatr Soc. 1992;40(7):662-665.
40. Sackett DL, Haynes RB, Guyatt G, Tugwell P. Clinical Epidemiology: A Basic Science for Clinical Medicine. 2nd ed. Boston, MA: Little Brown & Co Inc; 1991.
41. Sox HC, Blatt MA, Higgins MC, Marton KI. Medical Decision Making. Stoneham, MA: Butterworths-Heinemann; 1988.
42. Sackett DL. A primer on the precision and accuracy of the clinical examination. JAMA. 1992;267(19):2638-2644.
43. Ernhart CB, Morrow-Tlucak M, Sokol RJ, Martier S. Underreporting of alcohol use in pregnancy. Alcoholism. 1988;12(4):506-511.
44. Morrow-Tlucak M, Ernhart CB, Sokol RJ, Martier S, Ager J. Underreporting of alcohol use in pregnancy: relationship to alcohol problem history. Alcoholism. 1989;13(3):399-401.

UPDATE: Problem Alcohol Drinking

Prepared by David L. Simel, MD, MHS
Reviewed by Will Yancy, MD

CLINICAL SCENARIO

A 35-year-old woman requests an appointment for a gynecologic examination. Your nursing staff gives her the usual paperwork and self-administered questionnaires while she waits in the examination room. She fills them all out and gives a response of no to each question. What questions did your patient answer? Do you know how to evaluate her questionnaire? Could she be a problem drinker?

UPDATED SUMMARY ON SCREENING FOR ALCOHOL PROBLEMS

Original Review
Kitchens JM. Does this patient have an alcohol problem? JAMA. 1994;272(22):1782-1787.

UPDATED LITERATURE SEARCH

The perceived shortcomings of questionnaires for alcohol use disorders, coupled with the high prevalence of problems, prompted a worldwide effort to improve detection of alcohol use disorders. The US Preventive Services Task Force updated their recommendations (2004) according to new evidence concerning the effectiveness of screening and brief treatment interventions. Our literature search, conducted between 1993 and July 25, 2004, combined the search terms “alcoholism/di” and “alcohol drinking/cl, pc, ep” and the textwords “problem drinking” with “screening.” The search was limited to “systematic reviews,” and we used the Ovid MEDLINE database, along with the evidence-based medicine databases, to yield 19 English-language articles. We retained articles that were systematic reviews (as opposed to nonsystematic reviews) and that focused on primary care (rather than, eg, population-based samples or emergency or psychiatric care). This resulted in 4 articles that we obtained for review. We kept 1 article that had emergency department data to better assess the issues of screening women as opposed to men. We concentrated on the shorter-form questionnaires that would be more applicable for primary care (see Appendix Tables 4-12, 4-14, 4-15, and 4-16 for the forms AUDIT, CAGE, T-ACE, and TWEAK, respectively).

We also retrieved a recent systematic review that was published by the Agency for Health Care Policy and Research as part of an update to the Guide to Clinical Preventive Services, Third Edition, Periodic Updates (see http://www.ahrq.gov/clinic/uspstf/uspsdrin.htm [accessed May 17, 2008] for the article that first appeared in Whitlock EP, Polen MR, Green CA, Orleans CT, Klein J. Behavioral counseling interventions in primary care to reduce risky/harmful alcohol use by adults: a summary of the evidence for the US Preventive Services Task Force. Ann Intern Med. 2004;140(7):558-569). When necessary, we retrieved references from the systematic reviews to verify likelihood ratios (LRs) for reported instruments. After reviewing the retrieved studies and their reference lists, we repeated a literature search using the textwords CAGE, AUDIT, TWEAK, and T-ACE to make sure that we missed no original primary care studies that would have met inclusion criteria.

NEW FINDINGS

It is now abundantly clear that choosing to screen for problem drinking by using any standard approach is overwhelmingly more important than deciding on the screening form! However, once clinicians commit to screening for alcohol problems, there are advantages and disadvantages to the current questionnaires that require understanding (1) what disorder you are screening for and (2) your patient population. Problem drinking is drinking behavior that has not reached the level of abuse or dependence. Studies use various descriptors for problem drinking, including the terms hazardous, at risk, or harmful drinking.

The past decade has seen the continued validation of the AUDIT questionnaire, the recognition that screening for alcohol abuse differs from screening for hazardous or problem drinking, and the need for different approaches to screening according to the patient population. Screening women and, possibly, older patients requires different approaches than screening adult men.


DETAILS OF THE UPDATE

IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION

The results have not changed, but newer information allows revised estimates of the sensitivity, specificity, and LRs of screening tests for alcohol problems (Table 4-7).

CHANGES IN THE REFERENCE STANDARD

The reference standard for alcohol abuse and dependence remains the guidelines in the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition.1 It is now important for clinicians to understand what constitutes a “drink” and the newer categories of patients’ drinking problems that have not reached the level of abuse or dependence. The definition of a drink changes across cultures, restaurants, and homes. A standard drink in Great Britain contains about 8 g of alcohol, as opposed to the standard of 19.75 g in Japan.2 The US Department of Health and Human Services and the US Department of Agriculture define a standard drink in alcohol and volume content that approximates 12 fl oz of regular beer, 5 fl oz of wine, or 1.5 fl oz of 80-proof distilled spirits.2(p7)

The National Institute on Alcohol Abuse and Alcoholism defines moderate drinking according to the frequency of drinking. Moderate male drinkers ingest 14 or fewer drinks/wk; moderate women drinkers, 7 or fewer drinks/wk; and adults older than 65 years, 7 or fewer drinks/wk.3 Men younger than 65 years would be considered “at risk” drinkers when they drink more than 14 drinks/wk or more than 4 drinks per occasion. Women have drinking problems at lower thresholds: more than 7 drinks/wk or more than 3 drinks per occasion defines “at risk” drinking among women. The World Health Organization uses slightly different descriptors that rely on the consequences of drinking rather than the amount and frequency: “hazardous” drinkers are those who are at risk

Table 4-7 Alcohol Problem Screening Results by Test and Population Profile

Screening Testa (n = Number of Studies) | Sensitivity (95% CI) | Specificity (95% CI) | LR+ (95% CI) | LR– (95% CI)

At Risk, Harmful, or Hazardous Drinking
Adults
AUDIT-C ≥ 8 (n = 1) | 0.40 | 0.97 | 12 (5.0-30) | 0.62 (0.52-0.74)
AUDIT ≥ 8 (n = 2) | 0.57-0.59 | 0.91-0.96 | 6.8 (4.7-10) | 0.46 (0.38-0.55)
CAGE ≥ 2 (n = 1; all patients > 60 y) | 0.14 | 0.97 | 4.7 (3.7-6.0) | 0.89 (0.86-0.91)
CAGE ≥ 2 (n = 2) | 0.49-0.69 | 0.75-0.95 | 3.4 (1.2-10) | 0.66 (0.54-0.81)
Pregnant Womenb
TWEAK ≥ 3 (n = 2) | 0.67 (0.61-0.73) | 0.92 (0.91-0.93) | 8.4 | 0.36
TWEAK ≥ 2 (n = 2) | 0.91 (0.87-0.94) | 0.77 (0.76-0.78) | 4.0 | 0.12
T-ACE ≥ 1 (n = 3) | 0.89 (0.81-0.94) | 0.75 (0.70-0.79) | 3.6 | 0.15
CAGE ≥ 2 (n = 3) | 0.48 (0.44-0.53) | 0.93 (0.92-0.93) | 6.9 | 0.56
CAGE ≥ 1 (n = 3) | 0.66 (0.62-0.70) | 0.81 (0.81-0.82) | 3.5 | 0.42

Alcohol Abuse or Dependence
Adults
CAGE ≥ 2 (n = 10) |  |  | 6.9 (4.2-11) | 0.33 (0.25-0.43)
CAGE ≥ 1 (n = 10) |  |  | 3.4 (2.3-5.1) | 0.18 (0.11-0.29)
AUDIT ≥ 8 (n = 2) | 0.66-0.71 | 0.85-0.86 | 4.6 (3.5-6.1) | 0.37 (0.28-0.49)
Women
CAGE ≥ 2 (n = 2) | 0.58 (0.32-0.80) | 0.93 (0.90-0.95) | 8.3 | 0.45
CAGE ≥ 1 (n = 1) | 0.89 (0.82-0.93) | 0.83 (0.79-0.86) | 5.2 | 0.13
≥ 60 y
CAGE ≥ 2 (n = 3) | 0.13-0.82 | 0.82-0.99 | 5.2 (3.0-9.0) | 0.37 (0.29-0.47)
CAGE ≥ 1 (n = 2) | 0.79-0.98 | 0.56-0.88 | 2.6 (1.5-4.5) | 0.24 (0.15-0.40)
AUDIT ≥ 8 (n = 1) | 0.33 | 0.91 | 3.6 (1.6-8.0) | 0.75 (0.58-0.90)

Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; AUDIT-C, AUDIT Consumption Questions; CAGE, cut down, annoyed, guilty, eye opener; CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio; T-ACE, tolerance, annoyed, cut down, eye opener; TWEAK, tolerance, worry, eye opener, amnesia, cut (kut) down.
aThe screening questionnaire should be assessed based on the patient population, the threshold that describes positivity, and whether you are screening for “at risk” drinking or dependence.
bLikelihood ratio estimated from summary sensitivity and specificity measures.


of the adverse consequences of alcohol, whereas “harmful drinking” causes physical or psychological harm that does not yet meet the criteria for abuse.3(p1979) About 4.6% of US adults abuse alcohol, with the rate in men (6.9%) about 3 times that in women (2.6%).4 An additional 3.8% display alcohol dependence (5.4% of men vs 2.3% of women).
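The National Institute on Alcohol Abuse and Alcoholism thresholds for "at risk" drinking described above can be written as a simple rule. The following Python sketch is illustrative only. The function name and argument names are hypothetical, and two points are assumptions rather than statements from the text: a man aged exactly 65 years is treated as an older adult, and the per-occasion limit for men aged 65 years or older (not given in the text) is set to the women's limit of 3 drinks.

```python
def is_at_risk_drinker(sex, age, drinks_per_week, max_drinks_per_occasion):
    """Return True when reported drinking exceeds the thresholds quoted in the text.

    sex: "male" or "female"; age in years.
    """
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    if sex == "male" and age < 65:
        # Men younger than 65 years: more than 14 drinks/wk or more than 4 per occasion.
        weekly_limit, occasion_limit = 14, 4
    else:
        # Women: more than 7 drinks/wk or more than 3 per occasion (from the text).
        # Men 65 years or older: 7 drinks/wk is from the text; the per-occasion
        # limit of 3 is an assumption made for this sketch.
        weekly_limit, occasion_limit = 7, 3
    return drinks_per_week > weekly_limit or max_drinks_per_occasion > occasion_limit
```

For example, a 40-year-old man reporting 14 drinks per week and never more than 4 on one occasion falls within the moderate range, whereas a woman reporting 5 drinks per week but 4 drinks on one occasion would be flagged for further assessment.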

RESULTS OF LITERATURE REVIEW

EVIDENCE FROM GUIDELINES

Canadian Task Force on Preventive Health Care
The Canadian Task Force has not updated its recommendations since 1994,5 when the CAGE and the MAST had the best available data. Screening was recommended, although the limitations of these instruments in detecting hazardous drinking were recognized.

Web Resources for Alcohol Screening
A patient-administered screen: http://www.alcoholscreening.org/ (accessed May 17, 2008).
For clinicians: http://pubs.niaaa.nih.gov/publications/Practitioner/pocketguide/pocket_guide.htm (accessed May 17, 2008).

CLINICAL SCENARIO: RESOLUTION

Fortunately, your clinical practice is routinely screening for alcohol problems. However, it is important to know exactly how your patients are being screened. If your clinic is using the CAGE questionnaire, you may detect most patients with alcohol dependence, but you will likely fail to recognize patients who are problem drinkers. This is especially true for women because the sensitivity of all questionnaires is lower in women than in men. In addition to knowing which questionnaire your clinic nurses are using, you need to know how to score the results. Accepting a lower score as “positive” will improve the sensitivity so that you will not miss as many patients with alcohol problems. Because the prevalence of alcohol problems is so high, it is important not to miss these patients. Assuming your patient drinks some alcohol, the LR for alcohol abuse or dependence is 0.18 for adults who answer no to every CAGE question (ie, when at least 1 positive answer is the threshold for a positive result). The sensitivity is better for the AUDIT, but primary care clinics might not use the AUDIT because it contains more questions. If you want to detect potentially harmful or hazardous drinking, it would be good to ask the “Tolerance” question from the TWEAK (eg, “How many drinks does it take before you begin to feel the first effects of the alcohol?”). If the patient answers “at least 3,” then you need to assess more fully for problem drinking.

From a practice management standpoint, you and your clinic nurses should review your patient population (Table 4-8). If your clinic patients are mostly women, the best current screening forms are the TWEAK or the T-ACE. No data support the existence of 1 ideal questionnaire applicable to all patients, although making no choice of a screening instrument guarantees missed opportunities for intervention. If you are using the CAGE questions, you may choose to switch to the AUDIT (which will detect problem drinking, abuse, and dependence). If the AUDIT is too long for your patients, then you could select the CAGE, TWEAK, or T-ACE and use a low threshold for pursuing follow-up questions. Two alternate approaches combine the best features of the AUDIT (which detects hazardous drinking but is long) with the CAGE (which detects abuse and dependence and is short but does not detect problem drinking). The resulting AUDIT-C is a shorter questionnaire than the AUDIT (see Appendix Table 4-13) and, in one study, appears to have the same measurement characteristics as the full AUDIT.

Table 4-8 US Preventive Services Task Force Recommendations for Tests in Different Populations

Population | AUDIT | CAGE | TWEAK or T-ACE
Risky or Harmful Drinking
Adults | Yes | No | No
≥ 65 y | Uncertain | No | No
Pregnant women | No | No | Yes
Alcohol Abuse or Dependence
Adults | Yes | Yes | Yes
≥ 65 y | Uncertain | Uncertain | Uncertain
Pregnant women | No | No | Yes

Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; CAGE, cut down, annoyed, guilty, eye opener; T-ACE, tolerance, annoyed, cut down, eye opener; TWEAK, tolerance, worry, eye opener, amnesia, cut (kut) down.


SCREENING FOR ALCOHOL PROBLEMS—MAKE THE DIAGNOSIS

PRIOR PROBABILITY

Data from the National Institute on Alcohol Abuse and Alcoholism suggest that 3 of 10 adults engage in risky drinking behaviors. In primary care clinics, the prevalence will be around 11% to 18%.

POPULATIONS FOR WHOM PROBLEM DRINKING SHOULD BE ASSESSED
• All adults (see Tables 4-9 and 4-10)
• Targeted populations/conditions requiring assessment include pregnant women (see Table 4-11), adolescents, and emergency patients

Table 4-9 Detecting the Likelihood of At-risk, Harmful, or Hazardous Drinking in Adults

Test Result | LR Range
AUDIT or AUDIT-C ≥ 8 | 6.8-12
AUDIT or AUDIT-C < 8 | 0.46-0.62

Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; AUDIT-C, AUDIT Consumption Questions; LR, likelihood ratio.
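As a rough illustration of how these LRs shift the prior probability, the sketch below applies the Table 4-9 LR ranges to the 11% to 18% primary care prevalence quoted above. The function and constant names are hypothetical, and pairing the lowest prior with the lowest LR (and the highest with the highest) is simply one way to bracket the result.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Pretest odds x LR = posttest odds, converted back to a probability."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


PRIOR_RANGE = (0.11, 0.18)         # primary care prevalence of risky drinking
LR_POSITIVE_RANGE = (6.8, 12)      # AUDIT or AUDIT-C score of 8 or more
LR_NEGATIVE_RANGE = (0.46, 0.62)   # AUDIT or AUDIT-C score below 8


def bracket(prior_range, lr_range):
    """Lowest prior with lowest LR, highest prior with highest LR."""
    low = posttest_probability(prior_range[0], lr_range[0])
    high = posttest_probability(prior_range[1], lr_range[1])
    return low, high


if __name__ == "__main__":
    print("positive screen:", bracket(PRIOR_RANGE, LR_POSITIVE_RANGE))
    print("negative screen:", bracket(PRIOR_RANGE, LR_NEGATIVE_RANGE))
```

Under these assumptions, a positive AUDIT result raises the probability of at-risk drinking to roughly 46% to 72%, whereas a negative result lowers it to roughly 5% to 12%.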

Table 4-10 Detecting the Likelihood of Alcohol Abuse or Dependence in Adultsa

Test Result | LR (95% CI)
CAGE ≥ 1 | 3.4 (2.3-5.1)
CAGE = 0 | 0.18 (0.11-0.29)

Abbreviations: CAGE, cut down, annoyed, guilty, eye opener; CI, confidence interval; LR, likelihood ratio.
aWomen have a lower sensitivity than men do but have a higher specificity. A cut point of ≥ 1 optimizes the sensitivity and, therefore, the negative LR.

Table 4-11 Detecting the Likelihood of 2 or More Drinks/Day During Pregnancya

Test Result | LR Range
TWEAK ≥ 2 or T-ACE ≥ 1 | 3.6-4.0
TWEAK ≤ 1 or T-ACE = 0 | 0.12-0.15

Abbreviations: LR, likelihood ratio; T-ACE, tolerance, annoyed, cut down, eye opener; TWEAK, tolerance, worry, eye opener, amnesia, cut (kut) down.
aLRs are estimated from studies that have incorporation bias where the interviewer knew the results of the screening questionnaires.
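The pregnancy LRs in Tables 4-7 and 4-11 are described as estimated from summary sensitivity and specificity. A minimal Python sketch of that calculation follows; the function and dictionary names are hypothetical, and the inputs are the point estimates listed for pregnant women in Table 4-7.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test result."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Summary (sensitivity, specificity) for pregnant women, from Table 4-7.
PREGNANCY_SCREENS = {
    "TWEAK >= 3": (0.67, 0.92),
    "TWEAK >= 2": (0.91, 0.77),
    "T-ACE >= 1": (0.89, 0.75),
    "CAGE >= 2": (0.48, 0.93),
    "CAGE >= 1": (0.66, 0.81),
}

if __name__ == "__main__":
    for name, (sensitivity, specificity) in PREGNANCY_SCREENS.items():
        lr_positive, lr_negative = likelihood_ratios(sensitivity, specificity)
        print(f"{name}: LR+ {lr_positive:.1f}, LR- {lr_negative:.2f}")
```

With these inputs, the rounded results match the LR columns for pregnant women in Table 4-7, including the 3.6 to 4.0 and 0.12 to 0.15 ranges for TWEAK ≥ 2 and T-ACE ≥ 1 shown in Table 4-11.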

REFERENCE STANDARD TESTS

Diagnostic interview schedule for Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition,2 interview performed by an experienced provider in an alcohol-related interview.

REFERENCES FOR THE UPDATE
1. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (DSM-IV). 4th ed. Washington, DC: American Psychiatric Association; 2000.
2. Dufour MC. What is moderate drinking? Alcohol Res Health. 1999;23(1):5-14.
3. Fiellin DA, Reid MC, O'Connor PG. Screening for alcohol problems in primary care. Arch Intern Med. 2000;160(13):1977-1989.a
4. Grant BF, Dawson DA, Stinson FS, Chou SP, Dufour MC, Pickering RP. The 12-month prevalence and trends in DSM-IV alcohol abuse and dependence: United States, 1991-1992 and 2001-2002. Drug Alcohol Depend. 2004;74(3):223-234.
5. Haggerty JL. Early detection and counseling of problem drinking. In: Canadian Task Force on the Periodic Health Examination. Canadian Guide to Clinical Preventive Health Care. Ottawa, Ontario, Canada: Health Canada; 1994:488-498. http://www.ctfphc.org/sections/section06ch042.htm. Accessed May 17, 2008.
6. Bradley KA, Boyd-Wickizer J, Powell SH, Burman ML. Alcohol screening questionnaires in women. JAMA. 1998;280(2):166-171.a
7. Whitlock EP, Polen MR, Green CA, Orleans CT, Lein JT. Behavioral Counseling Interventions in Primary Care to Reduce Risky/Harmful Alcohol Use. Rockville, MD: Agency for Healthcare Research and Quality; 2004. Systematic Evidence Review No. 30. Electronic copies available at http://www.ahrq.gov/clinic/3rduspstf/alcohol/alcomissum.pdf (accessed May 17, 2008).

aFor the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.

APPENDIX: ALCOHOL SCREENING INSTRUMENTS6,7

Adapted from Whitlock EP, Polen MR, Green CA, Orleans CT, Lein JT. Behavioral Counseling Interventions in Primary Care to Reduce Risky/Harmful Alcohol Use. Rockville, MD: Agency for Healthcare Research and Quality; 2004. Systematic Evidence Review No. 30. Electronic copies available at http://www.ahrq.gov/clinic/3rduspstf/alcohol/alcomissum.pdf (accessed May 17, 2008).

Table 4-12 AUDIT
Circle the number that comes closest to your alcohol use in the PAST YEAR.
1. How often do you have a drink containing alcohol?
   (0) Never (1) Monthly or less (2) 2 to 4 times a month (3) 2 or 3 times a week (4) 4 or more times a week
2. How many drinks containing alcohol do you have on a typical day when you are drinking?
   (0) 1 or 2 (1) 3 or 4 (2) 5 or 6 (3) 7 to 9 (4) 10 or more
3. How often do you have 6 or more drinks on 1 occasion?
   (0) Never (1) Less than monthly (2) Monthly (3) Weekly (4) Daily or almost daily
4. How often during the last year have you found that you were not able to stop drinking once you had started?
   (0) Never (1) Less than monthly (2) Monthly (3) Weekly (4) Daily or almost daily
5. How often during the last year have you failed to do what was expected from you because of drinking?
   (0) Never (1) Less than monthly (2) Monthly (3) Weekly (4) Daily or almost daily
6. How often during the last year have you needed a first drink in the morning to get yourself going after a heavy drinking session?
   (0) Never (1) Less than monthly (2) Monthly (3) Weekly (4) Daily or almost daily
7. How often in the last year have you had a feeling of guilt or remorse after drinking?
   (0) Never (1) Less than monthly (2) Monthly (3) Weekly (4) Daily or almost daily
8. How often during the last year have you been unable to remember what happened the night before because you had been drinking?
   (0) Never (1) Less than monthly (2) Monthly (3) Weekly (4) Daily or almost daily
9. Have you or someone else been injured as a result of your drinking?
   (0) No (1) Yes, but not in the last year (2) Yes, during the last year
10. Has a relative or friend or a doctor or other health worker been concerned about your drinking or suggested you cut down?
   (0) No (1) Yes, but not in the last year (2) Yes, during the last year
Abbreviation: AUDIT, Alcohol Use Disorders Identification Test.
Scoring: A score of 8 or more is considered a positive screen for hazardous or harmful drinking.

Table 4-13 AUDIT-C
Circle the number that comes closest to your alcohol use in the PAST YEAR.
1. How often do you have a drink containing alcohol? Consider a “drink” to be 1 can or bottle of beer, 1 glass of wine, 1 wine cooler, 1 cocktail, or 1 shot of hard liquor (like scotch, gin, or vodka).
   (0) Never (1) Monthly or less (2) 2 to 4 times a month (3) 2 to 3 times a week (4) 4 or more times a week
2. How many drinks containing alcohol do you have on a typical day when you are drinking?
   (0) 1 or 2 (1) 3 or 4 (2) 5 or 6 (3) 7 to 9 (4) 10 or more
3. How often do you have 6 or more drinks on 1 occasion?
   (0) Never (1) Less than monthly (2) Monthly (3) Weekly (4) Daily or almost daily
Abbreviation: AUDIT-C, Alcohol Use Disorders Identification Test Consumption Questions.
Scoring: A score of 8 or more is considered a positive screen for hazardous or harmful drinking.
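The scoring rule in Tables 4-12 and 4-13 is a plain sum of the circled option numbers. The sketch below shows one way to tally it, assuming the option ranges exactly as printed above (items 9 and 10 run from 0 to 2). The function and constant names are illustrative and do not come from the source.

```python
from typing import Sequence

# Highest option number printed for each AUDIT item in Table 4-12:
# items 1-8 offer options (0)-(4); items 9-10 offer (0)-(2) as printed.
AUDIT_ITEM_MAX = (4, 4, 4, 4, 4, 4, 4, 4, 2, 2)
POSITIVE_SCREEN = 8  # both tables state "a score of 8 or more"


def total_score(answers: Sequence[int], item_max: Sequence[int]) -> int:
    """Sum the circled option numbers after checking each one is in range."""
    if len(answers) != len(item_max):
        raise ValueError(f"expected {len(item_max)} answers, got {len(answers)}")
    for number, (answer, highest) in enumerate(zip(answers, item_max), start=1):
        if not 0 <= answer <= highest:
            raise ValueError(f"item {number}: answer {answer} is outside 0..{highest}")
    return sum(answers)


def audit_score(answers: Sequence[int]) -> int:
    """Total for the 10-item AUDIT (Table 4-12)."""
    return total_score(answers, AUDIT_ITEM_MAX)


def audit_c_score(answers: Sequence[int]) -> int:
    """Total for the 3 consumption items of the AUDIT-C (Table 4-13)."""
    return total_score(answers, AUDIT_ITEM_MAX[:3])


def is_positive(score: int, cutoff: int = POSITIVE_SCREEN) -> bool:
    """Apply the printed threshold; the cutoff is a parameter so it can be varied."""
    return score >= cutoff


if __name__ == "__main__":
    example = [3, 2, 2, 1, 0, 0, 1, 0, 0, 1]  # hypothetical responses, not patient data
    score = audit_score(example)
    print(score, is_positive(score))
```

Keeping the cutoff as an argument makes it easy to see how a different threshold would reclassify the same set of answers.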

Table 4-14 CAGE
1. Have you ever felt you should cut down on your drinking?
2. Have people annoyed you by criticizing your drinking?
3. Have you ever felt bad or guilty about your drinking?
4. Have you ever had a drink first thing in the morning to steady your nerves or to get rid of a hangover (eye opener)?
Abbreviation: CAGE, cut down, annoyed, guilty, eye opener.
Scoring: Two or more positive responses are considered a positive screen for problem drinking in most studies. Alternatively, you may select a cut point of just 1 positive response to improve the sensitivity.

Table 4-15 T-ACE
1. How many drinks does it take to make you feel high (tolerance)?
2. Have people annoyed you by criticizing your drinking?
3. Have you ever felt you should cut down on your drinking?
4. Have you ever had a drink first thing in the morning to steady your nerves or to get rid of a hangover (eye opener)?
Abbreviation: T-ACE, tolerance, annoyed, cut down, eye opener.
Scoring: Positive response to the tolerance item (positive is considered more than 2 drinks) is scored 2 points; to other items, 1 point each. Total of 2 or more indicates risky drinking.

Table 4-16 TWEAK
1. How many drinks can you hold? (“Hold” version; ≥ 6 drinks indicates tolerance) or How many drinks does it take before you begin to feel the first effects of the alcohol? (“High” version; ≥ 3 indicates tolerance)
2. Does your spouse (or do your parents) ever worry or complain about your drinking?
3. Have you ever had a drink first thing in the morning to steady your nerves or to get rid of a hangover (eye opener)?
4. Have you ever awakened the morning after some drinking the night before and found that you could not remember a part of the evening before? (amnesia)
5. Have you ever felt you ought to cut (kut) down on your drinking?
Abbreviation: TWEAK, tolerance, worry, eye opener, amnesia, cut (kut) down.
Scoring: Positive responses to the tolerance or worry items score 2 points each; to other items, score 1 point each. A total score of 3 or more is considered positive for heavy/problem drinking. During pregnancy, it may be more appropriate to consider a score of 2 or more as positive.
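The three shorter questionnaires differ mainly in how the tolerance and worry items are weighted. The following sketch encodes the scoring rules stated in Tables 4-14 through 4-16. The function names and argument layout are assumptions made for illustration only.

```python
def cage_score(cut_down: bool, annoyed: bool, guilty: bool, eye_opener: bool) -> int:
    """CAGE: 1 point per positive response (cut point 2, or 1 for more sensitivity)."""
    return sum([cut_down, annoyed, guilty, eye_opener])


def t_ace_score(drinks_to_feel_high: int, annoyed: bool, cut_down: bool,
                eye_opener: bool) -> int:
    """T-ACE: tolerance (more than 2 drinks) scores 2 points; other items 1 point each."""
    tolerance_points = 2 if drinks_to_feel_high > 2 else 0
    return tolerance_points + sum([annoyed, cut_down, eye_opener])


def tweak_tolerance(drinks: int, version: str = "hold") -> bool:
    """TWEAK tolerance item: >= 6 drinks ("hold" version) or >= 3 drinks ("high" version)."""
    thresholds = {"hold": 6, "high": 3}
    if version not in thresholds:
        raise ValueError("version must be 'hold' or 'high'")
    return drinks >= thresholds[version]


def tweak_score(tolerance: bool, worry: bool, eye_opener: bool,
                amnesia: bool, cut_down: bool) -> int:
    """TWEAK: tolerance and worry score 2 points each; other items 1 point each."""
    return 2 * tolerance + 2 * worry + sum([eye_opener, amnesia, cut_down])


if __name__ == "__main__":
    # Hypothetical responses for illustration only.
    print(cage_score(True, False, True, False))
    print(t_ace_score(3, False, False, False))
    print(tweak_score(tweak_tolerance(6, "hold"), False, True, False, False))
```

The printed totals are then compared with the cut points in the tables: 2 or more for the CAGE and T-ACE, and 3 or more for the TWEAK (2 or more during pregnancy).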

EVIDENCE TO SUPPORT THE UPDATE: Problem Alcohol Drinking

4

TITLE The Value of the CAGE in Screening for Alcohol Abuse and Alcohol Dependence in General Clinical Populations: A Diagnostic Meta-analysis.

AUTHORS Aertgeerts B, Buntinx F, Kester A.

CITATION J Clin Epidemiol. 2004;57(1):30-39.

QUESTION How well does the CAGE questionnaire (cut down, annoyed, guilty, eye opener) perform?

DESIGN A formal systematic review with meta-analytic techniques.

DATA SOURCES MEDLINE database and MEDION database for diagnostic reviews.

STUDY SELECTION AND ASSESSMENT A search for articles published from January 1974 to December 2001 was conducted, along with a manual search of Dutch-language articles. All languages (except Japanese) were included in the search. Studies had to be in a general clinical population and to report the data required for sensitivity and specificity. Studies with verification bias were eliminated, although studies that adjusted for verification bias were retained. Studies outside of general medical practices (eg, psychiatric settings or the emergency department) were excluded.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD CAGE questionnaire as compared with the diagnosis established by the Diagnostic and Statistical Manual of Mental Disorders criteria.

OUTCOME MEASURES Sensitivity, specificity, and likelihood ratios (LRs) of the CAGE for diagnosis of alcohol abuse or dependence.

MAIN RESULTS Thirty-five articles were identified, but only 10 were in compliance with all the inclusion and exclusion criteria (Table 4-17).

Table 4-17 Serial LR for the CAGE Questionnaire at Each Cut Point for Patients From Either Outpatient or Inpatient Settings
CAGE threshold    LR+ (95% CI)
CAGE = 4          25 (15-43)
CAGE ≥ 3          15 (8.2-29)
CAGE ≥ 2          6.9 (4.2-11)
CAGE ≥ 1          3.4 (2.3-5.1)
CAGE = 0          0.18 (0.11-0.29)
Abbreviations: CAGE, cut down, annoyed, guilty, eye opener; CI, confidence interval; LR, likelihood ratio.

When comparing primary care patients to ambulatory medical patients (excluding inpatients), the results for the LRs among these groups are clinically similar. While inpatients have positive LRs (confidence intervals [CIs]) that overlap at each threshold, the results for the negative LRs differ. The CAGE has much better sensitivity for inpatients, especially at lower thresholds: When patients have no more than 1 positive response on the CAGE, the LR is 0.17 (CI, 0.11-0.28), and when they answer all the questions negatively, the LR is 0.02 (CI, 0-0.11). The authors conclude that the CAGE at a cut point of 2 or greater is of limited value.
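To see what the serial LRs in Table 4-17 mean for an individual patient, they can be combined with a pretest probability through the usual odds form of Bayes' theorem (posttest odds = pretest odds × LR). The sketch below uses the point estimates from Table 4-17. The pretest probability of 0.15 is an illustrative assumption, chosen to fall inside the 11% to 18% screening yield that this chapter cites for primary care, and the function name is hypothetical.

```python
def posttest_probability(pretest: float, lr: float) -> float:
    """Convert a pretest probability to a posttest probability with a likelihood ratio."""
    if not 0 < pretest < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if lr < 0:
        raise ValueError("a likelihood ratio cannot be negative")
    pretest_odds = pretest / (1 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)


# Point estimates copied from Table 4-17 (confidence intervals omitted).
CAGE_SERIAL_LR = {
    "CAGE = 4": 25,
    "CAGE >= 3": 15,
    "CAGE >= 2": 6.9,
    "CAGE >= 1": 3.4,
    "CAGE = 0": 0.18,
}

if __name__ == "__main__":
    pretest = 0.15  # illustrative assumption, not a value reported by the meta-analysis
    for threshold, lr in CAGE_SERIAL_LR.items():
        print(f"{threshold}: posttest probability {posttest_probability(pretest, lr):.2f}")
```

Rerunning the loop with a higher pretest probability shows why a CAGE score of 0 rules out abuse or dependence less convincingly in high-prevalence populations.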

CONCLUSION
LEVEL OF EVIDENCE Systematic review.
STRENGTHS High-quality systematic review with appropriate meta-analytic techniques. The study formulates the research question, includes a comprehensive search and selection of studies, critically appraises the studies and provides the results, and incorporates the results into their interpretation.
LIMITATIONS Users of the CAGE should be careful not to extrapolate these data to the diagnosis of hazardous or problem drinking because the studies evaluated alcohol abuse or dependence.

We see these data as suggesting that the CAGE is more useful than the authors do. However, it is very important to recognize that the CAGE, with its recommended cut point of 2 or greater, is intended to diagnose alcohol abuse or dependence and not lower levels of problem drinking. The CAGE is useful for this because getting an affirmative answer greatly increases the probability that the person has a problem. On the other hand, we agree with the authors that questionnaires with 0 to 1 positive responses do not sufficiently rule out abuse or dependence, especially in populations with higher prevalence of abuse or dependence.

What about accepting a threshold of only 1 positive response? Further studies are needed, but this would be a reasonable approach for screening. It should be noted that many patients who give only 1 positive response will not have an abuse or dependence problem, but it is likely that the sensitivity at such a threshold would be much higher for problem drinking and you would “miss” fewer patients. For many clinic populations, the LR of 0.18 when the patient answers in the negative for all CAGE questions may not be adequate. This has led many clinics to use a combination of the Alcohol Use Disorders Identification Test (AUDIT; for diagnosing problem drinking) and CAGE (for diagnosing abuse or dependence).

Reviewed by David L. Simel, MD, MHS

TITLE Screening for Alcohol Abuse and Dependence in Older People by Using DSM Criteria: A Review.

AUTHORS Beullens J, Aertgeerts B.

CITATION Aging Ment Health. 2004;8(1):76-82.

QUESTION Which alcohol screening questionnaires perform best in older patients?

DESIGN Formal systematic review without meta-analytic techniques.

DATA SOURCES MEDLINE and PsycINFO databases.

STUDY SELECTION AND ASSESSMENT Studies published from 1996 to 2002. Studies could be inpatient, outpatient, or nursing home settings for patients 60 years or older. One study of nursing home patients that included those as young as 50 years was included.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD CAGE (cut down, annoyed, guilty, eye opener), AUDIT (Alcohol Use Disorders Identification Test), MAST (Michigan Alcoholism Screening Test), and variations compared with the diagnosis established by the Diagnostic and Statistical Manual of Mental Disorders (DSM) criteria. We assessed the data for the CAGE and the AUDIT because these are shorter questionnaires than the longer MAST (see Appendix in the Update for the actual questionnaires).

OUTCOME MEASURES Sensitivity and specificity. The criterion standard assessed for alcohol abuse or dependence. We retrieved articles to calculate the LRs from the original data.

MAIN RESULTS Seven articles were identified for inclusion; only 2 were done in the outpatient setting, and the results are displayed in Table 4-18.

Table 4-18 Performance of the CAGE Questionnaire Among Older Patients
Test (No. of Studies)   Sensitivity   Specificity   LR+ (95% CI)     LR– (95% CI)
CAGE ≥ 2 (n = 2)        0.63-0.70     0.82-0.91     5.3 (3.0-9.0)    0.37 (0.29-0.47)
CAGE ≥ 1 (n = 2)        0.79-0.86     0.56-0.78     2.6 (1.5-4.5)    0.24 (0.15-0.40)
AUDIT ≥ 8 (n = 1)       0.33          0.91          3.6 (1.6-8.0)    0.75 (0.58-0.90)
Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; CAGE, cut down, annoyed, guilty, eye opener; CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

CONCLUSION
LEVEL OF EVIDENCE Systematic review.
STRENGTHS The study formulates the research question, includes a comprehensive search and selection of studies, and provides the results.
LIMITATIONS There is no meta-analytic assessment. A formal quality assessment is not presented. Confidence intervals and sample sizes for the number of patients with alcohol abuse or dependence are not given.

The number of studies on drinking problems in older patients is disappointingly low. The authors provide a good rationale for why the existing questionnaires might not work as well in older patients. The authors’ impression is that the CAGE may be better for detecting alcohol abuse or dependence in older patients, which would be consistent with other studies about the use of the CAGE, but it is hard to be conclusive given the paucity of studies in ambulatory older patients. As in other studies, picking a threshold of just 1 or more positive answer to CAGE questions improves the sensitivity.

The authors hypothesize that the T-ACE (tolerance, annoyed, cut down, eye opener) might be even more efficient than the CAGE because the “feeling guilty” question is replaced by a “tolerance” question that may be more appropriate for older patients. That hypothesis, along with assessing the proper threshold, needs assessment. The authors do not address the detection of harmful or hazardous drinking in older patients.

Reviewed by David L. Simel, MD, MHS


TITLE Alcohol Screening Questionnaires in Women.

AUTHORS Bradley KA, Boyd-Wickizer J, Powell SH, Burman ML.

CITATION JAMA. 1998;280(2):166-171.

QUESTION Which alcohol screening questionnaires perform best in women?

DESIGN Formal systematic review without meta-analytic techniques.

DATA SOURCES MEDLINE database and Social Science and Science Citations Index.

STUDY SELECTION AND ASSESSMENT Studies published from 1996 to July 1997 in English. Studies did not have to be performed in a general clinical population but did need to include a clinic population of women with the data reported separately for women. United States studies were the only studies included. All studies had to compare a brief screening questionnaire to a criterion standard.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD CAGE (cut down, annoyed, guilty, eye opener), AUDIT (Alcohol Use Disorders Identification Test), TWEAK (tolerance, worry, eye opener, amnesia, cut [kut] down), Brief Michigan Alcohol Screening Test (BMAST), T-ACE (tolerance, annoyed, cut down, eye opener), Trauma score, and NET* questionnaires1 as compared with the diagnosis established by the Diagnostic and Statistical Manual of Mental Disorders criteria. In the obstetrics clinic studies, the criterion standard was the number of drinks per day, which is appropriate, given that any drinking may be harmful. In the primary care clinics studies, the criterion was alcohol abuse or dependence. We assessed the data only for the CAGE, AUDIT, TWEAK, and T-ACE for this review as these were the surveys studied in more than 1 location. In all studies, a person who was aware of the questionnaire results applied the criterion standard.

*NET stands for: N, Normal drinker: Do you feel you are a normal drinker?; E, “Eye opener” question from CAGE; T, Tolerance: How many drinks does it take to make you feel high? These questions are found in the other questionnaires.

OUTCOME MEASURES Sensitivity, specificity, and area under the receiver operating characteristic curve. Data were presented for women compared with men when the results were available.

MAIN RESULTS Thirty-six articles were identified, but only 13 met all the inclusion criteria.

Table 4-19 Performance Characteristics of Screening Questionnaires in Women
Setting (No. of Studies, No. of Patients)
  Test                                             Sensitivity (95% CI)   Specificity (95% CI)
Emergency Care (3 studies, 892 patients)
 Low cut point
  TWEAK ≥ 2 (Hold version; only in 1 study)        0.87 (0.74-0.93)       0.87 (0.83-0.90)
 Higher cut point
  CAGE ≥ 2, AUDIT ≥ 8, TWEAK ≥ 3 (Hold version)    0.72 (0.66-0.77)       0.94 (0.92-0.96)a
Obstetrics Clinic (3 studies, 8431 patients)
 Low cut point
  T-ACE ≥ 1                                        0.89 (0.81-0.94)a      0.75 (0.70-0.79)a
  CAGE ≥ 1                                         0.66 (0.62-0.70)       0.81 (0.81-0.82)
  TWEAK ≥ 2 (Hold version; only 1 study)           0.91 (0.87-0.94)       0.77 (0.76-0.78)
 Higher cut point
  T-ACE ≥ 2                                        0.79 (0.64-0.90)a      0.82 (0.71-0.90)a
  CAGE ≥ 2                                         0.48 (0.44-0.53)       0.93 (0.92-0.93)
  TWEAK ≥ 3 (Hold version; only 1 study)           0.67 (0.61-0.73)       0.92 (0.91-0.93)
Primary Care (2 studies, 758 patients)
 Low cut point
  CAGE ≥ 1 (only 1 study)                          0.89 (0.82-0.93)       0.83 (0.79-0.86)
 Higher cut point
  CAGE ≥ 2                                         0.58 (0.32-0.80)a      0.93 (0.90-0.95)
Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; CAGE, cut down, annoyed, guilty, eye opener; CI, confidence interval; T-ACE, tolerance, annoyed, cut down, eye opener; TWEAK, tolerance, worry, eye opener, amnesia, cut (kut) down.
a Heterogeneous, P < .05.

We extracted the data for sensitivity and specificity to assess for summary values (Table 4-19). The results are the random effects summary measures when there is more than 1 study. We combined data for the sensitivity and specificity estimates by extracting the raw results. Because of concerns about incorporation bias, we assessed for heterogeneity. We chose not to report summary likelihood measures for women because of our uncertainty about the effect of incorporation bias. The summary specificity for the CAGE of 2 or greater, AUDIT, TWEAK of 3 or greater, and T-ACE of 2 or greater is 0.92 (95% CI, 0.90-0.94), has narrow CIs, and suggests that a positive questionnaire at these thresholds is clinically similar no matter what population of women is included. There is greater variability for the sensitivity. The CAGE questionnaire performs poorly in an obstetrics clinic. The AUDIT and the TWEAK of 3 or greater (hold version) have similar sensitivities across all settings (0.69 [95% CI, 0.64-0.74]). For every questionnaire studied (CAGE, AUDIT, and TWEAK), the sensitivity is always worse in women compared with men, whereas the specificity is always higher for women.

CONCLUSION
LEVEL OF EVIDENCE Systematic review.
STRENGTHS High-quality systematic review. The study formulates the research question, includes a comprehensive search and selection of studies, critically appraises the studies and provides the results, and incorporates the results into their interpretation.
LIMITATIONS There is no meta-analytic assessment. The studies in the emergency department and in primary care assessed only for abuse or dependence rather than for harmful or hazardous drinking.

Each study was potentially affected by incorporation bias in which the interviewer knew the results of the screening questionnaires. We are uncertain how this affected the interpretation of the criterion standard. However, because all studies were affected by this bias, we still may make inferences on the relative value of the sensitivity and specificity. No matter what the setting, the specificity of these tests is similarly high for women. Although it is possible that this uniformly good measurement property is a function of incorporation bias, it is also plausible that women with any positive screen result for alcohol are highly likely to be problem drinkers.

The results from the individual studies cited by these authors suggest poorer overall performance for the CAGE among women. Compared with the overall data in the meta-analysis by Aertgeerts et al,2 the estimated positive likelihood ratio (LR) for women with a CAGE of 2 or greater appears similar (an estimated positive LR of 8.2 in women vs the meta-analytic summary estimate of 6.9 by Aertgeerts et al2), but the estimated negative LR of 0.45 does appear worse (summary negative LR, 0.33 [95% CI, 0.25-0.43]). A study published just after this systematic review also suggested CAGE differences between men and women, along with differences based on race or country of origin.3 In that study, the sensitivity of the CAGE for white women and black women fell within the CI of that in the systematic review by Aertgeerts et al2 but was less for Hispanic women. The AUDIT had a better sensitivity among all 3 groups of women studied.

The TWEAK and T-ACE were developed to detect alcohol problems during pregnancy, so they ought to work better than the CAGE for pregnant women. However, the TWEAK and T-ACE have not been as widely studied in primary care clinics. The authors conclude that the TWEAK and AUDIT may be the best screening tests for women in any setting. They recommend a cut point of 2 or greater for the TWEAK, which does improve the sensitivity but was reported in only 1 study. Although the specificity is worse for the TWEAK of 2 or greater, this is not as important an issue as failing to diagnose alcohol misuse during pregnancy. Dropping the cut point for the CAGE to 1 or greater improves the sensitivity, but it still does not perform as well as the TWEAK. Our assessment is that the TWEAK does have statistically similar sensitivity to the AUDIT, with a narrow CI, and these appear to perform better than the CAGE. The TWEAK has the obvious advantage over the AUDIT in that it requires fewer questions.

The “hold” version of the TWEAK has been studied more extensively than the “high” version (see Appendix in the Update for the actual questionnaires), but in the single study that compared them, the results were similar. Because many women may never have passed out from alcohol, the authors recommend using the high version of the TWEAK with the question, “How many drinks does it take before you begin to feel the first effects of the alcohol?” (≥ 3 drinks indicates tolerance). They also recommend a cut point of 2 or greater as indicating positivity. They suggest this lower threshold because the improved sensitivity, especially for pregnant women, would be more important than a higher specificity. The T-ACE should be studied further because it has fewer questions. It may be easier for primary care clinics to implement it because it is similar to the CAGE except that the “Feeling guilty” question is replaced by the “Tolerance” question.

Reviewed by David L. Simel, MD, MHS
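The estimated LRs quoted for women with a CAGE of 2 or greater can be approximated from the primary care point estimates in Table 4-19 (sensitivity 0.58, specificity 0.93) using the standard definitions LR+ = sensitivity / (1 − specificity) and LR– = (1 − sensitivity) / specificity. The sketch below is only that arithmetic, with a hypothetical function name. Small differences from the quoted values are to be expected because it starts from rounded inputs rather than raw study counts.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from a sensitivity and specificity pair."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


if __name__ == "__main__":
    # Point estimates for CAGE >= 2 in primary care, taken from Table 4-19.
    lr_pos, lr_neg = likelihood_ratios(0.58, 0.93)
    print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

The same function can be applied to any other row of the table, keeping in mind that the reviewers chose not to report summary LRs for women because of incorporation bias.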

REFERENCES FOR THE EVIDENCE
1. Russell M, Martier SS, Sokol RJ, et al. Screening for pregnancy risk-drinking. Alcohol Clin Exp Res. 1994;18(5):1156-1161.
2. Aertgeerts B, Buntinx F, Kester A. The value of the CAGE in screening for alcohol abuse and alcohol dependence in general clinical populations: a diagnostic meta-analysis. J Clin Epidemiol. 2004;57(1):30-39.
3. Steinbauer JR, Cantor SB, Holzer CE, Volk RJ. Ethnic and sex bias in primary care screening tests for alcohol use disorders. Ann Intern Med. 1998;129(5):353-362.

TITLE Screening for Alcohol Problems in Primary Care. AUTHORS Fiellin DA, Reid MC, O’Connor PG. CITATION Arch Intern Med. 2000;160(13):1977-1989. QUESTION Which alcohol screening questionnaires perform best in primary care patients? DESIGN Formal systematic review. DATA SOURCES MEDLINE database. STUDY SELECTION AND ASSESSMENT Studies published in 1996-1998, English language, primary care setting, comparing a screening questionnaire to a criterion standard and including the sensitivity, specificity, or likelihood ratios (LRs). An assessment for evaluation bias or incorporation bias whereby the results of the screening test were used in the criterion standard and an analysis of clinical subgroups was done for each article.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD AUDIT (Alcohol Use Disorders Identification Test), CAGE (cut down, annoyed, guilty, eye opener), and SMAST (Short Michigan Alcoholism Screening Test) instruments for screening for alcohol problems compared with the Diagnostic and Statistical Manual of Mental Disorders as the criterion standard.


OUTCOME MEASURES Adherence to quality standards of reporting the demographics, comorbidities, eligibility criteria and participation rate, criterion standard, blinding, and analysis of subgroups was presented for 38 studies. Sensitivity and specificity were presented without their confidence intervals (CIs). Meta-analytic techniques were not used.

MAIN RESULTS Eleven articles assessed at-risk, hazardous, or harmful drinking, whereas 27 articles studied alcohol dependence or abuse. The result for the SMAST was found in only 1 retrieved study. Table 4-20 includes the data only from studies that met standards for avoiding evaluation and incorporation bias. The sensitivity and specificity are the point estimates (single study) or ranges reported in the review. We retrieved the original articles to obtain the data for combining the results to get a summary LR for the AUDIT. We calculated the summary LR CIs for the AUDIT and AUDIT-C (AUDIT Consumption Questions) from the original data. (For alcohol abuse or dependence, a separate systematic review with a meta-analysis was used to combine the results.1 The sensitivity and specificity values of the studies without verification bias cited in the publication are shown for comparison purposes to the AUDIT.)

CONCLUSION
LEVEL OF EVIDENCE Systematic review.
STRENGTHS This is an excellent systematic review that formulates the research question, includes a comprehensive search and selection of studies, critically appraises the studies and provides the results, and incorporates the results into their interpretation.
LIMITATIONS There is no meta-analytic assessment of the AUDIT and CAGE. This makes it a bit harder for the clinician to detect differences in the performance characteristics of these questionnaires.

The authors evaluated the sensitivity and specificity ranges to conclude that the AUDIT is best at identifying at-risk, hazardous, or harmful drinking. We retrieved the original reports to calculate the LRs. The CAGE appears inferior to the AUDIT for detecting at-risk, harmful, or hazardous drinking. However, a pragmatic problem occurs with the AUDIT in that it is much longer than the CAGE (10 questions vs 4). We retrieved the data from the AUDIT-C, which is a shorter version of the AUDIT, and it compares favorably to the AUDIT for diagnosing hazardous drinking, although it may not be as good for ruling out the problem. Because a subsequent systematic review performed a meta-analysis of the CAGE, we did not use this study to combine those data.

Reviewed by David L. Simel, MD, MHS

REFERENCES FOR THE EVIDENCE
1. Aertgeerts B, Buntinx F, Kester A. The value of the CAGE in screening for alcohol abuse and alcohol dependence in general clinical populations: a diagnostic meta-analysis. J Clin Epidemiol. 2004;57(1):30-39.
2. Bush K, Kivlahan DR, McDonell MB, Fihn SD, Bradley KA. The AUDIT alcohol consumption questionnaires (AUDIT-C): an effective brief screening test for problem drinking. Arch Intern Med. 1998;158(16):1789-1795.
3. Bradley KA, Bush KR, McDonell MB, Malone T, Fihn SD. Screening for problem drinking, comparison of CAGE and AUDIT. J Gen Intern Med. 1998;13(6):379-388.
4. Adams WL, Barry KL, Fleming MF. Screening for problem drinking in older primary care patients. JAMA. 1996;276(24):1964-1967.
5. Aithal GP, Thornes H, Dwarakanath AD, Tanner AR. Measurement of carbohydrate deficient transferrin (CDT) in a general medical clinic: is this test useful in assessing alcohol consumption? Alcohol Alcohol. 1998;33(3):304-309.
6. Fleming MF, Barry KL. A three-sample test of a masked alcohol screening questionnaire. Alcohol Alcohol. 1991;26(1):81-91.

Table 4-20 Performance Characteristics of Screening Questionnaires in Primary Care
Test [source]                      Sensitivity   Specificity   LR+ (95% CI)     LR– (95% CI)
At-Risk, Harmful, or Hazardous Drinking
AUDIT ≥8 [2,3]                     0.57-0.59     0.91-0.96     6.8 (4.7-10)     0.46 (0.38-0.55)
AUDIT-C ≥8 [2(p1974)]              0.40          0.97          12 (5.0-30)      0.62 (0.52-0.74)
CAGE ≥2 [a,3(p385)]                0.49-0.69     0.75-0.95     3.4 (1.2-10)     0.66 (0.54-0.81)
CAGE ≥2 [4] (patients all >60 y)   0.14          0.97          4.7 (3.7-6.0)    0.89 (0.86-0.91)
Current Abuse/Dependence
AUDIT ≥8 [2(p1974),3(p385)]        0.66-0.71     0.85-0.86     4.6 (3.5-6.1)    0.37 (0.28-0.49)
AUDIT-C ≥8 [2(p1974)]              0.46          0.92          5.9 (3.3-10)     0.58 (0.44-0.73)
CAGE ≥2 [3(p385)]                  0.77          0.79
Lifetime Abuse Dependence
AUDIT ≥8 [3(p385)]                 0.39          0.89
CAGE ≥2 [3(p385),6]                0.43-0.53     0.86

7.0

0.46 (0.36-0.58)

Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; AUDIT-C, AUDIT Consumption Questions; CAGE, cut down, annoyed, guilty, eye opener; CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
a Source for sensitivity: Aithal et al.5
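Table 4-20 reports 95% CIs around LRs that the reviewers calculated from original study data. The review does not state which formula was used, so the sketch below is an assumption: it applies the common log-scale (delta method) interval to a single 2×2 table. It is not a pooled summary estimate, and the counts in the example are hypothetical, chosen only to give a sensitivity of 0.40 and a specificity of 0.97. They are not taken from any cited study.

```python
import math
from typing import Dict, Tuple


def lr_with_ci(tp: int, fn: int, fp: int, tn: int,
               z: float = 1.96) -> Dict[str, Tuple[float, float, float]]:
    """Return {"LR+": (estimate, lower, upper), "LR-": (...)} for one 2x2 table.

    Uses the log-scale standard error; every cell must be greater than 0.
    """
    if min(tp, fn, fp, tn) <= 0:
        raise ValueError("all four cells must be positive for this method")
    diseased = tp + fn
    healthy = fp + tn
    sensitivity = tp / diseased
    specificity = tn / healthy

    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity

    # Standard errors of ln(LR+) and ln(LR-).
    se_pos = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / healthy)
    se_neg = math.sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / healthy)

    def interval(lr: float, se: float) -> Tuple[float, float, float]:
        return lr, lr * math.exp(-z * se), lr * math.exp(z * se)

    return {"LR+": interval(lr_pos, se_pos), "LR-": interval(lr_neg, se_neg)}


if __name__ == "__main__":
    # Hypothetical counts: 100 patients with the target condition, 900 without.
    for name, (estimate, lower, upper) in lr_with_ci(tp=40, fn=60, fp=27, tn=873).items():
        print(f"{name}: {estimate:.2f} ({lower:.2f}-{upper:.2f})")
```

With few false-positive results the interval around LR+ becomes wide, which is one plausible reason for the broad CI shown for the AUDIT-C in the table.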

TITLE Behavioral Counseling Interventions in Primary Care to Reduce Risky/Harmful Alcohol Use.

AUTHORS Whitlock EP, Green CA, Polen MR.

CITATION Contract No. 290-92-0018, Task No. 2, Technical support of the US Preventive Services Task Force, March 2004. http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hstat3.chapter.45217. Accessed May 17, 2008.

QUESTION Which screening questionnaires for risky alcohol use among primary care patients identify those who might benefit from brief interventions?

DESIGN Formal systematic review without meta-analytic techniques.

DATA SOURCES MEDLINE, Cochrane, PsychInfo, HealthSTAR, and CINAHL databases.

STUDY SELECTION AND ASSESSMENT The goal was to identify new literature since the last US Preventive Services Task Force recommendations1; thus, articles were sought from 1994 through April 2002. An extensive search was conducted to identify all relevant articles. Studies had to have been conducted in primary care settings (emergency care and inpatient studies were excluded). The study quality for all included and excluded articles is included. In addition to reviewing primary data, the authors reviewed other systematic reviews.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The focus of this review was on brief treatment interventions for problem drinkers. The shorter questionnaires were used in the studies that were included: CAGE (cut down, annoyed, guilty, eye opener), AUDIT (Alcohol Use Disorders Identification Test), TWEAK (tolerance, worry, eye opener, amnesia, cut [kut] down), and T-ACE (tolerance, annoyed, cut down, eye opener).

OUTCOME MEASURES Screening yield, sensitivity, and specificity.

MAIN RESULTS Twelve studies were included in the review for assessing screening of primary care patients who might be enrolled in brief treatment intervention (Table 4-21). The initial yield of screening primary care patients for all levels of drinking who are waiting for appointments is 11% to 18%. After further questioning, about 7% of primary care patients are candidates for brief treatment interventions. In trying to identify all patients with drinking disorders, the higher value of 11% to 18% would be the appropriate prevalence for adult US patients.

Table 4-21 Screening Questionnaires for Risky Alcohol Use Should Be Selected According to the Patient Population
Population          AUDIT       CAGE        TWEAK or T-ACE
Risky or Harmful Drinking
Adults              Yes         No          No
≥65 y               Uncertain   No          No
Pregnant women      No          No          Yes
Alcohol Abuse or Dependence
Adults              Yes         Yes         Yes
≥65 y               Uncertain   Uncertain   Uncertain
Pregnant women      No          No          Yes
Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; CAGE, cut down, annoyed, guilty, eye opener; T-ACE, tolerance, annoyed, cut down, eye opener; TWEAK, tolerance, worry, eye opener, amnesia, cut (kut) down.

CONCLUSION
LEVEL OF EVIDENCE Systematic review.
STRENGTHS This is an excellent systematic review that formulates the research question, includes a comprehensive search and selection of studies, critically appraises the studies and provides the results, and incorporates the results into their interpretation.
LIMITATIONS There is no meta-analytic assessment. Confidence intervals and LRs are not presented.

The studies included in this review were selected because they included randomized trials of patients suitable for brief interventions for problem drinking. Thus, these were not specifically studies of the diagnostic tests themselves. To determine the performance characteristics of screening tests, the authors also used published systematic reviews of the questionnaires. According to data from systematic reviews of diagnostic tests, these authors conclude that the AUDIT is the best test for detecting risky/harmful drinking in adults, although the TWEAK or T-ACE ought to be used for pregnant patients. For detecting alcohol abuse or dependence, they conclude that any of the 4 questionnaires is suitable other than during pregnancy.

The CAGE questionnaire is in widespread use, so the authors suggest that it might be improved by adding quantity/frequency questions. This has shown greater sensitivity and specificity in the emergency department but has not been studied in primary care.2 It is available online as part of the National Institute on Alcohol Abuse and Alcoholism guide to physicians (http://pubs.niaaa.nih.gov/publications/Practitioner/pocketguide/pocket_guide.htm, accessed May 17, 2008) and also as a self-graded patient form (http://www.alcoholscreening.org/, accessed May 17, 2008).

Reviewed by David L. Simel, MD, MHS

REFERENCES FOR THE EVIDENCE
1. US Preventive Services Task Force. Guide to Clinical Preventive Services. 2nd ed. Baltimore, MD: Lippincott Williams & Wilkins; 1996.
2. Friedmann PD, Saitz R, Gogineni A, Zhang JX, Stein MD. Validation of the screening strategy in the NIAAA “Physicians’ guide to helping patients with alcohol problems.” J Stud Alcohol. 2001;62(2):234-238.

CHAPTER 5

Does This Adult Patient Have Appendicitis?

James M. Wagner, MD
W. Paul McKinney, MD
John L. Carpenter, MD

CLINICAL SCENARIO A 29-year-old patient presents to your office with abdominal pain and a fever. The patient was well until 1 day ago and had never experienced abdominal pain. A vague periumbilical pain awoke him from sleep 12 hours previously, and he soon developed anorexia, nausea, and vomiting. His wife consulted their family medical reference guide and then brought him to the office, concerned that his symptoms matched a description of appendicitis. The pain then migrated to the right lower quadrant (RLQ) and was much worse while he was riding in the car to the physician’s office. The patient’s oral temperature is 37.8°C; the pulse rate and blood pressure are normal. He has RLQ tenderness, guarding but not rigidity, and rebound tenderness in the RLQ. A rectal examination reveals no tenderness, and he does not exhibit the psoas or obturator signs. Rovsing sign is positive.

WHY IS THIS AN IMPORTANT QUESTION TO ANSWER WITH A CLINICAL EXAMINATION? In western countries, appendicitis represents a common cause of acute abdominal pain. According to National Center for Health Statistics data, approximately 500000 patients underwent appendectomies from 1979 to 1984. Individuals carry a 7% lifetime risk of developing appendicitis.1 The incidence of appendicitis causing abdominal pain depends on the clinical setting. In series from emergency departments or surgical services, 25% of patients younger than 60 years and evaluated for acute abdominal pain have acute appendicitis, whereas the incidence in those older than 60 years is approximately 4%.1-5 Only 0.7% to 1.6% of all ambulatory patients with abdominal pain have appendicitis.6,7 Among children treated in the ambulatory care setting, appendicitis causes 2.3% of all abdominal pain episodes.8 In children admitted for acute abdominal pain, appendicitis is the etiology for approximately 32%.9-11 The morbidity and mortality of appendicitis remain significant, even with the advent of antibiotics and effective surgical management. Although the overall mortality rate with appropriate treatment is less than 1%, in the elderly it remains approximately 5% to 15%.2,4 There is a significant amount of morbidity caused by appendiceal rupture.12-15 The incidence of perforation in patients with appendicitis ranges from 17% to 40%, with a median of 20%.16,17 The perforation rate is significantly higher in the elderly, with rates as high as 60% to 70%. Several factors contribute to the increased incidence of perforation in the elderly, including significant delay in seeking care, nonspecificity of the presenting symptoms and signs, diminished febrile response, and fewer abnormalities in important laboratory characteristics such as the white blood cell count (WBC).2,3,5,14,18,19 Children also have

Copyright © 2009 by the American Medical Association.

an increased incidence of perforation because of delays in consulting a physician for abdominal pain.8 The negative laparotomy result rate in most series ranges from 15% to 35% and creates morbidity.16,17,20-22 In younger women, the negative laparotomy result rate is significantly higher (up to 45%) because of the prevalence of pelvic inflammatory disease and other common obstetric and gynecologic disorders.16,17,23,24

THE ACCURACY OF OTHER DIAGNOSTIC MODALITIES Routine medical history and physical examination remain the most effective and practical diagnostic modalities.25,26 Several other clinical methods for diagnosing appendicitis have been studied. Computer or algorithm-driven analyses of patients with abdominal pain have been evaluated,27-35 although most studies have incomplete controls and yield inconsistent results. Thus, the utility of computer-guided diagnosis compared with unassisted clinical diagnosis needs further evaluation. The authors of most of these studies believe that the improved utility they demonstrated was primarily because clinicians were forced to focus on specific clinical data that were readily available to be entered into the analysis tree. Finally, these authors observed that all of these modalities completely depend on the accuracy of the data gathered and interpreted by clinicians before the data are entered into the computer or algorithm analysis. The concept of an extended period of observation of patients with questionable appendicitis has been shown by some authors to be helpful.8,27,28 Its utility, like that of computer and algorithm analyses, depends on routine medical history and physical examination skills of clinicians. The utility of radiographic techniques has also been evaluated. Plain abdominal radiographs and barium enemas are neither specific nor sensitive for appendicitis.36 Ultrasonography is more effective in detecting a distended appendix than appendiceal perforation.10,15,36-44 No study has demonstrated ultrasonography to be clearly superior to the clinical examination, and many authors believe that its primary utility is to supplement the medical history and physical examination in patients with equivocal findings. 
The accuracy of computed tomography in diagnosing appendicitis has also been inconsistent.36,42,43 Laparoscopy has been shown by some authors to be useful, particularly in young women in whom it can be difficult to differentiate between pelvic inflammatory disease, ectopic pregnancy, and appendicitis.27 However, other series have not been as supportive, with negative appendectomy result rates from 20% to 30%.44,45 Studies of outcomes comparing laparoscopy with laparotomy have yielded conflicting results.46,47 Even though ultrasonography, computed tomography, and laparoscopy can be helpful, none are ideal techniques, and the clinician must depend on patient medical history and physical examination results.

APPENDICEAL ANATOMY AND PATHOPHYSIOLOGY OF APPENDICITIS The adult’s appendix averages 10 cm in length, arising from the posteromedial wall of the cecum, about 3 cm below the

ileocecal valve.48 Its position in the abdominal cavity is variable, being described as retrocecal, retroileal, preileal, subcecal, or pelvic, and this variability in location may influence the clinical signs and symptoms associated with appendicitis. Although the physiologic role of the appendix is unproved, an immunologic function is suggested by its content of lymphoid tissue.49 Appendiceal obstruction, followed by secondary bacterial invasion, causes most cases of appendicitis. Continued fluid secretion by the mucosa of the obstructed appendix distends the lumen, eventually exceeding venous pressure and leading to tissue ischemia and, ultimately, necrosis. Causes of obstruction include fecaliths, calculi, tumors, parasites, foreign bodies, or, rarely, barium. In the one-third of patients without apparent obstruction, infection by viruses, parasites, or bacteria, or trauma or postoperative fecal stasis may be involved.50-55 Normally, appendicitis presents with a highly characteristic sequence of symptoms and signs.56 Initially, appendicitis causes visceral pain poorly localized to the epigastrium or periumbilical region, presumably because of distention of the appendix. Anorexia, nausea, and vomiting soon follow as this pathophysiology worsens. More advanced inflammation causes irritation of adjacent structures or the peritoneum, low-grade fever, and peritoneal pain localized to the RLQ. The pathophysiology explains the classic migration of pain caused by appendicitis. The point of maximal tenderness may be distinct from McBurney point, which lies 5 cm from the anterior superior iliac spine on a line running to the umbilicus. Atypical locations of the appendix may lead to unusual clinical findings. In the case of retrocecal or retroileal appendices,57,58 the pain may be poorly localized and may not undergo the transition from epigastric to RLQ locations.
Pelvic appendicitis frequently causes pain in the left lower quadrant, with an absence of tenderness, and is reflected by increased pain during a rectal examination. Unusual symptoms of urinary and defecation urgency, caused by irritation of the ureter and rectum, respectively, plus dysuria and diarrhea may also occur. Although often a diagnostic dilemma in the first trimester of pregnancy because of confusion with other diagnoses, appendicitis in later stages of gestation may present a challenge for the clinician because of displacement of the appendix by the enlarging uterus. In such cases, periumbilical or right subcostal tenderness may be found.

HOW TO ELICIT THE RELEVANT SYMPTOMS AND SIGNS Pain is commonly the first symptom of appendicitis.9,59 Classically, the vague, midepigastric or periumbilical pain awakens the patient from sleep but is not initially severe. After reaching its peak in around 4 hours, it diminishes and then migrates to the RLQ. Most patients will seek medical attention within 12 to 48 hours. Pain usually occurs before vomiting, and the patient has usually not experienced similar symptoms before the present episode.

According to Cope’s Early Diagnosis of the Acute Abdomen,60 many patients feel constipated and anticipate that defecation will relieve discomfort, leading them to use cathartic agents. However, pain persists after a bowel movement. Many signs have been associated with appendicitis or peritonitis. Some of obvious value, such as the pelvic examination, have not been adequately evaluated to merit mention in this systematic review or they lack an adequate description or standardization of the elicitation of the sign to ensure accurate reproduction. A common reference for definitions in the best studies is a text by De Dombal.61 What follows is the most consistent and useful description of the signs:
• Guarding: Guarding is a state of voluntary contraction of the abdominal muscles. The muscles are held tense by the patient because he or she knows (or fears) that further examination is likely to be painful. Fear can be partially, or fully, overcome by tact and persuasion.61
• Rigidity: Rigidity is also known as involuntary guarding. The best studies of abdominal pain have described rigidity as an involuntary reflex spasm of the muscles of the abdominal wall. It can never be overcome by tact and reassurance.61
• Rebound tenderness: (1) Press on the area in question with the flat of your hand, sufficient to depress the peritoneum. The patient should be experiencing pain. (2) Keep pressing with a constant intensity. As the patient adjusts to this pressure during 30 to 60 seconds, the pain diminishes. It may go away completely, although usually it does not. (3) Without warning, and preferably while the patient’s attention is distracted, remove the hand suddenly to just above skin level. Watching the patient grimace is more indicative than a complaint of pain.61
• Rovsing sign: A sign related to the rebound tenderness test. Press deeply and evenly in the left lower quadrant and then release pressure suddenly. The presence of tenderness in the RLQ during palpation or referred rebound tenderness in the RLQ during release is considered a positive Rovsing sign.
• Psoas sign: With the patient in the supine position, ask the patient to lift the thigh against your hand, placed just above the knee. Alternatively, with the patient in the left lateral decubitus position (Figure 5-1), extend the patient’s right leg at the hip. Increased pain with either maneuver is a positive sign and indicates irritation of the psoas muscle by an inflamed appendix.
• Obturator sign: This sign is similar mechanically to the psoas sign. It is elicited by passively flexing the right hip and knee and internally rotating the leg at the hip, stretching the obturator muscle (Figure 5-2). Resultant right-sided abdominal pain is a positive sign, indicating irritation of the obturator muscle. The obturator sign has not been studied independent of the psoas sign, but most clinicians would attribute the same significance.
• Rectal examination: Classically, tenderness and fullness perceived on the right but not the left side on rectal examination are indicative of a pelvic appendicitis.60 This sign is subjective and poorly described in most major physical examination texts. No studies that assess rectal tenderness describe the examination technique.

PRECISION OF THESE SYMPTOMS AND SIGNS There have been no studies published evaluating the precision of the clinical examination for appendicitis. A standardized clinical examination might produce strong interrater reliability.

Figure 5-1 The Psoas Sign in Examination for Appendicitis
[Illustration labels: left lateral decubitus position; psoas muscle; appendix. Examiner extends the patient’s right leg at the hip.]
The sign can be elicited with 2 different patient positions. First, with the patient in the supine position, ask the patient to lift the right thigh against your hand placed just above the knee. With the patient in the left lateral decubitus position (as shown), extend the right leg at the hip. Increased pain with either maneuver is a positive sign and indicates irritation of the psoas muscle by an inflamed appendix.

Figure 5-2 The Obturator Sign in Examination for Appendicitis
[Illustration labels: obturator internus muscle; appendix. With the patient in the supine position, the examiner passively flexes the right hip and knee. The leg is gently pulled laterally while maintaining position of the knee, causing internal rotation at the hip.]
Elicit this sign by passively flexing the patient’s right hip and knee and internally rotating the leg at the hip, stretching the obturator muscle. Resultant right-sided abdominal pain is a positive sign, indicating irritation of the obturator muscle.


ACCURACY OF THESE SYMPTOMS AND SIGNS A handful of studies published during the past few decades have evaluated the accuracy of the clinical presentation of appendicitis. The studies are of various quality and design. Most are best described as cross-sectional in design because a clinical judgment is made, with outcomes measured in terms of pathologic confirmation of appendicitis vs a negative laparotomy result or no requirement for surgery. Eleven of the highest-quality studies, based on number of patients studied, the study design, and completeness of reported data, are summarized in Table 5-1.9,24,33,35,62-67 The search strategy for identifying these articles is available from the authors on request. This strategy yielded about 300 articles since 1966. Further limiting sets to adult age groups yielded 200 studies. The titles and abstracts were reviewed and chosen if adequate detail of the outcomes and aspects of the clinical examination allowed construction of 2 × 2 tables and subsequent calculation of likelihood ratios [LRs]. The 11 studies were divided into 2 groups by the patients on whom they focused. Approximately half of the studies focused on patients in whom appendicitis was suspected, and half, on those who were examined for acute abdomen. In the studies of suspected appendicitis, the inclusion criteria were not further defined. In the studies of acute abdomen, inclusion criteria usually involved pain for less than 1 week. Taken together, the studies report on the findings of more than 4000 patients and provide the best available evidence supporting the most valuable aspects of the clinical examination for appendicitis (Table 5-2). Each study reports on a varying constellation of clinical findings. Many aspects of the clinical examination are not evaluated in all of these studies. Unfortunately, some of the aspects evaluated are poorly defined in the text of the studies,

so specific recommendations for these aspects are difficult to derive for medical education or the everyday practice of medicine. Nonetheless, several points can be drawn from a systematic literature review. In evaluation of patients presenting with emergency and acute abdominal pain, usually defined as less than 1 week in duration before presenting to an emergency department or surgical ward, the prevalence (pretest probability) of acute appendicitis ranges from 12% to 26%.12,30,32,69 The clinical examination will influence this probability further. If various aspects of the clinical examination are viewed as diagnostic tests, LRs70,71 and posttest probability can be calculated. From the medical history, 6 aspects have been evaluated. Seven physical examination items have also been studied well. These aspects are examined further in Table 5-3.72 The large number of patients studied and the similarities across studies make the data suitable for being combined into summary measures. Three findings show a high positive LR (LR+) across all studies and, when present, are most useful for identifying patients at increased likelihood for appendicitis: RLQ pain (LR+, 8.0), rigidity (LR+, 4.0), and migration of initial periumbilical pain to the RLQ (LR+, 3.2). Rebound tenderness was studied in most patients, but its positive likelihood varied too much to allow a statistical point estimate of its effect (LR+, 1.1-6.3). Although the obturator sign has not been studied independently, the authors suspect that this sign has operating characteristics similar to those of the psoas sign. Clinicians also collect evidence to help prove normality. Unfortunately, no single component consistently provided a low negative LR (LR–) that would rule out appendicitis. There were, however, many signs that proved to be helpful in ruling out appendicitis. The absence of RLQ pain and

Table 5-1 Studies of the Operating Characteristics of the Clinical Examination for Appendicitis

| Authors | Year | Inclusion Criteria | Design | No. of Patients Studied (% Women) | Country | Age Range, y |
|---|---|---|---|---|---|---|
| Staniland et al62 | 1972 | Admitted for acute abdomen | Retrospective | 600 (49) | United Kingdom | 70 |
| Brewer et al63 | 1976 | ED evaluation for acute abdomen | Retrospective | 1000 (0) | United States | 15 to >65 |
| Berry and Malt24 | 1984 | Operation for suspected appendicitis | Retrospective | 300 (40) | United States | 10 to >50 |
| Nauta and Magnant64 | 1986 | Operation for suspected appendicitis | Prospective | 97 (40) | United States | 2 to 91 |
| Alvarado33 | 1986 | Admitted for suspected appendicitis | Retrospective | 305 (42) | United States | 4 to 80 |
| Fenyo35 | 1987 | Admitted for suspected appendicitis | Prospective | 830 (57) | Sweden | 15 to 86 |
| Liddington and Thomson65 | 1991 | Admitted for abdominal pain | Prospective | 150 (58) | United Kingdom | 7 to 84 |
| Dixon et al9 | 1991 | Admitted for suspected appendicitis | Prospective | 1204 (39) | Scotland | 7 to 87 |
| Izbicki et al66 | 1992 | ED evaluation for suspected appendicitis | Prospective | 150 (56) | Germany | 11 to 88 |
| Eskelinen et al67 | 1994 | Admitted for abdominal pain | Prospective | 222 (58) | Finland | 65 to 90 |
| Eskelinen et al68 | 1995 | Admitted for abdominal pain | Prospective | 417 (54) | Finland | >50 |
| Total | | | | 5275 (41) | | |

Abbreviation: ED, emergency department.

the presence of similar previous pain demonstrated powerful LR– (0.28 and 0.25, respectively). The absence of the classic migration of pain also diminished the likelihood of appendicitis significantly (LR–, 0.5). The absence of RLQ guarding or rebound pain has excellent properties for ruling out appendicitis in some studies, but not others. Pain before vomiting needs further study to identify its diagnostic efficiency because, in its only evaluation, its absence was highly efficient in ruling out appendicitis (sensitivity, 1.0). Astute clinicians will recognize that the absence of anorexia, nausea, or vomiting has little effect on the likelihood of appendicitis.
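The step from pretest probability to posttest probability described above can be sketched in a few lines. This is a minimal illustration rather than anything taken from the chapter: the function name `posttest_probability` is hypothetical, and the inputs are the pretest range (12% to 26%) and the summary LRs quoted in the text.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability through odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    # Probability -> odds, multiply by the LR, then odds -> probability.
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Pretest range and LRs are the values quoted in the text.
for pretest in (0.12, 0.26):
    with_rlq_pain = posttest_probability(pretest, 8.0)  # RLQ pain present (LR+, 8.0)
    no_migration = posttest_probability(pretest, 0.5)   # classic migration absent (LR-, 0.5)
    print(f"pretest {pretest:.0%}: RLQ pain -> {with_rlq_pain:.0%}, "
          f"no migration -> {no_migration:.0%}")
```

With these inputs, RLQ pain raises a 12% pretest probability to about 52% and a 26% pretest probability to about 74%, whereas absence of the classic migration lowers them to about 6% and 15%.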

Table 5-2 Aspects of the Clinical Examination Studieda
Studies (rows): Staniland et al62; Brewer et al63; Berry and Malt24; Nauta and Magnant64; Alvarado33; Fenyo35; Liddington and Thomson65; Dixon et al9; Izbicki et al66; Eskelinen et al67; Eskelinen et al68.
Findings (columns): Pain Migr; Anorexia; Nausea; Vomiting; Pain; Similar; Rectal; Psoas; Rebound; Rigid; Guard; RLQ Pain; Fever.
No. of cases studied, in the order extracted (assignment to individual findings is not recoverable from this copy): 1354; 2161; 1691; 1684; 651; 1542; 2349; 450; 3979; 4688; 3555; 2267; 1264.
[The × marks showing which study assessed which finding are not recoverable from this copy.]

Abbreviations: Migr, migration of the initial periumbilical pain to the right lower quadrant; pain, pain before vomiting; psoas, positive psoas sign; rectal, pain on rectal examination; RLQ, right lower quadrant; similar, symptoms similar to those the patient previously experienced.
a For an explanation of rebound, rigid, and guard, see the “How to Elicit the Relevant Symptoms and Signs” section of the text.

Table 5-3 Summary of Clinical Examination Operating Characteristics for Appendicitisa

| Procedure | Sensitivity | Specificity | LR+ (95% CI) | LR– (95% CI) |
|---|---|---|---|---|
| Right lower quadrant pain | 0.84 | 0.90 | 7.3-8.5b | 0-0.28b |
| Rigidity | 0.20 | 0.89 | 3.8 (3.0-4.8) | 0.82 (0.79-0.85) |
| Migration of pain | 0.64 | 0.82 | 3.2 (2.4-4.2) | 0.50 (0.42-0.59) |
| Pain before vomitingc | 1.0 | 0.64 | 2.8 (1.9-3.9) | NA |
| Psoas sign | 0.16 | 0.95 | 2.4 (1.2-4.7) | 0.90 (0.83-0.98) |
| Fever | 0.67 | 0.79 | 1.9 (1.6-2.3) | 0.58 (0.51-0.67) |
| Rebound tenderness test | 0.63 | 0.69 | 1.1-6.3b | 0-0.86b |
| Guarding | 0.73 | 0.52 | 1.7-1.8b | 0-0.54b |
| No similar pain previously | 0.86 | 0.40 | 1.50 (1.46-1.7) | 0.32 (0.25-0.42) |
| Rectal tenderness | 0.41 | 0.77 | 0.83-5.3b | 0.36-1.1b |
| Anorexia | 0.68 | 0.36 | 1.3 (1.2-1.4) | 0.64 (0.54-0.75) |
| Nausea | 0.58 | 0.37 | 0.69-1.2b | 0.70-0.84b |
| Vomiting | 0.51 | 0.45 | 0.92 (0.82-1.0) | 1.1 (0.95-1.3) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio; NA, not available.
a All studies were used to create 2 × 2 tables and then tested for homogeneity of the odds ratio with the Breslow-Day statistic. If studies were not rejected as heterogeneous by this statistic, P = .05, CIs were manually reviewed to exclude type II errors. Studies satisfying both criteria were combined, and LRs were calculated with the Mantel-Haenszel method. The 95% CIs were calculated according to the method of Simel et al.72 Only 1 study evaluated pain before vomiting. For an explanation of procedure terms, see Table 5-2 or the “How to Elicit the Relevant Symptoms and Signs” section of the text.
b In heterogeneous studies, the LRs are reported as ranges.
c Only 1 study on this in the meta-analysis.
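For readers who want to see how an LR and its confidence interval follow from a single 2 × 2 table, the sketch below uses the usual log-based standard error for a ratio of two proportions. It is an illustration under stated assumptions, not the chapter's analysis: the cell counts are hypothetical (chosen only to give a sensitivity of 0.64 and a specificity of 0.82), the function name `likelihood_ratios` is invented here, and no Mantel-Haenszel pooling across studies is attempted, so the output is not expected to match the pooled values in Table 5-3.

```python
import math


def likelihood_ratios(tp: int, fp: int, fn: int, tn: int, z: float = 1.96) -> dict:
    """Return LR+ and LR- with log-method confidence intervals from one 2 x 2 table.

    tp/fn: diseased patients with/without the finding.
    fp/tn: nondiseased patients with/without the finding.
    """
    if min(tp, fp, fn, tn) <= 0:
        raise ValueError("all four cells must be positive for the log method")
    diseased = tp + fn
    nondiseased = fp + tn
    sensitivity = tp / diseased
    specificity = tn / nondiseased
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    # Standard error of ln(LR) for a ratio of two independent proportions.
    se_pos = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / nondiseased)
    se_neg = math.sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / nondiseased)

    def interval(lr: float, se: float) -> tuple:
        return (lr, lr * math.exp(-z * se), lr * math.exp(z * se))

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": interval(lr_pos, se_pos),  # (point estimate, lower, upper)
        "lr_neg": interval(lr_neg, se_neg),
    }


# Hypothetical counts: 100 patients with appendicitis, 100 without.
result = likelihood_ratios(tp=64, fp=18, fn=36, tn=82)
print(result)
```

The interval is built on the log scale because the sampling distribution of a ratio is skewed; exponentiating the limits keeps them positive.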


THE ROLE OF COMBINED FINDINGS Clinicians rarely rely on a single sign or symptom for diagnosis but instead rely on a combination of findings. Unfortunately, the precision and accuracy of combinations of findings have not been reported in these studies. Several studies do assess, however, various decision rules that do combine these findings.6,33-35,66,73-77 Four of the most powerful rules were validated on an independent set of 1254 patients older than 50 years and presenting with abdominal pain. No single score was found to be superior; however, it was observed that the decision rules reported in the original work to be most powerful incorporated at least 2 of 5 common variables: site and duration of pain, site of tenderness, rebound tenderness, and leukocytosis.78
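Because the studies do not report the accuracy of combined findings, any arithmetic that chains several LRs rests on an assumption the source does not establish: that the findings are conditionally independent. The sketch below shows only that naive calculation. The function name `combined_posttest_probability` is hypothetical, and the inputs are the upper pretest probability and two summary LRs quoted elsewhere in the chapter.

```python
from math import prod
from typing import Iterable


def combined_posttest_probability(pretest_probability: float,
                                  likelihood_ratios: Iterable[float]) -> float:
    """Naively chain LRs, ASSUMING the findings are conditionally independent."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    # Under independence, the posttest odds are the pretest odds times every LR.
    posttest_odds = pretest_odds * prod(likelihood_ratios)
    return posttest_odds / (1.0 + posttest_odds)


# Pretest 26%; migration of pain (LR+, 3.2) and pain before vomiting (LR+, 2.8).
estimate = combined_posttest_probability(0.26, [3.2, 2.8])
print(f"naive combined posttest probability: {estimate:.2f}")
```

Correlated findings, such as symptoms that arise from the same stage of inflammation, would make this product overstate the evidence, which is one reason validated decision rules are preferred over ad hoc multiplication.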

THE BOTTOM LINE Returning to the opening clinical scenario, the historical components of the presentation are highly suggestive of appendicitis. Our patient demonstrates the classic sequence of abdominal pain before vomiting, culminating with the migration of the initial midepigastric pain to the RLQ. The combination of these LR+s alone makes appendicitis more likely. The findings of guarding but not rigidity tend to neutralize each other’s effect. The rectal examination results and the psoas and related signs are helpful if present but are not helpful when absent, as in this case. In sum, we suspect appendicitis in this man, so further evaluation is warranted. A surgical doctrine suggests that a decrease in the perforation rate will be achieved only by an increase in the negative laparotomy result rate in suspected acute appendicitis. The truth of this doctrine has been called into question, given the results of large- and small-area variation studies.29 Improved clinical evaluation is suggested as a remedy for a high rate of negative laparotomy results without increasing the perforation rate. Evidence suggests that clinical details are essential.79,80 Clinicians often do not collect enough clinical details for accurate and precise diagnosis.81-83 Correction of this deficit, therefore, may well increase diagnostic accuracy without increasing the perforation rate. In summary, there are several conclusions that can be made concerning the clinical presentation, pathophysiology, and diagnosis of appendicitis:
1. Appendicitis is a common clinical entity, with significant morbidity and mortality, particularly at the extremes of age.
2. The pathophysiology of appendicitis consists of initial dilatation of the appendix, followed by appendiceal ischemia, necrosis, and parietal peritoneal irritation. Clinical findings are predictable, predicated on knowledge of this pathophysiology.
3. The characteristic sequence of symptoms and signs includes the following: (1) vague pain initially located in the epigastric or periumbilical region; (2) anorexia, nausea, or unsustained vomiting; (3) migration of the initial pain to the RLQ; and (4) low-grade fever.
4. Migration of pain in the characteristic manner, RLQ pain, and the presence of pain before vomiting are historical findings that suggest appendicitis. Rigidity, a positive psoas sign, fever, and rebound tenderness are physical examination signs that indicate an increased likelihood of appendicitis.
5. Conversely, the absence of RLQ pain, the absence of the classic migration of pain, and the presence of similar pain previously are powerful symptoms in the medical history that make appendicitis less likely. In the physical examination, the lack of RLQ pain, rigidity, or guarding makes appendicitis less likely.
6. Because no finding on the clinical examination can effectively rule out appendicitis, prudence dictates close follow-up of patients with abdominal pain who do not receive further diagnostic testing.

Author Affiliations at the Time of the Original Publication
Department of Internal Medicine, University of Texas, Southwestern Medical Center, Dallas, Texas (Drs Wagner, McKinney, and Carpenter).

Acknowledgments
We appreciate the expert advice offered by gynecologist Joanne Piscitelli, MD, and general surgeon Ted Pappas, MD, both of Duke University, Durham, North Carolina, during the preparation of the manuscript.

REFERENCES
1. Irvin TT. Abdominal pain: a surgical audit of 1190 emergency admissions. Br J Surg. 1989;76(11):1121-1125.
2. Balsano N, Cayten CG. Surgical emergencies of the abdomen. Emerg Med Clin North Am. 1990;8(2):399-410.
3. Bugliosi TF, Meloy TD, Vukov LF. Acute abdominal pain in the elderly. Ann Emerg Med. 1990;19(12):1383-1386.
4. Fenyo G. Acute abdominal disease in the elderly. Am J Surg. 1982;143(6):751-754.
5. Fenyo G. Diagnostic problems of acute abdominal diseases in the aged. Acta Chir Scand. 1974;140(5):396-405.
6. Wasson JH, Sox HC Jr, Sox CH. The diagnosis of abdominal pain in ambulatory male patients. Med Decis Making. 1981;1(3):215-224.
7. Britt H, Bridges-Webb C, Sayer GP, Neary S, Traynor V, Charles J. The diagnostic difficulties of abdominal pain. Aust Fam Physician. 1994;23(3):375-381.
8. White JJ, Santillana M, Haller JA. Intensive in-hospital observation: a safe way to decrease unnecessary appendectomy. Am Surg. 1975;41(12):793-798.
9. Dixon JM, Elton RA, Rainey JB, Macleod DA. Rectal examination in patients with pain in the right lower quadrant of the abdomen. BMJ. 1991;302(6781):386-388.
10. Gamal R, Moore TC. Appendicitis in children aged 13 years and younger. Am J Surg. 1990;159(6):589-592.
11. Putnam TC, Gagliano N, Emmens RW. Appendicitis in children. Surg Gynecol Obstet. 1990;170(6):527-532.
12. Jess P, Bjerregaard B, Brynitz S, Holst-Christensen J, Kalaja E, Lund-Kristensen J. Acute appendicitis: prospective trial concerning diagnostic accuracy and complications. Am J Surg. 1981;141(2):232-234.
13. Mittelpunkt A, Nora PF. Current features in the treatment of acute appendicitis: an analysis of 100 consecutive cases. Surgery. 1966;60(5):971-975.
14. Franz MG, Norman J, Fabri PJ. Increased morbidity of appendicitis with advancing age. Am Surg. 1995;61(1):40-44.
15. Wade DS, Morrow SE, Balsara ZN, Burkhard TK, Goff WB. Accuracy of ultrasound in the diagnosis of acute appendicitis compared with the surgeon’s clinical impression. Arch Surg. 1993;128(9):1039-1046.

16. Lewis FR, Holcroft JW, Boey J, Dunphy E. Appendicitis: a critical review of diagnosis and treatment in 1000 cases. Arch Surg. 1975;110(5):677-684.
17. Addiss DG, Shaffer N, Fowler BS, Tauxe RV. The epidemiology of appendicitis and appendectomy in the United States. Am J Epidemiol. 1990;132(5):910-925.
18. Owens BJ, Hamit HF. Appendicitis in the elderly. Ann Surg. 1978;187(4):392-396.
19. Peltokallio P, Jauhiainen K. Acute appendicitis in the aged patient. Arch Surg. 1970;100(2):140-143.
20. Jerman RP. Removal of the normal appendix: the cause of serious complications. Br J Clin Pract. 1969;23(11):466-467.
21. Chang FC, Hogle HH, Welling DR. The fate of the negative appendix. Am J Surg. 1973;126(6):752-754.
22. Howie JGR. Death from appendicitis and appendicectomy. Lancet. 1966;2(7477):1334-1335.
23. Bongard F, Landers DV, Lewis F. Differential diagnosis of appendicitis and pelvic inflammatory disease. Am J Surg. 1985;150(1):90-96.
24. Berry J, Malt RA. Appendicitis near its centenary. Ann Surg. 1984;200(5):567-575.
25. Peterson MC, Holbrook JH, Hales DV, Smith NL, Staker LV. Contributions of the history, physical examination, and laboratory investigation in making medical diagnoses. West J Med. 1992;156(2):163-165.
26. Hampton JR, Harrison MJG, Mitchell JRA, Prichard JS, Seymour C. Relative contributions of history-taking, physical examination, and laboratory investigation to diagnosis and management of medical outpatients. BMJ. 1975;2(5969):486-489.
27. Jones PF. Practicalities in the management of the acute abdomen. Br J Surg. 1990;77(4):365-367.
28. Schwartz SI. Tempering the technological diagnosis of appendicitis. N Engl J Med. 1987;317(11):703-704.
29. Neutra RR. Appendicitis: decreasing normal removals without increasing perforations. Med Care. 1978;16(11):956-961.
30. Adams ID, Chan M, Clifford PC, et al. Computer aided diagnosis of acute abdominal pain: a multicentre study. BMJ. 1986;293:800-804.
31. De Dombal FT. Educational assessment of clinical diagnostic skills: studies across Europe on acute abdominal pain. Postgrad Med J. 1993;69(suppl 2):S94-S96.
32. De Dombal FT, Leaper DJ, Horrocks JC, Staniland JR, McCann AP. Human and computer-aided diagnosis of abdominal pain: further report with emphasis on performance of clinicians. BMJ. 1974;1(5904):376-380.
33. Alvarado A. A practical score for the early diagnosis of acute appendicitis. Ann Emerg Med. 1986;15(5):557-564.
34. Bond GR, Tully SB, Chan LS, Bradley RL. Use of the MANTRELS score in childhood appendicitis: a prospective study of 187 children with abdominal pain. Ann Emerg Med. 1990;19(9):1014-1018.
35. Fenyo G. Routine use of a scoring system for decision-making in suspected acute appendicitis in adults. Acta Chir Scand. 1987;153(9):545-551.
36. Brazaitis MP, Dachman AH. The radiologic evaluation of acute abdominal pain of intestinal origin. Med Clin North Am. 1993;77(5):939-961.
37. Puylaert JBCM, Rutgers PH, Lalisang RI, et al. A prospective study of ultrasonography in the diagnosis of appendicitis. N Engl J Med. 1987;317(11):666-669.
38. John H, Neff U, Kelemen M. Appendicitis diagnosis today: clinical and ultrasonic deductions. World J Surg. 1993;17(2):243-249.
39. Davies AH, Mastorakou I, Cobb R, Rogers C, Lindell D, Mortenson NJM. Ultrasonography in the acute abdomen. Br J Surg. 1991;78(10):1178-1180.
40. Anteby SO, Schenker JG, Polishuk WZ. The value of laparoscopy in acute pelvic pain. Ann Surg. 1974;118(4):484-486.
41. Kang WM, Lee CH, Chou YH, et al. A clinical evaluation of ultrasonography in the diagnosis of acute appendicitis. Surgery. 1989;105(2 pt 1):154-159.
42. Taourel P, Baron MP, Pradel J, Fabre JM, Seneterre E, Bruel JM. Acute abdomen of unknown origin: impact of CT on diagnosis and management. Gastrointest Radiol. 1992;17(4):287-291.
43. Lim HK, Bae SH, Seo GS. Diagnosis of acute appendicitis in pregnant women: value of sonography. AJR Am J Roentgenol. 1992;159(3):539-542.
44. Sarfati MR, Hunter GC, Witzke DB, et al. Impact of adjunctive testing on the diagnosis and clinical course of patients with acute appendicitis. Am J Surg. 1993;166(6):664-665.
45. Whitworth CM, Whitworth PW, Sanfillipo J, Polk HCJ. Value of diagnostic laparoscopy in young women with possible appendicitis. Surg Gynecol Obstet. 1988;167(3):187-190.


46. Schroder DM, Lathrop JC, Lloyd LR, Coccaccio JE, Hawasli A. Laparoscopic appendectomy for acute appendicitis: is there really any benefit? Am Surg. 1993;59(8):541-547; discussion 547-548. 47. Kollias J, Harries RH, Otto G, Hamilton DW, Cox JS, Gallery RM. Laparoscopic versus open appendectomy for suspected appendicitis: a prospective study. Aust N Z J Surg. 1994;64(12):830-835. 48. Buschard K, Kjaeldfaard A. Investigation and analysis of the position, fixation, length, and embryology of the vermiform appendix. Acta Chir Scand. 1973;139(3):293-298. 49. Bjerke K, Brantzaeg P, Rognum TO. Distribution of immunoglobulin producing cells is different in normal human appendix and colon mucosa [abstract]. Gut. 1986;27(6):667. 50. Lin J, Bleiweiss U, Mendelson MH. Cytomegalovirus-associated appendicitis in a patient with the acquired immunodeficiency syndrome. Am J Med. 1990;89(3):377-379. 51. Lopez-Navidad A, Domingo P, Cadafalch J. Acute appendicitis complicating infectious mononucleosis: case report and review. Rev Infect Dis. 1990;12(2):297-302. 52. Nadler S, Cappell MS, Bhatt B. Appendiceal infection by Entamoeba histolytica and Strongyloides stercoralis presenting like acute appendicitis. Dig Dis Sci. 1990;35(5):603-608. 53. Hennington MH, Tinsley EJ, Proctor HJ. Acute appendicitis following blunt abdominal trauma: incidence or coincidence? Am Surg. 1991; 214(1):61-63. 54. Houghton A, Aston N. Appendicitis complicating colonoscopy. Gastrointest Endosc. 1988;34(6):489. 55. Barr D, van Heerden JA, Mucha PJ. The diagnostic challenge of postoperative acute appendicitis. World J Surg. 1991;15(4):526-528. 56. Schrock TR. Acute appendicitis. In: Sleisinger MH, Fortran JS, eds. Gastrointestinal Disease: Pathophysiology, Diagnosis, Management. Philadelphia, PA: WB Saunders Co; 1996:1339-1347. 57. Poole GV. Anatomic basis for delayed diagnosis of appendicitis [abstract]. South Med J. 1990;83(7):771-773. 58. Shen GK, Wong R, Daller J, et al. 
Does the retrocecal position of the vermiform appendix alter the clinical course of acute appendicitis? a prospective analysis [abstract]. Arch Surg. 1991;126(5):569-570. 59. Maxwell JM, Ragland JJ. Appendicitis: improvements in diagnosis and treatment. Am Surg. 1991;57(5):282-285. 60. Silen W. Cope’s Early Diagnosis of the Acute Abdomen. New York, NY: Oxford University Press Inc; 1987:1-290. 61. De Dombal FT. Diagnosis of Acute Abdominal Pain. New York, NY: Churchill Livingstone Inc; 1991:1-259. 62. Staniland JR, Ditchburn J, De Dombal ET. Clinical presentation of acute abdomen: study of 600 patients. BMJ. 1972;3(5823):393-398. 63. Brewer RJ, Golden OT, Hitch DC, Rudolf LE, Wangensteen SL. Abdominal pain: an analysis of 1000 consecutive cases in a university hospital emergency room. Am J Surg. 1976;131:219-223. 64. Nauta RI, Magnant C. Observation versus operation for abdominal pain in the right lower quadrant: roles of the clinical examination and the leukocyte count. Am J Surg. 1986;151(6):746-748. 65. Liddington MI, Thomson WHF. Rebound tenderness test. Br J Surg. 1991;78(7):795-796. 66. Izbicki JR, Wolfram TK, Dietmar KW, et al. Accurate diagnosis of acute appendicitis: a retrospective and prospective analysis of 686 patients. Eur J Surg. 1992;158(4):227-231. 67. Eskelinen M, Ikonen J, Lipponen P. Acute appendicitis in patients over the age of 65 years: comparison of clinical and computer based decision making. Int J Biomed Comput. 1994;36:239-249. 68. Eskelinen M, Ikonen J, Lipponen P. The value of history-taking, physical examination, and computer assistance in the diagnosis of acute appendicitis in patients more than 50 years old. Scand J Gastroenterol. 1995;30(4):349355. 69. Wilson DH, Wilson PD, Walmsley RG, Horrocks JC, De Dombal FT. Diagnosis of acute abdominal pain in the accident and emergency department. Br J Surg. 1977;64(4):250-254. 70. Nardone DA, Lucas LM, Palac DM. Physical examination: a revered skill under scrutiny. South Med J. 
1988;81(6):770-773. 71. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology: A Basic Science for Clinical Medicine. Boston, MA: Little Brown & Co; 1991:173-186. 72. Simel DL, Samsa GP, Matchar DB. Likelihood ratios with confidence: sample size estimation for diagnostic test studies. J Clin Epidemiol. 1991;44(8):763-770.


73. Christian F, Christian GP. A simple scoring system to reduce the negative appendicectomy rate. Ann R Coll Surg Engl. 1992;74(4):281-285. 74. Orient JM, Kettel LJ, Lim J. A test of a linear discriminant for identifying low-risk abdominal pain. Med Decis Making. 1985;5(1):77-87. 75. Ramirez JM, Deus J. Practical score to aid decision making in doubtful cases of appendicitis. Br J Surg. 1994;81(5):680-683. 76. Teicher I, Landa B, Cohen M, Kabnick LS, Wise L. Scoring system to aid in the diagnosis of appendicitis. Ann Surg. 1983;198(6):753-759. 77. Morgan DL, Hayes JE, Zenarosan N. Predicting which abdominal pain patients can be appropriately discharged home [abstract]. Ann Emerg Med. 1995;25:140.


78. Ohmann C, Yang Q, Franke C. Diagnostic scores for acute appendicitis. Eur J Surg. 1995;161(4):273-281. 79. Lavelle SM, Kanagaratnam B. The information value of clinical data. Int J Biomed Comput. 1990;26(3):203-209. 80. Todd BS, Stamper R. Limits to diagnostic accuracy. Med Inform. 1993;18 (3):255-270. 81. Avorn J, Everitt DE, Baker MW. The neglected medical history and therapeutic choices for abdominal pain: a nationwide study of 799 physicians and nurses. Arch Intern Med. 1991;151(4):694-698. 82. Johnson JE, Carpenter JL. Medical house staff performance in physical examination. Arch Intern Med. 1986;146(5):937-941. 83. Wiener S, Nathanson M. Physical examination: frequently observed errors. JAMA. 1976;236(7):852-855.

UPDATE: Appendicitis, Adult

5

Prepared by Jim Wagner, MD Reviewed by Kaveh Shojania, MD

CLINICAL SCENARIO

A 24-year-old woman presents with abdominal pain, nausea, and vomiting. She describes the pain as beginning in her midabdomen 3 days ago, and it has gotten progressively worse. Her last menstrual period was 3 weeks ago and was normal; she is not sexually active. The pain has stayed in the midabdomen and not moved to other locations. On examination, she has a fever and right lower quadrant (RLQ) and rebound tenderness; her pelvic and rectal examination results are unremarkable. Laboratory evaluation reveals ketonuria and a left shift without leukocytosis.

NEW FINDINGS

• Combinations of findings from the clinical examination are more powerful than any single finding.
• Most of the decision rules formed by these combinations of findings include migration of pain from periumbilical to RLQ, rebound tenderness, RLQ tenderness, nausea-vomiting, male sex, fever, rigidity, and white blood cell (WBC) count.

UPDATED SUMMARY ON ADULT APPENDICITIS Original Review Wagner JM, McKinney WP, Carpenter JL. Does this patient have appendicitis? JAMA. 1996;276(19):1589-1594.

UPDATED LITERATURE SEARCH Our literature search used the parent search for the Rational Clinical Examination series, combined with the subject headings “exp appendicitis” published between 1994 and September 2004. This search yielded more than 400 titles, which were narrowed down to approximately 50 by excluding studies of laboratory and radiologic tests and case studies. There have been few new studies that focused on the operating characteristics of individual components of the clinical examination for appendicitis. However, there have been several studies that have looked at combinations of findings. That is, instead of examining the likelihood ratio (LR) of rebound tenderness alone, studies have explored the combination of fever, migration of pain, and rebound tenderness. The studies of clinical decision rules were selected if the components, derivation, and validation of the prediction rule were clearly defined in the article and the patients included were those from a general population with abdominal pain or were suspected of having appendicitis. Our previous literature search was reviewed, and studies conducted before 1994 were included if they fit these criteria.

Details of the Update

Eighteen studies that derived or validated clinical decision rules for appendicitis were identified. The most important studies were those by Alvarado,1 Eskelinen et al,2 and Fenyo et al.3 These studies were chosen because of their methodology, large sample sizes, simplicity of the decision rule, or familiarity to physicians. In addition, a study that compared several clinical decision rules on the same population provided a good perspective on the relative value of these rules.4

The Alvarado1 study was one of the first of the clinical decision rules published, demonstrating the power of the rule beyond individual findings. Although the methods are rudimentary and the rule is not validated in the study, it represents the most widely accepted and the simplest of the clinical decision rules. By combining the results for 8 findings from the medical history or the examination (whose initials conveniently spell out the mnemonic MANTRELS), the resulting score provides guidance on whether to operate in the setting of suspected appendicitis. Of 10 potential points, patients with a score of 7 or higher are recommended for surgical intervention. The various components are Migration, Anorexia-acetone, Nausea-vomiting, Tenderness in RLQ, Rebound pain, Elevation of temperature, Leukocytosis, and Shift to the left of normal WBC count.

The Eskelinen et al2 study evaluated more than a thousand patients with a rule that includes 7 variables in men and 5 in women. The disadvantages of this study are that the rule is complex, computer based, and was validated with a small number of patients.

The Fenyo et al3 study assessed 10 variables used in a complex equation. The results for the individual findings showed that a WBC count of less than 8.9 × 10⁹/L (LR, 0.16) was the one finding with reasonable measurement properties, lowering the likelihood of appendicitis.
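The MANTRELS tally described above can be sketched in a few lines of Python. This is an illustration only, not a clinical tool. The point weights (2 points each for RLQ tenderness and leukocytosis, 1 point for each other finding) come from the published Alvarado score and are not stated in the text above; the function and key names are hypothetical, and treating ketonuria as satisfying the anorexia-acetone item is an assumption made for the example.

```python
# Weights follow the published Alvarado (MANTRELS) score; the chapter names the
# 8 findings and the 10-point total but does not list the individual weights.
ALVARADO_WEIGHTS = {
    "migration": 1,
    "anorexia_acetone": 1,
    "nausea_vomiting": 1,
    "tenderness_rlq": 2,
    "rebound": 1,
    "elevated_temperature": 1,
    "leukocytosis": 2,
    "shift_left": 1,
}


def alvarado_score(findings):
    """Sum the weights of the findings marked True (missing keys count as absent)."""
    unknown = set(findings) - set(ALVARADO_WEIGHTS)
    if unknown:
        raise ValueError(f"unrecognized findings: {sorted(unknown)}")
    return sum(
        weight
        for name, weight in ALVARADO_WEIGHTS.items()
        if findings.get(name, False)
    )


def meets_operative_threshold(score, threshold=7):
    """The text recommends surgical intervention at a score of 7 or higher."""
    return score >= threshold


# Hypothetical encoding of the clinical scenario: no migration, ketonuria,
# nausea-vomiting, RLQ and rebound tenderness, fever, left shift, no leukocytosis.
scenario = {
    "migration": False,
    "anorexia_acetone": True,  # assumption: ketonuria counts toward this item
    "nausea_vomiting": True,
    "tenderness_rlq": True,
    "rebound": True,
    "elevated_temperature": True,
    "leukocytosis": False,
    "shift_left": True,
}
scenario_score = alvarado_score(scenario)
print(scenario_score, meets_operative_threshold(scenario_score))
```

Keeping the weights in a single table makes it easy to check that they sum to the 10 potential points mentioned in the text.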


The Ohmann et al4 study displayed a parallel analysis of 10 available studies, including the 3 mentioned above. A database of 45 variables prospectively collected from 1254 consecutive patients on a standardized form was used to evaluate these studies. A surprising outcome of the study was that none of the rules produced sufficiently low rates (80 50-99

LR+ (95% CI)      LR– (95% CI)
5.5 (4.1-7.2)     0.48 (0.38-0.60)
3.2 (2.4-4.2)     0.31 (0.18-0.50)
1.6 (1.4-1.8)     0.61 (0.54-0.68)
6.0 (3.2-10)      0.48 (0.25-0.74)
8.6 (4.3-15)      0.24 (0.07-0.60)
4.2 (2.3-7.2)     0.55 (0.34-0.77)
3.0 (1.3-7.1)     0.49 (0.36-0.67)
6.0 (2.6-14)      0.45 (0.22-0.92)
4.0-10            Uncertain

Abbreviations: CI, confidence interval; LR, likelihood ratio; LR+, positive likelihood ratio; LR–, negative likelihood ratio. a All patients were symptomatic. After the arteriogram, 298 carotid arteries were considered “symptomatic” and 124 arteries were considered “asymptomatic.” Retrospectively, bruits were heard in 95 of 298 symptomatic arteries and 41 of 124 asymptomatic arteries. Data are not provided to allow calculation of separate sensitivity and specificity. b Mead et al,6 Hankey and Warlow,2 and Sauve et al.3 c Hill et al8 and de Virgilio et al.9

108

An interesting finding is revealed by looking at 2 of the studies cited in the original Rational Clinical Examination article. These studies2,3 included the most selective population of patients for whom the sensitivity and specificity were reported. The study by Hankey and Warlow2 included only those symptomatic patients who were suitable candidates for endarterectomy. The North American Symptomatic Carotid Endarterectomy Trial4 included only patients with carotid stenosis, and then only those who were randomized to endarterectomy versus medical therapy. These studies exhibit verification bias, which typically creates underestimates of the specificity and value of hearing a carotid bruit. Thus, it should not be surprising that these studies also have the lowest positive likelihood ratio (LR+) of those we reviewed (Table 9-3). On the other hand, this reappraisal confirms that the absence of a bruit does not have enough diagnostic power to rule out an important stenotic lesion in symptomatic patients.

Asymptomatic Patients

By asymptomatic, we mean asymptomatic for cerebrovascular disease. Two studies, one in preoperative cardiac patients and the other in peripheral vascular disease patients, allow us to calculate both the sensitivity and specificity of the carotid bruit. Although the studies used slightly different thresholds to characterize patients as having carotid stenosis, the predictive values for bruit are statistically similar among the asymptomatic studies (Table 9-4). However, we can also look at the predictive value for the presence of a bruit and determine whether it varies (Table 9-4), which is useful because the studies that allowed us to calculate sensitivity and specificity may not generalize to an age-matched general medical patient. The positive predictive value for symptomatic patients is approximately 50% and about half that (22%) for patients with no cerebrovascular symptoms. Because we know the predictive value, we can make inferences about the LR+. This follows from the equation:

Posterior odds = Prior odds × LR

From epidemiologic studies, the prevalence should range from approximately 0.5% for patients aged 50 years or older to approximately 10% for patients aged 80 years or older.5 These values establish a range of reasonable prior odds. The data from the positive predictive value studies allow us to develop a range for the posterior odds. We can then solve for the LR for both symptomatic patients (50% posterior probability) and asymptomatic patients (22% posterior probability) (Figure 9-2). The likelihood ratio for a bruit to predict significant carotid stenosis varies with the prior probability. Figure 9-2 shows that as the prior probability of stenosis increases (x-axis), the importance of a carotid bruit becomes less.
If your population of asymptomatic patients is recognizably similar to those who were included in the baseline summary estimate from Table 9-4, then you would use the asymptomatic probability line and see that across a reasonable range of prior probabilities (about 3%-8% on the x-axis) for carotid

CHAPTER 9

Carotid Bruit

Table 9-4 Results for the Positive Predictive Value for the Presence of a Bruit in Predicting Ipsilateral Carotid Stenosis

Study | Degree of Stenosis, % | Stenosis/All Patients | Positive Predictive Value, % (95% CI)
Sauve et al3 (symptomatic, part of endarterectomy trial) | 70-99 | 420/667 | 63 (60-68)
Mead et al6 (symptomatic, referred to neurologist) | 70-99 | 54/119 | 45 (36-55)
Hankey and Warlow2 (symptomatic, referred to neurologist for evaluation of endarterectomy) | 75-99 | 35/95 | 37 (28-47)
Hill et al8 (asymptomatic before cardiac surgery) | >80 | 7/23 | 30 (16-51)
Chambers and Norris10 (asymptomatic referred for bruit) | >75 | | 23 (19-26)
Lewis et al11 (asymptomatic referred for bruit) | 80-99 | | 21 (18-24)
Summary Predictive Value
Symptomatic (n = 3 studies, 868 patients)a | 70-99 | | 50 (35-64)
Asymptomatic (n = 3 studies, 1303 patients)b | >75 | | 22 (20-24)

Abbreviation: CI, confidence interval.
a Studies combined from Mead et al,6 Sauve et al,3 and Hankey and Warlow.2
b Studies combined from Hill et al,8 Chambers and Norris,10 and Lewis et al.11 The results are homogenous (P = .42).

stenosis, finding a bruit has a useful LR of 4 to 10 (from the y-axis). Fortunately, we can feel more confident about this because the results are similar to the summary LRs for asymptomatic patients from Table 9-3.
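The inference from predictive value to likelihood ratio can be illustrated with a short Python sketch that rearranges Posterior odds = Prior odds × LR. The function names are hypothetical, and the prior probabilities in the loop are example values chosen from the range discussed in the text; the results are approximate and are meant only to show how the implied LR falls as the prior probability rises.

```python
def probability_to_odds(probability):
    """Convert a probability strictly between 0 and 1 to odds."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return probability / (1 - probability)


def implied_likelihood_ratio(posterior_probability, prior_probability):
    """Solve Posterior odds = Prior odds x LR for the LR."""
    return probability_to_odds(posterior_probability) / probability_to_odds(
        prior_probability
    )


# Positive predictive values from the text: about 50% (symptomatic) and
# 22% (asymptomatic). The priors below are illustrative assumptions.
for prior in (0.03, 0.05, 0.08):
    asymptomatic_lr = implied_likelihood_ratio(0.22, prior)
    symptomatic_lr = implied_likelihood_ratio(0.50, prior)
    print(f"prior {prior:.0%}: asymptomatic LR {asymptomatic_lr:.1f}, "
          f"symptomatic LR {symptomatic_lr:.1f}")
```

Because the posterior odds are held fixed at the observed predictive value, a larger prior necessarily implies a smaller LR, which is the downward slope described for Figure 9-2.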

EVIDENCE FROM GUIDELINES Symptomatic patients with TIAs who are surgical candidates should be evaluated for carotid stenosis, whether or not they have a bruit.12 The US Preventive Services Task Force reviewed screening for asymptomatic carotid artery stenosis in 1996 and found insufficient evidence to make a recommendation about listening for carotid bruits.5 The Task Force observed that the annual incidence of stroke unheralded by any TIA symptoms ipsilateral to a bruit is 1% to 3%. The interpretation of data presented in this update was not available to the Task Force and has not been incorporated into the 1996 recommendations. There are still no data that assess the effect of screening for an asymptomatic bruit, confirming stenosis, and then performing an endarterectomy on patients with surgically significant lesions. The Canadian Task Force recommended that clinicians not listen for carotid bruits in asymptomatic patients.13 There does seem to be consensus that the presence of an asymptomatic bruit is a marker of atherosclerotic risk.


CLINICAL SCENARIO—RESOLUTION You listened for a bruit with the plan that you would emphasize risk-reduction strategies for your hypertensive patient, but now, she has asked you to use the findings to help decide whether to assess her for carotid stenosis. A variety of studies suggest that the LR+ for carotid stenosis when a bruit is heard is 4 to 10. Let us say you estimate that her prior probability of carotid stenosis is approximately 3%, which agrees with epidemiologic data. Finding a bruit increases her probability of carotid stenosis to approximately 11%, but it might be as high as 20%. Hence, you probably have identified a patient at higher risk of carotid stenosis. The issue is not whether you can identify stenosis with ultrasonography, but whether you should. Studies of diagnostic tests give you only the likelihood of the target disorder. You will need to review the natural history of patients with asymptomatic carotid stenosis to help this patient decide whether to pursue further testing.
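The arithmetic behind the resolution can be sketched as follows: convert the prior probability to odds, multiply by the LR, and convert back to a probability. The function name is hypothetical, and the inputs (a 3% prior and an LR of 4, the lower end of the quoted range) are taken from the scenario above.

```python
def posterior_probability(prior_probability, likelihood_ratio):
    """Apply Posterior odds = Prior odds x LR and return a probability."""
    if not 0 < prior_probability < 1:
        raise ValueError("prior probability must be strictly between 0 and 1")
    if likelihood_ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Scenario values from the text: prior of about 3%, LR+ at the low end of 4 to 10.
low_estimate = posterior_probability(0.03, 4)
print(f"{low_estimate:.1%}")
```

With these inputs the result is close to the approximately 11% quoted in the scenario; substituting other LRs from the 4 to 10 range shows how sensitive the estimate is to the LR chosen.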

Figure 9-2 Likelihood Ratio of Carotid Bruit as a Function of Symptoms and Prior Probability of Stenosis (y-axis: likelihood ratio, 0 to 50; x-axis: prior probability of stenosis, 0 to 10; separate curves for symptomatic and asymptomatic patients) The likelihood ratio of a carotid bruit in predicting carotid stenosis depends on whether the patient is symptomatic or asymptomatic and on the prior probability of carotid stenosis. However, for both groups of patients the positive likelihood ratio decreases in value as the prior probability of carotid stenosis increases.

REFERENCES FOR THE UPDATE

1. Blakeley DD, Oddone EZ, Hasselblad V, Simel D, Matchar DB. Noninvasive carotid artery testing: a meta-analytic review. Ann Intern Med. 1995;122(5):360-367.
2. Hankey GJ, Warlow CP. Symptomatic carotid ischaemic events: safest and most cost-effective way of selecting patients for angiography, before carotid endarterectomy. BMJ. 1990;300(6738):1485-1491.a
3. Sauve JS, Thorpe KE, Sackett DL, et al. Can bruits distinguish high-grade from moderate symptomatic carotid stenosis? Ann Intern Med. 1994;120(8):633-637.a


CAROTID STENOSIS—MAKE THE DIAGNOSIS It is hard for physicians to resist auscultating the neck. Perhaps no physical finding in adults causes as much confusion as the presence of the carotid bruit in asymptomatic patients. Most clinical research suggests that there is a clear benefit to carotid endarterectomy for patients with symptoms and a benefit (although likely small) for asymptomatic patients.

PRIOR PROBABILITY FOR CAROTID STENOSIS Symptomatic Patients Prior Probability

After ruling out patients for whom endarterectomy would not be considered, 10% to 30% will have surgically amenable carotid stenosis. There is variability in the estimates of the remaining patients who will prove to have surgically correctable carotid stenosis. The variability depends on the patient population, criteria for determining surgical risk, and the threshold for defining an “important” stenosis.

Asymptomatic Patients Prior Probability

For patients 60 years or older, there is a 1% to 10% probability of carotid stenosis. The prevalence of carotid stenosis increases from approximately 0.5% for patients 50 years of age to approximately 10% by age 90 years.14 For patients older than 65 years, 5% to 7% of women and 7% to 10% of men will have a carotid stenosis of 50% or higher. For more significant degrees of stenosis, 2 prospective, population-based samples show that 1% to 2.3% of women and 1% to 4.1% of men older than 60 years will have a stenosis of 75% to 99%.15,16

POPULATION FOR WHOM THE CAROTID BRUIT MIGHT BE AUSCULTATED • Patients with cerebrovascular symptoms compatible with a nondebilitating stroke or TIA • Older patients, as part of an assessment for cardiovascular risk

DETECTING THE LIKELIHOOD OF CAROTID STENOSIS The presence of a carotid bruit does increase the likelihood of an important stenotic lesion, but the absence of a bruit (especially in patients with atherosclerotic risk factors) does not rule out carotid stenosis (see Tables 9-5 and 9-6).

Table 9-5 Do Carotid Bruits Predict Stenosis in Symptomatic Patients?

Finding | LR for Carotid Stenosis, 70%-99% (95% CI)
Ipsilateral bruit | 3.0 (1.3-7.1)
No ipsilateral bruit | 0.49 (0.36-0.67)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

Table 9-6 Do Carotid Bruits Increase the Likelihood of Carotid Stenosis in Asymptomatic Patients?

Finding | LR for Carotid Stenosis, 70%-99%
Ipsilateral bruit | 4.0-10
No ipsilateral bruit | Uncertain

Abbreviation: LR, likelihood ratio.

REFERENCE STANDARD TESTS
• Carotid duplex ultrasonography
• Carotid Doppler ultrasonography
• Magnetic resonance angiogram

4. North American Symptomatic Carotid Endarterectomy Trial (NASCET) Steering Committee. North American Symptomatic Carotid Endarterectomy Trial. Stroke. 1991;22(6):711-720.
5. US Preventive Services Task Force. Screening for Asymptomatic Carotid Artery Stenosis: Guide to Clinical Preventive Services. 2nd ed. Baltimore, MD: Lippincott Williams & Wilkins; 1996:53-61.
6. Mead GE, Wardlaw JM, Lewis SC, McDowall M, Dennis MS. Can simple clinical features be used to identify patients with severe carotid stenosis on Doppler ultrasound? J Neurol Neurosurg Psychiatry. 1999;66(1):16-19.a
7. Magyar MT, Nam E, Csiba L, Ritter MA, Ringelstein EB, Droste DW. Carotid artery auscultation—anachronism or useful screening procedure? Neurol Res. 2002;24(7):705-708.a
8. Hill AB, Obrand D, Steinmeitz OK. The utility of selecting screening for carotid stenosis in cardiac surgery patients. J Cardiovasc Surg. 1999;40(6):829-836.a
9. de Virgilio C, Toosie K, Arnell T, et al. Asymptomatic carotid artery stenosis screening in patients with lower extremity atherosclerosis: a prospective study. Ann Vasc Surg. 1997;11(4):374-377.a
10. Chambers BR, Norris JW. Outcome in patients with asymptomatic neck bruits. N Engl J Med. 1986;315(14):860-865.a
11. Lewis R, Abrahamowicz M, Cote R, Battista R. Predictive power of duplex ultrasonography in asymptomatic carotid disease. Ann Intern Med. 1997;127(1):13-20.a
12. Albers GW, Hart RG, Lutsep HL, Newell DW, Sacco RL. AHA scientific statement: supplement to the guidelines for the management of transient ischemic attacks: a statement from the Ad Hoc Committee on Guidelines for the Management of Transient Ischemic Attacks, Stroke Council, American Heart Association. Stroke. 1999;30(11):2502-2511.
13. Mackey A, Cote R, Battista RN. Asymptomatic carotid disease. In: Canadian Task Force on the Periodic Health Examination, eds. Canadian Guide to Clinical Preventive Health Care. Ottawa, Ontario, Canada: Health Canada; 1994:692-704.
14. Goldstein LB, Adams R, Becker K, et al. Primary prevention of ischemic stroke. Circulation. 2001;103(1):163-182.
15. Willeit J, Kiechl S. Prevalence and risk factors of asymptomatic extracranial carotid artery atherosclerosis. Arterioscler Thromb. 1993;13(5):661-668.
16. O'Leary DH, Polak JF, Kronmal RA, et al. Distribution and correlates of sonographically detected carotid artery disease in the cardiovascular health study. Stroke. 1992;23(12):1752-1760.

a For the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.


EVIDENCE TO SUPPORT THE UPDATE: Carotid Bruit

9

TITLE Outcome in Patients With Asymptomatic Neck Bruits.

AUTHORS Chambers BR, Norris JW.

CITATION N Engl J Med. 1986;315(14):860-865.

QUESTION Does a bruit predict the presence or absence of carotid stenosis?

DESIGN Baseline data collected as part of a prospective cohort of patients enrolled in a study of asymptomatic neck bruits.

SETTING Single site, stroke unit in Toronto.

PATIENTS Among 659 patients referred for Doppler ultrasonography, 500 were asymptomatic and were enrolled in a prospective cohort. The patients include those in whom physicians might consider the presence of carotid stenosis. They had a mean age of 64 years, 74% were men, 58% had hypertension, 58% had heart disease, 57% had peripheral vascular disease, 13% had diabetes, 73% had smoking history or currently smoked, and 35% had hypercholesterolemia.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Patients were examined at enrollment. Carotid Doppler ultrasonography was performed without knowledge of the auscultatory findings. The ultrasonographers had demonstrated proficiency when their findings were compared with angiography.

MAIN OUTCOME MEASURE Positive predictive value at different degrees of stenosis.

MAIN RESULTS See Table 9-7.

Table 9-7 The Predictive Value of a Carotid Bruit for Identifying Various Levels of Carotid Stenosis

Stenosis, No. (Degree of Stenosis, %) | Positive Predictive Value of a Carotid Bruit (95% CI)
113 (>75) | 23 (19-26)
157 (30-74) | 31 (27-36)
230 (0-29) | 46 (42-50)

Abbreviation: CI, confidence interval.

CONCLUSIONS

LEVEL OF EVIDENCE Positive predictive value studies.

STRENGTHS Prospective with careful screening and confirmed proficiency of ultrasonographers.

LIMITATIONS The results generalize only to populations with the same prevalence of carotid stenosis among patients with carotid bruits. No patient who lacked a carotid bruit was included, so the sensitivity and specificity cannot be determined.

This study included a large cohort of asymptomatic patients, evaluated solely because they had a bruit. The cohort seems typical of a group of patients at risk for cerebrovascular or atherosclerotic disease. To apply these data to your own patients, you would need to know whether the study patients were similar to your patients because the predictive value is affected by the prevalence of disease.

Reviewed by David L. Simel, MD, MHS


TITLE Do Carotid Bruits Predict Disease of the Internal Carotid Arteries?

AUTHORS Davies KN, Humphrey PRD.

CITATION Postgrad Med J. 1994;70(824):433-435.

QUESTION Do bruits identify patients with carotid stenosis?

DESIGN Prospective, consecutive patients.

SETTING Single site, cerebrovascular clinic in the United Kingdom.

PATIENTS All patients were referred for evaluation. The underlying prevalence of cardiovascular risk factors is not described.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The presence of a bruit was taken from the referral note but was not confirmed at study entry. The history was confirmed in regard to symptoms.

MAIN OUTCOME MEASURE Carotid stenosis of 70% to 99%.

MAIN RESULTS See Table 9-8.

Table 9-8 Likelihood Ratio of a Carotid Bruit for Carotid Stenosis of at Least 70%

Test | Sensitivity | Specificity | LR+ (95% CI) | LR– (95% CI)
Bruit | 0.57 | 0.70 | 1.9 (1.4-2.6) | 0.61 (0.41-0.83)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

CONCLUSIONS

LEVEL OF EVIDENCE Level 4.

STRENGTHS Pragmatic study from the perspective of a vascular laboratory that would take the information from the referral note.

LIMITATIONS The presence of a bruit was not confirmed in a standardized manner. It is not stated whether the ultrasonography was done blinded to the clinical findings.

Although interesting from the perspective of clinicians in a vascular laboratory, the presence or absence of a bruit was not systematically confirmed by the study clinicians. The data were taken from the referral requests, which may not have been consistently thorough. Thus, it is likely that some patients recorded as not having a bruit may have actually had a cervical bruit and vice versa.

Reviewed by David L. Simel, MD, MHS

TITLE Asymptomatic Carotid Artery Stenosis Screening in Patients With Lower Extremity Atherosclerosis: A Prospective Study.

AUTHORS de Virgilio C, Toosie K, Arnell T, Lewis RJ, Donayre CE, Baker JD, Melany M, White RA.

CITATION Ann Vasc Surg. 1997;11(4):374-377.

QUESTION Does a bruit predict ipsilateral carotid stenosis among patients with peripheral vascular disease who have no cerebrovascular symptoms?

DESIGN Prospective.

SETTING Vascular surgery clinic, West Los Angeles Veterans Affairs medical center.

PATIENTS Men (n = 89) who were referred for surgical evaluation for peripheral vascular disease. Patients were excluded if they had any symptoms of cerebrovascular disease. Ninety percent of the patients had typical claudication, 88% were smokers, 60% had hypertension, and 42% had diabetes.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Auscultation of the carotids and a carotid duplex ultrasonography were performed on each carotid by a radiologist blinded to the clinical status of the patient.


MAIN OUTCOME MEASURE Presence of carotid stenosis greater than 50%. Data are presented for numbers of arteries imaged (n = 178).

MAIN RESULTS See Table 9-9. Of 89 patients, 18 had a bruit (in 14 of 18, the bruit was bilateral). Of 32 carotid arteries with bruits, 13 had a stenosis of at least 50%. This study used a threshold value different from those used by other studies on the sensitivity and specificity for a carotid bruit. However, traditionally we like to think of the screening test as having the same sensitivity and specificity independent of the prevalence. Likelihood ratios (LRs) for this study are similar to those among asymptomatic cardiac surgery patients.


Table 9-9 Likelihood Ratios for a Carotid Bruit to Predict a Carotid Stenosis of at Least 50%

Test | Sensitivity | Specificity | LR+ (95% CI) | LR– (95% CI)
Bruit | 0.52 | 0.88 | 4.2 (2.3-7.2) | 0.55 (0.34-0.77)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
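The likelihood ratios in Tables 9-8 and 9-9 follow from the usual definitions, LR+ = sensitivity / (1 − specificity) and LR– = (1 − sensitivity) / specificity. The Python sketch below, with a hypothetical function name, applies those definitions to the rounded sensitivities and specificities shown in the tables. Because the inputs are rounded, a computed value may differ slightly from the published one, which was presumably calculated from the raw counts.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded inputs as printed in Table 9-8 and Table 9-9.
for label, sens, spec in (("Table 9-8", 0.57, 0.70), ("Table 9-9", 0.52, 0.88)):
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{label}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
```

Note that the formulas involve only sensitivity and specificity, which is why the text treats the LRs as comparable across studies with different disease prevalence.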

CONCLUSIONS

LEVEL OF EVIDENCE Level 2.

STRENGTHS Prospective study.

LIMITATIONS Small sample size. The study used a lower carotid stenosis threshold (50%) than other studies for reporting the association with bruits. The number of arteries with a carotid stenosis of greater than 75% in this study was small (6.7%; 12 of 178).

This is a small but sound study. The population studied seems typical of male patients with claudication. It is not clear whether the patients were consecutive patients or just those for whom peripheral vascular surgery was considered. Nonetheless, we can derive some information about the predictive value in patients with claudication. By reporting the data at a lower threshold for defining disease (50% as opposed to 75%), there should be proportionally more patients with disease as opposed to “normal.” This would not necessarily affect the sensitivity and specificity if the importance of a bruit is independent of the prevalence of disease. In fact, traditionally Bayesian analysis predicts that the sensitivity and specificity will not change with the prevalence of disease. Despite using a different threshold for defining carotid stenosis, the LRs were almost identical to most of the studies using a 70% to 75% cut point. Unfortunately, we cannot combine these data with studies using a different cut point for assessing the predictive value.

Reviewed by David L. Simel, MD, MHS

TITLE Prospective Evaluation of Carotid Bruit as a Predictor of First Stroke in Type 2 Diabetes: The Fremantle Diabetes Study.

AUTHORS Gillett M, Davis WA, Jackson D, Bruce DG, Davis TME.

CITATION Stroke. 2003;34(9):2145-2151.

QUESTION Among patients with diabetes who are asymptomatic for cerebrovascular ischemia, does the presence of a carotid bruit identify those who will have stroke? DESIGN Prospective, observational study of the natural history of diabetes. Patients had a baseline assessment and then yearly follow-up (recruitment, 1993-1996; follow-up, until 2000) or until they had a qualifying event. The mean follow-up was 6.5 ± 2.2 years. SETTING Community based in Fremantle, Western Australia. PATIENTS Patients in a defined region of Australia were recruited from the community to participate in the Fremantle Diabetes Study. The current study includes 1181 patients from the registry who had no history of cerebrovascular disease at recruitment into the study. Fifty-three patients had bruits compared with 1128 patients without bruits.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The presence of a carotid bruit was assessed at entry into the observational study. The absence of preexisting cerebrovascular disease was inferred from the lack of patient symptoms or history of an event. At annual follow-up, qualifying events were determined from patients’ self-reported strokes or transient ischemic attack (TIA) symptoms, or from a neurologic examination. Details of admissions for stroke or death were reviewed. It is not clear whether the assessment of a qualifying event was made with the knowledge of a baseline bruit. Deaths were reviewed without knowledge of carotid bruit status.

MAIN OUTCOME MEASURE TIA or stroke.

MAIN RESULTS See Table 9-10. Eighteen patients with bruits had strokes (18 of 53; 34%) vs 116 strokes in patients without bruits at entry (116 of 1128; 10%). Of the 18 patients with bruits and stroke, complete clinical data were available for 10 patients and revealed that 9 of 10 patients had a stroke ipsilateral to the bruit.


Table 9-10 Likelihood Ratios That a Bruit Predicts a Subsequent Stroke

Test   Outcome                                              LR+ (95% CI)   LR– (95% CI)
Bruit  Stroke in the first 2 y after entry into the study   6.6 (3.6-12)   0.78 (0.64-0.89)
Bruit  Stroke from entry to end of study                    4.0 (2.3-6.8)  0.90 (0.82-0.95)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
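The likelihood ratios for stroke from entry to the end of the study can be reproduced from the counts given in the main results (18 of 53 patients with bruits and 116 of 1128 patients without bruits had strokes). The Python sketch below is illustrative only. The function name is hypothetical, and the log-method confidence interval is an assumption, because the source does not state which interval method was used; its bounds may therefore differ slightly from the published ones.

```python
import math


def likelihood_ratios(tp, fp, fn, tn, z=1.96):
    """LR+ and LR- with approximate 95% CIs from 2x2 counts.

    The CI uses the log method (an assumption; the source does not
    say how its intervals were computed).
    """
    diseased = tp + fn  # patients who had the outcome (stroke)
    healthy = fp + tn   # patients who did not have the outcome
    sens = tp / diseased
    spec = tn / healthy
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    # Standard errors of log(LR+) and log(LR-)
    se_pos = math.sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / healthy)
    se_neg = math.sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / healthy)

    def interval(lr, se):
        return (lr * math.exp(-z * se), lr * math.exp(z * se))

    return {
        "lr_pos": lr_pos,
        "lr_pos_ci": interval(lr_pos, se_pos),
        "lr_neg": lr_neg,
        "lr_neg_ci": interval(lr_neg, se_neg),
    }


# Bruit and stroke: 18; bruit, no stroke: 53 - 18;
# no bruit, stroke: 116; no bruit, no stroke: 1128 - 116.
result = likelihood_ratios(tp=18, fp=53 - 18, fn=116, tn=1128 - 116)
print(round(result["lr_pos"], 1), round(result["lr_neg"], 2))
```

The point estimates round to an LR+ of 4.0 and an LR– of 0.90, in agreement with the second row of Table 9-10.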

The patients with bruits were older on entry into the study compared with those without bruits (mean age, 71 vs 63 years; P < .001), had a longer history of diabetes (5.0 vs 3.8 years; P = .009), had a higher blood pressure (mean systolic, 164 vs 149 mm Hg; P < .001) that more frequently led to blood pressure treatment (76% vs 47%; P < .001), and had less adiposity (waist circumference, 96 vs 100 cm; P = .004). At entry, there was a low frequency of aspirin therapy (26% of those without bruits vs 19% of those without bruits). Of the 4.9% of patients with atrial fibrillation, only 17% without bruits were taking warfarin, whereas none of the patients with bruits were taking warfarin (P > .99). During follow-up, 25 patients underwent carotid endarterectomy; all but 3 had qualifying endpoint symptoms.

On proportional hazards modeling, there was a difference in the effect of risk factors for the first 2 years of enrollment compared with the duration of the study. From baseline to year 2, the important risk factors for a stroke were the presence of a carotid bruit (hazard ratio [HR], 6.1; 95% confidence interval [CI], 3.1-12), age (HR, 1.5 for each 10-year increase), and diastolic blood pressure (HR, 1.4 for each 1-mm Hg increase). However, after 2 years, the influence of a carotid bruit at baseline lost statistical significance.

CONCLUSIONS LEVEL OF EVIDENCE Level 4. STRENGTHS Community-based study of patients who are asymptomatic for cerebrovascular disease but who have a risk factor for atherosclerotic disease (diabetes). The prevalence of bruits in these asymptomatic patients with diabetes (4.5%) is approximately what we would expect in a general, community population. LIMITATIONS The assessment of previous outcomes at baseline or during follow-up (stroke or TIA) relied on patient self-report or the follow-up examination. Thus, not all patients with events were hospitalized or examined when they had their TIA or stroke. The clinicians would have been aware that the patients had bruits (or not) when assessing outcomes.

Using a diagnostic test to establish prognosis can lead to errors when the prognosis depends on whether there were interventions. In this particular study, there may not have been large differences in interventions between the 2 groups even though there was no standardized approach to care. Some statisticians would take the opportunity to do a propensity analysis to sort this out further and determine whether a bruit was associated with any treatments. Given these caveats, can we use these data? The notion that the carotid bruit may lose “importance” over time does make sense but needs to be confirmed in other studies and in patients with different atherosclerotic risk factors. An alternative explanation may be that the stroke risk was higher early in the study because the patients were not at currently recommended levels of systolic blood pressure control. Obviously, these data could apply only to patients with diabetes who already have other risk factors for stroke and atherosclerotic disease. What they seem to suggest is that carotid bruits, at the least, are important “by the company they keep.”

Acknowledgment

Timothy M. E. Davis, FRACP, graciously provided the raw data for the event rates during the first 2 years after patient enrollment.

Reviewed by David L. Simel, MD, MHS

TITLE Symptomatic Carotid Ischaemic Events: Safest and Most Cost Effective Way of Selecting Patients for Angiography Before Carotid Endarterectomy. AUTHORS Hankey GJ, Warlow CP. CITATION BMJ. 1990;300(6738):1485-1491. QUESTION Among patients considered for endarterectomy after a symptomatic cerebrovascular event, does a carotid bruit predict those who will have carotid stenosis? DESIGN Consecutive patients under evaluation for a carotid endarterectomy who were referred to a neurologist. SETTING Single site, Western General Hospital in Edinburgh, Scotland. PATIENTS Four hundred eighty-five consecutive patients were referred for evaluation. Because a decision was made not to pursue possible endarterectomy, 189 patients were excluded, leaving 296 patients for analysis. Of the 296 patients, 32% had a bruit, and 70% were men with a mean age of 61 years. The excluded patients also had a prevalence of 32% bruits, and 60% were men with a mean age of 70 years. The investigators state that the decision to pursue possible surgery was independent of the presence of a bruit.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Each patient was clinically evaluated by the neurologist. The reference standard was carotid arteriography.

MAIN OUTCOME MEASURE Carotid stenosis of 75% to 99%.

MAIN RESULTS See Table 9-11.

Table 9-11 Likelihood Ratios of Bruit for Carotid Stenosis of at Least 75%

Test               Sensitivity  Specificity  LR+ (95% CI)   LR– (95% CI)
Ipsilateral bruit  0.76         0.76         3.2 (2.4-4.2)  0.31 (0.18-0.50)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

CONCLUSIONS LEVEL OF EVIDENCE Level 3. STRENGTHS Carotid arteriogram was the reference standard test. LIMITATIONS The study population includes only patients for whom surgery was considered an option. It is unclear whether the presence of a bruit affected the decision to pursue ultrasonography, but the proportion of patients with bruits was the same between included and excluded groups.

This population of patients is most similar to that reported from the North American Symptomatic Carotid Endarterectomy Trial (NASCET).1,2 However, the NASCET report on bruits included only patients who were randomized to endarterectomy instead of medical treatment. The study reviewed here includes patients a step before that. Thus, it is less selective because it included patients for whom surgery was being considered rather than only those for whom endarterectomy was planned. The study was affected by verification bias. However, the percentage of patients with bruits was identical (32%) among the patients who were included and those who were excluded. If the authors are correct that the presence of a bruit did not affect the decision to use arteriography, then the effect of verification bias is negligible. The data also allow us to calculate the predictive value of a bruit for different threshold levels for stenosis.

REFERENCES FOR THE EVIDENCE
1. Sauve JS, Thorpe KE, Sackett DL, et al. Can bruits distinguish high-grade from moderate symptomatic carotid stenosis? Ann Intern Med. 1994;120(8):633-637.
2. North American Symptomatic Carotid Endarterectomy Trial (NASCET) Steering Committee. North American Symptomatic Carotid Endarterectomy Trial. Stroke. 1991;22(6):711-720.

Reviewed by David L. Simel, MD, MHS

TITLE The Utility of Selective Screening for Carotid Stenosis in Cardiac Surgery Patients. AUTHORS Hill AB, Obrand D, Steinmetz OK. CITATION J Cardiovasc Surg. 1999;40(6):829-836. QUESTION Among patients scheduled for cardiac surgery, does a carotid bruit identify those with carotid stenosis? DESIGN Prospective, consecutive patients. SETTING Single site, McGill University, Montreal.

PATIENTS Two hundred consecutive patients who were scheduled for elective cardiac surgery (196 for coronary bypass grafting). Most of the patients were asymptomatic for carotid artery disease (n = 186). The data are given so that the results for patients with asymptomatic carotid bruits (n = 23) can be extracted. The distribution of patient characteristics suggests that they were typical of those undergoing coronary bypass grafting. Fifty percent were older than 65 years; half of all patients were smokers, 22% had diabetes mellitus, 31% had hyperlipidemia, and 20% had peripheral vascular disease.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The patients were examined before the carotid ultrasonography. The ultrasonography was done by vascular technicians who had proved their proficiency compared with angiography.

MAIN OUTCOME MEASURES Carotid stenosis of 80% or more by duplex ultrasonography. All patients with a positive duplex result also had arteriography.

MAIN RESULTS See Table 9-12. In a logistic model with many clinical variables, the neurologic history (odds ratio [OR], 14; 95% confidence interval [CI], 2.9-73) and a carotid bruit (OR, 28; 95% CI, 6.6-123) were the only variables that were important.

Table 9-12 Likelihood Ratio for a Carotid Bruit to Predict Stenosis of at Least 80%

Test                        Sensitivity  Specificity  LR+ (95% CI)  LR– (95% CI)
Asymptomatic carotid bruit  0.78         0.91         8.6 (4.3-15)  0.24 (0.07-0.60)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
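The positive predictive value of 30% that this study reports for an asymptomatic bruit follows from the 4.8% prevalence of stenosis among asymptomatic patients and the LR+ of 8.6 in Table 9-12. The short Python sketch below shows the usual odds-based calculation; the function name is hypothetical and the code is an illustration, not the reviewers' own computation.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability with an LR."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


prevalence = 0.048  # stenosis >80% among neurologically asymptomatic patients

# Probability of stenosis when a bruit is present (LR+ = 8.6)
ppv = post_test_probability(prevalence, 8.6)

# Probability of stenosis when no bruit is heard (LR- = 0.24)
probability_without_bruit = post_test_probability(prevalence, 0.24)

print(f"{ppv:.0%}", f"{probability_without_bruit:.1%}")
```

With these inputs the probability of stenosis rises from 4.8% to about 30% when a bruit is heard and falls to roughly 1% when it is absent.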

CONCLUSIONS LEVEL OF EVIDENCE Level 2. STRENGTHS Prospective consecutive enrollment of patients, primarily those asymptomatic for carotid artery disease. Although the study patients were all scheduled for cardiac surgery, the population included patients for whom carotid artery stenosis might be considered. It is one of the few studies that contain specificity data for a population of patients who are asymptomatic for cerebrovascular disease. A logistic regression was done to determine whether carotid bruits were important after controlling for other clinical variables. LIMITATIONS Small sample size.

Despite the small sample size compared with studies of symptomatic patients, this is an important study. The prevalence of carotid disease (defined as >80%) was 4.8% for individuals who were asymptomatic for neurologic symptoms vs 36% for those with symptoms. The positive predictive value for finding an asymptomatic bruit was 30%. The prevalence of carotid stenosis in this study is approximately what could be expected for an age-matched population of patients with atherosclerotic disease. Although more studies with specificity data for the bruit in asymptomatic patients are needed, these results may generalize to those with atherosclerotic disease.

Reviewed by David L. Simel, MD, MHS

TITLE Predictive Power of Duplex Ultrasonography in Asymptomatic Carotid Disease. AUTHORS Lewis R, Abrahamowicz M, Côté R, Battista RN. CITATION Ann Intern Med. 1997;127(1):13-20. QUESTION What is the prevalence of carotid stenosis in a large cohort of asymptomatic patients? DESIGN Prospective natural history study and randomized trial of aspirin vs placebo, begun in 1988.1 SETTING Multicenter. PATIENTS General practitioners and specialists referred patients from community and teaching hospital settings for evaluation of carotid stenosis. Patients were excluded if they had cerebrovascular symptoms, valvular heart disease, recent myocardial infarction, and a variety of other conditions that would have affected outcomes in the randomized trial. Seven hundred fourteen patients were enrolled, with the focus of this review being only the baseline evaluation. The patient population showed a typical prevalence of patients with atherosclerotic risk: mean age, 65 years; hypertension, 47%; heart disease, 39%; hyperlipidemia, 50%; diabetes, 20%; and current smokers, 35%.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD A neurologist evaluated all patients to confirm a bruit. Ultrasonography was performed, although obviously the radiologist was aware of the presence of a bruit.

MAIN OUTCOME MEASURE The predictive value of a carotid bruit for identifying various levels of carotid stenosis.

MAIN RESULTS See Table 9-13.

Table 9-13 Predictive Value of a Carotid Bruit for Identifying Various Levels of Carotid Stenosis

Stenosis, No. (Degree of Stenosis, %)   Positive Predictive Value of a Bruit, % (95% CI)
37 (100)                                5 (4-7)
113 (80-99)                             16 (13-19)
207 (50-79)                             29 (26-32)
113 (16-49)                             16 (13-19)
180 (1-15)                              25 (22-29)
64 (Normal)                             9 (7-11)

Abbreviation: CI, confidence interval.

CONCLUSIONS LEVEL OF EVIDENCE Positive predictive value study. STRENGTHS A typical population of patients referred for ultrasonography. However, what makes this study unique is that all of the patients were asymptomatic for cerebrovascular disease. The ultrasonographers validated their proficiency. LIMITATIONS The results generalize only to populations with the same prevalence of carotid stenosis among patients with carotid bruits. No patient who lacked a carotid bruit was included, so the sensitivity and specificity cannot be determined.

The study population and trial design are similar to those of an earlier study.2 Furthermore, the patients in the 2 studies are similar in terms of their risk factors for atherosclerotic disease, which is important because the positive predictive value of a test depends on the prevalence of disease. The 2 studies had almost identical positive predictive values for carotid stenosis (21% in this study for stenosis ≥80% vs 23% in the earlier study that used a cut point of 75%).

REFERENCES FOR THE EVIDENCE
1. Asymptomatic Cervical Bruit Study Group. Natural history and effectiveness of aspirin in asymptomatic patients with cervical bruits. Arch Neurol. 1991;48(7):683-686.
2. Chambers BR, Norris JW. Outcome in patients with asymptomatic neck bruits. N Engl J Med. 1986;315(14):860-865.

Reviewed by David L. Simel, MD, MHS
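Because every one of the 714 enrolled patients had a bruit, each positive predictive value in Table 9-13 is simply the share of patients falling in that stenosis category. The Python sketch below recomputes those shares and the combined value for stenosis of 80% or more; the variable names are illustrative and the confidence intervals are not recomputed.

```python
# Number of patients (all with bruits) in each stenosis category, Table 9-13
counts_by_stenosis = {
    "100": 37,
    "80-99": 113,
    "50-79": 207,
    "16-49": 113,
    "1-15": 180,
    "Normal": 64,
}

total = sum(counts_by_stenosis.values())

# Positive predictive value (%) of a bruit for each category
ppv_percent = {
    category: 100 * n / total for category, n in counts_by_stenosis.items()
}

# Combined predictive value for stenosis of at least 80% (80-99% plus occlusion)
ppv_at_least_80 = 100 * (counts_by_stenosis["100"] + counts_by_stenosis["80-99"]) / total

for category, value in ppv_percent.items():
    print(f"{category:>6}: {value:.0f}%")
print(f"  >=80: {ppv_at_least_80:.0f}%")
```

The categories sum to the 714 enrolled patients, and the combined value for stenosis of 80% or more comes to 21%, the figure quoted in the conclusions.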

TITLE Carotid Artery Auscultation—Anachronism or Useful Screening Procedure? AUTHORS Magyar MT, Nam E, Csiba L, Ritter MA, Ringelstein EB, Droste DW. CITATION Neurol Res. 2002;24(7):705-708. QUESTION Among patients referred for carotid ultrasonographic studies, does the presence of a bruit predict carotid stenosis of 70% to 99%? DESIGN Prospective, consecutive patients referred for ultrasonography. SETTING Single site. Inpatients and outpatients of a neurology department at a university hospital (Germany) who were referred for carotid ultrasonography. PATIENTS A total of 145 patients, of whom 43% had no history of cerebrovascular event (“asymptomatic”). The sample reflects a referred population of patients at risk for atherosclerotic vascular disease (hypertension, 43%; hyperlipidemia, 35%; smokers, 24%; angina, 19%; previous myocardial infarction, 18%; claudication, 12%; and diabetes, 12%), although other patients were referred for lower-risk conditions (vertigo, dizziness, and psychosomatic symptoms). A total of 273 carotid arteries were evaluated.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD A single physician blinded to the patient’s medical history and the ultrasonographic results conducted the carotid auscultation. A different physician performed the carotid ultrasonography.

MAIN OUTCOME MEASURE Carotid stenosis of 70% to 99%.

MAIN RESULTS See Table 9-14.

Table 9-14 Likelihood Ratios of Bruit for Carotid Stenosis of at Least 70%a

Test   Sensitivity  Specificity  LR+ (95% CI)  LR– (95% CI)
Bruit  0.56         0.91         6.0 (3.2-10)  0.48 (0.25-0.74)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
aData are not broken out for symptomatic vs asymptomatic patients.

CONCLUSIONS LEVEL OF EVIDENCE Level 3. STRENGTHS Includes a mixture of patients with and without cerebrovascular symptoms. Auscultation was done without knowledge of ultrasonographic results. LIMITATIONS Relatively small sample size, referred population.

The study enrolled consecutive referred patients and includes a population of patients with and without symptoms. With patients at various risk levels of cerebrovascular disease, the results ought to overlap with other populations of asymptomatic patients and symptomatic patients; the confidence intervals for the likelihood ratios are similar to those of most other studies.

Reviewed by David L. Simel, MD, MHS

TITLE Can Simple Clinical Features Be Used to Identify Patients With Severe Carotid Stenosis on Doppler Ultrasound? AUTHORS Mead GE, Wardlaw JM, Lewis SC, McDowall M, Dennis MS. CITATION J Neurol Neurosurg Psychiatry. 1999;66(1):16-19. QUESTION Do carotid bruits, or a combination of clinical findings, predict the presence of significant carotid stenosis in symptomatic patients? DESIGN Prospective. SETTING British hospital and neurovascular clinic. PATIENTS A total of 726 patients with an acute stroke, transient ischemic attack, or retinal stroke entered into the Lothian Stroke Registry. All patients had ultrasonography, independent of whether or not a bruit was detected.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD All patients were examined by a stroke physician or research registrar. Carotid Doppler ultrasonography was performed by one of 2 neuroradiologists who had excellent agreement with a subset of patients referred to angiography (κ = 0.7-0.8). The ultrasonographers were blinded to the clinical data.

Table 9-15 Likelihood Ratios of Bruit for Carotid Stenosis of at Least 70%

Test                          Sensitivity  Specificity  LR+ (95% CI)   LR– (95% CI)
Ipsilateral bruit             0.56         0.90         5.5 (4.1-7.2)  0.48 (0.38-0.60)
Peripheral vascular diseasea  0.28         0.84         1.7 (1.2-2.4)  0.86 (0.74-0.96)
Diabetes                      0.16         0.91         1.7 (1.0-2.8)  0.93 (0.83-1.0)

Combination of Findings (Ipsilateral Bruit, Diabetes, Previous TIA, Not a Lacunar Event)   LR (95% CI)
≥2 Findings                   4.8 (3.5-6.5)
0-1 Finding                   0.57 (0.46-0.69)

MAIN OUTCOME MEASURES Stenosis of 70% to 99% by ultrasonography vs a nonsurgical stenosis (

Table 10-2 Diagnostic Accuracy of History and Physical Examination for Carpal Tunnel Syndrome

Findings by Reference and Year            No. of Hands  Sensitivity  Specificity  LR+ (95% CI)    LR– (95% CI)

[label missing]                           145           0.64         0.73         2.4 (1.6-3.5)   0.5 (0.3-0.7)
[preceding text missing] 40 y
  Katz et al,32 1990                      110a          0.80         0.41         1.3 (1.0-1.7)   0.5 (0.3-1.0)
Nocturnal Paresthesia
  Buch-Jaeger and Foucher,31 1994         112a          0.51         0.68         1.6 (1.0-2.6)   0.7 (0.5-1.0)
  Gupta and Benstead,62 1997              92            0.84         0.33         1.2 (1.0-1.6)   0.5 (0.2-1.1)
  Katz et al,32 1990                      110           0.77         0.27         1.1 (0.9-1.3)   0.8 (0.4-1.6)
  Pooled results                          …b            …            …            1.2 (1.0-1.4)   0.7 (0.5-0.9)
Bilateral Symptoms
  Katz et al,32 1990                      110a          0.61         0.58         1.4 (1.0-2.1)   0.7 (0.4-1.0)

Motor Examination
Weak Thumb Abduction
  Gerr et al,33 1995                      115           0.63         0.62         1.7 (1.1-2.4)   0.6 (0.4-0.9)
  Kuhlman and Hennessey,30 1997           228           0.66         0.66         2.0 (1.4-2.7)   0.5 (0.4-0.7)
  Pooled results                          …             …            …            1.8 (1.4-2.3)   0.5 (0.4-0.7)
Thenar Atrophy
  Gerr et al,33 1995                      115           0.28         0.82         1.6 (0.8-3.2)   0.9 (0.7-1.1)
  Golding et al,64 1986                   110           0.04         0.99         5.4 (0.2-130)   1.0 (0.9-1.0)
  Katz et al,32 1990                      110a          0.14         0.90         1.5 (0.5-4.1)   0.9 (0.8-1.1)
  Pooled results                          …             …            …            1.6 (0.9-2.8)   1.0 (0.9-1.0)

Sensory Examination
Hypalgesia
  Golding et al,64 1986                   110           0.15         0.93         2.2 (0.7-6.7)   0.9 (0.8-1.1)
  Kuhlman and Hennessey,30 1997           228           0.51         0.85         3.4 (2.0-5.8)   0.6 (0.5-0.7)
  Pooled results                          …             …            …            3.1 (2.0-5.1)   0.7 (0.5-1.1)
2-Point Discrimination
  Buch-Jaeger and Foucher,31 1994, 6 mm   167           0.06         0.99         4.5 (0.6-37)    1.0 (0.9-1.0)
  Gerr et al,33 1995, 5 mm                115           0.28         0.64         0.8 (0.5-1.3)   1.1 (0.9-1.5)
  Katz et al,32 1990, 4 mm                110a          0.32         0.80         1.6 (0.8-3.1)   0.8 (0.7-1.1)
  Pooled results                          …             …            …            1.3 (0.6-2.7)   1.0 (0.9-1.1)
Abnormal Vibration
  Buch-Jaeger and Foucher,31 1994         172           0.20         0.81         1.1 (0.6-2.0)   1.0 (0.8-1.1)
  Gerr et al,33 1995                      115           0.61         0.71         2.1 (1.3-3.3)   0.5 (0.4-0.8)
  Pooled results                          …             …            …            1.6 (0.8-3.0)   0.8 (0.4-1.3)
Abnormal Monofilament Findings
  Buch-Jaeger and Foucher,31 1994         167           0.59         0.59         1.5 (1.1-2.0)   0.7 (0.5-0.9)

Other Tests
Square Wrist Sign
  Kuhlman and Hennessey,30 1997           228           0.69         0.73         2.6 (1.8-3.7)   0.4 (0.3-0.6)
  Radecki,27 1994                         665           0.47         0.83         2.8 (2.1-3.8)   0.6 (0.6-0.7)
  Pooled results                          …             …            …            2.7 (2.2-3.4)   0.5 (0.4-0.8)
Closed Fist Sign
  De Smet et al,28 1995                   35            0.61         0.92         7.3 (1.1-49)    0.4 (0.2-0.7)
Flick Sign
  Pryse-Phillips,29 1984                  396           0.93         0.96         21 (11-42)      0.1 (0-0.1)

(Continued)


Table 10-2 Diagnostic Accuracy of History and Physical Examination for Carpal Tunnel Syndrome (Continued)

Findings by Reference and Year            No. of Hands  Sensitivity  Specificity  LR+ (95% CI)    LR– (95% CI)

Other Tests
Tinel Sign
  Gerr et al,33 1995                      115           0.25         0.67         0.7 (0.4-1.3)   1.1 (0.9-1.4)
  Golding et al,64 1986                   110           0.26         0.80         1.3 (0.6-2.6)   0.9 (0.7-1.2)
  Heller et al,65 1986                    80            0.60         0.77         2.7 (1.2-5.9)   0.5 (0.3-0.8)
  Katz et al,32 1990                      110a          0.59         0.67         1.8 (1.2-2.7)   0.6 (0.4-0.9)
  Kuhlman and Hennessey,30 1997           228           0.23         0.87         1.8 (1.0-3.4)   0.9 (0.8-1.0)
  Buch-Jaeger and Foucher,31 1994         172           0.42         0.64         1.1 (0.8-1.7)   0.9 (0.7-1.2)
  Pooled results                          …             …            …            1.4 (1.0-1.9)   0.8 (0.7-1.0)
Phalen Sign
  Buch-Jaeger and Foucher,31 1994         166           0.58         0.54         1.3 (0.9-1.7)   0.8 (0.6-1.1)
  Gerr et al,33 1995                      115           0.75         0.33         1.1 (0.9-1.4)   0.7 (0.4-1.3)
  Heller et al,65 1986                    80            0.67         0.59         1.6 (1.0-2.8)   0.6 (0.3-0.9)
  Katz et al,32 1990                      110a          0.75         0.47         1.4 (1.1-1.9)   0.5 (0.3-0.9)
  Kuhlman and Hennessey,30 1997           228           0.51         0.76         2.1 (1.4-3.2)   0.6 (0.5-0.8)
  Golding et al,64 1986                   110           0.10         0.86         0.7 (0.2-2.2)   1.0 (0.9-1.2)
  Burke et al,66 1999                     200           0.51         0.54         1.1 (0.7-1.8)   0.9 (0.6-1.3)
  De Smet et al,28 1995                   66            0.91         0.33         1.4 (0.9-2.0)   0.3 (0.1-0.9)
  Pooled results                          …             …            …            1.3 (1.1-1.6)   0.7 (0.6-0.9)
Pressure Provocation Test
  Kuhlman and Hennessey,30 1997           228           0.28         0.74         1.1 (0.7-1.7)   1.0 (0.8-1.1)
  Burke et al,66 1999                     205           0.52         0.38         0.8 (0.6-1.2)   1.3 (0.7-2.2)
  Buch-Jaeger and Foucher,31 1994         155           0.49         0.54         1.1 (0.8-1.5)   0.9 (0.7-1.3)
  De Smet et al,28 1995                   66            0.63         0.33         0.9 (0.6-1.5)   1.1 (0.5-2.7)
  Pooled results                          …             …            …            1.0 (0.8-1.3)   1.0 (0.9-1.1)
Tourniquet Test
  Buch-Jaeger and Foucher,31 1994         145           0.52         0.36         0.8 (0.6-1.1)   1.3 (0.9-2.0)
  Golding et al,64 1986                   110           0.21         0.87         1.6 (0.7-3.9)   0.9 (0.8-1.1)
  Pooled results                          …             …            …            1.0 (0.5-1.9)   1.0 (0.7-1.5)

Abbreviations: CI, confidence interval; LR, likelihood ratio. A positive LR (LR+) indicates a positive finding for carpal tunnel syndrome; a negative LR (LR–) indicates either a negative finding or an absent finding. a Refers to individual subjects instead of individual hands. b Ellipses indicate not applicable.
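Each study row in Table 10-2 can be checked against the usual definitions, LR+ = sensitivity / (1 – specificity) and LR– = (1 – sensitivity) / specificity. The Python sketch below applies them to three rows of the table. It is an illustration with a hypothetical function name, and because the published sensitivities and specificities are rounded to 2 decimals, recomputed values can differ slightly from the printed ratios.

```python
def lr_from_sens_spec(sensitivity, specificity):
    """Return (LR+, LR-) from a sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# (sensitivity, specificity, published LR+, published LR-) from Table 10-2
rows = {
    "Tinel sign, Katz et al 1990": (0.59, 0.67, 1.8, 0.6),
    "Phalen sign, Kuhlman and Hennessey 1997": (0.51, 0.76, 2.1, 0.6),
    "Tourniquet test, Golding et al 1986": (0.21, 0.87, 1.6, 0.9),
}

recomputed = {}
for name, (sens, spec, published_pos, published_neg) in rows.items():
    lr_pos, lr_neg = lr_from_sens_spec(sens, spec)
    recomputed[name] = (lr_pos, lr_neg)
    print(f"{name}: LR+ {lr_pos:.2f} (published {published_pos}), "
          f"LR- {lr_neg:.2f} (published {published_neg})")
```

For these rows the recomputed ratios agree with the published ones to within rounding. Rows with very high specificity are more sensitive to rounding because the denominator of LR+ becomes small.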

that indicated in Table 10-2.84,85 Therefore, before any of these 3 findings can be recommended, further supportive evidence is necessary.

There are several reasons why some findings are not as helpful diagnostically as traditionally thought. Thenar atrophy is probably not useful because it occurs only in long-standing or neglected cases of CTS and can also result from lower cervical radiculopathies or polyneuropathies. Tinel described his sign for following the course of regenerating nerve in patients after blunt traumatic nerve injury.30,76,87 The idea that patients with CTS would also have a stub of continually regenerating nerve at the distal wrist crease seems unlikely, limiting the diagnostic utility of this particular test. Our analysis shows that hypalgesia in the median nerve distribution is a more useful diagnostic finding than are abnormalities of other sensory modalities, in part because hypalgesia is a more specific finding. It is not clear why this should be, although it may indicate that the threshold for abnormal results when testing sensation for vibration, 2-point discrimination, and monofilaments is set too low (eg, in one study, 20% of asymptomatic hands also displayed abnormal monofilament results76).

In our analysis, only results for the Tinel sign were heterogeneous. The heterogeneity is not explained by differences in the electrodiagnostic parameters used as criterion standards in the individual studies, variations in examination technique (ie, whether the clinician tapped over the median nerve using the index finger or a reflex hammer), differences in prevalence of CTS in each of the studies (mean prevalence was 57%), differences in the age and sex composition (mean age was 50 years; 77% were women), or by an apparent workup bias. Excluding the 2 studies that account for the heterogeneity62,64 does not change the summary measure in any meaningful way, and therefore, these studies are included in our analysis.

THE BOTTOM LINE

When evaluating patients with hand dysesthesias, the findings most helpful in predicting the electrodiagnosis of CTS are hand symptom diagrams, hypalgesia, and weak thumb abduction strength testing. The square wrist sign, flick sign, and closed fist sign also show promise but require validation by other investigators. Many traditional findings, including Phalen and Tinel signs, have limited ability to predict the electrodiagnosis of CTS.

The main limitation of the existing literature is the lack of an ideal criterion standard, which complicates all clinical research in the field of CTS. It is also important that these data are derived from symptomatic patients presenting to a surgeon, physical therapist, or an electrodiagnostic laboratory. There are no data addressing the value of physical diagnosis in patients presenting to a primary care physician with symptoms suggestive of CTS. Our analysis, therefore, is most applicable to patients with severe enough symptoms to warrant such a referral.

Returning to the case presented at the beginning of the article, the findings of a classic hand diagram and thumb abduction weakness support the diagnosis of CTS. The findings of a normal thenar eminence, a positive Tinel sign, and a negative Phalen sign do not contribute significant diagnostic information. The patient’s clinician believed that she probably had CTS and chose to manage her symptoms by splinting her wrists and recommending anti-inflammatory medications. If the patient’s symptoms fail to improve, nerve conduction testing, additional empiric therapeutic modalities (eg, corticosteroid injections), or referral for surgical assessment should be considered.

Author Affiliations at the Time of the Original Publication

University of Washington Health Sciences Center, Seattle (Dr D’Arcy); and University of Washington, Seattle–Puget Sound Veterans Affairs Health Care System (Dr McGee), Seattle.

Acknowledgments

We thank Jaya Rao, MD, MHS, and Richard W. Tim, MD, who reviewed this article and provided many helpful comments.

REFERENCES
1. Lum PB, Kanaklamedala R. Conduction of the palmar cutaneous branch of the median nerve. Arch Phys Med Rehabil. 1986;67(11):805-806.
2. Tanaka S, Wild D, Seligman P, et al. The US prevalence of self-reported carpal tunnel syndrome: 1988 national health interview survey data. Am J Public Health. 1994;84(11):1846-1848.
3. Stevens JC, Sun S, Beard CM, O’Fallon WM, Kurland L. Carpal tunnel syndrome in Rochester, Minnesota, 1961-1980. Neurology. 1988;38(1):134-138.
4. Atroshi I, Gummesson C, Johnsson R, et al. Prevalence of carpal tunnel syndrome in a general population. JAMA. 1999;282(2):153-158.
5. Green DP. Diagnostic and therapeutic value of carpal tunnel injection. J Hand Surg [Am]. 1984;9(6):850-854.
6. Gelberman RH, Aronson D, Weisman MH. Carpal tunnel syndrome: results of a prospective trial of steroid injection and splinting. J Bone Joint Surg Am. 1980;62(7):1181-1184.
7. Weiss AP, Sachar K, Gendreau M. Conservative management of carpal tunnel syndrome: a reexamination of steroid injection and splinting. J Hand Surg [Am]. 1994;19(3):410-415.
8. Dammers JWHH, Veering MM, Vermeulen M. Injection with methylprednisolone proximal to the carpal tunnel: randomized double blind trial. BMJ. 1999;319(7214):884-886.
9. Gainer JV Jr, Nugent GR. Carpal tunnel syndrome: report of 430 operations. South Med J. 1977;70(3):325-328.
10. Cseuz KA, Thomas JE, Lambert EH, Love JG, Lipscomb PR. Long-term results of operation for carpal tunnel syndrome. Mayo Clin Proc. 1966;41(4):232-241.
11. Bande S, De Smet L, Fabry G. The results of carpal tunnel release: open versus endoscopic technique. J Hand Surg [Br]. 1994;19(1):14-17.
12. Tountas CP, Macdonald CJ, Meyerhoff JD, Bihrle DM. Carpal tunnel syndrome: a review of 507 patients. Minn Med. 1983;66(8):479-482.
13. Mühlau G, Both R, Kunath H. Carpal tunnel syndrome—course and prognosis. J Neurol. 1984;231(2):83-86.
14. Kendall D. Aetiology, diagnosis, and treatment of paraesthesiae in the hands. BMJ. 1960;2(5213):1633-1640.
15. Phalen GS. The carpal-tunnel syndrome: seventeen years’ experience in diagnosis and treatment of six hundred fifty-four hands. J Bone Joint Surg Am. 1966;48(2):211-228.
16. Doyle JR, Carrol RE. The carpal tunnel syndrome: a review of 100 patients treated surgically. Calif Med. 1968;108(4):263-267.
17. Brown RA, Gelberman RH, Seiler JG 3rd, et al. Carpal tunnel release: a prospective, randomized assessment of open and endoscopic methods. J Bone Joint Surg Am. 1993;75(9):1265-1275.
18. Boeckstyns MEH, Sorensen AI. Does endoscopic carpal tunnel release have a higher rate of complications than open carpal tunnel release? an analysis of published series. J Hand Surg [Br]. 1999;24:9-15.
19. Stevens JC, Beard CM, O’Fallon WM, Kurland L. Conditions associated with carpal tunnel syndrome. Mayo Clin Proc. 1992;67(6):541-548.
20. Nakamichi K, Tachibana S. Histology of the transverse carpal ligament and flexor tenosynovium in idiopathic carpal tunnel syndrome. J Hand Surg [Am]. 1998;23(6):1015-1024.
21. Kerr CD, Sybert DR, Albarracin NS. An analysis of the flexor synovium in idiopathic carpal tunnel syndrome: report of 625 cases. J Hand Surg [Am]. 1992;17(6):1028-1030.
22. Gelberman RH, Hergenroeder PT, Hargens AR, Lundborg GN, Akeson WH. The carpal tunnel syndrome: a study of carpal canal pressures. J Bone Joint Surg Am. 1981;63(3):380-383.
23. Gelberman RH, Szabo RM, Williamson RV, Dimick MP. Sensibility testing in peripheral nerve compression syndromes: an experimental study in humans. J Bone Joint Surg Am. 1983;65(5):632-638.
24. Lundborg G, Gelberman RH, Minteer-Convery M, Lee YF, Hargens AR. Median nerve compression in the carpal tunnel—functional response to experimentally induced controlled pressure. J Hand Surg [Am]. 1982;7(3):252-259.
25. Gelberman RH, Rydevik BL, Pess GM, Szabo RM, Lundborg G. Carpal tunnel syndrome: a scientific basis for clinical care. Orthop Clin North Am. 1988;19(1):115-124.
26. Phalen GS. The birth of a syndrome, or carpal tunnel revisited. J Hand Surg [Am]. 1981;6(2):109-110.
27. Radecki P. A gender specific wrist ratio and the likelihood of a median nerve abnormality at the carpal tunnel. Am J Phys Med Rehabil. 1994;73(3):157-162.
28. De Smet L, Steenwerckx A, Van Den Bogaert G, Cnudde P, Fabry G. Value of clinical provocative tests in carpal tunnel syndrome. Acta Orthop Belg. 1995;61(3):177-182.
29. Pryse-Phillips WE. Validation of a diagnostic sign in carpal tunnel syndrome. J Neurol Neurosurg Psychiatry. 1984;47(8):870-872.
30. Kuhlman KA, Hennessey WJ. Sensitivity and specificity of carpal tunnel syndrome signs. Am J Phys Med Rehabil. 1997;76(6):451-457.
31. Buch-Jaeger N, Foucher G. Correlation of clinical signs with nerve conduction tests in the diagnosis of carpal tunnel syndrome. J Hand Surg [Br]. 1994;19(6):720-724.
32. Katz JN, Larson MG, Sabra A, et al. Carpal tunnel syndrome: diagnostic utility of history and physical examination findings. Ann Intern Med. 1990;112(5):321-327.

33. Gerr F, Letz R, Harris-Abbott D, Hopkins LC. Sensitivity and specificity of vibrometry for detection of carpal tunnel syndrome. J Occup Environ Med. 1995;37(9):1108-1115. 34. Concannon MJ, Gainor B, Petroski GJ, Puckett CL. The predictive value of electrodiagnostic studies in carpal tunnel syndrome. Plast Reconstr Surg. 1997;100(6):1452-1458. 35. American Academy of Neurology, American Association of Electrodiagnostic Medicine, American Academy of Physical Medicine and Rehabilitation. Practice parameter for electrodiagnostic studies in carpal tunnel syndrome (summary statement). Neurology. 1993;43(11):24042405. 36. Quality Standards Subcommittee of the American Academy of Neurology. Practice parameter for carpal tunnel syndrome (summary statement). Neurology. 1993;43(11):2406-2409. 37. Jablecki CK, Andary MT, So YT, Wilkins DE, Williams FH. Literature review of the usefulness of nerve conduction studies and electromyography for the evaluation of patients with carpal tunnel syndrome. Muscle Nerve. 1993;16(12):1392-1414. 38. Kimura J. The carpal tunnel syndrome: localization of conduction abnormalities within the distal segment of the median nerve. Brain. 1979;102(3):619-635. 39. Nathan P, Meadow KD, Doyle LS. Sensory segmental latency values of the median nerve for a population of normal individuals. Arch Phys Med Rehabil. 1988;69(7):499-501. 40. Jackson DH, Clifford JC. Electrodiagnosis of mild carpal tunnel syndrome. Arch Phys Med Rehabil. 1989;70:199-204. 41. Dawson DM, Hallett M, Wilbourn AJ. Carpal tunnel syndrome. In: Dawson DM, Hallett M, Wilbourn AJ, eds. Entrapment Neuropathies. 3rd ed. Philadelphia, PA: Lippincott-Raven Publishers; 1999:20-94. 42. Gilliat RW, Sears TA. Sensory nerve action potentials in patients with peripheral nerve lesions. J Neurol Neurosurg Psychiatry. 1958;21(2):109-118. 43. Robinson LR, Temkin NR, Fujimoto WY, Stolov WC. Effect of statistical methodology on normal limits in nerve conduction studies. Muscle Nerve. 
1991;14(11):1084-1090. 44. Goodgold J. A statistical problem in diagnosis of carpal tunnel disease. Muscle Nerve. 1994;17(12):1490-1491. 45. Ferry S, Silman AJ, Pritchard T, Keenan J, Croft P. The association between different patterns of hand symptoms and objective evidence of median nerve compression. Arthritis Rheum. 1998;41(4):720-724. 46. Thomas JE, Lambert EH, Cseuz KA. Electrodiagnostic aspects of the carpal tunnel syndrome. Arch Neurol. 1967;16(6):635-641. 47. Redmond KD, Rivner MH. False-positive electrodiagnostic tests in carpal tunnel syndrome. Muscle Nerve. 1988;11(5):511-518. 48. Grundberg AB. Carpal tunnel decompression in spite of normal electromyography. J Hand Surg [Am]. 1983;8(3):348-349. 49. Phalen GS. The carpal tunnel syndrome: clinical evaluation of 598 hands. Clin Orthop Relat Res. 1972;83:29-40. 50. Mainous III AG, Nelson KR. How often are preoperative electrodiagnostic studies obtained for carpal tunnel syndrome in a Medicaid population? Muscle Nerve. 1996;19(2):256-257. 51. Harris CM, Tanner E, Goldstein MN, Pettee DS. The surgical treatment of the carpal-tunnel syndrome correlated with preoperative nerve-conduction studies. J Bone Joint Surg Am. 1979;61(1):93-98. 52. Pätiälä H, Rokkanen P, Kruuna O, et al. Carpal tunnel syndrome: anatomical and clinical investigation. Arch Orthop Trauma Surg. 1985;104(2):69-73. 53. Kaufman MA. Differential diagnosis and pitfalls in electrodiagnostic studies and special tests for diagnosing compressive neuropathies. Orthop Clin North Am. 1996;27(2):245-252. 54. Spinner RJ, Bachman JW, Amadio PC. The many faces of carpal tunnel syndrome. Mayo Clin Proc. 1989;64(7):829-836. 55. Haig AJ, Tzeng HM, LeBreck D. The value of electrodiagnostic consultation for patients with upper extremity nerve complaints: a prospective comparison with the history and physical examination. Arch Phys Med Rehabil. 1999;80(10):1273-1281. 56. Bessette L, Keller RB, Lew RH, et al. 
Prognostic value of a hand symptom diagram in surgery for carpal tunnel syndrome. J Rheumatol. 1997;24 (4):726-734. 57. Rosenbaum RB. The role of imaging in the diagnosis of carpal tunnel syndrome. Invest Radiol. 1993;28(11):1059-1062. 58. Winn FJ Jr, Habes DJ. Carpal tunnel area as a risk factor for carpal tunnel syndrome. Muscle Nerve. 1990;13(3):254-258. 59. Cantatore FP, Dell’Accio F, Lapadula G. Carpal tunnel syndrome: a review. Clin Rheumatol. 1997;16(6):596-603.

60. Seyfert S, Boegner F, Hamm B, Kleindienst A, Klatt C. The value of magnetic resonance imaging in carpal tunnel syndrome. J Neurol. 1994;242 (1):41-46. 61. Lee D, van Holsbeeck MT, Janevski PK, et al. Diagnosis of carpal tunnel syndrome: ultrasound versus electromyography. Radiol Clin North Am. 1999;37(4):859-872. 62. Gupta SK, Benstead TJ. Symptoms experienced by patients with carpal tunnel syndrome. Can J Neurol Sci. 1997;24(4):338-342. 63. Katz JN, Stirrat C, Larson MG, et al. A self-administered hand symptom diagram in the diagnosis and epidemiologic study of carpal tunnel syndrome. J Rheumatol. 1990;17(11):1495-1498. 64. Golding DN, Rose DM, Selvarajah K. Clinical tests for carpal tunnel syndrome: an evaluation. Br J Rheumatol. 1986;25(4):388-390. 65. Heller L, Ring H, Costeff H, Solzi P. Evaluation of Tinel and Phalen signs in the diagnosis of the carpal tunnel syndrome. Eur Neurol. 1986;25(1): 40-42. 66. Burke DT, Burke MA, Bell R, et al. Subjective swelling: a new sign for carpal tunnel syndrome. Am J Phys Med Rehabil. 1999;78(6):504-508. 67. Yii NW, Elliot D. A study of the dynamic relationship of the lumbrical muscles and the carpal tunnel. J Hand Surg [Br]. 1994;19(4):439-443. 68. González del Pino J, Delgado-Martínez AD, González González I, Lovic A. Value of the carpal compression test in the diagnosis of carpal tunnel syndrome. J Hand Surg [Br]. 1997;22(1):38-41. 69. Bowles AP Jr, Asher SW, Pickett JD. Use of Tinel’s sign in carpal tunnel syndrome. Ann Neurol. 1983;13(6):689-690. 70. Durkan JA. A new diagnostic test for carpal tunnel syndrome. J Bone Joint Surg Am. 1991;73(4):535-538. 71. Seror P. Tinel’s sign in the diagnosis of carpal tunnel syndrome. J Hand Surg [Br]. 1987;12(3):364-365. 72. Seror P. Phalen’s test in the diagnosis of carpal tunnel syndrome. J Hand Surg [Br]. 1988;13(4):383-385. 73. Tetro AM, Evanoff BA, Hollstien SB, Gelberman RH. 
A new provocative test for carpal tunnel syndrome: assessment of wrist flexion and nerve compression. J Bone Joint Surg Br. 1998;80(3):493-498. 74. Williams TM, Mackinnon SE, Novak CB, McCabe S, Kelly L. Verification of the pressure provocative test in carpal tunnel syndrome. Ann Plast Surg. 1992;29(1):8-11. 75. Stewart JD, Eisen E. Tinel’s sign and the carpal tunnel syndrome. BMJ. 1978;2(6145):1125-1126. 76. Gelmers HJ. The significance of Tinel’s sign in the diagnosis of carpal tunnel syndrome. Acta Neurochir (Wien). 1979;49(3-4):255-258. 77. Borg K, Lindblom U. Diagnostic value of quantitative sensory testing (QST) in carpal tunnel syndrome. Acta Neurol Scand. 1988;78(6):537-541. 78. Fertl E, Wöber C, Zeitlhofer J. The serial use of two provocative tests in the clinical diagnosis of carpal tunnel syndrome. Acta Neurol Scand. 1998;98(5):328-332. 79. Ghavanini MR, Haghighat M. Carpal tunnel syndrome: reappraisal of five clinical tests. Electromyogr Clin Neurophysiol. 1998;38(7):437-441. 80. Koris M, Gelberman RH, Duncan K, Boublick M, Smith B. Carpal tunnel syndrome: evaluation of a quantitative provocational diagnostic test. Clin Orthop Relat Res. 1990;(251):157-161. 81. Gilliatt RW, Wilson TG. A pneumatic-tourniquet test in the carpal tunnel syndrome. Lancet. 1953;265(6786):595-597. 82. Spindler H, Dellon A. Nerve conduction studies and sensibility testing in carpal tunnel syndrome. J Hand Surg [Am]. 1982;7(3):260-263. 83. Szabo RM, Slater RR, Farver TB, Stanton DB, Sharman WK. The value of diagnostic testing in carpal tunnel syndrome. J Hand Surg [Am]. 1999;24(4):704-714. 84. Roquer J, Herraiz J. Validity of flick sign in CTS diagnosis. Acta Neurol Scand. 1988;78(4):351. 85. Krendell DA, Jöbsis M, Gaskell PC Jr, Sanders DB. The flick sign in carpal tunnel syndrome. J Neurol Neurosurg Psychiatry. 1986;49(2):220221. 86. de Krom MC, Knipschild PG, Kester ADM, Spaans F. Efficacy of provocative tests for diagnosis of carpal tunnel syndrome. Lancet. 
1990;335 (8686):393-395. 87. Kuschner SH, Ebramazadeh E, Johnson D, Brien WW, Sherman R. Tinel’s sign and Phalen’s test in carpal tunnel syndrome. Orthopedics. 1992;15(11):1297-1302. 88. Homan MM, Franzblau A, Werner RA, et al. Agreement between symptom surveys, physical examination procedures and electrodiagnostic findings for the carpal tunnel syndrome. Scand J Work Environ Health. 1999;25(2):115-124.

89. Mossman SS, Blau JN. Tinel’s sign and the carpal tunnel syndrome. Br Med J (Clin Res Ed). 1987;294(6573):680. 90. Gellman H, Gelberman RH, Tan AM, Botte MJ. Carpal tunnel syndrome. J Bone Joint Surg Am. 1986;68(5):735-737. 91. Novak CB, Mackinnon SE, Brownlee R, Kelly L. Provocative sensory testing in carpal tunnel syndrome. J Hand Surg [Br]. 1994;19(5):817-820. 92. Stevens JC, Smith BE, Weaver AL, et al. Symptoms of 100 patients with electromyographically verified carpal tunnel syndrome. Muscle Nerve. 1999;22(10):1448-1456.

93. Simel DL, Samsa GP, Matchar DB. Likelihood ratios with confidence: sample size estimation for diagnostic test studies. J Clin Epidemiol. 1991; 44(8):763-770. 94. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177-188. 95. Hasselbland V, Hedges LV. Meta-analysis of screening and diagnostic tests. Psychol Bull. 1995;117(1):167-178. 96. Marx RG, Hudak PL, Bombardier C, et al. The reliability of physical examination for carpal tunnel syndrome. J Hand Surg [Br]. 1998;23(4): 499-502.

UPDATE: Carpal Tunnel

Prepared by David L. Simel, MD, MHS
Reviewed by Richard Bedlack, MD

CLINICAL SCENARIO
Your 50-year-old secretary complains to you that she cannot complete your clinic notes on the computer without her hands tingling, especially her thumb and second and third fingers. Her symptoms are present even when she is not typing. In fact, she says that she has more problems at home because discomfort in her hands awakens her at night. She has had diabetes for 6 years. You purchase a variety of office supply products that might help her type and then wait to see whether her symptoms resolve. A week later, she still has problems, although a cushion she ordered for her wrists has not arrived. You check for the Tinel sign (which she has), and flexing her wrists reproduces her symptoms. You suggest that she consult her primary care physician, and she asks you what to expect. You suggest that her physician assess her diabetes to see whether she might have a neuropathy, check neck radiographs to ensure there is no evidence of cervical degenerative changes, and review thyroid function tests, nerve conduction tests, and a magnetic resonance image (MRI) of the wrists. Have you requested all the necessary tests, or did you suggest too many?

UPDATED SUMMARY ON CARPAL TUNNEL SYNDROME

Original Review
D’Arcy C, McGee S. Does this patient have carpal tunnel syndrome? JAMA. 2000;283(23):3110-3117.

UPDATED LITERATURE SEARCH
Our literature search used the parent search for The Rational Clinical Examination series, which combined the subject heading carpal tunnel syndrome (CTS) with meta-analysis or receiver operating characteristic curve. The results were crossed with the text words “Phalen,” “Tinel,” “square wrist,” “thumb abduction,” “hypalgesia,” “closed fist,” “flick,” or “hand diagram” appearing in studies published in English from 1999 to 2004. The results yielded 141 titles and abstracts for review. As in the original Rational Clinical Examination article, we were interested only in studies that assessed clinical findings in a population of patients with hand symptoms, that made an independent comparison with electrodiagnosis, and from which we could extract the data. The abstracts were reviewed to identify studies that might allow us to assess the sensitivity and specificity either of the findings judged helpful in the original review (eg, hand symptom diagram, hypalgesia, and thumb abduction strength testing) or of less commonly used maneuvers that required additional data (eg, square-wrist sign, flick sign, and closed-fist sign). We found 12 original articles for further review. A review of the reference lists identified 6 other articles that were obtained. For original articles, we retained those that studied at least 100 hands.

We excluded retrospective studies and articles that used normal persons without symptoms as a control population. The latter exclusion is necessary because the usefulness of tests can be overstated when a population of patients for whom CTS would not be considered is included.1 Including “normal” control patients tends to overstate the specificity and makes it appear that a finding helps identify those with the disorder. For example, a Phalen sign has a positive likelihood ratio (LR+) of 2.9 when normal, asymptomatic patients are included. However, when only symptomatic patients for whom CTS would be considered were studied, the LR+ fell to 0.91 and the finding appeared useless in the same study.1 No systematic review of the clinical examination findings used the inclusion criteria we required. A systematic review of surgery for CTS evaluated the role of electrodiagnostic testing as a suitable reference standard for predicting a successful outcome.

NEW FINDINGS
• People flick their hands when they have hand symptoms, whether or not they have CTS.2
• Clinical maneuvers designed to induce or exacerbate the patient’s symptoms cause them discomfort, but do nothing to alter the likelihood for or against CTS.3–5
• Additional evidence confirms the uselessness of Tinel or Phalen signs.2,6
• Combining symptoms2 and signs7 does not appear to improve accuracy.
• Clinicians should focus further diagnostic efforts on patients with symptoms in the median nerve distribution. These symptomatic patients are the only patients who will meet the reference standard criteria of combined hand diagram results and electrodiagnosis.

IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION
Additional data confirm the lack of utility for Tinel or Phalen signs and provocation tests. New summary estimates are provided for these findings. No studies were found that were missed in the original publication.

New data help us estimate the prior probability of CTS. When screened by a questionnaire, about 10% of patients in the community claim numbness or tingling in the radial fingers (median nerve distribution) in at least 1 of their hands.8,9 About 70% of patients with numbness or tingling in a median nerve distribution will complete hand diagrams that suggest “classic or probable” CTS.2 Thus, among all adults, the prior probability of hand symptoms compatible with CTS is 7% (ie, 0.10 × 0.70). Because the diagnosis of CTS is considered only when the patient has hand symptoms, we can use the value of 7% as a starting point for our prior probability of CTS. This makes sense because the classic/probable distribution on the hand diagram is part of our pragmatic reference standard for CTS.

These estimates from a population sample are supported by a large clinical sample of patients referred for electrodiagnosis; among 8223 electrodiagnostic studies in patients evaluated for CTS,7 the distribution of positive electrodiagnostic studies is the following:
• First, second, and third finger symptoms: 26% positive
• All fingers (1-5): 17% positive
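The prior probability arithmetic above can be sketched in a few lines of Python. This is only an illustration using the rounded figures quoted in the text. The variable names are ours, and the conversion to odds is added because likelihood ratios are applied on the odds scale.

```python
# Prior probability of hand symptoms compatible with CTS, built from the
# rounded community estimates quoted in the text (names are illustrative).
p_median_symptoms = 0.10   # adults reporting numbness/tingling in the radial fingers
p_classic_diagram = 0.70   # of those, hand diagram rated "classic or probable"

prior = p_median_symptoms * p_classic_diagram

# Likelihood ratios multiply odds, not probabilities, so convert once here.
prior_odds = prior / (1 - prior)

print(f"prior probability = {prior:.2f}")
print(f"prior odds        = {prior_odds:.3f}")
```

With these inputs the prior is 0.07, which corresponds to odds of roughly 0.075 (about 1 to 13).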

CHANGES IN THE REFERENCE STANDARD
The original publication in The Rational Clinical Examination series focused on patients with CTS symptoms who had their disease status confirmed by electrical studies. A letter to the editor highlighted the dilemma in making this diagnosis, with the author’s suggestion that we should have titled the article “Does This Patient Have Abnormal Median Conduction?”10 Some researchers have advocated MRI to identify affected patients. A systematic review of MRI revealed that much-higher-quality evidence must be generated before MRI can be accepted as a screening test, but it seems unlikely that it will ever suffice as a reference standard.11

The use of electrodiagnosis for CTS is not perfect. Electrodiagnosis is fallible as “the” reference standard for 3 reasons: (1) some patients have clinically significant nerve compression with normal electrodiagnostic study results; (2) the use of population means and standard deviations to define normality ensures that 2.5% of the population will be classified as having CTS (ie, the area beyond 2 SDs in 1 tail of the normal distribution curve for median nerve conduction velocity); and (3) studies use various cut points for normality on median nerve testing.12

A group of experts in carpal tunnel epidemiology, clinical care, and outcome assessment used a nominal group process method to develop case definitions suitable for epidemiologic research.13 Although the authors state that their criteria were not meant for actual clinical practice, we used these criteria in the original review in The Rational Clinical Examination article, and they reflect the combination of symptoms and electrodiagnosis that most clinicians use to establish the diagnosis (Table 10-3). The symptoms refer to the Katz hand diagram as shown in Figure 10-3 of the original Rational Clinical Examination article.

Table 10-3 Carpal Tunnel Syndrome (CTS) Using the Paired Hand Diagram and Electrodiagnostic Results as the Reference Standard

| Symptom | Electrodiagnosis | Ordinal Rank in Terms of Likelihood of CTS |
|---|---|---|
| Classic/probable | Abnormal | 1 (Most likely) |
| Possible | Abnormal | 2 |
| Classic/probable | Negative | 3 |
| Possible | Negative | 4 |
| Unlikely | Abnormal | 5 |
| Unlikely | Negative | 6 (Least likely) |

A systematic review by a panel of neurology experts identified 497 articles published from 1990 to 2000 on CTS diagnosis.14 According to formal criteria that included (among others) a prospective study design and a clinical diagnosis of CTS made independently of electrodiagnosis in all patients, they retained 25 articles for review.
Their meta-analysis found a pooled sensitivity of 0.85 and a specificity of 0.98 for sensory or mixed median nerve conduction to confirm the clinical diagnosis. At face value, this seems reassuring. However, the group noted the problems with selection bias and observer bias in extant studies of CTS and electrodiagnosis. They proposed clinical diagnostic criteria for future CTS research that give important insight into the symptoms that primary care providers should evaluate. As in the Rempel et al13 report, no particular physical examination findings are required to establish the clinical diagnosis14 (Table 10-4).

The combination of clinical diagnosis and electrodiagnosis serves as both a suitable epidemiologic standard and a pragmatic clinical reference standard for primary care clinicians. However, it is clear that some patients with classic symptoms but normal electrodiagnosis can improve with treatment of CTS. Jordan et al15 performed a systematic review of surgical therapy for CTS, specifically to assess whether the results of electrodiagnostic testing predicted treatment response. They found that the studies were generally of poor quality and showed no differences in surgical outcomes for patients with symptoms and positive electrodiagnosis vs symptoms and normal electrodiagnostic study results. For the 4 included studies with relative risk data, the confidence interval for the relative risk of a good outcome (positive vs normal electrodiagnosis) included 1. Three recent Cochrane reviews of CTS treatment found a few studies that did not require electrodiagnosis, but none included an analysis of whether patients diagnosed with symptoms alone had a different response compared with those with symptoms plus abnormal electrodiagnostic test results.16-18
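To see why the pooled electrodiagnostic accuracy looks reassuring at face value, the sensitivity and specificity can be converted to likelihood ratios. The sketch below is our own arithmetic, not a result reported by the panel. The function name is hypothetical, and the output inherits the selection and observer bias the group described.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a dichotomous test."""
    lr_pos = sensitivity / (1 - specificity)   # true-positive rate / false-positive rate
    lr_neg = (1 - sensitivity) / specificity   # false-negative rate / true-negative rate
    return lr_pos, lr_neg

# Pooled values quoted in the text for sensory or mixed median nerve conduction.
lr_pos, lr_neg = likelihood_ratios(0.85, 0.98)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

These inputs give an LR+ of about 42 and an LR– of about 0.15. The reference here is the clinical diagnosis, so the figures describe how well nerve conduction confirms a clinical impression, not how well it detects disease in unselected patients.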

RESULTS OF LITERATURE REVIEW
See Table 10-5.

EVIDENCE FROM GUIDELINES
No US or Canadian guidelines exist for routine screening for CTS.

CLINICAL SCENARIO—RESOLUTION
The diagnosis of CTS seems reasonably certain, given that your secretary has the appropriate symptoms in the appropriate distribution (median nerve). You did not need to check for the Tinel sign or make her fingers tingle with a provocation test. The suggestion that she be evaluated for diabetic neuropathy is important. Neck radiographs do not seem indicated unless there are some other symptoms to suggest a cervical problem. A systematic review of routine testing for diabetes, thyroid disease, or rheumatoid arthritis in patients with CTS showed that this practice infrequently picks up new diagnoses and is not necessary.20 An electrodiagnostic test result, if positive, would mean that she meets the research criteria for CTS. MRI does not have an established role in diagnostic assessment for CTS. The remaining question is, should you have suggested a nerve conduction study? A nerve conduction study might be indicated as part of an assessment for a systemic neuropathy. Her carpal tunnel symptoms, together with a positive electrodiagnostic test, would fulfill the accepted reference standard for research studies. However, some patients with positive symptoms have normal nerve conduction study results. It might be appropriate to wait and see whether she responds to simple ergonomic measures, wrist splinting, and, perhaps, steroid injection for short-term relief before considering the nerve conduction test.

Table 10-4 Carpal Tunnel Syndrome (CTS) Diagnosis Using the Paired-Hand Diagram, Additional Symptoms, and Electrodiagnostic Results as the Reference Standard

Inclusion Criteria for CTS for Research Studies on Electrodiagnosis
1. Symptom distribution as noted above (but the fourth finger is also allowed)
2. Symptoms must be present for 1 month, and there must be periods when the symptoms are intermittent
3. Symptoms must be aggravated by sleep, sustained hand or arm positioning, or repetitive motion of the hand
4. Symptoms must be relieved by change in hand position, shaking the hand, or use of a wrist splint
5. When pain is present, the pain in the wrist, hand, or finger must be worse than any pain in the elbow, shoulder, or neck

Exclusion Criteria for CTS for Research Studies on Electrodiagnosis
1. Symptoms primarily in the fifth finger
2. Neck or shoulder pain preceding digital paresthesias
3. Numbness or paresthesias in the feet that preceded hand symptoms
4. Another disorder that explains symptoms that is more likely than CTS

Table 10-5 Likelihood Ratios for a Variety of Signs and Combinations of Findings for Carpal Tunnel Syndrome

| Finding (n = No. of Combined Studies) | Sensitivity | Specificity | LR+ (95% CI) | LR– (95% CI) |
|---|---|---|---|---|
| Tinel (n = 8)a | | | 1.5 (1.2-2.1) | 0.82 (0.72-0.93) |
| Phalen (n = 10)a | | | 1.3 (1.2-1.5) | 0.74 (0.62-0.87) |
| Provocation tests (n = 8)a | | | 1.1 (0.96-1.3) | 0.89 (0.79-1.0) |
| Multivariate model with 11 clinical variables (n = 1)b | 0.79 | 0.54 | 1.7 (1.6-1.8) | 0.39 (0.35-0.43) |
| Flick or Tinel (n = 1)2 | 0.46 | 0.68 | 1.5 (0.94-2.4) | 0.79 (0.60-1.0) |
| Phalen or Tinel (n = 1)2 | 0.41 | 0.72 | 1.5 (0.89-2.5) | 0.81 (0.63-1.0) |
| Flick (n = 1)2 | 0.37 | 0.74 | 1.4 (0.80-2.4) | 0.85 (0.68-1.1) |
| Flick or Phalen (n = 1)2 | 0.49 | 0.62 | 1.3 (0.86-2.0) | 0.82 (0.61-1.1) |
| Abnormal monofilament in digits 1, 2, or 3 (n = 1)c | 0.98 | 0.15 | 1.2 (1.0-1.3) | 0.11 (0.02-0.64) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
a Updated summary adds data from Hansen et al2 and O’Gradaigh and Merry6 to data from the original Rational Clinical Examination article.
b A multivariate model7 using 4 symptoms (nocturnal symptoms, morning symptoms, worsens on driving, and relieved by “waking and shaking”), symptom distribution, side of worst symptoms, handedness, duration of symptoms, response to splinting, and patient age was studied with a large “training” set and “test” set. The model had an accuracy of only 66% (area under the receiver operating characteristic curve).
c The sensitivity from this study19 requires confirmation in additional studies.
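A short worked example shows how little these likelihood ratios move the probability of CTS. It applies the summary Tinel LRs from Table 10-5 on the odds scale. The pretest probability of 26% is our illustrative choice, borrowed from the positive rate quoted for referred patients with symptoms in the first three fingers, and the function name is hypothetical.

```python
def post_test_probability(pretest: float, lr: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)

pretest = 0.26                            # illustrative assumption, see lead-in
tinel_lr_pos, tinel_lr_neg = 1.5, 0.82    # summary LRs for the Tinel sign, Table 10-5

after_positive = post_test_probability(pretest, tinel_lr_pos)
after_negative = post_test_probability(pretest, tinel_lr_neg)
print(f"Tinel present: {after_positive:.2f}")
print(f"Tinel absent:  {after_negative:.2f}")
```

Under these assumptions a present Tinel sign raises the probability from 0.26 to about 0.35, and an absent sign lowers it to about 0.22. Shifts of this size are consistent with the update's conclusion that the sign adds little.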

CARPAL TUNNEL SYNDROME—MAKE THE DIAGNOSIS

PRIOR PROBABILITY
Among all adults, the prior probability of hand symptoms compatible with CTS is 7%.

POPULATION FOR WHOM CARPAL TUNNEL SYNDROME SHOULD BE CONSIDERED
• Patients with tingling or numbness in the hands or arms (always assess for median nerve involvement).
• Special populations include those with occupational exposure of repetitive motion or pregnancy in the third trimester.
• The rates of CTS might be slightly higher in those with diabetes mellitus, rheumatoid arthritis, or hypothyroidism. However, the data are not convincing, and routine screening for these diseases will infrequently lead to new diagnoses.

DETECTING THE LIKELIHOOD OF CARPAL TUNNEL SYNDROME
See Table 10-6 for the likelihood ratios for Tinel and Phalen signs. The examination should focus on the distribution of symptoms in a hand diagram, rather than provocative maneuvers to elicit symptoms.

Table 10-6 Likelihood Ratios for Tinel and Phalen Signs

| Finding | LR |
|---|---|
| The presence of Tinel or Phalen signs in a patient with symptoms | ≈1 |
| The absence of Tinel or Phalen signs in a patient with symptoms | ≈1 |

Abbreviation: LR, likelihood ratio.

REFERENCE STANDARD TESTS
The distribution of hand symptoms (from a hand diagram) plus abnormal nerve conduction studies is the reference standard for epidemiologic studies. For clinical care, patients can have CTS despite a normal nerve conduction result. Data are inconclusive about whether treatment outcomes differ according to the nerve conduction results.

REFERENCES FOR THE UPDATE
1. Gerr F, Letz R. The sensitivity and specificity of tests for carpal tunnel syndrome vary with the comparison subjects. J Hand Surg. 1998;23(2):151-155.
2. Hansen PA, Mickelsen P, Robinson LR. Clinical utility of the flick maneuver in diagnosing carpal tunnel syndrome. Am J Phys Med Rehab. 2004;83(5):363-367.a
3. Karl AI, Carney ML, Kaul MP. The lumbrical provocation test in subjects with median inclusive paresthesia. Arch Phys Med Rehabil. 2001;82(7):935-937.a
4. Kaul MP, Pagel KJ, Wheatley MJ, Dryden JD. Carpal compression test and pressure provocative test in veterans with median-distribution paresthesias. Muscle Nerve. 2001;24(1):107-111.a
5. Kaul MP, Pagel KJ, Dryden JD. Lack of predictive power of the “tethered” median stress test in suspected carpal tunnel syndrome. Arch Phys Med Rehabil. 2000;81(7):348-350.a
6. O’Gradaigh D, Merry P. A diagnostic algorithm for carpal tunnel syndrome based on Bayes’ theorem. Rheumatology. 2000;39(9):1040-1041.a
7. Bland JDP. The value of the history in the diagnosis of carpal tunnel syndrome. J Hand Surgery [Br]. 2000;25(5):445-450.a
8. Atroshi I, Gummesson C, Johnsson R, Ornstein E. Diagnostic properties of nerve conduction tests in population-based carpal tunnel syndrome. BMC Musculoskelet Disord. 2003;4:9.
9. Reading I, Walker-Bone K, Palmer KT, Cooper C, Coggon D. Anatomic distribution of sensory symptoms in the hand and their relation to neck pain, psychosocial variables, and occupational activities. Am J Epidemiol. 2003;157(6):524-530.
10. LeBlond RF. Clinical diagnosis of carpal tunnel syndrome [letter]. JAMA. 2000;284(15):1925; author reply, 1925-1926.
11. Pasternack II, Malmivaara A, Tervahartiala P, Forsberg H, Vehmas T. Magnetic resonance imaging findings in respect to carpal tunnel syndrome. Scand J Work Environ Health. 2003;29(3):189-196.
12. Graham B, Regehr G, Wright JG. Delphi as a method to establish consensus for diagnostic criteria. J Clin Epidemiol. 2003;56(12):1150-1156.
13. Rempel D, Evanoff B, Amadio PC, et al. Consensus criteria for the classification of carpal tunnel syndrome in epidemiologic studies. Am J Public Health. 1998;88(10):1447-1451.
14. Jablecki CK, Andary MT, Gloeter MK, et al. Practice parameter: electrodiagnostic studies in carpal tunnel syndrome. Neurology. 2002;58(11):1589-1592.
15. Jordan R, Carter T, Cummins C. A systematic review of the utility of electrodiagnostic testing in carpal tunnel syndrome. Br J Gen Pract. 2002;52(481):670-672.
16. Verdugo RJ, Salinas RS, Castillo J, Cea JG. Surgical versus non-surgical treatment for carpal tunnel syndrome: Cochrane Neuromuscular Disease Group. Cochrane Database Syst Rev. 2002;(2):CD001552. doi: 10.1002/14651858.CD001552.
17. O’Connor D, Marshall S, Massy-Westropp N. Non-surgical treatment (other than steroid injection) for carpal tunnel syndrome. Cochrane Database Syst Rev. 2003;(1):CD003219. doi: 10.1002/14651858.CD003219.
18. Scholten RJPM, Mink van der Molen A, Uitdehaag BMJ, Bouter LM, de Vet HCW. Surgical treatment options for carpal tunnel syndrome. Cochrane Database Syst Rev. 2002;(4):CD003905. doi: 10.1002/14651858.CD003905.pub3.
19. Pagel KJ, Kaul MP, Dryden JD. Lack of utility of Semmes-Weinstein monofilament testing in suspected carpal tunnel syndrome. Am J Phys Med Rehab. 2002;81(8):597-600.a
20. van Dijk MAJ, Reitsma JB, Fischer JC, Sanders GTB. Indications for requesting laboratory tests for concurrent diseases in patients with carpal tunnel syndrome: a systematic review. Clin Chem. 2003;49(9):1437-1444.

a For the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.

EVIDENCE TO SUPPORT THE UPDATE: Carpal Tunnel

10

MAIN OUTCOME MEASURE TITLE The Value of the History in the Diagnosis of Carpal Tunnel Syndrome.

A multivariate model using electrodiagnosis as the reference standard.

AUTHOR Bland JDP. CITATION J Hand Surgery [Br]. 2000;25(5):445-450.

MAIN RESULTS

QUESTION Do any patient symptoms predict abnormality on electrodiagnostic studies?

The data were split into a training set (n = 5000) and a test set (n = 3223). A logistic model for patient symptoms was created using the data for 5000 patients. The model contained 4 symptoms (nocturnal symptoms, morning symptoms, worse on driving, and relieved by “waking and shaking”), symptom distribution, side of worst symptoms, handedness, duration of symptoms, response to splinting, and patient age as continuous variables. The only variables with an odds ratio (OR) greater than 2 were the presence of symptoms in the thumb and the second and third fingers (OR, 2.5; 95% confidence interval [CI], 2.13.0) or symptoms in the third and fourth fingers (OR, 2.4; 95% CI, 1.9-3.1). The only variable that had an OR less than 0.5 was the presence of symptoms in the fourth and fifth fingers (OR, 0.42; 95% CI, 0.29-0.62). As a continuous variable, age also had an important impact on the probability of carpal tunnel syndrome (CTS). For example, with a typical symptom pattern, without regard to any other symptom, a righthanded patient with right-handed symptoms has a predicted probability of 29% at age 30 years vs 66% at age 50 years.

DESIGN Data collected prospectively during an 8-year period. SETTING Single center in the United Kingdom that performs all the electrodiagnostic studies for the local area. PATIENTS Referred (n = 8223) for electrodiagnosis among a broad array of patients being considered for carpal tunnel surgery or for diagnostic evaluation.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD A questionnaire was given to all patients before the electrodiagnostic study. A single examiner performed all studies. It is likely that the examiner reviewed the questionnaire before the nerve conduction studies. See Table 10-7.

Table 10-7 Likelihood Ratios of the Tinel, Flick, and Phalen Signs for Carpal Tunnel Syndrome

Test              Sensitivity  Specificity  LR+ (95% CI)     LR– (95% CI)
Tinel             0.27         0.91         3.2 (1.2-8.6)    0.79 (0.68-0.92)
Flick             0.37         0.74         1.4 (0.80-2.4)   0.85 (0.68-1.1)
Phalen            0.34         0.74         1.3 (0.74-2.3)   0.89 (0.71-1.1)
Phalen or Tinel   0.41         0.72         1.5 (0.89-2.5)   0.81 (0.63-1.0)
Flick or Tinel    0.46         0.68         1.5 (0.94-2.4)   0.79 (0.60-1.0)
Flick or Phalen   0.49         0.62         1.3 (0.86-2.0)   0.82 (0.61-1.1)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

CONCLUSIONS

LEVEL OF EVIDENCE Level 1.

STRENGTHS Very large patient population that captured all patients referred for electrodiagnostic studies. It is likely that these patients reflect the array of patients who are referred in other community studies for the evaluation of CTS.

LIMITATIONS The examiner would have known the results of the questionnaire (although the examiner would not have known the variables that would ultimately go in the logistic model). The results of the logistic model would be difficult to apply in general practice. However, understanding the role of the distribution of symptoms in the digits is important and is integral to the current accepted reference standard of hand diagrams plus electrodiagnosis. Unfortunately, despite including 11 seemingly relevant clinical variables, the multivariate logistic model had a sensitivity of only 79% and a specificity of 54%. The area under the receiver operating characteristic (ROC) curve was only 0.66 (standard error of 0.01), reflecting an accuracy that seems too low for clinical use.

Reviewed by David L. Simel, MD, MHS

TITLE Clinical Utility of the Flick Maneuver in Diagnosing Carpal Tunnel Syndrome.

AUTHORS Hansen PA, Mickelsen P, Robinson LR.

CITATION Am J Phys Med Rehab. 2004;83(5):363-367.

QUESTION Is the flick sign better than the Phalen or Tinel sign in identifying patients with hand symptoms who will have abnormal electrodiagnostic tests?

DESIGN Prospective, consecutive enrollment.

SETTING Electrodiagnostic clinic.

PATIENTS All patients (n = 142) had upper limb symptoms and were referred by their physicians for electrodiagnostic testing to establish the diagnosis. For all patients, carpal tunnel syndrome (CTS) was part of the differential diagnosis. When patients had bilateral symptoms, only the more severely affected hand was evaluated for the study.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Standard assessment of the Phalen and Tinel signs. The flick sign was obtained by asking the patients how they relieved the discomfort in their hands and wrists when they were experiencing severe symptoms. Patients who demonstrated that they flick their hands (like shaking down a mercury thermometer) were considered “positive.” The criterion standard was standard electrodiagnostic testing, performed after the clinical evaluation. It is not clear whether the same examiner did the clinical examination and the electrodiagnostic testing. However, the electrodiagnostic testing was based on the quantitative output nerve latency.

MAIN OUTCOME MEASURE Electrodiagnosis of CTS.

MAIN RESULTS One hundred forty-two patients were studied, of whom 95 had CTS on electrodiagnostic testing.

CONCLUSIONS

LEVEL OF EVIDENCE Level 1.

STRENGTHS Prospective, consecutive enrollment among a group of referred patients for whom CTS was part of their differential diagnosis. The examination was done before the electrodiagnostic test.

LIMITATIONS Electrodiagnosis may not have been blinded to the clinical findings, but the reporting of nerve conduction studies based on quantitative time rather than subjective time may make this less of a problem. The authors sum up the results best: “people … [with hand symptoms] flick their hands” whether or not they have CTS. These data confirm the uselessness of the Phalen sign. Unfortunately, the combination of the flick or Tinel sign does not improve the diagnostic efficiency. The positive likelihood ratio for the Tinel was the highest, but the confidence interval is broad (see Table 10-7).

Reviewed by David L. Simel, MD, MHS

TITLE The Lumbrical Provocation Test in Subjects With Median Inclusive Paresthesia. AUTHORS Kaul AI, Carney ML, Kaul MP. CITATION Arch Phys Med Rehabil. 2001;82(7):935-937.

TITLE Carpal Compression Test and Pressure Provocative Test in Veterans With Median-distribution Paresthesias. AUTHORS Kaul MP, Pagel KJ, Wheatley MJ, Dryden JD. CITATION Muscle Nerve. 2001;24(1):107-111.

TITLE Lack of Predictive Power of the “Tethered” Median Stress Test in Suspected Carpal Tunnel Syndrome. AUTHORS Kaul MP, Pagel KJ, Dryden JD. CITATION Arch Phys Med Rehabil. 2000;81(7):348-350.

QUESTION Does a physical examination maneuver meant to provoke symptoms predict patients who will have abnormal electrodiagnostic testing? Each study in this summary reports a different maneuver.

DESIGN Prospective, consecutive.

SETTING Electrodiagnostic laboratory of a Veterans Affairs medical center, Portland, Oregon.

PATIENTS In each study, patients had median nerve symptoms, no previous surgery for carpal tunnel syndrome, and no proximal neuropathy on the affected side.


CHAPTER 10

Carpal Tunnel

Table 10-8 Likelihood Ratios of Provocation Tests for Carpal Tunnel Syndrome

Test (n)                                 Abnormal Electrodiagnostic Study Result  Sensitivity  Specificity  LR+ (95% CI)     LR– (95% CI)
Pressure provocation (134)               77                                       0.55         0.68         1.7 (1.1-2.7)    0.66 (0.49-0.90)
Carpal compression (135)                 80                                       0.52         0.56         1.4 (0.94-2.1)   0.77 (0.56-1.0)
Lumbrical (fist) provocation (96)        51                                       0.37         0.71         1.3 (0.73-2.3)   0.88 (0.66-1.2)
“Tethered” median nerve stretch (112)    58                                       0.50         0.59         1.2 (0.8-1.9)    0.85 (0.59-1.2)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Positive test results induce or exacerbate the median nerve symptoms. The lumbrical provocation test is performed by having the patient hold a fist for 1 minute. (The lumbricales are the 4 small muscles of the palm of the hand that flex the proximal phalanx and extend the 2 distal phalanges of each finger.) The “tethered” median nerve test creates a stretch of the median nerve by the examiner’s passively hyperextending the wrist and distal interphalangeal joint of the index finger. The carpal compression test is performed by applying moderate pressure with both thumbs over the transverse carpal ligament. The pressure provocation test uses a 2.5-cm-wide pressure cuff applied to the patient’s wrist. The cuff is inflated to 50 mm Hg, and then direct pressure is applied to bring the sphygmomanometer reading to 150 mm Hg. The electrodiagnostic studies were performed immediately after the provocation tests. When the provocation test result was positive, the patient was allowed to have the symptoms return to baseline before the electrodiagnostic studies.

MAIN OUTCOME MEASURE Electrodiagnostic studies.

MAIN RESULTS See Table 10-8.

CONCLUSIONS

LEVEL OF EVIDENCE Level 2.

STRENGTHS All patients had median nerve symptoms. The provocation tests were applied before the electrodiagnostic tests. An additional strength is that patients with neck pain were also included, as long as they also had median nerve symptoms.

LIMITATIONS The electrodiagnostic testing was performed blinded to the “tethered” median nerve test. It is not clear whether the electrodiagnostic tests were performed independently in the other 2 studies. However, the protocol for the electrodiagnostic procedure is described well and the results were based on a quantitative assessment. The results apply only to patients with median nerve symptoms. Even with the possibility that the provocation test affected the electrodiagnostic studies, this maneuver did not work to identify the patients with median nerve symptoms who would have an abnormal electrodiagnosis. As in all clinical diagnosis studies, it is important to recognize that the clinicians included only patients with median nerve syndromes, something that can be evaluated at the bedside and is part of the recommended hand diagram. The provocation tests seem relatively useless as both the summary positive and negative likelihood ratios approach 1. Clinicians should stop trying to reproduce a patient’s median nerve symptoms because the response should not affect clinical decisions.

Reviewed by David L. Simel, MD, MHS


TITLE A Diagnostic Algorithm for Carpal Tunnel Syndrome Based on Bayes’ Theorem.

AUTHORS O’Gradaigh D, Merry P.

CITATION Rheumatology. 2000;39(9):1040-1041.

QUESTION Can the results of a hand diagram, Phalen test, and Tinel test be applied sequentially?

DESIGN Two-phase study. An initial study to determine the sensitivity and specificity of the findings may have been a convenience sample (n = 105 patients). The second phase assessed the sensitivity and specificity prospectively, but it is not stated whether these were consecutive patients (n = 42).

SETTING Rheumatology clinic in the United Kingdom.

PATIENTS Patients were referred because of a suspicion of carpal tunnel syndrome (CTS).

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Patients completed a hand diagram. Patients with classic or probable patterns were considered to have a positive test result. Phalen and Tinel tests were done by a single examiner.

MAIN OUTCOME MEASURE Electrodiagnosis.

MAIN RESULTS In the first set of 105 patients, 75 had abnormal electrodiagnostic testing results. See Table 10-9. For patients with a positive hand diagram result, the probability of an abnormal electrodiagnostic test increased from 79% to 92% when both the Tinel and Phalen test results were positive. Only 6 patients with a negative hand diagram result had an abnormal electrodiagnostic test result. Because the prevalence of an abnormal electrodiagnosis test result was so high, the posterior probability with a negative hand diagram result was still 33%. The second prospective phase of the study obtained posterior probabilities for an abnormal electrodiagnostic test result similar to those obtained from the initial phase of the study.

Table 10-9 Likelihood Ratios of Tinel and Phalen Signs and the Hand Diagram for Carpal Tunnel Syndrome

Test           Sensitivity  Specificity  LR+ (95% CI)    LR– (95% CI)
Tinel test     0.55         0.72         2.1 (1.2-4.0)   0.60 (0.44-0.88)
Hand diagram   0.92         0.40         1.5 (1.2-2.2)   0.20 (0.08-0.50)
Phalen test    0.72         0.53         1.5 (1.1-2.4)   0.52 (0.32-0.88)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

CONCLUSIONS

LEVEL OF EVIDENCE Level 3.

STRENGTHS Prospective assessment of sequentially conducting the Tinel and Phalen tests for patients after a hand diagram test.

LIMITATIONS We infer that these patients were referred to the rheumatologist for therapeutic injections, accounting for the high prevalence of disease. The enrollment was not consecutive patients. It is not clear whether the electrodiagnosis was done by the same person who performed the clinical examination. The prevalence of disease was much higher in this study than in many other studies. In a high-prevalence setting, the Phalen and Tinel tests will not demonstrate clinically important differences in the probability of disease. We infer that these patients are not representative of all patients with CTS symptoms. However, the data support the concept that the Phalen or Tinel test will not alter the information from a hand diagram in a clinically important fashion. The authors suggest that patients with a high probability of CTS could be offered treatment (injection therapy) without nerve conduction tests.

Reviewed by David L. Simel, MD, MHS

TITLE Lack of Utility of Semmes-Weinstein Monofilament Testing in Suspected Carpal Tunnel Syndrome.

AUTHORS Pagel KJ, Kaul MP, Dryden JD. CITATION Am J Phys Med Rehab. 2002;81(8):597-600. QUESTION Do 2 types of testing with a monofilament among patients who have median nerve symptoms identify those who will have abnormal electrodiagnostic test results? DESIGN Prospective, consecutive enrollment. SETTING Electrodiagnostic laboratory of a Veterans Affairs hospital, Portland, Oregon. PATIENTS All patients (n = 113) had paresthesias of the median nerve. Patients with a previous carpal tunnel release operation, stroke, paresthesias in the fourth and fifth fingers only, or neurologic disease were excluded.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Two types of monofilament testing were done on the pad of each digit so that the filament bowed for 1.5 seconds. If the monofilament was felt on at least 1 of 3 trials in each digital pad, the test result was considered normal. In the first protocol, the patient had an abnormal response if there was no sensation or a sensation only with an increased stimulus (>2.83 monofilament) in any of the radial 3 digits. In the second protocol, patients were considered to have an abnormal response only if abnormal findings in the third finger were associated with normal findings in the fifth finger. The examiners used a monofilament testing kit with various sizes of filaments. The reference test was a standard electrodiagnostic study, blinded to the monofilament results.

MAIN OUTCOME MEASURE Abnormal electrodiagnosis studies.

MAIN RESULTS Of 113 patients, 60 (53%) had abnormal electrodiagnostic testing results. See Table 10-10.

CONCLUSIONS

LEVEL OF EVIDENCE Level 1.

STRENGTHS Evidence that the test (monofilament) and reference standard (electrodiagnosis) were applied independently. Clear guidelines on how to do the monofilament testing.

LIMITATIONS There was some selection bias in that not only were the patients all referred to the electrodiagnostic laboratory, but they were also evaluated to confirm that they had symptoms in the median nerve distribution. However, this is the appropriate population for whom carpal tunnel syndrome (CTS) ought to be correctly considered. The authors conclude that the tests are worthless. Certainly, this appears true for the second method of monofilament testing (comparing the median nerve findings to the fifth finger). However, a normal response to monofilament testing in each of the first three digits decreases the likelihood of abnormal electrodiagnostic testing results in this population of patients. Why might the results be different (ie, better) than what was reported in the original Rational Clinical Examination article? The study we initially used assessed only the response in the index finger rather than all 3 digits of the median nerve and found a sensitivity of only 59%.1 Thus, requiring a normal response in all 3 digits would automatically improve the sensitivity. If the utility of a normal monofilament response can be validated, then this might be a useful test for identifying patients much less likely to have abnormal electrodiagnostic testing results. We would like to see this study repeated in a large population of patients with upper arm symptoms for whom CTS is considered.

REFERENCE FOR THE EVIDENCE

1. Buch-Jaeger N, Foucher G. Correlation of clinical signs with nerve conduction tests in the diagnosis of carpal tunnel syndrome. J Hand Surg [Br]. 1994;19(6):720-724.

Reviewed by David L. Simel, MD, MHS

Table 10-10 Likelihood Ratio of Monofilament Testing for Carpal Tunnel Syndrome

Test                                                                            Sensitivity  Specificity  LR+ (95% CI)     LR– (95% CI)
Decreased threshold or absent sensation in terminal digit pads 1, 2, or 3       0.98         0.15         1.2 (1.0-1.3)    0.11 (0.02-0.64)
Decreased threshold in terminal digit pad 3 with normal terminal digit pad 5    0.13         0.88         1.2 (0.45-3.1)   0.98 (0.84-1.1)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
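To see what an LR– of 0.11 means for an individual patient, the likelihood ratio can be applied to the pretest probability through the odds form of Bayes’ theorem. The sketch below is illustrative: the function names are hypothetical, and the pretest probability is simply the prevalence reported in this study (60 of 113 patients). Recomputing likelihood ratios from the rounded sensitivity and specificity in the table gives values close to, but not identical with, the published ones, presumably because the published values were derived from unrounded counts.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Prevalence of abnormal electrodiagnostic results in this study.
PREVALENCE = 60 / 113

# Published likelihood ratios for the first row of Table 10-10.
P_AFTER_NORMAL = post_test_probability(PREVALENCE, 0.11)
P_AFTER_ABNORMAL = post_test_probability(PREVALENCE, 1.2)

print(f"Pretest probability:              {PREVALENCE:.2f}")
print(f"After normal monofilament test:   {P_AFTER_NORMAL:.2f}")
print(f"After abnormal monofilament test: {P_AFTER_ABNORMAL:.2f}")
```

In this population a normal response in all 3 radial digits lowers the probability of an abnormal study from about 53% to about 11%, whereas an abnormal response barely changes it.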

TITLE The Relationship Among Five Common Carpal Tunnel Syndrome Tests and the Severity of Carpal Tunnel Syndrome.

AUTHORS Priganc VW, Henry SM.

CITATION J Hand Ther. 2003;16(3):225-236.

QUESTION Among patients with carpal tunnel syndrome, do the diagnostic tests separate patients with mild, moderate, or severe electrodiagnostic results? Are the test results reliable during a 2- to 7-day period?

DESIGN Prospective. All tests were done before nerve conduction studies. The order of tests was randomized, except that the provocation tests were always done after the other maneuvers. The examiner waited 2 to 3 minutes between provocation tests for all the patients to return to baseline. Patients (n = 27) returned to the laboratory 2 to 7 days after the first test to assess reliability.

SETTING Patients referred from 3 neurology clinics in one community (Burlington, Vermont) for nerve conduction studies.

PATIENTS Patients scheduled for nerve conduction studies (n = 206) were contacted and invited to participate. Patients were excluded if they had systemic peripheral neuropathy, previous carpal tunnel release, proximal median nerve compression, or foot numbness not attributable to an orthopedic problem. Sixty-six patients (95 hands) were ultimately qualified for the study because the study reported only those with abnormal electrodiagnostic results.


DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Phalen, Tinel, and carpal compression tests (examiners apply both of their thumbs to the patient’s transverse carpal ligament), and Katz hand diagram. All patients had a nerve conduction test, along with a carpal tunnel outcomes assessment test that had scales for symptom severity and functional status. The tests were applied without knowledge of the electrodiagnostic results.

MAIN OUTCOME MEASURES According to preestablished criteria, the nerve conduction quantitative results were classified into mild (55 hands), moderate (23 hands), or severe (17 hands) outcomes. Reliability was assessed during a 2- to 7-day follow-up period.

MAIN RESULTS The Katz hand diagram was the most reliable finding (Table 10-11). The authors reported that only the Phalen test showed an association with the nerve conduction severity (P < .05). Our reanalysis of the data shows minimal significance (P = .05). In a logistic model, the odds ratio is 2.6 (95% confidence interval, 0.98-6.9) and the accuracy of the model as displayed by the area under the receiver operating characteristic curve is only 0.50 (a measure of accuracy).


Table 10-11 Reliability of Various Tests for Carpal Tunnel Syndrome

Test                 κ (95% CI)
Katz hand diagram    0.95 (0.84-1.0)
Carpal compression   0.63 (0.33-0.92)
Phalen               0.58 (0.22-0.94)

Abbreviation: CI, confidence interval.
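The κ statistic in Table 10-11 expresses agreement between the first and repeated examinations beyond what chance alone would produce. The sketch below shows the standard Cohen κ calculation for a 2 × 2 table. The counts are hypothetical, chosen only to illustrate the arithmetic for 27 retested patients; they are not the study data, and the function name is an assumption.

```python
def cohen_kappa_2x2(both_positive: int, first_only: int,
                    second_only: int, both_negative: int) -> float:
    """Cohen kappa for two binary ratings of the same subjects.

    both_positive: positive on both examinations
    first_only:    positive on the first examination only
    second_only:   positive on the second examination only
    both_negative: negative on both examinations
    """
    n = both_positive + first_only + second_only + both_negative
    observed = (both_positive + both_negative) / n

    # Chance agreement from the marginal totals of each examination.
    first_pos = both_positive + first_only
    second_pos = both_positive + second_only
    first_neg = n - first_pos
    second_neg = n - second_pos
    expected = (first_pos * second_pos + first_neg * second_neg) / (n * n)

    return (observed - expected) / (1.0 - expected)


# Hypothetical counts for 27 retested patients (not taken from the study).
KAPPA_EXAMPLE = cohen_kappa_2x2(both_positive=15, first_only=1,
                                second_only=0, both_negative=11)
print(f"kappa = {KAPPA_EXAMPLE:.2f}")
```

With a single discordant pair among 27 patients, κ is about 0.92, which illustrates how few disagreements are compatible with a value as high as that reported for the Katz hand diagram.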

CONCLUSIONS LEVEL OF EVIDENCE Level 2. STRENGTHS A different type of study design to see whether

the tests correlate with the degree of abnormality, rather than just the presence of carpal tunnel syndrome. LIMITATIONS The results can be applied only to patients with known carpal tunnel syndrome. Thus, they are of limited value in the primary care clinic. The goal of identifying patients who will have abnormal nerve conduction results differs from the goal of using the physical examination results to identify those who will have mild, moderate, or severely abnormal electrodiagnostic results. These results suggest that the physical examination findings did not help much with categorizing the severity. The intrarater reliability for these findings is reassuring in that the results are similar during a 2- to 7-day period.

Reviewed by David L. Simel, MD, MHS

CHAPTER 11

Does This Patient Have Abnormal Central Venous Pressure?

Deborah J. Cook, MD, FRCPC, MSc (Epid)
David L. Simel, MD, MHS

CLINICAL SCENARIO A 65-year-old woman has had dyspnea for 2 months. She has had to give up her hobby of hiking and is now short of breath after climbing even 1 flight of stairs. Her dyspnea is sometimes worse at night. She has no chest pain, cough, or sputum, and the result of systems review is otherwise negative. On physical examination, her blood pressure is 135/90 mm Hg, and she has a regular cardiac rhythm at 72/min. You turn your attention to the jugular veins and next ask yourself, “Does this patient have abnormal central venous pressure (CVP)?”

WHY IS THIS QUESTION IMPORTANT? Evaluation of the jugular venous pulse provides important information about pressure and other hemodynamic events in the right atrium.1-3 The jugular venous pulse provides a useful estimate of CVP and thus the patient’s intravascular volume status. Inspection of the waveforms can assist the diagnosis of several tricuspid and pulmonic valvular abnormalities. Moreover, accurate assessment of CVP by physical examination may obviate the necessity for invasive hemodynamic monitoring. Accordingly, the clinical evaluation of jugular venous pressure (JVP) and waveforms is useful whenever intravascular volume status, ventricular function, valvular disease, or pericardial constriction is in question. Proficiency in this examination is especially important, given that it may be difficult, if not impossible, to identify venous pulsation in patients with low CVP,4 in patients receiving mechanical ventilation,4,5 in patients with short or fat necks, and in some patients who have conditions causing wide swings in CVP during the respiratory cycle (eg, during acute asthma).

ANATOMIC AND PHYSIOLOGIC ORIGINS OF THE JUGULAR VENOUS PRESSURE Because the jugular veins act as manometer tubes for the right atrium, they display changes in blood flow and pressure caused by right atrial filling, contraction, and emptying. In general, the jugular vein with the most distinct, undamped waveform is likely to most accurately reflect right atrial pressure. Because the right internal jugular vein is directly in line with the right atrium, thereby favoring an unimpeded transmission of atrial pulsations and pressure, it is the preferred site for examining the jugular venous pulse. Direct measurements of CVP according to the left jugular veins tend to be higher than those on the right, but the correlation between the 2 is high.6 The discrepancy may reflect the fact that both the innominate vein and the left internal jugular vein can be compressed by a variety of normal or abnormal structures.

Copyright © 2009 by the American Medical Association.


Although the internal jugular vein lies deep to the sternocleidomastoid muscle and may not always be visible as a discrete structure, its pulsation usually is transmitted to the overlying skin. Normally, the CVP pulsation moves toward the heart during inspiration because of a sudden increase in venous return to the right side of the heart. The external jugular veins, although sometimes easier to see, may be constricted as they pass through the fascial planes of the neck and thus may not accurately reflect right atrial pressures. However, in one study, venous pressures measured in the external jugular vein accurately reflected right atrial pressures during anesthesia and with controlled or spontaneous ventilation.7 Positive-pressure ventilation caused regular, periodic changes to occur in venous return, which resulted in similar phasic changes in right atrial and external jugular pressures. The only significant difference was the greater right atrial pressure variation during mechanical ventilation, although the maximal venous pressures at the 2 sites were nearly identical.7

Figure 11-1 Venous Pulsation in the Neck Corresponds With the Electrocardiogram

Simultaneous recording of an electrocardiogram (top tracing) and jugular venous pressure waves (lower tracing). The a wave reflects right atrial contraction just before the first heart sound and carotid pulse; atrial relaxation is reflected by the x descent; c wave reflects the bulging of the tricuspid valve into the right atrium during ventricular isovolumetric contraction; x1 descent reflects subsequent atrial relaxation; v wave reflects the closure of tricuspid valve and subsequent distention of the right atrium; and y descent reflects the right atrium emptying after the opening of the tricuspid valve.

Table 11-1 Abnormalities of the Venous Waveforms

Waveform             Cardiac Condition
Absent a wave        Atrial fibrillation, sinus tachycardia
Flutter waves        Atrial flutter
Prominent a waves    First-degree atrioventricular block
Large a waves        Tricuspid stenosis, right atrial myxoma, pulmonary hypertension, pulmonic stenosis
Cannon a waves       Atrioventricular dissociation, ventricular tachycardia
Absent x descent     Tricuspid regurgitation
Prominent x descent  Conditions causing enlarged a waves
Large cv waves       Tricuspid regurgitation, constrictive pericarditis
Slow y descent       Tricuspid stenosis, right atrial myxoma
Rapid y descent      Constrictive pericarditis, severe right heart failure, tricuspid regurgitation, atrial septal defect
Absent y descent     Cardiac tamponade

Among critically ill patients, one group of investigators found jugular venous pulsations sufficiently obvious for examination only 20% of the time,8 whereas another group was able to estimate CVP in 84% of critically ill patients.4 In the former study, although external jugular pulsations were visible in all patients, clinicians’ estimates of venous pressure according to physical examination were within 2 cm of CVP determined by central venous catheter only 47% of the time. The evaluation of individual components of the venous pulse in health and disease lies outside the focus of this overview but can be summarized as follows.

ANALYSIS OF THE VENOUS WAVEFORM The normal JVP reflects phasic pressure changes in the right atrium and consists of 3 positive waves and 3 negative troughs (Figure 11-1). Although these pressure changes can be recorded with pressure monitors, they are not always appreciable on clinical examination of the jugular pulse. Auscultation of the heart or simultaneous palpation of the left carotid artery may aid the examiner in relating the pattern of venous pulsations to the cardiac cycle. Taken in sequence, right atrial contraction is reflected by the dominant positive a wave and occurs just before the first heart sound and carotid pulse. Atrial relaxation is reflected by the first negative trough, the x descent. The second positive wave is produced by the bulging of the tricuspid valve into the right atrium during ventricular isovolumetric contraction; this is called the c wave. Subsequent atrial relaxation creates the most dominant descent, the x1 descent. When the tricuspid valve closes, subsequent distention of the right atrium creates the v wave, which occurs just after the arterial pulse. Finally, after the opening of the tricuspid valve, the right atrium empties, resulting in the y descent. Various cardiac conditions are associated with waveform abnormalities. A few of the most common include the absence of a waves in atrial fibrillation, large cv waves in tricuspid regurgitation, the slow y descent of tricuspid stenosis, and the brisk y descent seen in constrictive pericarditis. Table 11-1 shows a summary of abnormal venous waveforms and the conditions in which they occur. Remember, it is not always possible to see each of these waves and descents.

HOW TO EXAMINE THE NECK VEINS The right internal jugular vein should be used to assess CVP for several reasons. It is in direct line with the right atrium, thereby favoring unimpeded transmission of atrial pulsations and pressure. Clinical assessment of CVP on the left may be marginally higher than that on the right. Finally, constricted or tortuous external jugular veins may introduce inaccuracy.

Positioning

Proper positioning is crucial for examination of the neck veins. The patient’s head is supported to relax the neck muscles, and the trunk is inclined at an angle that brings the top of the column of blood in the internal jugular vein to a level above the clavicle but below the angle of the jaw; in normal subjects, this positioning is accomplished at 30 to 45 degrees above the horizontal. In patients with elevated venous pressure, it often is necessary to elevate the trunk beyond 45 degrees, and patients with severe venous congestion may have to stand up and inspire deeply to bring the meniscus down into view. In some cases, the level of venous pulsation will be seen behind the angle of the jaw or will appear to move the earlobes. If the pressure in the internal jugular vein is high, venous pulsations will be lost in the completely full vein, and the high venous pressure may be overlooked. Conversely, patients with low CVP may have to be positioned at 0 to 30 degrees. When CVP is low, the neck veins will be empty, and pulsations may not be visible even when the patient is horizontal. Tangential light often improves the detection of the venous pulse. When ambient light is insufficient for this purpose, a penlight, directed away from the examiner’s eyes, may be useful.

Distinguishing Arterial (Carotid) From Venous (Jugular) Pulsation

Difficulty in distinguishing between the carotid arterial pulse and jugular venous pulse may be overcome by noting several differentiating features (Table 11-2).9 First, the venous pulsation is diffuse, usually has 2 waves, and the upward deflection is slow. In contrast, the carotid pulse is a fast, well-localized, single, outward deflection. Second, venous pulsations (unless the venous pressure is extremely high) diminish toward the clavicle or disappear beneath it as the patient sits up or stands and advance toward the angle of the jaw as the patient reclines; carotid pulses generally do not vary with position. Third, in the absence of intrathoracic disease, the top of the venous wave descends during inspiration (because of increasingly negative intrathoracic pressure). However, the visible carotid pulse does not vary with the respiratory cycle, except during pulsus paradoxus. Fourth, the JVP is nonpalpable, and gentle pressure applied by the examiner’s finger to the root of the neck above the clavicle will obstruct the vein, fill its distal segment, and obliterate the venous pulse. However, the carotid pulse is almost always palpable, usually striking the examining finger with considerable force. Finally, sustained pressure on the abdomen (the abdominojugular reflux test, to be described later) usually will cause even a normal venous pulse to increase briefly but will have no effect on the carotid pulse.

Table 11-2 Distinguishing the Carotid Arterial From Jugular Venous Pulsation

Characteristic         Venous Pulse                                              Carotid Pulse
Waveform               Diffuse biphasic                                          Single sharp
Positional change      Varies with position                                      No variation
Respiratory variation  Height falls on inspiration                               No variation
Effect of palpation    Wave nonpalpable, pressure obliterates pulse, vein fills  Pulse palpable, not compressible
Abdominal pressure     Displaces pulse upward                                    Pulse unchanged

Estimation of Central Venous Pressure

The level of venous pressure is estimated by identifying the highest point of oscillation of the internal jugular vein (which usually occurs during the expiratory phase of respiration). This level must then be related to the middle of the right atrium, where venous pressure is, by convention, zero. Because the latter site is inaccessible on clinical examination, an accessible, reliable landmark is substituted: the sternal angle of Louis. This easily palpated landmark, found at the junction of the manubrium with the body of the sternum, lies 5 cm above the middle of the right atrium (for all practical purposes) in reclining patients of normal size and shape, regardless of the angle at which they are reclining.

Using the sternal angle as the reference point, the vertical distance (in centimeters) to the top of the jugular venous wave can be determined (Figure 11-2) and reported as the JVP; thus, JVP is 5 cm less than CVP. When the patient is positioned at 45 degrees above the horizontal, the clavicle lies a vertical distance of about 2 cm above the sternal angle, and only CVPs of at least 7 cm will be observed.10 Because the normal CVP in adults is 5 cm, the top of their venous pressure column lies at their sternal angle, 2 cm below their lowest visible point in a patient at 45 degrees, and will only appear as the patient reclines toward the horizontal. The upper limit of normal for CVP is 9 cm H2O, which produces a JVP extending 4 cm above the sternal angle.1 (Note: The Update that follows this section revealed that physicians underestimate the value of the central venous pressure from the jugular vein meniscus. Part of the underestimate may result from variability in the depth measured from the sternal notch to the mid-right atrium. This can be partially corrected by accepting a JVP of 3 cm or more as elevated.)

Estimating CVP may be done as follows: Identify the highest point of pulsation in the internal jugular vein; find the sternal angle of Louis; from the sternal angle, measure the vertical distance to the top of the pulsation in centimeters; and report as “the JVP is xx cm.”

Alternative methods of assessing CVP exist but have not been validated. For example, with a reclining patient, the clinician can inspect the veins of the back of the hand as the arm is slowly, passively raised; the level at which the veins collapse can then be related to the angle of Louis. This method may give false high readings with local obstruction and peripheral venous constriction, so it is not recommended.

Abnormal Central Venous Pressure

Elevated JVP reflects an increase in CVP. This increase can be due to increased right ventricular diastolic pressure (eg, right ventricular failure or infarction, pulmonary hypertension, or pulmonic stenosis), obstruction to right ventricular inflow (eg, tricuspid stenosis, right atrial myxoma, or constrictive pericarditis), hypervolemia, or superior vena cava obstruction. Decreased JVP reflects a decreased or a low CVP. Low CVP may be due to intravascular volume depletion from gastrointestinal losses (vomiting or diarrhea), urinary losses (diuretics, uncontrolled diabetes mellitus, or diabetes insipidus), third-space fluid losses, and hypovolemic shock.


[Figure 11-2 illustration. Labeled structures: mandible; right jugular vein; manubrium, sternal angle (of Louis), and sternum; right atrium; heart. Annotations: JVP ruler; patient reclined at 30-45°; height of right jugular vein; pressure; normal CVP meniscus level; depth to right atrium: 5 cm; calculation of CVP = JVP + 5 cm; elevated CVP: JVP ≥ 3 cm.]

Examiner places the patient in a reclined position and puts the base of the ruler at the sternal angle.

Positive indication: Jugular venous pulse ≥ 3 cm above the sternal notch, or a sustained jugular venous pulse of ≥ 4 cm with abdominal compression, suggests a 3- to 4-fold increase in the likelihood that the central venous pressure is elevated.

Figure 11-2 Estimation of Central Venous Pressure From the Jugular Venous Pulse At any patient position, the top of the jugular vein meniscus is identified. The jugular venous pulse measurement is sighted from the height read from a ruler placed vertically over the sternal notch. The traditional assumption has been that the CVP is the JVP + 5 cm. However, the Update for this article showed that physicians tend to underestimate the CVP and the assumption of a 5-cm depth from the sternal notch to the right atrium is probably not valid. Thus, this figure has been updated to reflect current recommendations that a JVP ≥ 3 cm suggests an elevated CVP. Abbreviations: CVP, central venous pressure; JVP, jugular venous pressure.
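The arithmetic in the figure can be written out as a short sketch. It assumes the traditional 5-cm depth from the sternal angle to the right atrium and the updated rule that a JVP of 3 cm or more suggests an elevated CVP; the function and constant names are illustrative and do not come from the chapter.

```python
# Illustrative sketch only: names and defaults are assumptions for this example.
TRADITIONAL_DEPTH_CM = 5.0       # assumed sternal angle to mid-right atrium depth
ELEVATED_JVP_THRESHOLD_CM = 3.0  # updated cut point quoted in the figure caption


def estimate_cvp_cm(jvp_cm, depth_cm=TRADITIONAL_DEPTH_CM):
    """Return the estimated CVP as the measured JVP plus the assumed depth."""
    return jvp_cm + depth_cm


def jvp_suggests_elevated_cvp(jvp_cm, threshold_cm=ELEVATED_JVP_THRESHOLD_CM):
    """Return True when the JVP meets or exceeds the cut point for elevated CVP."""
    return jvp_cm >= threshold_cm


if __name__ == "__main__":
    for jvp in (1.0, 3.0, 4.0):
        print(jvp, estimate_cvp_cm(jvp), jvp_suggests_elevated_cvp(jvp))
```

Keeping the depth as a parameter makes it easy to see how the estimate changes if the 5-cm assumption does not hold for a given patient position.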

Abdominojugular Reflux Test (Hepatojugular Reflux)

The abdominojugular reflux test consists of observing JVP before, during, and after abdominal compression. The increase in jugular pressure that follows abdominal compression is believed to be a consequence of blood shifting from abdominal veins into the right atrium. Pasteur first described the hepatojugular reflux in 1885.11 Now, this bedside test is used to confirm the presence of right ventricular failure or reduced right ventricular compliance. Like all clinical tests, it is most reliable when performed in a standardized fashion. The patient is instructed to relax and breathe normally through an open mouth (to avoid the false-positive increase in jugular pressure that accompanies the Valsalva maneuver). Firm pressure is then applied with the palm of the hand to the midabdomen for 15 to 30 seconds (abdominal compression for 1 minute, as has previously been described, is not required).10,12,13 This pressure should approximate 20 to 35 mm Hg when an unrolled bladder of a standard adult blood pressure cuff, partially inflated with 6 full bulb compressions, is placed between the examiner’s hand and the patient’s abdomen.10,13 Pressure directly over the liver, as was originally described,1,2,12,14 appears to be unnecessary.13,15 Therefore, designation of the test as abdominojugular reflux, rather than hepatojugular reflux, is more appropriate. If pain is produced by this maneuver, or if the patient strains (Valsalva), the test becomes falsely positive. Either instruct the patient to open his or her mouth and breathe slowly or try a trial run, which is sometimes useful to demonstrate to the patient the force that will be applied over the abdomen.

Healthy individuals may exhibit one of 3 responses to abdominal compression: no change in JVP; a transient (few seconds) increase of more than 4 cm that returns to its former level or near the baseline before 10 seconds, with little or no decrease when abdominal pressure is released; or an increase of more than 3 cm sustained throughout compression.10,13 A positive abdominojugular test result occurs when abdominal compression causes a sustained increase in JVP of greater than or equal to 4 cm.
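The decision rule for a positive test can be sketched as follows. The sketch assumes the examiner records the rise in JVP (in centimeters) and whether that rise persisted throughout compression; the function name and arguments are hypothetical.

```python
# Illustrative sketch only: the function name and inputs are assumptions.
POSITIVE_RISE_CM = 4.0  # sustained rise that defines a positive result in the text


def abdominojugular_result(rise_cm, sustained):
    """Classify the test from the JVP rise and whether it lasted through compression.

    A transient rise, however large, is not counted as positive here, which
    follows the definition given above.
    """
    if sustained and rise_cm >= POSITIVE_RISE_CM:
        return "positive"
    return "negative"


if __name__ == "__main__":
    print(abdominojugular_result(4.5, sustained=True))   # sustained rise of 4.5 cm
    print(abdominojugular_result(5.0, sustained=False))  # transient rise only
    print(abdominojugular_result(0.0, sustained=False))  # no change in JVP
```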

Kussmaul Sign

The JVP normally decreases during inspiration. The Kussmaul sign is the paradoxic increase in the height of JVP that occurs during inspiration. It can be explained by a heart that is unable to accommodate the increased venous return that accompanies the inspiratory decrease in intrathoracic pressure. Although classically described in constrictive pericarditis, the most common contemporary cause of the Kussmaul sign is severe right-sided heart failure, regardless of etiology. Other causes include myocardial restrictive disease such as amyloidosis, tricuspid stenosis, and superior vena cava syndrome.

PRECISION OF THE CLINICAL ASSESSMENT OF CENTRAL VENOUS PRESSURE

When 2 clinicians examine the same patient once (interobserver variation), and even when 1 clinician examines the same patient twice (intraobserver variation), estimates of CVP commonly vary by up to 7 cm.4 Final-year medical students, first- and second-year medical residents, and attending physicians examined the same 50 intensive care unit patients (but were blinded to simultaneous CVP manometry) and estimated these patients’ CVPs as low (<5 cm), normal (5-10 cm), or high (>10 cm).4 Agreement between students and residents was substantial (κ, a measure of chance-corrected agreement, was 0.65), agreement between students and attending physicians was moderate (κ = 0.56), and agreement between residents and attending physicians was modest (κ = 0.30). Suggested causes for disagreement include variations in the positioning of patients, poor ambient lighting, difficulty in distinguishing carotid from venous pulsations, biological variation in CVP with the phases of respiration, and the effects of vasoactive medication and diuretics. The precision of the abdominojugular reflux test has not been reported, but its results will vary with the force of abdominal compression. Different investigators suggest different forces: Ducas et al10 compressed a semi-inflated blood pressure cuff placed in the middle of the abdomen to 35 mm Hg (equivalent to a weight of approximately 8 kg), whereas Ewy13 applied a pressure of approximately 20 mm Hg. Although no validated methods for improving precision in determining JVP have been reported, it seems prudent to standardize the procedure as described herein, encourage normal breathing, rehearse abdominal compression until the Valsalva maneuver is avoided, and gradually increase abdominal compression during a few seconds.16 Even when the Valsalva maneuver is avoided, there is still a small variation in JVP with the phases of breathing.17
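Because κ is quoted repeatedly as the measure of chance-corrected agreement, a minimal sketch of the unweighted Cohen κ calculation may help. The 3 × 3 counts below (low, normal, high CVP for 2 observers) are hypothetical and are not data from the study cited above.

```python
# Illustrative sketch only: the agreement table below is hypothetical data.
def cohen_kappa(table):
    """Unweighted Cohen kappa for a square table of counts.

    table[i][j] is the number of patients placed in category i by
    observer A and in category j by observer B.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Rows: observer A (low, normal, high); columns: observer B.
    hypothetical = [
        [8, 3, 1],
        [2, 15, 4],
        [0, 3, 14],
    ]
    print(round(cohen_kappa(hypothetical), 2))
```

Observed agreement in this made-up table is 74%, but a sizeable share of that would be expected by chance alone, which is why κ is lower than the raw agreement.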

ACCURACY OF THE CLINICAL ASSESSMENT OF CENTRAL VENOUS PRESSURE

We describe 3 studies that have reported the relation between clinical assessments of CVP and the gold standard of simultaneous pressure measurements through an indwelling central venous catheter.4,5,18 When the clinical assessment was reported as low, normal, or high, the pooled overall accuracy was 56%. In one study,4 venous pressure was assessed in each of 50 intensive care unit patients by one of 3 intensive care unit attending physicians, one of 6 medical residents, and one of 6 medical students. Although all groups tended to underestimate venous pressure, only the residents did so to a statistically significant degree. The correlation coefficient between clinical assessment and central line measured CVP was highest for medical students (0.74), slightly lower for residents (0.71), and lowest for staff physicians (0.65), and these correlations improved slightly when patients receiving mechanical ventilation were excluded. The students’ data from this study4 (Table 11-3) display the results for 2 clinical questions: “Is the patient’s true CVP low?” and “Is the patient’s true CVP high?”2 Despite small numbers of participants, it is apparent that a clinically assessed low CVP increases the likelihood by about 3-fold that the measured CVP will be low;


Table 11-3 Measured Central Venous Pressurea Is the CVP Low? Clinical Assessment

Low, CVP 5 cm

CVP low CVP normal CVP high

3 4 0

Clinical Assessment

High, CVP >10 cm

Normal or Low, CVP 18 mm Hg was considered abnormal, indicating volume overload).

MAIN OUTCOME MEASURES Sensitivity, specificity, and κ for the physical examination findings.

Test | Sensitivity | Specificity | LR+ (95% CI) | LR– (95% CI)
JVP elevation | 0.57 | 0.93 | 8.5 (1.8-49) | 0.46 (0.60-0.69)
Abdominojugular reflux | 0.81 | 0.80 | 4.0 (1.8-12) | 0.24 (0.11-0.47)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
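The likelihood ratios follow directly from sensitivity and specificity. The sketch below uses the standard definitions; the function name is illustrative, and the inputs are the rounded sensitivity (0.81) and specificity (0.80) reported above, so the output can differ slightly from values calculated from raw counts.

```python
# Illustrative sketch only: the function name is an assumption for this example.
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


if __name__ == "__main__":
    lr_pos, lr_neg = likelihood_ratios(0.81, 0.80)
    print(round(lr_pos, 2), round(lr_neg, 2))
```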

MAIN RESULTS See Table 11-7. The agreement for the presence of JVP elevation was good (κ = 0.69) but even better for abdominojugular reflux (κ = 0.92). These patients were mostly men and had a low ejection fraction (mean ejection fraction, 18%; range, 6%-39%).

CONCLUSIONS LEVEL OF EVIDENCE Level 3. STRENGTHS Precision was determined. The clinicians judged the JVP as abnormal or not. With the patient at 45 degrees from horizontal, the clinicians recorded an elevated central venous pressure (CVP) when they could visualize the jugular vein contours. LIMITATIONS The pulmonary capillary wedge pressure served as the reference standard rather than the CVP (the wedge pressure is a better indicator of volume status). Small sample size in a select group of patients. There must have been expectation bias in that most clinicians would have expected these severely affected patients to have volume overload and abnormal physical findings. This should have led to an overestimate of sensitivity and an underestimate of specificity (because more patients would have been expected to be in the first row of the 2 × 2 table). The results are consistent with those found for estimated CVP greater than 10 cm by Stein et al1 in a similar population of patients with advanced heart failure. An intriguing finding is that every patient judged to have an elevated JVP also had an abnormal abdominojugular reflux. The gain in sensitivity from the abdominojugular reflux assessment vs the JVP was offset by the loss of specificity; these tests performed similarly in this population.

REFERENCE FOR THE EVIDENCE 1. Stein JH, Neumann A, Marcus RH. Comparison of estimates of right atrial pressure by physical examination and echocardiography in patients with congestive heart failure and reasons for discrepancies. Am J Cardiol. 1997;80(12):1615-1618.

Reviewed by David L. Simel, MD, MHS

TITLE How Far Is the Sternal Angle From the Mid-Right Atrium? AUTHORS Seth R, Magner P, Matzinger F, van Walraven C. CITATION J Gen Intern Med. 2002;17(11):861-865. QUESTION Is the recommendation to add 5 cm to the jugular venous pulse to estimate the central venous pressure valid? DESIGN Convenience sample of patients undergoing computed tomography of the chest.

SETTING Imaging unit in a Canadian university hospital. PATIENTS One hundred sixty of 333 potentially eligible patients. Patients with chest deformities, large habitus prohibiting landmark identification, and refusals to participate were excluded.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD High-speed chest computed tomography (CT) scans were performed on patients while they were in the lateral supine and 90-degree positions during an end-inspiratory breath hold. The authors assumed that the mid-right atrium was 2 cm below the superior vena cava and right atrial junction.

MAIN OUTCOME MEASURE Measured distance from the sternal angle to the mid-right atrium for supine and 90-degree positions. The investigators used geometric calculations to determine the distance between the sternal angle and the mid-right atrium at 30 degrees, 45 degrees, and 60 degrees.

MAIN RESULTS With the patient supine, the median distance between the sternal angle and right atrium was 5.4 cm (interquartile range, 4.6-6.1 cm). However, when the patient was at 90 degrees, the median distance was 8.3 cm (interquartile range, 7-9.6 cm). Between 30- and 60-degree elevation (the elevation typically used in clinical assessments), the median calculated distance was approximately 8 cm.

CONCLUSIONS LEVEL OF EVIDENCE Not a diagnostic test study. STRENGTHS Large sample, asking an important question about the assumptions necessary for the clinical examination. LIMITATIONS The authors had to make their own assumption about the position of the mid-right atrium. The CT scans were done in a population of patients primarily with lung or thoracic disease (eg, carcinoma). This is a clever and basic study to test an assumption underlying the clinical examination. The decision to add 5 cm to the estimation of the jugular venous pressure (JVP) makes sense when the patient is supine. However, clinicians almost never assess the JVP in the supine patient. The authors found a median distance of 8 cm in positions typically used during the clinical examination. Thus, clinicians using the JVP would underestimate the central venous pressure (CVP) by 3 cm. The implication of this is that any patient with a JVP 3 cm above the horizontal should be considered as having a high CVP because the likely CVP will be more than 10 cm. This recommendation is consistent with data from the Stein et al1 study that found the JVP leads to underestimates of around 5 cm for patients with elevated CVP.

REFERENCE FOR THE EVIDENCE 1. Stein JH, Neumann A, Marcus RH. Comparison of estimates of right atrial pressure by physical examination and echocardiography in patients with congestive heart failure and reasons for discrepancies. Am J Cardiol. 1997;80(12):1615-1618.

Reviewed by David L. Simel, MD, MHS
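The size of the underestimate described in this summary can be checked with a few lines of arithmetic. The sketch below uses the median distances reported by Seth et al (5.4 cm supine, about 8 cm at 30 to 60 degrees, 8.3 cm at 90 degrees); the dictionary keys and function name are hypothetical, and applying a population median to one patient is itself an approximation.

```python
# Illustrative sketch only: names are assumptions; distances are the medians quoted above.
MEDIAN_DEPTH_CM = {
    "supine": 5.4,
    "30-60 degrees": 8.0,
    "90 degrees": 8.3,
}
TRADITIONAL_DEPTH_CM = 5.0


def cvp_from_jvp(jvp_cm, position):
    """Estimate CVP by adding the position-specific median depth to the JVP."""
    return jvp_cm + MEDIAN_DEPTH_CM[position]


def underestimate_cm(position):
    """Gap between the position-specific depth and the traditional 5-cm rule."""
    return MEDIAN_DEPTH_CM[position] - TRADITIONAL_DEPTH_CM


if __name__ == "__main__":
    for position in MEDIAN_DEPTH_CM:
        print(position, cvp_from_jvp(3.0, position), round(underestimate_cm(position), 1))
```

With a JVP of 3 cm at the usual examination angles, the estimate comes to about 11 cm, which is why a 3-cm JVP is treated as evidence of a high CVP.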


TITLE Comparison of Estimates of Right Atrial Pressure by Physical Examination and Echocardiography in Patients With Congestive Heart Failure and Reasons for Discrepancies. AUTHORS Stein JH, Neumann A, Marcus RH. CITATION Am J Cardiol. 1997;80(12):1615-1618. QUESTION Among patients with severe congestive heart failure, how closely do cardiologists predict the central venous pressure? DESIGN Consecutive, prospective.

SETTING Cardiac catheterization laboratory, Chicago, Illinois. PATIENTS Twenty-two patients with an average left ventricular ejection fraction of 19% (range, 12%-29%).

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The central venous pressure (CVP) was measured from the jugular venous pressure (JVP) by identifying the peak JVP. A centimeter ruler placed vertically on the sternal angle was used to measure the distance at an intersection with a horizontal straight edge placed at the JVP. To estimate the CVP, 5 cm was added to the vertical distance. A right-sided heart catheterization was performed immediately thereafter.

MAIN OUTCOME MEASURE Correlation between clinical estimate of the CVP and the invasive measurement. The data are displayed in a scatterplot, so that lines can be drawn to extract the raw results.

MAIN RESULTS The correlation between the raw clinical estimate and the invasive measure was 0.92. The clinical estimates systematically underestimated the actual value. The bias was least for those with clinical estimates of less than 8 cm (correlation was near perfect), but the underestimation became more pronounced as the clinician estimated a higher CVP from the JVP. With estimates of 9 to 14 cm, the clinicians underestimated the true CVP by 5.0 cm. Dichotomizing the data for a clinical estimate of 8 cm or more and extracting the results from the scatterplot reveals the likelihood ratios (LRs) in Table 11-8.

Table 11-8 Likelihood Ratio of Clinically Estimated CVP

Clinical Estimate | Invasive CVP | LR+ (95% CI) | LR– (95% CI)
Clinical estimate of CVP > 10 cm | CVP ≥ 10 cm | 11 (0.73-157) | 0.25 (0.10-0.64)
Clinical estimate of CVP > 8 cm | CVP ≥ 8 cm | 1.6 (0.98-3.7) | 0.18 (0.03-1.1)

Abbreviations: CI, confidence interval; CVP, central venous pressure; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

CONCLUSIONS LEVEL OF EVIDENCE Level 3. STRENGTHS Objective reference standard done immediately after the CVP was assessed. LIMITATIONS Small population of patients in a narrow spectrum of disease. The examiners were specialists. These data are most useful for validating the concept that clinical assessment of the CVP, by measuring the vertical distance to the JVP, and then adding 5 cm, will systematically underestimate the true pressure. Because the population of patients was small, the confidence intervals around the estimates using a cut point of 10 cm are very wide for the positive LR. Every patient with a clinical estimate of more than 10 cm CVP had the result confirmed by the invasive test. However, it seems likely that cardiologists estimating a high pressure (>10 cm) in a population of patients with low ejection fractions are usually going to be correct. It becomes much more difficult to identify patients with volume overload when lower clinical and invasive thresholds are used. Reviews of clinical assessments of CVP recommend that clinicians use a clinical estimate of 8 cm as their threshold for assessing a high pressure.

Reviewed by David L. Simel, MD, MHS
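To show how the likelihood ratios in Table 11-8 would be used at the bedside, the sketch below converts a pretest probability to a posttest probability through odds. The 50% pretest probability is a hypothetical value chosen only for illustration; the LRs of 1.6 and 0.18 are the point estimates for the 8-cm threshold.

```python
# Illustrative sketch only: the pretest probability is a hypothetical input.
def posttest_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability using the odds form of Bayes' theorem."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


if __name__ == "__main__":
    pretest = 0.50  # hypothetical probability of elevated CVP before the examination
    print(round(posttest_probability(pretest, 1.6), 2))   # clinical estimate > 8 cm
    print(round(posttest_probability(pretest, 0.18), 2))  # clinical estimate <= 8 cm
```

The wide confidence intervals in the table mean that the resulting probabilities are rough guides only.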


CHAPTER 12

Does This Patient Have Acute Cholecystitis?

Robert L. Trowbridge, MD
Nicole K. Rutkowski, MD
Kaveh G. Shojania, MD

CLINICAL SCENARIO

A 72-year-old woman with poorly controlled diabetes, coronary artery disease, and hypertension presents to the emergency department complaining of nausea and vomiting. As an emergency department resident, you elicit the history that the patient felt well until 24 hours ago, when she developed anorexia, followed rapidly by bilious emesis. She describes mild upper abdominal discomfort but is unable to further localize the pain. There have been no abnormal bowel movements, gastrointestinal bleeding, or chest pain. The patient is febrile (39°C) and appears uncomfortable. Her lungs are clear, and cardiac examination reveals only a fourth heart sound. There is moderate epigastric tenderness and guarding throughout the abdomen, but no rigidity. Pelvic and rectal examination results are unremarkable. Electrocardiography shows no changes suggestive of ischemia. Laboratory testing shows a leukocyte count of 17.5 × 10³/μL, serum transaminase levels twice the upper limit of normal, and a total bilirubin level of 3.2 mg/dL. In considering the differential diagnosis for the patient’s presenting complaint and laboratory results, you wonder whether the suspicion of acute cholecystitis is high enough to warrant further testing.

WHY IS THIS QUESTION IMPORTANT?

Acute cholecystitis accounts for 3% to 9% of hospital admissions for acute abdominal pain.1-4 Most patients presenting with upper abdominal complaints are subsequently found to have a relatively benign cause of pain (eg, dyspepsia or gastroenteritis),2,5 but the possibility of acute cholecystitis mandates the completion of a comprehensive and at times laborious diagnostic evaluation. The importance of this clinical dilemma is only magnified by the frequency with which abdominal pain is encountered in clinical practice.6-8 Traditionally, the diagnosis of acute cholecystitis was followed by a several-week “cooling off” period before proceeding to surgery. Most clinicians now advocate early cholecystectomy (ie, within several days of the onset of symptoms),9 because it leads to lower complication rates, reduced costs, and shortened recovery periods.10-14

DEFINITION OF CHOLECYSTITIS

Defining cholecystitis as “inflammation of the gallbladder” implies a pathologic state. What clinicians usually mean by acute cholecystitis, however, is the presence of this pathologic state (seen macroscopically at laparotomy or microscopically by the pathologist) in the setting of a plausibly related clinical presentation. Practically speaking, cholecystitis is a syndrome encompassing a continuum of clinicopathologic states. At one end of this continuum is symptomatic cholelithiasis, with acute attacks of pain (biliary colic) that resolve in 4 to 6 hours. At the other end, that which is typically associated with the term acute cholecystitis, is a clinical picture in which biliary colic is longer lasting and accompanied by fever, laboratory markers of inflammation, or cholestasis.15,16 Gallbladder inflammation without gallstones (ie, acalculous cholecystitis) typically occurs in critically ill patients and is consequently associated with a high mortality rate.17,18

Copyright © 2009 by the American Medical Association.

HOW TO ELICIT THE RELEVANT SIGNS AND SYMPTOMS

Cope’s Early Diagnosis of the Acute Abdomen15 points out that “biliary colic” is a misnomer because biliary obstruction produces pain of a steady, nonparoxysmal nature. A majority of studies have explicitly defined biliary colic in similar terms (eg, a steady right upper quadrant pain lasting for at least 30 minutes), but others have used the term without definition.19 Cope’s15 also stresses that biliary colic localizes to the midepigastrium as often as to the right upper quadrant. A recent systematic review19 supports this observation because “upper abdominal pain” exhibited test characteristics comparable to right upper quadrant pain. Thus, the clinician should inquire about both pain in the right upper quadrant and, more generally, pain in the upper abdomen. The clinician should also ask the patient about fat intolerance because abdominal discomfort after fatty meals may have a predictive value similar to that of biliary colic.19

Physical findings most famously associated with the gallbladder are the Courvoisier and Murphy signs. The Courvoisier sign has evolved in meaning,20 but standard definitions describe the sign as referring to a palpable, nontender gallbladder in a patient with jaundice.21,22 Courvoisier observed that dilation of the gallbladder occurred more commonly when obstruction resulted from malignancy, rather than from benign conditions such as gallstones. Although this association is real, the sign should not be elevated to the status of a “law,”20-22 because recent reports confirm the occurrence of the Courvoisier sign in biliary conditions other than obstructive malignancies.23 The Murphy sign refers to pain and arrested inspiration occurring when the patient inspires deeply while the examiner’s fingers are hooked underneath the right costal margin.21,22,24 Data addressing the usefulness of the Murphy sign in evaluating patients suspected of having acute cholecystitis are discussed along with other findings from the systematic review presented below.

The only other physical sign we identified as specifically associated with acute cholecystitis was the Boas sign. Originally, this sign referred to point tenderness in the region to the right of the 10th to 12th thoracic vertebrae,25-27 but contemporary sources describe hyperesthesia to light touch in the right upper quadrant or infrascapular area.22 One study28 reported that 7% of patients undergoing cholecystectomy exhibited hyperesthesia in this region, but no patient exhibited the Boas sign in the original sense. None of the other studies reviewed below assessed the Boas sign in either form.

ACCURACY OF DIAGNOSTIC IMAGING

Ultrasonography of the right upper quadrant has emerged as the most commonly used imaging modality for suspected cholecystitis. Meta-analysis of the diagnostic performance of ultrasonography in detecting acute cholecystitis indicated an unadjusted sensitivity and specificity of 94% and 78%, respectively.29 The investigators included in their analysis adjustments for verification bias30-32 (also called workup bias33), which refers to the distorted diagnostic test characteristics observed when the decision to proceed with a gold standard test (eg, cholecystectomy) is affected by the results of preliminary tests such as right upper quadrant ultrasonography. Patients with a negative ultrasonography result will undergo cholecystectomy only in the setting of extremely typical clinical findings. The consequent loss of patients with atypical clinical presentations reduces the opportunity for false-negative ultrasonography results, thus inflating the apparent sensitivity of ultrasonography and its associated “rule-out” power. Conversely, specificity and the associated “rule-in” ability of ultrasonography are underestimated. Adjustments for the effects of verification bias in the abovementioned meta-analysis29 indicated that ultrasonography detects acute cholecystitis with sensitivity of 88% (95% confidence interval [CI], 74%-100%) and specificity of 80% (95% CI, 62%-98%). Sensitivity for the detection of cholelithiasis was comparable, but specificity was higher, at approximately 99%.
Radionuclide scanning has slightly better test characteristics for the diagnosis of acute cholecystitis but offers no evaluation of alternative abdominal diagnoses and has the disadvantages of greater inconvenience and patient exposure to radionuclides.29 Computed tomography of the abdomen, although useful for the evaluation of suspected complications and concurrent intra-abdominal conditions, is inferior to ultrasonography in the assessment of acute biliary disease.34,35
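The direction of verification bias described above can be illustrated with a small numerical sketch. It assumes a hypothetical cohort in which every patient with a positive ultrasonography result, but only a fraction of those with a negative result, goes on to the gold standard; the cohort size, prevalence, and verification fractions are invented, and the model is deliberately simpler than the adjustment methods used in the cited meta-analysis.

```python
# Illustrative sketch only: cohort size and verification fractions are hypothetical.
def apparent_accuracy(n_diseased, n_healthy, sensitivity, specificity,
                      verify_positive=1.0, verify_negative=0.2):
    """Return (apparent sensitivity, apparent specificity) among verified patients.

    verify_positive and verify_negative are the fractions of test-positive and
    test-negative patients who receive the gold standard test.
    """
    true_pos = n_diseased * sensitivity
    false_neg = n_diseased - true_pos
    true_neg = n_healthy * specificity
    false_pos = n_healthy - true_neg

    # Only verified patients appear in the published 2 x 2 table.
    v_tp = true_pos * verify_positive
    v_fp = false_pos * verify_positive
    v_fn = false_neg * verify_negative
    v_tn = true_neg * verify_negative

    return v_tp / (v_tp + v_fn), v_tn / (v_tn + v_fp)


if __name__ == "__main__":
    sens, spec = apparent_accuracy(300, 700, sensitivity=0.88, specificity=0.80)
    print(round(sens, 2), round(spec, 2))
```

Under these assumptions the apparent sensitivity rises above the true 88% and the apparent specificity falls below the true 80%, which matches the direction of bias described in the text.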

METHODS

The initial electronic search queried the MEDLINE database for January 1966 through November 2000 (limited to English-language articles) using the Medical Subject Headings (MeSH) “acute abdomen,” “abdominal pain,” “cholecystitis,” “cholelithiasis,” “gallbladder,” and “gallbladder diseases.” These terms were then combined with various combinations of MeSH terms, title words, and text words: “physical examination,” “medical history taking,” “professional competence,” “sensitivity and specificity,” “reproducibility of results,” “observer variation,” “diagnostic tests,” “decision support techniques,” “Bayes theorem,” “predictive value of tests,” “palpation,” “percussion,” “differential diagnosis,” and “diagnostic errors.” The Science Citation Index and Cochrane Library were also searched, and a hand search of Index Medicus was conducted for 1950 through 1965, using the terms “cholecystitis,” “acute abdomen,” and “gallbladder.” Bibliographies of identified articles were searched for additional pertinent articles, as were the bibliographies of prominent textbooks of physical examination, surgery, and gastroenterology. An electronic search of MEDLINE was repeated in July 2002 to look for any relevant articles appearing since completion of the more comprehensive search.

Two authors (RT and NR) independently abstracted data from the identified studies, and all 3 authors reviewed these data for inclusion. Included studies evaluated the role of a clinical test (including medical history, physical examination, and basic laboratory tests) in adult patients with abdominal pain or suspected acute cholecystitis. Included studies were also required to report data from a control group of patients subsequently found not to have acute cholecystitis, with sufficient detail to allow construction of 2 × 2 tables. Finally, studies were required to define cholecystitis according to an adequate gold standard, including surgery, pathologic examination, radiographic imaging (hepatic iminodiacetic acid [HIDA] scan or right upper quadrant ultrasonography), or clinical follow-up documenting a course consistent with acute cholecystitis and without evidence for an alternate diagnosis.

Summary measures for the sensitivity of the evaluated components of the clinical examination and basic laboratory tests for cholecystitis were derived from published raw data from the reported studies meeting our inclusion criteria. A random-effects model was used to generate conservative summary measures and CIs for the sensitivity and likelihood ratios (LRs).36-38 For LRs, a summary measure is reported only when more than 2 studies were identified; otherwise, a range was reported.
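The chapter does not state which random-effects estimator was used. As one common possibility, the sketch below pools log likelihood ratios with the DerSimonian-Laird method; the choice of estimator, the function name, and the study values are all assumptions made for illustration, not the authors' data or procedure.

```python
# Illustrative sketch only: DerSimonian-Laird is an assumed choice of estimator,
# and the per-study values below are hypothetical.
import math


def dersimonian_laird(effects, variances):
    """Pool study effects (eg, log LRs) with a DerSimonian-Laird random-effects model.

    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran Q statistic measures heterogeneity around the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    standard_error = math.sqrt(1.0 / sum(re_weights))
    return pooled, standard_error, tau2


if __name__ == "__main__":
    log_lrs = [math.log(1.2), math.log(2.5), math.log(4.0)]  # hypothetical studies
    variances = [0.05, 0.10, 0.20]                           # hypothetical variances
    pooled, se, tau2 = dersimonian_laird(log_lrs, variances)
    low, high = pooled - 1.96 * se, pooled + 1.96 * se
    print(round(math.exp(pooled), 2), round(math.exp(low), 2), round(math.exp(high), 2))
```

When studies disagree, the between-study variance widens the confidence interval, which is consistent with the description of random-effects summaries as conservative.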

RESULTS

Of 195 studies identified by our search, 17 evaluated the role of the clinical examination or basic laboratory test in patients with acute abdominal pain and possible acute cholecystitis and also met our inclusion criteria (Table 12-1).39-55

Table 12-1 Studies of the Diagnostic Performance of Clinical and Laboratory Findings in Detecting Acute Cholecystitis

Source | Study Period | Selection Criteria | Design | Sample Size | Consecutive Patients | Basis for Diagnosis
Adedeji and McAdam,39 1996 | 1985-1990 | Acute abdominal pain and age > 70 y | Retrospective | 431 | Yes | Clinical follow-up
Bednarz et al,40 1986 | 1983-1984 | Suspected acute cholecystitis and referred for HIDA scan | Prospective | 70 | Yes | Surgery (43%), Clinical impression (57%)
Brewer et al,41 1976 | 1971-1972 | Abdominal pain | Retrospective | 570 | Yes | Multiple
Dunlop et al,42 1989 | 1982-1986 | Acute abdominal pain and suspected acute cholecystitis | Prospective | 270 | Yes | Pathology (71%), Clinical impression (29%)
Eikman et al,43 1975 | Not stated | Suspected acute cholecystitis and referred for radiology testing | Prospective | 38 | Yes | Surgical (38%), Clinical impression (62%)
Gruber et al,44 1996 | 1990-1993 | Positive HIDA scan results and underwent surgery for suspected acute cholecystitis | Retrospective | 198 | Yes | Pathology
Halasz,45 1975 | 1969-1974 | Suspected acute cholecystitis | Retrospective | 238 | Yes | Surgery (65%), Other (35%)ᵃ
Johnson and Cooper,46 1995 | Not stated | Positive HIDA scan results and underwent surgery for suspected acute cholecystitis | Retrospective | 69 | No | Pathology
Juvonen et al,47 1992 | 1988-1989 | Suspected acute cholecystitis referred for ultrasonography | Prospective | 129 | Yes | Pathology (95%), Ultrasonography (5%)
Liddington and Thomson,48 1991 | Not stated | Abdominal pain | Prospective | 142 | No | Clinical impression
Lindenauer and Child,49 1966 | 1959-1964 | Underwent cholecystectomy | Retrospective | 200 | No | Pathology
Potts and Vukov,50 1999 | 1992-1995 | Abdominal pain requiring operation and age > 80 y | Retrospective | 117 | Yes | Pathology
Prevot et al,51 1999 | 1997-1999 | ICU patients with suspected acute acalculous cholecystitis | Prospective | 32 | Yes | Pathology (50%), Clinical impression (50%)
Raine and Gunn,52 1975 | 1965-1973 | Suspected acute cholecystitis and underwent surgery | Prospective | 156 | Yes | Pathology
Schofield et al,53 1986 | Not stated | Abdominal pain and suspected acute cholecystitis | Prospective | 100 | Yes | Gallstones at laparotomy
Singer et al,54 1996 | 1993 | Suspected acute cholecystitis and radiology testing completed | Retrospective | 100 | Yes | Pathology (44%), HIDA scintigraphy (56%)
Staniland et al,55 1972 | Not stated | Admission for abdominal pain of < 1 wk | Retrospective | 600 | No | Surgery

Abbreviations: HIDA, hepatic iminodiacetic acid; ICU, intensive care unit.
ᵃRadiology testing and clinical follow-up; exact proportions not specified.


Twelve of these studies40,42-47,49,51-54 enrolled patients specifically suspected of having acute cholecystitis, with inclusion of many of these studies based on patient referral for radiology testing (ie, HIDA scan or right upper quadrant ultrasonography) for the confirmation of a clinical diagnosis. The remaining 5 studies39,41,48,50,55 enrolled patients presenting with abdominal pain and did not require a specific suspicion of acute cholecystitis for patient inclusion. Each of the 17 studies evaluated a variable number of clinical and laboratory findings included in the evaluation of suspected cholecystitis, ranging from 1 to 9 characteristics per study (Table 12-2).

Precision of Signs and Symptoms

Measurements of laboratory characteristics and objective clinical signs such as temperature are assumed to have high precision, but the reproducibility of other aspects of the clinical examination for cholecystitis remains largely unknown. In fact, the only study identified as assessing the precision of some aspect of the clinical examination for biliary disease was an evaluation of the diagnostic value of iridology56 (iridologists believe that intricate neural connections between major organs and the iris permit diagnosis of general medical conditions through inspection of iris pigmentation patterns57,58). In this relatively well-designed study, the accuracy and precision of iridologic signs for the diagnosis of cholecystitis were barely distinguishable from values expected by chance alone (κ = –0.06 to 0.28 for the 10 possible observer pairs). Unfortunately, analogous studies have not been carried out with conventional clinical maneuvers related to the

Table 12-2 Summary Test Characteristics for Clinical and Laboratory Findings in Included Studies

| Finding (No. of Studies) | References | No. of Patientsᵃ | Sensitivity (95% CI) | Specificity (95% CI) | Summary LR+ᵇ (95% CI) | Summary LR–ᵇ (95% CI) |
|---|---|---|---|---|---|---|
| Clinical | | | | | | |
| Anorexia (2) | 41,55 | 1135 | 0.65 (0.57-0.73) | 0.50 (0.49-0.51) | 1.1-1.7 | 0.5-0.9 |
| Emesis (4) | 41,46,53,55 | 1338 | 0.71 (0.65-0.76) | 0.53 (0.52-0.55) | 1.5 (1.1-2.1) | 0.6 (0.3-0.9) |
| Fever (>35°C) (8) | 40,41,44,46,50-53 | 1292 | 0.35 (0.31-0.38) | 0.80 (0.78-0.82) | 1.5 (1.0-2.3) | 0.9 (0.8-1.0) |
| Guarding (2) | 41,55 | 1170 | 0.45 (0.37-0.54) | 0.70 (0.69-0.71) | 1.1-2.8 | 0.5-1.0 |
| Murphy sign (3) | 39,46,54 | 565 | 0.65 (0.58-0.71) | 0.87 (0.85-0.89) | 2.8 (0.8-8.6) | 0.5 (0.2-1.0) |
| Nausea (2) | 46,54 | 669 | 0.77 (0.69-0.83) | 0.36 (0.34-0.38) | 1.0-1.2 | 0.6-1.0 |
| Rebound (4) | 40,41,48,55 | 1381 | 0.30 (0.23-0.37) | 0.68 (0.67-0.69) | 1.0 (0.6-1.7) | 1.0 (0.8-1.4) |
| Rectal tenderness (2) | 41,55 | 1170 | 0.08 (0.04-0.14) | 0.82 (0.81-0.83) | 0.3-0.7 | 1.0-1.3 |
| Rigidity (2) | 41,55 | 1140 | 0.11 (0.06-0.18) | 0.87 (0.86-0.87) | 0.50-2.32 | 1.0-1.2 |
| Right upper abdominal quadrant | | | | | | |
| Mass (4) | 40,45,53,54 | 408 | 0.21 (0.18-0.23) | 0.80 (0.75-0.85) | 0.8 (0.5-1.2) | 1.0 (0.9-1.1) |
| Pain (5) | 40,45,46,54,55 | 949 | 0.81 (0.78-0.85) | 0.67 (0.65-0.69) | 1.5 (0.9-2.5) | 0.7 (0.3-1.6) |
| Tenderness (4) | 40,45,54,55 | 1001 | 0.77 (0.73-0.81) | 0.54 (0.52-0.56) | 1.6 (1.0-2.5) | 0.4 (0.2-1.1) |
| Laboratory | | | | | | |
| Alkaline phosphatase > 120 U/L (4) | 42,46,49,51 | 556 | 0.45 (0.41-0.49) | 0.52 (0.47-0.57) | 0.8 (0.4-1.6) | 1.1 (0.6-2.0) |
| Elevated ALT or AST levelᶜ (5) | 42,46,49,51,53 | 592 | 0.38 (0.35-0.42) | 0.62 (0.57-0.67) | 1.0 (0.5-2.0) | 1.0 (0.8-1.4) |
| Total bilirubin > 2 mg/dL (6) | 40,42,43,46,49,51 | 674 | 0.45 (0.41-0.49) | 0.63 (0.59-0.66) | 1.3 (0.7-2.3) | 0.9 (0.7-1.2) |
| Total bilirubin, AST, or alkaline phosphatase (1) | 52 | 270 | | | | |
| All 3 elevated | | | 0.34 (0.30-0.36) | 0.80 (0.69-0.88) | 1.6 (1.0-2.8) | 0.8 (0.8-0.9) |
| Any 1 elevated | | | 0.70 (0.67-0.73) | 0.42 (0.31-0.53) | 1.2 (1.0-1.5) | 0.7 (0.6-0.9) |
| Leukocytosisᵈ (7) | 41,44,46,50-53 | 1197 | 0.63 (0.60-0.67) | 0.57 (0.54-0.59) | 1.5 (1.2-1.9) | 0.6 (0.5-1.8) |
| Leukocytosisᵈ and fever (2) | 44,52 | 351 | 0.24 (0.21-0.26) | 0.85 (0.76-0.91) | 1.6 (0.9-2.8) | 0.9 (0.8-1.0) |

Abbreviations: ALT, alanine aminotransferase; AST, aspartate aminotransferase; CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
ᵃ May not equal sums of N in Table 12-1 because not all studies applied all tests to all patients.
ᵇ Summary measures provided only for findings discussed by more than 2 studies.
ᶜ Greater than upper limit of normal (ALT, 40 U/L; AST, 48 U/L).
ᵈ White blood cell count of more than 10 000/μL.

diagnosis of cholecystitis. In fact, as observed in a previous article in this series,59 the precision of even the most basic components of the abdominal examination (eg, guarding, rigidity, and rebound tenderness) remains uncharacterized. Poor reproducibility for abdominal examination would erode the assessments of sensitivity and specificity provided by different investigators. Presumably, then, one can infer a certain degree of interrater reliability from the fact that multiple studies demonstrate modest sensitivity for these signs in diagnosing important abdominal conditions.59 Nonetheless, further assessments of core components of the abdominal examination would be a welcome addition to the literature.

Accuracy of Signs and Symptoms

No single clinical or laboratory finding had an LR– sufficiently low to rule out the diagnosis of acute cholecystitis (Table 12-2). Even the absence of right upper quadrant tenderness, with an LR– of 0.4, does not rule out acute cholecystitis. Elderly patients may be particularly prone to present without signs or symptoms referable to the right upper quadrant.60 Similarly, individual symptoms, signs, and laboratory results did not have LR+s sufficiently high to rule in the diagnosis of acute cholecystitis. In fact, none of the LR+s were more than 2.0, with the exception of the Murphy sign, which was associated with an LR+ of 2.8. The 95% CI for this summary estimate included 1.0, but the use of the Murphy sign was especially prone to verification bias. Thus, the true LR+ might exceed the estimated value.
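To make these likelihood ratios concrete, the following minimal sketch converts a pretest probability to a posttest probability through odds. It assumes a pretest probability of 5% (the upper end of the prevalence quoted in this chapter for patients with abdominal pain) and uses the summary LRs from Table 12-2; the function and variable names are illustrative only.

```python
def posttest_probability(pretest: float, lr: float) -> float:
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1.0 + posttest_odds)


# Assumption: 5% pretest probability among patients with abdominal pain.
PRETEST = 0.05

# Summary LRs taken from Table 12-2.
murphy_positive = posttest_probability(PRETEST, 2.8)        # about 13%
ruq_tenderness_absent = posttest_probability(PRETEST, 0.4)  # about 2%

print(f"Positive Murphy sign:   {murphy_positive:.1%}")
print(f"No RUQ tenderness:      {ruq_tenderness_absent:.1%}")
```

Under these assumptions, neither finding moves the probability far enough on its own to confirm or exclude the diagnosis, which is consistent with the conclusion above.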

Limitations of the Literature

The problem of verification (or workup) bias30-33 was discussed in the section on diagnostic imaging but likely affected all of the clinical and laboratory findings assessed in this review. Patients with upper abdominal tenderness, fever, abnormal liver function results, or other “typical” findings more commonly undergo further evaluation (eg, diagnostic imaging) for acute cholecystitis than do patients presenting without these findings. The lack of patients with atypical presentations in studies leads to overestimates of sensitivity and underestimates of specificity. Supplementing the diagnosis of cholecystitis with clinical follow-up would mitigate the effects of verification bias, but only 1 study39 incorporated clinical follow-up in the diagnostic protocol.

Spectrum bias61 (or, more recently, spectrum effect62) distorts test characteristics since there is inadequate representation of the relevant disease and disease-free states in the patient samples used to evaluate the test of interest. The prevalence of cholecystitis in the study populations was as high as 80% and averaged 41%, in contrast to the prevalence of 3% to 5% among patients presenting with abdominal pain of less than 1 week’s duration.1,2,41 Subgroup analysis can generate values for sensitivity and specificity in patient populations with substantially different previous likelihoods of disease from the average value.62 Because available data often do not permit such analysis, one has to make qualitative inferences about the difference between the prior probability of disease in a particular patient and the prevalence in the population used to evaluate the test. For instance, a high prevalence of cholecystitis in clinical reports reduces the opportunity to detect both false-positive and true-negative results compared to the findings in patients with a lower prevalence of disease. Thus, clinical findings and laboratory tests used to evaluate cholecystitis may have different sensitivity and specificity than suggested in the available literature.

Other limitations to the existing literature include the retrospective design of most studies, modest sample sizes, unblinded assessment of key outcomes and test results, and the variability in criteria for establishing a diagnosis of cholecystitis. The included studies varied between accepting clinicians’ diagnostic impressions (usually incorporating imaging results), findings at laparotomy, and pathologic findings as the means of diagnosis. Unfortunately, the correlation between clinical and pathologic diagnoses of cholecystitis is poor.63 Gallstones occur commonly enough that their presence, even in the context of inflammatory cells, may be “true but unrelated” with respect to the patient’s acute presentation. Overdiagnosis from this and other available gold standards likely resulted in an overestimation of the prevalence of acute cholecystitis, with consequent distortion of the usefulness of clinical and basic laboratory findings.

Finally, studies assessing both calculous and acalculous cholecystitis were included in the review. Although these entities share many clinical traits, the nonspecific presentation of acalculous cholecystitis likely eroded the value of several clinical findings.
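The direction of verification bias can be illustrated with a small worked example. The sketch below uses an entirely hypothetical cohort (the counts and verification fractions are invented for illustration and do not come from any included study): patients with a positive finding are verified against the reference standard far more often than patients with a negative finding, and sensitivity and specificity are then recomputed from the verified patients only.

```python
def sens_spec(tp: float, fn: float, fp: float, tn: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) from the cells of a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical full cohort in which every patient's disease status is known.
tp, fn, fp, tn = 60, 40, 20, 80
true_sens, true_spec = sens_spec(tp, fn, fp, tn)  # 0.60 and 0.80

# Hypothetical verification fractions: finding-positive patients are worked up
# 90% of the time, finding-negative patients only 30% of the time.
verify_pos, verify_neg = 0.9, 0.3

# Only verified patients enter the published 2x2 table.
obs_sens, obs_spec = sens_spec(
    tp * verify_pos, fn * verify_neg, fp * verify_pos, tn * verify_neg
)

print(f"True:     sensitivity {true_sens:.2f}, specificity {true_spec:.2f}")
print(f"Observed: sensitivity {obs_sens:.2f}, specificity {obs_spec:.2f}")
```

With these invented numbers, observed sensitivity rises from 0.60 to about 0.82 and observed specificity falls from 0.80 to about 0.57, matching the direction of distortion described above.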

Combinations of Findings and the Clinical “Gestalt”

Even with the above limitations, it seems unlikely that individual clinical or laboratory findings have LR+ or LR– of sufficient magnitude to play a decisive role in the diagnosis of acute cholecystitis. Thus, one might look to combinations of clinical signs and symptoms to facilitate, confirm, or exclude the diagnosis of cholecystitis. Unfortunately, only 3 included studies42,44,52 specifically evaluated the value of such combinations. Two studies evaluated the combination of fever and leukocytosis; the third reviewed various combinations of liver function tests. Assessments of the LRs of the above combinations demonstrated no benefit over their individual components, suggesting that these tests did not function independently of one another. Indeed, fever and leukocytosis may be seen as different manifestations of the same underlying process of nonspecific inflammation, so it is not surprising that combining them provided no synergistic diagnostic value. Similarly, right upper quadrant pain and the Murphy sign likely reflect the same underlying pathophysiologic process (ie, local inflammation and peritoneal irritation), so that these findings would not be expected to function independently of one another.


Although the existing literature does not identify specific clinically useful combinations of findings, the effect of such combinations can be estimated with available data. In 2 randomized trials of early vs delayed cholecystectomy,13,14 laparotomy failed to confirm the preoperative diagnosis of acute cholecystitis in 5 of 99 patients (95% CI, 1.9-12)14 and in 0 of 104 patients (95% CI, 0-4.4).13 Given a likely bias toward confirming the preoperative diagnosis, let us assume that the actual false-positive rate for the clinical diagnosis of cholecystitis is higher (eg, 15%) than suggested by these values. A 15% false-positive rate would imply an 85% posttest probability for all clinical, laboratory, and radiologic tests. We know that ultrasonography of the right upper quadrant has a sensitivity and specificity of 88% and 80%, respectively.29 Working backward, we can infer that the composite clinical evaluation generates a pretest probability of approximately 60% before the results of ultrasonography are obtained. This posttest probability of 60% for the clinical suspicion of cholecystitis reflects the diagnostic power of the clinical evaluation before ultrasonography, as well as the pretest probability. At this stage in the diagnostic process, the pretest probability reflects the prevalence of the diagnosis, which is approximately 5% among patients presenting to the emergency department with abdominal pain.1,2,41 Thus, the clinical diagnosis of acute cholecystitis formulated according to medical history, physical examination, and basic laboratory testing must increase the pretest probability from 5% to 60%. Achieving this increase in pretest probability requires that the gestalt comprising certain clinical and laboratory findings have an LR+ on the order of 25 to 30. To put this range in perspective, “typical angina” has an LR+ of 115 for the diagnosis of coronary artery stenosis greater than 75% in adult men. Nonsloping depression of the ST segment of at least 2.5 mm during exercise electrocardiography has an LR+ of 39 for the same diagnosis.64 Thus, our estimate for the diagnostic usefulness of the clinical gestalt in diagnosing acute cholecystitis, approximate and speculative as it is, confirms the impression of many clinicians that the overall clinical assessment plays a crucial role in arriving at a diagnosis.

It is tempting to supplement the existing literature by asking experts for their opinion on which specific findings drive the clinical impression for or against acute cholecystitis. Unfortunately, discerning the key elements of the clinical assessment can prove deceptive, even for experienced clinicians. For instance, a recent clinical model for the prediction of pulmonary embolism omits hypoxemia and pleurisy from the algorithm for determining pretest probability.65,66 Similarly, many of the classic descriptors of angina have surprisingly little influence on the assessment of chest pain.67 This dissociation between commonly accepted harbingers of disease and evidence-based determinants of disease probability undermines the role of expert opinion in identifying key clinical findings even for common conditions. Consequently, tempting as it is to open the “black box” of the clinical gestalt for cholecystitis, doing so will require further study of specific clinical findings or, more likely, combinations of findings.
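The back-calculation of the gestalt's likelihood ratio can be reproduced step by step. The sketch below uses only the figures stated in the text (an assumed 85% posttest probability after ultrasonography, ultrasonography sensitivity of 88% and specificity of 80%, and a 5% prevalence); the helper names are illustrative, and the result inherits all the speculative assumptions of the original argument.

```python
def to_odds(probability: float) -> float:
    return probability / (1.0 - probability)


def to_probability(odds: float) -> float:
    return odds / (1.0 + odds)


# Figures taken from the text.
us_sensitivity, us_specificity = 0.88, 0.80
posttest_after_us = 0.85   # implied by the assumed 15% false-positive rate
prevalence = 0.05          # abdominal pain in the emergency department

# Step 1: LR+ of a positive ultrasonography result.
lr_ultrasound = us_sensitivity / (1.0 - us_specificity)          # about 4.4

# Step 2: work backward to the probability before ultrasonography.
pretest_before_us = to_probability(to_odds(posttest_after_us) / lr_ultrasound)

# Step 3: LR+ the clinical gestalt needs to move 5% to that probability.
lr_gestalt_exact = to_odds(pretest_before_us) / to_odds(prevalence)
# Same step using the text's rounded value of 60%.
lr_gestalt_rounded = to_odds(0.60) / to_odds(prevalence)

print(f"LR+ of ultrasonography:         {lr_ultrasound:.1f}")
print(f"Probability before imaging:     {pretest_before_us:.0%}")
print(f"Gestalt LR+ (unrounded inputs): {lr_gestalt_exact:.1f}")
print(f"Gestalt LR+ (60% rounded):      {lr_gestalt_rounded:.1f}")
```

The unrounded calculation gives a probability before imaging of about 56% and a gestalt LR+ of about 24.5; using the rounded 60% gives 28.5. Both are consistent with the stated range of roughly 25 to 30.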

CLINICAL SCENARIO—RESOLUTION Your differential diagnosis for the patient’s presentation includes viral hepatitis, cholecystitis, and gallstone pancreatitis. To validate your impression and help establish the relative likelihood of each, you ask the emergency department attending physician to evaluate the patient. She regards the likelihood of cholecystitis as high enough to warrant diagnostic imaging. In fact, her clinical impression is that cholecystitis is the leading diagnosis, so she recommends urgent right upper quadrant ultrasonography. The ultrasonography subsequently reveals the presence of gallstones, gallbladder wall thickening, and a sonographic Murphy sign. These findings, in the context of the patient’s presentation, virtually confirm the diagnosis of acute cholecystitis.68

THE BOTTOM LINE

The existing literature identifies no single finding with sufficient diagnostic power to establish or exclude acute cholecystitis without further testing (eg, right upper quadrant ultrasonography). Combinations of certain symptoms, signs, and laboratory results likely have more useful LRs and presumably inform the diagnostic impressions of experienced clinicians. Future research may allow the development of prediction rules that combine basic demographics with clinical findings to distinguish patients who require no further testing from those who require continued diagnostic evaluation, as is currently possible with the evaluation of suspected pulmonary embolism.66,69 Until then, the clinical evaluation of patients with abdominal pain suggestive of cholecystitis will continue to rely heavily on the clinical gestalt and diagnostic imaging.

Author Affiliations at the Time of the Original Publication

Maine Hospitalist Service, Maine Medical Center, Portland (Dr Trowbridge); Department of Medicine, California Pacific Medical Center, San Francisco (Dr Rutkowski); Department of Medicine, University of California, San Francisco (Dr Shojania).

Acknowledgments

Dr Trowbridge’s work was supported in part by a grant from the Josiah Macy Jr Foundation. We thank Theodore N. Pappas, MD, David Edelman, MD, Robert Badgett, MD, and James Wagner, MD, for their helpful comments on drafts of the manuscript.

REFERENCES

1. Powers RD, Guertler AT. Abdominal pain in the ED. Am J Emerg Med. 1995;13(3):301-303.
2. Irvin TT. Abdominal pain. Br J Surg. 1989;76(11):1121-1125.
3. Kizer KW, Vassar MJ. Emergency department diagnosis of abdominal disorders in the elderly. Am J Emerg Med. 1998;16(4):357-362.
4. Miettinen P, Pasanen P, Lahtinen J, Alhava E. Acute abdominal pain in adults. Ann Chir Gynaecol. 1996;85(1):5-9.
5. Wasson JH, Sox HC Jr, Sox CH. The diagnosis of abdominal pain in ambulatory male patients. Med Decis Making. 1981;1(3):215-224.
6. Popovic JR. 1999 National Hospital Discharge Survey: annual summary with detailed diagnosis and procedure data. Vital Health Stat 13. 2001;(151):i-v, 1-206.
7. Cherry DK, Burt CW, Woodwell DA. National Ambulatory Medical Care Survey: 1999 Summary Advance Data from Vital and Health Statistics. Hyattsville, MD: National Center for Health Statistics; 2001. No. 322.
8. McCaig L, Burt CW. National Hospital Ambulatory Medical Care Survey: 1999 Emergency Department Summary: Advance Data from Vital and Health Statistics. Hyattsville, MD: National Center for Health Statistics; 2001. No. 320.
9. Lillemoe KD. Surgical treatment of biliary tract infections. Am Surg. 2000;66(2):138-144.
10. Hashizume M, Sugimachi K, MacFadyen BV. The clinical management and results of surgery for acute cholecystitis. Semin Laparosc Surg. 1998;5(2):69-80.
11. Chandler CF, Lane JS, Ferguson P, Thompson JE, Ashley SW. Prospective evaluation of early versus delayed laparoscopic cholecystectomy for treatment of acute cholecystitis. Am Surg. 2000;66(9):896-900.
12. Eldar S, Eitan A, Bickel A, et al. The impact of patient delay and physician delay on the outcome of laparoscopic cholecystectomy for acute cholecystitis. Am J Surg. 1999;178(4):303-307.
13. Lai PB, Kwong KH, Leung KL, et al. Randomized trial of early versus delayed laparoscopic cholecystectomy for acute cholecystitis. Br J Surg. 1998;85(6):764-767.
14. Lo CM, Liu CL, Fan ST, Lai EC, Wong J. Prospective randomized study of early versus delayed laparoscopic cholecystectomy for acute cholecystitis. Ann Surg. 1998;227(4):461-467.
15. Silen W. Cope’s Early Diagnosis of the Acute Abdomen. 20th ed. New York, NY: Oxford University Press; 2000:128-137, 145-147.
16. Ahrendt SA, Pitt HA. Biliary tract. In: Townsend CM, Beauchamp RD, Evers BM, Mattox KL, eds. Sabiston Textbook of Surgery. 16th ed. Philadelphia, PA: WB Saunders; 2001:1076-1111.
17. Kalliafas S, Ziegler DW, Flancbaum L, Choban PS. Acute acalculous cholecystitis: incidence, risk factors, diagnosis, and outcome. Am Surg. 1998;64(5):471-475.
18. Warren BL, Carstens CA, Falck VG. Acute acalculous cholecystitis—a clinical-pathological disease spectrum. S Afr J Surg. 1999;37(4):99-104.
19. Berger MY, van der Velden JJ, Lijmer JG, de Kort H, Prins A, Bohnen AM. Abdominal symptoms: do they predict gallstones? a systematic review. Scand J Gastroenterol. 2000;35(1):70-76.
20. Verghese A, Dison C, Berk SL. Courvoisier’s “law”—an eponym in evolution. Am J Gastroenterol. 1987;82(3):248-250.
21. McGee S. Evidence-Based Physical Diagnosis. Philadelphia, PA: WB Saunders; 2001:594-639.
22. Sapira JD. The Art and Science of Bedside Diagnosis. Baltimore, MD: Urban & Schwartzenberg Inc; 1990:371-390.
23. Fournier AM, Michel J. Courvoisier’s sign revisited: two patients with palpable gallbladder. South Med J. 1992;85(5):548-550.
24. Aldea PA, Meehan JP, Sternbach G. The acute abdomen and Murphy’s signs. J Emerg Med. 1986;4(1):57-63.
25. Joachim H. Practical Bedside Diagnosis and Treatment. Baltimore, MD: CC Thomas; 1940.
26. Pullen RLR. Medical Diagnosis: Applied Physical Diagnosis. London, England: WB Saunders; 1944.
27. Zatuchni J. Notes on Physical Diagnosis. Philadelphia, PA: FA Davis Co; 1964.
28. Gunn A, Keddie N. Some clinical observations on patients with gallstones. Lancet. 1972;2(7771):239-241.
29. Shea JA, Berlin JA, Escarce JJ, et al. Revised estimates of diagnostic test sensitivity and specificity in suspected biliary tract disease. Arch Intern Med. 1994;154(22):2573-2581.
30. Begg CB, Greenes RA. Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics. 1983;39(1):207-215.
31. Begg CB, McNeil BJ. Assessment of radiologic tests. Radiology. 1988;167(2):565-569.
32. Zhou XH, Higgs RE. Assessing the relative accuracies of two screening tests in the presence of verification bias. Stat Med. 2000;19(11-12):1697-1705.
33. Guyatt G, Rennie D, for the Evidence-Based Medicine Working Group, American Medical Association. Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Chicago, IL: AMA Press; 2002.


34. Harvey RT, Miller WT Jr. Acute biliary disease. Radiology. 1999;213(3):831-836.
35. Fidler J, Paulson EK, Layfield L. CT evaluation of acute cholecystitis: findings and usefulness in diagnosis. AJR Am J Roentgenol. 1996;166(5):1085-1088.
36. Eddy D, Hasselblad V, Shachter R. Meta-Analysis by the Confidence Profile Method: The Statistical Synthesis of Evidence. San Diego, CA: Academic Press; 1992:93-98.
37. Eddy D, Hasselblad V. Fast*Pro v1.8: Software for Meta-analysis by the Confidence Profile Method. San Diego, CA: Academic Press; 1992.
38. Borenstein M, Rothstein H. Comprehensive Meta-Analysis: A Computer Program for Research Synthesis. Englewood, NJ: Biostat Inc; 1999.
39. Adedeji OA, McAdam WA. Murphy’s sign, acute cholecystitis and elderly people. J R Coll Surg Edinb. 1996;41(2):88-89.
40. Bednarz GM, Kalff V, Kelly MJ. Hepatobiliary scintigraphy. Med J Aust. 1986;145(7):316-318.
41. Brewer BJ, Golden GT, Hitch DC, Rudolf LE, Wangensteen SL. Abdominal pain. Am J Surg. 1976;131(2):219-223.
42. Dunlop MG, King PM, Gunn AA. Acute abdominal pain. J R Coll Surg Edinb. 1989;34(3):124-127.
43. Eikman EA, Cameron JL, Colman M, et al. A test for patency of the cystic duct in acute cholecystitis. Ann Intern Med. 1975;82(3):318-322.
44. Gruber PJ, Silverman RA, Gottesfeld S, Flaster E. Presence of fever and leukocytosis in acute cholecystitis. Ann Emerg Med. 1996;28(3):273-277.
45. Halasz NA. Counterfeit cholecystitis, a common diagnostic dilemma. Am J Surg. 1975;130(2):189-193.
46. Johnson H Jr, Cooper B. The value of HIDA scans in the initial evaluation of patients for cholecystitis. J Natl Med Assoc. 1995;87(1):27-32.
47. Juvonen T, Kiviniemi H, Niemela O, Kairaluoma MI. Diagnostic accuracy of ultrasonography and C reactive protein concentration in acute cholecystitis: a prospective clinical study. Eur J Surg. 1992;158(6-7):365-369.
48. Liddington MI, Thomson WH. Rebound tenderness test. Br J Surg. 1991;78(7):795-796.
49. Lindenauer SM, Child CG III. Disturbances of liver function in biliary tract disease. Surg Gynecol Obstet. 1966;123(6):1205-1211.
50. Potts FE IV, Vukov LF. Utility of fever and leukocytosis in acute surgical abdomens in octogenarians and beyond. J Gerontol A Biol Sci Med Sci. 1999;54(2):M55-M58.
51. Prevot N, Mariat G, Mahul P, et al. Contribution of cholescintigraphy to the early diagnosis of acute acalculous cholecystitis in intensive-care-unit patients. Eur J Nucl Med. 1999;26(10):1317-1325.
52. Raine PA, Gunn AA. Acute cholecystitis. Br J Surg. 1975;62:697-700.
53. Schofield PF, Hulton NR, Baildam AD. Is it acute cholecystitis? Ann R Coll Surg Engl. 1986;68(1):14-16.
54. Singer AJ, McCracken G, Henry MC, Thode HC Jr, Cabahug CJ. Correlation among clinical, laboratory, and hepatobiliary scanning findings in patients with suspected acute cholecystitis. Ann Emerg Med. 1996;28(3):267-272.
55. Staniland JR, Ditchburn J, De Dombal FT. Clinical presentation of acute abdomen: study of 600 patients. BMJ. 1972;3(5823):393-398.
56. Knipschild P. Looking for gall bladder disease in the patient’s iris. BMJ. 1988;297(6663):1578-1581.
57. Ernst E. Iridology: not useful and potentially harmful. Arch Ophthalmol. 2000;118(1):120-121.
58. Simon A, Worthen DM, Mitas II JA. An evaluation of iridology. JAMA. 1979;242(13):1385-1389.
59. Wagner JM, McKinney WP, Carpenter JL. Does this patient have appendicitis? JAMA. 1996;276(19):1589-1594.
60. Parker LJ, Vukov LF, Wollan PC. Emergency department evaluation of geriatric patients with acute cholecystitis. Acad Emerg Med. 1997;4(1):51-55.
61. Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med. 1978;299(17):926-930.
62. Mulherin SA, Miller WC. Spectrum bias or spectrum effect? Ann Intern Med. 2002;137(7):598-602.
63. Fitzgibbons RJ Jr, Tseng A, Wang H, Ryberg A, Nguyen N, Sims KL. Acute cholecystitis. Surg Endosc. 1996;10(12):1180-1184.
64. Diamond GA, Forrester JS. Analysis of probability as an aid in the clinical diagnosis of coronary-artery disease. N Engl J Med. 1979;300(24):1350-1358.
65. Rodger MA, Carrier M, Jones GN, et al. Diagnostic value of arterial blood gas measurement in suspected pulmonary embolism. Am J Respir Crit Care Med. 2000;162(6):2105-2108.


66. Wells PS, Anderson DR, Rodger M, et al. Excluding pulmonary embolism at the bedside without diagnostic imaging. Ann Intern Med. 2001;135(2):98-107.
67. Panju AA, Hemmelgarn BR, Guyatt GH, Simel DL. Is this patient having a myocardial infarction? JAMA. 1998;280(14):1256-1263.


68. Ralls PW, Colletti PM, Lapin SA, et al. Real-time sonography in suspected acute cholecystitis. Radiology. 1985;155(3):767-771.
69. Wicki J, Perneger TV, Junod AF, Bounameaux H, Perrier A. Assessing clinical probability of pulmonary embolism in the emergency ward: a simple score. Arch Intern Med. 2001;161(1):92-97.

UPDATE: Cholecystitis

Prepared by Robert L. Trowbridge, MD, and Kaveh G. Shojania, MD
Reviewed by Amy Rosenthal, MD

CLINICAL SCENARIO

A 52-year-old man presents to the emergency department with a 6-hour history of dull, epigastric discomfort. The pain began several hours after lunch and did not intensify with eating dinner, although his appetite was poor. He is mildly nauseated and feels warm but denies emesis, diarrhea, dyspnea, chest pain, fever, chills, or previous episodes of a similar pain. The pain neither radiates nor changes with his position. He has hypertension and gout and drinks 2 oz of whiskey per day. There is no history of gastrointestinal illness or abdominal surgery. He appears moderately uncomfortable, but he is afebrile, with normal vital signs. The bowel sounds are decreased but present. Although he allows you to palpate his abdomen, you elicit tenderness in the epigastrium without rebound. The Murphy sign is negative. The liver and gallbladder are not palpable, a rectal examination causes no pain, and the stool is negative for occult blood.

On laboratory examination, the white blood cell count is 12 700/μL (the automated differential shows increased neutrophil levels). He has normal electrolyte, transaminase, bilirubin, amylase, and lipase levels and normal renal function. The alkaline phosphatase level is slightly elevated, at 155 U/L. An electrocardiogram reveals no evidence of ischemia, and the troponin I level is normal.

The emergency physician regards acute cholecystitis as the leading diagnosis. Six months ago, the physician took a 2-day course in ultrasonography and has since performed 40 bedside ultrasonographic tests on patients with abdominal pain, the first 10 of which were proctored by a radiologist to assess competency. The emergency physician performs a focused right upper quadrant ultrasonography in the present patient. He finds no evidence of gallstones, and the point of maximal tenderness does not localize to the gallbladder (ie, there is no sonographic Murphy sign). He recommends that the patient be discharged home, with follow-up in your clinic. However, the patient is still uncomfortable, and you have not established a diagnosis. Have you effectively ruled out acute cholecystitis?

UPDATED SUMMARY ON ACUTE CHOLECYSTITIS

Original Review

Trowbridge RL, Rutkowski NK, Shojania KG. Does this patient have acute cholecystitis? JAMA. 2003;289(1):80-86.

UPDATED LITERATURE SEARCH

We repeated the original search strategy that targeted any study involving diagnosis, physical examination, sensitivity and specificity, reproducibility of results, decision support techniques, and other relevant methodologic terms, with any of the following text or keywords: “gallbladder,” “gall stones,” or “cholecystitis.” The updated PubMed search included the years 1998 through September 2004, and we included a more robust search for systematic reviews according to a published strategy.1 This search yielded 337 articles published since November 11, 2001. An independent search of the OVID database with slight differences in the methodologic terms identified an additional 34 English-language studies published from 2002 to September 2004.

NEW FINDINGS

• The clinician’s gestalt is the most important piece of evidence from the clinical evaluation. The single findings with the highest diagnostic value remain Murphy sign (positive likelihood ratio, 2.8) and right upper quadrant tenderness (negative likelihood ratio, 0.4), although the confidence intervals (CIs) for both values cross 1, as documented in the original review.
• Bedside ultrasonography performed by physicians with brief formal training courses may be useful when the result is the combined absence of a sonographic Murphy sign and any evidence of gallstones. Additional studies of bedside ultrasonography by nonradiologists are required.

Details of the Update

The focus of this review remains acute calculous cholecystitis. Studies focusing predominantly on acalculous cholecystitis were excluded.


IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION

No new data were found that modify the original results, although we added data on bedside ultrasonography performed by nonradiologists.

CHANGES IN THE REFERENCE STANDARD

Surgical findings combined with pathology, or clinical follow-up in patients who do not undergo surgery, remain the reference standard for acute cholecystitis.

RESULTS OF LITERATURE REVIEW

Patients reproducibly report biliary symptoms when questioned again 2 weeks after an initial assessment, using an extensive questionnaire addressing the details of their symptoms across various domains (pain, association with eating, changes in bowel habits, and fever, among others).2 Physicians concurred with patients’ self-reported symptoms to a substantial extent (κ scores > 0.6 and much higher in several cases). Two exceptions were history of fever and radiation to the right shoulder. For these findings, physicians concurred with only moderate agreement (κ = 0.52 and κ = 0.46, respectively).

The bedside ultrasonography examination performed by a nonradiologist is an emerging approach to cholecystitis diagnosis.3,4 We had not previously included this test, so we conducted a supplemental search for additional articles addressing the utility of bedside ultrasonography. With no date restriction, we found 6 studies, although only 1 study3 was of sufficiently high quality to warrant abstraction and inclusion in the update (Table 12-3). All studies used nonconsecutive convenience samples, but the 5 additional studies excluded from the update also had bias because of nonindependence of reference standard (ie, the decision to undergo confirmatory testing explicitly depended on the results of bedside ultrasonography).4-8 In addition, these studies did not attempt to diagnose

Table 12-3 Bedside Ultrasonographic Findings for Acute Cholecystitis

| Finding | LR+ (95% CI) | LR– (95% CI) |
|---|---|---|
| Bedside ultrasonography evidence of gallstones and a positive sonographic Murphy signᵃ | 2.7 (1.7-4.1) | 0.13 (0.04-0.39) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
ᵃ Requires special training and validation of competence.


acute cholecystitis; they only evaluated agreement between bedside ultrasonography and formal ultrasonography with respect to specific radiologic findings.

The single included study showed that physicians with brief training and moderate experience in bedside ultrasonography could adequately visualize the gallbladder in most patients.3 Even among patients with definitive bedside ultrasonography results, defined as the presence of both cholelithiasis and a sonographic Murphy sign, the positive predictive value was only 70%. Thus, patients with positive findings on bedside ultrasonography require confirmatory radiologic investigations before proceeding to surgery. The negative predictive value is 90%. For 30 patients in this study, bedside ultrasonography detected no sonographic Murphy sign and no cholelithiasis. Had these patients not undergone formal ultrasonography, there would have been a 26% reduction in ultrasonography use by the emergency department, at a cost of missing 1 case of cholecystitis. It is tempting to regard this negative predictive value of 97% (29 of 30) as clearly adequate to rule out cholecystitis, but the 95% CI for the miss rate of 1 in 30 extends from 0.6% to 17%. Among patients for whom the pretest suspicion of cholecystitis is low, a definitely negative bedside ultrasonography result probably would be adequate to decide against formal ultrasonography, especially if adequate clinical follow-up is in place.
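The width of the interval around 1 missed case in 30 can be checked directly. The source does not say which interval method was used; the sketch below assumes a Wilson score interval, which happens to reproduce the quoted bounds after rounding. The function name is illustrative.

```python
import math


def wilson_interval(events: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    p = events / n
    denominator = 1.0 + z**2 / n
    center = (p + z**2 / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denominator
    return center - half_width, center + half_width


missed, negatives = 1, 30  # 1 missed case among 30 doubly negative scans

miss_rate = missed / negatives
low, high = wilson_interval(missed, negatives)

print(f"Miss rate: {miss_rate:.1%} (95% CI, {low:.1%} to {high:.1%})")
print(f"Proportion correctly negative: {1 - miss_rate:.1%}")
```

With only 30 patients, the data are compatible with a miss rate anywhere from under 1% to roughly 1 in 6, which is why the apparently reassuring point estimate cannot by itself justify skipping formal ultrasonography.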

Evidence From Guidelines There are no governmental agency guidelines that address the diagnosis of acute cholecystitis.

CLINICAL SCENARIO—RESOLUTION You decide to observe the patient in the hospital for at least 24 hours. The patient’s pain improves with intravenous morphine, and he is admitted to the medical service. Overnight, he requires increasing doses of morphine for pain control, but the electrocardiogram output and cardiac enzyme levels remain normal. The following morning, the abdominal tenderness has worsened, and the white blood cell count has increased to 19 200/μL, but the serum amylase level remains normal. Given the persistent concern for acute cholecystitis, he undergoes formal abdominal ultrasonography in the radiology department, which reveals substantial pericholecystic fluid and a positive sonographic Murphy sign. A laparoscopic cholecystectomy reveals an acutely inflamed gallbladder with a small stone impacted in the cystic duct. Pathology confirms the diagnosis of acute cholecystitis.

CHAPTER 12

Cholecystitis

ACUTE CHOLECYSTITIS—MAKE THE DIAGNOSIS No single clinical finding, or known combination of clinical history and physical examination findings, efficiently establishes a diagnosis of acute cholecystitis. Thus, clinicians must rely on their clinical gestalt. Bedside ultrasonography requires additional study, and clinicians must receive proper training, followed by demonstration of their proficiency.

PRIOR PROBABILITY Approximately 5% of emergency department patients with abdominal pain have cholecystitis. Women and Native Americans have a higher risk of cholecystitis. Patients with increased risk of cholecystitis include those with chronic hemolytic disease (eg, sickle cell disease) or recent rapid weight loss.
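A prior probability becomes a posttest probability by converting it to odds, multiplying by the likelihood ratio, and converting back. The sketch below applies that standard calculation to the 5% prior and the bedside ultrasonography likelihood ratios from Table 12-3. The function name `posttest_probability` is illustrative, and pairing these particular numbers is an assumption made only for the example.

```python
def posttest_probability(pretest, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    if not 0 < pretest < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio cannot be negative")
    posttest_odds = pretest / (1 - pretest) * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# 5% prior with LR+ 2.7 and LR- 0.13 (values taken from Table 12-3)
after_positive = posttest_probability(0.05, 2.7)
after_negative = posttest_probability(0.05, 0.13)
print(f"positive result: {after_positive:.1%}, negative result: {after_negative:.1%}")
```

With a 5% prior, a positive result raises the probability to only about 12%, and a negative result lowers it to under 1%.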

POPULATION FOR WHOM ACUTE CHOLECYSTITIS SHOULD BE CONSIDERED Patients with abdominal pain.

DETECTING THE LIKELIHOOD OF ACUTE CHOLECYSTITIS See Table 12-4.

Table 12-4 Likelihood Ratios for Acute Cholecystitis

Finding (No. of Studies) | LR+ (95% CI) | LR– (95% CI)
Clinical gestaltᵃ | ≈25-30 |
Murphy’s sign (n = 3) | 2.8 (0.8 to 8.6) | 0.5 (0.2 to 1.0)
Right upper quadrant tenderness (n = 4) | 1.6 (1.0 to 2.5) | 0.4 (0.2 to 1.1)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
ᵃ The LR is imputed from the baseline pretest probability (5%), the sensitivity and specificity of ultrasonography (0.88 and 0.80, respectively), and the false-positive rate of diagnosis.

REFERENCE STANDARD TESTS Surgical findings combined with pathology or clinical follow-up in patients who do not undergo surgery remain the reference standard for acute cholecystitis.

REFERENCES FOR THE UPDATE
1. Shojania KG, Bero LA. Taking advantage of the explosion of systematic reviews: an efficient MEDLINE search strategy. Eff Clin Pract. 2001;4(4):157-162.
2. Romero Y, Thistle JL, Longstreth GF, et al. A questionnaire for the assessment of biliary symptoms. Am J Gastroenterol. 2003;98(5):1042-1051.ᵃ
3. Rosen CL, Brown DF, Chang Y, et al. Ultrasonography by emergency physicians in patients with suspected cholecystitis. Am J Emerg Med. 2001;19(1):32-36.ᵃ
4. Kendall JL, Shimp RJ. Performance and interpretation of focused right upper quadrant ultrasound by emergency physicians. J Emerg Med. 2001;21(1):7-13.
5. Lanoix R, Leak LV, Gaeta T, Gernsheimer JR. A preliminary evaluation of emergency ultrasound in the setting of an emergency medicine training program. Am J Emerg Med. 2000;18(1):41-45.
6. Durston W, Carl ML, Guerra W. Patient satisfaction and diagnostic accuracy with ultrasound by emergency physicians. Am J Emerg Med. 1999;17(7):642-646.
7. Schlager D, Lazzareschi G, Whitten D, Sanders AB. A prospective study of ultrasonography in the ED by emergency physicians. Am J Emerg Med. 1994;12(2):185-189.
8. Jehle D, Davis E, Evans T, et al. Emergency department sonography by emergency physicians. Am J Emerg Med. 1989;7(6):605-611.

ᵃ For the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.


EVIDENCE TO SUPPORT THE UPDATE: Cholecystitis

TITLE A Questionnaire for the Assessment of Biliary Symptoms. AUTHORS Romero Y, Thistle JL, Longstreth GF, et al. CITATION Am J Gastroenterol. 2003;98(5):1042-1051. QUESTION What are the reproducibility, concurrent validity, and discriminative ability of a questionnaire designed to elicit patients’ self-reported biliary symptoms? DESIGN Prospective, independent, consecutive sample of blinded patients and investigators. SETTING Referral gastroenterology practice at a major teaching institution, Rochester, Minnesota. PATIENTS Two hundred forty-five adults (aged ≥ 18 years) referred to an outpatient clinic.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD A Biliary Symptoms Questionnaire (BSQ) was developed according to a review of the literature and the experience of the investigators using previously developed questionnaires for irritable bowel syndrome (IBS) and gastroesophageal reflux disease (GERD) as templates. The 114-question instrument was administered to subjects on initial presentation and then again after a 2-week interval. After the initial survey, subjects also underwent a structured interview conducted by investigators, who then completed their own BSQ according to the interview findings. Finally, investigators reviewed 10 BSQs of patients with known diagnoses (as determined by clinical follow-up and gastroenterologist opinion) and decided whether IBS, GERD, or biliary disease was the most likely diagnosis. A shortened BSQ was tested for reproducibility.

MAIN OUTCOME MEASURES Agreement was expressed as simple agreement (%) and agreement beyond chance (κ). The domains assessed were as follows:

1. Agreement between the serial surveys administered to the patient (reproducibility)
2. Agreement between patient-reported symptoms and physician-reported symptoms (concurrent validity)
3. Agreement between investigator diagnosis according to the BSQ and gastroenterologist clinical diagnosis (discriminative validity)
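The difference between simple agreement and agreement beyond chance (κ) is easiest to see with numbers. The sketch below computes both from a 2 × 2 table of paired yes/no answers. The counts are hypothetical and chosen only for illustration; they are not data from the study, and the function name `cohen_kappa` is likewise illustrative.

```python
def cohen_kappa(both_yes, first_only, second_only, both_no):
    """Return (simple agreement, Cohen's kappa) for two paired yes/no ratings."""
    n = both_yes + first_only + second_only + both_no
    if n == 0:
        raise ValueError("table is empty")
    observed = (both_yes + both_no) / n
    first_yes = (both_yes + first_only) / n
    second_yes = (both_yes + second_only) / n
    # Agreement expected by chance if the two ratings were independent
    expected = first_yes * second_yes + (1 - first_yes) * (1 - second_yes)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return observed, (observed - expected) / (1 - expected)


# Hypothetical counts: 100 patients answering the same question twice
agreement, kappa = cohen_kappa(both_yes=40, first_only=10, second_only=5, both_no=45)
print(f"simple agreement {agreement:.0%}, kappa {kappa:.2f}")
```

In this made-up example, 85% simple agreement corresponds to κ = 0.70 because half of the agreement would be expected by chance alone.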

MAIN RESULTS Patients exhibited reasonable consistency throughout the 2-week test-retest period (see Table 12-5). In addition, physicians concurred with patients’ self-reported symptoms with moderate or better agreement. Patient reproducibility and physician concurrence were almost perfect for complaints of upper abdominal pain (κ = 0.94 for both) and for jaundice (κ = 0.94 and κ = 0.84 for reproducibility and concurrence, respectively). Moderate agreement was observed for radiation of the pain (κ = 0.47 and 0.46 for reproducibility and concurrence, respectively). For fever, patients reported the symptom with substantial reproducibility (κ = 0.79), but physicians concurred with only moderate agreement (κ = 0.52). Although the questionnaire performed reasonably well in terms of discriminative ability (κ = 0.58), the limited sample size (only 10 patients) and presentation of only 3 diagnostic choices (biliary colic, GERD, and IBS) severely limit this finding.

Table 12-5 Questionnaire Results for Reproducibility and Concurrent Validity

Symptom | Reproducibility, κ (95% CI) | Concurrent Validity, κ (95% CI)
Emesis | 0.95 (0.85 to 1) | 0.73 (0.60 to 0.87)
Jaundice | 0.94 (0.83 to 1) | 0.84 (0.71 to 0.97)
Pain in upper abdomen | 0.94 (0.83 to 1) | 0.94 (0.86 to 1)
Nausea | 0.81 (0.65 to 0.98) | 0.75 (0.61 to 0.88)
Fever | 0.79 (0.62 to 0.97) | 0.52 (0.36 to 0.68)
Biliary symptomsᵃ | 0.72 (–0.03 to 0.95) | 0.64 (0.15 to 0.95)
Radiation to right shoulder | 0.47 (–0.15 to 1) | 0.46 (0.21 to 0.72)

Abbreviation: CI, confidence interval.
ᵃ The results for biliary symptoms reflect the median agreement across all 18 questions identified as biliary (as opposed to gastroesophageal reflux disease or irritable bowel syndrome), including stabbing upper abdominal pain, cramping upper abdominal pain, radiation to the back, radiation to the shoulder blade, periodicity of pain episodes, daytime or nocturnal occurrence, and pain improved with movement, among others.

CONCLUSIONS
LEVEL OF EVIDENCE Level 1 for reproducibility and concurrent validity. Level 4 for discriminative validity (nonindependent sample with small numbers).
STRENGTHS The reproducibility and concurrent validity sections were well designed.
LIMITATIONS The study was designed primarily to assess the utility of a questionnaire as a research tool rather than to assess the variability in patient and physician reporting of abdominal symptoms. In testing the discriminative validity of the questionnaire, a small sample (10) of patients was used. In addition, investigators were given only 3 possible diagnoses to choose from (biliary pain, GERD, and IBS), which likely resulted in a significant overestimation of the discriminative ability of the questionnaire. The shortened BSQ was tested only for reproducibility, not concurrent validity or discriminative ability.

This study evaluated the reproducibility and concurrent validity of a questionnaire aimed at evaluating those with possible biliary colic. Although a few conclusions may be inferred regarding the variability in reporting of abdominal symptoms, the main focus of the study was to validate the instrument as a research tool. Patients appeared to be reasonably consistent in reporting most abdominal symptoms over time, and physicians generally concurred in their assessments of patients’ symptoms.

Reviewed by Robert L. Trowbridge, MD

TITLE Ultrasonography by Emergency Physicians in Patients With Suspected Cholecystitis. AUTHORS Rosen CL, Brown DF, Chang Y, et al. CITATION Am J Emerg Med. 2001;19(1):32-36.

QUESTION How well do the assessments of emergency physicians using bedside ultrasonography (BUS) agree with the results of formal ultrasonography and clinical follow-up in the evaluation of suspected cholecystitis? DESIGN Prospective, independent, convenience sample. SETTING Emergency department at a major teaching hospital, Boston, Massachusetts. PATIENTS One hundred sixteen adults (aged ≥ 18 years) who presented with abdominal pain and were suspected of having cholecystitis.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Fifteen full-time emergency physicians underwent a 5-hour course, including didactic learning and hands-on training, on the use of an ultrasonographic machine to identify the gallbladder, detect gallstones, and elicit a sonographic Murphy sign. The bedside ultrasonographic findings were compared not only with formal ultrasonography by radiologists but also with clinical follow-up, including the results of other noninvasive tests for cholecystitis, operative reports, pathology, and telephone follow-up 1 month after the emergency department visit to ascertain subsequent episodes of abdominal pain requiring medical attention or emergency visits.

MAIN OUTCOME MEASURES Agreement between BUS and formal ultrasonography in the detection of gallstones or presence of sonographic Murphy sign (ie, sensitivity and specificity of bedside ultrasonography, using formal ultrasonography as reference standard).

MAIN RESULTS Among 116 patients, the physician performing BUS could not visualize the gallbladder adequately in 6 (5.2%) cases. Four of these 6 cases were diagnosed as cholecystitis on formal ultrasonography. The authors explicitly state their interest in focusing on cases in which bedside ultrasonography appears to provide a definitive answer. Definitive BUS results were defined as both findings present or both absent (ie, both gallstones and sonographic Murphy sign present or both absent). Of the 116 patients, 70 (60%) had definitive findings (see Table 12-6). Although we do not show it here, the authors presented data on the sensitivity and specificity of formal ultrasonography among patients with definitive results on BUS. The negative likelihood ratio was similar to that in Table 12-6, but the positive likelihood ratio was much higher, at 14.

Table 12-6 Likelihood Ratio of Bedside Ultrasonography

Test | Sensitivity | Specificity | LR+ (95% CI) | LR– (95% CI)
Definitive bedside ultrasonography compared with clinical follow-up for detection of cholecystitis | 91% | 66% | 2.7 (1.7-4.1) | 0.13 (0.04-0.39)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
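The likelihood ratios reported for bedside ultrasonography follow from its sensitivity and specificity. The sketch below applies the standard definitions to the rounded values 91% and 66%. Because the inputs are rounded, it gives an LR+ of about 2.7 and an LR– of about 0.14, slightly different from the published 0.13, which presumably came from unrounded counts. The function name `likelihood_ratios` is illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    if not 0 <= sensitivity <= 1:
        raise ValueError("sensitivity must be between 0 and 1")
    if not 0 < specificity < 1:
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)  # true-positive rate / false-positive rate
    lr_negative = (1 - sensitivity) / specificity  # false-negative rate / true-negative rate
    return lr_positive, lr_negative


# Rounded sensitivity and specificity for definitive bedside ultrasonography
lr_pos, lr_neg = likelihood_ratios(0.91, 0.66)
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```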

CONCLUSIONS
LEVEL OF EVIDENCE Level 3, since not all patients referred for formal ultrasonography were selected to undergo BUS.
STRENGTHS Radiologists performed formal ultrasonography without knowing the results of bedside ultrasonography. Distinct comparisons with formal ultrasonography and clinical follow-up provide useful information because cases not detected by formal ultrasonography would not be expected to be detected by BUS. Appropriately designed analysis, including adjusting for clustering effects.
LIMITATIONS Convenience sample. It was not clear how clinicians decided to choose which patients they referred for right upper quadrant ultrasonography. Unconsciously or not, physicians may have selected cases in which bedside ultrasonography was likely to perform well. The 3 physicians with the most training and previous experience were investigators in the study, and they contributed almost half of the patients. The remaining physicians each contributed 10 or fewer patients; 2 physicians contributed only 1 patient each.

This study evaluated the potential effect of performing BUS on requests for formal ultrasonography to evaluate suspected acute cholecystitis. The limitations of the study (above) are important, but other well-designed aspects of the study and the presentation of the results allow us to draw some reasonable conclusions. Physicians with brief training and moderate experience in bedside ultrasonography can adequately visualize the gallbladder in the majority of patients (95% in this study). Approximately 60% of patients had definitive BUS results, defined as cholelithiasis and a sonographic Murphy sign both present or both absent. Among patients with both findings present, the positive predictive value of only 70% means positive results require confirmation with formal ultrasonography. The negative predictive value is 90%. The authors point out that, for 30 patients, bedside ultrasonography detected no sonographic Murphy sign and no cholelithiasis. Had these patients not been sent for formal ultrasonography, there would have been a 26% reduction in ultrasonographic use by the emergency department, at a cost of missing 1 case of cholecystitis.

Reviewed by Kaveh G. Shojania, MD


CHAPTER 13

Does the Clinical Examination Predict Airflow Limitation?

Donald R. Holleman Jr, MD
David L. Simel, MD, MHS

CLINICAL SCENARIOS—DO THESE PATIENTS HAVE AIRFLOW LIMITATION? In each of the following cases, the clinician needs to decide whether the patient has airflow limitation. In case 1, a 63-year-old man who has smoked 2 packs of cigarettes per day for the past 47 years presents with decreased exercise tolerance caused by shortness of breath. In case 2, a 35-year-old woman complains of coughing, wheezing, and shortness of breath every autumn. In case 3, an 18-year-old man is brought to an emergency department with extreme difficulty breathing that began earlier that evening.

WHY IS IT IMPORTANT TO DETECT AIRFLOW LIMITATION BY CLINICAL EXAMINATION? Airflow limitation is a disorder known by many names, including airway obstruction and obstructive airways disease. Recognizing airflow limitation can lead to appropriate treatment and can yield important prognostic information. Patients with symptomatic airflow limitation may benefit by treatment with oral or inhaled bronchodilators, oral or inhaled glucocorticoids, or antibiotics. Recognition of this disorder also triggers environmental controls and preventive services, such as vaccination against pneumococcus and influenza. Screening is advocated for target disorders in which early intervention favorably affects patient outcomes. Physicians do not screen for airflow limitation because early intervention has not been shown to alter the disease course. Therefore, clinicians are likely to want to confirm or rule out disease in patients presenting with pulmonary symptoms, such as cough or dyspnea, rather than screen for unrecognized disease in asymptomatic individuals. The 3 clinical scenarios illustrate cases in which recognizing airflow limitation by the clinical examination is important. In the first case, recognizing airflow limitation might lead to the diagnosis of pulmonary emphysema, more intensive counseling on smoking cessation, vaccination against influenza and pneumococcal infection, and bronchodilator therapy to improve exercise tolerance. In the second case, recognizing airflow limitation might lead to the identification of environmental irritants or allergens responsible for symptoms. In the third case, recognizing airflow limitation would lead to the diagnosis of asthma and to acute, potentially lifesaving therapy with bronchodilators and systemic glucocorticoids. Recognizing airflow limitation clinically may have time, cost, and convenience advantages compared to routine pulmonary function testing. 
Spirometry is the test of choice for confirming a diagnosis of airflow limitation. Both the forced expiratory volume in 1 second (FEV1) and forced vital capacity (FVC) values are reduced in patients with airflow limitation; because the FEV1 is affected more than the FVC, the ratio of FEV1 to FVC (FEV1/FVC) also

Copyright © 2009 by the American Medical Association.

decreases. The reduced FEV1/FVC is the hallmark of airflow limitation. Although emphysema and chronic bronchitis represent permanent reductions in airflow, asthma is a disorder characterized by increased responsiveness of the bronchial tree to a variety of stimuli, leading to intermittent airflow limitation.1 In patients with asthma, provocative testing, such as methacholine challenge, may be necessary to bring about airflow limitation between symptomatic episodes. The reference standard for airflow limitation is the measurement of the FEV1 and the FVC by spirometry. An FEV1/FVC lower than the fifth percentile for age, height, and sex is considered abnormal.2 However, a normal FEV1/FVC during an asymptomatic period does not rule out intermittent airflow limitation. For most patients, the fifth percentile of FEV1/FVC is approximately 70%, but using this single value to diagnose airflow limitation is discouraged.2 We performed an English-language MEDLINE search, using the following Medical Subject Headings: (EXP Medical History Taking OR EXP Physical Examination) AND (EXP Lung Diseases, Obstructive). The titles and abstracts of the 1022 articles retrieved from the above MEDLINE search were reviewed independently by the 2 authors. If either reviewer chose an article as possibly useful, the article was reviewed for content. The authors had excellent agreement (κ = 0.85) on the 158 articles chosen for full review. If the article contained results of the clinical examination predicting airflow limitation, the article was reviewed for quality. References from appropriate articles were reviewed for additional references. Nineteen articles evaluating the clinical examination for airflow limitation3-21 used the accepted definition or a similar spirometric definition of disease. 
Others used a variety of definitions, including FEV1 only22-27 or other, less-accepted or unclear definitions.28-37 We chose to include articles using reference standards that are not currently accepted because they were otherwise methodologically sound or they provided the only data available for some of the clinical examination findings. The reference standards used in studies evaluating operating characteristics for individual clinical examination items are listed in Table 13-1. Because all studies used reference standards of current airflow limitation, the results in this review can be used only to predict airflow limitation at the evaluation. Patients with asthma may be overlooked if examined between attacks.

Table 13-1 Reference Standards Used in Studies Yielding Operating Characteristics for Individual Clinical Examination Items

Reference Standard | References
FEV1 < fifth percentile and FEV1/FVC < fifth percentileᵃ | 14
FEV1/FVC < fifth percentile | 11
FEV1/FVC < 0.70 | 5-8, 16, 18, 22
FEV1/FVC < 0.75 and FVC < 80% of predicted | 9
FEV1 < 75% of predicted and FEV1/FVC < 0.80 | 20
FEV1 < 70% of predicted | 23, 24
FEV1 < 2 L | 25, 26
FEV1 < fifth percentile | 37
FEV1 < 60% of predicted or FEV1/FVC < 0.60 | 17
Roentgenography, total lung capacity, and residual capacity | 33
FEV1/FVC < 0.6 or history | 31
Diagnosis of asthma | 32
Normal spirometry | 30

Abbreviations: FEV1, forced expiratory volume in 1 second; FVC, forced vital capacity.
ᵃ The definition recommended by the American Thoracic Society.1
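The spirometric reference standard described in this chapter reduces to a ratio and a comparison. The sketch below assumes the caller supplies the fifth-percentile lower limit of FEV1/FVC for the patient's age, height, and sex. The function name `fev1_fvc_below_limit` and the example values are hypothetical, and the 0.70 used in the example is only the approximate figure the text mentions while discouraging its use as a single cutoff.

```python
def fev1_fvc_below_limit(fev1_liters, fvc_liters, lower_limit):
    """Return (FEV1/FVC ratio, True if the ratio is below the supplied lower limit).

    lower_limit is the fifth-percentile FEV1/FVC for the patient's age,
    height, and sex; it must come from reference equations, not from here.
    """
    if fev1_liters <= 0 or fvc_liters <= 0:
        raise ValueError("volumes must be positive")
    if fev1_liters > fvc_liters:
        raise ValueError("FEV1 cannot exceed FVC")
    ratio = fev1_liters / fvc_liters
    return ratio, ratio < lower_limit


# Hypothetical spirometry values, with 0.70 standing in for the lower limit
ratio, abnormal = fev1_fvc_below_limit(1.8, 3.6, lower_limit=0.70)
print(f"FEV1/FVC = {ratio:.2f}, below lower limit: {abnormal}")
```

A normal ratio at a single visit does not exclude intermittent airflow limitation, so the flag describes only the measurement at hand.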

PATHOPHYSIOLOGIC CHARACTERISTICS OF AIRFLOW LIMITATION Understanding the physiologic characteristics of pulmonary airflow helps to explain the clinical examination findings in airflow limitation. The airways are a branching system of tubes that link the outside atmosphere with the lung parenchyma. During inspiration, the thoracic cavity actively expands. As the chest volume increases, the intrathoracic pressure decreases. Because the airways are open to the atmosphere, air flows into the airways to equalize the intrathoracic pressure with the atmospheric pressure. Therefore, during inspiration, the pressure inside the airways is greater than the pressure in the surrounding lung. This pressure exerts a force on the inner wall of the airway, increasing the airway diameter during inspiration. At end inspiration, the chest no longer expands, and the intrathoracic-to-atmospheric pressure difference disappears. During expiration, the thoracic cavity passively contracts. As the chest volume decreases, the intrathoracic pressure increases and exceeds the atmospheric pressure. Because the airways communicate with the atmosphere, the pressure inside the airways is lower than the pressure in the surrounding lung. This pressure difference exerts a force on the outer wall of the airway, decreasing the airway diameter during expiration. The resistance to airflow rises steeply as the airway narrows (for laminar flow, it varies inversely with the fourth power of the airway radius), so small decreases in airway diameter lead to large increases in resistance. During inspiration and expiration, the diameter of the airway varies around its static, resting diameter. In airflow limitation, the resting airway diameter is abnormally small. In emphysema, the lung parenchyma is destroyed. This leads to a decrease in the tethering forces that maintain airway diameter, resulting in decreased resting airway diameter. In asthma, the smooth muscle that surrounds the airway is hyperreactive to various stimuli. 
When one of these stimuli is present, the smooth muscle contracts. This leads to decreased resting diameter of the airway. In chronic bronchitis, there is increased mucus production in the airways. There may also be decreased mucus clearance caused by ciliary dysfunction. The resulting increased intra-airway mucus coats the inner wall of the airway. This leads to decreased resting diameter of the airway. Thus, in airflow limitation syndromes, the resistance to airflow is increased throughout the respiratory phase. Because of the further physiologic decrease in airway diameter during expiration, it is significantly more difficult to empty the lungs than to fill them. This leads to air trapping and to lung hyperinflation that can be demonstrated by an abnormally large residual volume on pulmonary function testing. The touted physical examination findings for airflow limitation arise either from the difficulty in emptying the lungs or from the resulting hyperinflation. The prolonged expiratory

phase, wheezing, rhonchi, and match test are signs of abnormally high resistance to airflow during expiration. Decreased breath sounds, barrel chest, hyperresonance, decreased cardiac and hepatic dullness, absent or subxiphoid cardiac apical impulse, decreased chest expansion, and decreased diaphragmatic movement are signs of hyperinflation. Use of accessory muscles results from both the increased work of expiration and pulmonary hyperinflation.
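The effect of a small change in airway caliber on resistance can be put in numbers. The sketch below assumes laminar flow through a rigid tube (Poiseuille's law), in which resistance varies inversely with the fourth power of the radius. Real airways are neither rigid nor always laminar, so this is an idealization for illustration, and the function name `relative_resistance` is hypothetical.

```python
def relative_resistance(diameter_fraction):
    """Resistance relative to baseline when the diameter is scaled by diameter_fraction.

    Assumes laminar flow in a rigid tube, where resistance is proportional
    to 1 / radius**4; a fraction of 0.9 means a 10% narrower airway.
    """
    if diameter_fraction <= 0:
        raise ValueError("diameter fraction must be positive")
    return 1 / diameter_fraction**4


for fraction in (0.9, 0.75, 0.5):
    print(f"diameter x{fraction}: resistance x{relative_resistance(fraction):.1f}")
```

Under this assumption, a 10% reduction in diameter raises resistance by roughly half, and halving the diameter raises it 16-fold.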

HOW TO ELICIT SYMPTOMS AND SIGNS OF AIRFLOW LIMITATION
A concise evaluation for airflow limitation includes a focused medical history and physical examination.

History
The history should elicit background features and specific symptoms.

Background Information
The most important background features are exposure to cigarette smoke and to occupational or environmental pollutants. The duration of cigarette exposure can most easily be elicited by asking at what age the patient started smoking and in what year he or she quit. Although pack-years is the traditional measure of cigarette exposure, quantifying years of exposure works at least as well.13 The patient’s personal and family history of atopic diseases is also associated with increased likelihood of asthma.

Symptoms
The most important symptoms to elicit from patients with suspected airflow limitation are wheezing, coughing, and sputum production. In fact, chronic bronchitis is defined by sputum production for at least 3 consecutive months in at least 2 consecutive years.1

Physical Examination
The physical examination for airflow limitation should include inspection, measuring vital signs, palpation, percussion, auscultation, and expiratory airflow.

Inspection
While assessing the patient’s overall appearance, the clinician should observe for the presence of a barrel chest. If the anteroposterior diameter appears greater than normal, the patient has a barrel chest deformity. This finding may be more an illusion than a true deformity because the anteroposterior dimensions have not been shown to be increased in patients with clinically defined barrel chests.38

Vital Signs
While measuring blood pressure, the clinician can determine whether there is pulsus paradoxus. This maneuver may be most helpful in patients with suspected acute airflow limitation. During tidal breathing, the sphygmomanometer is inflated to above the systolic blood pressure. The cuff pressure is slowly released until the first Korotkoff sound is heard only during expiration; this systolic blood pressure value is noted. The cuff pressure is further reduced until the first Korotkoff sound is heard throughout inspiration; the systolic blood pressure at this point is also noted. The systolic blood pressure is normally lower during inspiration than during expiration. The normal difference is accentuated when the patient has airflow limitation. If the difference between these 2 pressures is at least 15 mm Hg, the patient has pulsus paradoxus.

Palpation
Palpation should include locating the cardiac apical impulse. Chest palpation should be performed with the patient supine and disrobed from the waist up. A sheet or gown should be used to maintain patient comfort and privacy; however, palpation should be performed with the hand directly on the chest wall. When the chest volume is increased because of hyperinflation, the cardiac apex shifts to a more central location and either may not be palpable or may be palpable in the subxiphoid area.

Percussion
The chest should be percussed to determine the quality of the sound that resonates. Percussion of the chest wall should be performed by placing a digit (usually the second or third) of the nondominant hand firmly against the chest wall parallel to and between the ribs. The second and third digits of the dominant hand are flexed slightly at the metacarpophalangeal and proximal and distal interphalangeal joints to form a slight arch with the 2 fingertips even. The fingertips of the dominant hand tap the distal interphalangeal joint of the nondominant hand with a firm pecking motion. If the sound is more hollow than normal, the chest is hyperresonant.

Auscultation
Clinicians should auscultate the chest for wheezes, rhonchi, and breath sound intensity. Chest auscultation should be performed in a quiet room with the patient disrobed from the waist up. The warmed stethoscope diaphragm should be placed with moderate pressure on the patient’s chest to ensure good sound transmission. The chest should be auscultated bilaterally over the lower, middle, and upper lung fields posteriorly, anteriorly, and along the midaxillary line. Patients should be breathing heavily, but not forcefully. Wheezing will be heard as high-pitched musical tones especially during expiration. Rhonchi are lower-pitched wheezes.39 The intensity of breath sounds should be observed. Although elaborate scoring systems for breath sound intensity9,26 and for wheezing16 have been developed, they are not clearly better than the customary normal vs abnormal dichotomization.

Measures of Airflow
Measures of expiratory airflow include the forced expiratory time13,17 and the match test.16,24,25 To perform a forced expiratory time test, the patient must take a deep breath and forcefully exhale until no more air can be expelled. During this maneuver, the patient must keep mouth and glottis fully open as if the patient were yawning. While the patient is performing the forced expiration, the clinician listens over the larynx or lower trachea with a stethoscope and times the duration of audible airflow. To obtain the best results, the forced expiratory time should be measured with a stopwatch and recorded to the nearest 0.1 second. An alternative maneuver is the match test. During this test, the patient performs a forced expiration exactly as in the forced expiratory time maneuver. However, the clinician holds a burning match 10 cm from the patient’s widely open mouth. If the match is still burning after the forced expiration, the test result is positive. Others have used a candle for this test. However, one needs a match to light a candle, and we can find no benefit in carrying around both except for those who frequently practice in the dark. Also, to avoid malpractice claims and personal injury, we do not recommend this test in patients receiving supplemental oxygen!

PRECISION OF HISTORY AND SYMPTOMS FOR AIRFLOW LIMITATION The observer agreement for smoking history, dyspnea, coughing, wheezing, chronic bronchitis, and orthopnea has been described with the κ statistic.13 Two physicians almost always agree on the smoking history (κ = 0.95). Physicians agree frequently on the presence or absence of wheezing (κ = 0.61), chronic bronchitis (κ = 0.55), dyspnea (κ = 0.44-0.48), and coughing (κ = 0.46).

ACCURACY OF MEDICAL HISTORY AND SYMPTOMS FOR AIRFLOW LIMITATION Table 13-2 summarizes the operating characteristic estimates for airflow limitation, obtained for each historical item and symptom, after pooling data from referenced studies.

Background Information The best background information for diagnosing airflow limitation is exposure to cigarette smoke. Although patients

who have smoked are only slightly more likely to have airflow limitation,5,6,13 never having smoked cigarettes is moderately well associated with decreased likelihood of disease.5,6,13 Perhaps more useful is the fact that the number of years the patient has smoked correlates well with the likelihood of disease (Figure 13-1).13 Patients with at least a 70-pack-year history of smoking are much more likely to have airflow limitation.16 Age is related to airflow limitation. Asthma is more common in the young, whereas chronic bronchitis and emphysema are more common in older patients. The prevalence of airflow limitation appears to be lowest between ages 10 and 30 years.40 The higher prevalence at younger ages is due to asthma, which frequently remits after childhood. The higher prevalence in the older age group is probably due to 2 factors. First, age is a proxy for exposure to toxins, especially cigarette smoke. When smokers and nonsmokers are analyzed separately, the prevalence of airflow limitation does not appear to increase significantly with age in nonsmokers.41 Second, in adults, most airflow limitation is a chronic disease, so new incident cases are added faster than attrition from mortality, except in the very old. Therefore, advancing age is associated with increased likelihood of airflow limitation in adult smokers, but airflow limitation should not be considered a normal process of aging.

Symptoms Symptoms of chronic bronchitis,13,19 sputum production of at least one-fourth of a cup when present,16 or wheezing13,36 are associated with a moderate increase in the likelihood of airflow limitation. However, symptoms of cough5,13 or exertional dyspnea13,36 are associated with only a slight increase in the likelihood of airflow limitation. Orthopnea is not useful in diagnos-

Table 13-2 Composite Operating Characteristics of History Items Predicting Airflow Limitation Item Smoking history ≥70 vs 9 6-9 15 mm Hg) Decreased breath sounds Accessory muscle use Excavated supraclavicular fossae

Grade of Recommendationb: A Bc B B B B A
References: 14, 17, 34 32 17 17, 25, 26 31, 32 17 14, 18
Sensitivity, %: 15 10 13 61 8 32
Specificity, %: 99.6 99 99 91 99 94
LR+: 36 10 10 7.1 5.9 4.8
LR–: 0.85 0.90 0.88 0.43 0.95 0.73

Grade of Recommendationb: B C B C C
References: 14, 17 8, 23, 24 14, 17 33, 37 37
Sensitivity, %: 8 45 37 24 31
Specificity, %: 98 88 90 100 100
LR+: 4.8 2.7 0.45 4.6 3.7 3.7 e e
LR–: 0.94 0.62 0.70 0.70 0.69

Abbreviations: LR+, positive likelihood ratio; LR–, negative likelihood ratio. aListed in order of decreasing LR+. bSee Table 1-7 for a summary of Evidence Grades and Levels. cThis recommendation includes only children. Sensitivity in adults was 4% (grade C).12 d Because the forced expiratory time test has 3 levels, LR–, sensitivity, and specificity cannot be calculated. eThis item was studied in too few subjects to yield meaningful results.

Vital Signs The presence of pulsus paradoxus of at least 15 mm Hg is associated with only a moderate increase in the likelihood of airflow limitation, and the absence of this sign is associated with only a slight reduction in the likelihood of disease.7,22,23 Other vital signs have not been studied and cannot be recommended for use in determining the likelihood of airflow limitation.

Palpation Palpating a subxiphoid cardiac apical impulse is associated with a moderate increase in the likelihood of airflow limitation. However, the absence of this finding is not useful.13,16 Absent apical impulse has been studied only in patients with known disease,32 so its usefulness has not yet been determined. Therefore, according to current evidence, we recommend palpating the subxiphoid region for the cardiac apical impulse. We recommend this despite the reportedly low observer agreement because the low prevalence of this finding may lead to underestimates of the chance-corrected agreement.

Percussion Chest hyperresonance on percussion is associated with a moderate increase in the likelihood of disease.16,32 Neither decreased cardiac dullness nor decreased diaphragmatic movement has been studied in enough patients to determine definitively the extent of usefulness.16 However, patients with decreased cardiac dullness are more likely to have airflow limitation. Decreased liver dullness has been studied only in patients with known disease,32 so its usefulness has not yet been determined. Patients without chest hyperresonance are only slightly less likely to have airflow limitation.16,32 Normal cardiac dullness and normal diaphragmatic movement are likely not useful for decreasing the likelihood of airflow limitation.16 We recommend percussing the chest for the resonance sound. Hyperresonance over the precordium may be particularly useful for increasing the likelihood of airflow limitation.

Auscultation Objective wheezing, or wheezing observed on physical examination, is the most potent predictor of airflow limitation. Patients with wheezing almost certainly have airflow limitation.13,15,16,37 However, this is true only of wheezing on unforced expiration. Forced expiration is associated with increased sensitivity of wheezing, and with decreased specificity. The current literature suggests that the presence or absence of wheezing on forced expiration is of no value in diagnosing or ruling out airflow limitation.15,20 Additionally, the sensitivity of wheezing increases with the severity of airflow limitation.13 Studies that recruited patients referred for spirometry15,36 yielded sensitivities greater than those found in unreferred populations.13,16 Although the sensitivity of wheezing varies greatly (10%-50%) by study population,

the LR+ and LR– change little. Rhonchi were associated with a moderate increase in the likelihood of airflow limitation in 2 studies30,31; however, because neither study explicitly defined rhonchi and because there is significant variability in how physicians define rhonchi,43 this result must be interpreted cautiously. Decreased breath sounds are associated with only a moderate increase in the likelihood of disease.13,16,32 Absent wheezing,13,15,16,36 normal breath sound intensity,13,16,32 or absent rhonchi30,31 are associated with only a moderate decrease in the likelihood of disease. We recommend auscultating the chest for wheezes and for breath sound intensity. Patients with wheezing should be considered to have airflow limitation, and patients with decreased breath sound intensity should be considered somewhat more likely to have airflow limitation. Patients without wheezing or with normal breath sound intensity should be considered somewhat less likely to have this disorder. Neither the presence nor absence of crackles (rales) helps with the diagnosis of airflow limitation.8,13,29

Measures of Airflow Patients who are unable to extinguish a lighted match held 10 cm from the open mouth are significantly more likely to have airflow limitation than patients who are able to extinguish a match. The ability to extinguish a match is associated with a moderate decrease in the likelihood of disease.16,24,25 The forced expiratory time4,5,10,11,13,16-18 is a continuous variable that can range from a few tenths of a second to more than 20 seconds. Unfortunately, each of the 4 best studies of forced expiratory time10,13,16,17 used different methods. Two studies10,16 used average expiratory time, which makes bedside use cumbersome. Of the other 2 studies, one used the shortest expiratory time of 3 trials;13 the other, the longest expiratory time of 2 trials.17 Because the ability to discriminate between patients with and without airflow limitation is the same regardless of whether the shortest or longest time is used,13 there is no clear advantage to one method over the other. To allow pooling of results, one of the studies13 was reanalyzed with the longest rather than the shortest time. When the longest expiratory time is chosen, a result less than 6 seconds was associated with a modest decrease in the likelihood of airflow limitation; a result between 6 and 9 seconds was associated with a modest increase in the likelihood of airflow limitation; and a result greater than 9 seconds was associated with a great increase in the likelihood of airflow limitation. A forced expiratory time of approximately 9 seconds predicts an FEV1/FVC of 70%,8 a level suggesting the diagnosis of airflow limitation. Peak expiratory flow rates predict airflow limitation (Figure 13-1).13 However, 2 studies have shown that peak expiratory flow adds little to the clinical examination for airflow limitation.13,16 In one study,16 peak expiratory results improved the accuracy of the clinical examination for only 1 of the 4 physicians studied. 
In the other study,13 peak expiratory flow was equivalent to auscultating for wheeze, but more difficult to assess. Therefore, we cannot recommend routine peak flow measurements in the diagnosis of airflow limitation. Peak flow measurements may be useful in assessing benefit from therapy, especially for asthma.
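The 3-level reading of the forced expiratory time described above can be written as a small lookup. This is only a sketch in Python: the cut points (6 and 9 seconds) and the wording of each level come from the text, whereas the function name and the decision to place results of exactly 6 or 9 seconds in the middle band are assumptions made for illustration.

```python
def interpret_forced_expiratory_time(longest_seconds):
    """Map the longest forced expiratory time (seconds) to the 3 levels described in the text."""
    if longest_seconds < 0:
        raise ValueError("forced expiratory time cannot be negative")
    if longest_seconds < 6:
        return "modest decrease in the likelihood of airflow limitation"
    if longest_seconds <= 9:  # assumption: the boundaries 6 and 9 fall in the middle band
        return "modest increase in the likelihood of airflow limitation"
    return "great increase in the likelihood of airflow limitation"


# Hypothetical bedside results, using the longest of the patient's trials.
for seconds in (4.5, 7.0, 12.0):
    print(seconds, "->", interpret_forced_expiratory_time(seconds))
```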

CAN THE CLINICAL EXAMINATION PREDICT SEVERITY OF AIRFLOW LIMITATION? Stubbing et al3 found that the number of positive findings (tracheal descent during inspiration, sternomastoid contraction, scalene contraction, supraclavicular fossae excavation, supraclavicular fossae recession, intercostal recession, or costal margin movement) predicted the severity of airflow limitation in patients with known disease. These findings tended to be present only if the FEV1 was less than 50% of the predicted value. The American Thoracic Society1,2 found that the number of positive findings (barrel chest, low diaphragm, decreased diaphragmatic excursion, decreased breath sounds, prolonged expiratory phase, wheezing, noisy inspiration, or crackles) predicted the severity of airflow limitation (r = 0.6). The literature suggests that, as airflow becomes more limited, more physical examination findings become apparent.

ACCURACY OF THE OVERALL CLINICAL IMPRESSION FOR PREDICTING AIRFLOW LIMITATION Three studies14,17,33 evaluated the accuracy of the overall clinical impression or a clinician’s ability to integrate all aspects of the clinical examination in forming an impression about the likelihood of airflow limitation. Clinicians’ overall impressions13 (graded as moderate to severe limitation [LR+ = 4.2], mild [LR+ = 0.82], or none [LR+ = 0.42]), predicted any airflow limitation only moderately well. However, Badgett et al16 found that clinicians’ impressions (blinded to medical history but not physical examination) predicted moderate to severe airflow limitation somewhat better (LR+ = 7.3; LR– = 0.53) and about as well as some of the individual findings in Table 13-3. On the other hand, Fletcher32 evaluated the clinical impressions of 6 physicians and found sensitivities ranging from 15% to 95% for airflow limitation. Therefore, clinicians’ ability to diagnose airflow limitation clinically is variable, but accuracy seems to improve as the severity of airflow limitation increases.
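To see what an LR of this size does at the bedside, the pretest probability is converted to odds, multiplied by the LR, and converted back to a probability. A minimal sketch in Python follows; the LRs (7.3 and 0.53) are those quoted above for moderate to severe airflow limitation, whereas the 30% pretest probability and the function name are assumptions chosen only for illustration.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability using the odds form of Bayes' theorem."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


pretest = 0.30  # hypothetical pretest probability, not taken from the studies
after_positive = posttest_probability(pretest, 7.3)   # clinician's impression positive
after_negative = posttest_probability(pretest, 0.53)  # clinician's impression negative
print(round(after_positive, 2))  # 0.76
print(round(after_negative, 2))  # 0.19
```

Under these assumptions a positive impression raises the probability from 30% to about 76%, whereas a negative impression lowers it only to about 19%, which mirrors the point that the examination rules in better than it rules out.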

COMBINATIONS OF INDIVIDUAL FINDINGS Six studies (Table 13-4) assessed the usefulness of combining clinical examination items to predict airflow limitation. Unfortunately, as with individual findings, combinations of findings do not effectively rule out airflow limitation. The best combination is never having smoked, no reported wheezing, and no wheezing on examination (Figure 13-1; LR–, 0.18).13 Other combinations have LR– values ranging from 0.33 to 0.77. Even the best combination is no better than smoking history alone (LR–, 0.16). Therefore, combinations of findings are more helpful for ruling in than for ruling out this disorder. In fact, a patient with any combination

Table 13-4 Combinations of Clinical Examination Items Predicting Airflow Limitation

Clinical Examination Item
• Years of cigarette exposure, patient-reported wheezing, objective wheezing13
• Patient-reported chronic obstructive pulmonary disease, ≥ 70 pack-years of cigarette smoking, decreased breath sounds16
• Dyspnea, subjective wheezing, objective wheezing, accessory muscle use, excavation of supraclavicular fossae, and distention of external jugular veins36
• Breath sound intensity, use of scalene muscle, objective wheezing, and rales during cough27
• Decreased breath sounds, objective wheezing, rales, and prolonged expiratory time33
• History by questionnaire, standardized physical examination21

Interpretation: See Figure 13-1
Relation to Airflow Limitation: LR+ varies (see Figure 13-1); LR– = 0.18
Reference Standard: FEV1/FVC and FEV1 < fifth percentile

≥2 Findings present 50 mm Hg at cardiac catheterization. bEllipsis indicates not applicable.


Mitral Regurgitation We report the accuracy of the clinical examination for detecting moderate to severe regurgitation confirmed through echocardiography or cardiac catheterization (Table 33-7). Detection of moderate to severe MR, even in asymptomatic patients, may influence recommendations for echocardiographic monitoring30 or medical treatment.31 If a cardiologist hears a murmur in the mitral area (mid left thorax, fifth intercostal space), then the likelihood of MR is increased slightly, but absence of a murmur significantly reduces the likelihood of MR.15,32,33 Similarly, a late systolic or holosystolic murmur slightly increases the likelihood of MR, but absence of such a murmur significantly reduces the likelihood of MR. In the setting of acute MI, absence of a murmur is less useful for ruling out acute MR (LR–, 0.66; 95% CI, 0.25-1.0).34 Transient arterial occlusion was accurate for ruling in and ruling out left-sided regurgitant murmurs, such as MR and ventricular septal defect.27 Internal medicine house staff are less accurate than cardiologists for detecting the murmur of MR, with positive LRs ranging from 1.1 (for interns) to 4.6 (for medical students) and negative LRs ranging from 0.7 (for junior residents) to 1.0 (for interns and senior residents)35 (grade A study).

CHAPTER 33

Murmur, Systolic

Table 33-7 Accuracy of the Clinical Examination for Detecting Mitral Regurgitation

Finding | Reference Standard (No. of Patients) | LR+ (95% CI) | LR– (95% CI) | Quality Gradea
Murmur in mitral area, Study 1 (ref 33) | Echocardiogram: moderate to severe MR (394) | 3.9 (3.0-5.1) | 0.34 (0.23-0.47) | C
Murmur in mitral area, Study 2 (ref 32) | Cardiac catheterization: moderate to severe MR (35) | 3.6 (1.9-7.7) | 0.12 (0.02-0.50) | C
Late or holosystolic murmur (ref 15) | Echocardiogram: moderate to severe MR (80) | 1.8 (1.2-2.5) | 0 (0-0.8) | C
Any murmur during acute MI (ref 34) | Cardiac catheterization: moderate to severe MR (206) | 4.7 (1.3-11) | 0.66 (0.25-1.0) | C
With transient arterial occlusion, murmur increases in intensity (ref 27) | Cardiac catheterization: severity not statedb | 7.5 (2.5-23) | 0.28 (0.13-0.60) | C

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio; MI, myocardial infarction; MR, mitral regurgitation. aSee Table 1-7 for a summary of Evidence Grades and Levels. bOutcome of interest was left-sided regurgitant lesions, including MR or ventricular septal defect.

The Bottom Line for Mitral Regurgitation
• For cardiologists, absence of a mitral area murmur or a late systolic/holosystolic murmur significantly reduces the likelihood of MR, except in the setting of acute MI.
• Cardiologists can accurately distinguish left-sided regurgitant murmurs, such as MR and ventricular septal defect, using transient arterial occlusion.
• Noncardiologists’ assessments for MR are considerably less accurate.

Tricuspid Regurgitation Cardiologists are reasonably accurate for diagnosing the murmur of moderately severe to severe TR in patients (n = 21, with TR; n = 295, without TR) referred for echocardiography (LR+, 10.1; 95% CI, 5.8-18; LR–, 0.41; 95% CI, 0.24-0.70) (grade C).33 Special maneuvers may also be helpful for diagnosing TR and other right-sided lesions such as pulmonic stenosis. One study (n = 10, with TR or pulmonic stenosis; n = 40, without TR or pulmonic stenosis) using cardiologist examiners found that an increase in murmur intensity with inspiration significantly increased the likelihood of a right-sided valvular lesion, whereas the absence of increased intensity made these conditions less likely (LR+, 8.0; 95% CI, 3.5-18; LR–, 0.0; 95% CI, 0-0.43) (grade C).27 In another study, patients with severe MR (n = 15) or TR (n = 15) were examined by experienced cardiologists before cardiac catheterization.10 To distinguish TR from MR, increased murmur intensity on inspiration had a positive LR of ∞ (95% CI, 3.1-∞) and a negative LR of 0.20 (95% CI, 0.07-0.45). For the finding of increased murmur intensity with sustained abdominal pressure, the positive LR was ∞ (95% CI, 2.5-∞) and the negative LR was 0.33 (95% CI, 0.15-0.58) (grade C).
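The LRs quoted above follow directly from a 2 × 2 table of the finding against the reference standard. The sketch below, in Python, shows the arithmetic, including why an LR+ becomes infinite when no patient without the condition has the finding. The function name is hypothetical, and the counts are illustrative values chosen only because they match the group sizes and reproduce the reported point estimates (LR+, 8.0 with LR–, 0.0; and LR+, ∞ with LR–, 0.20); they are not the studies' published tables.

```python
import math


def likelihood_ratios(true_pos, false_neg, false_pos, true_neg):
    """Return (LR+, LR-) from the four cells of a 2 x 2 diagnostic table."""
    sensitivity = true_pos / (true_pos + false_neg)
    false_negative_rate = false_neg / (true_pos + false_neg)
    false_positive_rate = false_pos / (false_pos + true_neg)
    specificity = true_neg / (false_pos + true_neg)
    # LR+ = sensitivity / (1 - specificity); infinite when there are no false positives.
    if false_pos == 0:
        lr_pos = math.inf if true_pos > 0 else math.nan
    else:
        lr_pos = sensitivity / false_positive_rate
    # LR- = (1 - sensitivity) / specificity; infinite when there are no true negatives.
    if true_neg == 0:
        lr_neg = math.inf if false_neg > 0 else math.nan
    else:
        lr_neg = false_negative_rate / specificity
    return lr_pos, lr_neg


# Illustrative counts only: 10 patients with a right-sided lesion, 40 without.
inspiration = likelihood_ratios(true_pos=10, false_neg=0, false_pos=5, true_neg=35)
# Illustrative counts only: 15 patients with TR, 15 with MR.
tr_versus_mr = likelihood_ratios(true_pos=12, false_neg=3, false_pos=0, true_neg=15)
print(inspiration)   # (8.0, 0.0)
print(tr_versus_mr)  # (inf, 0.2)
```

An LR+ of ∞ or an LR– of 0 from groups this small reflects an empty cell in the table more than a perfect test, which is why the accompanying confidence intervals are wide.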

The Bottom Line for Tricuspid Regurgitation
• Cardiologists can accurately detect the murmur of TR.
• Cardiologists can accurately rule in and rule out TR with the quiet inspiration and sustained abdominal pressure maneuvers.

Hypertrophic Cardiomyopathy There are limited data on the accuracy of clinical examination for hypertrophic cardiomyopathy (also termed idiopathic hypertrophic subaortic stenosis). Many studies evaluate phonocardiography or intracardiac tracings rather than auscultation,36-40 whereas others include fewer than 15 patients.41-45 One study evaluated carotid sinus pressure, which is not routinely recommended for the clinical examination.46 Special maneuvers may help distinguish the murmur of hypertrophic cardiomyopathy.27 Using cardiologist examiners, if a murmur decreased in intensity with passive leg elevation, then hypertrophic cardiomyopathy was significantly more likely (LR+, 8.0; 95% CI, 3.0-21), whereas if the murmur did not decrease in intensity, the likelihood was significantly reduced (LR–, 0.22; 95% CI, 0.06-0.77). If murmur intensity was decreased or unchanged with standing to squatting, then hypertrophic cardiomyopathy was significantly more likely (LR+, 4.5; 95% CI, 2.3-8.6), whereas if the murmur increased in intensity, the likelihood of hypertrophic cardiomyopathy was significantly reduced (LR–, 0.13; 95% CI, 0.02-0.81) (grade C).

The Bottom Line for Hypertrophic Cardiomyopathy Cardiologists can rule in or rule out hypertrophic cardiomyopathy by evaluating for decreased murmur intensity with passive leg elevation or increased murmur intensity when the patient goes from a squatting to standing position.

Mitral Valve Prolapse The accuracy of the clinical examination for diagnosing MVP cannot be defined, because clinical findings alone are sufficient for the diagnosis of MVP. A patient with a systolic click and a systolic murmur meets the diagnostic criteria for MVP even if the patient has a normal echocardiogram result.47,48 However, we can examine the relationship between clinical findings and echocardiographic findings (Table 33-8).49-53 With cardiologist examiners, a systolic click accompanied by a systolic murmur helped to rule in echocardiographic MVP. The accuracy of an isolated systolic click is variable, possibly because of unreliability of the clinical examination and differences between studies regarding the definition of echocardiographic MVP. An isolated systolic murmur has little effect on the likelihood of echocardiographic MVP, whereas absence of both a systolic click and a murmur appears to reduce the likelihood of echocardiographic MVP. Noncardiologists are less accurate than cardiologists for all of these findings.

Table 33-8 Accuracy of the Clinical Examination for Detecting Echocardiographic Mitral Valve Prolapse

Finding | Clinician (No. of Patients) | LR (95% CI) | Quality Gradea
Systolic click and murmur, Study 1 (ref 49) | Cardiologists (401) | 19 (4.6-80) | C
Systolic click and murmur, Study 2 (ref 50) | Noncardiologists (104) | 2.4 (1.0-5.7) | C
Systolic click, Study 1 (ref 49) | Cardiologists (401) | 12 (5.4-25) | C
Systolic click, Study 2 (ref 50) | Noncardiologists (104) | 1.3 (0.7-2.2) | C
Nonejection click, with or without a murmur, Study 1 (ref 51) | Cardiologists (155) | 3.8 (2.3-6.8) | A
Nonejection click, with or without a murmur, Study 2 (ref 52) | Cardiologists (140) | 1.7 (1.3-2.1) | C
Murmur, with or without a systolic click, Study 1 (ref 52) | Cardiologists (140) | 1.9 (1.3-3.0) | C
Murmur, with or without a systolic click, Study 2 (ref 53) | Noncardiologists (259) | 1.2 (0.9-1.5) | C
Murmur only, Study 1 (ref 50) | Cardiologists (401) | 2.4 (1.0-5.7) | C
Murmur only, Study 2 (ref 51) | Noncardiologists (104) | 0.7 (0.3-1.3) | C
No murmur, no systolic click, Study 1 (ref 51) | Cardiologists (155) | 0.04 (0.02-0.11) | A
No murmur, no systolic click, Study 2 (ref 52) | Cardiologists (140) | 0.26 (0.12-0.54) | C
No murmur, no systolic click, Study 3 (ref 49) | Cardiologists (401) | 0.21 (0.15-0.29) | C
No murmur, no systolic click, Study 4 (ref 50) | Noncardiologists (104) | 0.53 (0.23-1.20) | C

Abbreviations: CI, confidence interval; LR, likelihood ratio. aSee Table 1-7 for a summary of Evidence Grades and Levels.

Mitral valve leaflet redundancy or thickening is the echocardiographic variable most strongly associated with adverse clinical outcomes.54,55 In one study, neither a systolic click (LR+, 2.8; 95% CI, 1.8-4.6; LR–, 0.76; 95% CI, 0.69-0.84) nor a systolic murmur (LR+, 1.3; 95% CI, 1.1-1.5; LR–, 0.57; 95% CI, 0.43-0.76) affected the likelihood of echocardiographic mitral valve leaflet thickening or redundancy (grade C study).56 Several longitudinal studies of patients with echocardiographic MVP have related baseline clinical findings to the development of adverse clinical events, including cardiac death, progressive MR requiring surgery, endocarditis, and systemic embolism.57,58 A holosystolic murmur without a systolic click significantly increased the likelihood of an adverse event, whereas absence of both a systolic click and murmur was associated with no adverse events. Other clinical findings had little effect on the likelihood of adverse events (Table 33-9).

The Bottom Line for Mitral Valve Prolapse
• A systolic click, with or without systolic murmur, is sufficient for the diagnosis of MVP.
• If a cardiologist hears a systolic click, with or without a murmur, then the likelihood of echocardiographic MVP is significantly increased. The absence of both a systolic click and murmur significantly reduces the likelihood of echocardiographic MVP.
• In patients with echocardiographic MVP, a holosystolic murmur without a systolic click significantly increases the likelihood of long-term complications, whereas absence of both a systolic click and murmur significantly reduces the likelihood of long-term complications.

Table 33-9 Accuracy of the Clinical Examination for Predicting Adverse Clinical Outcomes Related to Mitral Valve Prolapsea

Finding | Clinician (No. of Patients) | LR (95% CI) | Quality Gradeb
Holosystolic murmur, Study 1 (ref 56) | Cardiologists (316) | 18 (6.6-51) | C
Holosystolic murmur, Study 2 (ref 57) | Cardiologists (321) | 5.1 (2.2-9.9) | C
Late systolic murmur or click and murmur, Study 1 (ref 58) | Cardiologists (316) | 1.2 (0.7-1.7) | C
Late systolic murmur or click and murmur, Study 2 (ref 57) | Cardiologists (321) | 0.8 (0.3-1.5) | C
Click and holosystolic murmur (ref 57) | Cardiologists (321) | 0.8 (0.2-2.4) | C
Any click or isolated click, Study 1 (ref 58) | Cardiologists (316) | 0.4 (0.2-0.8) | C
Any click or isolated click, Study 2 (ref 57) | Cardiologists (321) | 0.26 (0.05-1.1) | C
No click/no murmur, Study 1 (ref 54) | Cardiologists (237) | 0 (0-4.1) | C
No click/no murmur, Study 2 (ref 58) | Cardiologists (316) | 0 (0-1.4) | C

Abbreviations: CI, confidence interval; LR, likelihood ratio. aOutcomes include death (cardiac or all-cause, depending on study), stroke, endocarditis, or progressive mitral regurgitation requiring surgery. Most outcomes were progressive mitral regurgitation requiring surgery. bSee Table 1-7 for a summary of Evidence Grades and Levels.


WHEN TO EXAMINE FOR SYSTOLIC MURMURS We are unaware of data by which one might give an evidence-based recommendation regarding the examination for systolic murmurs. Auscultation for systolic murmurs should probably be carried out in any patient for whom a complete cardiovascular database is necessary.

ARE SYSTOLIC MURMURS EVER NORMAL? In unreferred young adults, the prevalence of systolic murmurs ranges from 5% to 52%8,59-61; echocardiography result is normal in 86% to 100%.62-64 Echocardiography result is normal in 90% to 94% of pregnant women with systolic murmurs who are referred for testing.21,24,65 In elderly medical outpatients or residents of long-term care facilities, the prevalence of systolic murmurs ranges from 29% to 60%66-68; echocardiography is normal in 44% to 100%.24,25,29,69,70 This wide range of normal in the elderly can be partially explained by various study definitions of normal echocardiograms. Commonly detected abnormalities in the elderly were left ventricular systolic dysfunction, aortic stenosis, and MR. Other studies include aortic valve sclerosis as an abnormality, although the clinical importance of aortic valve sclerosis is uncertain.

A venous hum71 and a mammary souffle are both normal conditions that present either as systolic murmurs or, more commonly, as continuous murmurs.

HOW TO IMPROVE SKILLS IN EXAMINING THIS AREA The characteristics of murmurs can be learned using cardiovascular auscultatory tapes or cardiac patient simulators, although the effectiveness of these aids is uncertain.72,73 Most audiotapes are accompanied by phonocardiographic and expert cardiologist analyses, so these tapes can help clinicians to calibrate their ears to those of experts. Most commercially available stethoscopes have similar acoustic properties, although some have poor performance at low frequencies.74 Good stethoscope maintenance is essential because dirt or cracked tubing75 will significantly reduce accuracy. Large earpieces are better because small earpieces can be occluded by the sharp bony angle at the external auditory meatus.3 At the bedside, eliminate background noise whenever possible. If background noise is unavoidable, try to repeat your examination in a quieter setting. Finally, relate your clinical findings to the results of assessments by a colleague, a cardiologist, or an echocardiogram whenever possible. Resolving disagreements between your assessments and those of others is an excellent way of upgrading your clinical skills.

RECOMMENDATIONS FOR FURTHER RESEARCH Most studies used cardiologists or senior cardiology fellows to conduct the clinical examinations. There are few data on the precision and accuracy of the clinical examination conducted by noncardiologists. Some studies include inappropriately narrow spectrums of patients, such as only patients with moderate and severe aortic stenosis.5,6,17 Further studies should focus on a broad spectrum of patients from primary or secondary care settings, particularly patients older than 40 years when the prevalence of abnormal murmurs is significantly increased.

CLINICAL SCENARIOS: RESOLUTIONS

CASE 1 Your first patient, who is awaiting urgent surgery for an open fracture, had a systolic murmur that did not radiate to the right carotid artery. The likelihood of aortic stenosis is significantly reduced by this finding. In addition, the carotid artery pulsation had normal volume, the S2 intensity was normal, and there was no S4. These findings also help to reduce the likelihood of aortic stenosis. You are confident in your assessment because it was conducted in a quiet room with a comfortable and cooperative patient. You can advise the surgeon that aortic stenosis is unlikely.

CASE 2 Your second patient has a systolic click and a systolic murmur, strongly suggesting MVP. If you are an experienced auscultator, then these findings significantly increase the likelihood that the echocardiogram will show evidence of MVP. However, even if the echocardiogram result is normal, you already have enough evidence to diagnose MVP. You may wish to obtain an echocardiogram at a later date to determine the severity of the valvular abnormality.

Author Affiliations at the Time of the Original Publication

Division of General Internal Medicine and Clinical Epidemiology, Department of Medicine, University of Toronto, Toronto, Ontario, Canada; and The Toronto Hospital, Toronto, Ontario, Canada.

Acknowledgments

We thank Wilbert Aronow, MD, for providing additional data and methodologic information about his studies, Eugene Oddone, MD, MHS, for his helpful comments on earlier drafts of this article, and Ms Sharon Smith for her assistance in the preparation of the manuscript and tables.

REFERENCES
1. Ballard DJ, Khanderia BK, Tajik AJ, Seward JB, Weber VP, Melton LJ. Population-based study of echocardiography: time trends in utilization and diagnostic profile of an evolving technology. Int J Technol Assess Health Care. 1989;5(2):249-261.
2. Rushmer RF, Sparkman DR, Polley RFL, et al. Variability in detection and interpretation of heart murmurs. Am J Dis Child. 1952;83(6):740-754.
3. Constant J. Essentials of Bedside Cardiology. 2nd ed. Totowa, NJ: Humana Press; 2003:30, 121-153.
4. Marriot HJ. Bedside Cardiac Diagnosis. Philadelphia, PA: JB Lippincott; 1991:19.
5. Leach RM, McBrien DJ. Brachioradial delay: a new clinical indicator of the severity of aortic stenosis. Lancet. 1990;335(8699):1199-1201.
6. Chun PKC, Dunn BE. Clinical clue of severe aortic stenosis: simultaneous palpation of the carotid and apical impulses. Arch Intern Med. 1982;142(13):2284-2288.
7. Levine SA. The systolic murmur: its clinical significance. JAMA. 1933;101:436-438.
8. Freeman AR, Levine SA. The clinical significance of the systolic murmur: a study of 1000 consecutive cardiac cases. Ann Intern Med. 1933;6:1371-1385.
9. Rivero-Carvallho JM. Signo para el diagnostico de las insufficiencias tricuspideas. Arch Inst Cardiol Mex. 1946;16:531-540.
10. Maisel A, Atwood JE, Goldberger AL. Hepatojugular reflux: useful in the bedside diagnosis of tricuspid regurgitation. Ann Intern Med. 1984;101(6):781-782.
11. Lembo NJ, Dell’Italia LJ, Crawford MH, O’Rourke RA. Diagnosis of left-sided regurgitant murmurs of transient arterial occlusion: a new maneuver using blood pressure cuffs. Ann Intern Med. 1986;105(3):368-370.
12. Sharpey-Schafer EP. Effects of squatting on the normal and failing circulation. BMJ. 1956;1(4975):1072-1074.
13. Ravin A, Craddock LD, Wolf PS, Shander D. Auscultation of the Heart. 3rd ed. Chicago, IL: Mosby-Year Book; 1977:2:45-67.
14. Taranta A, Spagnuolo M, Snyder R, Gerbarg DS, Hofler JJ. Auscultation of the heart by physicians and by computer. In: Data Acquisition and Processing in Biology and Medicine, Vol 3. New York, NY: Macmillan Publishing Co; 1964:23-52.
15. Panidis IP, McAllister M, Ross J, Mintz GS. Prevalence and severity of mitral regurgitation in the mitral valve prolapse syndrome: a Doppler echocardiographic study of 80 patients. J Am Coll Cardiol. 1986;7(5):975-981.
16. Raftery EB, Holland WW. Examination into the heart: an investigation into variation. Am J Epidemiol. 1967;85(3):438-444.
17. Forssell G, Jonasson R, Orinius E. Identifying severe aortic valvular stenosis by bedside examination. Acta Med Scand. 1985;218(4):397-400.


18. Dobrow RJ, Calatayud JB, Abraham S, Caceres CA. A study of physician variation in heart sound interpretation. Med Annu Dist Columbia. 1964;33:305-308.
19. Spodick DH, Sugiura T, Doi Y, Paladino D, Haffty BG. Rate of rise of the carotid pulse: an investigation of observer error in a common clinical measurement. Am J Cardiol. 1982;49(1):159-162.
20. Holleman DR, Simel DL. Does the clinical examination predict airflow limitation? JAMA. 1995;273(4):313-319.
21. Mishra M, Chambers JB, Jackson G. Murmurs in pregnancy: an audit of echocardiography. BMJ. 1992;304(6839):1413-1414.
22. Ahuja IM. Functional systolic murmurs. Indian Heart J. 1982;34(4):241-244.
23. Lockhart PB, Crist D, Stone PH. The reliability of the medical history in the identification of patients at risk for infective endocarditis. J Am Dent Assoc. 1989;119(3):417-422.
24. Xu M, McHaffie DJ. Nonspecific systolic murmurs: an audit of the clinical value of echocardiography. N Z Med J. 1993;106(950):54-56.
25. Aronow WS, Kronzon I. Correlation of prevalence and severity of valvular aortic stenosis determined by continuous-wave Doppler echocardiography with physical signs of aortic stenosis in patients aged 62 to 100 years with aortic systolic ejection murmurs. Am J Cardiol. 1987;60(4):399-401.
26. Hoagland PM, Cook EF, Wynne J, Goldman L. Value of noninvasive testing in adults with suspected aortic stenosis. Am J Med. 1986;80(6):1041-1050.
27. Lembo NJ, Dell’Italia LJ, Crawford MH, O’Rourke RA. Bedside diagnosis of systolic murmurs. N Engl J Med. 1988;318(24):1572-1578.
28. Jaffe WM, Roche AHG, Coverdale HA, McAlister HF, Ormiston JA, Greene ER. Clinical evaluation versus Doppler echocardiography in the quantitative assessment of valvular heart disease. Circulation. 1988;78(2):267-275.
29. McKillop GM, Stewart DA, Burns JMA, Ballantyne D. Doppler echocardiography in elderly patients with ejection systolic murmurs. Postgrad Med J. 1991;61(794):1059-1061.
30. Gaasch WH, John RM, Aurigemma GP. Managing asymptomatic patients with chronic mitral regurgitation. Chest. 1995;108(3):842-847.
31. Seneviratne B, Morre GA, West PD. Effect of captopril on functional mitral regurgitation in dilated heart failure: a randomized double blind placebo controlled trial. Br Heart J. 1994;72(1):63-68.
32. Meyers DG, McCall D, Sears TD, Olson TS, Felix GL. Duplex pulsed Doppler echocardiography in mitral regurgitation. J Clin Ultrasound. 1986;14(2):117-121.
33. Rahko PS. Prevalence of regurgitant murmurs in patients with valvular regurgitation detected by Doppler echocardiography. Ann Intern Med. 1989;111(6):466-472.
34. Lehmann KG, Francis CK, Dodge HT; TIMI Study Group. Mitral regurgitation in early myocardial infarction: incidence, clinical detection, and prognostic implications. Ann Intern Med. 1992;117(1):10-17.
35. Kinney EL. Causes of false-negative auscultation of regurgitant lesions: a Doppler echocardiographic study of 294 patients. J Gen Intern Med. 1988;3(5):429-434.
36. Buda A, Mackenzie G, Wigle D. The Mueller maneuver in muscular subaortic stenosis [abstract]. Circulation. 1977;56(suppl 3):III-138.
37. Nellen M, Gotsman MS, Vogelpoel L, Beck W, Schrire V. Effects of prompt squatting on the systolic murmur in idiopathic hypertrophic obstructive cardiomyopathy. BMJ. 1967;3(5558):140-143.
38. Braunwald E, Oldham HN Jr, Ross J Jr, Linhart JW, Mason DT, Fort L III. The circulatory response of patients with idiopathic hypertrophic subaortic stenosis to nitroglycerin and the Valsalva maneuver. Circulation. 1964;29:422-431.
39. Kramer DS, French WJ, Criley JM. The postextrasystolic murmur response to gradient in hypertrophic cardiomyopathy. Ann Intern Med. 1986;104(6):772-776.
40. Petrin TJ, Tavel ME. Idiopathic hypertrophic subaortic stenosis as observed in a large community hospital. J Am Geriatr Soc. 1979;27(1):43-46.
41. Rosenblum R, Delman AJ. Valsalva’s maneuver and the systolic murmur of hypertrophic subaortic stenosis. Am J Cardiol. 1965;15:868-870.
42. Stefadouros MA, Mucha E, Frank MJ. Paradoxic response of the murmur of idiopathic hypertrophic subaortic stenosis to the Valsalva maneuver. Am J Cardiol. 1976;37(1):89-92.
43. Bartall H, Amber S, Desser KB, Benchimol A. Normalization of the external carotid pulse tracing of hypertrophic subaortic stenosis during Muller’s maneuver. Chest. 1978;74(1):77-78.
44. Cassidy J, Aronow WS, Prakash R. The effect of isometric exercise on the systolic murmur of patients with idiopathic hypertrophic subaortic stenosis. Chest. 1975;67(4):395-397.
45. Battle WE, Siegel FA, Fox LM. The older patient with idiopathic hypertrophic subaortic stenosis. Geriatrics. 1977;1:61-69.


46. Klein HO, DiSegni E, Dean H, Beker B, Bakst A, Kaplinsky E. Increased intensity of the murmur of hypertrophic obstructive cardiomyopathy with carotid sinus pressure. Chest. 1988;93(4):814-820. 47. Perloff JK, Child JS, Edwards JE. New guidelines for the clinical diagnosis of mitral valve prolapse. Am J Cardiol. 1986;57(13):1124-1129. 48. Devereux RB, Kramer-Fox R, Shear K, Kligfield P, Pini R, Savage DD. Diagnosis and classification of severity of mitral valve prolapse: methodologic, biologic, and prognostic consideration. Am Heart J. 1987;113(5):1265-1280. 49. Devereux RB, Kramer-Fox R, Brown WT, et al. Relation between clinical features of the mitral valve prolapse syndrome and echocardiographically documented mitral valve prolapse. J Am Coll Cardiol. 1986;8(4):763-772. 50. Olive KE, Grassman ED. Mitral valve prolapse: comparison of diagnosis by physical examination and echocardiography. South Med J. 1990;83(11): 1266-1269. 51. Abbasi AS, DeCristofaro D, Anabtawi J, Irwin L. Mitral valve prolapse: comparative value of M-mode, two-dimensional and Doppler echocardiography. J Am Coll Cardiol. 1983;2(6):1219-1223. 52. Barron JT, Manrose DL, Liebson PR. Comparison of auscultation with two-dimensional and Doppler echocardiography in patients with suspected mitral valve prolapse. Clin Cardiol. 1988;11(6):401-406. 53. Retchin SM, Waugh RA, Fletcher RH. The impact of echocardiographic results on treatment decisions for patients suspected of mitral valve prolapse. Med Care. 1987;25(7):652-657. 54. Nishimura RA, McGoon MD, Shub C, Miller FA Jr, Ilstrup DM, Tajik AJ. Echocardiographically documented mitral-valve prolapse: long-term follow-up of 237 patients. N Engl J Med. 1985;313(21):1305-1309. 55. Orencia AJ, Petty GW, Khanderia BK, et al. Risk of stroke with mitral valve prolapse in population-based cohort survey. Stroke. 1995;26(1):7-13. 56. Marks AR, Choong CY, Sanfilippo AJ, Ferre M, Weyman AE. 
Identification of high-risk and low-risk subgroups of patients with mitral valve prolapse. N Engl J Med. 1989;320(16):1031-1036. 57. Tofler OB, Tofler GH. Use of auscultation to follow patients with mitral systolic clicks and murmurs. Am J Cardiol. 1990;66(19):1355-1358. 58. Zuppiroli A, Rinaldi M, Kramer-Fox R, Favilli S, Roman MJ, Devereux RB. Natural history of mitral valve prolapse. Am J Cardiol. 1995;75(15):1028-1032. 59. McCracken D, Everett JE. An investigation of the incidence of cardiac murmurs in young healthy women. Practitioner. 1976;216(1293):308-309. 60. Stewart IMG. Systolic murmurs in 525 healthy young adults. Br Heart J. 1951;13(4):561-565. 61. Vaughan JP. Blood pressure and heart murmurs in a rural population in the United Republic of Tanzania. Bull World Health Organ. 1979;57(1):89-97. 62. Cotter L, Logan RL, Poole A. Innocent systolic murmurs in healthy 40-year-old men. J R Coll Physicians Lond. 1980;14(2):128-129. 63. Markiewicz W, Stoner J, London E, Hunt SA, Popp RL. Mitral valve prolapse in one hundred presumably healthy young females. Circulation. 1976;53(3):464-473. 64. Tan CC, Hiew TM. Innocent systolic murmurs in healthy young males: a study of the characteristics and prevalence in the SAF pre-enlistee population. Singapore Med J. 1988;29(4):337-340. 65. Northcote RJ, Knight PV, Ballantyne D. Systolic murmurs in pregnancy: value of echocardiographic assessment. Clin Cardiol. 1985;8(6):327-328. 66. Bethel CS, Crow EW. Heart sounds in the aged. Am J Cardiol. 1963;11:763-767. 67. Bruns DL, Van Der Hauwaert LG. The aortic systolic murmur developing with increased age. Br Heart J. 1958;20(3):370-378. 68. Griffiths RA, Sheldon MG. The clinical significance of systolic murmurs in the elderly. Age Ageing. 1975;4(2):99-104. 69. Perez GL, Jacob M, Bhat PK, Rao DB, Luisada AA. Incidence of murmurs in the aging heart. J Am Geriatr Soc. 1976;24(1):29-31. 70. Knight PV, Martin BJ, Ballantyne D. 
Echocardiographic diagnoses in elderly patients with systolic murmurs and cardiac disease. Age Ageing. 1986;15(3):169-173. 71. Bujack W, Gioia F, Cayler GG. An innocent thrill: a common finding with an innocent murmur. JAMA. 1976;235(22):2417. 72. Oddone EZ, Waugh RA, Samsa G, Corey R, Feussner JR. Teaching cardiovascular examination skills: results from a randomized controlled trial. Am J Med. 1993;95(4):389-396. 73. Ewy GA, Felner JM, Juul D, Mayer JW, Sajid AW, Waugh RA. Test of a cardiology patient simulator with students in fourth year electives. J Med Educ. 1987;62(9):738-743. 74. Abella M, Formolo J, Penney DG. Comparison of the acoustic properties of six popular stethoscopes. J Acoust Soc Am. 1992;91(4 pt 1):2224-2228. 75. Orton D, Stryker R. Sick stethoscope syndrome. JAMA. 1986;256(20):2817.

UPDATE: Murmur, Systolic

33

Prepared by David Cescon, MD, and Edward Etchells, MD, MSc
Reviewed by Eugene Oddone, MD

CLINICAL SCENARIO A 62-year-old man scheduled for elective total knee replacement has been referred to you for preoperative assessment of a systolic murmur. The orthopedic surgeon detected a systolic murmur and wants to rule out aortic stenosis (AS) before surgery. The patient has no cardiovascular symptoms. On auscultation, you hear normal first and second heart sounds (S1 and S2). There is a grade 3 early systolic murmur, loudest at the lower left sternal border, which does not radiate to either the right clavicle or carotids. You detect a normal volume and normal rate of increase of the carotid pulse. The rest of the clinical examination results, including those for the electrocardiogram (ECG) and chest radiograph, are normal.

Original Review Etchells EE, Bell C, Robb K. Does this patient have an abnormal systolic murmur? JAMA. 1997;277(7):564-571.

UPDATED LITERATURE SEARCH Our literature search combined the parent search strategy for The Rational Clinical Examination with the following terms: “systolic and murmur,” “heart valve diseases,” “aortic valve stenosis,” “pulmonary valve stenosis,” “mitral valve prolapse,” “mitral valve insufficiency,” “tricuspid valve insufficiency,” “hypertrophic cardiomyopathy,” and “heart murmurs.” Results were limited to English-language publications in the MEDLINE database from 1996 to July 2004. The titles and abstracts of the search results were screened, case reports were excluded, and 28 potentially relevant primary studies and review articles were retrieved. We scanned the reference list of each article for additional studies. For accuracy studies, we retained those of adult subjects that included sensitivity and specificity data of physical findings and had a quality score of level 3 or greater. We excluded level 3 studies with fewer than 100 patients. Five new studies were ultimately included in this update.

NEW FINDINGS
1. Cardiologists are able to distinguish normal (“innocent”) murmurs from abnormal murmurs by the physical examination alone.
2. Emergency department physicians are able to detect normal murmurs by clinical evaluation (including physical examination; medical history; ECG, chest radiograph, and laboratory test results; and previously recorded chart data).
3. The presence of a holosystolic murmur, loud murmur, decreased carotid upstroke, or systolic thrill makes it much more likely that a systolic murmur represents an underlying cardiac abnormality rather than a functional murmur.
4. In patients for whom examiners did not know whether a murmur was present before examination, emergency department physicians and cardiologists identified valvular heart disease with good accuracy.
5. Absence of murmur radiation to the right clavicle makes moderate to severe AS much less likely.
6. The presence of any 3 of the following findings makes moderate to severe AS much more likely: maximal murmur intensity in second right intercostal space, reduced carotid pulse volume, slow rate of increase of carotid pulse, and reduced or absent second heart sounds (S2).
7. When mitral regurgitation (MR) is identified, murmur intensity of grade 3 or higher makes severe regurgitation more likely.

IMPROVEMENTS IN DATA PRESENTED IN THE ORIGINAL PUBLICATION The newer studies do not alter the results reported in the original publication but do provide new information on the role of individual auscultatory findings. In the original article, the need to identify patients at higher risk for endocarditis because of valvular abnormalities was suggested as a rationale for performing the clinical examination. The recommendations for endocarditis prophylaxis have changed. Patients with murmurs from structural abnormalities of a native valve do not automatically require antibiotic prophylaxis to prevent infective endocarditis.1


CHAPTER 33

Update

CHANGES IN THE REFERENCE STANDARD The reference standard is an echocardiogram or a cardiac catheterization that assesses valvular competency.

RESULTS OF THE LITERATURE REVIEW

Precision

Since the original review, 2 published studies involving noncardiologist examiners have evaluated the precision of various physical examination maneuvers in actual patients.2,3 In a large study of medical patients presenting to the emergency department, there was substantial agreement on the presence of systolic murmurs (κ = 0.8). In the clinical setting, the precision of examining for a loud murmur (κ = 0.59) and for an S2 (κ = 0.54) is moderate, whereas the precision of other findings is only fair. In both of these studies, the various findings were not evaluated independently, so the examiners’ opinions may have been influenced by the presence or absence of related findings.
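The verbal labels used for precision ("substantial," "moderate," "fair") correspond to a commonly used convention for interpreting κ. The sketch below maps a κ value to such a label. The cut points are those of the widely cited Landis and Koch scale, which is an assumption here because the text does not state which scale it applies; the function name `describe_kappa` is hypothetical.

```python
def describe_kappa(kappa: float) -> str:
    """Return a conventional verbal label for a kappa statistic.

    Assumption: cut points follow the Landis and Koch convention
    (slight, fair, moderate, substantial, almost perfect). The source
    text does not say which scale it uses.
    """
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values quoted in the text for the emergency department study.
for finding, kappa in [
    ("presence of a systolic murmur", 0.8),
    ("loud murmur", 0.59),
    ("S2", 0.54),
]:
    print(f"{finding}: kappa = {kappa} ({describe_kappa(kappa)})")
```

Under this assumed scale, the three κ values quoted above receive the same labels the text gives them.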

Accuracy

Distinguishing Abnormal From Normal (Innocent) Murmurs

Two new studies evaluated examiners’ ability to distinguish murmurs caused by an underlying cardiac abnormality from those generated by structurally normal hearts (innocent murmurs). One of these studies evaluated the accuracy of the entire clinical evaluation (including physical examination, medical history, electrocardiogram, chest radiograph, laboratory tests, and data from old charts) by noncardiologist emergency department physicians, and one evaluated the accuracy of the cardiologist’s physical examination alone.4

In a study of high methodologic quality, Reichlin et al2 evaluated the performance of emergency department physicians’ clinical assessments of patients with systolic murmurs. Although these noncardiologists are somewhat less accurate at distinguishing abnormal from innocent murmurs than cardiologists, a normal clinical assessment significantly reduces the likelihood of a cardiac abnormality (negative likelihood ratio [LR–], 0.29; 95% confidence interval [CI], 0.17-0.45).

The second study assessed the ability of cardiologists to distinguish innocent from pathologic murmurs by physical examination alone in patients referred for evaluation of a systolic murmur. The cardiologists’ overall assessments of significant heart disease (defined as moderate to severe valvular heart disease, congenital shunt, or an intraventricular gradient identified by echocardiography) performed with a positive likelihood ratio (LR+) of 11 (95% CI, 5.0-26) and LR– of 0.22 (95% CI, 0.10-0.41).

In addition, several clinical signs were assessed to appraise their performance in categorizing significant systolic murmurs confirmed by echocardiography. The most frequently detected findings, and those that were most useful, are shown in Table 33-10. Patients with mild AS or regurgitation are not included in the calculation of these LRs. Patients with a loud, plateau-shaped, or holosystolic murmur are more likely to have significant lesions than functional murmurs or mild valvular heart disease. Similarly, the absence of holosystolic or loud murmur suggests that there are no significant lesions. However, an echocardiogram must be obtained when the clinician wants to determine whether the murmur represents moderate to severe AS or regurgitation, a congenital shunt, or an intraventricular pressure gradient.

Table 33-10 Ability of Findings to Identify Patients With Significant Cardiac Lesions vs Functional Systolic Murmur

                                  LR for a Significant Systolic Murmur^a
Clinical Sign                     LR+ (95% CI)      LR– (95% CI)
Holosystolic murmur (n = 26)      8.7 (2.3-33)      0.19 (0.08-0.43)
Loud murmur (n = 29)              6.5 (2.3-19)      0.08 (0.02-0.31)
Plateau-shaped murmur (n = 20)    4.1 (1.4-12)      0.48 (0.30-0.77)
Loudest at the apex (n = 30)      2.5 (0.58-11)     0.84 (0.65-1.1)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
^a The LR+ is the likelihood ratio when the finding is present and indicates an increased likelihood that the systolic murmur is associated with moderate to severe aortic stenosis or mitral regurgitation, congenital shunt, or intraventricular pressure gradients. The LR– is the likelihood ratio when the finding is absent and shows the likelihood that a significant lesion will be present when the finding is absent.

Identifying Valvular Heart Disease by Physical Examination

The ability to distinguish innocent from pathologic murmurs is important in stratifying patients for referral for echocardiography. However, the ability to make this distinction does not reflect examiners’ true ability to determine the presence of valvular heart disease: by excluding patients with no audible murmur, the specificity of the physical examination for valvular disease is underestimated. In the study by Reichlin et al,2 the inclusion criteria required that at least 2 of 3 screening physicians agree that a subject had a murmur: 203 patients were enrolled from 852 screened, whereas 582 were excluded because no systolic murmur was heard. There was excellent agreement among examiners about the presence of a murmur, with disagreement in only 18 patients (2%).

The exclusion of those patients with no murmur is an example of verification bias. Verification bias occurs when the gold standard test is not applied to all the potentially eligible patients to confirm their disease status. In this case, patients without systolic murmurs were excluded from the analysis and had no echocardiogram to confirm the absence of structural heart disease. Typically, selective inclusion creates an overestimate of the sensitivity and an underestimate of the specificity of the clinical assessment. However, because the study provides complete information on all patients, we are able to correct for verification bias, with the assumption that patients with no murmur truly had no valvular disease. Recalculation yields an LR+ of 14 (95% CI, 10-19) for a clinical assessment suggesting an abnormal murmur and an LR– of 0.21 (95% CI, 0.13-0.34) when either no murmur was heard or the murmur was deemed normal. Because some patients without systolic murmurs can still have AS or MR, these corrected LRs represent the best possible clinical performance.

Another study using cardiologist examiners addressed the performance of a complete cardiovascular physical examination without additional information in a population of asymptomatic individuals. The patients were not selected because of an auscultated abnormality.5 A murmur was heard in 63 patients, with 17 murmurs classified as abnormal; transesophageal echocardiography identified valvular abnormalities in 33 patients. In this population, the cardiovascular physical examination alone performed with an LR+ of 38 (95% CI, 9.5-154) and LR– of 0.31 (95% CI, 0.18-0.52).

Thus, these 2 studies provide information on the clinician’s ability to identify valvular heart disease irrespective of the presence of a murmur, which better reflects an initial assessment in clinical practice. Although the populations of patients studied are different and the emergency department assessment includes supplementary information, the examiners’ overall performance in these studies is similar. When an abnormal murmur is identified, the pooled LR+ for echocardiographic valvular disease is 15 (95% CI, 11-20; results homogeneous with P = .11; I² = 48%; 95% CI, 0%-86%); when no murmur is heard or the murmur is determined to be “normal,” the pooled LR– is 0.25 (95% CI, 0.17-0.36; results homogeneous with P = .29; I² = 16%; 95% CI, 0%-55%).6

Aortic Stenosis

One new grade 2 study (n = 123),3 performed by noncardiologists, prospectively evaluated individual findings and combinations of findings for the detection of moderate or severe AS (defined as an aortic valve area less than 1.2 cm² or peak transvalvular gradient of 25 mm Hg or more). A slow carotid upstroke was the most important individual finding for ruling in AS (LR+, 9.2; 95% CI, 3.4-24) (Table 33-11). The 2-step process for using combinations of findings begins with examination for the presence of a murmur over the right clavicle. If this murmur is absent, AS is considerably less likely (LR–, 0.1; 95% CI, 0.02-0.44). When a murmur radiates to the right clavicle, 4 associated findings are sought: highest intensity of murmur at second right intercostal space, reduced intensity of S2, reduced carotid volume, and slow carotid upstroke. When zero to 2 of these associated findings are present, the result is indeterminate (LR, 1.8; 95% CI, 0.93-2.9), whereas if 3 to 4 of these findings are present, the likelihood of AS is significantly increased (LR, 40; 95% CI, 6.6-239).

Table 33-11 Accuracy of the Physical Examination for Detecting Aortic Stenosis

Finding                              LR+ (95% CI)      LR– (95% CI)
Slow carotid upstroke                9.2 (3.4-24)      0.56 (0.32-0.8)
Murmur radiating to right carotid    8.1 (4-16)        0.29 (0.12-0.57)
Reduced or absent S2                 7.5 (3.2-17)      0.50 (0.27-0.76)
Murmur over right clavicle           3.0 (2-4.1)       0.10 (0.02-0.44)
Any systolic murmur                  2.6 (1.8-3.5)     0 (0-0.45)
Reduced carotid volume               2.0 (1-3.2)       0.64 (0.34-0.99)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

Mitral Regurgitation

One study evaluated the accuracy of isolated findings in predicting severe MR,7 defined as a regurgitant fraction of 40% or more detected by echocardiography (Table 33-12). The clinical findings of interest were abstracted from the patients’ personal charts, as recorded by the patients’ own physicians (cardiologists and internists), who were unaware of the study. The study evaluated the relationship between the intensity of the murmur and the severity of regurgitation, and demonstrated a significant correlation.

Table 33-12 Accuracy of the Physical Examination for Detecting Severe Mitral Regurgitation7

Finding              LR+ (95% CI)
Murmur grades 4-5    14 (3.3-56)
Murmur grade 3       3.5 (2.1-5.7)
Murmur grades 0-2    0.19 (0.11-0.33)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.

Mitral Valve Prolapse

No new high-quality studies added to the information in the original review. The absence of a murmur and click rules out mitral valve prolapse (MVP) (LR, 0.04). The presence of a nonejection click (a high-pitched sound of short duration in mid or late systole) with or without a murmur slightly increases the likelihood of echocardiographic MVP (LR, 3.8).8

EVIDENCE FROM GUIDELINES The American College of Cardiology/American Heart Association guidelines (2003)9 recommend echocardiography to evaluate heart murmurs in patients with cardiovascular symptoms or in asymptomatic patients with clinical features that suggest a moderate or greater probability that the murmur is reflective of underlying structural heart disease. Echocardiography is not recommended in asymptomatic adults whose murmur has been identified as functional or innocent by an experienced observer.9

CLINICAL SCENARIO—RESOLUTION Your patient’s murmur did not radiate to the right clavicle. This finding makes AS much less likely (LR, 0.1). The ECG and chest radiograph are normal, and there are no other concerning features that raise the possibility of other serious structural heart disease. If you are an experienced clinician, this reduces the likelihood of important structural heart disease (LR, 0-0.1). If you are less experienced and not certain of your overall assessment that the murmur is “functional,” concentrating on whether the murmur is holosystolic or “loud” and whether the patient has a decreased carotid upstroke or systolic thrill may yield more useful information than your clinical gestalt. Conditions that can cause increased blood flow through a structurally normal heart should be excluded, such as anemia, renal failure, and thyrotoxicosis.


SYSTOLIC MURMURS—MAKE THE DIAGNOSIS Systolic murmurs are common, and echocardiography is normal in the majority of asymptomatic individuals with murmurs. Clinical evaluation offers the potential to identify those patients with increased likelihood of underlying structural disease and to avoid costly echocardiographic evaluation in all patients with systolic murmurs.

PRIOR PROBABILITY One study of randomly selected elderly Finnish persons (aged 75-86 years) found a prevalence of moderate to severe AS of 8.8% in women and 3.6% in men.10 The prevalence in younger patients ought to be less. The Framingham Heart Study showed that echocardiographic evidence of MR is common and a function of both age and sex.11 A useful approximation for the prevalence of mild to moderate MR is 15% from age 40 to 60 years for both men and women. After age 60, women have a prevalence of about 25% compared with men, who have an increasing frequency of MR that approximates 40% by age 80 years. The prevalence of MVP is about 2.5%.12,13
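A prior probability becomes useful at the bedside once it is combined with a likelihood ratio. The following sketch shows the standard odds form of that calculation. The function name `post_test_probability` is hypothetical, and the 3.6% pretest value is simply the prevalence quoted above for elderly Finnish men, used for illustration only; the appropriate prior for an individual patient (for example, a younger one) may well be lower.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability and a likelihood ratio to a post-test probability.

    Steps: probability -> odds, multiply by the LR, odds -> probability.
    """
    if not 0.0 <= pretest_probability < 1.0:
        raise ValueError("pretest_probability must be in [0, 1)")
    if likelihood_ratio < 0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Illustrative prior only: prevalence of moderate to severe AS in elderly men (3.6%).
prior = 0.036

# No murmur over the right clavicle (LR, 0.1).
print(f"LR 0.1: {post_test_probability(prior, 0.1):.1%}")

# Murmur over the right clavicle plus 3 to 4 associated findings (LR, 40).
print(f"LR 40:  {post_test_probability(prior, 40):.1%}")
```

With this illustrative prior, an LR of 0.1 lowers the probability of moderate to severe AS to well under 1%, whereas an LR of 40 raises it to roughly 60%.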

POPULATION FOR WHOM A SYSTOLIC MURMUR SHOULD BE ASSESSED
• It is sensible to listen for a systolic murmur in every patient for whom a complete cardiac database is necessary.
• Once a patient with a systolic murmur is identified, the clinical examination helps identify those more likely to have significant underlying cardiac lesions. However, an echocardiogram is required to determine whether the finding represents a significant or less significant cardiac lesion.
• A murmur can be heard with a variety of underlying conditions, such as myocardial ischemia, endocarditis, and disturbances that cause a high flow rate.

IDENTIFYING NORMAL (INNOCENT) MURMURS Cardiologists and emergency physicians are accurate at distinguishing abnormal from innocent murmurs (Tables 33-13 and 33-14). Because the overall performance of generalist physicians has not been described, attention to individual findings may be even more useful than the overall clinical impression when a murmur is auscultated.

Table 33-13 Likelihood Ratio for the Overall Examination for Detecting Valvular Disease

                                    LR for Valvular Disease
                                    LR+ (95% CI)      LR– (95% CI)
Cardiologists5                      38 (9.5-154)      0.31 (0.18-0.52)
Emergency department physicians2    14 (10-19)        0.21 (0.13-0.34)
Summary                             15 (11-20)        0.25 (0.17-0.36)

Abbreviations: CI, confidence interval; LR, likelihood ratio; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

Table 33-14 Likelihood Ratios of Individual Findings for Identifying Murmurs That Are Significant

                                   LR for a Significant Systolic Murmur^a
Clinical Sign                      LR+ (95% CI)      LR– (95% CI)
Systolic thrill (n = 8)            12 (0.76-205)     0.73 (0.58-0.93)
Holosystolic murmur (n = 26)       8.7 (2.3-33)      0.19 (0.08-0.43)
Loud murmur (n = 29)               6.5 (2.3-19)      0.08 (0.02-0.31)
Plateau-shaped murmur (n = 20)     4.1 (1.4-12)      0.48 (0.30-0.77)
Loudest at the apex (n = 30)       2.5 (0.58-11)     0.84 (0.65-1.1)
Radiation to the carotid (n = 9)   0.91 (0.28-3.0)   1.0 (0.78-1.3)

Abbreviations: CI, confidence interval; LR, likelihood ratio; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
^a Moderate to severe aortic stenosis or mitral regurgitation, congenital shunt, or intraventricular pressure gradient.

AORTIC STENOSIS The presence of AS requires detection of a systolic murmur, generally radiating to the right clavicle. For such patients, evaluate the S2 to determine whether it is reduced in intensity, feel the carotid artery to assess whether the volume is reduced and the upstroke slower than normal, and assess whether the murmur is loudest in the second right intercostal space (Table 33-15).

Table 33-15 Likelihood Ratios of Combinations of Findings for Aortic Stenosis

Clinical Findings^a                                             LR (95% CI) for Moderate or Greater Aortic Stenosis
Systolic murmur over right clavicle + 3-4 associated findings   40 (6.6-239)
Systolic murmur over right clavicle + 0-2 associated findings   1.8 (0.93-2.9)
No systolic murmur over right clavicle                          0.1 (0.02-0.44)

Abbreviations: CI, confidence interval; LR, likelihood ratio.
^a Reduced or absent second heart sound, reduced carotid volume, slow rate of increase of carotid pulse, and maximal murmur intensity in second right intercostal space.
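The 2-step combination rule for aortic stenosis summarized in Table 33-15 can be written as a small lookup. This is only a sketch: the function and parameter names are hypothetical, and the likelihood ratios and confidence intervals are copied from the table rather than recomputed from patient-level data.

```python
from typing import Tuple


def aortic_stenosis_lr(
    murmur_over_right_clavicle: bool, associated_findings: int
) -> Tuple[float, Tuple[float, float]]:
    """Return (LR, 95% CI) for moderate or greater AS per Table 33-15.

    associated_findings counts how many of these 4 are present:
    reduced or absent S2, reduced carotid volume, slow carotid upstroke,
    and maximal murmur intensity in the second right intercostal space.
    """
    if not 0 <= associated_findings <= 4:
        raise ValueError("associated_findings must be between 0 and 4")
    # Step 1: no murmur over the right clavicle makes AS much less likely.
    if not murmur_over_right_clavicle:
        return 0.1, (0.02, 0.44)
    # Step 2: with the murmur present, count the associated findings.
    if associated_findings >= 3:
        return 40.0, (6.6, 239.0)
    return 1.8, (0.93, 2.9)


lr, ci = aortic_stenosis_lr(murmur_over_right_clavicle=True, associated_findings=3)
print(f"LR {lr} (95% CI, {ci[0]}-{ci[1]})")
```

Note that when no murmur is heard over the right clavicle, the count of associated findings does not change the result, which mirrors the order of the 2 steps in the rule.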

MITRAL REGURGITATION AND MITRAL VALVE PROLAPSE Although cardiologists are accurate at identifying echocardiographic MR (Table 33-16), the performance of generalist physicians has not been evaluated as well. Once MR is identified, the intensity of the murmur helps to identify the severity of the regurgitation.7

Table 33-16 Likelihood Ratio for the Murmur Intensity to Identify Severe Mitral Regurgitation

Finding              LR+ (95% CI)
Murmur grades 4-5    14 (3.3-56)
Murmur grade 3       3.5 (2.1-5.7)
Murmur grades 0-2    0.19 (0.11-0.33)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.

The absence of a murmur and click rules out MVP (LR, 0.04), whereas the presence of a systolic click, with or without a murmur, slightly increases the likelihood of echocardiographic MVP (LR, 3.8).

REFERENCE STANDARD TEST Echocardiography or cardiac angiography.

REFERENCES FOR THE UPDATE
1. Wilson W, Taubert KA, Gewitz M, et al. Prevention of infective endocarditis: guidelines from the American Heart Association: a guideline from the American Heart Association Rheumatic Fever, Endocarditis, and Kawasaki Disease Committee, Council on Cardiovascular Disease in the Young, and the Council on Clinical Cardiology, Council on Cardiovascular Surgery and Anesthesia, and the Quality of Care and Outcomes Research Interdisciplinary Working Group. Circulation. 2007;116(15):1736-1754.
2. Reichlin S, Dieterle T, Camli C, Leimenstoll B, Schoenenberger RA, Martina B. Initial clinical evaluation of cardiac systolic murmurs in the ED by noncardiologists. Am J Emerg Med. 2004;22(2):71-75.a
3. Etchells E, Glenns V, Shadowitz S, Bell C, Siu S. A bedside clinical prediction rule for detecting moderate or severe aortic stenosis. J Gen Intern Med. 1998;13(10):699-704.a
4. Attenhofer Jost CH, Turina J, Mayer K, et al. Echocardiography in the evaluation of systolic murmurs of unknown cause. Am J Med. 2000;108(8):614-620.a
5. Roldan CA, Shively BK, Crawford MH. Value of the cardiovascular physical examination for detecting valvular heart disease in asymptomatic subjects. Am J Cardiol. 1996;77(15):1327-1331.a
6. Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21(11):1539-1558.
7. Desjardins VA, Enriquez-Sarano M, Tajik AJ, Bailey KR, Seward JB. Intensity of murmurs correlates with severity of valvular regurgitation. Am J Med. 1996;100(2):149-156.a
8. Abbasi AS, DeCristofaro D, Anabtawi J, Irwin L. Mitral valve prolapse: comparative value of M-mode, two dimensional and Doppler echocardiography. J Am Coll Cardiol. 1983;2(6):1219-1223.
9. Cheitlin MD, Armstrong WF, Aurigemma GP, et al. ACC/AHA/ASE 2003 guideline update for the clinical application of echocardiography: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines, p. 8. http://www.acc.org/qualityandscience/clinical/guidelines/echo/index_clean.pdf. Accessed June 4, 2008.
10. Lindroos M, Kupari M, Heikkila J, Tilvis R. Prevalence of aortic valve abnormalities in the elderly: an echocardiographic study of a random population sample. J Am Coll Cardiol. 1993;21(5):1220-1225.
11. Singh JP, Evans JC, Levy D, et al. Prevalence and clinical determinants of mitral, tricuspid, and aortic regurgitation (the Framingham Study). Am J Cardiol. 1999;83(6):897-902.
12. Freed LA, Levy D, Levine RA, et al. Prevalence and clinical outcome of mitral valve prolapse. N Engl J Med. 1999;341(1):1-7.
13. Theal M, Sleik K, Anand S, Yi Q, Yusuf S, Lonn E. Prevalence of mitral valve prolapse in ethnic groups. Can J Cardiol. 2004;20(5):511-515.

a For the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.


EVIDENCE TO SUPPORT THE UPDATE: Murmur, Systolic

33

MAIN OUTCOME MEASURES TITLE A Bedside Clinical Prediction Rule for Detecting Moderate or Severe Aortic Stenosis.

Sensitivity, specificity, and likelihood ratios; κ for interobserver variability.

AUTHORS Etchells E, Glenns V, Shadowitz S, Bell C, Siu S. CITATION J Gen Intern Med. 1998;13(10):699-704.

MAIN RESULTS

QUESTION Can a clinical prediction rule using simple physical examination findings accurately detect aortic stenosis (AS) in a broad spectrum of patients?

Seventeen patients (14%) were found to have AS, with complete physical examination data available for 15.

DESIGN Consecutive patients were prospectively enrolled when they were referred for echocardiography. Two examiners (a third-year medical resident and a staff general internist) performed the maneuvers on all enrolled patients. An echocardiographer, blinded to the findings, identified all patients with moderate or greater AS.

CONCLUSION

SETTING General medical/cardiology wards in an urban university hospital in Toronto. PATIENTS One hundred twenty-three patients admitted to the general medicine and cardiology wards. The majority had some history of congestive heart failure, angina, or myocardial infarction. The median age was 68 years, 58% were men, and 56% had Canadian Cardiovascular Society class I symptoms at the study. Exclusion criteria were age younger than 50 years, cardiac care unit/ intensive care unit admission, unstable angina within 48 hours, history of cardiovascular surgery or valve replacement, severe dyspnea at rest, or inability to consent.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Two examiners, blinded to echocardiographic findings, independently performed a structured physical examination and focused medical history on all enrolled patients. Transthoracic echocardiography was performed on all patients by an echocardiographer blinded to the clinical findings, who identified moderate to severe AS, defined as aortic valve area of 1.2 cm2 or smaller or peak transvalvular gradient of 25 mm Hg or higher.

LEVEL OF EVIDENCE Level 2. STRENGTHS Prospective data collection with valid refer-

ence standard and confirmed independence of clinical examination. LIMITATIONS This study included only 17 patients with the condition of interest.

Table 33-17 Likelihood Ratios for Findings to Predict Aortic Stenosis Test Slow carotid upstroke (n = 12) Murmur radiating to right carotid (n = 20) Reduced S2 (n = 15) Murmur over right clavicle (n = 45) Any systolic murmur (n = 52) Reduced carotid volume (n = 35)

Sensitivity Specificity LR+ (95% CI)

LR– (95% CI)

0.47

0.95

9.2 (3.4-24)

0.56 (0.32-0.8)

0.73

0.91

8.1 (4-16)

0.29 (0.12-0.57)

0.53

0.93

7.5 (3.2-17) 0.50 (0.27-0.76)

0.93

0.69

3.0 (2-4.1)

0.10 (0.02-0.44)

1.0

0.64

2.6 (1.8-3.5)

0 (0-0.45)

0.53

0.73

2.0 (1.0-3.2) 0.64 (0.34-0.99)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.


CHAPTER 33

Evidence To Support The Update

Table 33-18 Combination of Findings for Predicting Aortic Stenosis

Finding | LR (95% CI)
Murmur over clavicle + 3-4 associated findings(a) (n = 7) | 40 (6.6-239)
Murmur over clavicle + 0-2 associated findings (n = 38) | 1.8 (0.93-2.9)
No murmur over right clavicle (n = 69) | 0.1 (0.02-0.44)

Abbreviations: CI, confidence interval; LR, likelihood ratio.
(a) Associated findings include reduced second heart sound (S2), reduced carotid volume, slow carotid upstroke, and murmur loudest at second right intercostal space.

This study validates several physical examination maneuvers as performed by generalist physicians in a broad spectrum of older general medical inpatients (Table 33-17). These patients are typical of those admitted into hospitals or referred for cardiovascular evaluation. The use of moderate to severe AS as the finding of interest is a clinically significant endpoint.

The study confirms that the absence of any murmur or the absence of a murmur over the right clavicle is the best finding for ruling out AS. A reduced carotid upstroke by palpation, a murmur radiating to the right carotid, or S2 that is reduced in intensity increases the likelihood the most. In contrast to previous studies, a murmur radiating to the right carotid is useful for identifying patients with AS if detected, but AS can still exist without the presence of a murmur radiating to the carotid.

The examiners participating in the study underwent a brief training period (30 minutes) and performed a standardized physical examination. As a result, the performance of the examination might be lower among examiners without the training, although the brief training period could be easily replicated. In addition, because the findings are assessed as part of a standardized physical examination, it is impossible to evaluate their independence. In other words, an examiner who observes that one of the findings is present might be more influenced and likely to describe other abnormal findings.

The authors also created and prospectively evaluated combinations of findings (Table 33-18), which performed with excellent accuracy: a lack of a murmur radiating to the right clavicle effectively rules out AS of moderate or greater severity, whereas the presence of such a murmur in association with 3 or more other findings rules in the diagnosis.

The reliability assessment of individual maneuvers is useful and demonstrates that individual findings have reliabilities that vary from slight to moderate (Table 33-19).

Table 33-19 Reliability of Findings for Aortic Stenosis

Finding | Generalized κ (Lower 95% CI)
S2 (normal vs decreased) | 0.54 (0.46)
Loud murmur (>II/VI) second RICS | 0.45 (0.37)
Radiation to right clavicle | 0.36 (0.28)
Radiation to right carotid | 0.33 (0.25)
Delayed carotid upstroke | 0.26 (0.18)
Reduced carotid volume | 0.24 (0.16)
Presence of any systolic murmur | 0.19 (0.11)

Abbreviations: CI, confidence interval; RICS, right intercostal space.

Reviewed by David Cescon, MD
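The rule-in and rule-out language used for the combinations in Table 33-18 can be made concrete by converting a pretest probability to a post-test probability through odds. The Python sketch below is illustrative only. It assumes the pretest probability equals the prevalence in this study (17 of 123 patients), and the function name `post_test_probability` is hypothetical.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Assumption for illustration: pretest probability = study prevalence of
# moderate to severe aortic stenosis (17 of 123 patients).
PRETEST = 17 / 123

# Point estimates of the LRs from Table 33-18.
COMBINATION_LRS = {
    "Murmur over clavicle + 3-4 associated findings": 40,
    "Murmur over clavicle + 0-2 associated findings": 1.8,
    "No murmur over right clavicle": 0.1,
}

for finding, lr in COMBINATION_LRS.items():
    probability = post_test_probability(PRETEST, lr)
    print(f"{finding}: post-test probability {probability:.1%}")
```

Under that assumed pretest probability of about 14%, the three rows correspond to post-test probabilities of roughly 87%, 22%, and 1.6%. These figures use point estimates only and ignore the wide confidence intervals in the table.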

TITLE Echocardiography in Evaluating Systolic Murmurs of Unknown Cause.

AUTHORS Attenhofer Jost CH, Turina J, Mayer K, et al.

CITATION Am J Med. 2000;108(8):614-620.

QUESTION How well can cardiologists identify pathologic murmurs by auscultation or palpation alone?

DESIGN Consecutive patients were prospectively identified at referral for evaluation of a systolic murmur of unknown cause. Each subject was independently examined by 2 cardiologists from a pool of 8, blinded to supplementary data and echocardiography results. Two-dimensional (2D)/Doppler echocardiography was performed as the gold standard in all participants. It is not clear whether the ultrasonographers were blinded to the clinical examination.

SETTING Cardiology division in Switzerland.

PATIENTS One hundred patients referred for evaluation of systolic murmur of unknown cause were enrolled. Patients were excluded if they had a previously documented echocardiographic examination. The mean age of subjects was 55 ± 22 years, and 57% were women.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Full cardiac examination with or without dynamic auscultation as deemed appropriate by 2 blinded cardiologist examiners. Murmurs were classified by Levine grade and described and characterized as functional or organic according to the examiners’ clinical expertise. All patients underwent transthoracic 2D/Doppler echocardiography; valvular stenosis and regurgitation were classified according to standard criteria.

MAIN OUTCOME MEASURES Raw data, sensitivity, specificity.

MAIN RESULTS Twenty-one patients had a “functional” murmur and were considered normal. Of the 79 patients with “organic” murmurs, 29 patients had aortic stenosis (AS) of various severity and 30 patients had mitral regurgitation (MR). Although the patients were referred for evaluation of systolic murmurs, echocardiography revealed aortic regurgitation in 28. The data in Table 33-20 indicate the likelihood of the finding when the cardiologists’ overall assessment results were positive. The cardiologists’ overall clinical assessments of significant heart disease (defined as moderate to severe valvular heart disease, congenital shunt, or an intraventricular gradient) performed with a positive likelihood ratio (LR+) of 11 (95% confidence interval [CI], 5.0-26) and negative likelihood ratio (LR–) of 0.22 (95% CI, 0.10-0.41).

Table 33-20 Likelihood Ratios for Overall Assessment of a Valvular Lesion of Any Severity

Valvular lesion | LR+ (95% CI) | LR– (95% CI)
Aortic stenosis (n = 33) | 2.1 (1.1-3.9) | 0.78 (0.61-0.95)
Mitral regurgitation (n = 33) | 2.3 (1.5-3.6) | 0.43 (0.23-0.71)
Aortic regurgitation (n = 9) | 5.1 (1.5-3.9) | 0.82 (0.63-0.95)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

The characteristics of the murmur and response to a few maneuvers were assessed to identify their performance in categorizing significant systolic murmurs confirmed by echocardiography (Table 33-21). A loud (diagnostic odds ratio, 81) or holosystolic murmur (diagnostic odds ratio, 46) was the most accurate finding for identifying those patients with a significant murmur vs those with a functional murmur. No patient had a diminished carotid upstroke, so this finding cannot be assessed from the data. A diminished second heart sound (S2) was assessed, but the finding was heard in 5 patients only.

One maneuver, the response to Valsalva, was assessed. Typically, patients with AS or MR would have a decreased intensity with the initiation of the maneuver, whereas patients with hypertrophic cardiomyopathy would have an increase. The maneuver in this study did not help identify patients with significant lesions (LR+, 1.2; 95% CI, 0.66-2.2; and LR–, 0.84; 95% CI, 0.50-1.4), but no patients with hypertrophic cardiomyopathy were found.

Table 33-21 Likelihood Ratio of Signs for a Significant Systolic Murmur

LR for a Significant Systolic Murmur vs a Functional Murmur

Clinical Sign | LR+ (95% CI) | LR– (95% CI)
Systolic thrill (n = 8) | 12 (0.76-205) | 0.73 (0.58-0.93)
Holosystolic murmur (n = 26) | 8.7 (2.3-33) | 0.19 (0.08-0.43)
Loud murmur (n = 29) | 6.5 (2.3-19) | 0.08 (0.02-0.31)
Plateau-shaped murmur (n = 20) | 4.1 (1.4-12) | 0.48 (0.30-0.77)
Loudest at the apex (n = 30) | 2.5 (0.58-11) | 0.84 (0.65-1.1)
Radiation to the carotid (n = 9) | 0.91 (0.28-3.0) | 1.0 (0.78-1.3)

Abbreviations: CI, confidence interval; LR, likelihood ratio; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
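The diagnostic odds ratios quoted in the main results for a loud murmur (81) and a holosystolic murmur (46) can be recovered from Table 33-21, because the diagnostic odds ratio equals LR+ divided by LR–. The Python sketch below is a quick arithmetic check rather than the authors' own calculation. The function name `diagnostic_odds_ratio` is hypothetical, and the inputs are the rounded table values.

```python
def diagnostic_odds_ratio(lr_positive, lr_negative):
    """Diagnostic odds ratio: the ratio of LR+ to LR-."""
    if lr_negative <= 0:
        raise ValueError("LR- must be greater than 0 to form a finite ratio")
    return lr_positive / lr_negative


# Rounded (LR+, LR-) point estimates copied from Table 33-21.
SIGNS = {
    "Holosystolic murmur": (8.7, 0.19),
    "Loud murmur": (6.5, 0.08),
    "Plateau-shaped murmur": (4.1, 0.48),
}

for sign, (lr_pos, lr_neg) in SIGNS.items():
    print(f"{sign}: diagnostic odds ratio {diagnostic_odds_ratio(lr_pos, lr_neg):.0f}")
```

A single summary number such as the diagnostic odds ratio is convenient for ranking findings, but it hides whether a finding is better at ruling in or ruling out, so the separate LR+ and LR– values remain the more useful figures at the bedside.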

Murmur, Systolic

CONCLUSION

LEVEL OF EVIDENCE Level 3.

STRENGTHS Prospective, consecutive patients.

LIMITATIONS Small referral population referred for evaluation of a murmur. The echocardiographers were not blinded to the clinical findings. The CIs around some of these findings are large.

For the individual clinical signs, we could calculate the LR comparing patients with a significant murmur vs those with a functional murmur. This analysis ignores the patients who had less significant cardiac lesions as the explanation for their systolic murmur (eg, mild AS or MR). Thus, clinicians must understand that although these findings might identify patients more likely to have a significant vs a functional murmur, an echocardiogram must be done to determine whether the findings are associated with a significant or less-significant cardiac lesion.

The results suggest that a cardiologist’s examination is useful even when the referring clinician is uncertain that a murmur is innocent. Because these patients are likely the most difficult to examine, the results for the cardiologist might be a “worst-case” scenario for the LRs. We can anticipate that for all patients with systolic murmurs, the LRs would suggest greater accuracy.

The presence of a variety of findings increases the likelihood that a systolic murmur will be significant. Loud, plateau-shaped, holosystolic murmurs with a thrill will have a high likelihood of emanating from significant cardiac abnormalities. These individual findings might work better than the clinician’s overall clinical assessment for assessing systolic murmurs for patients in whom the diagnosis might not be readily apparent from the physical examination findings.

An important caveat is that this analysis suggests only the presence of a significant lesion as defined by the authors as opposed to a functional murmur. Thus, the presence of findings with a high LR+ means that the clinician must request an echocardiogram to determine whether the underlying cardiac lesions are significant or less significant. Similarly, the absence of a loud or holosystolic murmur makes a significant lesion less likely, but an echocardiogram would be required to identify patients with less significant lesions.

The results of this study should be interpreted in light of the clinical population: patients referred for evaluation of systolic murmurs that likely included those for whom the referring clinician was uncertain of the diagnosis. The data in the table do not represent the LRs for a specific diagnosis (eg, AS), but for any significant lesion associated with a systolic murmur. The response to Valsalva does not help identify significant AS or mitral regurgitant murmurs, but this maneuver could still be important for identifying significant hypertrophic cardiomyopathy.

Reviewed by David Cescon, MD, and Edward Etchells, MD, MSC


TITLE Intensity of Murmurs Correlates With Severity of Valvular Regurgitation.

AUTHORS Desjardins VA, Enriquez-Sarano M, Tajik AJ, Bailey KR, Seward JB.

CITATION Am J Med. 1996;100(2):149-156.

QUESTION Does the intensity of regurgitant murmurs on clinical examination correlate with the degree of echocardiographic regurgitation?

DESIGN Investigators prospectively enrolled 210 consecutive patients undergoing Doppler echocardiography who were found to have chronic isolated mitral or aortic regurgitation. Results of a physical examination performed within 2 weeks of echocardiography by the patient’s own physician (179 cardiologists, 31 general internists), who was unaware of the study, were abstracted from chart data.

SETTING Echocardiography laboratory in a major US center.

PATIENTS Two hundred ten consecutive patients prospectively identified with chronic, isolated mitral regurgitation (MR) or aortic insufficiency (AI) of mild or greater severity. Exclusion criteria included previous valve repair or replacement, associated valvular stenosis or acute regurgitation, and lack of physical examination performed by the referring physician within 2 weeks of echocardiography. For the 40 patients with isolated AI, the mean age was 58 ± 16 years, 65% were men, 8% were in atrial fibrillation, and the mean regurgitant fraction was 36% ± 16%. For the 170 patients with MR, the mean age was 64 ± 13 years, 54% were men, 21% were in atrial fibrillation, and the mean regurgitant fraction was 36% ± 18% by Doppler echocardiography.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Quantitative Doppler and 2-dimensional echocardiography were performed on all patients before enrollment. It is not clear whether the echocardiographers were blinded to clinical data. Severe regurgitation was defined as a regurgitant fraction of 40% or higher. The clinical examination documenting murmur severity was performed independently by each patient’s personal physician, who was not aware of the study and did not receive any special training or instruction regarding standardization of murmur grading.

MAIN OUTCOME MEASURES Raw data, correlation coefficients (r). Likelihood ratios were calculated from the data provided.

MAIN RESULTS The intensity of the murmur predicts the severity of MR (Table 33-22).

Table 33-22 Likelihood Ratios for the Presence of Severe Mitral Regurgitation as a Function of the Murmur Intensity

Murmur Grade | LR (95% CI)
4 or 5 | 14 (3.3-56)
3 | 3.5 (2.1-5.7)
0-2 | 0.19 (0.11-0.33)

Abbreviations: CI, confidence interval; LR, likelihood ratio.

CONCLUSION

LEVEL OF EVIDENCE Level 2.

STRENGTHS The population included in this study represents a difficult sample because all had some degree of regurgitation. The study examines a relevant clinical question because the ability to correlate the intensity of a regurgitant murmur with the degree of regurgitation is a useful clinical tool.

LIMITATIONS Only patients with isolated lesions were included.

The results demonstrate that the evaluation of murmur intensity of isolated MR by internists and cardiologists is a useful diagnostic test: a loud murmur (grade 4 or greater) is a good predictor of severe MR, whereas a murmur of grade 2 or less effectively rules out the presence of severe MR. This study simulated normal clinical conditions without special training or standardized instructions to the examiner. These results are valid only in chronic, isolated MR and cannot be applied to the acute setting or to patients with complex murmurs.

Reviewed by David Cescon, MD, and Edward Etchells, MD, MSC


TITLE Initial Clinical Evaluation of Cardiac Systolic Murmurs in the Emergency Department by Noncardiologists.

AUTHORS Reichlin S, Dieterle T, Camli C, Leimenstoll B, Schoenenberger RA, Martina B.

CITATION Am J Emerg Med. 2004;22(2):71-75.

QUESTION How well do noncardiologists distinguish innocent systolic murmurs from those produced by valvular heart disease in a typical emergency department evaluation?

DESIGN Medical patients presenting to the emergency department were prospectively identified and evaluated for the presence of a systolic murmur. If 2 of 3 physicians, including 1 study physician, agreed on the presence of a murmur, the patient was enrolled in the study.

SETTING Emergency department of a university teaching hospital in Switzerland.

PATIENTS Two hundred three patients were enrolled from 852 medical patients screened in the emergency department. The patients were typical medical patients, with mean age of 64.7 (± 22.3) years, and 58% were women. A significant percentage of the enrolled patients had chest pain at presentation, and the majority had a pathologic electrocardiogram (ECG) (61%) or chest radiograph (53%) in the emergency department.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The emergency department attending physician’s clinical evaluation (including medical history, physical examination, ECG, chest radiograph, and laboratory tests) sought to distinguish normal from abnormal murmurs in all enrolled patients. Transthoracic echocardiography was performed to identify valvular heart disease in all enrolled subjects within 24 hours by 2 cardiologists blinded to the results of the clinical evaluation.

MAIN OUTCOME MEASURES Sensitivity, specificity, and likelihood ratios (LRs).

MAIN RESULTS Seventy-one of 203 patients had structural heart disease evident on echocardiography. Twenty-one patients were excluded because there was no informed consent (17) or the echocardiography was not performed (4), leaving 582 patients with no systolic murmur. Of the entire sample size, there was disagreement for only 18 patients, for whom a third examiner settled the discordance. The κ statistic for the presence of a murmur was 0.8; the κ statistic for murmur grades 0 to 2 vs those greater than grade 2 was 0.59.

CONCLUSION

LEVEL OF EVIDENCE Level 1.

STRENGTHS Prospective, consecutive patients with independent application of the reference standard in a population typical for those in whom distinguishing a normal from an abnormal systolic murmur by clinical examination is an important clinical question. Because the authors provided information on all potentially eligible patients, we can correct for verification bias.

LIMITATIONS Entrance criteria required that 2 of 3 examiners agree that a murmur was present. Although this may decrease generalizability, it improves our confidence that a murmur was present.

This large, high-quality study evaluated the utility of the clinical evaluation by noncardiologists. The examiners in this study had access to all available clinical information, including patient charts that documented previously identified valvular heart disease in 10% of patients; however, this represents a realistic clinical scenario. The level of agreement among examiners in identifying the presence of a systolic murmur of intensity greater than grade II/VI documented in this study compares favorably to that of previous studies involving cardiologists examining patients.

This study provides complete information on all patients, allowing us to correct for verification bias by making certain assumptions about the patients for whom both clinicians did not hear a murmur or for whom there was a disagreement about the presence of a murmur. The majority of patients who did not undergo echocardiography did not have a systolic murmur, as judged by 2 examiners. If we assume that none of these patients truly had valvular heart disease, the LRs are as shown in Table 33-23. These LRs estimate the efficiency of the clinicians in identifying aortic or mitral valvular disease among all patients. Because most patients do not have valvular heart disease, the specificity of the examination is excellent. The LRs reported by the investigators, uncorrected for verification bias, show the performance of the clinical examination among patients known to have a systolic murmur. In clinical practice, these patients would be more reflective of those referred for echocardiography to determine the presence of a systolic murmur.

Table 33-23 Likelihood Ratio of the Overall Examination for an Abnormal Murmur

Clinical Evaluation | Patients | Sensitivity | Specificity | LR+ (95% CI) | LR– (95% CI)
Overall examination suggests abnormal murmur, corrected for verification bias | All patients | | | 14 (10-19) | 0.21 (0.13-0.34)
Overall examination, uncorrected for verification bias | Only patients with systolic murmurs | 0.80 | 0.69 | 2.6 (2.0-3.4) | 0.29 (0.17-0.45)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

Reviewed by David Cescon, MD, and Edward Etchells, MD, MSC
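The effect of the verification-bias correction in Table 33-23 can be reproduced approximately with a 2 × 2 table. The Python sketch below is illustrative only. The cell counts for the 203 patients who underwent echocardiography (57, 14, 41, 91) are not reported in this summary; they are assumptions back-calculated from the 71 patients with structural disease and the rounded sensitivity (0.80) and specificity (0.69). The correction then treats all 582 patients without a murmur as having a negative examination and no valvular disease, as the text assumes. The function name is hypothetical.

```python
def likelihood_ratios_from_counts(tp, fn, fp, tn):
    """Return (LR+, LR-) from the four cells of a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


# ASSUMED cell counts, back-calculated from rounded published values.
TP, FN = 57, 14  # 71 patients with disease on echocardiography
FP, TN = 41, 91  # 132 patients with a murmur but no disease

# Patients without a murmur who had no echocardiogram; assumed disease free.
UNVERIFIED_WITHOUT_MURMUR = 582

uncorrected = likelihood_ratios_from_counts(TP, FN, FP, TN)
corrected = likelihood_ratios_from_counts(
    TP, FN, FP, TN + UNVERIFIED_WITHOUT_MURMUR
)

print(f"Uncorrected: LR+ {uncorrected[0]:.1f}, LR- {uncorrected[1]:.2f}")
print(f"Corrected:   LR+ {corrected[0]:.1f}, LR- {corrected[1]:.2f}")
```

Adding the unverified patients to the true-negative cell leaves sensitivity unchanged and raises specificity, which is why the corrected LR+ is so much larger than the uncorrected value while the LR– changes little.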

TITLE Value of the Cardiovascular Physical Examination for Detecting Valvular Heart Disease in Asymptomatic Subjects.

AUTHORS Roldan CA, Shively BK, Crawford MH.

CITATION Am J Cardiol. 1996;77(15):1327-1331.

QUESTION How useful is the physical examination in detecting the presence or absence of valvular heart disease in asymptomatic individuals?

DESIGN Nonconsecutive patients were prospectively identified for inclusion and were examined by a cardiologist blinded to other data. An echocardiographer, blinded to clinical findings, identified valvular heart disease.

SETTING Outpatient clinic in the United States.

PATIENTS The population consisted of 75 patients with connective tissue diseases and 68 healthy volunteers. The patients with connective tissue diseases had systemic disease without cardiac symptoms and constituted a group of patients for whom most physicians would auscultate the heart to detect asymptomatic cardiac disease associated with their underlying disorder (systemic lupus erythematosus, ankylosing spondylitis, rheumatoid arthritis, antiphospholipid antibody syndrome). The mean age of participants was 38 ± 11 years, 56 were men, and none had cardiovascular symptoms. Only 5% of subjects were known to have murmur or valvular heart disease.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Subjects were randomly sequenced for a complete physical examination, including dynamic auscultation by a cardiologist blinded to other data. The cardiologist recorded the findings for jugular venous pulse; the palpated carotid pulse; the palpated precordial maximal impulse; the presence of a right ventricular lift; abnormalities of the second, third, and fourth heart sounds; clicks; and ejection sounds. The dynamic auscultation included evaluation of murmur change with respiration, Valsalva maneuver, handgrip, and changes in body position. Transesophageal echocardiography was performed on all subjects by an echocardiographer blinded to the clinical examination and other data. Diagnosis of valvular disease was based on established criteria.

MAIN OUTCOME MEASURES Sensitivity, specificity.

MAIN RESULTS Thirty-three patients had echocardiographic evidence of valvular abnormalities, the majority (24 of 33) of which were mitral valve regurgitant lesions or prolapse. The predictive value of the individual findings is reported, but none occurred in more than 8% of patients, providing broad confidence intervals. It is difficult to disentangle the individual findings from the overall assessment because the individual components and categorization of individual murmurs were based on the total evaluation (Table 33-24).

Table 33-24 Likelihood Ratio for the Overall Clinical Examination to Identify Patients With Abnormal Cardiac Valves

Valvular Heart Disease by Echocardiography

Test | Sensitivity | Specificity | LR+ (95% CI) | LR– (95% CI)
Overall clinical assessment for a valvular abnormality | 0.70 | 0.98 | 38 (9.5-154) | 0.31 (0.18-0.52)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

CONCLUSION

LEVEL OF EVIDENCE Level 3.

STRENGTHS Prospective, blinding of examination, and gold standard test.

LIMITATIONS Cardiologist examiner may limit generalizability to generalist physicians. Nonconsecutive patients.

The study population is unique in that these patients were not selected according to an auscultated abnormality. They represent a combination of healthy patients and patients with noncardiac disease, all of whom might undergo auscultation in the course of “routine” medical care. By including healthy patients, a high specificity for the examination could be expected because most patients would not have abnormal findings and would not have cardiac abnormalities shown by echocardiogram.

This study evaluated physical examination by a cardiologist alone, without supplementary information or investigations in a healthy population at risk for valvular heart disease. It is useful that the report includes the actual individual components used by the cardiologists to determine their overall clinical assessment. The cardiologists heard a surprising number of murmurs, but when they described a murmur as abnormal, the likelihood of an echocardiograph abnormality increased greatly.

Reviewed by David Cescon, MD, and Edward Etchells, MD, MSC
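The wide confidence interval around the LR+ in Table 33-24 reflects how few healthy subjects had a falsely abnormal examination. The Python sketch below shows one common way to compute such an interval, the log method, in which the standard error of ln(LR) is the square root of 1/a − 1/n1 + 1/b − 1/n2. The summary does not state which method was used for the table, so this is an assumption. The counts (23 of 33 with valvular disease and 2 of 110 without disease judged abnormal) are also assumptions, back-calculated from the 143 subjects, the 33 with abnormalities, and the rounded sensitivity and specificity. The function name is hypothetical.

```python
import math


def likelihood_ratio_with_ci(events_diseased, n_diseased,
                             events_healthy, n_healthy, z=1.96):
    """Return (LR, lower, upper) using a confidence interval on the log scale."""
    lr = (events_diseased / n_diseased) / (events_healthy / n_healthy)
    standard_error = math.sqrt(
        1 / events_diseased - 1 / n_diseased
        + 1 / events_healthy - 1 / n_healthy
    )
    lower = math.exp(math.log(lr) - z * standard_error)
    upper = math.exp(math.log(lr) + z * standard_error)
    return lr, lower, upper


# ASSUMED counts: 33 subjects with valvular disease, 110 without.
# LR+: examination abnormal in 23 of 33 diseased and 2 of 110 healthy.
lr_positive = likelihood_ratio_with_ci(23, 33, 2, 110)
# LR-: examination normal in 10 of 33 diseased and 108 of 110 healthy.
lr_negative = likelihood_ratio_with_ci(10, 33, 108, 110)

print("LR+ {:.0f} ({:.1f}-{:.0f})".format(*lr_positive))
print("LR- {:.2f} ({:.2f}-{:.2f})".format(*lr_negative))
```

With these assumed counts the intervals agree with the table to the precision shown, and the calculation makes plain that the upper bound of the LR+ is driven almost entirely by the 2 false-positive examinations.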


CHAPTER 34

Does This Patient Have Myasthenia Gravis?

Katalin Scherer, MD
Richard S. Bedlack, MD, PhD
David L. Simel, MD, MHS

CLINICAL SCENARIOS

CASE 1 A 45-year-old man has a 2-month history of fluctuating double vision, a droopy right eye that improves with rest, and a complaint that food gets stuck halfway down. Your examination confirms severe right eyelid ptosis that dramatically improves with rest. His right eye adduction and up gaze are markedly impaired. The left eye demonstrates complete horizontal ophthalmoplegia. The limb muscle strength and reflexes are normal. You wonder whether there is an accurate and clinically useful bedside test to help confirm the diagnosis of myasthenia gravis.

CASE 2 A 69-year-old man has a 2-month history of intermittent spells of double vision, generalized weakness that worsens toward the evening, and unspecified dizziness. Although he has normal strength and reflexes and no ophthalmoplegia, he does report fluctuating diplopia during the examination. As in case 1, you must decide whether to perform additional bedside tests, obtain electrodiagnostic or acetylcholine antibody testing, or pursue a broader diagnostic evaluation of the various causes of dizzy spells and fatigue.

WHY IS THIS AN IMPORTANT QUESTION TO ANSWER WITH A CLINICAL EXAMINATION?

Myasthenia gravis is an autoimmune disease associated with circulating acetylcholine receptor antibodies, modification of the synaptic cleft, and destruction of the postsynaptic neuromuscular membrane. The clinical hallmark of the disease is fatigable weakness. The clinical severity ranges from mild, purely ocular forms to severe generalized weakness and respiratory failure.

Myasthenia gravis is a rare disease; its prevalence in the United States is reported at 14.2 per 100,000. Prevalence rates have been increasing steadily during the past decades, likely because of decreased mortality, longer survival, and higher rates of diagnosis.1-3 Men older than 50 years have the highest incidence in the population, with the peak at approximately 70 years of age. Women have 2 incidence peaks: one at approximately 20 to 40 years of age and one at approximately 70 years of age.4,5

Clinicians must be alert to the symptoms and signs of myasthenia gravis because it is an eminently treatable disease, and the earlier treatment is started, the better the clinical response.6-8 Only 54% to 69% of patients with myasthenia gravis are diagnosed within 1 year of onset, and the mean time to diagnosis is more than 1 year.3,9-12 Untreated patients are at risk for deterioration and “crisis,” which occurs when weakness becomes severe enough to require mechanical ventilation.13,14 Left untreated, reversible and fatigable weakness may become fixed. An erroneous diagnosis of myasthenia gravis may expose patients to unnecessary diagnostic procedures and treatments.

Copyright © 2009 by the American Medical Association.


The acetylcholine receptor antibody test is the most specific diagnostic test for myasthenia gravis. This test has reasonable sensitivity in generalized myasthenia gravis (80%-96%), but up to 50% of patients with purely ocular myasthenia have seronegative test results.15-19 Single-fiber electromyography, performed by highly trained experts at specialized centers, is highly sensitive for disorders of the neuromuscular junction but is not specific for myasthenia gravis. The purpose of this review was to determine the value of clinical symptoms and signs, as well as the results of simple provocative clinical tests, in deciding whether myasthenia gravis should be considered as a diagnosis and in enabling the physician to determine whether further confirmatory testing (including the highly specific and sensitive antibody test) is warranted.

Anatomic and Physiologic Origins of the Symptoms and Signs Used to Answer This Question

In the normal neuromuscular junction, acetylcholine is released into the synaptic cleft, diffuses to the postsynaptic membrane, binds to ligand-sensitive ion channels (nicotinic acetylcholine receptors), and causes an excitatory postsynaptic end-plate potential. If the threshold depolarization is achieved, an action potential will spread along the muscle fiber membrane, causing muscle contraction. Acetylcholine is cleared from the synaptic cleft by presynaptic reuptake and by the metabolic action of acetylcholinesterase (Figure 34-1). The failure of transmission at many neuromuscular junctions in myasthenia results in diminished end-plate potentials that are insufficient to generate action potentials in a number of muscle fibers.20 This results in fatigable weakness of striated muscles, which is the basis for the clinical diagnosis. Sustained or repetitive muscle contraction causes fatigue and weakness of myasthenic muscles. Cooling a weak muscle improves neuromuscular transmission.21 Rest and acetylcholinesterase inhibitors transiently increase acetylcholine levels in the synaptic cleft. The change in strength after these manipulations can be assessed during the clinical examination.

Figure 34-1 Neuromuscular Junction
Panels: Normal Neuromuscular Junction; Acetylcholine Receptor Antibody–Positive Myasthenia Gravis.
In patients with acetylcholine receptor (AChR) antibody-positive myasthenia gravis, circulating antibodies bind to the AChRs, which may block acetylcholine binding, lead to cross-linking of receptors promoting internalization and degradation, and induce postsynaptic membrane damage via complement activation. The number and availability of receptors are reduced such that end-plate potentials are insufficient to generate action potentials in a number of muscle fibers, causing weakness.

Symptoms and Signs and How to Elicit Them Patients with myasthenia gravis complain of weakness in specific muscles. Up to 65% of patients initially have ocular symptoms of double vision and drooping of the eyelids. Less than one-fourth of patients present with bulbar weakness (ie, in lower cranial nerve–innervated oropharyngeal muscles) and report slurred or nasal speech, alterations of the voice (eg, softness, breathiness, hoarseness), and difficulty chewing or swallowing. Limb weakness is an unusual initial complaint (14%-27%) and should be differentiated from nonspecific generalized fatigue. Patients may report shortness of breath. The symptoms of myasthenia are typically better on awakening or after rest and become progressively worse with prolonged use of the affected muscles or later in the day.3,22-24 Reduced muscle power by manual testing in specific muscles that worsens with repetition and improves with rest is the characteristic examination finding in myasthenia. Most muscles with voluntary activation have a large variability of strength even under normal conditions because of effort. Evaluating extremity strength greatly depends on the experience of the examiner. Ptosis and extraocular muscle deficits are relatively free of a voluntary component and provide a more objective measure. Fatigable and rapidly fluctuating asymmetric ptosis is a hallmark of myasthenia gravis. The rapid fluctuation results from improvement during even very short periods of rest, such as blinking. Besides fast variability in the degree of ptosis, it may altogether shift quickly from one eye to the other, known as “shifting ptosis.”22 Ptosis should be evaluated with the patient sitting comfortably, the head held in primary position without tilting. The patient fixates on a distant object (eg, a spot on the wall) and is asked to refrain from blinking and to relax the forehead muscles. 
Frontalis contraction, a mostly involuntary compensatory mechanism, is a common and characteristic sign in myasthenic patients with ptosis. Relaxing the forehead muscles may be difficult for some patients. The examiner measures palpebral fissure width at eye level during forward gaze and again during prolonged upward or lateral gaze for 30 seconds.22,25 The more ptotic eyelid should be used for additional provocative tests, such as the ice pack, rest, and sleep tests. The ice pack test is performed by placing a latex glove finger filled with crushed ice over the more ptotic eyelid for 2 minutes. During the rest test, the patient places a glove filled with cotton (a placebo) over the more ptotic eyelid while holding the eyes closed for 2 minutes. During the sleep test, the patient is left in a quiet dark room with the eyes closed for 30 minutes. Complete or almost complete resolution of ptosis or at least a 2-mm increase in palpebral fissure width

constitutes a positive response to these maneuvers. It is important to evaluate the improvement immediately after the tests because the lids may quickly begin to droop again. The curtain sign (also known as “enhanced ptosis” or “paradoxic ptosis”) is usually observed in patients with some initial ptosis. The patient looks straight ahead and refrains from blinking. The examiner holds one eye open, which results in the other lid starting to droop more (like a curtain falling). The lid twitch sign occurs when the patient opens the eyes after gentle closure or follows the examiner’s finger down and then back up to eye level. The lids overshoot or twitch for a fraction of a second before settling into position and starting to droop.26 Asymmetric weakness of extraocular muscles is commonly observed in myasthenia when sustained lateral gaze or up gaze worsens or induces double vision. The cover-uncover test may be performed to bring out subtle extraocular weakness. As the patient fixates on an object in the distance, the examiner covers one eye while observing for deviation of the uncovered eye during lateral and then upward gazing. With extraocular weakness, the uncovered eye will drift. The examination is completed by repeating the procedure for the opposite eye. Quiver eye movements are fast, small-twitch, “lightning-like” or “jerk-like” movements of the eyes on changing direction of gaze. They are said to occur even in the setting of profound ophthalmoplegia.27 Although patients rarely complain of facial weakness, it is often found on examination. Severe facial weakness results in a characteristic transverse smile. Orbicularis oculi weakness is demonstrated as the examiner tries to separate the eyelids against forced eye closure. Orbicularis oculi fatigue may be observed on gentle eye closure. 
After complete initial apposition of the lid margins, they separate within seconds and the white of the sclera starts to show (positive peek sign) (Figure 34-2).28 The iris should not be visible because of the eyeballs being rolled up (Bell phenomenon). The iris may be visible if the patient is not trying to close the eyes voluntarily (in the case of a conversion reaction and functional weakness) or in case of severe ophthalmoplegia. Tongue and pharyngeal weakness will result in the patient’s speech becoming slurred or nasal, especially with prolonged speaking. Other commonly weak muscles include neck flexors, deltoids, hip flexors, finger/wrist extensors, and foot dorsiflexors. The muscles should be repeatedly tested against manual resistance, with a brief rest between repetitions. Having the patient hold the head above the pillow in the supine position and having the patient hold the arms outstretched in abduction at the shoulder for 1 minute are ways to test for fatigability of neck flexors and deltoids, respectively. Involvement is often asymmetric. The remainder of the neurologic examination results, including those for deep tendon reflexes and sensory examination, must be normal.

Anticholinesterase Tests

Edrophonium chloride is a fast- and short-acting acetylcholinesterase inhibitor that may be administered in the office setting to diagnose myasthenia gravis (Box 34-1). Its effect

[Figure 34-2 panel labels: Primary Position; Gentle Voluntary Lid Closure; Rapid muscular fatigue; Presence of asymmetric ptosis; Bilateral apposition of lids; Separation of lid margins and exposure of sclera]

Figure 34-2 Peek Sign Orbicularis oculi weakness may be indicated by a positive peek sign after gentle eyelid closure. After complete initial apposition of the lid margins, they quickly (within 30 seconds) start to separate, and the sclera starts to show (ie, a positive peek sign). The presence of a peek sign increases the likelihood of myasthenia gravis (likelihood ratio, 30; 95% confidence interval, 3.2-278), but absence of the peek sign does not rule it out.
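The likelihood ratio quoted in the Figure 34-2 caption can be turned into a posttest probability by converting the pretest probability to odds, multiplying by the likelihood ratio, and converting back. The sketch below illustrates that arithmetic only. The 10% pretest probability is a hypothetical value chosen for illustration and does not come from the text; the function name is likewise an assumption.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability using odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Values from the Figure 34-2 caption: LR 30 (95% CI, 3.2-278).
PEEK_SIGN_LR = 30.0
PEEK_SIGN_LR_CI = (3.2, 278.0)

# Hypothetical pretest probability, for illustration only.
HYPOTHETICAL_PRETEST = 0.10

if __name__ == "__main__":
    point = posttest_probability(HYPOTHETICAL_PRETEST, PEEK_SIGN_LR)
    low = posttest_probability(HYPOTHETICAL_PRETEST, PEEK_SIGN_LR_CI[0])
    high = posttest_probability(HYPOTHETICAL_PRETEST, PEEK_SIGN_LR_CI[1])
    print(f"Posttest probability: {point:.2f} (range across CI: {low:.2f} to {high:.2f})")
```

Because the confidence interval around the likelihood ratio is wide, the range of posttest probabilities is also wide, which is worth keeping in mind when interpreting a positive peek sign.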

Box 34-1 Edrophonium Test

Establish reliable peripheral intravenous access.
Prepare a syringe with 2 mg of atropine (available in ampoules of 0.4 or 1 mg/mL) as a precaution.
Prepare 1 mL (10 mg) of edrophonium in a tuberculin syringe (edrophonium is available in a 10 mg/mL solution in a 1-mL ampoule [10 mg] or in a 10-mL vial [total of 100 mg]).
Inject 2 mg (0.2 mL) slowly for 15 seconds while observing for an objective improvement in target muscles.
Improvement should occur within 30 seconds and disappear in 5 minutes; if there is no response or no significant adverse effects, administer the remaining edrophonium (8 mg [0.8 mL]), for a total dose of 10 mg.
Atropine should be injected (0.5 or 1 mg) in case of clinically significant bradycardia, respiratory distress, or syncope.a

a Routine administration of atropine simultaneously with edrophonium for the purpose of diagnostic testing for myasthenia gravis is not recommended. Bartley and Bullock29 recommend using a 3-way stopcock, with the edrophonium-containing syringe attached to the direct port and the atropine-containing syringe attached to the side port so that atropine may be quickly injected in case of severe adverse effects.

usually occurs within 30 seconds and lasts less than 5 minutes. Most myasthenic muscles respond to the test dose of 2 mg, but many will require more. Adverse effects are rare and usually mild (excess salivation, sweating, abdominal cramps, or fecal incontinence). Serious adverse effects, such as bradycardia, asystole, and bronchoconstriction, occur infrequently (

.05) or range of sensitivity and specificity across studies of 15% or less. We pooled studies satisfying at least 1 criterion and calculated LRs by simple combination of results across studies. The 95% confidence intervals (CIs) were calculated according to the method of Simel et al.11

RESULTS

Myocardial Infarction

Precision of the Medical History and Physical Examination

Precision refers to the degree of variation between observers (interobserver variation) or within observers (intraobserver variation) regarding a particular clinical finding. Hickan et al12 studied the precision of an important aspect of the history, namely, that of chest pain. They assessed the interobserver agreement in chest pain histories obtained by general internists, nurse practitioners, and self-administered questionnaires for 197 inpatients and 112 outpatients with chest pain. As outlined in Table 35-2, the 2 internists, who each independently interviewed 47 of 197 inpatients, showed high agreement for 7 of the 10 items, including location and description of the pain, as well as aggravating and relieving factors. Agreement was slightly lower between internist and questionnaire and between the nurse practitioners and internist, with the lowest level of agreement between nurse and questionnaire. Features of the chest pain associated with a lower probability of MI, namely, pleuritic, positional, and sharp chest pain, typically showed a lower level of agreement for all comparisons.

The precision of the medical history obtained also depends on the reliability of the sources themselves. Kee et al13 assessed the reliability of a reported family history of MI from patients who had recently survived MI with that of other documented sources, including hospital charts and death certificates. They reported a moderate level of agreement, with a κ of 0.65. Few studies have evaluated the precision of features of the physical examination in the assessment of patients with suspected MI. One study did evaluate the interobserver agreement among 3 clinicians in the assessment of physical symptoms and signs of heart failure in 102 MI patients.14 As shown in Table 35-3, agreement was high for dyspnea, as well as for the displaced apex beat. However, the level of agreement for the other physical symptoms and signs of heart failure, particularly the assessment of pulmonary rales and hepatomegaly, was considerably lower.

Precision of the Electrocardiogram Interpretation

Unfortunately, most studies that assessed the precision of ECG interpretation reported the percentage agreement between clinicians, without taking into account chance agreement through the use of κ or other statistical measures.15 Precise interpretations are important because they are made at the bedside and set off immediate management strategies. There are several factors that may influence the interpretation of the ECG, including the clinical observation of the patient and clinical data (expectation bias), as well as the training and experience of the individual reading the ECG. Although they must be interpreted with caution, the results of earlier studies suggest appreciable variability in precision in the interpretation of ECGs. In one of the earlier studies,16 10 clinicians with experience in cardiology read 100 ECGs on 2 separate occasions and classified the tracings as normal, abnormal, or infarction. The 3 clinicians

Table 35-2 Interobserver Agreement in Recording Chest Pain Historiesa

| Attribute | Inpatients (n = 197): Two Internists, κ | Inpatients (n = 197): Internists and Questionnaire, κ | Outpatients (n = 112): Nurse and Internists, κ | Outpatients (n = 112): Nurse and Questionnaire, κ |
| --- | --- | --- | --- | --- |
| Pain radiates to left arm | 0.89 | 0.58 | 0.43 | 0.41 |
| Pain relieved by nitroglycerin | 0.79 | 0.51 | 0.94 | 0.77 |
| History of myocardial infarction | 0.78 | 0.81 | 0.70 | 0.81 |
| Pain in substernal location | 0.74 | 0.50 | 0.38 | 0.19 |
| Pain brought on by exertion | 0.63 | 0.51 | 0.42 | 0.22 |
| Pain described as “pressure” | 0.57 | 0.37 | 0.49 | 0.50 |
| Patient must stop activities when pain occurs | 0.50 | 0.47 | 0.44 | 0.40 |
| Pain brought on by cough or deep breath | 0.44 | 0.30 | 0.55 | 0.62 |
| Pain described as “sharp” | 0.30 | 0.26 | 0.33 | 0.31 |
| Pain brought on by moving arms or torso | 0.27 | 0.44 | 0.52 | 0.54 |

a Adapted, with permission, from Hickan et al.12

agreed completely in only one-third of the ECGs. After a second reading, the clinicians disagreed with 1 of 8 of their original reports. Gjorup et al17 had 16 residents in internal medicine read 107 ECGs of suspected MI patients and assess whether signs indicative of acute infarction were present. There was disagreement in approximately 70% of the cases. Brush et al18 reported much higher agreement in a study in which 2 clinicians classified 50 ECGs according to evidence of infarction, ischemia or strain, left ventricle hypertrophy,

Table 35-3 Interobserver Agreement in Assessment of Physical Symptoms and Signs of Heart Failure in Patients With Myocardial Infarctiona

| Physical Sign | Range, κ |
| --- | --- |
| Dyspnea | 0.62-0.75 |
| Displaced apex beat | 0.53-0.73 |
| S3 gallop | 0.14-0.37 |
| Rales | 0.12-0.31 |
| Neck vein distention | 0.31-0.51 |
| Hepatomegaly | 0-0.16 |
| Dependent edema | 0.27-0.64 |

a Adapted, with permission, from Gadsboll et al.14

LBBB, or paced rhythm. They obtained agreement in 45 of the 50 cases (κ = 0.69). The precision in the interpretation of ECGs appears to increase with experience. Eight cardiologists interpreted ECGs of 1220 clinically validated cases of various cardiac disorders, including anterior, inferior, or combined MI, as well as right, left, or biventricular hypertrophy.19 The interobserver agreement among cardiologists was reasonably high, with an average κ of 0.67. For the 125 selected ECGs that were read twice by each cardiologist, different diagnoses were given for 10% to 23% of the ECGs (intraobserver reproducibility, 77%-90%). Sgarbossa et al20 assessed the precision of features of the ECG that may aid in the diagnosis of acute MI in the presence of LBBB. In this study, 4 investigators read 2600 ECGs and achieved a κ of more than 0.85 for QRS-complex and T-wave polarities, with a high degree of correlation among the investigators for interpretation of ST-segment deviation (Pearson product moment correlation coefficient, > 0.9).
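The κ values quoted in this section correct the observed agreement for the agreement expected by chance. As a minimal sketch of that calculation for 2 observers, the function below computes Cohen's κ from a square agreement table. The counts in the example are hypothetical and are not taken from any of the cited studies; the function name is an assumption made for illustration.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for 2 observers.

    table[i][j] is the number of cases that observer 1 placed in category i
    and observer 2 placed in category j.
    """
    size = len(table)
    if size == 0 or any(len(row) != size for row in table):
        raise ValueError("table must be square and non-empty")
    total = sum(sum(row) for row in table)
    if total == 0:
        raise ValueError("table must contain at least one case")

    # Observed agreement: proportion of cases on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / total

    # Chance agreement: product of the marginal proportions, summed over categories.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical 2 x 2 table (eg, "infarction" vs "no infarction"), 50 ECGs.
    hypothetical_counts = [[20, 5], [10, 15]]
    print(f"kappa = {cohens_kappa(hypothetical_counts):.2f}")
```

In the hypothetical table the observers agree on 35 of 50 tracings (70%), but half of that agreement is expected by chance, which is why percentage agreement alone overstates precision.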

Studies Used to Determine Accuracy of the Medical History, Physical Examination, and Electrocardiogram

Table 35-4 summarizes features of the 14 studies8,21-33 used to determine the accuracy of the medical history, physical

Table 35-4 Features of Studies Used to Determine Accuracy of the Medical History, Physical Examination, and Electrocardiogram

| Source, y | Methodologic Qualitya | Inclusion Criteria | Incidence of MI, % | No. of Patients (% Women) | Age, y | Country |
| --- | --- | --- | --- | --- | --- | --- |
| Rude et al,21 1983 | A | Consecutive patients admitted to CCU with suspected MI | 50 | 3697 (38) | Mean = 61 | United States |
| Yusuf et al,22 1984 | B | Consecutive patients admitted to CCU with suspected MI | 85 | 475 (15) | Mean = 56 | United Kingdom |
| Pozen et al,8 1984 | A | Consecutive patients presenting to ED with chest pain | NR | 2801 (NR) | Men ≥ 30, Women ≥ 40 | United States |
| Lee et al,23 1985 | A | Consecutive patients presenting to ED with chest pain | 17 | 596 (52) | ≥25 | |
| Tierney et al,24 1986 | B | Consecutive patients presenting to ED with chest pain | 12 | 492 (NR) | Men ≥ 30, Women ≥ 40 | United States |
| Herlihy et al,25 1987 | B | Consecutive patients admitted to CCU with suspected MI | 44 | 265 (NR) | NR | |
| Klaeboe et al,26 1987 | B | Consecutive patients admitted to CCU with suspected MI | 59 | 237 (36) | | |
| Rouan et al,27 1989 | A | Consecutive patients presenting to ED with chest pain | 14 | 7115 (50) | ≥30 | United States |
| Solomon et al,28 1989 | A | Consecutive patients presenting to ED with chest pain | 14 | 7734 (50) | ≥30 | United States |
| Berger et al,29 1990 | B | Consecutive patients admitted to hospital with chest pain | 36 | 278 (31) | 57 | Switzerland |
| Weaver et al,30 1990 | C | Patients with chest pain brought to ED by paramedics | 18 | 2472 (NR) | | |
| Jonsbu et al,31 1991 | B | Consecutive patients admitted to hospital with suspected MI | | | | |
| Karlson et al,32 1991 | A | Consecutive patients admitted to hospital with suspected MI | | | | |
| Kudenchuk et al,33 1991 | C | Patients brought to ED by paramedics | | | | |

.50 and were not tested in the multivariate model. A multivariate model was developed using the independent predictors:

MI score = 116 + 1.0 × (age) + 23 × (male) + 21 × (right arm pain) + 18 × (ex-smoker) + 11 × (left arm pain) + 15 × (vomit) + 15 × (smokes) + 10 × (burning pain)

(Male = 1, female = 0. If symptom present, substitute 1; if negative or unknown, substitute 0.)

MI probability = [exp(score/11)]/[1 + exp(score/11)]

CONCLUSIONS

LEVEL OF EVIDENCE Level 1.

STRENGTHS These data were prospectively collected from chest pain patients who did not initially have an obvious diagnosis. Current standards for diagnosis with troponin were used, and 91% of the patients with ACSs had ischemia.

LIMITATIONS It is impossible to know whether the findings would work better or worse had the results been reported for all patients with chest discomfort. Most excluded patients were excluded for findings not related to the clinical symptoms (eg, an abnormal ECG result, previous diagnosis of coronary heart disease with prolonged pain). The authors observed that their population was younger (average age, 50 years) than most populations of patients with chest pain. These data are important in that they used current standards of diagnosis with cardiac troponin levels. Clinicians must understand the study population before using the results: these patients had an uncertain diagnosis because

those with an abnormal ECG or with prolonged or recurrent chest pain typical of diagnosis before their previous angina were excluded. In addition, patients with obvious noncardiac chest pain or those requiring admission independent of their ACS were excluded. After these exclusions, the remaining patients were those for whom the clinician might be most reliant on the clinical symptoms, representing a common problem for emergency department physicians. The data for indigestion/burning pain are counterintuitive in suggesting that the finding increases the likelihood for an acute MI. Astute clinicians will recognize that the study population did not include patients discharged from the emergency department who presented with indigestion/burning as the primary symptom, despite less important chest discomfort associated with their indigestion. Thus, the sensitivity and specificity of indigestion/burning pain might be quite different among all patients presenting with chest discomfort. In addition, indigestion/burning might have been a referral filter applied at the patient level in that patients presenting to the emergency department with a burning/indigestion type of pain likely represented those who could have self-medicated without relief or those with exceptionally severe discomfort. The relative lack of importance for left arm pain radiation in comparison to right arm radiation also seems counterintuitive. Of the total patient population, there was no pain radiation in 38% and radiation to the left arm in 27%; only 6% of patients had right arm pain radiation. Most clinicians consider left arm pain radiation as a feature that suggests chest pain of cardiac origin. Patients may recognize left arm pain radiation as suggestive of an MI, making those experiencing any left arm pain more likely to come to the emergency department even when cardiac ischemia is an unlikely diagnosis (eg, musculoskeletal pain or cervical pain radiating to the left arm). 
There are 2 other possible explanations for the lesser importance of left arm pain in this population. First, it is possible that left arm pain occurs even more frequently in patients with obvious ACSs associated with ECG changes (these patients were excluded from this study). Second, in other studies, most patients with right arm pain also had bilateral arm pain radiation that would make left arm pain alone appear less important. Once the importance of left arm pain is used to identify patients with possible ACS, the presence of left arm pain may no longer be independently useful in identifying those with MI vs those without MI.

Acknowledgment

Steven Goodacre kindly provided the results of the multivariate model and information about the clinical exclusion of patients with an obvious noncardiac cause for chest discomfort.

Reviewed by David L. Simel, MD, MHS

REFERENCE FOR THE EVIDENCE 1. Alpert JS, Thygesen K, Antman E, Bassand JP. Myocardial infarction redefined—a consensus document of the Joint European Society of Cardiology/American College of Cardiology Committee for the redefinition of myocardial infarction. J Am Coll Cardiol. 2000;36(3):959-969.

CHAPTER 35

Evidence To Support The Update

TITLE Using Patient-Reportable Clinical History Factors to Predict Myocardial Infarction.

AUTHORS Wang SJ, Ohno-Machado L, Fraser HSF, Kennedy RL.

CITATION Comp Biol Med. 2001;31(1):1-13.

QUESTION Using only clinical factors, without electrocardiogram (ECG) data, can a logistic model be created that predicts myocardial infarction (MI)?

DESIGN The variables identified in a previous study1 were collected in 2 patient populations. The details of whether the study included prospective, consecutive patients who had an independently applied reference standard are not provided.

SETTING Two British hospitals. The logistic model was developed on patient data from one hospital and then tested on patients from a second hospital.

PATIENTS The patient population consisted of patients with an MI prevalence of 22% in the first hospital and 31% in the hospital where the model was verified. Presumably, the clinicians suspected that all these patients had acute cardiac ischemia, but details about the patient population are not specified.

MAIN OUTCOME MEASURE Accuracy (c-index) of the logistic model when evaluated in the validation sample from the second hospital.

MAIN RESULTS The model had an accuracy of 84% in the validation set.

MI score = –92 + 1.0 × (age) + 17 × (diaphoresis) + 14 × (nausea) + 11 × (smokes) + 11 × (left arm pain) + 8 × (male) – 44 × (pleuritic pain) – 30 × (episodic pain) – 15 × (sharp pain) – 15 × (previous angina) – 12 × (previous MI)

(If symptom present, substitute 1; if symptom absent, substitute 0.)

MI probability = [exp(score/15)]/[1 + exp(score/15)]

An expert cardiologist picked the variables anticipated to be important in the logistic model. The variables identified by the cardiologist as important, but that were not independently valuable in a multivariable model, included diabetes, hyperlipidemia, severe chest pain quality, retrosternal pain, left chest pain location, postural pain, pain that worsened, and pain that was worse than previous angina. The variables identified in variable selection by the computer that were not selected by the cardiologist were a sharp quality to the pain and the presence of nausea.

CONCLUSIONS

LEVEL OF EVIDENCE Level 4.

STRENGTHS These data could be important when they are applied to the appropriate patient population before the ECG is obtained.

LIMITATIONS There is no description of how the study population was obtained and the disease status verified. However, the incidence of MI suggests that the study included all patients with chest pain in the emergency department.

The data support the commonly held notion that chest pain with diaphoresis or nausea, especially when radiating to the left arm or in a smoker, increases the probability that the patient is having an MI. One of the important findings in the study was the comparison of variables selected by the cardiologist to those remaining in the final model because of their statistical significance. The differences between variables selected by the cardiologist vs the computer highlight findings that might be inappropriately overweighted or underweighted by clinicians. After age, the findings of diaphoresis, nausea, and left arm pain are the variables that increased the probability of MI the most. The variables that decrease the likelihood the most are pleuritic type pain and episodic pain. These data should be applied to patients who have not had ECGs. Thus, the model could be used at triage of the patient, but it requires validation in an appropriate population with current diagnostic standards. The finding that a previous MI or angina decreases the probability of a current MI seems counterintuitive. However, if patients with a history of ischemic heart disease are more likely to use emergency services for any given episode of pain, then the proportion of visits for a new MI might be less than in those with no history.
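The logistic model reported under MAIN RESULTS for this study can be written out as a short calculation. The sketch below uses the coefficients exactly as printed and assumes that each coefficient multiplies a 0/1 indicator, including the two terms printed without a multiplication sign. The dictionary keys and function names are hypothetical labels introduced for illustration. The summary itself notes that the model requires validation, so this is an arithmetic illustration only, not a clinical tool.

```python
import math
from typing import Iterable

# Coefficients as printed in the summary of Wang et al.
INTERCEPT = -92.0
AGE_COEFFICIENT = 1.0
SCALE = 15.0
FINDING_POINTS = {
    "diaphoresis": 17.0,
    "nausea": 14.0,
    "smokes": 11.0,
    "left_arm_pain": 11.0,
    "male": 8.0,
    "pleuritic_pain": -44.0,
    "episodic_pain": -30.0,
    "sharp_pain": -15.0,
    "previous_angina": -15.0,
    "previous_mi": -12.0,
}


def mi_score(age_years: float, findings: Iterable[str]) -> float:
    """Sum the intercept, the age term, and the points for each finding present."""
    present = set(findings)
    unknown = present - set(FINDING_POINTS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return INTERCEPT + AGE_COEFFICIENT * age_years + sum(FINDING_POINTS[name] for name in present)


def mi_probability(score: float) -> float:
    """Apply the printed transformation exp(score/15) / (1 + exp(score/15))."""
    odds = math.exp(score / SCALE)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical patient: 60-year-old man with diaphoresis and left arm pain.
    score = mi_score(60, ["male", "diaphoresis", "left_arm_pain"])
    print(f"score = {score:.0f}, probability = {mi_probability(score):.2f}")
```

Writing the findings as a lookup table makes it easy to see which variables raise the score (diaphoresis, nausea) and which lower it (pleuritic or episodic pain), mirroring the discussion in the summary.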

Reviewed by David L. Simel, MD, MHS

REFERENCE FOR THE EVIDENCE 1. Kennedy RL, Garrison RF, Burton AM, et al. An artificial neural network system for diagnosis of acute myocardial infarction (AMI) in the accident and emergency department: evaluation and comparison with serum myoglobin measurements. Comput Methods Programs Biomed. 1997;52(2):93-103.

CHAPTER 36

Does This Woman Have Osteoporosis?

Amanda D. Green, MD
Cathleen S. Colón-Emeric, MD, MHSc
Lori Bastian, MD, MPH
Matthew T. Drake, MD, PhD
Kenneth W. Lyles, MD

CLINICAL SCENARIOS

CASE 1 You recommend screening densitometry to a healthy 64-year-old woman. She will have to drive 1 hour to the nearest testing center, and she does not believe that she needs the test. To further assess her risk, you note that she weighs 49 kg (108 lb). What can you tell this patient about her probability of osteoporosis?

CASE 2 A frail, 79-year-old woman is admitted to the hospital with a diverticular bleeding event. On examination, you observe that she has significant kyphosis. When she stands upright against a wall, she cannot touch the back of her head to the wall. You wonder whether she has vertebral fractures.

CASE 3 A 58-year-old woman presents for her annual examination. She experienced physiologic menopause 8 years ago but is asymptomatic and has no other risk factors for osteoporosis. On examination, you note that her rib-pelvis distance is 1 fingerbreadth. She tells you that she has developed a humped back. Should this patient be referred for densitometry?

WHY IS THE CLINICAL EXAMINATION IMPORTANT?

Osteoporosis causes 1.5 million fractures per year in the United States.1 As the population continues to age, this number is expected to double by 2040.2 Half of all postmenopausal women and 15% of white men older than 50 years will have an osteoporosis-related fracture in their lifetime, with 15% of those occurring in the hip. Pain, loss of independence, impaired ambulation, depression, and nursing home admission are common sequelae.3-8 In 1995, health care spending for osteoporotic fractures in the United States was $13.8 billion and is estimated to be $31 billion to $62 billion by 2020.9 The US Preventive Services Task Force recommends that women 65 years of age or older be screened routinely for osteoporosis and women younger than 65 years be screened if they have risk factors.10 There are no current guidelines on when to screen healthy perimenopausal women, and few to no risk factors identified for men.

The physical examination may assist clinicians in preventing osteoporotic fractures in several ways. First, it may identify patients with low bone mineral density (BMD), in whom routine screening is not currently recommended or has not been completed. It may also identify patients at low risk of osteoporosis, in whom BMD testing is unnecessary. Although it is an imperfect indicator of fracture risk, BMD measurement is widely used both in randomized controlled trials and in clinical practice as the primary criterion for initiating osteoporosis therapies.

Copyright © 2009 by the American Medical Association.

Second, the physical examination could identify patients with occult vertebral fracture. Two-thirds of vertebral fractures are clinically silent but are associated with a 2- to 3-fold increased risk of further fractures. Several osteoporosis therapies reduce the risk of further fractures in women with vertebral fractures, and the National Osteoporosis Foundation algorithm suggests that patients found to have vertebral fracture should be treated regardless of their BMD measurement.11 Thus, the objective of this review was to identify clinical examination findings that improve the identification of patients with low BMD or occult vertebral fractures who would benefit from therapy or in whom further screening with BMD testing is unnecessary.

Case Definitions and Pathophysiology

Osteoporosis is a skeletal disorder characterized by compromised bone strength, predisposing a person to an increased risk of fracture. For this review, we used the World Health Organization’s definition of osteoporosis, based on BMD, which compares a patient’s density with normative values for a population of 20- to 40-year-olds in terms of the number of standard deviations from the mean value. Osteoporotic bones have a density that is more than 2.5 SD below the mean (T score < –2.5). Osteopenic bones have a T score that is between –2.5 and –1. Normal bones have a BMD T score of –1 or higher.12

Vertebral fractures are compression deformities that reduce vertebral body height by 20% or more on imaging studies; most of the articles included in this review used a semiquantitative technique to diagnose vertebral fractures on plain lateral radiographs of the spine. Spinal fractures are classified by the maximal percentage of vertebral body height loss as follows: grade 1, 20% to 24%; grade 2, 25% to 39%; and grade 3, 40% or more.13

The prevalence of osteoporosis in large population-based studies allows an estimation of the pretest probability in women of various ages. The prevalence of BMD-defined osteoporosis at the spine, wrist, or hip in white women in the United States by decade is as follows: for women aged 50 to 59 years, 15%; 60 to 69 years, 22%; 70 to 79 years, 38%; and 80 years or older, 70%.14
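The case definitions above are threshold rules, so they can be sketched directly in code. The example below follows the cut points as stated in this section (T score below –2.5 for osteoporosis, –1 or higher for normal; vertebral height loss of 20%, 25%, and 40% for grades 1, 2, and 3). The function names and the reference BMD values in the usage example are hypothetical and chosen only to illustrate the arithmetic.

```python
def t_score(bmd: float, young_adult_mean: float, young_adult_sd: float) -> float:
    """Number of standard deviations between a BMD value and the young-adult mean."""
    if young_adult_sd <= 0:
        raise ValueError("young_adult_sd must be positive")
    return (bmd - young_adult_mean) / young_adult_sd


def classify_t_score(t: float) -> str:
    """Apply the cut points as stated in the text of this section."""
    if t < -2.5:
        return "osteoporosis"
    if t < -1.0:
        return "osteopenia"
    return "normal"


def vertebral_fracture_grade(height_loss_percent: float) -> int:
    """Grade by maximal vertebral body height loss; 0 means below the fracture threshold."""
    if height_loss_percent < 0 or height_loss_percent > 100:
        raise ValueError("height_loss_percent must be between 0 and 100")
    if height_loss_percent < 20:
        return 0
    if height_loss_percent < 25:
        return 1
    if height_loss_percent < 40:
        return 2
    return 3


if __name__ == "__main__":
    # Hypothetical reference values (g/cm2), for illustration only.
    t = t_score(bmd=0.65, young_adult_mean=0.95, young_adult_sd=0.10)
    print(f"T score = {t:.1f}: {classify_t_score(t)}")
    print(f"30% height loss: grade {vertebral_fracture_grade(30)}")
```

Real T scores depend on the densitometer, the skeletal site, and the reference population, none of which are modeled here.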

Table 36-1 Prevalence of Vertebral Deformities in Women Aged 50 Years or Older17

| Age, y | Vertebral Deformity ≥ Grade 1, %a | Vertebral Deformity ≥ Grade 2, %a |
| --- | --- | --- |
| 50-54 | 10 | 4.7 |
| 55-59 | 12 | 6.6 |
| 60-64 | 12 | 8.9 |
| 65-69 | 17 | 12 |
| 70-74 | 30 | 21 |
| 75-79 | 33 | 29 |
| 80-84 | 56 | 49 |
| 85-89 | 49 | 47 |
| ≥ 90 | 75 | 75 |

a Grade 1 or greater is equal to 20% or more vertebral body height loss; grade 2 or greater is equal to 25% or more vertebral body height loss.

For nonwhite women older than 50 years, the prevalence of BMD-defined osteoporosis in the Third National Health and Nutrition Examination Survey was reported as follows: non-Hispanic black women, 12%; Mexican Americans, 19%; and women in other ethnic groups, 28%.15 In special populations, the prevalence of osteoporosis can be much higher. For example, in residents of skilled nursing facilities who are older than 75 years, the prevalence of osteoporosis exceeds 50% for all the residents, regardless of race and sex.16

Occult vertebral fractures are also common and increase with age (Table 36-1). Grade 2 vertebral deformities are found in 6.6% of women aged 55 to 59 years and in 49% of women aged 80 to 84 years.17

Clinical characteristics or historical items that might increase a clinician’s pretest probability of osteoporosis or vertebral fracture include older age, low activity level, family history, hypogonadism (men), and exposure to glucocorticoids and alcohol. The pretest probability threshold for testing BMD depends on the anticipated benefit of treatment for an individual patient and the patient’s desire for treatment.

The pathophysiology of osteoporosis is related to physical examination findings in several ways. The loading or mechanical forces on bone tend to increase bone formation and bone mass through osteoblast stimulation. Thus, increasing body weight and muscle strength are inversely related to osteoporosis. Type I collagen is a major constituent of both bone and skin that is reduced with advancing age and low estrogen levels.18-20 Skinfold thickness may therefore reflect skeletal collagen content. Similarly, tooth loss is influenced by mandibular alveolar bone quality and may provide an easily observed marker of bone health in the rest of the skeleton.

The sequelae of clinically occult vertebral fractures can also lead to physical examination findings that may become apparent before a symptomatic fracture occurs. 
Height loss resulting from vertebral compression fractures can be measured in the clinic over time or with the patient’s recalled maximal adult height. Vertebral fractures affect height but not arm span, so arm span–height differentials may identify individuals with occult vertebral fractures.21 Thoracic kyphosis can result from anterior compression fractures in the thoracic spine (“dowager’s hump”). Kyphosis can be measured on physical examination with a curved ruler such as an architect’s rule or by measuring the wall-occiput distance. The wall-occiput distance is the distance between the wall and the patient’s occiput when he or she stands straight with heels and back against the wall. Lumbar fractures also result in decreased rib-pelvis distance that can be measured in fingerbreadths on examination.

How to Elicit the Relevant Signs

Data for several physical examination signs are included in this review. Weight and height are routinely measured in the clinical setting. Aside from clinic notes, height change can be documented from alternate sources (such as a driver’s license) or from the patient’s memory of height at age 25 years.22-24 Several studies have shown good to excellent correlation between elderly patients’ recalled maximal height and previous health records.25-27 A stadiometer (an upright bar marked with a height scale with a sliding notch to designate height) is the most accurate method of height measurement.

Arm span–height differential is determined by subtracting a patient’s height in centimeters from the arm span in centimeters measured with arms at a 90-degree angle from the trunk. The arm span is the distance between the tips of the middle fingers while the patient faces forward with the arms fully extended and palms facing forward.

Measurements of thoracic kyphosis can be made indirectly on radiographs but can also be directly measured by applying an architect’s semiflexible rule, called a flexicurve, to the patient’s back.28 The flexicurve is a device that can be bent in 1 plane only and retains its shape after application to the curvature of the back between the C7 spinous process and S2 spinous process. The outline is traced on paper, and the maximal angle is measured with calipers or a ruler.29 The kyphosis index is the ratio of thoracic curvature to the length of the upper back and is calculated as 100 times the maximum horizontal distance divided by the vertical length of the upper back curve. Flexicurve measurements, although painless, inexpensive, and safe, are time consuming.30
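The two derived measurements described above, the arm span–height differential and the kyphosis index, are simple arithmetic on clinic measurements. The sketch below shows both calculations as defined in the text. The function names and the example measurements are hypothetical, and no diagnostic threshold is implied.

```python
def arm_span_height_differential_cm(arm_span_cm: float, height_cm: float) -> float:
    """Arm span minus height, both in centimeters."""
    if arm_span_cm <= 0 or height_cm <= 0:
        raise ValueError("measurements must be positive")
    return arm_span_cm - height_cm


def kyphosis_index(max_horizontal_distance_cm: float, vertical_length_cm: float) -> float:
    """100 x maximum horizontal distance / vertical length of the upper back curve."""
    if max_horizontal_distance_cm < 0:
        raise ValueError("max_horizontal_distance_cm must be non-negative")
    if vertical_length_cm <= 0:
        raise ValueError("vertical_length_cm must be positive")
    return 100.0 * max_horizontal_distance_cm / vertical_length_cm


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    differential = arm_span_height_differential_cm(arm_span_cm=170.0, height_cm=162.0)
    index = kyphosis_index(max_horizontal_distance_cm=4.5, vertical_length_cm=30.0)
    print(f"Arm span-height differential: {differential:.1f} cm")
    print(f"Kyphosis index: {index:.1f}")
```

Both inputs to the kyphosis index come from the flexicurve tracing, so they must be measured in the same units for the ratio to be meaningful.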

[Figure 36-1, panel A: Wall-Occiput Test for Occult Thoracic Vertebral Fractures, showing a negative and a positive test result.]

Another measure that quantitates the degree of kyphosis is wall-occiput distance. It is measured while the patient stands straight with his or her back against the wall and heels touching the wall (Figure 36-1). While the head faces forward so that an imaginary line connecting the lateral corner of the eye to the superior junction of the auricle of the ear is parallel to the floor, the distance between the occipital prominence and the wall is quantified with a tape measure.31 For the purpose of this review, the inability to touch the wall with the back of the head is a positive finding. Rib-pelvis distance is a measure of lumbar fracture. The patient stands erect with arms outstretched at 90 degrees. The examiner stands behind the patient and inserts his or her fingers into the space between the inferior margin of the ribs and the superior surface of the pelvis in the midaxillary line. The rib-pelvis distance is the closest whole number of fingerbreadths between these structures.32 Skinfold thickness is measured at the back of the hand with calipers.18-20 The back of the hand is a convenient site for measurement in the clinic.19 The fourth metacarpal longitudinal fold site was used in the studies of skinfold thickness included in this review.

[Figure 36-1, panel B: Rib-Pelvis Distance Test for Occult Lumbar Vertebral Fractures, showing a negative and a positive test result. Figure labels: wall-occiput distance >0 cm; rib-pelvis distance ≤2 fingerbreadths.]

Figure 36-1 Physical Examination Tests for Detection of Occult Vertebral Fractures A, Wall-occiput test is used to detect occult thoracic vertebral fractures. A positive test result in this review is defined as being unable to touch the wall with the occiput when standing with the back and heels against the wall and the head positioned such that an imaginary line from the lateral corner of the eye to the superior junction of the auricle is parallel to the floor. B, Rib-pelvis distance test is used to detect occult lumbar vertebral fractures. A positive test result is defined as a distance of less than or equal to 2 fingerbreadths between the inferior margin of the ribs and the superior surface of the pelvis in the midaxillary line.
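The measurement definitions given in the text and in Figure 36-1 reduce to simple arithmetic, which can be sketched as follows. This is only an illustration: the function names and the example measurements are hypothetical, and the cutoffs are the ones stated in this review (wall-occiput distance greater than 0 cm, rib-pelvis distance of 2 fingerbreadths or less).

```python
def arm_span_height_differential(arm_span_cm: float, height_cm: float) -> float:
    """Arm span minus height, both in centimeters, as described in the text."""
    return arm_span_cm - height_cm


def kyphosis_index(max_horizontal_distance_cm: float, vertical_length_cm: float) -> float:
    """100 x maximum horizontal distance / vertical length of the upper back curve."""
    if vertical_length_cm <= 0:
        raise ValueError("vertical length must be positive")
    return 100.0 * max_horizontal_distance_cm / vertical_length_cm


def wall_occiput_positive(distance_cm: float) -> bool:
    """Positive when the occiput cannot touch the wall (distance > 0 cm)."""
    return distance_cm > 0


def rib_pelvis_positive(fingerbreadths: int) -> bool:
    """Positive when the rib-pelvis distance is 2 fingerbreadths or less."""
    return fingerbreadths <= 2


# Hypothetical measurements, for illustration only.
differential = arm_span_height_differential(arm_span_cm=165.0, height_cm=160.0)
index = kyphosis_index(max_horizontal_distance_cm=5.0, vertical_length_cm=40.0)
```

With these made-up inputs the differential is 5 cm and the kyphosis index is 12.5; the values carry no diagnostic meaning on their own.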


Hand grip strength is measured using a small hydraulic hand grip or isometric dynamometer and is defined as the maximal force recorded while the patient squeezes the device with arms straight to the side.33,34

METHODS We searched MEDLINE for articles from 1966 through August 2004, with a search strategy similar to that used by other authors in this series.35 We used several National Library of Medicine Medical Subject Headings to encompass osteopenia, osteoporosis, and spinal fracture disease states: “exp osteoporosis,” “exp spinal fracture,” “exp metabolic bone disease” (for osteopenia), and “exp bone density.” The MEDLINE search was supplemented with a manual review of the bibliographies of all identified articles, additional review articles including recent osteoporosis guidelines, 4 clinical skills textbooks,36-39 and contact with experts in the field. Two authors (A.D.G. and M.T.D.) independently executed the MEDLINE search strategy and reviewed titles and abstracts from the search results. Two authors (A.D.G. and C.S.C.-E.) then independently reviewed and extracted data from articles or abstracts identified as relevant. We contacted authors for original data when articles reported data on the precision of signs in diagnosing osteoporosis or spinal fracture but did not include enough information to calculate likelihood ratios (LRs). We included studies in our review if they included original data on the accuracy or precision of the medical history or physical examination in diagnosing osteoporosis, osteopenia, or spinal fracture. We required that the gold standard comparison for the clinical examination parameters be bone densitometry at any site or documented vertebral fracture using either a semiquantitative technique or vertebral morphometry. When BMD values were reported directly, the corresponding T score was obtained with sex-appropriate tables provided by the manufacturer of the densitometer used in the study. Articles were excluded if they contained insufficient data to allow calculation of LRs. We included in our tables and results only the physical examination parameters that are feasible to perform in a clinical setting.

Quality Assessment of Included Articles Two authors (A.D.G. and C.S.C.-E.) independently assessed the methodologic quality of included articles using criteria adapted from other authors in this series.40 Level 1 evidence classifies articles that were independent (neither the test result nor the gold standard result was used to select patients for the study), studied consecutive patients representative of a population for which the test is likely to be used, were blinded, measured the gold standard (BMD measurement or documented fracture) in all patients, and included at least 100 study participants. Level 2 evidence met criteria for level 1 evidence, but fewer than 100 patients were studied. Level 3 evidence was the same as level 2 evidence, but the population was nonconsecutive or nonrepresentative. Studies of lower levels of evidence were excluded. Disagreements were resolved by discussion and consensus.

Data Analysis We used raw data from reported studies that met our inclusion criteria to calculate values and 95% confidence intervals for sensitivity, specificity, and positive likelihood ratio (LR+) and negative likelihood ratio (LR–), using SAS statistical software, version 8.0 (SAS Institute Inc, Cary, North Carolina).
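As an illustration of the quantities named above, the following sketch computes sensitivity, specificity, LR+, and LR– with 95% confidence intervals from a 2 × 2 table. It is not the authors' SAS code: the interval methods (Wilson score for proportions, a log-scale delta-method interval for ratios) are assumptions chosen for the example, and the counts are hypothetical.

```python
import math
from typing import NamedTuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


class Estimate(NamedTuple):
    value: float
    lower: float
    upper: float


def wilson_interval(successes: int, total: int, z: float = Z_95) -> Estimate:
    """Proportion with a Wilson score interval (assumed method, not from the source)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return Estimate(p, centre - half, centre + half)


def ratio_interval(num_events: int, num_total: int,
                   den_events: int, den_total: int, z: float = Z_95) -> Estimate:
    """Ratio of two proportions with a log-scale interval (assumed method)."""
    if num_events == 0 or den_events == 0:
        raise ValueError("zero cell: a continuity correction would be needed")
    ratio = (num_events / num_total) / (den_events / den_total)
    se = math.sqrt(1 / num_events - 1 / num_total + 1 / den_events - 1 / den_total)
    return Estimate(ratio, ratio * math.exp(-z * se), ratio * math.exp(z * se))


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    diseased, healthy = tp + fn, fp + tn
    return {
        "sensitivity": wilson_interval(tp, diseased),
        "specificity": wilson_interval(tn, healthy),
        # LR+ = sensitivity / (1 - specificity)
        "lr_positive": ratio_interval(tp, diseased, fp, healthy),
        # LR- = (1 - sensitivity) / specificity
        "lr_negative": ratio_interval(fn, diseased, tn, healthy),
    }


# Hypothetical counts, for illustration only.
result = diagnostic_accuracy(tp=90, fp=20, fn=10, tn=80)
```

Studies with a zero cell would need a continuity correction or an exact method, which this sketch deliberately leaves out.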

RESULTS Study Characteristics We identified 246 articles with our search strategy and an additional 79 from reference lists and expert consultation. Fourteen studies met inclusion criteria and were identified for final review (Tables 36-2 and 36-3).

Precision Table 36-4 lists reported precision estimates for the physical examination maneuvers. Interrater reliability was not reported for studies of height and weight included in this review. Differences in sensitivity and specificity for the same maneuver across different studies could be related to examiner differences that were not reported.

Diagnostic Accuracy The most clinically relevant cut points and their associated LRs for the physical examination maneuvers are listed in Table 36-5 for osteoporosis and Table 36-6 for vertebral fracture. In general, the patient populations were women, with most patients from osteoporosis clinics or older than 65 years. Translating these results to younger women might yield error that is difficult to quantify. Because many of the examination findings may be measuring similar or identical physiologic phenomena, we do not recommend using the LRs in series. For postmenopausal women, prediction rules using osteoporosis risk factors, such as the Simple Calculated Osteoporosis Risk Estimation41 or the Osteoporosis Risk Assessment Instrument,42 have some predictive value in selected populations (Table 36-7).11,43-46 Variables included in these prediction rules include age, weight, and race, which overlap with the clinical examination. An exhaustive review of prediction rules for the diagnosis of osteoporosis or fracture was not attempted in this study because reviews already exist in the literature.42 Although the LR+ of the prediction rules is not clinically informative (1.2-1.7), the LR– is far superior to the physical examination maneuvers listed here (0.02-0.3), making prediction rules much more useful for ruling out osteoporosis or fracture. Thus, clinical prediction rules are the most useful means of identifying women who are at low risk of fracture, in whom BMD screening can safely be deferred.


Table 36-2 Studies Used to Determine the Accuracy of Clinical Examination for Diagnosing Osteoporosis Source

Setting and Country

Methodologic Prevalence of Osteoporosis, % Qualitya

Sanila et al23

Outpatients, Finland

Level 3

34

Dargent-Molina et al47

Volunteers for prospective, multicenter trial, France (EPIDOS)

Level 1

50

Level 1

4

Level 1

50

Bedogni et al51

Volunteers for prospective, multicenter trial, France (EPIDOS) Community, Italy

Level 1

8

Ettinger et al28

Outpatients, California

Level 1

Kantor et al53

Outpatients, Ohio

Level 1

Di Monaco et al33

Outpatients, Italy

Level 3

Foley et al34

Outpatients, Ohio

Level 1

Dargent-Molina et al47

Volunteers for prospective, multicenter trial, France (EPIDOS)

Level 1

Orme and Belchetz20

Outpatients, California

Level 3

Earnshaw et al60

Outpatients in multicenter alendronate trial, United Kingdom, United States, and Denmark Outpatients, Japan

Level 1

Michaelsson et al44 Outpatients, Sweden Dargent-Molina et al47

Inagaki et al64

Level 1

Inclusion Criteria

Height Loss Women aged 55-70 y with rheumatoid arthritis, able to walk White women aged ≥ 75 y, general population, without past fractures Weight Random sample of women aged 28-74 y, no exclusions White women aged ≥ 75 y, general population, without past fractures Women aged ≥ 18 y without disease

Kyphosis Consecutive sample of women aged 65-91 y Self-reported Humped Back 10 White women aged ≥ 18 y referred for bone density scan Grip Strength 34 Consecutive postmenopausal, white female volunteers 18 Older, independent adults in the community 50 White women aged ≥ 75 y, general population, without past fractures Hand Skinfold 63 Consecutive women in osteoporosis clinic Tooth Count 33 White postmenopausal women aged 45-59 y 10

11.5

Community women

No. of Patients

Mean Age, y

61

62

4638

80

175

51

4638

80

Diagnosis Used

BMD-diagnosed osteoporosis lumbar BMD < 0.9 on Lunar machine BMD-diagnosed osteoporosis T score < –3.5 SD

BMD-diagnosed osteoporosis T score < –2.5 BMD-diagnosed osteoporosis T score < –3.5 SD

1873

Not reported (range, 49-77)

BMD-diagnosed osteoporosis T score < –2.5

610

73 (range, 72-91)

BMD-diagnosed osteoporosis T score < –2.5

2577

60

BMD-diagnosed osteoporosis at the hip T score < –2.5

102

63

BMD-diagnosed osteoporosis T score < –2.5

73

71

4638

80

BMD-diagnosed osteoporosis T score < –2.5 BMD-diagnosed osteoporosis T score < –3.5 SD

225

59

BMD-diagnosed osteoporosis T score < –2.0

1365

53

BMD-diagnosed osteoporosis T score < –2.5

190

Not reported (range, 31-79)

BMD-diagnosed osteoporosis quartiles of BMD reported according to aluminum standard (results calculated in current report using lowest quartile)

Abbreviations: BMD, bone mineral density; EPIDOS, the European Patient Information and Document Service. a See Table 1-7 for a description of Evidence Levels.


Height Loss Three studies of postmenopausal women using recalled heights found an association between height loss and vertebral fractures, with 2 of the studies including enough data to calculate LRs (Table 36-5).23,47,48 In the first study, a height loss of more than 3 cm was useful in classifying patients with and without low BMD (LR+, 3.2; LR–, 0.4).23 However, the study population was nonconsecutive female patients with rheumatoid arthritis. In a study of women in the general population, Dargent-Molina et al47 did not find a strong association between height loss of more than 3 cm and osteoporosis (LR+, 1.1; LR–, 0.6). The third study, based on 13732 women in the Fracture Intervention Trial, reported that a self-reported height loss greater than 4 cm since age 25 years was associated with an odds ratio (OR) of 2.8 for vertebral fractures.48 Thus, although height loss is a potentially useful examination tool, the generalizability of this measure is uncertain.

Arm Span–Height Difference Versluis et al21 reported that with age, height declined at twice the rate of arm span. The mean difference in arm span and height was 1.4 cm in women aged 55 to 59 years and

Table 36-3 Studies Used to Determine the Accuracy of Clinical Examination for Diagnosing Spinal Fracture

Source | Setting and Country | Methodologic Quality | Prevalence of Fracture, % | Inclusion Criteria | No. of Patients | Mean Age, y | Diagnosis Used

Arm Span–Height Difference
Versluis et al21 | General practice, The Netherlands | Level 1 | 3.4 if aged 55-59 y, 21.9 if aged 80-84 y | White women, aged 55-84 y, healthy, in general practices | 449 | 67.6 | Vertebral fractures by morphometry
Wang et al50 | Osteoporosis clinic, Australia | Level 1 | 26 in men, not reported in women | White male and female healthy volunteers aged 18-92 y compared with consecutive osteoporosis clinic patients aged 45-90 y | 480 | 63 for women, 66 for men | Vertebral fractures by morphometry

Wall-Occiput Distance
Siminoski et al31 | Outpatients, Canada | Abstract only | 29 | Women aged > 18 y referred to osteoporosis clinic | 216 | 53 | Thoracic vertebral fractures by morphometry

Rib-Pelvis Distance
Siminoski et al32 | Outpatients, Canada | Level 1 | 14 | Consecutive women in osteoporosis clinic | 781 | 56.8 | Lumbar vertebral fractures by morphometry

Table 36-4 Precision Data Reported in the Studies Used in the Review Source

Sanila et al23 Versluis et al21

Ettinger et al28 Siminoski et al31 Di Monaco et al33 Di Monaco et al33 Orme and Belchetz20

Precision Estimate Used

Height Loss 2.3 cm For height, with range of 0.4 mm to 0.5 cm for tape measure positions Arm Span–Height Difference Intraobserver mean differences Height, 1 mm (–2.3 to 2.5 mm); arm span, 0.9 mm (–2.5 to 4.3 mm) Interobserver mean differences Height, –1.6 mm (–3.2 to 0.1 mm); arm span, –4.6 mm (–7.7 to –1.5 mm) Kyphosis Coefficient of variation (2 independent technicians) 13% For kyphosis index Wall-Occiput Distance Not reported Single examiner measured 3 times Rib-Pelvis Distance Interobserver κ κ = 0.87 For cutoff of 2 finger breadths or less Grip Strength Coefficient of variation 3% Hand Skinfold Mean of 3 measurements Reproducible to within 0.2 mm Coefficient of repeatability

Abbreviation: CI, confidence interval.


Precision Estimate (95% CI)

increased to 3.2 cm in women aged 80 to 84 years. Finding an arm span–height difference of 5 cm or greater yielded an LR+ of 1.6 and an LR– of 0.8 for spinal fracture based on these data (Table 36-6). Verhaar et al49 reported that an arm span–height difference cutoff of 3 cm resulted in a sensitivity of 58% and a specificity of 56% for BMD-diagnosed osteoporosis, for an LR+ of 1.3. Wang et al50 found no association between arm span and vertebral fractures in both men and women (LR+ for men, 1.0; LR+ for women, 0.9). We conclude that the arm span–height difference does not predict vertebral deformities or BMD-diagnosed osteoporosis.

Weight For women, the relationship between both low weight and body mass index (BMI) and osteoporosis has been consistently reported.44 In cohort studies examining clinical risk factors in women, weight lower than 70 kg (154 lb) is the sin-

Table 36-5 Clinical Signs and Symptoms in the Diagnosis of Osteoporosis Source

Cutoff Values

47

Sensitivity, %

Dargent-Molina et al Sanila et al23

>3 cm >3 cm

92 68

Dargent-Molina et al47 Bedogni et al51 Michaelsson et al44

95%a

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio. aProbability of no allergic-like event with second administration of penicillin.

REFERENCE STANDARD TESTS Penicillin allergy is confirmed by a reliable history of an immediate anaphylactic reaction, positive skin test reactivity, or welldocumented response to a second observed penicillin challenge.


EVIDENCE TO SUPPORT THE UPDATE: Penicillin Allergy

TITLE Represcription of Penicillin After Allergic-like Events.

AUTHORS Apter AA, Kinman JL, Bilker WB, et al. CITATION J Allergy Clin Immunol. 2004;113(4):764-770. QUESTION How well does the history of an allergic-like event from penicillin predict subsequent responses after readministration? DESIGN Analysis of a large database. SETTING AND PATIENTS United Kingdom General Practice Research Database of 687 general practitioner practices, representative of England and Wales, and comprising 6% of the population. The database contained records for 3375162 patients who received at least 1 prescription of penicillin.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The General Practice Research Database was assessed for patients who received at least 2 penicillin doses at least 60 days apart. The patients were sorted by those who had an allergic-like response to the first administration and then to subsequent administrations. Allergic reactions were identified by computerized codes within 30 days of the penicillin prescription for anaphylaxis, urticaria, angioedema, erythema multiforme, laryngeal spasm, dermatitis attributed to a drug, toxic epidermal necrolysis, or adverse drug reactions attributed to a medication.

MAIN OUTCOME MEASURES Tables (2 × 2) were created for the documented history of penicillin reactions as a predictor for a subsequent reaction.

MAIN RESULTS With the initial penicillin course, 0.18% of patients had an allergic-like event. Almost 60% of the patients who received at least 1 prescription for penicillin also received a second prescription (n = 2017957). According to the history of the initial response to penicillin, the likelihood ratio (LR) for predicting a second reaction can be derived as shown in Table 39-5. For patients who had an initial reaction to penicillin, 1.9% had a reaction to the second course of penicillin. For patients who did not have an allergic-like event to the first prescription, 0.17% had a reaction to the second prescription. The serious reactions to the first penicillin course (n = 3014) included anaphylaxis (n = 16), angioedema (n = 106), laryngospasm (n = 19), and toxic epidermal necrolysis (n = 6). Most patients had urticaria (n = 2275) or erythema multiforme (n = 237). The pattern of reactions to the second penicillin course was similar. About 75% of the prescriptions were for amoxicillin.

Table 39-5 The Presence of a Previous Penicillin Allergy Predicts a Future Reaction

Test | LR+ (95% CI) | LR– (95% CI)
History of penicillin reaction | 11 (8.5-14) | 0.98 (0.98-0.99)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

CONCLUSIONS LEVEL OF EVIDENCE Review of a large database with outcomes of uncertain reliability. STRENGTHS Large database that allows the detection of low-frequency events. The physicians had to concur with the diagnosis, as evidenced by their reporting the diagnostic code. LIMITATIONS Lack of standardized case definitions. A “case” required a second visit by the patient and appropriate coding, both of which would bias the outcomes to underestimate all allergic-like reactions. Although the specificity of the diagnosis for an allergic-like event seems reasonable, there is an assumption that the event was attributable to the penicillin and not to another drug or to illness. The limitations of this study were addressed in an editorial.1


It is surprising that 48% of patients with an initial allergic-like event received a second course of penicillin. This could have happened because patients forgot their previous reaction and the physician was therefore unaware or because the previous reaction was attributed to another cause. Few patients (1.89%) had a second reaction. When the authors expanded their case definition of reactions to include bronchospasm, asthma, and eczema, the allergic-like events increased to 9% for patients with a previous reaction. The event rate of 9% matches the event rate for patients with a history of penicillin allergy who have a negative skin test result and then are treated again with penicillin.2 If the physicians were efficient in identifying the patients most likely to have a second reaction, then the positive LR of 11 is underestimated (Table 39-5). To highlight this, we can project the low-event rate of second reactions (1.9%) onto the 3198 initial reactors who did not receive a second course of a penicillin. Had those patients received a second course with an allergic-like event, the positive LR for a previous penicillin reaction would have been 16. The inclusion of those patients creates minimal change in the negative LR (0.95). Given the caveats about the data set, it is probably safe to say that the history of a penicillin reaction documented by a physician confers a positive LR greater than 11 for a second reaction. This LR for penicillin allergy is much higher than the LR for the clinical history in predicting allergy as defined by the response to skin testing.

Reviewed by David L. Simel, MD, MHS

REFERENCES FOR THE EVIDENCE 1. Josephson AS. Penicillin allergy: a public health perspective. J Allergy Clin Immunol. 2004;113(4):605-606. 2. Macy E, Mangat R, Burchette R. Penicillin skin testing in advance of need: multiyear follow-up in 568 test result-negative subjects exposed to oral penicillins. J Allergy Clin Immunol. 2003;111(5):1111-1115.

TITLE History of Penicillin Allergy and Referral for Skin Testing: Evaluation of a Pediatric Penicillin Allergy Testing Program.

AUTHORS Langley JM, Halperin SA, Bortolussi R.

CITATION Clin Invest Med. 2002;25(5):181-184. QUESTION Does a history of penicillin allergy predict response to skin testing and oral challenge with penicillin? DESIGN Prospective, protocol assessment. SETTING Canadian ambulatory infectious disease clinic. PATIENTS Seventy-four children referred for possible penicillin allergy. Ninety-six percent had generalized cutaneous eruptions.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Penicillin allergy was defined by a history of life-threatening anaphylaxis, a positive skin-test result, or no response to an observed oral challenge of penicillin.

MAIN OUTCOME MEASURE Positive predictive value of the history for penicillin allergy.

MAIN RESULTS Two patients had “convincing” life-threatening anaphylaxis and 3 had a positive intradermal skin test result. The remaining 69 patients had an oral challenge with penicillin; none had an adverse reaction, so the negative predictive value is 100% (lower 95% confidence interval [CI], 96%). The positive predictive value of the history of penicillin allergy was 6.7% (95% CI, 2.9%-15%).

CONCLUSIONS LEVEL OF EVIDENCE Positive predictive value study. STRENGTHS Patients with a negative skin test result for penicillin allergy were given an oral challenge with penicillin.

LIMITATIONS Small population of patients with uncertainty about whether these were consecutive patients. As with all referral studies of penicillin allergy, this likely does not capture the universe of patients with possible penicillin reactions. Nine percent of the patients were referred for cephalosporin reactions. Although the study was small, the information presented is enhanced by the oral penicillin challenge in patients who had a negative skin test result. Using the generally held notion that about 10% of the population will have a reaction to penicillin, the true allergy rate would be 0.067 × 0.10 = 0.67%. If we take 0.67% as the prior probability and use 6.7% as the posterior probability for a patient with a positive penicillin allergy history, we can solve for the likelihood ratio: Penicillin allergy likelihood ratio = posterior odds/prior odds, or (0.067/0.933)/(0.0067/0.9933) = 10.6.

Reviewed by David L. Simel, MD, MHS
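The figures reported for the pediatric study (Langley et al) can be approximately reproduced with a short calculation. The sketch below is illustrative only: the source does not state which interval methods were used, so the Wilson score interval and the one-sided exact bound are assumptions, and the function names are hypothetical. The counts (5 allergic children of 74, and 0 reactions among 69 oral challenges) and the probabilities (0.67% prior, 6.7% posterior) are taken from the text.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.959964) -> tuple:
    """Wilson score interval for a proportion (assumed method, not from the source)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def odds(probability: float) -> float:
    return probability / (1 - probability)


def likelihood_ratio(prior: float, posterior: float) -> float:
    """LR = posterior odds / prior odds, as in the worked example in the text."""
    return odds(posterior) / odds(prior)


# Positive predictive value: 2 anaphylaxis + 3 positive skin tests among 74 children.
ppv = 5 / 74
ppv_low, ppv_high = wilson_interval(5, 74)

# One-sided exact lower 95% bound when 0 of 69 challenged children reacted.
npv_lower = 0.05 ** (1 / 69)

# Likelihood ratio from the prior (0.67%) and posterior (6.7%) probabilities.
lr = likelihood_ratio(prior=0.0067, posterior=0.067)
```

Under these assumptions the interval for the predictive value runs from roughly 2.9% to 15%, the lower bound for the negative predictive value is about 96%, and the likelihood ratio is about 10.6, in line with the values quoted in the summary.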

TITLE Penicillin Skin Testing in Advance of Need: Multiyear Follow-up of 568 Test Result–Negative Subjects Exposed to Oral Penicillins.

AUTHORS Macy E, Mangat R, Burchette RJ. CITATION J Allergy Clin Immunol. 2003;111(5):1111-1115. QUESTION Among patients with a history of penicillin allergy, does a negative skin test reaction confirm the lack of penicillin allergy? DESIGN Retrospective medical record review. SETTING Allergy clinic as part of a health care management organization. PATIENTS Patients were adults referred to an allergist for skin testing for suspected penicillin allergy. The symptoms of their allergic reaction were not described.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The computerized pharmacy records of patients who had a negative skin test result for penicillin were reviewed. Using the narrative of the patient’s clinical records, the investigators searched for documentation of an allergic response to penicillin.

MAIN OUTCOME MEASURES Allergic responses were recorded as anaphylaxis, gastrointestinal reactions, hives, other rashes, or other reactions.

MAIN RESULTS During 7 years, 1383 patients were skin tested for penicillin allergy. Among this population of patients with a clinical suspicion for penicillin allergy, 137 had positive skin test results (9.9%) (Table 39-6). The charts of the remaining 1246 patients were studied for penicillin exposures; 568 patients received subsequent penicillin challenges. Among the patients with a history of penicillin allergy, with a negative skin test result, and who were challenged with penicillin, 65 had a reaction. None of the reactions were documented as truly anaphylactic (by chart review, upper 95% confidence interval < 0.7%), with most being “hives” (72%) or other rashes (12%).

Table 39-6 The Predictive Value of a History of a Penicillin Allergy Is Modified by Knowing the Response to Skin Testing

Test | Predictive Value (95% CI) for Positive Skin Test Result, % | Predictive Value (95% CI) for Positive Skin Test Result or Subsequent Allergic Reaction, % | Predictive Value (95% CI) for No Allergic Response to Penicillin Administration, %
History of penicillin allergy | 9.9 (8.4-12) | 15 (13-17) |
History of penicillin allergy with negative skin test result | | | 90 (86-91)

Abbreviation: CI, confidence interval.

CONCLUSIONS LEVEL OF EVIDENCE Level 4. STRENGTHS Large sample size of patients referred for a possible penicillin reaction. LIMITATIONS Retrospective chart review, relying on nonstandardized clinical documentation of reactions. Patients may have received care outside of the health care management organization, so their results would not have been captured. By defining true penicillin allergy as a positive skin test response or an allergic reaction to a second course of penicillin, then the positive predictive value of a patient’s history of a penicillin allergy is 15%. With the skin test result as a reference standard, about 10% of patients with a reported reaction to penicillin will have a positive reaction. A negative skin test result among these patients makes a subsequent anaphylactic reaction unlikely (

Finding | LR+b: Diehr et al,26 1984 | LR+b: Gennis et al,27 1989 | LR+b: Singal et al,28 1989 | LR+b: Heckerling et al,29 1990 | LR–c: Diehr et al,26 1984 | LR–c: Gennis et al,27 1989 | LR–c: Singal et al,28 1989 | LR–c: Heckerling et al,29 1990
20 | ...d | 1.2 | ... | ... | ... | 0.66 | ... | ...
>25 | 3.4 | ... | NSe | 1.5 | 0.78 | ... | NS | 0.82
>30 | ... | 2.6 | ... | ... | ... | 0.80 | ... | ...
Heart rate, beats/min >100 | NS | 1.6 | NSe | 2.3 | NS | 0.73 | NS | 0.49
Heart rate, beats/min >120 | ... | 1.9 | ... | ... | ... | 0.89 | ... | ...
Temperature > 37.8°C (100°F) | 4.4 | 1.4 | 2.4 | 2.4 | 0.78 | 0.63 | 0.68 | 0.58
Any abnormal vital sign | ... | 1.2 | ... | ... | ... | 0.18 | ... | ...
Chest examination
Asymmetric respiration | ∞ | ... | ... | ... | 0.96 | ... | ... | ...
Dullness to percussion | NS | 2.2 | ... | 4.3 | NS | 0.93 | ... | 0.79
Decreased breath sounds | ... | 2.3 | ... | 2.5 | ... | 0.78 | ... | 0.64
Crackles | 2.7 | 1.6 | 1.7 | 2.6 | 0.87 | 0.83 | 0.78 | 0.62
Bronchial breath sounds | ... | ... | ... | 3.5 | ... | ... | ... | 0.90
Rhonchi | NS | 1.5 | ... | 1.4 | NS | 0.85 | ... | 0.76
Egophony | 8.6 | 2.0 | ... | 5.3 | 0.96 | 0.96 | ... | 0.76
Any chest finding | ... | 1.3 | ... | ... | ... | 0.57 | ... | ...

Abbreviations: LR+, positive likelihood ratio; LR–, negative likelihood ratio; NS, result not significant. a Only those findings that were significantly associated with the presence or absence of pneumonia in at least 1 study are included (P < .05 in a 2-tailed χ2 or Fisher exact test). b LR+ for pneumonia when finding present, sensitivity/(1 – specificity). c LR– for pneumonia when finding absent, (1 – sensitivity)/specificity. d Ellipses indicate result is not available. e Actual cut points not specified in this study.

from the physical examination in patients with suspected pneumonia according to results from the 4 previously identified studies. LRs for the presence of any individual vital sign abnormality (LR+), including tachypnea, tachycardia, or fever, ranged from 2 to 4. Moreover, various cut points for these abnormalities did not have a substantial effect on the calculated LRs.27 Similarly, LRs for the absence of any individual vital sign abnormality (LR–) ranged from 0.5 to 0.8. However, Gennis et al27 demonstrated an LR– of 0.18 (95% CI, 0.07-0.46) for the diagnosis of pneumonia according to the absence of all 3 vital sign abnormalities (ie, respiratory rate < 30/min, heart rate < 100/min, and temperature < 37.8°C [100°F]). According to this finding, if the baseline prevalence of pneumonia among ambulatory patients with respiratory illnesses is assumed to be 5%, a patient without any vital sign abnormalities would have a predicted probability of pneumonia of less than 1%. The presence of several findings on chest examination significantly raised the likelihood of pneumonia. For example, in one study the presence of asymmetric respirations essentially guaranteed the diagnosis of pneumonia (LR+, ∞; 95% CI, 3.2-∞).26 However, the usefulness of this finding was limited because only 4% of patients with pneumonia had asymmetric respirations. The presence of other findings, including egophony and dullness to percussion, significantly increased the likelihood of pneumonia. However, given the low prevalence of pneumonia in the overall study populations, the effect of observing these findings on estimating the probability of pneumonia was only modest. For example, the presence of egophony had a positive predictive value ranging from as low as 20%26 to no higher than 56%.27 Finally, all 4 studies support the conclusion that the presence or absence of crackles on examination would not be sufficient to rule in or rule out the diagnosis. For example, with a prevalence of pneumonia of 5%, the absence of crackles reduces the probability to 3%, at the lowest, and the presence of crackles increases the probability to 10%, at the highest. Moreover, the absence of any abnormality on chest examination yielded an LR– of 0.57 (95% CI, 0.39-0.83),27 which is too close to the indeterminate LR value of 1.0 to substantially reduce the probability of pneumonia. The low accuracy of individual findings on chest examination for detecting pneumonia has also been supported by studies that relied on retrospective data gathering30,31 or incomplete application of chest radiography to all study patients.32 In one study, the absence of crackles yielded an LR– of only 0.71 (95% CI, 0.47-0.90), and the absence of any abnormal auscultatory finding yielded an LR– of only 0.68 (95% CI, 0.44-0.89), both of which would translate into small effects on the probability of pneumonia.32 In contrast, another study found that the absence of any abnormality on chest auscultation resulted in an LR– of 0.13 (95% CI, 0.07-0.24),31 which might substantially reduce the probability of pneumonia. However, this result has not been replicated in prospective studies, which would be subject to less bias in the recording of physical examination findings.
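The probability statements in this section follow from converting a pretest probability to odds, multiplying by the LR, and converting back. A minimal sketch of that arithmetic is shown here, using the 5% baseline prevalence and the LR– values quoted in the text; the function name is hypothetical.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Absence of all 3 vital sign abnormalities (LR-, 0.18) at 5% prevalence.
no_vital_sign_abnormality = post_test_probability(0.05, 0.18)

# Absence of any chest examination abnormality (LR-, 0.57) at 5% prevalence.
no_chest_abnormality = post_test_probability(0.05, 0.57)
```

With normal vital signs the probability drops below 1%, whereas a normal chest examination lowers it only to about 3%, which is why the latter finding is described as having little effect.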

Evaluating Algorithms to Predict Pneumonia Because the accuracy of individual symptoms or signs for predicting pneumonia is low, several studies developed prediction rules that incorporate the presence or absence of several medical history or physical examination findings. Table 40-4 summarizes the features of 3 such rules. Though initially designed as aids in the ordering of chest radiographs for patients with suspected pneumonia, they are reasonably considered as prediction rules for the diagnosis of pneumonia in these patients and yield probabilities of pneumonia after

Table 40-4 Predictive Rules for Pneumonia Diagnosed by Chest Radiograph^a

Diehr et al26 (add points when present)^b
  Rhinorrhea                         –2 points
  Sore throat                        –1 point
  Night sweats                        1 point
  Myalgias                            1 point
  Sputum all day                      1 point
  Respiratory rate > 25/min           2 points
  Temperature ≥ 37.8°C (100°F)        2 points

Singal et al28
  Probability^c: $P = 1/(1 + e^{-Y})$
  $Y = -3.095 + 1.214\,(\text{Cough}) + 1.007\,(\text{Fever}) + 0.823\,(\text{Crackles})$
  Each variable = 1 if present

Heckerling et al29 (determine the number of findings present)^d
  Absence of asthma
  Temperature > 37.8°C (100°F)
  Heart rate > 100/min
  Decreased breath sounds
  Crackles

Abbreviations: LR+, positive likelihood ratio; LR–, negative likelihood ratio.
^a Adapted from Emerman et al.33
^b For example, a threshold score of –1 (ie, all patients with scores ≥ –1 are considered to have pneumonia) yields an LR+ of 1.5 and an LR– of 0.22; a threshold score of +1 yields an LR+ of 5.0 and an LR– of 0.47; and a threshold score of +3 yields an LR+ of 14.0 and an LR– of 0.82, according to the original study data.26
^c First calculate Y and then calculate the predicted probability of pneumonia.
^d For example, according to a prevalence of pneumonia of 5%, the presence of 0, 1, 2, 3, 4, or 5 findings yields probabilities of pneumonia of …

… 25/min = 2; temperature ≥ 37.8°C (100°F) = 2
Threshold score: ≥3; ≥1; ≥–1 (LR: …)
… 100/min; decreased breath sounds; crackles
Findings present: 5; 4; 3; 2; 1; 0
Probability, % (baseline prevalence 5%): 50; 25; 20; 3; 1; …

Finding                                     LR+ (95% CI)         LR– (95% CI)
Age > 65 y or comorbidity^a                 2.7 (1.6-4.6)        0.43 (0.26-0.71)
Acute onset                                 3.6 (2.0-6.5)        0.31 (0.17-0.55)
Chills                                      1.60 (0.73-3.40)     0.86 (0.67-1.10)
Pleuritic chest pain                        1.40 (0.97-2.00)     0.62 (0.36-1.10)
Purulent sputum                             1.20 (0.56-2.50)     0.95 (0.74-1.20)
Signs of consolidation on auscultation^a    1.10 (0.79-1.50)     0.86 (0.48-1.60)
Leukocytosis or leukopenia^b                2.0 (1.3-2.8)        0.32 (0.16-0.66)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
^a The specific comorbidities or signs of consolidation were not described.
^b Leukocytosis defined as white blood cell count ≥ 11000/μL and leukopenia defined as white blood cell count ≤ 4000/μL.
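Two of the prediction rules in Table 40-4 are simple enough to compute directly. The sketch below is illustrative only and is not taken from the cited studies' software: the dictionary keys and the function names `diehr_score` and `singal_probability` are assumptions, while the point values and coefficients are those printed in Table 40-4.

```python
import math
from typing import Iterable

# Point values for the Diehr et al rule as listed in Table 40-4.
# The key names are hypothetical labels chosen for this example.
DIEHR_POINTS = {
    "rhinorrhea": -2,
    "sore_throat": -1,
    "night_sweats": 1,
    "myalgias": 1,
    "sputum_all_day": 1,
    "respiratory_rate_over_25": 2,
    "temperature_37_8_or_higher": 2,
}


def diehr_score(findings_present: Iterable[str]) -> int:
    """Sum the points for every finding that is present."""
    return sum(DIEHR_POINTS[name] for name in findings_present)


def singal_probability(cough: bool, fever: bool, crackles: bool) -> float:
    """Predicted probability from the Singal et al logistic rule.

    Each variable counts as 1 if present and 0 if absent.
    """
    y = -3.095 + 1.214 * cough + 1.007 * fever + 0.823 * crackles
    return 1.0 / (1.0 + math.exp(-y))


# Example patient: fever and tachypnea, but also rhinorrhea.
score = diehr_score(
    ["temperature_37_8_or_higher", "respiratory_rate_over_25", "rhinorrhea"]
)
print("Diehr score:", score)
print(f"Singal, all 3 findings: {singal_probability(True, True, True):.1%}")
print(f"Singal, no findings:    {singal_probability(False, False, False):.1%}")
```

With all 3 Singal findings present the predicted probability is about 49%, and with none present it is about 4%. The Diehr score still has to be compared against a chosen threshold (–1, +1, or +3 in the table footnote) to obtain an LR.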

or comorbidity (OR, 6.9; 95% CI, 2-23) were the only findings that did not include 1 in the OR CI.

CONCLUSIONS
LEVEL OF EVIDENCE Level 3.
STRENGTHS Microbiologic proof of infection in most patients, including results from lung parenchyma samples.
LIMITATIONS Study includes only patients admitted to the hospital. Radiographic results not provided. Patient population not well described in terms of comorbid illness.

Among admitted patients with community-acquired pneumonia, an acute onset of disease is the variable that most increases the likelihood of bacterial pneumonia attributed to pyogenic bacteria. An onset that is not acute decreases the likelihood of pyogenic bacterial pneumonia the most. The lack of significance (diagnostic OR not statistically different from 1) for chills, pleurisy, purulent sputum, and auscultatory signs of consolidation is also important. We do not know whether the results can be applied to the 25% of patients receiving ambulatory treatment, because those patients did not have the same microbiologic studies.

Reviewed by David L. Simel, MD, MHS

Table 40-8 Likelihood Ratio of Bacterial Pneumonia From a Scoring System

Test                               Sensitivity (95% CI)   Specificity (95% CI)   LR+^a   LR–^a
Bacterial pneumonia score ≥ 5^b    0.89 (0.78-0.96)       0.63 (0.54-0.81)       2.4     0.17

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
^a The LRs are estimated from the sensitivity and specificity. CIs cannot be calculated without the raw values.
^b Age > 65 years or comorbidity = 3 points; acute onset = 5 points; leukocytosis or leukopenia = 2 points.
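The footnotes to Table 40-8 state that the LRs were estimated from the sensitivity and specificity and give the point values of the score. A minimal sketch of both calculations follows; the function names `likelihood_ratios` and `bacterial_pneumonia_score` are assumptions made for this example, and only the numbers come from the table.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def bacterial_pneumonia_score(
    age_over_65_or_comorbidity: bool,
    acute_onset: bool,
    leukocytosis_or_leukopenia: bool,
) -> int:
    """Point total using the weights in the footnote to Table 40-8."""
    return (
        3 * age_over_65_or_comorbidity
        + 5 * acute_onset
        + 2 * leukocytosis_or_leukopenia
    )


lr_pos, lr_neg = likelihood_ratios(sensitivity=0.89, specificity=0.63)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")

# A score of 5 or more counts as a positive test in Table 40-8.
print(bacterial_pneumonia_score(True, False, True) >= 5)
```

Rounded to the precision used in the table, the calculation reproduces the reported LR+ of 2.4 and LR– of 0.17. Note that an acute onset alone reaches the threshold of 5 points, whereas age or comorbidity alone does not.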

CHAPTER 41

Does This Infant Have Pneumonia?

Peter Margolis, MD, PhD
Anne Gadomski, MD, MPH

CLINICAL SCENARIO

A mother brings her 8-month-old infant to your office in midwinter with a cough. She reports that the illness began 4 days ago with a runny nose. Two days ago, the baby developed a fever. Now the baby’s symptoms are getting worse. The baby has become more irritable, is eating less, and seems to be having more difficulty breathing. This is the third child you have treated today with a cough. While the first two children were treated for acute upper respiratory tract infections, you wonder if the findings in this infant suggest pneumonia.

WHY IS THIS AN IMPORTANT QUESTION TO ANSWER WITH A CLINICAL EXAMINATION?

Acute respiratory illnesses are among the most common conditions of infants treated in primary care offices. Although the majority of respiratory illnesses involve infections of the upper respiratory tract, most infants will experience a lower respiratory tract illness (LRI) in the first year of life. Of those with LRIs, about 30% visit a physician,1,2 and about 2% are hospitalized.3 LRIs can be defined simply as infections at an anatomic level below the vocal cords. The majority of LRIs in infants are caused by viruses; only a small proportion is due to bacteria. The differential diagnosis for cough is long (Table 41-1).

Therapies are available to treat a variety of manifestations of lower respiratory tract disease, so it is important to diagnose these complaints accurately and estimate their severity to deliver the appropriate treatment. Identifying infants at lower risk of bacterial disease may help clinicians avoid the unnecessary use of antibiotics, which may reduce the risk of subsequent bacterial infection and slow the emergence of resistant strains of bacteria within the population.4 Greater certainty about the presence of a viral LRI may also help clinicians avoid additional testing such as radiography or blood culture. This overview focuses on the medical history and physical examination findings of infants that distinguish pneumonia from other LRIs.

METHODS

We conducted a MEDLINE search from 1982 to 1995 to identify articles about the diagnosis of pneumonia in children. We searched for articles with any of the following Medical Subject Heading terms: “pneumonia,” “diagnostic tests,” “sensitivity and specificity,” “reproducibility of results,” “physical examination,” or “medical history taking.” This search was further limited to studies published in English about humans and that involved children. This search strategy identified 38 articles. Four more articles were identified by reviewing a compendium of references prepared by the World Health Organization.5 Etiologic studies, which did not include a chest radiograph examination as part of the gold standard, involved only inpatients. Studies of illness in families’ homes, rather than in clinical settings, were excluded (n = 29). All the articles were reviewed by the authors, and disagreements were resolved by discussion. We used the methods developed for this series to assess the quality of the articles. The highest-quality studies are emphasized in the “Results” section. We did not aggregate statistically the results of the studies because of differences in the ages of the study samples and differences in cutoff points of key variables, such as respiratory rate. Confidence intervals (CIs) were calculated according to the method suggested by Koopman6 and Centor and Keightley.7

Table 41-1 Differential Diagnosis of Cough in Infants

Anatomic
  Foreign body
  Congenital malformation (eg, vascular ring, cystic adenomatous malformation, bronchogenic cyst, tracheomalacia)
Inflammatory
  Reactive airway disease
Infectious
  Viral
    Croup
    Laryngotracheobronchitis
    Bronchitis
    Viral pneumonia
  Bacterial
    Epiglottitis
    Tracheitis
    Bronchitis
    Bacterial pneumonia
  Chlamydia
  Tuberculosis
Other
  Cystic fibrosis
  Congestive heart failure
  Gastroesophageal reflux

Copyright © 2009 by the American Medical Association.

Reference Standard for Diagnosing Pneumonia

The reference standard for diagnosing pneumonia is an aspirate from the lower respiratory tract obtained by bronchoalveolar lavage or lung puncture. The use of bronchial lavage is appropriate in guiding antibiotic choice in patients with refractory or complicated pneumonia. In general practice, chest radiographs are readily obtained and can be considered a pragmatic reference standard for pneumonia.

A number of studies evaluated the accuracy of the chest radiograph in differentiating viral from bacterial disease in children.8-13 It is difficult to determine the accuracy of the chest radiograph from these studies because of methodologic limitations, as well as problems with study design introduced by the biology of pneumonia. It is not possible to obtain cultures from a lung in most patients. Therefore, investigators have had to use combinations of other clinical features as a proxy for bacterial pneumonia. Reliance on less than perfect gold standards for diagnosing bacterial pneumonia may produce over- or underestimates of the association of a positive chest radiographic finding with bacterial pneumonia. Two studies used the same definition of bacterial pneumonia (duration of symptoms < 2 days, temperature > 39.5°C, total white blood cell count > 15000/μL).8,9 Both found the sensitivity of the chest radiograph for diagnosing bacterial vs viral pneumonia to be approximately 75%. However, one reported a specificity of 100%; the other, a specificity of 63%. The reported sensitivity for studies with varying definitions ranges from 42% to 80% and the specificity from 42% to 100%.

Studies of the accuracy of chest radiographs have also been compromised by other methodologic problems, such as interobserver variability in the interpretation of the radiograph, oversampling patients with relatively severe disease, and the relatively small numbers of patients with bacterial pneumonia. Such problems make estimates of chest radiographic accuracy unreliable.

Variation in the biologic manifestations of bacterial pneumonia also presents challenges in the interpretation of published studies. For example, bacterial pneumonia is classically associated with lobar consolidation on the radiograph. However, studies report that bacterial pneumonia may be associated with infiltrates that are lobar, perihilar, segmental, interstitial, or nodular.14-16 Consolidation can also be observed with viral pneumonia, but it is unclear whether this radiologic appearance is due to segmental consolidation, atelectasis, or bacterial coinfection. Such variability in the radiographic appearance of bacterial pneumonia may produce over- or underestimates of the association of a positive chest radiographic finding with bacterial pneumonia.

Clinicians should be aware that the chest radiographic results may be negative in patients with early bacterial pneumonia.17 The sensitivity of the chest radiograph will be reduced in this group. The implications of this observation are important for studies of the clinical examination. For the purposes of this systematic review, we included studies that used the chest radiograph as the reference standard. Studies that combined the clinical diagnosis with the chest radiographic results as the reference standard were excluded because inclusion of the diagnostic test in the reference standard may overestimate the accuracy of clinical findings. The significance of clinical findings of pneumonia in the absence of a positive chest radiographic finding remains to be studied.

Normal Anatomy and Pathophysiology of Pneumonia

Lower respiratory tract infections occur at or below the larynx and include epiglottitis, laryngitis, laryngotracheobronchitis (croup), bronchiolitis, and pneumonia (Figure 41-1). Pneumonia typically follows an upper respiratory tract illness in which the lower respiratory tract is invaded by bacteria, viruses, or other pathogens that trigger the immune response and produce inflammation. Histamines, leukotrienes, and chemotactic factors are released that recruit white blood cells to the area. This response fills the air spaces of the lower respiratory tract with white blood cells, fluid, and cellular debris. This process reduces lung compliance, increases resistance, obstructs smaller airways, and possibly results in collapse of distal air spaces. The resultant physical findings vary with the site of infection, ranging from coarse breath sounds or rhonchi in bronchopneumonia to crackles in the alveoli in cases of pneumonia or bronchiolitis. Crackles are the result of the explosive equalization of gas pressure between the terminal bronchiole and the alveoli.18 Wheezes result from the oscillation of air through a narrowed airway that produces a musical sound likened to a vibrating reed.19 Decreased breath sounds may also be observed in areas of consolidation.

[Figure 41-1: diagram of the respiratory tract. Labels: upper respiratory tract (nasal cavity, pharynx); lower respiratory tract (epiglottis, larynx, trachea, bronchus, bronchiole, lung); esophagus.]

How to Elicit the Relevant Symptoms and Signs

The physician’s first goal when taking the medical history and performing the physical examination in a child who presents with a cough is identification of the clinical syndrome and level of involvement, as shown in Figure 41-1. The second goal is to estimate the severity of the illness. The physician should ask the parent about symptoms associated with pneumonia, as well as those that may discriminate pneumonia from other lower respiratory tract diseases. In addition to cough, symptoms that may increase the likelihood of pneumonia include trouble breathing, rattling in the chest, noisy breathing, trouble feeding, fever, rapid breathing, anxiety, or restlessness. Clinicians working in different regions or with different cultures need to familiarize themselves with local terminology for lower respiratory tract symptoms. It may also be useful to ask about previous episodes of these chest symptoms because recurrent bouts of pneumonia or bronchitis may suggest reactive airway disease. In early infancy ( 38°C

8 wk to 6 y

30 (56/185)

Radiographic pneumonia

1 d to 2 mo

12 (27/228)

Respiratory signs and symptoms, 1 wk to 3 y cough < 7 d Children with fever or respiratory 3 mo to 15 y symptoms for whom chest radiograph was ordered Excluded major chronic disease, asthma, croup, and trauma Temperature > 38°C 1 d to 2 y Excluded infants with chronic lung disease, bronchopulmonary dysplasia, wheezing, and stridor Children with chest radiographic 1 d to 17 y examination as part of emergency department evaluations

65 (130/200)

Positive chest radiographic examination result Radiographic pneumonia

Source, y

Country, Setting

New Guinea outpatient department Crain et al,42 1991 US emergency department Lozano et al,43 1994 Colombia emergency department Leventhal,44 1982 US emergency department

1 4 4

Taylor et al,45 1995

US emergency department

4

Zukin et al,46 1986

US emergency department

4

Definition of Pneumonia

19 (26/136)

Abnormal chest radiographic examination result

7.3 (42/572)

Positive chest radiographic examination result

14 (18/125)

Radiographic pneumonia

^a See Table 1-7 for a summary of Evidence Grades and Levels.

Table 41-3 Operating Characteristics of Selected Clinical Findings

Source, y                 Item                                    LR+ (95% CI)^a    LR– (95% CI)^a

Description of Breathing (Time or Explanation)
Redd et al,39,40 1994     Respiratory rate ≥ 50/min (3-11 mo)     1.9 (…)           0.36 (…)
Harari et al,41 1991      Respiratory rate ≥ 50/min               2.2 (…)           0.52 (…)
Crain et al,42 1991       Respiratory rate ≥ 60/min ([missing])   8.0 (5.3-12)      0.55 (0.4-0.8)
Lozano et al,43 1994      [missing]                               1.7 (1.2-2.3)     0.52 (0.4-0.7)
Leventhal,44 1982         [missing]                               2.0 (1.5-2.7)     0.32 (0.1-0.7)
Taylor et al,45 1995      [missing]                               3.2 (2.5-4.1)     0.34 (0.2-0.6)
Zukin et al,46 1986       [missing]                               1.6 (0.9-2.6)     0.75 (0.5-1.2)

Redd et al,39,40 1994     [missing]                               2.4 (…)           0.70 (…)
Harari et al,41 1991      [missing]                               2.5 (…)           0.78 (…)
Lozano et al,43 1994      [missing]                               1.3 (1.0-1.5)     0.53 (0.3-0.9)
Crain et al,42 1991       [missing]                               26 (5.7-119)      0.75 (0.6-0.9)
Redd et al,39,40 1994     [missing]                               6.6 (…)           0.71 (…)
Lozano et al,43 1994      [missing]                               1.2 (0.9-1.6)     0.83 (0.6-1.1)
Leventhal,44 1982         [missing]                               1.9 (1.0-3.8)     0.79 (0.6-1.1)
Lozano et al,43 1994      [missing]                               1.2 (0.8-1.8)     0.89 (0.7-1.1)
Leventhal,44 1982         [missing]                               3.2 (1.1-9.2)     0.86 (0.7-1.0)

Harari et al,41 1991      [missing] 38°C                          1.1 (…)           0.95 (…)
Zukin et al,46 1986       Fever^b                                 1.5 (1.3-1.7)     0.17 (0.02-1.1)

Auscultation
Leventhal,44 1982         Crepitations                            2.1 (1.2-3.8)     0.73 (0.5-1.0)
Lozano et al,43 1994      Crepitations                            1.8 (1.4-2.3)     0.36 (0.2-0.5)
Crain et al,42 1991       Crepitations                            15 (2.9-78)       0.86 (0.7-1.0)
Zukin et al,46 1986       Crepitations                            2.9 (1.4-3.7)     0.57 (0.3-0.97)
Lozano et al,43 1994      Wheezes                                 0.63 (0.4-1.1)    1.12 (1.0-1.3)
Crain et al,42 1991       Wheezes                                 4.0 (0.4-37)      0.97 (0.9-1.1)
Zukin et al,46 1986       Wheezes                                 0.19 (0.03-1.3)   1.30 (1.2-1.5)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
^a Ellipses indicate CIs could not be calculated because insufficient information was reported.
^b Fever defined as 2 SD for age.

the accuracy of physical examination signs would be lower than that reported in studies from developing countries. However, the few studies performed in developed countries reported results similar to those cited above.42,44,45 These studies may have overestimated the accuracy of clinical findings because chest radiographs were more likely to be obtained in patients with signs and symptoms of disease.

In a study by Leventhal,44 the absence of tachypnea, as observed by the clinician examining the patient, was useful for ruling out pneumonia (LR–, 0.32), whereas the presence of tachypnea somewhat increased the odds of pneumonia (LR+, 2.0). Grunting and crepitations were more useful in ruling in disease (LR+, 3.2 and 2.1, respectively). Their absence did not appreciably decrease the likelihood of disease (LR–, 0.86 and 0.73). The study by Taylor et al45 reported a somewhat higher LR+ for tachypnea (LR+, 3.2), but this study included only febrile children, and chest radiographs were not obtained for all study patients.

A study by Crain et al42 included only infants with fever and younger than 8 weeks who were treated in an emergency department. The authors reported that tachypnea (LR+, 8.0; 95% CI, 5.3-12) and chest indrawing (LR+, 26; 95% CI, 2.7-119) substantially increased the likelihood of pneumonia. Although these likelihood ratios are high, the number of patients with pneumonia in this study was small and the reported estimates are imprecise (as indicated by the wide 95% CIs). In addition, the high likelihood ratios also reflect the high specificity of tachypnea and indrawing in a particular group of patients (early infants). The value of the clinical examination may differ in this group of children. As in other studies, the absence of these findings did not dramatically decrease the likelihood of disease for tachypnea (LR–, 0.55) or for indrawing (LR–, 0.75).

Accuracy of Combinations of Findings

Clinicians typically evaluate the presence of many findings simultaneously to rule in or rule out pneumonia. Despite the large number of studies, few have examined the value of clinical findings when they are used together. Two studies assessed the value of combinations of clinical findings.

Leventhal44 found that the absence of pulmonary findings defined as respiratory distress (nasal flaring, grunting, retractions), tachypnea, rales, or decreased breath sounds ruled out pneumonia (LR–, 0.0; 95% CI, 0.0-0.4). When present, these findings had an LR+ of 1.6 (95% CI, 1.3-31). In this study, information about the presence or absence of respiratory symptoms was used in the decision to obtain the gold standard examination (a chest radiographic examination). Thus, the reported data are likely to overestimate the diagnostic accuracy of these combinations of findings so that the true LR– is not as good as reported and the LR+ is better than reported.

In a study of children younger than 2 months, Crain et al42 found that the absence of any respiratory findings (rhinorrhea, cough, adventitious sounds, or retractions) decreased substantially the likelihood of a positive chest radiographic finding (LR–, 0.10; 95% CI, 0.03-0.4). The presence of any of these findings had an LR+ of 3.4 (95% CI, 2.6-4.3). Because this study included only infants younger than 8 weeks, it is not clear how well the results apply to older age groups. Crain et al42 also found that as the number of positive respiratory findings increased, so did the probability of an abnormal chest radiographic finding.

To summarize, physical examination findings can help primary care physicians be more certain that an infant does or does not have pneumonia. In developed countries, where the prevalence of bacterial pneumonia is lower, pneumonia is unlikely if all signs are negative. The presence of a positive sign will be more useful in increasing clinicians’ certainty that an infant has pneumonia in developing countries compared with developed countries because the prevalence of bacterial pneumonia is higher. In developed countries, clinicians will be more certain if multiple findings are positive. Further studies are needed to examine the diagnostic accuracy of the chest radiographic examination, the value of certain signs (such as fever and toxic appearance), and how to best take advantage of combinations of clinical findings.
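The dependence of a positive sign's usefulness on prevalence can be illustrated numerically. The sketch below is an assumption-laden example rather than an analysis from the chapter: it applies one LR+ (2.0, the value reported for tachypnea by Leventhal) to two of the prevalences listed in Table 41-2 (7.3% and 65%), and the function names are hypothetical.

```python
def posttest_probability(pretest: float, lr: float) -> float:
    """Posttest probability from pretest probability and a likelihood ratio."""
    odds = pretest / (1.0 - pretest) * lr
    return odds / (1.0 + odds)


def absolute_gain(prevalence: float, lr_pos: float) -> float:
    """Increase in probability (posttest minus pretest) after a positive sign."""
    return posttest_probability(prevalence, lr_pos) - prevalence


LR_POSITIVE = 2.0  # illustrative value; tachypnea in the Leventhal study

for prevalence in (0.073, 0.65):
    after = posttest_probability(prevalence, LR_POSITIVE)
    gain = absolute_gain(prevalence, LR_POSITIVE)
    print(
        f"prevalence {prevalence:.1%} -> posttest {after:.1%} "
        f"(gain {gain * 100:.1f} percentage points)"
    )
```

In this example the same positive sign raises the probability from 7.3% to about 14% in the low-prevalence setting but from 65% to about 79% in the high-prevalence setting. Only in the latter does a single finding leave the clinician reasonably certain, which is consistent with the need for multiple positive findings where prevalence is low.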

THE BOTTOM LINE

First, the initial observation of the infant may be the most critical component of the examination. Observation is important before interacting with a child.

Second, because of its moment-to-moment variability, the respiratory rate should be counted by observing the chest while the child is quiet during two 30-second intervals or during a full minute. Clinicians need to be especially aware of the variability of the examination as the child’s level of activity changes.

Third, auscultation is relatively unreliable for examination of infants. Clinicians need better training and better terminology to describe abnormal chest sounds. The overall clinical appearance may be accurate but the delineation of its value needs more study.

Fourth, the best individual finding for ruling out pneumonia is the absence of tachypnea. Chest indrawing and other signs of increased work of breathing (eg, nasal flaring) and abnormal auscultatory findings are better for ruling in pneumonia. In developed countries, multiple findings must be present for more certainty about the presence of pneumonia.

Fifth, if all clinical signs (respiratory rate, auscultation, and work of breathing) are negative, the chest radiographic finding is unlikely to be positive.

Author Affiliations at the Time of the Original Publication

Division of Community Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill (Dr Margolis); Bassett Health Care, Cooperstown, New York (Dr Gadomski).

Acknowledgments

Dr Margolis is supported as a Robert Wood Johnson Generalist Faculty Physician Scholar. We greatly appreciate the thoughtful comments of the following individuals who reviewed the manuscript: David Simel, MD, MHS, Joshua Metlay, MD, PhD, Dennis Clemments, MD, PhD, and Jeffery Baker, MD, PhD. Debbie Sears provided assistance in the preparation of the manuscript and tables.

REFERENCES

1. Henderson F, Reid H, Morris R, et al. Home air nicotine levels and urinary cotinine excretion in preschool children. Am Rev Respir Dis. 1989;140(1):197-201.
2. Henderson F, Clyde W, Collier A, Denny F. The etiologic and epidemiologic spectrum of bronchiolitis in pediatric practice. J Pediatr. 1979;95(2):183-190.
3. McConnochie K, Hall C, Barker W. Lower respiratory tract illness in the first two years of life: epidemiologic patterns and costs in a suburban pediatric practice. Am J Public Health. 1988;78(1):34-39.
4. Hall C, Powell K, Schnabel K, Gala C, Pincus P. Risk of secondary bacterial infection in infants hospitalized with respiratory syncytial viral infection. J Pediatr. 1988;113(2):266-271.
5. World Health Organization. Pneumonia and related infections in young children: an annotated selective bibliography, volume 1: articles published before 1991. In: Programme for the Control of Acute Respiratory Infections. Geneva, Switzerland: World Health Organization; 1993:173.
6. Koopman P. Confidence intervals for the ratio of two binomial proportions. Biometrics. 1984;40:513-517.
7. Centor R, Keightley J. The Two by Two Analyzer. Birmingham, AL: CenterSoft Statistical Programs for Medical Decision Making Research; 1992.
8. Swischuk L, Hayden C. Viral vs bacterial pulmonary infections in children (is roentgenographic differentiation possible?). Pediatr Radiol. 1986;16(4):278-284.
9. Bettenay F, Campo J, Crossin D. Differentiating bacterial from viral pneumonia in children. Pediatr Radiol. 1988;18(6):453-454.
10. McCarthy P, Spiesel S, Stashwick C, et al. Radiographic findings and etiologic diagnosis in ambulatory childhood pneumonia. Clin Pediatr (Phila). 1981;20(11):686-691.
11. Isaacs D. Problems in determining the etiology of community-acquired childhood pneumonia. Pediatr Infect Dis J. 1989;8(3):143-148.
12. Korppi M, Kiekara O, Heiskanen-Kosma T, Soimakallio S. Comparison of radiological findings and microbial aetiology of childhood pneumonia. Acta Paediatr. 1993;82(4):360-363.
13. Courtoy I, Lande A, Turner R. Accuracy of radiographic differentiation of bacterial from nonbacterial pneumonia. Clin Pediatr (Phila). 1989;28(6):261-264.
14. Simpson W, Hacking P, Court S, Gardner P. The radiological findings in respiratory syncytial virus infections in children, II: the correlation of radiological categories with clinical and virological findings. Pediatr Radiol. 1974;2(3):155-160.
15. Kirchner S, Horev G. Diagnostic imaging in children with acute chest and abdominal disorders. Pediatr Clin North Am. 1985;32(6):1363-1382.
16. Wildin S, Chonmaitree T, Swischuk L. Roentgenographic features of common viral respiratory tract infections. Am J Dis Child. 1988;142(1):43-46.
17. Mulholland E, Simoes E, Costales M, McGrath E, Manalac E, Gove S. Standardized diagnosis of pneumonia in developing countries. Pediatr Infect Dis J. 1992;11(2):77-81.
18. Lehrer S. Understanding Breath Sounds. Philadelphia, PA: WB Saunders Co; 1984:85.
19. Forgacs P. Lung Sounds. London, England: Bailliere Tindall; 1978:44-49.
20. Richards J, Alexander J, Shinebourne E, de Swiet M, Wilson A, Southhall D. Sequential 22-hour profiles of breathing patterns and heart rate in 110 full-term infants during their first 6 months of life. Pediatrics. 1984;74(5):763-777.
21. Ashton R, Connolly K. The relation of respiration rate and heart rate to sleep states in the human newborn. Dev Med Child Neurol. 1971;13(2):180-187.
22. Hoppenbrouwers T, Harper R, Hodgeman J, Sterman M, McGinty D. Polygraphic studies of normal infants during the first six months of life, II: respiratory rate and variability as a function of state. Pediatr Res. 1978;12:120-125.
23. Iliff A, Lee V. Pulse rate, respiratory rate and body temperature of children between two months and eighteen years of age. Child Dev. 1952;23(4):237-245.
24. Kendig E, Chernick V. Disorders of Respiratory Tract in Children. 4th ed. Philadelphia, PA: WB Saunders Co; 1983:62.
25. Finer N, Abroms I, Taeusch H. Influence of sleep state on the control of ventilation. Pediatr Res. 1975;9:365.
26. Gadomski A, Permutt T, Stanton B. Correcting respiratory rate for the presence of fever. J Clin Epidemiol. 1994;47(9):1043-1049.
27. Simoes E, Roark R, Berman S, Esler L, Murphy J. Respiratory rate: measurement of variability over time and accuracy at different counting periods. Arch Dis Child. 1991;66(10):1199-1203.
28. Gadomski A, Khallaf N, El A, Black R. Control of acute respiratory infections: respiratory rate and chest indrawing assessment by primary care physicians in Egypt. Bull World Health Organ. 1993;71(5):523-527.
29. Berman S, Simoes E, Lanata C. Respiratory rate and pneumonia in infancy. Arch Dis Child. 1991;66(1):81-84.
30. World Health Organization. Acute Respiratory Infections in Children: Case Management in Small Hospitals in Developing Countries. Geneva, Switzerland: World Health Organization, Programme for the Control of Acute Respiratory Infections; 1990 (WHO/ARI/90.5).
31. Campbell E. Physical signs of diffuse airways obstruction and lung distension. Thorax. 1969;24(1):1-3.
32. Hoover C. Definitive percussion and inspection in estimating size and contour of the heart. JAMA. 1920;75:1626-1630.
33. Poole S, Chetham M, Anderson M. Grunting respirations in infants and children. Pediatr Emerg Care. 1995;11(3):158-161.
34. Davis G, Bureau M. Pulmonary and chest wall mechanics in the control of respiration in the newborn. Clin Perinatol. 1987;14(3):551-579.
35. Murphy R, Holford S. Lung sounds. In: Basics of Respiratory Disease. New York, NY: American Thoracic Society; 1980:1-6.
36. Forbes J. A Treatise on the Diseases of the Chest and on Mediate Auscultation. Philadelphia, PA: Desilver Thomas & Co; 1835.
37. Margolis P, Ferkol T, Marsocci S, et al. Accuracy of the clinical examination in detecting hypoxemia in infants with respiratory illness. J Pediatr. 1994;124(4):552-560.
38. Wang E, Milner R, Navas L, Maj H. Observer agreement for respiratory signs and oximetry in infants hospitalized with lower respiratory infections. Am Rev Respir Dis. 1992;145(1):106-109.
39. Redd S, Patrick E, Vreuls R, et al. Comparison of the clinical and radiographic diagnosis of paediatric pneumonia. Trans R Soc Trop Med Hyg. 1994;88(3):307-310.
40. Redd S, Vreuls R, Metsing M, et al. Clinical signs of pneumonia in children attending a hospital outpatient department in Lesotho. Bull World Health Organ. 1994;72(1):113-118.
41. Harari M, Shann F, Spooner V, et al. Clinical signs of pneumonia in children. Lancet. 1991;338(8772):928-930.
42. Crain E, Bulas D, Bijur P, et al. Is a chest radiograph necessary in the evaluation of every febrile infant less than 8 weeks of age? Pediatrics. 1991;88(4):821-824.
43. Lozano J, Steinhoff M, Ruiz J, et al. Clinical predictors of acute radiological pneumonia and hypoxaemia at high altitude. Arch Dis Child. 1994;71(4):323-327.
44. Leventhal J. Clinical predictors of pneumonia as a guide to ordering chest roentgenograms. Clin Pediatr. 1982;21(12):730-734.
45. Taylor J, Del Beccaro M, Done S, Winters W. Establishing clinically relevant standards for tachypnea in febrile children younger than 2 years. Arch Pediatr Adolesc Med. 1995;149(3):283-287.
46. Zukin D, Hoffman J, Cleveland R, et al. Correlation of pulmonary signs and symptoms with chest radiographs in the pediatric age group. Ann Emerg Med. 1986;15(7):792-796.
47. Greenes R, Begg C. Assessment of diagnostic technologies: methodology for unbiased estimation from samples of selectively verified patients. Invest Radiol. 1985;20(7):751-756.
48. Dai Y, Foy H, Zhu Z, et al. Respiratory rate and signs in roentgenographically confirmed pneumonia among children in China. Pediatr Infect Dis J. 1995;14(1):48-50.
49. Wafula E, Tindyebwa D, Onyango F. The diagnostic value of various features for acute lower respiratory infection among under fives. East Afr Med J. 1989;66(10):678-685.
50. Mulholland E, Olinsky A, Shann F. Clinical findings and severity of acute bronchiolitis. Lancet. 1990;335(8700):1259-1261.
51. Selwyn B. The epidemiology of acute respiratory tract infection in young children: comparison of findings from several developing countries. Rev Infect Dis. 1990;12(suppl 8):S870-S888.
52. Pio A, Leowski J, ten Dam H. The magnitude of the problem of acute respiratory infections. In: Douglas R, Kerby-Eaton E, eds. Acute Respiratory Infections in Childhood: Proceedings of an International Workshop. Adelaide, Australia: University of Adelaide; 1985:3-16.

UPDATE: Pneumonia, Infant and Child

CHAPTER 41

Prepared by Daniel A. Ostrovsky, MD
Reviewed by Anne Gadomski, MD, MPH

CLINICAL SCENARIO A 15-month-old child is brought to your office in May. She had been “breathing heavy” the previous day. She was well until about 2 days ago, when she developed nasal congestion with clear rhinorrhea, cough, and a low-grade fever. Your review shows this child had a normal birth history, demonstrated normal growth and development, and has not had any significant respiratory infections or reactive airway disease. On examination, you find a temperature of 38.2°C and a respiratory rate of 45/min. She has clear rhinorrhea and mild subcostal retractions but no abnormal lung sounds on auscultation.

UPDATED SUMMARY ON PEDIATRIC PNEUMONIA
Original Review Margolis P, Gadomski A. Does this infant have pneumonia? JAMA. 1998;279(4):308-313.

UPDATED LITERATURE SEARCH A MEDLINE search was conducted from 1996 to 2005 to identify English-language articles about pneumonia in infants or children, using the search strategy techniques of The Rational Clinical Examination series. The search yielded 49 articles. Additionally, Scientific Citation Index was used to identify articles that cited the original publication in The Rational Clinical Examination series, yielding 18 additional articles. The abstracts of these 67 articles were reviewed and all case-control, cohort, or randomized trials that addressed clinical signs and symptoms of pneumonia were selected for further inspection. The references for these articles were also reviewed to find any other relevant articles. The focus of the original publication was on identifying symptoms and signs that help distinguish pneumonia from other types of pediatric lower respiratory tract illnesses. In this update, we shifted the focus slightly and attempted to discover the findings that help identify the pediatric patient who will have an abnormal chest radiograph result. In total, 5 articles were selected for inclusion, although we subsequently excluded one article that had confusing likelihood ratio (LR) results, and we were unable to contact the author for verification.1

NEW FINDINGS
• Diminished breath sounds show substantial interrater reliability (κ, 0.73).1
• Pulse oximetry with values less than 98% has a sensitivity of only 55% for pneumonia and has no independent utility after consideration of the auscultatory findings and respiratory rate.
• The LR for pneumonia is 3.4 when the onset of a respiratory illness was equal to or greater than 6 days.

Details of the Update

Since the publication of the original review, 4 additional studies evaluated different clinical findings for predicting radiographic changes suggestive of pneumonia in pediatric patients. Overall, there remains a paucity of data that examine combinations of clinical signs. Additionally, there remains difficulty in combining data from multiple studies because of differences in the definitions of certain clinical findings such as tachypnea and respiratory distress. Finally, the broad age range of patients included in the studies makes generalization of findings to infants more difficult. For example, grunting and nasal flaring would not be typical findings in older pediatric patients with pneumonia. The studies included in this update used age-based criteria for the finding of tachypnea. A prospective study of children presenting to an emergency department with any type of acute respiratory illness provides useful information that allows comparison between signs and the overall clinical judgment.2 As a single finding, tachypnea had the best diagnostic odds ratio (DOR; 5.8) that came from its positive LR of 2.2 (95% confidence interval [CI], 1.5-3.2) and negative LR of 0.39 (95% CI, 0.22-0.70). The additional information from chest indrawing and alveolar rales did not clinically improve the diagnostic odds or LRs. Clinical judgment that factors in all items from the medical history and physical examination (DOR, 3.6; 95% CI, 1.5-8.7) had results that were slightly less efficient than the single finding of tachypnea. Clinicians should recall the age-based World Health Organization (WHO) definitions of tachypnea for infants (Table 41-4).
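The diagnostic odds ratio quoted above is simply the positive LR divided by the negative LR. The short Python sketch below illustrates that arithmetic; the function name is an illustrative assumption, not something taken from the cited study. Recomputing from the rounded LRs gives about 5.6, close to but not identical with the published 5.8, which was presumably calculated from unrounded data.

```python
def diagnostic_odds_ratio(lr_positive: float, lr_negative: float) -> float:
    """Return the diagnostic odds ratio, DOR = LR+ / LR-.

    Hypothetical helper for illustration; both inputs must be positive.
    """
    if lr_positive <= 0 or lr_negative <= 0:
        raise ValueError("likelihood ratios must be positive")
    return lr_positive / lr_negative


# Tachypnea as a single finding: LR+ 2.2 and LR- 0.39 (values from the text).
tachypnea_dor = diagnostic_odds_ratio(2.2, 0.39)
print(f"DOR for tachypnea from rounded LRs: {tachypnea_dor:.1f}")
```

A DOR of 1 means the finding does not discriminate at all; larger values mean that the finding separates children with and without pneumonia more strongly.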


Table 41-4 World Health Organization Age-based Criteria for Tachypnea

Age, mo     Tachypnea, Breaths/min
<2          >60
2-12        >50
>12         >40
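The age-based thresholds in Table 41-4 can be expressed as a small lookup. The Python sketch below is illustrative only: the function names are hypothetical, and the age bands (younger than 2 months, 2 to 12 months, older than 12 months) are an assumption based on the standard WHO bands, because the age column of the table did not survive extraction intact.

```python
def who_tachypnea_threshold(age_months: float) -> int:
    """Return the respiratory rate (breaths/min) above which a child is tachypneic.

    Age bands are assumed from the standard WHO criteria:
    <2 mo -> 60, 2-12 mo -> 50, >12 mo -> 40.
    """
    if age_months < 0:
        raise ValueError("age must be non-negative")
    if age_months < 2:
        return 60
    if age_months <= 12:
        return 50
    return 40


def is_tachypneic(age_months: float, respiratory_rate: float) -> bool:
    """True when the respiratory rate exceeds the age-based threshold."""
    return respiratory_rate > who_tachypnea_threshold(age_months)


# The 15-month-old in the clinical scenario breathes at 45/min.
print(is_tachypneic(age_months=15, respiratory_rate=45))
```

Under these assumed bands the child in the clinical scenario is classified as tachypneic, which matches the resolution of the scenario.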

A case-control study using retrospectively collected data suggested that pulse oximetry at a threshold of 98% has no value for diagnosing pneumonia.3 Although clinical examination data reported from case-control studies typically provide a low level of evidence, the findings here supported the usefulness of tachypnea. Pulse oximetry added nothing significant to a model containing the respiratory rate and auscultatory findings. Unfortunately, the model itself was not particularly powerful for predicting pneumonia. (R² = 0.072 is a measure of how well the model predicts the outcome. The value means that the model explains only 7.2% of the variance, a statistically significant result, although one that will lead to incorrect diagnoses for many patients.)

The patient selection criteria affect the interpretation of the results. A study that included wheezing children (younger than 18 months) first determined the factors associated with the clinician ordering a chest radiograph.4 The presence of any typical clinical sign for pneumonia was associated with a request for chest radiograph. When confined to wheezing young children, the presence of grunting worked better than tachypnea with a respiratory rate of greater than 60/min (a rapid rate in comparison with the WHO standards noted above). The presence of grunting had an LR of 2.7 for pneumonia (95% CI, 1.6-4.4). However, when combined with a low oxygen saturation of less than 93% (much lower than the threshold in the case-control study), the combination of grunting and a low oxygen saturation in a wheezing young child had an LR of 4.0 (95% CI, 1.3-12). Unfortunately, the absence of both these findings had little effect on ruling out pneumonia (LR, 0.90 when both signs were normal; 95% CI, 0.81-1.0).

A prospective study provides some insight into the probability of pneumonia once the physician requests chest radiography (prevalence, 36%).5 The prevalence among all children with respiratory symptoms would be lower. Tachypnea, using the WHO criteria for respiratory rate, was similar in utility to that of previous studies (positive likelihood ratio [LR+], 2.8; 95% CI, 1.6-5.0; negative likelihood ratio [LR–], 0.91; 95% CI, 0.86-0.97). Another prospective cohort study6 found that tachypnea had an LR+ of 1.4 (95% CI, 1.0-1.9) and an LR– of 0.67 (95% CI, 0.44-1.0). However, the age-adjusted definitions for tachypnea required a much higher respiratory rate than the WHO criteria. These poor results for tachypnea allow the inference that the WHO criteria are necessary to optimize the clinical utility of the finding.

IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION New data allowed for comparison with the data from the original review. For the most part, the new data confirmed the findings of the original review. Among infants and young children with respiratory symptoms or signs, a broad range of prevalence should be considered (15%-35%) that may show seasonal and geographic variation. The data confirm that although tachypnea may be the most predictive in ruling in or ruling out pneumonia, no clinical examination finding alone is sufficiently powerful to predict the presence or absence of pneumonia.

CHANGES IN THE REFERENCE STANDARD The reference standard for the diagnosis of pediatric pneumonia remains the chest radiograph.

RESULTS OF LITERATURE REVIEW
Selected Univariate Findings for Pediatric Pneumonia
Clinical judgment, a measure that allows the clinician to consider all findings, may not work better than individual findings (Table 41-5). When the clinician suspects pneumonia, the LR is 1.7 to 2.5; when the clinician suspects the child has no pneumonia, the LR is 0.29 to 0.46.2,5

Table 41-5 Likelihood Ratios of Univariate Findings for Pediatric Pneumonia

Source                    Finding                                     LR+ (95% CI)     LR– (95% CI)
Lynch et al5              Tachypnea (WHO criteria)                    2.8 (1.6-5.0)    0.91 (0.86-0.97)
Palafox et al2            Tachypnea (WHO criteria)                    2.2 (1.5-3.2)    0.39 (0.22-0.70)
Mahabee-Gittens et al4    Grunting and pulse oximetry < 93%           4.0 (1.3-12)     0.90 (0.81-1.0)
Mahabee-Gittens et al4    Grunting among children wheezing, < 18 mo   2.7 (1.6-4.4)    0.7 (0.55-0.89)
Lynch et al5              Retractions                                 2.7 (1.1-6.9)    0.97 (0.93-1.0)
Palafox et al2            Chest indrawing (retractions)               1.7 (1.2-2.4)    0.54 (0.32-0.91)
Palafox et al2            Clinical judgment                           1.7 (1.2-2.3)    0.46 (0.25-0.84)
Lynch et al5              Fever                                       1.2 (1.1-1.3)    0.30 (0.18-0.49)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio; WHO, World Health Organization.


Multivariate Findings for Pediatric Pneumonia
In a study by Lynch et al,5 a multivariate model was assessed for diagnosing pediatric pneumonia. The study evaluated combinations of findings and created a pneumonia score that also supported a role for assessing tachypnea:

Pneumonia score = –4.71 + 1.10 × (tachypnea) + 0.74 × (crackle) + 0.42 × (decreased breath sound) + 1.15 × (measured fever)

Probability of pneumonia = exp(score)/(1 + exp(score))

(The presence of a finding is coded as 1, whereas the absence of a finding is coded as 0. The presence of tachypnea is based on age-adjusted rates.) The most useful finding from this model is that the absence of all 4 findings leads to a less than 1% probability of pneumonia. The presence of all 4 findings creates a probability of 21%, which suggests the need for a chest radiograph but does not establish a clinical diagnosis with a high degree of confidence. The area under the receiver operating characteristic curve was only 0.67 (a measure of accuracy), highlighting the finding that even combinations of signs lack a high level of efficiency for diagnosing pneumonia. The model may be best for identifying signs that physicians might consider as part of their clinical judgment. However, clinicians should recognize that their overall clinical judgment and the results from a more structured approach in the form of a logistic model lack accuracy.
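The pneumonia score can be worked through in a few lines of code. The Python sketch below uses the coefficients printed above; the function and parameter names are illustrative assumptions and are not part of the published model.

```python
import math

# Coefficients as printed in the text (Lynch et al); names are hypothetical.
INTERCEPT = -4.71
COEFFICIENTS = {
    "tachypnea": 1.10,
    "crackle": 0.74,
    "decreased_breath_sound": 0.42,
    "measured_fever": 1.15,
}


def pneumonia_score(tachypnea=False, crackle=False,
                    decreased_breath_sound=False, measured_fever=False) -> float:
    """Sum the intercept and the coefficient of every finding that is present."""
    findings = {
        "tachypnea": tachypnea,
        "crackle": crackle,
        "decreased_breath_sound": decreased_breath_sound,
        "measured_fever": measured_fever,
    }
    # Each finding is coded 1 when present and 0 when absent.
    return INTERCEPT + sum(COEFFICIENTS[name] * int(present)
                           for name, present in findings.items())


def pneumonia_probability(score: float) -> float:
    """Convert the score to a probability: exp(score) / (1 + exp(score))."""
    return math.exp(score) / (1 + math.exp(score))


no_findings = pneumonia_probability(pneumonia_score())
all_findings = pneumonia_probability(pneumonia_score(True, True, True, True))
print(f"No findings:  {no_findings:.1%}")
print(f"All findings: {all_findings:.1%}")
```

With no findings the probability is just under 1%, and with all 4 findings it is about 21%, in line with the figures quoted in the text.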

EVIDENCE FROM GUIDELINES Jadavji et al7 published guidelines in 1997 for the diagnosis and management of pediatric pneumonia. They conducted a systematic review on the etiology, diagnosis, and management of pediatric pneumonia. The evidence from this review includes the studies that were reviewed in the original Rational Clinical Examination article, with 2 exceptions. One study focused on infants younger than 4 months and is therefore not as easily generalized to the overall pediatric population.8 Overall, the data from this guideline are consistent with the findings of the original Rational Clinical Examination article.


CLINICAL SCENARIO—RESOLUTION This infant may have pneumonia. According to WHO criteria, she has tachypnea, although she is febrile, which could explain her mildly increased respiratory rate. Tachypnea should raise your suspicion for pneumonia, with its best LR+ of about 2.8. Although she has mild retractions that would seem to further increase the likelihood of pneumonia, the additional information provided by this sign is less accurate than the information from tachypnea alone. The clinical history and time of year would make you less suspicious of other entities such as asthma or infection from respiratory syncytial virus (RSV). From the original Rational Clinical Examination article and this Update, you estimate a prevalence at the lower end of the 15% to 35% range. Starting from a pretest probability of 15%, the posttest probability of pneumonia with an LR of 2.8 for tachypnea is 33%. From the multivariate model, she has tachypnea and fever, making the probability of pneumonia 7.9%. In this infant, it would be reasonable to check a chest radiograph to confirm or exclude pneumonia.
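The posttest probability in this resolution follows from converting the pretest probability to odds, multiplying by the LR, and converting back. The Python sketch below shows the calculation; the function name is an illustrative assumption, and the 15% and 35% inputs are the ends of the prevalence range given in the text.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability via odds."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be between 0 and 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Tachypnea (LR+ 2.8) applied at both ends of the 15%-35% prevalence range.
for prevalence in (0.15, 0.35):
    result = posttest_probability(prevalence, 2.8)
    print(f"Pretest {prevalence:.0%} -> posttest {result:.0%}")
```

At the lower end of the range the result is about 33%, the figure used in the resolution; at the upper end the same finding would give roughly 60%.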

REFERENCES FOR THE UPDATE
1. Rothrock S, Green S, Fanelli JM, Cruzen E, Costanzo K, Pagane J. Do published guidelines predict pneumonia in children presenting to an urban ED? Pediatr Emerg Care. 2001;17(4):240-242.
2. Palafox M, Guiscafre H, Reyes H, Munoz O, Martinez H. Diagnostic value of tachypnea in pneumonia defined radiographically. Arch Dis Child. 2000;82(1):41-45.a
3. Tanen DA, Trocinski DR. The use of pulse oximetry to exclude pneumonia in children. Am J Emerg Med. 2002;20(6):521-523.
4. Mahabee-Gittens EM, Dowd MD, Beck JA, Smith SZ. Clinical factors associated with focal infiltrates in wheezing infants and toddlers. Clin Pediatr. 2000;39(7):387-393.a
5. Lynch T, Platt R, Gouin M, Larson C, Patenaude Y. Can we predict which children with clinically suspected pneumonia will have the presence of focal infiltrates on chest radiographs? Pediatrics. 2004;113(3 pt 1):186-189.a
6. Grossman L, Caplan S. Clinical, laboratory, and radiological information in the diagnosis of pneumonia in children. Ann Emerg Med. 1988;17(1):87-90.a
7. Jadavji T, Law B, Lebel M, Kennedy W, Gold R, Wang E. A practical guide for the diagnosis and treatment of pediatric pneumonia. CMAJ. 1997;156(5 suppl):S703-S711.
8. Berman S, Simoes EAF, Lanata C. Respiratory rate and pneumonia in infancy. Arch Dis Child. 1991;66(1):81-84.a

aFor the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.


PNEUMONIA, INFANT AND CHILD—MAKE THE DIAGNOSIS

PRIOR PROBABILITY Given cough or respiratory symptoms, the prevalence of pneumonia is approximately 15% to 35%. However, prevalence of pneumonia may be lower during RSV season. Prevalence may also be slightly higher in children younger than 3 years.

POPULATION FOR WHOM PEDIATRIC PNEUMONIA SHOULD BE CONSIDERED Patients with symptoms of acute respiratory illness, primarily cough, respiratory distress, or tachypnea, need to have pneumonia considered as part of the differential diagnosis.

DETECTING THE LIKELIHOOD OF PEDIATRIC PNEUMONIA The individual clinical symptoms used to identify patients with pneumonia have relatively poor predictive value. Tachypnea, respiratory distress, and abnormal lung sounds (rales) have the best operating characteristics, although the data from different sources conflict on their significance (Table 41-6). Additionally, the clinician’s overall clinical judgment/impression may have operating characteristics similar to individual signs and symptoms in diagnosing pneumonia, but the overall judgment is admittedly a complex and difficult “finding” to quantify. To date, there are no randomized controlled studies to validate any proposed multivariate model for predicting pneumonia.


Table 41-6 Likelihood Ratios of Symptoms and Signs for Pediatric Pneumonia

Symptom or Sign                                   LR+ (95% CI) or Range    LR– (95% CI) or Range
Grunting among children with wheezing, < 18 mo    2.8 (1.6-4.4)            0.7 (0.55-0.89)
Grunting                                          2.8-3.2                  0.70-0.86
Retractions                                       2.7 (1.1-6.9)            0.97 (0.93-1.0)
Rales                                             1.8-15                   0.69-0.86
Tachypnea (use WHO age-adjusted criteria)         1.6-8.0                  0.32-0.91
Fever                                             1.2-1.5                  0.17-0.30

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio; WHO, World Health Organization.

REFERENCE STANDARD TESTS The reference standard for pediatric pneumonia remains the chest radiograph. Sputum production is not a frequent finding in pediatric patients, and therefore, isolation of sputum for microbiologic correlation with pneumonia remains both difficult and impractical. The development of rapid antigen detection of common viruses such as RSV and influenza will help the clinician rule out causes of respiratory symptoms other than bacterial pneumonia. As of now, there is still no way to differentiate bacterial vs viral pneumonia by chest radiograph.

EVIDENCE TO SUPPORT THE UPDATE: Pneumonia, Infant and Child

TITLE Clinical, Laboratory, and Radiological Information in the Diagnosis of Pneumonia in Children.
AUTHORS Grossman L, Caplan S.
CITATION Ann Emerg Med. 1988;17(1):43-46.
QUESTION In pediatric patients with suspected pneumonia who undergo chest radiograph, are there signs or symptoms that predict radiographic pneumonia?
DESIGN This is a prospective nonconsecutive cohort study of 155 patients during 7 months.
SETTING Two pediatric emergency departments.
PATIENTS Pediatric patients younger than 19 years in whom pneumonia was considered and a chest radiograph was ordered. None of the patients had a history of pneumonia, chronic lung disease, chronic heart disease, or immunodeficiency. Sixty-two percent of the study patients were younger than 2 years. Eleven potential subjects were not enrolled because a decision to treat was made without a chest radiograph being performed.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The presenting signs and symptoms of patients before chest radiography were systematically recorded. Before the chest radiograph was performed, clinicians (pediatricians, pediatric nurse practitioners, and medical students) recorded their overall clinical impression of pneumonia and what their treatment plan would be if radiography were not available. Chest radiograph was the reference standard for the diagnosis of pneumonia.

MAIN OUTCOME MEASURES Sensitivity and specificity of recorded signs, symptoms, and clinical impression for predicting radiographic pneumonia. Assessing accuracy of clinical impression in predicting pneumonia. Assessing combinations of signs and symptoms in predicting pneumonia.

MAIN RESULTS Cough, tachypnea, moderate/severe degree of illness, and fever were the only symptoms and signs that were present in more than 50% of patients enrolled in the study (66%, 52%, 62%, and 55%, respectively). Clinician accuracy in the diagnosis of pneumonia was 77%, and both the positive and negative likelihood ratios (LRs) were more promising than the individual findings (Table 41-7). Despite the results for clinical judgment, regression analysis did not find any combination of signs or symptoms that adequately predicted the presence of pneumonia.

Table 41-7 Likelihood Ratios of Findings for Pediatric Pneumonia

Findings                   Sensitivity, %   Specificity, %   LR+ (95% CI)       LR– (95% CI)
Clinical judgment          80               68               2.5 (1.8-3.4)      0.29 (0.17-0.52)
Rales                      43               77               1.9 (1.2-3.0)      0.74 (0.57-0.96)
Tachypneaa                 64               54               1.4 (1.0-1.9)      0.67 (0.44-1.0)
Decreased breath sounds    23               84               1.4 (0.74-2.8)     0.92 (0.77-1.1)
Degree of illnessb         67               40               1.1 (0.87-1.4)     0.83 (0.52-1.3)
Sudden onset of illnessc   17               84               1.1 (0.50-2.2)     0.99 (0.85-1.1)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
a>80/min < 1 year; >40/min > 1 year; >30/min > 2 years; >25/min > 5 years; >22/min > 10 years; >20/min > 15 years.1 (This differs from the World Health Organization definition for tachypnea.)
bDegree of illness not further clarified in article.
cLess than 12 hours of symptoms before presentation.

CONCLUSIONS
LEVEL OF EVIDENCE Level 4.
STRENGTHS This was a prospective cohort study. All patients enrolled underwent the reference standard of chest radiograph. There was quantification of the different signs and symptoms that led clinicians to order chest radiographs.
LIMITATIONS The definition of tachypnea used in this study is different from the World Health Organization (WHO) criteria. If WHO criteria had been used, there might have been a much higher number of patients who would have been classified as having tachypnea, which might have increased the positive likelihood ratio (LR+) of tachypnea. There was no explanation of what “degree of illness” means, and therefore, it has limited clinical utility. There was no mention of blinding of radiologists to the clinical presentation of study patients. There was no description of what qualified a radiograph as being diagnostic of pneumonia. The results provided did not include 95% confidence intervals. Additionally, it is unclear how many “observations” were made for each patient because there were multiple examiners for each patient.
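The likelihood ratios reported for this study follow directly from the sensitivity and specificity of each finding in Table 41-7. The Python sketch below shows the standard conversion; the function name is an illustrative assumption, and the inputs are the clinical judgment and rales values from the table, expressed as fractions.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as fractions.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must be between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Clinical judgment: sensitivity 80%, specificity 68%.
lr_pos, lr_neg = likelihood_ratios(0.80, 0.68)
print(f"Clinical judgment: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")

# Rales: sensitivity 43%, specificity 77%.
lr_pos, lr_neg = likelihood_ratios(0.43, 0.77)
print(f"Rales: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```

The recomputed values agree with the LR columns of the table to the precision reported there.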

CONCLUSIONS The focus of this study was to determine whether there were any signs or symptoms that were helpful in diagnosing pneumonia in children younger than 18 years and presenting with symptoms sufficient to warrant a chest radiograph. Additionally, it sought to assess how the results of the radiograph influenced management decisions by the ordering clinician. Finally, it attempted to assess the clinician’s overall impression as a predictor of pneumonia. It is useful to know that cough, tachypnea, “moderate/severe degree of illness,” and fever are the most common symptoms and signs for which a radiograph is ordered. This may give a hint as to what goes into the clinician’s overall clinical impression when he or she considers the diagnosis of pneumonia. In this study, the overall clinical impression performed better (LR+, 2.5) than any individual sign or symptom in diagnosing pneumonia. Physician accuracy of diagnosing pneumonia was 77%. Obtaining radiographs is useful because they changed management plans for 22% of study patients. In this study, only rales and tachypnea reached statistical significance in predicting pneumonia. However, both of these signs had only marginal diagnostic power, with LR+s of only 1.9 and 1.4, respectively.

Reviewed by Daniel Ostrovsky, MD

REFERENCE FOR THE EVIDENCE
1. Johnson TR. Development of the lungs. In: Johnson TR, Moore WM, Jeffries JE, eds. Children Are Different. 2nd ed. Columbus, OH: Ross Laboratories; 1978:128-129.

TITLE Can We Predict Which Children With Clinically Suspected Pneumonia Will Have the Presence of Focal Infiltrates on Chest Radiographs?
AUTHORS Lynch T, Platt R, Gouin M, Larson C, Patenaude Y.
CITATION Pediatrics. 2004;113(3 pt 1):186-189.
QUESTION In patients presenting to the emergency department with “clinically suspected pneumonia,” what clinical factors predict an infiltrate on radiograph?
DESIGN Prospective nonconsecutive cohort study of 570 patients.
SETTING Tertiary-referral-center pediatric emergency department.
PATIENTS Children aged 1 to 16 years who were suspected by the pediatric emergency physician to have pneumonia and who were receiving a chest radiograph. Children with chronic respiratory disease, congenital or complex heart disease, gastroesophageal reflux, sickle cell anemia, malignancy, spastic quadriplegia, acute asthma exacerbation, or recent pneumonia treatment with antibiotics were excluded.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Baseline demographic data and a study questionnaire were obtained prospectively. Physicians filled out a clinical questionnaire about symptoms at evaluation. All subjects had a posterior-anterior and lateral chest radiograph evaluated by 3 radiologists for the presence or absence of infiltrates.

MAIN OUTCOME MEASURES Six symptoms (fever history, cough, coryza, shortness of breath, wheezing, and pleurisy), 6 signs (decreased breath sounds, crackles, bronchial sounds, wheezing, retractions, and grunting), and 3 vital signs (measured temperature for fever, age-adjusted tachypnea, and tachycardia) were entered into a logistic model to determine their independent significance.1 Previously recommended guidelines were assessed for their sensitivity and specificity. The interobserver agreement for the chest radiographs was assessed.

MAIN RESULTS Five hundred seventy patients were enrolled, of whom 204 (36%) had pneumonia. The agreement between 7 radiologists for the presence of pneumonia was moderate (weighted κ, 0.57).

Only 2 findings, when present, had a likelihood ratio (LR) that was greater than 2 and that excluded 1 in its 95% confidence interval (CI): the presence of tachypnea (8% of children) had an LR of 2.8 (95% CI, 1.6-5.0), whereas retractions (3% of children) had an LR of 2.7 (95% CI, 1.1-6.9). For decreasing the likelihood of pneumonia, the absence of a fever (LR, 0.30; 95% CI, 0.18-0.49) or cough (LR, 0.35; 95% CI, 0.54-0.81) was the only finding with an LR less than 0.6 and that excluded 1.0 from the 95% CI. A logistic model (Box 41-1) identified 4 findings that were independently useful. However, the model was not highly accurate because of its poor specificity (area under the receiver operating characteristic curve of 0.67). The investigators attempted to validate the guidelines by Leventhal2 (respiratory distress, tachypnea, rales, and decreased breath sounds) but found these yielded a sensitivity of only 81% and a specificity of 37%.

Box 41-1 Logistic Model for Calculating the Pneumonia Score

Pneumonia score = –4.71 + 1.10 × (tachypnea) + 0.74 × (crackle) + 0.42 × (decreased breath sound) + 1.15 × (measured fever)

Probability of pneumonia = exp(score)/(1 + exp(score))

The presence of a finding is coded as 1, whereas the absence of a finding is coded as 0. The presence of tachypnea is based on age-adjusted rates.

CONCLUSIONS
LEVEL OF EVIDENCE Level 3.
STRENGTHS Multiple radiologists who were blinded to the clinical presentation evaluated the reference standard. There was a large sample size. Multiple clinical predictors were assessed with regression analysis.
LIMITATIONS The enrollment process may have caused selection bias, but only fever history and cough occurred in more than half the patients, which suggests that the remaining findings might have valid results because they were not used to preferentially identify children.

When the likelihood of focal opacities is predicted in children clinically suspected of having pneumonia, only 4 signs and symptoms are independently statistically significant. The model is highly sensitive but poorly specific. To highlight this, the probability of pneumonia can be contrasted for the child with no tachypnea, crackles, decreased breath sounds, or fever (probability, 0.9%) vs a child with all 4 findings present (probability, 21%). Thus, a child with no findings has a less than 1% chance of having pneumonia. On the other hand, even with the presence of all 4 findings, most children will not have a radiographic infiltrate.

Reviewed by Daniel Ostrovsky, MD

REFERENCES FOR THE EVIDENCE 1. Chamberlain JM, Patel KM, Ruttimann UE, Pollack MM. Pediatric risk of admission (PRISA): a measure of severity of illness for assessing the risk of hospitalization from the emergency department. Ann Emerg Med. 1998;32(2):161-169. 2. Leventhal JM. Clinical predictors of pneumonia as a guide to ordering chest roentgenograms. Clin Pediatr. 1982;21(12):730-734.

TITLE Clinical Factors Associated With Focal Infiltrates in Wheezing Infants and Toddlers.
AUTHORS Mahabee-Gittens EM, Dowd MD, Beck JA, Smith SZ.
CITATION Clin Pediatr. 2000;39(7):387-393.
QUESTION In wheezing infants presenting to an emergency department, are there clinical factors that can predict focal infiltrates on chest radiograph?
DESIGN Prospective cohort of infants up to 18 months of age.
SETTING The study took place during October and April at the Children’s Hospital Medical Center in Cincinnati, Ohio, a tertiary-care hospital pediatric emergency department.
PATIENTS Infants aged 18 months or younger and presenting to the emergency department. Inclusion was a convenience sample of patients with documented wheezing on physical examination by a physician.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Patients selected for inclusion had baseline information collected prospectively. Physical examination findings were documented at evaluation. The reference standard was a chest radiograph for the presence of focal infiltrates. The reference standard was applied at the discretion of the evaluating physician. A radiologist who was not masked to the clinical presentation interpreted the radiographs. A report result was considered positive if it recorded “focal infiltrate,” “pneumonia,” “consolidation,” or “atelectasis vs infiltrate.”


MAIN OUTCOME MEASURES The authors collected data on all potential eligible patients and compared the odds ratios for physical examination signs for individuals selected for chest radiographs vs those who did not undergo radiography. These odds ratios describe the factors associated with requesting a radiograph. Sensitivity, specificity, and odds ratios of the clinical findings for diagnosing pneumonia were calculated for children who underwent radiography. Interobserver variability was assessed in 12% of the children.

were hospitalized within 2 days of presentation. All 3 patients had a chest radiograph on representation and none of them had an infiltrate. Seventeen other patients who did not initially undergo chest radiography had a subsequent chest radiograph in the following 48 hours. Three of these patients had an infiltrate. Oxygen saturation, nasal flaring, grunting, crackles, and retractions were all reliable and had κ > 0.70.

CONCLUSIONS LEVEL OF EVIDENCE Level 3.

MAIN RESULTS

STRENGTHS The study design was prospective. A uniform

Among 471 children who made a visit to the emergency department with wheezing and were potentially eligible, 212 had chest radiographs. Twenty-three percent (49/212) had a focal infiltrate. Except for localized wheezing, each sign in Table 41-8 was more likely present in a child receiving a chest radiograph than one who did not (odds ratio with lower 95% confidence interval ≥ 1.0). In patients who did not undergo chest radiograph, followup telephone calls and searches of admission databases were made 48 hours after presentation to look for patients who may have incorrectly been classified as not having pneumonia. Only 3 patients who did not undergo chest radiography

Table 41-8 Likelihood Ratios of Findings for Pediatric Pneumonia Sign (No. With the Finding)

LR+ (95% CI)

LR– (95% CI)

DOR (95% CI)

Vital Signs Temperature ≥ 100.4°F (38.0°C) (115) Respiratory rate > 60/ min (61) Oxygen saturation ≤ 93% (41) Nasal flaring (82) Grunting (45) Crackles (67) Decreased breath sounds (19) Localized wheezing (21) Retractions (202) I:E ≥ 1:2 (166)

1.12 (0.95-1.6) 0.76 (0.51-1.1) 1.6 (0.8-3.1) 1.1 (0.67-1.8) 0.97 (0.78-1.2) 1.1 (0.6-2.2) 2.0 (1.1-3.4)

0.82 (0.67-1.1) 2.4 (1.1-5.0)

Physical Examination 1.3 (0.90-1.9) 0.70 (0.55-0.89) 2.7 (1.6-4.4) 0.90 (0.79-1.0) 1.6 (1.1-2.4) 0.76 (0.58-1.0) 2.4 (1.0-5.7) 0.90 (0.79-1.0)

1.0 (0.4-2.7) 1.0 (0.90-1.1) 1.0 (0.94-1.1) 0.83 (0.18-3.8) 1.2 (1.0-1.3) 0.43 (0.18-1.0) Combination of Findings Grunting and oxygen satu- 4.0 (1.3-12) 0.90 (0.81-1.0) ration ≤ 93% (48)a

1.6 (0.8-3.0) 3.8 (1.9-7.8) 2.1 (1.1-4.1) 2.7 (1.0-7.2) 1.0 (0.4-3.0) 1.2 (0.2-5.9) 2.8 (1.0-7.5)

LIMITATIONS The enrollment process created a convenience sample that leads to selection bias. Radiologists who were not masked to the clinical presentation interpreted the outcome measure. There was only 1 radiologist per case, which could lead to accuracy issues in interpretation. This study was done in a population during a period when respiratory syncytial virus bronchiolitis typically has a high prevalence. Children younger than 18 months who wheezed were the focus of this study. A number of clinical signs were more likely present in children who were referred for chest radiography. However, most of these signs were not particularly useful when either present or absent. As a single finding, the presence of grunting (present in 60 children overall and in 45 referred for radiography [21%]) was the most useful finding, with a likelihood ratio of 2.7. The absence of any of these findings was clinically not useful. When combined with low oxygen saturation, a logistic model selected grunting with low oxygen saturation as useful. The likelihood ratio increased to 4.0 for the presence of these 2 signs. Clinicians should recognize that the prior probability of pneumonia has seasonal variation in the pediatric population. Bronchiolitis, an illness that may “look like pneumonia,” is more common in the winter and is associated with tachypnea and abnormal lung findings. Thus, the complex relationship between changing prevalence of disease and seasonal variation in signs affects the interpretation of the predictive power of these findings.

Reviewed by Daniel A. Ostrovsky, MD 4.4 (1.3-15)

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; I:E, length of time in inspiration in proportion to time in expiration; LR+, positive likelihood ratio; LR–, negative likelihood ratio. aVariables selected as independently useful in a logistic model.


entrance criterion (wheezing) identified potentially eligible patients with a narrower age range than was present in some of the other studies. The authors compared the signs present in children who underwent radiographs vs those who did not.

CHAPTER 41

TITLE Diagnostic Value of Tachypnoea in Pneumonia Defined Radiologically.

AUTHORS Palafox M, Guiscafre H, Reyes H, Munoz O, Martinez H.

CITATION Arch Dis Child. 2000;82(1):41-45.

QUESTION In children presenting with acute respiratory infection, what are the sensitivity and specificity of tachypnea for diagnosing pneumonia?

DESIGN This study is a prospective cohort study of children presenting to an emergency department with an acute respiratory tract infection. All children had chest radiography. Baseline characteristics and physical examination findings were obtained prospectively.

SETTING A general hospital in Mexico that is a referral center for sick children.

PATIENTS Eligible children were between 3 days and 5 years of age, required medical care during the 6-month study period, had been clinically diagnosed with pneumonia, and had the disease for fewer than 2 weeks. Each child in the study had a matched control who was the next child treated in the clinical unit and had a diagnosis of an acute respiratory infection without pneumonia. Exclusion criteria were children with chronic diseases, genetic abnormalities, neurologic diseases, asthma, or sepsis.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The respiratory rate was measured for 1 minute, with the child lying down, not crying, and without fever. Tachypnea was defined by age-based World Health Organization (WHO) criteria (Table 41-9). A chest radiograph, evaluated by a single radiologist blinded to the clinical diagnosis, served as the reference standard.

MAIN OUTCOME MEASURES Sensitivity and specificity of tachypnea, chest indrawing, alveolar rales, and combinations of these findings. Ten radiographs were reassessed to determine the intraobserver variation.

MAIN RESULTS Thirty-five children (32%) had pneumonia. There were 7 signs and symptoms or combinations that had significant sensitivity and specificity for predicting pneumonia on chest radiograph. Tachypnea had the best sensitivity of the signs studied (74%), followed by chest indrawing (71%). Although combining signs did slightly improve specificity, it decreased sensitivity. Alveolar rales had the best specificity but had poor sensitivity (Table 41-10). A discriminant analysis, using all the recorded symptoms and signs, was 71% accurate but not appreciably different from the accuracy of tachypnea alone (69% accurate). The discriminant analysis performed better than clinical judgment (62% accurate). If a patient had disease at least 6 days, the likelihood ratio was 3.4. A discriminant analysis revealed that duration of disease correctly classified 83.3% of patients. The κ statistic for intraobserver variability of the radiologist was 0.68.

Table 41-9 World Health Organization Age-based Criteria for Tachypnea

Age, mo    Tachypnea, Breaths/min
<2         >60
2-12       >50
>12        >40

CONCLUSIONS

LEVEL OF EVIDENCE Level 2.

STRENGTHS Whereas other studies included children according to whether they had a chest radiograph, this study included a broader population of patients for whom the diagnosis of pneumonia was a reasonable consideration. The “control” patients were patients with some type of respiratory illness (cough or rhinorrhea with systemic signs of infection). These patients were not really “controls,” but rather patients at risk for pneumonia and in whom pneumonia could have been part of the differential diagnosis (although at a lower likelihood than for the case patients). All included study patients underwent the reference standard. The radiologists were masked to the clinical presentation. Intraobserver variability of the radiologist reading the radiographs was tested.

LIMITATIONS Among all children with respiratory illnesses, the “case patients” were oversampled, which can lead to an overestimation of sensitivity (and underestimation of specificity). A single radiologist performed the interpretation of the radiographs, although there was an attempt to account for this by measuring intraobserver variability and masking the radiologist to the clinical presentation.

Of the presenting clinical signs, all except chest indrawing (51%) occurred in less than half the patients, which allows us to make inferences about the utility of the findings because no one finding was required in each patient. It is remarkable that the overall clinical judgment had a diagnostic odds ratio (a measure of accuracy) that was not quite as good as the single finding of tachypnea. Tachypnea, defined by WHO criteria, was the most accurate finding as evidenced by its diagnostic odds ratio. In subgroup analysis using tachypnea as the clinical sign being evaluated, there was no significant difference in the sensitivity and specificity generated for children of differing age groups. There was a significant difference in the sensitivity and specificity of tachypnea when disease duration was considered. Sensitivity increased from 55% when disease had lasted fewer than 3 days to 93% when it had lasted more than 6 days. Specificity increased from 64% to 73% as well.

Reviewed by Daniel A. Ostrovsky, MD

Table 41-10 Likelihood Ratios of Findings for Pediatric Pneumonia

Test (No. With Finding)                               Sensitivity  Specificity  LR+ (95% CI)   LR– (95% CI)      DOR (95% CI)
Tachypnea, chest indrawing, and alveolar rales (27)   0.43         0.84         2.7 (1.4-5.1)  0.68 (0.50-0.92)  4.0 (1.6-9.8)
Tachypnea and alveolar rales (29)                     0.46         0.83         2.6 (1.4-4.9)  0.70 (0.48-0.91)  4.1 (1.7-10)
Tachypnea (51)                                        0.74         0.67         2.2 (1.5-3.2)  0.39 (0.22-0.70)  5.8 (2.4-14)
Tachypnea and chest indrawing (47)                    0.68         0.69         2.1 (1.4-3.2)  0.50 (0.31-0.80)  4.7 (2.0-11)
Alveolar rales (32)                                   0.46         0.79         2.1 (1.2-3.8)  0.69 (0.50-0.96)  3.2 (1.3-7.6)
Chest indrawing and alveolar rales (30)               0.42         0.80         2.1 (1.2-3.9)  0.71 (0.53-0.97)  2.9 (1.2-7.0)
Clinical judgment (59)                                0.74         0.56         1.7 (1.2-2.3)  0.46 (0.25-0.84)  3.6 (1.5-8.7)
Chest indrawing (56)                                  0.71         0.59         1.7 (1.2-2.4)  0.54 (0.32-0.91)  3.5 (1.5-8.3)

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
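The diagnostic odds ratios in Table 41-10 can be checked from the reported sensitivity and specificity. The following is a minimal Python sketch, assuming the usual definition of the diagnostic odds ratio as LR+ divided by LR–. The function name is illustrative and not from the study. The inputs are the rounded table values, so the results only approximate the published figures.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Return the diagnostic odds ratio, defined here as LR+ divided by LR-."""
    lr_pos = sensitivity / (1.0 - specificity)   # positive likelihood ratio
    lr_neg = (1.0 - sensitivity) / specificity   # negative likelihood ratio
    return lr_pos / lr_neg


# Rounded sensitivity and specificity as listed in Table 41-10.
dor_tachypnea = diagnostic_odds_ratio(0.74, 0.67)
dor_indrawing = diagnostic_odds_ratio(0.71, 0.59)

print(f"Tachypnea DOR: {dor_tachypnea:.1f}")
print(f"Chest indrawing DOR: {dor_indrawing:.1f}")
```

With these inputs, the values come out near 5.8 for tachypnea and 3.5 for chest indrawing, in line with the table.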

CHAPTER 42

Is This Patient Pregnant?

Lori A. Bastian, MD, MPH
Joanne T. Piscitelli, MD

CLINICAL SCENARIOS

Are These Patients Pregnant?

For each of the following cases, the clinician may need to determine the probability that the patient is pregnant.

CASE 1 A 36-year-old woman telephones her primary care physician, complaining of symptoms consistent with uncomplicated sinusitis. Before treating her with an antibiotic, you ask her about the possibility of pregnancy; she states her last menstrual period was 3 weeks ago and she is not pregnant.

CASE 2 A sexually active 16-year-old girl requests birth control pills and asks during the pelvic examination, when her mother has stepped out of the room, if you can tell whether she is pregnant. Her last menstrual period was 8 weeks ago, her home pregnancy test result was negative, and findings on her pelvic examination were normal.

CASE 3 A 41-year-old woman presents with breast tenderness, and her last menstrual period was 6 weeks ago. She wants to know whether she is “going through the change.”

WHY IS THIS AN IMPORTANT QUESTION TO ANSWER WITH A CLINICAL EXAMINATION?

Frequent laboratory analyses are performed in the outpatient clinic and emergency department to rule in or to rule out the possibility of pregnancy. Generally accepted clinical indicators of pregnancy include amenorrhea, morning sickness, tender or tingling breasts, and, after 8 weeks’ gestational age (defined as weeks since the last menstrual period), an enlarged uterus with a soft cervix. Standard textbooks of obstetrics do not indicate the value (ie, sensitivity and specificity) of these symptoms and signs as predictors of the diagnosis of early pregnancy.

In the outpatient clinical setting, there are many reasons to determine whether the patient is pregnant, including avoiding nonurgent radiographs; avoiding teratogenic drugs, such as anticonvulsants; initiating early prenatal care; reassuring the patient; and explaining the multiple nonspecific complaints easily confused with the early symptoms of pregnancy.

We are reviewing a common problem facing the primary care physician: When treating or evaluating a woman of childbearing years, what is the value of historical or physical examination features in determining the probability of early pregnancy? We will focus on the patient’s medical history and physical examination findings that help the clinician rule in or rule out early pregnancy. We intend to answer the following questions: (1) What is the value of history and symptoms in determining the probability of early pregnancy? (2) How accurate are home pregnancy tests (often part of the patient’s medical history) for determining early pregnancy? (3) What is the value of physical examination findings in determining the probability of early pregnancy?

Copyright © 2009 by the American Medical Association.

ANATOMIC AND PHYSIOLOGIC ORIGINS OF THE SIGNS AND SYMPTOMS OF PREGNANCY DURING THE FIRST TRIMESTER

Pregnancy is suspected whenever a woman of childbearing years who has had regular menstrual cycles notices abrupt cessation of her menses. However, cessation of menses is a difficult symptom to evaluate in patients with previously irregular bleeding patterns. Occasionally, women have unexplained cyclic bleeding during pregnancy, especially in the first few months, and thus lack the symptom of amenorrhea. About 8% of pregnant women have a small amount of bleeding on or before the 40th day, which is thought to be related to implantation.1

The term morning sickness refers to the tendency of many women (approximately 50%) to develop nausea, often with vomiting, between 6 and 12 weeks’ gestational age.1 Usually the nausea is worse when the pregnant woman awakens in the morning, whereas it tends to diminish as the day progresses.

Shortly after missing her first period, the pregnant woman may notice a heavy sensation in her breasts, accompanied by tingling and soreness. These symptoms relate to hormone stimulation of the ducts and alveoli of the breast parenchyma, but may occur in identical form just before a menstrual period. As early as 6 weeks’ gestational age, there may be noticeable enlargement of the breasts, with engorgement of the superficial veins in the breasts.2 During the first trimester, the nipples darken and become more sensitive. The areolar areas darken and become puffy. These symptoms and signs are thought to be of more value in primigravida because in multigravida women, areolar and nipple changes often remain from previous pregnancies.3

A few weeks after implantation (6 weeks’ gestational age), distinct enlargement of the uterus may be felt on bimanual palpation. In early pregnancy, the uterus becomes softened and changes from a pear-shaped configuration to a globular contour.1 The congestive hyperemia of the pelvis in early pregnancy is manifested by a softening of the vagina and cervix, as well as a change in the color of these tissues. A significant increase in uterine artery pulsatile activity may occur as blood flow to the pregnant uterus increases.4 In early pregnancy, the enlarging uterus exerts pressure on the bladder. Some patients note an increase in urinary frequency and nocturia during the first trimester.

HOW TO ELICIT THESE SYMPTOMS AND SIGNS

Medical History

Although patients may give a simple description such as “I may be pregnant,” the examiner should seek a more complete medical history. Histories that indicate an increased likelihood of pregnancy include amenorrhea, morning sickness, breast symptoms (swelling, tingling, or tenderness), sexual activity, nonuse or inconsistent use of contraception, the patient’s suspicion that she is pregnant, and a positive home pregnancy test result. Specific questions to ask include the following: (1) When was your last menstrual period, and was it normal? (2) Do you use any form of contraception? (3) Do you have any symptoms of pregnancy? (4) Is there a chance you are pregnant?

Frequently, the patient may report, “My home pregnancy test was positive, and I want to know whether I am pregnant.” Important questions regarding this type of history would be these: (1) How many days or weeks after your last menstrual period did you perform the test? (2) Did you feel comfortable performing the test? (3) Did the instructions seem complicated to you? (4) What kind of home pregnancy test did you use? (5) Did you repeat the test and get a similar result?

Physical Examination

To diagnose pregnancy, the clinician might examine the patient’s breasts, as well as the vaginal wall, cervix, and uterus, by bimanual examination. The breasts may become engorged and enlarged, with darkening of the areolar area. The venous pattern over the breasts becomes increasingly visible as pregnancy progresses.5

Vaginal examination can be performed to elicit the Chadwick sign associated with early pregnancy. As early as 8 to 12 weeks’ gestational age, the mucous membranes of the vulva, vagina, and cervix become congested and take on a bluish-violet hue (Chadwick sign).1 This hue is especially well defined in the anterior vaginal wall but is also present to some extent throughout the vagina and on the cervix. The Chadwick sign is rarely seen before 7 weeks’ gestational age.6

On bimanual examination, softening of the cervix (Goodell sign) may be detected by 8 weeks’ gestational age.7 The cervix of a nonpregnant woman is fibrous and normally feels like the tip of the nose. By contrast, the progressive edema that develops during pregnancy softens the consistency of the cervix tip to approximate that of the lips (Goodell sign).

Examination of the uterus on bimanual examination can be performed to detect changes in uterine consistency and size. A palpable softening of the lowermost portion of the corpus occurs at about 6 weeks’ gestational age (Hegar sign).7 To elicit this sign, when the uterus is anteverted, the examiner places two fingers in the anterior vaginal fornix (or the posterior fornix in the presence of a retroverted uterus) and then compresses behind the fundus at the lower uterine segment with the other hand, using suprapubic pressure (Figure 42-1). In this way, a distinct area of uterine softening is observed between 2 firmer structures: the fundus above and the cervix below.5 Occasionally, the softening at the isthmus is so marked that the cervix and the body of the uterus seem to be separate organs.3

Another early sign of pregnancy is the uterine artery pulsation that can be palpated on a bimanual examination.4 During a bimanual examination, the second and third digits of the examining hand can be placed in the lateral vaginal fornix, and the presence of uterine artery pulsations can often be palpated with minimal pressure on the parametrium.4

A few weeks after the embryo has become implanted, a distinct enlargement of the uterus may be felt on bimanual examination. The uterus remains confined in the pelvis until 12 weeks’ gestational age, when the fundus becomes palpable above the pubic symphysis (Figure 42-2).

The identification of the fetal heart rate distinct from the maternal heart rate establishes a diagnosis of pregnancy. Transvaginal ultrasonography can detect fetal heart activity as early as 5 weeks’ gestational age, and transabdominal ultrasonography can detect this activity as early as 6 weeks’ gestational age. Instruments that use the Doppler effect can detect fetal cardiac activity at 10 to 12 weeks’ gestational age. The fetal heart can usually be auscultated with a fetoscope by 20 weeks’ gestational age.

Early Pregnancy

Figure 42-1 Examination Eliciting the Hegar Sign
The Hegar sign is a softening of the lower uterine segment that can be appreciated during a bimanual examination. (Structures labeled in the figure: pubic symphysis, uterus, fundus, isthmus, cervix, vagina, and rectum.)

Reference Standard for Diagnosing Early Pregnancy

In this review, the detection of the β subunit of human chorionic gonadotropin (HCG) in urine or serum is the routine reference standard (or gold standard) for diagnosing early pregnancy. The diagnostic reliability of both the serum and urine HCG tests is comparable. The sensitivity and specificity for the diagnosis of pregnancy for both tests are between 97% and 100% when performed in the laboratory.8 In this review, we also report the results of studies conducted before the development of the HCG test. These earlier studies used delivery as the reference standard.

Figure 42-2 Uterine Height at Different Gestational Weeks
The height of the fundus at comparable gestational dates varies greatly among patients. Those shown are the most common. A convenient rule of thumb is that, at 20 weeks of gestation, the fundus is usually at or slightly above the umbilicus. (Fundal levels labeled in the figure, in weeks: 8, 12, 16, 20, 25, 32, 36, and 40.)

METHODS

Search Strategy

We searched the MEDLINE database for English-language articles concerning the diagnosis of pregnancy that were published between 1966 and 1996. The key words used were “pregnancy,” “diagnosis,” and “pregnancy tests.” Additional articles listed in the bibliographies of standard obstetric texts and references cited in articles included in our study were also included among the articles considered. Articles were systematically reviewed by authors and given a grade of A, B, or C according to the study design and level of evidence (see Table 1-7 for a summary of Evidence Grades and Levels).9 Articles were excluded if the results of the symptom or sign being investigated were not compared with the gold standard or the results could not be classified into a contingency table (attempts were made to reach authors of potential articles to obtain additional information needed to create contingency tables).

Through the MEDLINE, textbook reference, and bibliography searches, we initially identified 55 articles, 40 of which were rejected because the test was not compared with the gold standard (urine or serum HCG test) or a pregnancy outcome. The remaining 15 articles were then analyzed by us, and 6 more were excluded because the reported data were not sufficient to permit construction of contingency tables. Therefore, the results of 9 studies form the basis for this review.

We used data from contingency tables to calculate sensitivity and specificity. Likelihood ratios were also calculated to characterize the behavior of the diagnostic tests. The positive likelihood ratio (LR+) is defined as sensitivity/(1 – specificity) and expresses the change in odds favoring a disease, given a positive test result (LR+ values are ≥ 1), whereas the negative likelihood ratio (LR–) is defined as (1 – sensitivity)/specificity and expresses the change in odds favoring disease, given a negative test result (LR– values are 0 to 1).10 Data were sufficiently similar in design to assess for statistical similarity. The data were pooled when the Breslow-Day test for homogeneity was not significant (P > .05).11
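The definitions of sensitivity, specificity, LR+, and LR– given above can be applied directly to a contingency table. The following Python sketch is one way to do so. The class and field names are illustrative assumptions, not part of the review. The example counts are those reported for delayed menses in the Robinson and Barber study (Table 42-1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2 x 2 counts of a finding against the reference standard."""
    present_pregnant: int       # finding present, pregnant
    present_not_pregnant: int   # finding present, not pregnant
    absent_pregnant: int        # finding absent, pregnant
    absent_not_pregnant: int    # finding absent, not pregnant

    @property
    def sensitivity(self) -> float:
        pregnant = self.present_pregnant + self.absent_pregnant
        return self.present_pregnant / pregnant

    @property
    def specificity(self) -> float:
        not_pregnant = self.present_not_pregnant + self.absent_not_pregnant
        return self.absent_not_pregnant / not_pregnant

    @property
    def lr_positive(self) -> float:
        # LR+ = sensitivity / (1 - specificity)
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def lr_negative(self) -> float:
        # LR- = (1 - sensitivity) / specificity
        return (1.0 - self.sensitivity) / self.specificity


# Delayed menses vs menses on time, Robinson and Barber (Table 42-1).
delayed_menses = ContingencyTable(618, 248, 361, 365)

print(f"Sensitivity: {delayed_menses.sensitivity:.2f}")
print(f"Specificity: {delayed_menses.specificity:.2f}")
print(f"LR+: {delayed_menses.lr_positive:.1f}")
print(f"LR-: {delayed_menses.lr_negative:.2f}")
```

These counts give a sensitivity of about 63% and a specificity of about 60%, with likelihood ratios that round to the 1.6 and 0.62 listed in the table.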

Accuracy of History and Symptoms for Pregnancy Diagnosis

Several studies have been performed to evaluate the value of patient history in ruling in or ruling out early pregnancy compared with the gold standard HCG test (Tables 42-1, 42-2, 42-3, and 42-4). Among 208 consecutive patients for whom a qualitative serum HCG determination was ordered, emergency department physicians recorded the date of the patient’s last menstrual period, whether her menstrual period was on time, whether birth control had been used, and whether the patient suspected she was pregnant.12 The main indication for ordering a pregnancy test in this study was abdominal pain (138 patients). Sixty-eight women (33%) were pregnant. Three historical variables were statistically less likely to be associated with pregnancy: a last menstrual period that was on time, the patient thinking that she was not pregnant, and the patient stating that there was no chance that she could be pregnant (P < .001). Combinations of historical criteria were unsuccessful at ruling out pregnancy; there was still a 10% chance of pregnancy’s being overlooked using any combination of these historical variables.

Table 42-1 Does a Delayed Menstrual Period Predict Pregnancy?a

Study                  Evidence Gradeb  Characteristics  Pregnant: Yes  Pregnant: No  LR (95% CI)
Robinson and Barber15  A                Delayed menses   618            248           1.6 (1.4-1.7)
Robinson and Barber15  A                Menses on time   361            365           0.62 (0.56-0.69)
Ramoska et al12        A                Delayed menses   58             58            2.1 (1.6-2.6)
Ramoska et al12        A                Menses on time   10             82            0.25 (0.14-0.45)
Stengel et al13,c      B                Delayed menses   3              43            1.0 (0.38-2.9)
Stengel et al13,c      B                Menses on time   9              136           0.99 (0.70-1.4)
Zabin et al16          A                Delayed menses   703            1078          1.1 (1.0-2.9)
Zabin et al16          A                Menses on time   331            707           0.81 (0.68-0.76)

Abbreviations: CI, confidence interval; LR, likelihood ratio.
a In testing for homogeneity, χ2 = 37 and P = .001. Therefore, data were not pooled.
b See Table 1-7 for a summary of Evidence Grades and Levels.
c Unpublished data from this study provided by David Seaberg, MD, University of Pittsburgh, Pennsylvania, June 1995.

Women may not associate symptoms with early pregnancy. Investigators measured the effectiveness of a standardized patient history questionnaire in detecting unrecognized pregnancies.13 Consecutive fertile women (n = 191) presenting to the emergency department for any reason completed a menstrual and sexual history questionnaire and had a pregnancy test. This study reports a 6.3% prevalence of unrecognized pregnancy, defined as a “pregnancy not definitely known to exist” when the patient presented to the emergency department.13 Among those with abdominal pain or pelvic complaints (70 patients), the prevalence of unrecognized pregnancy was found to be 13%. Historical factors were analyzed for correlation with positive pregnancy test results. Two factors were found to be statistically significant correlates: the patient thought there was a chance she could be pregnant and an abnormal last menstrual period (P < .001). One factor, the delayed menstrual period, was not found to be significant (LR+, 1.0). Among the historical factors analyzed, “Is there any chance that you could be pregnant now?” was the most sensitive for pregnancy (92%), with a specificity of 71% (David Seaberg, MD, University of Pittsburgh, Pennsylvania, unpublished data, June 1995).

Unlike women who do not associate symptoms with early pregnancy, others self-diagnose pregnancy and request medical confirmation. Women (n = 283) with late menstrual periods who requested evaluation in a health center completed a structured contraception and sexual history questionnaire that included questions on whether the woman believed she was pregnant and whether subjective symptoms of pregnancy were present.14 The patient sealed her answers to the questionnaire in an envelope before the results of the pregnancy tests were available. One hundred eighteen women (42%) were pregnant. Women were better at ruling out pregnancy (sensitivity, 92%) than ruling in pregnancy (specificity, 42%).

In another study,15 general practitioners assessed the value of pregnancy symptoms (presence or absence of amenorrhea and morning sickness) in determining the probability of pregnancy. Information was collected prospectively about women who consulted their general practitioner for a diagnosis of pregnancy; the gold standard was a positive pregnancy test result. General practitioners throughout Scotland (n = 155) participated in the study, which was restricted to women between the ages of 16 and 45 years. Of the 1592 women enrolled, 979 (62%) were pregnant. The symptom of amenorrhea was 63% sensitive and 60% specific for pregnancy. Morning sickness as a symptom of pregnancy had a sensitivity of 39% and a specificity of 86%. This study did not ask the participants whether they thought they were pregnant.
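Because a likelihood ratio expresses a change in odds, it can be combined with a pretest probability to estimate the probability of pregnancy after a historical finding is known. The Python sketch below uses the standard odds form of Bayes' theorem. The function name is an illustrative assumption. The inputs are the emergency department study's prevalence (68 of 208 women pregnant) and its likelihood ratios for menstrual timing from Table 42-1.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Pretest probability: prevalence of pregnancy in the emergency department study.
pretest = 68 / 208

# LR for menses on time (0.25) and for delayed menses (2.1), from Table 42-1.
p_menses_on_time = post_test_probability(pretest, 0.25)
p_delayed_menses = post_test_probability(pretest, 2.1)

print(f"Menses on time: {p_menses_on_time:.0%}")
print(f"Delayed menses: {p_delayed_menses:.0%}")
```

Under these assumptions a woman in that setting whose period was on time still has roughly an 11% probability of pregnancy, which is consistent with the authors' observation that history could not reliably rule pregnancy out.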

In 1996, Zabin et al16 performed a similar study in a population of adolescents (younger than 17 years) to determine historical predictors of pregnancy. They performed a cross-sectional study of 2926 adolescents who presented to 52 clinics in the United States and requested a pregnancy test. The girls were asked to complete an anonymous questionnaire (98% response rate) while they waited for the results of their pregnancy test. Thirty-six percent of adolescents in this study were pregnant. A late menstrual period was the most frequent reason (63%) for the visit (for pregnancy: sensitivity, 68%; specificity, 40%).

Although a delayed menstrual period yields statistically significant results for predicting pregnancy, with an LR+ of 1.1 to 2.1 (Table 42-1), the results are inconsistent, and a delayed period is therefore not a reliable symptom of pregnancy. Typical early symptoms of pregnancy provide more consistent results across studies and serve to increase slightly the likelihood of pregnancy (LR+, 2.4) (Table 42-2). Unfortunately, the absence of early symptoms of pregnancy, such as morning sickness, does not rule out pregnancy (LR–, 0.71). Likewise, the patient’s use of birth control decreases the likelihood of pregnancy (LR–, 0.29), but not enough to efficiently rule it out (Table 42-3). Even the patient’s suspicion of pregnancy statistically alters the likelihood of pregnancy, but not enough to be reliable (Table 42-4).

Table 42-2 Probability of Pregnancy if Patient Reports Symptoms of Pregnancya

Study                  Evidence Gradeb  Characteristics          Pregnant: Yes  Pregnant: No  LR (95% CI)
Robinson and Barber15  A                Morning sickness         380            88            2.7 (2.2-3.3)
Robinson and Barber15  A                No morning sickness      599            525           0.71 (0.67-0.76)
Bachman14              A                Any pregnancy symptoms   59             34            2.4 (1.7-3.4)
Bachman14              A                No pregnancy symptoms    59             131           0.63 (0.52-0.77)

Abbreviations: CI, confidence interval; LR, likelihood ratio.
a Pregnancy symptoms defined as morning sickness, breast tenderness and fullness, urinary frequency, or fatigue.
b See Table 1-7 for a summary of Evidence Grades and Levels.

Table 42-3 Probability of Pregnancy if Patient Reports Not Using Birth Control

Study              Evidence Gradea  Characteristics   Pregnant: Yes  Pregnant: No  LR (95% CI)
Ramoska et al12    A                No birth control  61             96            1.3 (1.1-1.5)
Ramoska et al12    A                Birth control     7              44            0.33 (0.16-0.69)
Stengel et al13,b  B                No birth control  9              88            1.5 (1.1-2.2)
Stengel et al13,b  B                Birth control     3              91            0.49 (0.18-1.3)
Pooledc                             No birth control  70             184           1.5 (1.3-1.7)
Pooledc                             Birth control     10             135           0.29 (0.16-0.53)

Abbreviations: CI, confidence interval; LR, likelihood ratio.
a See Table 1-7 for a summary of Evidence Grades and Levels.
b Unpublished data from this study provided by David Seaberg, MD, University of Pittsburgh, Pennsylvania, June 1995.
c In testing for homogeneity, χ2 = 0.097 and P = .76. Therefore, data were pooled.

Table 42-4 Probability of Pregnancy if Patient Thinks There Is a Chance She Is Pregnant

Study              Evidence Gradea  Patient Thinks She Is  Pregnant: Yes  Pregnant: No  LR (95% CI)
Bachman14          A                Pregnant               109            95            1.6 (1.4-1.8)
Bachman14          A                Not pregnant           9              70            0.18 (0.09-0.34)
Ramoska et al12    A                Pregnant               58             63            1.9 (1.5-2.3)
Ramoska et al12    A                Not pregnant           10             77            0.27 (0.15-0.48)
Stengel et al13,b  B                Pregnant               11             52            3.2 (2.4-4.2)
Stengel et al13,b  B                Not pregnant           1              127           0.12 (0.02-0.77)
Zabin et al16      A                Pregnant               789            640           2.1 (2.0-2.3)
Zabin et al16      A                Not pregnant           254            1148          0.38 (0.34-0.42)
Pooled resultsc                     Pregnant               967            850           2.1 (2.0-2.2)
Pooled resultsc                     Not pregnant           270            1422          0.35 (0.31-0.39)

Abbreviations: CI, confidence interval; LR, likelihood ratio.
a See Table 1-7 for a summary of Evidence Grades and Levels.
b Unpublished data from this study provided by David Seaberg, MD, University of Pittsburgh, Pennsylvania, June 1995.
c In testing for homogeneity, χ2 = 4.3 and P = .23. Therefore, data were pooled.

Accuracy of Home Pregnancy Tests

It has been reported that one-third of women who think they may be pregnant have used a home pregnancy test.17 A recent study of teenagers requesting pregnancy tests in health departments revealed that 28% of adolescents had used an in-home pregnancy test before their visit.16 In-home pregnancy test kits became available in 1976 and used the hemagglutination-inhibition method of detecting HCG. Currently, most test kits use monoclonal HCG antibodies, which can produce test results that can be read as a color change. The accuracy of these tests is claimed to be 97% to 99% by the manufacturers.18 Studies have shown that accuracy depends on several factors, such as whether the woman read the instructions carefully and the number of days beyond the missed menstrual period.19

In 1986, Doshi20 published a study measuring the accuracy of 3 in-home tests for early pregnancy. The author studied 109 women of childbearing age whose menses were late by at least 6 days, but not more than 20 days. Volunteers for the study were obtained from 3 sites; the majority were white and educated. Participants brought to the study site their first morning urine, which was then divided in half. One portion of the sample was returned to the participant to use in performing a pregnancy test at home. Using 1 of 3 study kits (Answer [Carter Products; Carter-Wallace, Inc, New York, New York]; Daisy 2 [Boehringer-Mannheim Corp, Ingelheim, Germany]; and e.p.t. [Warner-Lambert Co, Morris Plains, New Jersey]), the participants were instructed to follow the package directions in performing the test, call the site with results, and complete and return the data collection survey to the investigator. The investigator performed an identical test using the other portion of the urine sample. Despite manufacturer claims of 97% overall accuracy for the test kits used, the investigator found an accuracy of 77%. The participants had a sensitivity of 80% and specificity of 68% for detecting early pregnancy with the home pregnancy tests (LR+, 2.5; LR–, 0.29), with similar diagnostic efficiency observed for all 3 kits. These results concerned Doshi20 because of missed opportunities for early prenatal care and the postponement of discontinuing teratogenic substances.

In 1993, investigators from France published an extensive analysis of the reliability and feasibility of home pregnancy tests.21 They looked at 27 different test kits (manufacturers were not identified) and selected 11 kits for the study, which were found to have a 100% sensitivity and specificity under ideal laboratory conditions. Laywomen volunteers (aged 14-49 years; n = 638) were asked to test a home-use test kit for pregnancy using a coded urine specimen. They also were asked to complete a questionnaire after they performed the test. The results of the diagnostic study showed that 5 of the 11 kits had 100% specificity; the others had specificity values between 77% and 94%. Two kits had a high diagnostic sensitivity (>90%), and 2 kits were found to have a low diagnostic sensitivity ( 50) to ensure accuracy of the derived rule. For each eligible study, where possible, the pretest probability categories, corresponding disease prevalences, and likelihood ratios (LRs) (and corresponding 95% confidence intervals [CIs]) are summarized. The clinical gestalt must have been determined according to information available from the patient’s medical history and findings from physical examination and routine investigations (eg, chest radiograph, ECG, and arterial blood gas analysis) without predetermined elements or a standardized score, and most important, it must have been assessed before other diagnostic testing.
A clinical prediction rule used a mathematically derived formula that combined the individual contribution of each component of the medical history, physical examination findings, and routine laboratory results before diagnostic testing.

Data Analysis
Likelihood ratios and their 95% CIs were calculated with Metastat (version 1)26 and CI Analysis (version 1.1).27 Summary LRs were derived with random-effects measures that provide conservative CIs around the estimates.28,29 Decisions to include or exclude studies were made before the analysis according to the reported methods, rather than their actual results. We determined the summary LRs to get a general sense of whether structured models performed as well as the clinical gestalt. Furthermore, we pooled data only from studies that derived a structured model and specifically did not include data from subsequent validation studies, because these studies varied substantially in their study design (retrospective assessment and concomitant use of D-dimer) from the derivative studies.
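As a rough illustration of how a category-specific LR and its 95% CI can be computed, the sketch below uses the common log-scale standard error method. This is an assumption: the chapter names the software used but not the formula. The counts are back-calculated from the rounded percentages reported for the PIOPED low-probability category in Table 43-2 (about 21 of 228 patients with embolism, and about 248 of 887 overall), so they are approximations rather than published counts.

```python
import math

def category_lr(dis_in_cat, dis_total, nondis_in_cat, nondis_total, z=1.96):
    """Likelihood ratio for one result category with a log-method CI.

    LR = P(category | disease) / P(category | no disease).
    """
    lr = (dis_in_cat / dis_total) / (nondis_in_cat / nondis_total)
    # Standard error of log(LR) for a ratio of two independent proportions.
    se_log = math.sqrt(
        1 / dis_in_cat - 1 / dis_total + 1 / nondis_in_cat - 1 / nondis_total
    )
    lower = math.exp(math.log(lr) - z * se_log)
    upper = math.exp(math.log(lr) + z * se_log)
    return lr, lower, upper

# Approximate counts (assumed, derived from rounded percentages):
# 21 patients with embolism among 228 rated "low"; 248 with embolism among 887.
lr, lower, upper = category_lr(21, 248, 228 - 21, 887 - 248)
print(f"LR {lr:.2f} (95% CI, {lower:.2f}-{upper:.2f})")  # LR 0.26 (95% CI, 0.17-0.40)
```

Under these assumed counts the result agrees with the tabulated value for that category.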

RESULTS Our search yielded a total of 1709 articles, and after scanning the abstracts and titles, we selected 443 abstracts for detailed review. Of these, 30 articles were selected for complete review and 16 were included in the final analysis. These studies involved a total of 8306 patients.

Clinical Gestalt
In the PIOPED study, physicians used their clinical gestalt to estimate the probability of pulmonary embolism according to patient medical history and physical examination findings, together with the results of a chest radiograph, an ECG, and an arterial blood gas analysis (Table 43-2).5,23,30-34 The results of this study showed that the prevalence of pulmonary embolism correlated reasonably well with the pretest probability estimates of pulmonary embolism. The Prospective Investigative Study of Acute Pulmonary Embolism Diagnosis (PISA-PED) tested the accuracy of perfusion scan alone compared with pulmonary angiography.23 In this study, experienced clinicians estimated the probability of pulmonary embolism from their clinical gestalt according to patient symptoms, signs, and risk factors, together with the results of a chest radiograph, an ECG, and an arterial blood gas analysis. Perrier et al30-32 reported the clinical gestalt from 3 separate studies, using a diagnostic strategy in which a ventilation/perfusion lung scan, a D-dimer assay, and compression ultrasonography followed the clinical evaluation. In the first 2 studies,30,31 all patients underwent a ventilation/perfusion scan and then were treated according to the pretest probability assessment, D-dimer assay result, and compression ultrasonographic finding. In the third study,32 patients were assessed initially with a highly sensitive (but nonspecific) enzyme-linked immunosorbent assay D-dimer laboratory analysis. The results of these studies are consistent with those reported in the PISA-PED23 and PIOPED5 studies.

Box 43-1 Criteria for Diagnosis and Exclusion of Pulmonary Embolism

POSITIVE RESULT FOR PULMONARY EMBOLISM
Positive pulmonary angiogram result.20
High-probability lung scan (≥ 1 segmental perfusion defect21 or ≥ 2 large [>75% of a segment] segmental perfusion defects5 with corresponding normal ventilation).
Nondiagnostic lung scan with either a positive venogram result22 or a compression ultrasonogram diagnostic for deep vein thrombosis.
Positive lung perfusion scan23 (single or multiple wedge-shaped defect with or without matching chest radiograph abnormalities; wedge-shaped areas of overperfusion usually exist).

NEGATIVE RESULT FOR PULMONARY EMBOLISM
Normal perfusion lung scan result23 and a normal 3-month follow-up result.
Negative pulmonary angiogram result20 and a normal 3-month follow-up result.
Nondiagnostic lung scan and negative venogram result,22 serial leg compression ultrasonography,14 or impedance plethysmography24 and a normal 3-month follow-up result.
Negative spiral computed tomographic scan result and negative venogram or negative serial compression ultrasonographic result and a normal 3-month follow-up result.
Negative D-dimer test result and a normal 3-month follow-up result, provided anticoagulants were withheld.

CHAPTER 43

The Rational Clinical Examination

Table 43-2 Accuracy of Pretest Probability Assessment for Pulmonary Embolism With Clinical Gestalt

| Source, y | No. of Patients | Prevalence of Pulmonary Embolism, % | Category | Probability Estimate, % | No. of Patients | Actual Probability, % | LR (95% CI)a |
|---|---|---|---|---|---|---|---|
| PIOPED,5 1990 | 887 | 28 | Low | 0-19 | 228 | 9 | 0.26 (0.17-0.4) |
| | | | Moderate | 20-79 | 569 | 30 | 1.1 (0.96-1.2) |
| | | | High | 80-100 | 90 | 68 | 5.3 (3.5-8.0) |
| Miniati et al,23 1996 | 783 | 44 | Unlikely | 10 | 349 | 8 | 0.13 (0.09-0.18) |
| | | | Possible | 50 | 179 | 47 | 1.1 (0.86-1.4) |
| | | | Very likely | 90 | 225 | 91 | 12 (8.1-18) |
| Perrier et al,30-32 1996, 1997, 1999 | 985 | 27 | Low | ≤20 | 368 | 9 | 0.21 (0.15-0.29) |
| | | | Moderate | 21-79 | 523 | 33 | 1.1 (1.0-1.3) |
| | | | High | ≥80 | 94 | 66 | 4.5 (3.0-6.7) |
| Sanson et al,33 2000 | 413 | 31 | Low | 0-19 | 58 | 19 | 0.53 (0.28-0.99) |
| | | | Moderate | 20-80 | 278 | 29 | 0.92 (0.79-1.1) |
| | | | High | >80 | 77 | 46 | 1.9 (1.3-2.8) |
| Musset et al,34 2002 (ESSEP) | 1041 | 34 | Low | 0-19 | 231 | 12 | 0.26 (0.18-0.38) |
| | | | Moderate | 20-79 | 525 | 26 | 0.67 (0.58-0.78) |
| | | | High | 80-100 | 285 | 68 | 4.0 (3.3-5.0) |

Abbreviations: CI, confidence interval; ESSEP, Evaluation du Scanner Spirale dans l’Embolie Pulmonaire; LR, likelihood ratio; PIOPED, Prospective Investigation of Pulmonary Embolism Diagnosis. a Summary data (LR [95% CI]) for empirical pretest probability assessments are the following: low, 0.25 (0.14-0.45); moderate, 0.92 (0.71-1.2); and high, 4.7 (2.3-9.7). These summary data exclude results from the studies by Perrier et al30-32 because the pretest probability was used to manage subgroups of patients.
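The summary LRs in the footnote were derived with the confidence profile method cited in the Data Analysis section. As a substitute technique, and not the authors' method, the sketch below pools the low-category LRs from Table 43-2 with a DerSimonian-Laird random-effects model on the log scale. Each study's variance is recovered from the width of its reported 95% CI, which is an approximation, and the Perrier et al studies are left out to mirror the footnote.

```python
import math

def pool_random_effects(estimates, z=1.96):
    """DerSimonian-Laird pooling of likelihood ratios on the log scale.

    estimates: list of (lr, ci_lower, ci_upper) tuples, at least 2 studies.
    Returns (pooled_lr, ci_lower, ci_upper).
    """
    logs = [math.log(lr) for lr, _, _ in estimates]
    # Approximate within-study variance from the reported CI width.
    variances = [((math.log(hi) - math.log(lo)) / (2 * z)) ** 2
                 for _, lo, hi in estimates]
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, logs)) / sum(w)
    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(logs) - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, logs)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - z * se),
            math.exp(pooled + z * se))

low_category = [
    (0.26, 0.17, 0.40),  # PIOPED
    (0.13, 0.09, 0.18),  # Miniati et al
    (0.53, 0.28, 0.99),  # Sanson et al
    (0.26, 0.18, 0.38),  # Musset et al (ESSEP)
]
pooled, lower, upper = pool_random_effects(low_category)
print(f"Pooled low-category LR {pooled:.2f} ({lower:.2f}-{upper:.2f})")
```

Because the pooling method and the variance approximation differ from those used in the chapter, the output should be read as a plausibility check against the footnote rather than a reproduction of it.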

Sanson et al33 conducted a study in 6 Dutch teaching hospitals. The clinical gestalt was quantified into the pretest probability for pulmonary embolism, and patients underwent ventilation/perfusion lung scanning followed by angiography if the lung scan finding was nondiagnostic. The estimate of the pretest probability was performed by the attending physician on a visual analog scale; however, the results of chest radiographs, ECGs, and arterial blood gas analysis were not always available when the pretest probability was documented. In this study, assessment of pretest probability was less predictive than in other studies of the clinical gestalt. The Evaluation du Scanner Spirale dans l’Embolie Pulmonaire study group34 assessed the accuracy of contrast spiral computed tomography (CT) of the chest for pulmonary embolism in 1041 patients. Using simple prespecified guidelines and empirical assessment based on patient medical history, physical examination findings, and results of routine investigations, clinicians stratified patients into low-, moderate-, or high-pretest-probability groups. The presence or absence of pulmonary embolism largely was based on the combined results of spiral CT and routine bilateral compression ultrasonography of the legs. If the clinical suspicion was high and the test results were negative, or if test results were inconclusive, further assessment with lung scanning and pulmonary angiography was performed. The study demonstrated reasonable discriminative ability among the 3 pretest groups. When interpreted together, the studies show that, when experienced clinicians use clinical gestalt, the prevalence of pulmonary embolism increases with increasing pretest probability.

The PIOPED and PISA-PED studies demonstrate the influence that clinical gestalt has on the interpretation of results of subsequent tests. In the PISA-PED study, a positive scan result for pulmonary embolism (single or multiple perfusion defects with or without matching chest radiograph abnormalities), together with a possible or likely clinical pretest probability, was associated with pulmonary embolism in 92% and 99% of patients, respectively.34 On the other hand (similar to the PIOPED study results), when patients had an unlikely (low) clinical pretest probability but a positive finding on perfusion scan, pulmonary embolism was diagnosed in only 50% to 60% of individuals. The findings in the study by Sanson et al33 suggest that the clinical gestalt is not particularly discriminating. However, the study still showed increasing prevalence of pulmonary embolism according to pretest probability.

Clinical Prediction Rules
The PISA-PED study group analyzed clinical data from their accuracy study (Table 43-2)23 to derive a structured clinical rule.35 Clinical variables were divided into 3 categories: (1) signs and symptoms; (2) results of routine tests (chest radiograph, ECG, and arterial blood gas analysis); and (3) evidence of an obvious alternative diagnosis.

Wells et al14 initially developed a 40-variable clinical rule and subsequently refined the rule after a limited pilot study. This rule (extended Wells) was used in a large multicenter study in which 1239 patients were enrolled and assigned a clinical probability of pulmonary embolism after a medical history was taken, a physical examination was performed, and chest radiography, arterial blood gas analysis, and ECG findings were assessed. A checklist of specific symptoms and signs was compiled to help assign the pretest probability. Patients were assessed for type of symptoms (“typical,” “atypical,” or “suggestive” of severe pulmonary embolism), the presence or absence of risk factors, and the presence or absence of an alternative diagnosis as likely as or more likely than pulmonary embolism to account for the patient’s symptoms. The corresponding prevalence and LRs for pulmonary embolism in each of the 3 pretest probability categories are listed in Table 43-3.14,35-38 The utility of pretest probability assessment in combination with lung scanning again was highlighted. Only 8 of 27 (30%) patients with a low pretest probability and a high-probability lung scan result had angiographically proven pulmonary embolism.14

Clinical data collected on the 1239 patients by Wells et al39 also were used to derive a simplified clinical rule. With a stepwise logistic regression model, 7 key variables were identified and selected for inclusion in the final rule. Cut points were identified to classify patients as low (<2), moderate (2-6), or high (>6) probability for pulmonary embolism (Table 43-4).39 With this simplified rule, only 3% (LR, 0.17; 95% CI, 0.11-0.27) of patients with a low pretest probability had pulmonary embolism vs 63% (LR, 8.6; 95% CI, 5.7-13) of those with a high pretest probability.

Wicki et al36 pooled clinical data obtained from the patient medical history and physical examination, together with results of the chest radiograph, ECG, and arterial blood gas analysis collected during the 3 studies, involving 986 consecutive patients. A 7-variable rule was derived by logistic regression and statistically cross-validated (Table 43-5). A score based on a weighted sum of the variables present was used to estimate the pretest probability of pulmonary embolism. Patients with scores of less than 5 had low pretest probability of pulmonary embolism, of 5 to 8 had moderate pretest probability, and of greater than 8 had high pretest probability. The prevalence of pulmonary embolism correlated well with pretest probability.

A large emergency department–based study involving 7 US centers systematically assessed 934 patients with suspected pulmonary embolism and derived a 6-variable model from this database (Figure 43-1).37 This model used 2 screening variables to assess all patients: age and shock index (heart rate divided by systolic blood pressure). Patients younger than 50 years and with a shock index less than 1 are deemed “non–high risk”; the remaining patients are then further assessed with 4 variables. The model classified 79% of patients as non–high risk, in whom the prevalence of pulmonary embolism was 13%, whereas the prevalence in the high-risk group (21% of patients) was 42%. Two medical students subsequently were employed to assess 117 patients presenting to one of the participating centers, and they demonstrated a high degree of interobserver agreement (weighted κ, 0.83).37

The PISA-PED investigators have reanalyzed data from their initial study and included data on a further 350 patients; the latter were assessed and treated as in the first study.38 Using appropriate statistical techniques, they derived and cross-validated a 15-variable model (Table 43-6). Unlike other structured models, the authors calculated and displayed the actual pretest probability for individual patients rather than the ordinal descriptors of low, moderate, and

Table 43-3 Accuracy of Clinical Prediction Rules for Assessing Pretest Probability of Pulmonary Embolism in Derivative Studiesa

| Source, y | No. of Patients | Prevalence of Pulmonary Embolism, % | Prospective Validation | Pretest Probability Category | Pretest Probability, % | LR (95% CI) |
|---|---|---|---|---|---|---|
| Wells et al,14 1998 (Extended) | 1239 | 17.5 | Yes | Low | 3 | 0.17 (0.12-0.25) |
| | | | | Moderate | 28 | 1.8 (1.5-2.1) |
| | | | | High | 78 | 17 (11-27) |
| Miniati et al,35 1999 (PISA-PED) | 750 | 41 | Yes | Unlikely | 6 | 0.05 (0.03-0.10) |
| | | | | Possible | 46 | 0.99 (0.75-1.3) |
| | | | | Very likely | 97 | 47 (23-98) |
| | | | | High | 63 | 8.6 (5.7-13) |
| Wicki et al,36 2001 (Geneva rule) | 986 | 27 | Yes | Low | 10 | 0.31 (0.24-0.40) |
| | | | | Moderate | 38 | 1.7 (1.5-1.9) |
| | | | | High | 81 | 11 (6.1-21) |
| Kline et al,37 2002 | 934 | 19.4 | No | Nonhigh | 13.3 | 0.64 (0.56-0.73) |
| | | | | High | 42.1 | 3.0 (2.4-3.8) |
| Miniati et al,38 2003 (PISA-PED II) | 1100 | 40 | No | Low | 4 | 0.07 (0.04-0.11) |
| | | | | Moderate | 26 | 0.72 (0.6-0.87) |
| | | | | High | 98 | 66 (31-137) |

Abbreviations: CI, confidence interval; LR, likelihood ratio; PISA-PED, Prospective Investigative Study of Acute Pulmonary Embolism Diagnosis. a Summary of pretest probability (LR [95% CI]) of structured clinical rules is as follows: low, 0.12 (0.05-0.31); moderate, 1.1 (0.76-1.6); and high, 23 (7.6-69). This summary excludes data from Kline et al,37 because that study categorized patients only into low and high categories, and from Wells et al14 because the pretest probability was used to guide management, which likely resulted in case-finding bias.


Table 43-4 The Simplified Wells Scoring Systema

| Findings | Scoreb |
|---|---|
| Clinical signs/symptoms of deep vein thrombosis (minimum of leg swelling and pain with palpation of the deep veins of the leg) | 3.0 |
| No alternate diagnosis likely or more likely than pulmonary emboli | 3.0 |
| Heart rate > 100/min | 1.5 |
| Immobilization or surgery in last 4 wk | 1.5 |
| History of deep vein thrombosis or pulmonary emboli | 1.5 |
| Hemoptysis | 1.0 |
| Cancer actively treated within last 6 months | 1.0 |

a Adapted from Wells et al39 with permission.
b Category scores are as follows: low, <2; moderate, 2-6; and high, >6. The patient’s clinical score is calculated by summing the scores (weights) of the predictor variables that are present.
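The scoring in Table 43-4 amounts to a weighted sum followed by a category lookup. The sketch below is a minimal illustration: the weights come from the table, the dictionary keys are hypothetical labels chosen for this example, and the cut points (low below 2, moderate 2 to 6, high above 6) are the commonly cited ones for this rule and are assumed here.

```python
# Weights as listed in Table 43-4; the key names are hypothetical labels.
WELLS_WEIGHTS = {
    "dvt_signs": 3.0,
    "no_alternative_diagnosis": 3.0,
    "heart_rate_over_100": 1.5,
    "immobilization_or_surgery_4wk": 1.5,
    "previous_dvt_or_pe": 1.5,
    "hemoptysis": 1.0,
    "active_cancer_6mo": 1.0,
}

def wells_score(findings):
    """Sum the weights of the findings that are present."""
    return sum(WELLS_WEIGHTS[name] for name in findings)

def wells_category(score):
    """Map a score to a category (assumed cut points: <2, 2-6, >6)."""
    if score < 2:
        return "low"
    if score <= 6:
        return "moderate"
    return "high"

# A patient whose only finding is the absence of a likelier alternative diagnosis.
score = wells_score(["no_alternative_diagnosis"])
print(score, wells_category(score))  # 3.0 moderate
```

A single 3-point finding is therefore enough to place a patient in the moderate category, which is why the judgment about an alternative diagnosis carries so much weight in this rule.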

Table 43-5 The Clinical Prediction Rule by Wicki et al36 (Geneva Rule)a

| Variable | Point Scoreb |
|---|---|
| Age, y: 60-79 | |
| Age, y: ≥80 | |
| Previous pulmonary emboli or deep vein thrombosis | |
| Recent surgery | |
| Pulse rate > 100/min | |
| PaCO2, kPac | |

Figure 43-1 Decision Rule for Pulmonary Embolism

[...] 50 y? If no: not high risk. If yes: continue.
Unexplained hypoxemia? (SaO2 < 95%; nonsmoker; no asthma; no COPD) If yes: high risk. If no: continue.
Unilateral leg swelling? If yes: high risk. If no: continue.
Recent surgery (within past 4 wk)? If yes: high risk. If no: continue.
Hemoptysis? If yes: high risk. If no: not high risk.

This model uses 2 screening variables to assess all patients: age and shock index (HR divided by SBP). Abbreviations: COPD, chronic obstructive pulmonary disease; HR, heart rate; SBP, systolic blood pressure. Adapted from Kline et al,37 with permission from the American College of Emergency Physicians.
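The decision rule in Figure 43-1 can be sketched as a short function. The screening step follows the chapter text (patients younger than 50 years with a shock index below 1 are not high risk). The handling of the exact boundaries (age of exactly 50 years, shock index of exactly 1) is an assumption, and the function and parameter names are hypothetical.

```python
def shock_index(heart_rate, systolic_bp):
    """Shock index: heart rate divided by systolic blood pressure."""
    return heart_rate / systolic_bp

def kline_risk(age, heart_rate, systolic_bp,
               unexplained_hypoxemia=False,
               unilateral_leg_swelling=False,
               recent_surgery=False,
               hemoptysis=False):
    """Classify a patient as 'high risk' or 'not high risk'."""
    # Screening step (boundary handling is assumed): younger than 50 y
    # with a shock index below 1 is not high risk.
    if age < 50 and shock_index(heart_rate, systolic_bp) < 1:
        return "not high risk"
    # Remaining patients: any one of the 4 findings means high risk.
    if (unexplained_hypoxemia or unilateral_leg_swelling
            or recent_surgery or hemoptysis):
        return "high risk"
    return "not high risk"

print(kline_risk(30, 80, 120))                    # not high risk
print(kline_risk(65, 90, 120, hemoptysis=True))   # high risk
```

Note that a patient who fails the screening step is not automatically high risk; at least one of the 4 follow-up findings must also be present.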

Table 43-6 Structured Clinical Model Derived by the PISA-PED Groupa

| Factor | Regression Coefficient |
|---|---|
| Male sex | 0.81 |
| Age, y: 63-72 | 0.59 |
| Age, y: ≥73 | 0.92 |
| Preexisting disease: cardiovascular | -0.56 |
| Preexisting disease: respiratory | -0.97 |
| Thrombophlebitis (ever) | 0.69 |
| Symptoms: dyspnea (sudden onset) | 1.29 |
| Symptoms: chest pain | 0.64 |
| Symptoms: hemoptysis | 0.89 |
| Temperature > 38°C | -1.17 |
| Electrocardiogram signs of acute right ventricular overload | 1.53 |
| Chest radiograph findings: oligemia | 3.86 |
| Chest radiograph findings: amputation of hilar artery | 3.92 |
| Chest radiograph findings: consolidation (infarction) | 3.55 |
| Chest radiograph findings: consolidation (no infarction) | -1.23 |
| Chest radiograph findings: pulmonary edema | -2.83 |

Abbreviation: PISA-PED, Prospective Investigative Study of Acute Pulmonary Embolism Diagnosis.
a Adapted from Miniati et al,38 with permission from Excerpta Medica.
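To show how regression coefficients of this kind translate into an individual probability, the sketch below assumes a standard logistic model, in which the coefficients of the factors present are added to a constant and the sum is passed through the logistic function. Table 43-6 does not list the model constant, so the function requires it as an argument; the key names and the constant used in the example are hypothetical.

```python
import math

# Coefficients for a subset of factors in Table 43-6 (key names are hypothetical).
PISA_PED_COEFFICIENTS = {
    "male_sex": 0.81,
    "age_63_72": 0.59,
    "age_73_or_more": 0.92,
    "dyspnea_sudden_onset": 1.29,
    "chest_pain": 0.64,
    "temperature_over_38": -1.17,
    "oligemia": 3.86,
    "pulmonary_edema": -2.83,
}

def linear_predictor(findings, intercept):
    """Log odds: model constant plus the coefficients of the factors present."""
    return intercept + sum(PISA_PED_COEFFICIENTS[name] for name in findings)

def probability(findings, intercept):
    """Logistic transform of the linear predictor (assumed model form)."""
    return 1 / (1 + math.exp(-linear_predictor(findings, intercept)))

findings = ["male_sex", "dyspnea_sudden_onset", "oligemia"]
# The intercept is NOT given in Table 43-6; 0.0 is used only to show the sum.
print(round(linear_predictor(findings, intercept=0.0), 2))  # 5.96
```

Positive coefficients (for example, oligemia) raise the estimated probability and negative ones (for example, pulmonary edema) lower it; a real estimate requires the published constant.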


Table 43-7 Accuracy of Clinical Prediction Rules for Pulmonary Embolism When Tested Prospectively

| Source, y | No. of Patients | Prevalence of Pulmonary Embolism, % | Rule Prospectively Tested | Pretest Probability Category | Posttest Probability, % | LR (95% CI) |
|---|---|---|---|---|---|---|
| Sanson et al,33 2000 | 237 | 38 | Extended Wells14 | Low | 28 | 0.66 (0.4-1.1) |
| | | | | Moderate | 39 | 1.1 (0.86-0.13) |
| | | | | High | 46 | 1.4 (0.81-2.5) |
| Sanson et al,33 2000 | 414 | 29 | Simplified Wells39 | Low | 28 | 0.93 (0.69-1.3) |
| | | | | Moderate | 30 | 1.0 (0.88-1.2) |
| | | | | High | 38 | 1.4 (0.35-5.9) |
| Wells et al,41 2001 | 930 | 9.5 | Simplified Wells39 | Low | 1.3 | 0.13 (0.06-0.26) |
| | | | | Moderate | 16 | 1.9 (1.6-2.3) |
| | | | | High | 41 | 5.9 (3.7-9.3) |
| Kruip et al,40 2002 | 234 | 22 | Extended Wells14 | Low | 4 | 0.15 (0.07-0.33) |
| | | | | Moderate | 28 | 1.5 (1.01-2.2) |
| | | | | High | 63 | 5.85 (3.51-9.74) |
| Chagnon et al,42 2002 | 277 | 26 | Simplified Wells39 | Low | 12 | 0.39 (0.26-0.58) |
| | | | | Moderate | 40 | 2.0 (1.5-2.6) |
| | | | | High | 91 | 29 (3.8-223) |
| Chagnon et al,42 2002 | 277 | 26 | Wicki (Geneva rule)36 | Low | 13 | 0.44 (0.30-0.65) |
| | | | | Moderate | 38 | 1.8 (1.4-2.3) |
| | | | | High | 67 | 5.8 (1.8-19) |

Abbreviations: CI, confidence interval; LR, likelihood ratio.
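The LR for a category links the overall prevalence in a study to the prevalence within that category: it is the ratio of the two odds. The sketch below checks this for the low category in the study by Wells et al in Table 43-7. The inputs are the rounded percentages from the table, so other rows may not be reproduced exactly this way.

```python
def odds(probability):
    """Convert a probability to odds."""
    return probability / (1 - probability)

def lr_from_probabilities(pretest, posttest):
    """LR implied by moving from a pretest to a posttest probability."""
    return odds(posttest) / odds(pretest)

# Overall prevalence 9.5%; prevalence in the low category 1.3%.
lr_low = lr_from_probabilities(0.095, 0.013)
print(round(lr_low, 2))  # 0.13
```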

results is sufficiently low. Newer assays can be performed rapidly, making them suitable for use in individual patients.43-47 The D-dimer assay is complementary to the clinical pretest probability because pulmonary embolism can be reliably excluded in patients with a negative D-dimer result and a low pretest probability.41 The accuracy indices of 3 currently available D-dimer assay types are summarized in Table 43-8.43,45,46 Unfortunately, D-dimer assays vary in their sensitivities and specificities, so the posttest probability for a given patient with suspected pulmonary embolism will vary according to which D-dimer assay is used. Before clinicians use a particular D-dimer assay to revise their pretest probability, they should be aware of the differences and interpret the results of the assay accordingly.44,47

Table 43-8 Estimated Accuracy Indices of 3 D-dimer Assays

| D-dimer Assay | Sensitivity, % (95% CI) | Specificity, % (95% CI) | Positive LR (95% CI) | Negative LR (95% CI) |
|---|---|---|---|---|
| Organon Teknika latex immunoassay45 | 96 (90-99) | 45 (40-49) | 1.7 (1.5-1.9) | 0.09 (0.04-0.11) |
| Vidas Rapid ELISA assay46 | 90 (81-96) | 45.1 (39-51) | 1.6 (1.4-1.8) | 0.22 (0.11-0.44) |
| SimpliRED D-dimer assay43 | 84.8 (79-89) | 68.4 (65-71) | 2.7 (2.4-3.0) | 0.22 (0.16-0.3) |

Abbreviations: CI, confidence interval; ELISA, enzyme-linked immunosorbent assay; LR, likelihood ratio.
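The positive and negative LRs in Table 43-8 follow directly from each assay's sensitivity and specificity. A minimal sketch using the SimpliRED point estimates from the table:

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a dichotomous test."""
    positive = sensitivity / (1 - specificity)
    negative = (1 - sensitivity) / specificity
    return positive, negative

# SimpliRED: sensitivity 84.8%, specificity 68.4%.
pos_lr, neg_lr = likelihood_ratios(0.848, 0.684)
print(f"{pos_lr:.1f} {neg_lr:.2f}")  # 2.7 0.22
```

Because specificity sits in the denominator of the negative LR, two assays with different sensitivities can share the same negative LR, as the Vidas and SimpliRED rows illustrate.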


CLINICAL SCENARIOS: RESOLUTIONS

CASE 1
This young woman has no risk factors or signs of pulmonary embolism (no tachycardia, features of deep vein thrombosis, or hemoptysis). No clear alternate diagnosis is present that is at least as likely as or more likely than pulmonary embolism. According to the Wells simplified clinical prediction rule, her score would be 3, a moderate pretest probability for pulmonary embolism (approximately 20%). Her whole-blood red blood cell agglutination D-dimer assay result is negative (negative LR, 0.22).43 Therefore, the probability of pulmonary embolism after the results of the D-dimer assay are obtained is about 5%. The finding from a perfusion scan is normal (LR for pulmonary embolism with a normal lung scan, 0.1).48 Therefore, her posttest probability after the above combination of tests is 0.5%, and pulmonary embolism can be ruled out.

CASE 2
This elderly patient has a high pretest probability for pulmonary embolism (approximately 65%) with the simplified Wells rule because of the combination of immobilization, tachycardia, previous deep vein thrombosis/pulmonary embolism, and the absence of an alternate diagnosis as likely as or more likely than pulmonary embolism. This combination of findings results in a score of 7, which falls into the category of a high pretest probability. In combination with a negative whole-blood red blood cell agglutination D-dimer assay result (LR, 0.22),43 the revised pretest probability is approximately 30%. A ventilation/perfusion scan is reported as intermediate probability (LR, 1.2)48; therefore, his posttest probability of pulmonary embolism is about 33% and pulmonary embolism has not been ruled out. Further testing with compression ultrasonography and, if the finding is normal, pulmonary angiography should be considered.
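The arithmetic behind both resolutions is the odds form of Bayes' theorem: convert the pretest probability to odds, multiply by each test's LR, and convert back. The sketch below reproduces the two cases; chaining LRs in this way assumes the tests are conditionally independent, which is a simplification.

```python
def posttest_probability(pretest, *likelihood_ratios):
    """Update a pretest probability with one or more likelihood ratios."""
    odds = pretest / (1 - pretest)
    for lr in likelihood_ratios:
        odds *= lr  # assumes conditionally independent tests
    return odds / (1 + odds)

# Case 1: pretest 20%, negative D-dimer (LR 0.22), normal perfusion scan (LR 0.1).
case_1_after_d_dimer = posttest_probability(0.20, 0.22)
case_1 = posttest_probability(0.20, 0.22, 0.1)

# Case 2: pretest 65%, negative D-dimer (LR 0.22), intermediate scan (LR 1.2).
case_2_after_d_dimer = posttest_probability(0.65, 0.22)
case_2 = posttest_probability(0.65, 0.22, 1.2)

print(f"Case 1: {case_1_after_d_dimer:.2f} then {case_1:.3f}")  # 0.05 then 0.005
print(f"Case 2: {case_2_after_d_dimer:.2f} then {case_2:.2f}")  # 0.29 then 0.33
```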

THE BOTTOM LINE
Clinical assessment alone is insufficient to diagnose or rule out pulmonary embolism, although experienced clinicians can use clinical gestalt to assign a pretest probability of pulmonary embolism with reasonable accuracy. Clinical prediction rules appear to have similar accuracy to that of the clinical gestalt for patients in the low- and high-probability categories. We advocate the use of any one of the clinical prediction rules because they are simple and maintain their accuracy when used by less-experienced clinicians. In deciding which of the several rules to use, clinicians could justifiably choose the scale that is easiest for them to use consistently. Factors that could affect the decision are availability of the rule in clinical reminder systems and the availability of the required clinical data. We are unable to say with confidence whether one structured clinical rule performs better than another. In outpatients with new onset or recent worsening of symptoms within the preceding 3 days, the combination of pretest probability assessment with the results of D-dimer testing improves diagnostic accuracy. Furthermore, there is emerging evidence that outpatients with a low pretest probability for pulmonary embolism can have anticoagulant therapy safely withheld when the results of D-dimer testing are negative.41,43

Author Affiliations at the Time of Original Publication
Department of Medicine, McMaster University, Hamilton, Ontario, Canada (Drs Chunilal and Ginsberg); Department of Haematology, The Queen Elizabeth Hospital, Woodville, South Australia, Australia (Dr Chunilal); Department of Haematology, Royal Perth Hospital, Perth, Australia (Dr Eikelboom); Center for Clinical Epidemiology and Biostatistics, University of Newcastle, Newcastle, Australia (Dr Attia); Institute of Clinical Physiology, National Research Council of Italy, Pisa, Italy (Dr Miniati); Henderson Division, Hamilton Health Sciences Corporation, Hamilton, Ontario, Canada (Dr Panju); and Durham Veterans Affairs Medical Center and Department of Medicine, Duke University, Durham, North Carolina (Dr Simel).

Acknowledgments
Dr Chunilal is the recipient of the Noonan Fellowship, McMaster University. Dr Eikelboom is the recipient of a research fellowship from the Heart and Stroke Foundation of Canada. Dr Ginsberg is the recipient of a Career Investigator Award from the Heart and Stroke Foundation of Ontario and a research chair from CIHR-AstraZeneca. We thank Jonathan Edlow, MD, David B. Matchar, MD, and Eugene Oddone, MD, for their important suggestions on the manuscript, as well as Arnaud Perrier, MD, Harry Buller, PhD, and Marieke J. H. A. Kruip, MD, for providing clarification of data.

REFERENCES
1. Anderson FA Jr, Wheeler HB, Goldberg RJ, et al. A population-based perspective of the hospital incidence and case fatality rate of deep vein thrombosis and pulmonary embolism: the Worcester DVT study. Arch Intern Med. 1991;151:933-938.
2. Silverstein MD, Heit JA, Mohr DN, Petterson TM, O’Fallon WM, Melton LJ III. Trends in the incidence of deep vein thrombosis and pulmonary embolism: a 25-year population based study. Arch Intern Med. 1998;158(6):585-593.
3. Barritt DW, Jordan SC. Anticoagulant drugs in the treatment of pulmonary embolism: a controlled trial. Lancet. 1960;1(7138):1309-1312.
4. Goldhaber SZ, Visani L, De Rosa M. Acute pulmonary embolism: clinical outcomes in the International Cooperative Pulmonary Embolism Registry (ICOPER). Lancet. 1999;353(9162):1386-1389.
5. PIOPED Investigators. Value of the ventilation/perfusion scan in acute pulmonary embolism: results of the Prospective Investigation of Pulmonary Embolism Diagnosis (PIOPED). JAMA. 1990;263(20):2753-2759.
6. Stein PD, Wills PW, DeMets DL. History and physical examination in acute pulmonary embolism in patients without pre-existing cardiac or pulmonary disease. Am J Cardiol. 1981;47:218-223.
7. Hoellerich VL, Wigton RS. Diagnosing pulmonary embolism using clinical findings. Arch Intern Med. 1986;146(9):1699-1704.
8. Heit JA, Mohr DN, Silverstein MD, Petterson TM, O’Fallon WM, Melton LJ III. Risk factors for deep vein thrombosis and pulmonary embolism: a population-based case control study. Arch Intern Med. 2000;160(6):809-815.
9. Grady D, Wenger NK, Herrington D, et al. Postmenopausal hormone replacement therapy increases the risk of venous thromboembolic disease: the Heart and Estrogen/Progestin Replacement Study. Ann Intern Med. 2000;132(9):689-696.
10. Douketis JD, Ginsberg JS, Holbrook A, Crowther M, Duku EK, Burrows RF. A reevaluation of the risk for venous thromboembolism with the use of oral contraceptives and hormone replacement therapy. Arch Intern Med. 1997;157(14):1522-1530.
11. Goldhaber SZ. Pulmonary embolism. N Engl J Med. 1998;339(2):93-104.
12. Hampson NB. Pulmonary embolism: difficulties in the clinical diagnosis. Semin Respir Infect. 1995;10(3):123-130.
13. Moser KM, Fedullo PF, LittleJohn JK, Crawford R. Frequent asymptomatic pulmonary embolism in patients with deep vein thrombosis. JAMA. 1994;271(3):223-225.
14. Wells PS, Ginsberg JS, Anderson D, et al. Use of a clinical model for safe management of patients with suspected pulmonary embolism. Ann Intern Med. 1998;129(12):997-1005.
15. Wells PS, Hirsh J, Anderson DR, et al. Accuracy of clinical assessment of deep vein thrombosis. Lancet. 1995;345(8961):1326-1333.
16. Anand SS, Wells PS, Hunt DH, Brill-Edwards P, Cook D, Ginsberg JS. The rational clinical examination: does this patient have deep vein thrombosis? JAMA. 1998;279(14):1094-1099.
17. Kearon C, Ginsberg JS, Crowther M, Brill-Edwards P, Weitz JI, Hirsh J. Management of suspected deep vein thrombosis in outpatients by using clinical assessment and D-dimer testing. Ann Intern Med. 2001;135(2):108-111.
18. Bates SM, Kearon C, Crowther M, et al. A diagnostic strategy involving a quantitative latex D-dimer assay reliably excludes deep venous thrombosis. Ann Intern Med. 2003;138(10):787-794.
19. Becker DM, Philbrick JT, Bachhuber T, Humphries JE. D-dimer testing and acute venous thromboembolism: shortcut to diagnosis? Arch Intern Med. 1996;156(9):939-945.
20. Stein PD, Athanasoulis C, Alavi A, et al. Complications and validity of pulmonary angiography in acute pulmonary embolism. Circulation. 1992;85(2):462-468.
21. Hull RD, Hirsh J, Carter CJ, et al. Pulmonary angiography, ventilation lung scanning, and venography for clinically suspected pulmonary embolism with abnormal perfusion lung scan. Ann Intern Med. 1983;98(6):891-899.
22. Kruit WHJ, De Boer AC, Sing AK, Van Roon F. The significance of venography in the management of patients with clinically suspected pulmonary embolism. J Intern Med. 1991;230(4):333-339.


23. Miniati M, Pistolesi M, Marini C, et al. Value of perfusion lung scan in the diagnosis of pulmonary embolism: results of the Prospective Investigative Study of Acute Pulmonary Embolism Diagnosis (PISA-PED). Am J Respir Crit Care Med. 1996;154(5):1387-1393.
24. Hull RD, Raskob GE, Ginsberg JS, et al. A noninvasive strategy for the treatment of patients with suspected pulmonary embolism. Arch Intern Med. 1994;154(3):289-297.
25. Wasson JH, Sox HC, Neff RK, Goldman L. Special article: clinical prediction rules: application and methodological standards. N Engl J Med. 1985;313(13):793-799.
26. Suskin D, Super DH. Metastat: Version 1. Cleveland, OH: MetroHealth Medical Center, Case Western Reserve University, Department of Pediatrics; 1990.
27. Gardner SD, Winter PD, Gardner MJ. Confidence Interval Analysis: Statistics With Confidence. London, England: British Medical Journal; 1989.
28. Eddy DM, Hasselblad V, Shachter RD. Meta-analysis by the Confidence Profile Method: The Statistical Synthesis of Evidence. San Diego, CA: Academic Press; 1992.
29. Eddy DM, Hasselblad V. Fast*Pro V1.8: Software for Meta-analysis by the Confidence Profile Method. San Diego, CA: Academic Press; 1992.
30. Perrier A, Desmarais S, Goehring C, et al. D-dimer testing for suspected pulmonary embolism in outpatients. Am J Respir Crit Care Med. 1997;156(2 pt 1):492-496.
31. Perrier A, Bounameaux H, Morabia A, et al. Diagnosis of pulmonary embolism by a decision analysis-based strategy including clinical probability, D-dimer levels, and ultrasonography: a management study. Arch Intern Med. 1996;156(5):531-536.
32. Perrier A, Desmarais S, Miron MJ, et al. Non-invasive diagnosis of venous thromboembolism in outpatients. Lancet. 1999;353(9148):190-195.
33. Sanson BJ, Lijmer JG, MacGillavry MR, Turkstra F, Prins MH, Buller HR. Comparison of a clinical probability estimate and two clinical models in patients with suspected pulmonary embolism. Thromb Haemost. 2000;83(2):199-203.
34. Musset D, Parent F, Meyer G, et al; Evaluation du Scanner Spirale dans l’Embolie Pulmonaire. Diagnostic strategy for suspected pulmonary embolism: a prospective multicenter outcome study. Lancet. 2002;360(9349):1914-1920.
35. Miniati M, Prediletto R, Formichi B, et al. Accuracy of clinical assessment in the diagnosis of pulmonary embolism. Am J Respir Crit Care Med. 1999;159(3):864-871.
36. Wicki J, Perneger TV, Junod AF, Bounameaux H, Perrier A. Assessing clinical probability of pulmonary embolism in the emergency ward. Arch Intern Med. 2001;161(1):92-97.


37. Kline JA, Nelson RD, Jackson RE, Courtney DM. Criteria for the safe use of D-dimer testing in emergency department patients with suspected pulmonary embolism: a multicenter US study. Ann Emerg Med. 2002;39(2):144-152.
38. Miniati M, Monti S, Bottai M. A structured clinical model for predicting the probability of pulmonary embolism. Am J Med. 2003;114(3):173-179.
39. Wells PS, Anderson DR, Rodger M, et al. Derivation of a simple clinical model to categorize patients with a probability of pulmonary embolism: increasing the models utility with the SimpliRED D-dimer. Thromb Haemost. 2000;83(3):416-420.
40. Kruip MJHA, Slob MJ, Schijen JHEM, van der Heul C, Buller HR. Use of a clinical decision rule in combination with D-dimer concentration in diagnostic workup of patients with suspected pulmonary embolism: a prospective management study. Arch Intern Med. 2002;162(14):1631-1635.
41. Wells PS, Anderson D, Rodger M, et al. Excluding pulmonary embolism at the bedside without diagnostic imaging: management of patients with suspected pulmonary embolism presenting to the emergency department by using a simple clinical model and D-dimer. Ann Intern Med. 2001;135(2):98-107.
42. Chagnon I, Bounameaux H, Aujesky D, et al. Comparison of two clinical prediction rules and implicit assessment among patients with suspected pulmonary embolism. Am J Med. 2002;113(4):269-275.
43. Ginsberg JS, Wells PS, Kearon C, et al. Sensitivity and specificity of a rapid whole blood assay for D-dimer for the diagnosis of pulmonary embolism. Ann Intern Med. 1998;129(12):1006-1011.
44. van Beek EJR, van den Ende B, Berckmans RJ, et al. A comparative analysis of D-dimer assays in patients with clinically suspected pulmonary embolism. Thromb Haemost. 1993;70(3):408-413.
45. Bates SM, Grand’Maison A, Johnston M, Naguit I, Kovacs MJ, Ginsberg JS. A latex D-dimer reliably excludes venous thromboembolism. Arch Intern Med. 2001;161(3):447-453.
46. Sijens PE, van Ingen HE, van Beek EJ, Berghout A, Oudkerk M. Rapid ELISA assay for plasma D-dimer in the diagnosis of segmental and subsegmental pulmonary embolism: a comparison with pulmonary angiography. Thromb Haemost. 2000;84(2):156-159.
47. de Moerloose P. D-dimer assay for the exclusion of venous thromboembolism: which test for which diagnostic strategy? Thromb Haemost. 2000;83(2):180-181.
48. Jaeschke R, Guyatt GH, Sackett DL. User’s guide to the medical literature, III: how to use an article about a diagnostic test, B: what are the results and will they help me in caring for my patients? JAMA. 1994;271(9):703-707.

UPDATE: Pulmonary Embolus

43

Prepared by Sanjeev Chunilal, MB ChB, FRACP, FRCPA, and Jeff Ginsberg, MD, FRCPC Reviewed by Phil Wells, MD

CLINICAL SCENARIO A 25-year-old woman presents to the emergency department with pleuritic chest pain, having just returned home after a 12-hour plane flight. She is taking no medications, other than an oral contraceptive pill. Her clinical examination reveals coryza without tachypnea, and the remainder of the examination results are unremarkable. A pregnancy test result is negative and a chest radiograph result is normal. A D-dimer test shows a positive result.

Original Review Chunilal S, Eikelboom J, Attia J, et al. Does this patient have pulmonary embolism? JAMA. 2003;290(21):2849-2858.

UPDATED LITERATURE SEARCH We applied the same search criteria that were used in the original Rational Clinical Examination article to identify studies of the clinical pretest probability of pulmonary emboli. We ran a second search combining the terms “physical exam,” “medical history taking,” “sensitivity and specificity,” “observer variation,” “diagnostic test, routine,” “decision support techniques,” and “pulmonary embolism.” Each search was limited to English-language articles published between 2002 and 2004. The first strategy yielded a total of 160 articles; the second yielded 123 articles. Titles and abstracts were reviewed with the same criteria used for the original article. We sought studies in which patients with suspected pulmonary embolism were enrolled in an unselected, consecutive manner. Participating physicians had to have been blinded to the results of diagnostic testing and had to have estimated the pretest probability of pulmonary embolism, and validated algorithms to exclude or confirm the diagnosis of pulmonary embolism had to have been used.

New Findings
• New studies focus primarily on whether a low or moderate clinical probability estimate in combination with a normal D-dimer result rules out a pulmonary embolus. For such patients, the summary likelihood ratio (LR) for a pulmonary embolus is 0, with an upper limit of the 95% confidence interval (CI) of 0.06. This combination of results effectively rules out a pulmonary embolus.
• The simplified Wells criteria have good reliability.

Details of the Update For this update, no new clinical prediction rules were identified. Four management studies were identified with the above search strategy. One of these evaluated the performance of a logistic model that used only demographic features, symptoms, clinical signs, and radiograph results without a D-dimer assay. The other 3 studies evaluated outcomes after management that combined the results of a clinical prediction rule with the D-dimer (see Table 43-9).

IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION In the original publication, a weighted κ was available for a limited number of structured clinical models. In a recent small study of patients with suspected pulmonary embolism, 2 clinicians assessed the patient initially with the extended Wells model, and then, from the data collected, each clinician was asked to determine the pretest probability by applying the simplified Wells model. The weighted κ value for the extended Wells model was 0.54 (95% CI, 0.28-0.80) vs 0.6 (95% CI, 0.34-0.85) for the simplified Wells model. In the same substudy, there was less agreement between the extended clinical model and the pretest probability determined by clinical gestalt (weighted κ, 0.23; 95% CI, 0.05-0.42).5 These data suggest that the reproducibility of the clinical assessment with a structured clinical prediction rule is at best moderate but not dissimilar to the other components of the clinical examination.6 A reliability study of 153 patients7 (11% with pulmonary emboli, using helical computed tomography [CT]) assessed the simplified Wells criteria. The criteria had substantial agreement, with κ less than 0.70. The criterion of an “alternative diagnosis that is less likely than pulmonary embolism” had a κ of 0.58 (95% CI, 0.44-0.72), which is still considered moderate agreement. The weighted κ value for a low vs moderate vs high probability of pulmonary embolus, recalculated from the raw data displayed in the article, showed substantial agreement (0.62; 95% CI, 0.50-0.74). The results need confirmation in a larger sample of patients.
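The weighted κ values quoted above summarize how well 2 raters agree on an ordered scale (for example, low, moderate, or high pretest probability) after allowing for chance agreement. As a minimal sketch, and not the method used in any of the cited studies, the Python function below computes a weighted κ from a square table of paired ratings. The linear disagreement weights, the function name, and the counts in the usage lines are assumptions chosen purely for illustration.

```python
def weighted_kappa(table, quadratic=False):
    """Weighted kappa for a k x k table of paired ordinal ratings.

    table[i][j] is the number of patients placed in category i by rater A
    and in category j by rater B (categories ordered, e.g. low/moderate/high).
    Weights are linear by default (an assumption), quadratic on request.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # 0 on the diagonal, 1 for the most distant pair of categories
        d = abs(i - j) / (k - 1)
        return d * d if quadratic else d

    # Observed weighted disagreement
    observed = sum(disagreement(i, j) * table[i][j]
                   for i in range(k) for j in range(k)) / n
    # Weighted disagreement expected by chance from the marginal totals
    expected = sum(disagreement(i, j) * row_totals[i] * col_totals[j]
                   for i in range(k) for j in range(k)) / (n * n)
    return 1.0 - observed / expected


# Hypothetical counts for illustration only (NOT from any cited study)
example = [[40, 8, 2],
           [10, 25, 5],
           [1, 4, 5]]
print(round(weighted_kappa(example), 2))
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; the confidence intervals reported in the text would require a separate variance calculation that this sketch omits.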

CHAPTER 43

Update

Table 43-9 Likelihood Ratios for the Pretest Probability of Pulmonary Embolus Derived From the Clinical Gestalt or Structured Clinical Models

Source, y                No. of Patients    Model Tested                      Pretest Probability   Pretest Probability,   LR (95% CI)
                         (Prevalence, %)                                      Category              % (95% CI)
Perrier et al,1 2004     965 (23)           Geneva (with implicit override)   High                  85 (75-92)             19 (10-36)
                                                                              Moderate              34 (29-39)             1.7 (1.5-2.0)
                                                                              Low                   7 (5-9)                0.23 (0.17-0.31)
Miniati et al,2 2003     390 (41)           PISA-PED                          High                  100 (97-100)           297 (16-4746)
                                                                              Moderately high       86 (70-95)             8.6 (3.4-22)
                                                                              Intermediate          24 (16-34)             0.48 (0.31-0.74)
                                                                              Low                   3 (1-7.0)              0.04 (0.02-0.11)
Ten Wolde et al,3 2004   504 (20)           Empiric                           81%-100%              67 (52-81)             8.5 (4.6-16)
                                                                              51%-80%               29 (21-37)             1.6 (1.2-2.3)
                                                                              21%-50%               15 (11-19)             0.74 (0.57-0.93)
                                                                              0%-20%                8 (5-14)               0.37 (0.22-0.63)
Leclerq et al,4 2003     202 (29)           Wells extended                    High                  50 (32-68)             2.4 (1.3-4.5)
                                                                              Moderate              27 (17-39)             0.87 (0.56-1.4)
                                                                              Low                   25 (17-35)             0.79 (0.56-1.1)

Abbreviations: CI, confidence interval; LR, likelihood ratio; PISA-PED, Prospective Investigative Study of Acute Pulmonary Embolism Diagnosis.
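The LR for each probability category in Table 43-9 links the overall prevalence in a study to the prevalence within that category: posttest odds equal pretest odds multiplied by the LR. The short Python sketch below illustrates this relationship with the rounded percentages reported for Perrier et al (overall prevalence, 23%). The function names are hypothetical, and because the inputs are rounded percentages rather than raw counts, the results only approximate the tabulated LRs (19, 1.7, and 0.23).

```python
def odds(probability):
    """Convert a probability (0 to 1, exclusive) to odds."""
    return probability / (1.0 - probability)


def category_lr(category_prevalence, overall_prevalence):
    """LR implied by the prevalence within a category vs the whole cohort.

    Follows from: posttest odds = pretest odds x LR.
    """
    return odds(category_prevalence) / odds(overall_prevalence)


# Rounded values from Table 43-9 (Perrier et al): overall prevalence 23%
for label, prevalence in [("High", 0.85), ("Moderate", 0.34), ("Low", 0.07)]:
    print(label, round(category_lr(prevalence, 0.23), 2))
```

The published confidence intervals cannot be recovered this way; they depend on the underlying patient counts.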

Table 43-10 A Low to Moderate Clinical Probability With a Normal D-dimer Result Makes Pulmonary Emboli Unlikely

Source, y                Findings                                                 LR (95% CI)
Ten Wolde et al,3 2004   Clinical probability < 20% and normal D-dimer result     0 (0-0.32)
Perrier et al,1 2004     Moderate or low probability and normal D-dimer result    0 (0-0.13)
Leclerq et al,4 2003     Moderate or low probability and normal D-dimer result    0 (0-0.36)
Summary                                                                           0 (0-0.06)a

Abbreviations: CI, confidence interval; LR, likelihood ratio. a Summary LR is statistically homogeneous (P = .94).

CHANGES IN THE REFERENCE STANDARD Computed Tomography Angiography Despite recent advances in the visualization of pulmonary arteries with the advent of spiral CT, there are no well-designed clinical outcome studies validating its role as a stand-alone test in the treatment of unselected patients with suspected pulmonary embolism. Newer advances in this technology, with the advent of multidetector modalities, may improve scan acquisition times and image quality. At present, although spiral CT continues to improve in its diagnostic accuracy for pulmonary embolism, a negative spiral CT result by itself is still not sufficient to reliably exclude pulmonary embolism.9

D-dimer With a Rapid Enzyme-Linked Immunosorbent Assay Technique The results of a recent systematic review10 confirm that a negative D-dimer test result by itself may safely exclude pulmonary embolism. However, these data relate primarily to the enzyme-linked immunosorbent assay (ELISA) D-dimer testing format, which has superior sensitivity and negative LR compared with other D-dimer assays. Therefore, a negative D-dimer test result with the rapid ELISA format is as diagnostically useful as a negative lung scan result in patients with suspected pulmonary embolism who present with recent onset of symptoms.

Righini et al8 reanalyzed data from which the original Geneva rule was derived and also retrospectively calculated the pretest probability by applying the simplified Wells rule. The a priori hypothesis was that the discriminative ability of the clinical models would be lower in older patients compared with younger patients. There was no clinically or statistically significant effect of age (younger than 50 years, 51-74 years, and older than 75 years) on the discriminative value of either model.8 Because of heterogeneity in the earlier studies reported in the original Rational Clinical Examination article, we did not provide summary estimates for the prediction rules or D-dimer results. The 3 new studies that added D-dimer to the established prediction rules focused primarily on the utility of the D-dimer to rule out pulmonary emboli. The studies yielded more consistent, homogeneous results and provide the opportunity to create summary measures that are especially useful for understanding the role of a negative D-dimer result in ruling out pulmonary emboli (see Tables 43-10 and 43-11).

RESULTS OF THE LITERATURE REVIEW Miniati et al2 applied the Prospective Investigative Study of Acute Pulmonary Embolism Diagnosis (PISA-PED) structured model to assess 390 patients with suspected pulmonary embolism by categorizing them as having low, intermediate, moderately high, or high probability of pulmonary embolism. Pulmonary embolism was diagnosed or excluded by combining the pretest probability assessment with the results of perfusion lung scanning. Within these probability groups, the prevalence of pulmonary embolism was 3%, 24%, 86%, and 100%, respectively.2 These data confirm the accuracy of this model when used by this group of clinicians. To date, no other group has tested this clinical prediction rule. Because the structured model contains many more variables than other models, clinicians cannot apply the results directly without the use of a handheld calculator that contains the variables and their regression coefficients. Thus, the results are most useful for identifying the findings that are independently useful for diagnosis of pulmonary emboli.

The remaining 3 studies used the results of D-dimer testing combined with the clinical pretest probability assessment. In a study by Perrier et al,1 consecutive patients presenting to the emergency department with suspected pulmonary embolism were evaluated with the “Geneva rule.”11 Clinicians were allowed to override the pretest probability assessment if their clinical judgment disagreed with the prediction rule. The prediction rule and clinician override were done before any additional tests were obtained, including the D-dimer. All patients had a D-dimer test performed (rapid ELISA Vidas DD; BioMerieux, Marcy l’Etoile, France) and, if the assay result was negative, no further testing was performed, anticoagulant therapy was withheld, and patients were followed up for 3 months. Patients with a positive D-dimer test result underwent a preestablished standardized sequence of tests to exclude or confirm the diagnosis. The Geneva pretest probability score was available for only 771 patients of the total cohort of 965 patients; for the remaining 194 patients, clinicians used “implicit judgment” to assess pretest probability. Of the 771 patients who were evaluated with the Geneva rule, clinicians used their judgment to change the pretest probability in 179 (23%). The pretest probability was increased in 126 patients and decreased in 53 patients.
Overall, 7% (95% CI, 5%-9%) of patients with a low pretest probability had objectively confirmed pulmonary embolism compared with 35% (95% CI, 29%-39%) in the moderate- and 85% (95% CI, 75%-92%) in the high-pretest-probability groups. Strictly speaking, this study does not validate the accuracy of the Geneva rule. On the other hand, as the authors observe, allowing physicians to override the rule improves its acceptability to clinicians and makes clinical sense. The Geneva rule does not have a variable taking into account an alternative diagnosis, which might otherwise accommodate the “implicit override” feature. No patient with a moderate or low probability and a normal D-dimer result had a pulmonary embolus (LR, 0; 95% CI, 0-0.13).

Leclerq et al4 assessed 202 patients referred for clinically suspected pulmonary embolism. The clinical pretest probability for pulmonary embolism was formally documented with the extended Wells model12; subsequent investigations were based on the results of D-dimer testing (Tinaquant D-Dimer; Roche Diagnostics, Mannheim, Germany). Patients with a low or moderate pretest probability and a negative D-dimer test result were discharged without anticoagulant therapy and followed up for 3 months; none of these patients (0%; 95% CI, 0%-5.6%) had venous thromboembolism in follow-up. The remainder of patients underwent perfusion lung scanning, followed by bilateral compression ultrasonography of the legs if the lung scan result was nondiagnostic; when the ultrasonographic result was normal, pulmonary angiography was performed. The overall prevalence of pulmonary embolism was 29%; 25% (95% CI, 17%-35%) in patients with a low pretest probability, 26% (95% CI, 17%-39%) in the moderate-pretest-probability group, and 50% (95% CI, 32%-68%) in the high-pretest-probability group. These results show less discrimination than the original study by Wells et al12 but are consistent with another Dutch study.13

Finally, in a multicenter study,3 clinical gestalt or “informed intuition” was used to define the pretest probability of pulmonary embolism. This study group had previously used the Wells simple clinical prediction rule14 for assessing pretest probability and found it to be no more discriminatory than the pretest probability determined by an overall assessment of the clinical signs and symptoms, along with the results of basic investigation.13 A total of 631 patients were assessed by study physicians in 3 trial centers and were assigned to one of 4 pretest probability categories (0%-20%, 21%-50%, 51%-80%, 81%-100%); patients also had blood drawn for a D-dimer assessment after the clinical probabilities were assigned (Tinaquant D-Dimer; Roche Diagnostics). Patients with a low pretest probability for pulmonary embolism and a negative D-dimer test result were discharged without anticoagulant therapy and were followed up for 3 months. Clinicians were able to reliably discriminate between low-, intermediate-, moderate-, and high-pretest-probability groups, with the prevalence of pulmonary embolism in each of the 4 groups being 8% (95% CI, 5%-14%), 15% (95% CI, 11%-19%), 29% (95% CI, 21%-37%), and 67% (95% CI, 52%-81%). These data compare favorably with studies in which the pretest probability was assessed with a structured clinical model. However, these were experienced clinicians who had extensive and specific training in assessing patients with pulmonary embolism. No patient with a low-probability clinical assessment (0%-20%) and a normal D-dimer result had a pulmonary embolus (LR, 0; 95% CI, 0-0.32).

Table 43-11 Likelihood Ratios of an Abnormal D-dimer Result for a Pulmonary Embolus

Source, y               LR (95% CI)
Leclerq et al,4 2003    1.9 (1.6-2.2)
Perrier et al,1 2004    1.7 (1.5-1.8)
Summary                 1.7 (1.6-1.8)a

Abbreviations: CI, confidence interval; LR, likelihood ratio. a Summary LR is statistically homogeneous (P = .09).

One of the major criticisms of 2 of the structured models (extended12 and simplified14 Wells) is the need to specify or weight the likelihood of an alternative diagnosis apart from pulmonary embolism, introducing a global assessment of the probability of pulmonary embolism, not unlike the gestalt pretest assessment. This variable has been the most problematic in terms of its reliability.5 Therefore, a third model, the Geneva rule, now encompasses this variable, in part by allowing physicians to upgrade or downgrade the pretest probability with an implicit override. Pragmatically, this makes sense and may improve the acceptance of this model among clinicians.


There has been considerable controversy with respect to the association between risk of venous thromboembolism and airline travel. Well-designed case-control studies suggest an odds ratio of 2 for the association.6 Studies in selected patients6-9 suggest the absolute risk for venous thromboembolism increases with duration of travel, the presence of thrombophilia, and use of estrogen-containing therapy. There are conflicting data on the true incidence of venous thrombosis within the traveling public, with estimates ranging from as low as 1.6 events per million passengers to as high as 10% for asymptomatic, ultrasonographically detected, calf vein thrombosis.5-9

EVIDENCE FROM GUIDELINES Recent guidelines from the British Thoracic Society support assessing and formally documenting the pretest probability for pulmonary embolism, but they do not specifically advocate a structured model approach over the clinical gestalt.15 The guideline reiterates the importance of establishing the pretest probability before reviewing the results of a ventilation-perfusion lung scan or the results of a D-dimer test. This process can identify low-risk patients who do not need further testing.

CLINICAL SCENARIO—RESOLUTION A 25-year-old woman presents to the emergency department with pleuritic chest pain, having just returned home after a 12-hour plane flight. Your examination confirms the coryza and absence of tachypnea (16/min). The remainder of the examination results, including that for the legs, are unremarkable. A pregnancy test result is negative, and a chest radiograph result is normal. Results of an arterial blood gas analysis do not show hypoxia, and an electrocardiogram result is normal. This woman poses a challenge to the assessment of the pretest probability for pulmonary embolism, largely because of the uncertainty of the magnitude of the risk for venous thrombosis after airplane travel. According to the simplified Wells model, her pretest probability for pulmonary embolism is low (absence of tachycardia, active cancer, and signs of deep vein thrombosis; no history of venous thrombosis; and no hemoptysis). The overall clinical assessment suggests that the young woman is more likely to have viral pleurisy. A low pretest probability (3%-5%), combined with a positive D-dimer result (MDA D-Dimer; BioMerieux, Inc, Durham, North Carolina) (positive LR, 1.7),16 places her posttest probability for pulmonary embolism at 8%. A ventilation-perfusion lung scan result is normal (LR for pulmonary embolism with a normal lung scan result, 0.1).17 Her posttest probability of pulmonary embolism is less than 2%. If you had chosen a pretest “low” probability of as much as 15%, the posttest probability would still be low, at 2.9%, after the positive D-dimer result and normal ventilation-perfusion scan result. See the “Make the Diagnosis” section that follows.
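The arithmetic in the scenario can be retraced by converting the pretest probability to odds, multiplying by each LR in turn, and converting back to a probability. The Python sketch below is illustrative only; the function name is hypothetical, and chaining the D-dimer and lung scan LRs assumes that the 2 test results are conditionally independent, an assumption the text does not examine.

```python
def posttest_probability(pretest, *likelihood_ratios):
    """Apply one or more likelihood ratios to a pretest probability.

    pretest is a probability between 0 and 1 (exclusive). Multiplying
    several LRs together assumes the tests are conditionally independent.
    """
    odds = pretest / (1.0 - pretest)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


# Scenario values from the text: positive D-dimer LR 1.7, normal lung scan LR 0.1
after_d_dimer = posttest_probability(0.05, 1.7)
after_scan = posttest_probability(0.05, 1.7, 0.1)
conservative = posttest_probability(0.15, 1.7, 0.1)
print(f"{after_d_dimer:.1%} {after_scan:.1%} {conservative:.1%}")
```

With a 5% pretest probability, the result after the positive D-dimer is about 8%, and after the normal lung scan it falls below 2%; starting from 15%, the final value is about 2.9%, in line with the figures quoted in the scenario.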

REFERENCES FOR THE UPDATE
1. Perrier A, Roy PM, Aujesky D, et al. Diagnosing pulmonary embolism in outpatients with clinical assessment, D-dimer measurement, venous ultrasound and helical computed tomography: a multicenter management study. Am J Med. 2004;116(5):291-299.a
2. Miniati M, Monti S, Bauleo C, et al. A diagnostic strategy for pulmonary embolism based on standardised pretest probability and perfusion lung scanning: a management study. Eur J Nucl Med Mol Imaging. 2003;30(11):1450-1456.a
3. Ten Wolde M, Hagen PJ, Macgillavry MR, et al. Non-invasive diagnostic work-up of patients with clinically suspected pulmonary embolism: results of a management study. J Thromb Haemost. 2004;2(7):1110-1117.a
4. Leclerq MGL, Lutisan JG, van Marwijk M, et al. Ruling out clinically suspected pulmonary embolism by assessment of clinical probability and D-dimer levels: a management study. Thromb Haemost. 2003;89(1):97-103.a
5. Hughes RJ, Hopkins RJ, Hill S, et al. Frequency of venous thromboembolism in low to moderate risk long distance air travellers: the New Zealand Air Travellers Thrombosis Study (NZATT). Lancet. 2003;362(9401):239-244.
6. Cook DJ. Clinical assessment of central venous pressure in the critically ill. Am J Med Sci. 1990;299(3):175-178.
7. Wolf SJ, McCubbin TR, Feldhaus KM, Faragher JP, Adcock DM. Prospective validation of Wells criteria in the evaluation of patients with suspected pulmonary embolism. Ann Emerg Med. 2004;44(5):503-510.
8. Righini M, le Gal G, Perrier A, Bounameaux H. Effect of age on the assessment of clinical probability of pulmonary embolism by prediction rules. J Thromb Haemost. 2004;2(7):1206-1208.
9. Stijen MJL, De Monye W, Schiereck J, et al. Single-detector helical computed tomography as the primary diagnostic test in suspected pulmonary embolism: a multi-centre clinical management study of 510 patients. Ann Intern Med. 2003;138(4):307-314.
10. Stein PD, Hull RD, Patel K, et al. D-dimer for the exclusion of acute venous thrombosis and pulmonary embolism: a systematic review. Ann Intern Med. 2004;140(8):589-602.
11. Wicki J, Perneger TV, Junod AF, Bounameaux H, Perrier A. Assessing the clinical probability of pulmonary embolism in the emergency ward. Arch Intern Med. 2001;161(1):92-97.
12. Wells PS, Ginsberg JS, Anderson DR, et al. Use of a clinical model for safe management of patients with suspected pulmonary embolism. Ann Intern Med. 1998;129(12):997-1005.
13. Sanson BJ, Lijmer LG, Macgillavry M, Turkstra F, Prins MH, Buller HR; ANTELOPE Study Group. Comparison of a clinical pretest probability estimate and two clinical models in patients with suspected pulmonary embolism. Thromb Haemost. 2000;83(2):199-203.
14. Wells PS, Anderson DR, Rodger M, et al. Derivation of a simple clinical model to categorise patients with a probability of pulmonary embolism: increasing the models utility with SimpliRED D-dimer. Thromb Haemost. 2000;83(3):416-420.
15. British Thoracic Society Standards of Care Committee Pulmonary Embolism Guideline Development Group. British Thoracic Society guidelines for the management of suspected acute pulmonary embolism. Thorax. 2003;58(6):470-483.
16. Bates SM, Kearon CJ, Crowther M, et al. A diagnostic strategy involving a quantitative latex D-dimer assay reliably excludes deep vein thrombosis. Ann Intern Med. 2003;138(10):787-794.
17. Jaeschke R, Guyatt GH, Sackett DL; Evidence Based Medicine Working Group. User’s guide to the medical literature, III: how to use an article about a diagnostic test, B: what are the results and will they help me in caring for my patients? JAMA. 1994;271(9):703-707.
18. Cushman M, Tsai AW, White RH, et al. Deep vein thrombosis and pulmonary embolism in two cohorts: the longitudinal investigation of thromboembolism etiology. Am J Med. 2004;117(1):19-25.
19. Chunilal SD, Eikelboom JW, Attia J, et al. The rational clinical examination: does this patient have pulmonary embolism? JAMA. 2003;290(21):2849-2858.
20. Bates SM, Ginsberg JS. Treatment of deep vein thrombosis. N Engl J Med. 2004;351(3):268-277.
21. Tapson V, Carrol B, Davidson BL, et al. The diagnostic approach to acute venous thromboembolism: clinical practice guideline. Am J Respir Crit Care Med. 1999;160(3):1043-1066.

a For the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.


PULMONARY EMBOLUS—MAKE THE DIAGNOSIS

PRIOR PROBABILITY Venous thrombosis occurs in 1 to 2 persons per 1000 person-years, with approximately one-half to one-third of these episodes from pulmonary embolism.18 In published studies, the prevalence of pulmonary embolism in patients who present with a clinical suspicion ranges from 9% to more than 30%,19 which undoubtedly relates to a combination of factors, including differences in referral patterns and health practices among countries, as well as differences in patient populations. The prior probability of a pulmonary embolus is determined from the clinical findings. Although studies vary in the prevalence of disease, a useful guideline would be to think of “low probability” as approximately less than 15% and “moderate probability” as 15% to 35%.

POPULATION FOR WHOM PULMONARY EMBOLUS SHOULD BE CONSIDERED Patients who have had recent major surgery, major trauma, immobility, or active malignancy are some of the highest-risk groups within the general population, with relative risks varying from 5 to 200.20 The most common presenting symptoms of pulmonary embolism are new or worsening dyspnea, acute chest pain, and, less frequently, cough, fainting, or hemoptysis. Tachypnea and tachycardia, the most common signs of pulmonary embolism, occur frequently with exacerbations of chronic obstructive lung disease, congestive cardiac failure, and pneumonia, which highlights the poor specificity of these signs.21

DETECTING THE LIKELIHOOD OF PULMONARY EMBOLUS Use a structured model to assess the pretest probability of pulmonary emboli. The simplified Wells scoring system may be the easiest to use in clinical practice, shows good reliability, and requires no laboratory tests or radiographs (see Table 43-12). Establishing the pretest probability before, and not after, reviewing the results of a sensitive D-dimer test will identify patients at very low risk for pulmonary emboli (see Table 43-13). When there is discordance between clinician gestalt and a clinical prediction rule, most experts would place the patient

Table 43-12 Simplified Wells Scoring System

Findings in the Simplified Wells Scoring System                                                                 Score a
Clinical signs/symptoms of DVT of the leg (minimum of leg swelling and pain with palpation of the deep veins)   3.0
No alternate diagnosis that is as likely as or more likely than a pulmonary embolus                             3.0
Heart rate > 100/min                                                                                            1.5
Immobilization or surgery in the last 4 weeks                                                                   1.5
History of DVT or PE                                                                                            1.5
Hemoptysis                                                                                                      1.0
Cancer actively treated in the past 6 mo                                                                        1.0

Abbreviations: DVT, deep vein thrombosis; PE, pulmonary embolism. a Category scores determined by the sum of the individual scores: low, < 2; moderate, 2 to 6; high, > 6. Adapted from Chunilal et al.19
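Because the simplified Wells score is a plain sum of weighted findings, it is straightforward to tabulate. The Python sketch below encodes the point values from Table 43-12. The dictionary keys and function names are hypothetical, and the category cutoffs (low, less than 2; moderate, 2 to 6; high, more than 6) are an assumption based on the thresholds commonly cited for this score; they should be checked against the original publication before any clinical use.

```python
# Point values from Table 43-12; the key names are illustrative assumptions.
WELLS_POINTS = {
    "dvt_signs_or_symptoms": 3.0,
    "pe_as_likely_or_more_likely_than_alternative": 3.0,
    "heart_rate_over_100": 1.5,
    "immobilization_or_surgery_last_4_weeks": 1.5,
    "prior_dvt_or_pe": 1.5,
    "hemoptysis": 1.0,
    "cancer_treated_last_6_months": 1.0,
}


def wells_score(findings):
    """Sum the points for the findings that are present."""
    return sum(WELLS_POINTS[name] for name in findings)


def wells_category(score):
    """Assumed cutoffs: low < 2, moderate 2 to 6, high > 6."""
    if score < 2:
        return "low"
    if score <= 6:
        return "moderate"
    return "high"


# The patient in the clinical scenario has none of the listed findings.
print(wells_category(wells_score([])))
# A patient with tachycardia and signs of DVT scores 4.5 points.
print(wells_category(wells_score(["heart_rate_over_100", "dvt_signs_or_symptoms"])))
```

Keeping the point values in a single table makes it easy to audit the sketch against the published scoring system.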

Table 43-13 The Likelihood Ratios for Pulmonary Embolus for the Combination of Clinical Probability Estimate With the D-dimer Result Clinical Probability

D-dimer

LR (95% CI)

Any probability (2 studies) Abnormal 1.7 (1.6-1.8) Low ( 63 y Thrombophlebitis (ever) Dyspnea (sudden onset) Chest pain Hemoptysis ECG signs of acute right ventricular overload Radiographic findings Oligemia Amputation of the hilar artery Consolidation (infarction)

Decreased the Likelihood of PE Preexisting cardiovascular disease Preexisting pulmonary disease Temperature > 38°C

Radiographic findings
Consolidation (no infarction)
Pulmonary edema

Probability From Model of PE (Calculated From the Regression Model)a    LR (95% CI)
High (>90%)                                                             297 (16-4746)
Moderately high (50% to ≤90%)                                           8.6 (3.4-22)
Intermediate (10% to ≤50%)                                              0.48 (0.31-0.74)
Low (≤10%)                                                              0.04 (0.02-0.11)

Abbreviations: CI, confidence interval; ECG, electrocardiogram; LR, likelihood ratio; PE, pulmonary embolism. a Data kindly provided by Massimo Miniati, MD, PhD.

[10%]). Only 1 patient had a pulmonary embolus not detected during the initial evaluation. The regression model had a large number of demographics, risk factors, symptoms, signs, and electrocardiographic and radiographic findings (see Table 43-17).

CONCLUSIONS
LEVEL OF EVIDENCE Level 3.
STRENGTHS Strict adherence to the diagnostic criteria to objectively exclude or prove the diagnosis of pulmonary embolism, as well as a 1-year follow-up for those patients in whom the initial evaluation result was negative. The clinical model and the perfusion scans were interpreted independently.
LIMITATIONS This is a single-center study in which the clinical prediction rule was applied by one of 12 highly specialized observers. Therefore, the generalizability of this model to other centers and observers remains to be proven. Methodologically, there was incorporation bias in which the results of the model were used as part of the reference standard. However, these criteria were specified in advance and patients who did not meet the criteria required angiography.

Clinicians collect clinical data that, when incorporated into a prediction model, are useful in stratifying patients’ likelihood of a pulmonary embolus. The prediction model requires entry of the data into either a spreadsheet or handheld calculator to derive the probability. These results are most helpful for identifying the clinical findings that are independently useful.

Reviewed by Sanjeev Chunilal, MB ChB, FRACP, FRCPA

REFERENCES FOR THE EVIDENCE
1. Miniati M, Monti S, Bottai M. A structured clinical model for predicting the probability of pulmonary embolism. Am J Med. 2003;114(3):173-179.
2. Miniati M, Pistolesi M, Marini C, et al. Value of perfusion lung scan in the diagnosis of pulmonary embolism: results of the Prospective Investigative Study of Acute Pulmonary Embolism Diagnosis (PISA-PED). Am J Respir Crit Care Med. 1996;154(5):1387-1393.

TITLE Diagnosing Pulmonary Embolism in Outpatients With Clinical Assessment, D-dimer Measurement, Venous Ultrasound, and Helical Computed Tomography: A Multicenter Management Study.
AUTHORS Perrier A, Roy PM, Aujesky D, et al.
CITATION Am J Med. 2004;116(5):291-299.
QUESTION What is the efficiency of a diagnostic strategy for venous thromboembolism that combines clinical assessment, plasma D-dimer, lower limb venous ultrasonography, and helical computed tomography (CT)?
DESIGN A prospective cohort study.
SETTING Two Swiss hospitals and 1 French hospital.
PATIENTS Patients presenting to the emergency department with suspected pulmonary embolism (PE) were prospectively enrolled, using predefined criteria: acute onset of new or worsening shortness of breath or chest pain without another obvious etiology. Nine hundred sixty-five patients were enrolled from October 1, 2000, to June 30, 2002.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Physicians assessed the clinical pretest probability for PE by collating the results of a standardized clinical scoring system.1 The scoring system categorized patients as “low,” “intermediate,” or “high” probability. When physicians disagreed with the results of the prediction rule, they could “override” this by recording their implicit clinical judgment. The probability estimate was recorded with knowledge of the arterial blood gas and chest radiograph results but no other laboratory or radiographic study results. The diagnostic standard for excluding PE consisted of a negative D-dimer result (enzyme-linked immunosorbent assay), a negative helical CT scan result, and compression ultrasonography of the legs when combined with a low or moderate clinical probability or a negative pulmonary angiography result and no subsequent recurrent venous thrombosis during a 3-month follow-up. Pulmonary embolism was diagnosed according to a E43-3

CHAPTER 43

Evidence to Support the Update

positive helical CT scan result, positive result for compression ultrasonography of the legs (proximal deep vein thrombosis), or a positive pulmonary angiogram result.

MAIN OUTCOME MEASURES The proportion of patients in whom a definitive diagnosis of PE could be made without the need for pulmonary angiography and the risk of venous thromboembolism in patients who had anticoagulants withheld because the strategy excluded PEs. We calculated likelihood ratios from data provided in the tables (see Tables 43-18, 43-19, and 43-20).

MAIN RESULTS Pulmonary embolism was diagnosed in 222 of 965 (23%) patients, with only 2.7% of patients requiring pulmonary angiography for a definitive diagnosis. A total of 194 (20%) patients did not have the standardized scoring system applied to assess pretest probability because of incomplete data. For these patients, physicians assigned an implicit pretest probability assessment. On the other hand, there was disagreement between the standardized pretest probability assessment and physicians’ implicit judgment in 179 patients (23%), with 70% of these instances requiring upgrading of the clinical score. The likelihood ratios for the clinical score alone (Table 43-18), the D-dimer result alone (Table 43-19), and the combination of the clinical score with the D-dimer result (Table 43-20) can be calculated.

Table 43-18 Likelihood Ratios for the Probability of Pulmonary Emboli According to a Clinical Score
Probability, All Patients | LR (95% CI)
High | 19 (10-36)
Moderate | 1.7 (1.5-2.0)
Low | 0.23 (0.17-0.31)
Abbreviations: CI, confidence interval; LR, likelihood ratio.

Table 43-19 Likelihood Ratio of the D-dimer Result for Pulmonary Emboli
All Patients | LR (95% CI)
D-dimer result positive | 1.7 (1.5-1.8)
D-dimer result negative | 0 (0-0.08)
Abbreviations: CI, confidence interval; LR, likelihood ratio.

Table 43-20 Likelihood Ratio of the D-dimer Result Among Patients With a Moderate or Low Clinical Probability Estimate for Pulmonary Emboli
Moderate or Low Clinical Probability | LR (95% CI)
D-dimer result positive | 1.6 (1.5-1.7)
D-dimer result negative | 0 (0-0.13)
Abbreviations: CI, confidence interval; LR, likelihood ratio.

CONCLUSIONS LEVEL OF EVIDENCE Level 3. STRENGTHS A prospective, multicenter, multinational study with a large number of patients. LIMITATIONS Twenty percent of potentially eligible patients were excluded according to a number of predefined criteria, though the most frequent reason was a protocol violation. The large number of exclusions makes the patient population nonconsecutive, which is a potentially important limitation if the exclusions were among patients who had a normal D-dimer result. Nearly 40% of study patients did not have a standardized clinical assessment because of the absence of arterial blood gas results or because the standardized score was revised by implicit clinical judgment. According to the presenting feature of acute chest pain or dyspnea, patients with deep vein thrombi confirmed by ultrasonography were assumed to have PE without further studies. The clinical scoring system and diagnostic algorithm used in this study are primarily applicable to outpatients with recent onset of worsening or new symptoms.

The standardized scoring system, occasionally overridden by clinical judgment, was good at identifying patients most likely to have a PE. However, the focus of this study was on identifying patients without PE so that additional studies and treatment could be avoided. A normal D-dimer result appears better than the scoring system and clinical judgment. However, because the scoring system and clinical judgment were applied first to identify the eligible patients, the D-dimer should be applied in light of the clinical findings. An intermediate or low probability of PE, combined with a normal D-dimer result, was efficient at identifying patients with a low likelihood of an embolus. Given the prior probability of 22% in this study, taking the upper end of the 95% confidence interval (CI) for intermediate–low probability patients and a normal D-dimer result (upper 95% CI likelihood ratio, 0.13) yields a maximum probability of 3.5%.

Reviewed by Sanjeev Chunilal, MB ChB, FRACP, FRCPA

REFERENCE FOR THE EVIDENCE
1. Wicki J, Perneger TV, Junod A, Bounameaux H, Perrier A. Assessing the clinical probability of pulmonary embolism in the emergency ward with a simple score. Arch Intern Med. 2001;161(1):92-97.
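The 3.5% figure in the conclusion above follows from converting the pretest probability to odds, multiplying by the likelihood ratio, and converting back. A minimal Python sketch of that arithmetic is shown here; the function name `post_test_probability` is illustrative and does not come from the source.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability via the odds form of Bayes' rule."""
    if not 0.0 <= pretest_probability < 1.0:
        raise ValueError("pretest_probability must be in [0, 1)")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Values quoted in the text: prior probability 22%, upper 95% CI bound of the LR 0.13.
worst_case = post_test_probability(0.22, 0.13)
print(f"Maximum post-test probability: {worst_case:.1%}")  # prints 3.5%
```

Using the upper confidence limit of the likelihood ratio rather than its point estimate (0) gives a conservative, worst-case probability, which is the reading the text intends.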

CHAPTER 43

Pulmonary Embolus

TITLE Non-invasive Diagnostic Work-up of Patients With Clinically Suspected Pulmonary Embolism: Results of a Management Study.

AUTHORS Ten Wolde M, Hagen PJ, Macgillavry MR, et al; Advances in New Technologies Evaluating the Localization of Pulmonary Embolism Study Group.

CITATION J Thromb Haemost. 2004;2(7):1110-1117.

QUESTION Does a diagnostic algorithm safely reduce the need for ventilation perfusion lung scintigraphy and pulmonary angiography in patients who have a low clinical pretest probability for pulmonary embolism and a negative D-dimer test result?

DESIGN Prospective cohort study.

SETTING Three teaching hospitals in The Netherlands.

PATIENTS Six hundred thirty-one consecutive inpatients and outpatients enrolled from May 1999 to April 2001.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Patients were all assessed and assigned a clinical pretest probability (≤20%, 21%-50%, 51%-80%, and >80%), which was determined by the responsible physician taking into account the patient’s medical history, findings on physical examination, and the results of routine investigations. All the clinicians were specifically trained to assess patients with suspected pulmonary embolism. All patients had a plasma D-dimer test (rapid immunoturbidimetric assay; Tina-quant; Roche Diagnostics, Mannheim, Germany) performed after the clinical probability was established. Pulmonary embolism was considered excluded in patients with a low clinical pretest probability combined with a negative D-dimer result, a normal ventilation perfusion lung scan result, or nondiagnostic lung scan with negative serial compression testing result of the lower limbs when these patients remained venous thrombosis free at a 3-month follow-up (not receiving anticoagulants). Patients were diagnosed as having pulmonary embolism according to a high-probability lung scan, positive pulmonary angiography result, or a positive result for compression ultrasonographic examination of the legs. Data collected included patient demographics, patient pretest probability assessment, and D-dimer results, as well as the results of objective tests (ventilation/perfusion lung scan, compression ultrasonographic examination of the legs, and pulmonary angiography).

MAIN OUTCOME MEASURES The primary safety outcome was the incidence of confirmed venous thrombosis in patients who had venous thrombosis initially excluded.

MAIN RESULTS Of 466 patients in whom pulmonary embolism was considered excluded at presentation, 1.3% (95% confidence interval, 0.5%-2.8%) had a subsequent venous thromboembolus. Among the low-pretest-probability group, 95 patients also had a negative D-dimer result, and none of these patients had confirmed recurrence during the subsequent 3 months. A low clinical probability and a normal D-dimer result appeared to rule out pulmonary emboli (Table 43-21). Within the entire cohort, 20% of patients had confirmed pulmonary embolism, with the prevalence of disease increasing along with increasing clinical pretest probability. The corresponding rates of pulmonary embolism for the low-, intermediate-, moderate-, and high-pretest groups were statistically different, at 8%, 15%, 29%, and 67%, respectively, confirming that these experienced clinicians using clinical gestalt were able to accurately categorize patients.

Table 43-21 Likelihood Ratio of the D-dimer Result Combined With the Clinical Probability Estimate for Pulmonary Emboli
Clinical Probability and D-dimer Result | LR (95% CI)
Probability > 20% or D-dimer result abnormal | 1.3 (1.2-1.3)
Probability < 20% and D-dimer result normal | 0 (0-0.32)
Abbreviations: CI, confidence interval; LR, likelihood ratio.
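The main results above report a small event proportion with a 95% confidence interval (1.3%; 0.5%-2.8%) and a subgroup with no events at all (0 of 95). The sketch below shows one common way to obtain such intervals, the exact (Clopper-Pearson) binomial method, in plain Python. Two points are assumptions rather than facts from the source: the event count of 6 is inferred from 1.3% of 466, and the study may have used a different interval method.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(func, target: float, iterations: int = 100) -> float:
    """Bisection on [0, 1] for a function that increases with p."""
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2
        if func(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def exact_binomial_ci(events: int, n: int, confidence: float = 0.95):
    """Clopper-Pearson interval for a proportion events / n."""
    alpha = 1 - confidence
    # Lower limit solves P(X >= events) = alpha / 2; it is 0 when no events occurred.
    lower = 0.0 if events == 0 else _solve_increasing(
        lambda p: 1 - binomial_cdf(events - 1, n, p), alpha / 2)
    # Upper limit solves P(X <= events) = alpha / 2; it is 1 when every patient had an event.
    upper = 1.0 if events == n else _solve_increasing(
        lambda p: 1 - binomial_cdf(events, n, p), 1 - alpha / 2)
    return lower, upper


# Assumed count: 6 events among 466 patients (about 1.3%).
low, high = exact_binomial_ci(6, 466)
print(f"6/466: {low:.1%} to {high:.1%}")

# No events among the 95 low-probability, D-dimer-negative patients.
low0, high0 = exact_binomial_ci(0, 95)
print(f"0/95: {low0:.1%} to {high0:.1%}")
```

The zero-event case illustrates why "none of these patients" is not the same as zero risk: with only 95 patients, the upper limit of the interval remains several percent.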

CONCLUSIONS LEVEL OF EVIDENCE Level 2. STRENGTHS Prospective data collection with probability estimate established before the D-dimer. LIMITATIONS The focus of this study was on the ability of the D-dimer to rule out pulmonary emboli. In this sense, “rule out” should be interpreted literally because the goal was to determine whether treatment could be withheld for patients with a low probability (

Source | Selection Criteria | Total No. of Participants (% of Women) | Mean Age, y | Index Tests
 | 65 y; previous surgery of shoulder; interaction with complaints in elbow or neck | 71 (45) | NA | Active compression test; apprehension test; clunk test; lift-off test; load-and-shift test; posterior stress test; release test; relocation test; resistance test external rotation; test of Speed; sulcus sign
Zaslav,31 2001 | Shoulder surgery after failure of conservative treatment; positive Neer overhead sign | 110 (41)c | 44 | Internal rotation resistance strength test
Reference Test Surgery; Prospective Design
Bennett,30 1998 | Surgery for shoulder pain | 45 (31) | NA | Test of Speed
Cofield et al,33 1993 | Surgery after referral for suspected recurrent instability | 55 (27) | 29 | Laxity tests under anesthesia in anterior, posterior, inferior, anterior-inferior and posterior-inferior direction
Gross and Distefano,35 1997 | Subluxation or gross dislocation on examination under anesthesia; abnormal excursion during arthroscopic examination; Hill-Sachs lesion or Bankart lesion | 82 (38)d | 37 | Anterior release test
O’Brien et al,29 1998 | Shoulder pain | 268 (NA) | NA | Active compression test
Oliashirazi et al,37 1999 | Shoulder surgery for unilateral traumatic recurrent anterior instability | 30 (17) | 23 | Laxity tests under anesthesia in anterior, posterior, inferior, anterior-inferior and posterior-inferior direction
Speer et al,19 1994 | Shoulder surgery; subtle anterior instability. Exclusion: treatable/observable rotator cuff lesions; multidirectional instability | 100 (NA) | NA | Relocation test; apprehension test

Limitationsa

b, d, f, g

e

a, b, c, d, f, g, h a, b, e, f

a, b, e b, d, e

a a, b, c, f

a, b, f, h

a

b

a, b a, b, e a, b, e, f

a, b, c, d, e, f, g, h a, e, f

a,e

Abbreviations: MRI, magnetic resonance imaging; NA, not available; SLAP, superior labrum anterior posterior.
a Limitations pertaining to all listed studies: spectrum bias possible, patient on the list for surgery or arthroscopy, and blinding unclear; the reference test might have been interpreted with knowledge of the index test or vice versa. Key to limitations: (a) Selection criteria for waiting list entry not described. (b) Disease progression bias possible; time between index and reference test not described. (c) Partial verification bias; part of the sample did not receive the reference test. (d) Incorporation bias; results of index test are used to establish the final diagnosis. (e) The execution of the reference test was not described, causing problems with study replication. (f) Unclear whether same clinical data (radiography, MRI, or other diagnostic tools) would be available in daily practice. (g) Unclear whether uninterpretable or intermediate test results were reported. (h) Unclear whether all patients entering study were accounted for (withdrawals). Limitations of the studies were determined with the Quality Assessment of Diagnostic Accuracy Studies standardized checklist.27
b An additional 178 patients retrospectively excluded for various reasons.
c Five patients removed from cohort according to physical findings.
d An additional 18 patients retrospectively excluded for dual diagnoses.


CHAPTER 44

Shoulder Instability

Table 44-3 Diagnostic Accuracy of Physical Examination for Instability of the Shoulder
Index Test and Source | Diagnosis | No. of Shoulders | Sensitivitya | Specificitya | LR+ (95% CI) | LR– (95% CI)
Provocation Tests
Apprehension test
T’Jonck et al,38 2001 | Instability | 72 | 0.88 (23/26) | 0.50 (23/46) | 1.8 (1.3-2.5) | 0.23 (0.08-0.69)
Speer et al,19 1994, Pain | Subtle anterior instability | 100 | 0.54 | 0.44 | …b | …
Speer et al,19 1994, Apprehension | Subtle anterior instability | 100 | 0.68 | 1.0 | … | …
Relocation test
T’Jonck et al,38 2001 | Instability | 72 | 0.85 (22/26) | 0.87 (40/46) | 6.5 (3.0-14) | 0.18 (0.07-0.45)
Speer et al,19 1994, Pain | Subtle anterior instability | 100 | 0.30 | 0.58 | … | …
Speer et al,19 1994, Apprehension | Subtle anterior instability | 100 | 0.57 | 1.0 | … | …
Clunk test
T’Jonck et al,38 2001 | Instability | 72 | 0.35 (9/26) | 0.98 (45/46) | 16 (2.1-119) | 0.67 (0.5-0.89)
Anterior release test
T’Jonck et al,38 2001 | Instability | 72 | 0.85 | 0.87 | … | …
Gross and Distefano,35 1997 | Occult instability | 100 | 0.92 (34/37) | 0.89 (40/45) | 8.3 (3.6-19) | 0.09 (0.03-0.27)
Laxity Tests
Load and shift posterior test
T’Jonck et al,38 2001 | Instability | 72 | 0 (0/26) | 1.0 (46/46) | 1.7 (0-83) | 0.99 (0.93-1.1)
Sulcus sign
T’Jonck et al,38 2001 | Instability | 72 | 0.31 (8/26) | 0.89 (41/46) | 2.8 (1.0-7.7) | 0.78 (0.59-1.0)
Load and shift anterior test
T’Jonck et al,38 2001 | Instability | 72 | 0.54 (14/26) | 0.78 (36/46) | 2.5 (1.3-4.8) | 0.59 (0.38-0.92)
Examination under anesthesia
Cofield et al,33 1993 | Instability | 55 | 1.0 (25/25) | 0.93 (28/30)c | 13 (3.9-43) | 0.02 (0-0.31)
Oliashirazi et al,37 1999 | Anterior instability | 60 | 0.83 (25/30) | 1.0 (30/30) | 51 (3.2-80) | 0.18 (0.08-0.38)
Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
a If data of the 2 × 2 table were presented in the study, the sensitivity and specificity calculations are shown in parentheses.
b Ellipses indicate data not available.
c The healthy contralateral shoulders of the subjects (n = 30) were used as control. Hence, the specificity value and likelihood ratios have been presumably overestimated.
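The likelihood ratios in Table 44-3 can be recomputed from the 2 × 2 counts shown in parentheses. The sketch below uses the standard log-scale standard error for a ratio of two proportions; this is an assumed method, since the chapter does not state how its intervals were derived, and the function name is illustrative. It does not handle cells equal to zero, which need a continuity correction or another approach.

```python
from math import exp, log, sqrt


def likelihood_ratios(tp: int, fn: int, tn: int, fp: int, z: float = 1.96):
    """Return (LR+, CI) and (LR-, CI) from 2 x 2 counts using a log-scale normal approximation."""
    if min(tp, fn, tn, fp) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    diseased, healthy = tp + fn, tn + fp
    sensitivity, specificity = tp / diseased, tn / healthy

    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity

    # Standard errors of ln(LR) for a ratio of two independent proportions.
    se_pos = sqrt(1 / tp - 1 / diseased + 1 / fp - 1 / healthy)
    se_neg = sqrt(1 / fn - 1 / diseased + 1 / tn - 1 / healthy)

    ci_pos = (exp(log(lr_pos) - z * se_pos), exp(log(lr_pos) + z * se_pos))
    ci_neg = (exp(log(lr_neg) - z * se_neg), exp(log(lr_neg) + z * se_neg))
    return (lr_pos, ci_pos), (lr_neg, ci_neg)


# Relocation test, T'Jonck et al: sensitivity 22/26, specificity 40/46.
(lr_pos, ci_pos), (lr_neg, ci_neg) = likelihood_ratios(tp=22, fn=4, tn=40, fp=6)
print(f"LR+ {lr_pos:.1f} ({ci_pos[0]:.1f}-{ci_pos[1]:.0f})")
print(f"LR- {lr_neg:.2f} ({ci_neg[0]:.2f}-{ci_neg[1]:.2f})")
```

For this row the results should land close to the tabulated 6.5 (3.0-14) and 0.18 (0.07-0.45); small differences in the last digit are possible if the original analysis used a different interval method or rounding.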

test were missing or unclear in 9 studies.16,19,23,28,29,32,33,35,37 Furthermore, in 16 studies it was unclear whether the examiner of the reference test was blinded for the index test16,18,19,21-23,28-36,38; in 1 study it was evident that the examiner was not blinded.37 These methodologic problems complicate reproduction of study results and may have biased the outcome.

CLINICAL SCENARIO—RESOLUTION Primary care physicians may consider the diagnosis of instability with or without a labral tear for this 24-year-old. The history of trauma at a young age and recurrent shoulder problems associated with a symptom that might have represented an acute dislocation (pop with an excessive stretch) mean that the attending physician may consider clinical tests to assess for instability and labral tears, but diagnostic accuracy would still be uncertain. Because the patient might opt for surgery to prevent recurrent dislocation, the primary care physician might consult an orthopedist to confirm the diagnosis and optimal management strategies for this patient’s case.

THE BOTTOM LINE The available evidence suggests that the relocation test and the anterior release test are best for establishing diagnosis of instability. For labral tears, the biceps load I and II tests, the pain provocation test of Mimori, and the internal rotation resistance strength test have the best diagnostic performance characteristics (Figure 44-4). However, these results are based on single studies done in groups of selected patients who were evaluated by specialists. Despite the high prevalence of shoulder disorders in the general population, we are uncertain whether the diagnostic value of these tests or combinations thereof will be similar when used in primary care. Nonetheless, an understanding of the tests used in a specialist practice gives primary care physicians the opportunity to focus on physical examination maneuvers that might improve diagnostic skills. Although we recommend that clinicians take a careful history of the mechanism of shoulder injury, the role of the patient’s medical history in diagnosing the presence of instability or labral tears has not been studied. A comparison of relevant historical characteristics of patients with shoulder complaints, physical examination findings, and noninvasive images (eg, magnetic resonance imaging), along with arthroscopy or surgical results, would greatly enhance the knowledge base of primary care physicians who are first to evaluate shoulder conditions.

Table 44-4 Diagnostic Accuracy of Physical Examination for Labral Tears
Index Test and Source | Diagnosis | No. of Shoulders | Sensitivitya | Specificitya,b | LR+ (95% CI)b | LR– (95% CI)b
Anterior Apprehension
Guanche and Jones,16 2003 | Labral tears (including SLAP) | 60 | 0.40 | 0.87 | … | …
Guanche and Jones,16 2003 | SLAP lesions | 60 | 0.30 | 0.63 | … | …
Active Compression (O’Brien Test)
Stetson and Templin,21 2002 | Labral tears | 65 | 0.54 (14/26) | 0.31 (12/39) | 0.8 (0.5-1.2) | 1.5 (0.8-2.8)
O’Brien et al,29 1998 | Labral tears | 206 | 1.0 (53/53) | 0.98 (150/153) | 21 (10-42) | 0.01 (0-0.16)
O’Brien et al,29 1998 | Acromial joint pathology | 212 | 1.0 (55/55) | 0.96 (150/157) | 44 (16-123) | 0.01 (0-0.16)
McFarland et al,22 2002 | SLAP lesions | 409c | 0.47 (18/38) | 0.55 (203/371) | 1.0 (0.7-1.4) | 0.96 (0.70-1.3)
Guanche and Jones,16 2003 | SLAP lesions | 60 | 0.54 | 0.47 | … | …
Guanche and Jones,16 2003 | Labral tears (including SLAP) | 60 | 0.63 | 0.73 | … | …
Anterior Slide
Kibler,34 1995 | Superior glenoid labral tear | 226 | 0.78 (69/88) | 0.92d (125/138) | 8.3 (4.9-14) | 0.24 (0.16-0.36)
McFarland et al,22 2002 | SLAP lesions | 419c | 0.07 (3/38) | 0.83 (62/381) | 0.5 (0.2-1.5) | 0.99 (1.1-1.2)
Biceps Load I
Kim et al,32 1999 | SLAP lesions | 74 | 0.83 (10/12) | 0.98 (62/63) | 29 (7.3-115) | 0.09 (0.01-0.58)
Biceps Load II
Kim et al,23 2001 | SLAP lesions | 127 | 0.90 (35/38) | 0.96 (85/89) | 26 (8.6-80) | 0.11 (0.04-0.28)
Compression Rotation
McFarland et al,22 2002 | SLAP lesions | 303c | 0.24 (7/29) | 0.76 (207/274) | 1.0 (0.5-2.0) | 1.0 (0.81-2.1)
Crank
Liu et al,28 1996 | Labral tears | 62 | 0.91 (29/32) | 0.93 (28/30) | 14 (3.5-52) | 0.10 (0.03-0.29)
Stetson and Templin,21 2002 | Labral tears | 65 | 0.46 (12/26) | 0.56 (22/39) | 1.1 (0.6-1.9) | 0.95 (0.61-1.5)
Guanche and Jones,16 2003 | Labral tears (including SLAP) | 60 | 0.40 | 0.73 | … | …
Guanche and Jones,16 2003 | SLAP lesions | 60 | 0.39 | 0.67 | … | …
Internal Rotation Resistance Strength
Zaslav,31 2001 | Internal articular derangement | 110 | 0.88 (23/26) | 0.96 (81/84) | 25 (8.1-76) | 0.12 (0.04-0.35)
Pain Provocation Test of Mimori
Mimori et al,18 1999 | Superior labral tears | 32 | 1.0 (22/22) | 0.90 (9/10) | 7.2 (1.6-32) | 0.03 (0-0.47)
Relocation
Guanche and Jones,16 2003 | Labral tears (including SLAP) | 60 | 0.44 | 0.87 | … | …
Guanche and Jones,16 2003 | SLAP lesions | 60 | 0.36 | 0.63 | … | …
SLAP-Prehension
Berg and Ciullo,36 1998 | SLAP lesions | 66 | 0.82 (54/66) | | |
Tenderness of Bicipital Groove
Guanche and Jones,16 2003 | Labral tears (including SLAP) | 60 | 0.44 | 0.40 | … | …
Guanche and Jones,16 2003 | SLAP lesions | 60 | 0.48 | 0.52 | … | …
Test of Speed
Bennett,30 1998 | Biceps pathology (including labral tears) | 46 | 0.90 (9/10) | 0.14 (5/36) | 1.1 (0.8-1.3) | 0.72 (0.10-5.5)
Guanche and Jones,16 2003 | Labral tears (including SLAP) | 60 | 0.18 | 0.87 | … | …
Guanche and Jones,16 2003 | SLAP lesions | 60 | 0.09 | 0.74 | … | …
Test of Yergason
Guanche and Jones,16 2003 | Labral tears (including SLAP) | 60 | 0.09 | 0.93 | … | …
Guanche and Jones,16 2003 | SLAP lesions | 60 | 0.12 | 0.96 | … | …
Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio; SLAP, superior labrum anterior posterior.
a If data of the 2 × 2 table were presented in the study, the sensitivity and specificity calculations are shown in parentheses.
b Ellipses indicate data not available.
c The authors stated in their article that patient numbers for each test were not equal because tests were published at different times (namely, the compression rotation test, 1990; the anterior slide test, 1995; and the active compression test, 1998).
d The healthy contralateral shoulders of the subjects were used as control. Hence, the specificity value and LRs have been presumably overestimated.

Author Affiliations at the Time of the Original Publication

The Netherlands Expert Centre for Work-related Musculoskeletal Disorders (Drs Miedema and Kuiper and Ms Luime), Department of General Practice (Drs Verhagen and Koes and Ms Luime), Department of Public Health (Dr Burdorf), and Department of Orthopedics (Dr Verhaar), Erasmus Medical Center, Rotterdam, The Netherlands.

Acknowledgment

We thank David L. Simel, MD, MHS, for his critical comments on the manuscript.

REFERENCES
1. Matsen FA, Thomas SC, Rockwood CA. Anterior glenohumeral instability. In: Rockwood CA, Matsen FA, eds. The Shoulder. Volume 1. Philadelphia, PA: WB Saunders Co; 1990:526-622.
2. Pope DP, Croft PR, Pritchard CM, Silman AJ. Prevalence of shoulder pain in the community: the influence of case definition. Ann Rheum Dis. 1997;56(5):308-312.
3. Urwin M, Symmons D, Allison T, et al. Estimating the burden of musculoskeletal disorders in the community. Ann Rheum Dis. 1998;57(11):649-655.
4. Makela M, Heliovaara M, Sainio P, et al. Shoulder joint impairment among Finns aged 30 years or over. Rheumatology. 1999;38(7):656-662.
5. Natvig B, Nassioy I. Musculoskeletal complaints in a population: occurrence. Tidsskr Nor Laegeforen. 1994;114(3):323-327.
6. Croft P, Pope D, Silman A; Primary Care Rheumatology Society Shoulder Study Group. The clinical course of shoulder pain. BMJ. 1996;313(7057):601-602.
7. Van der Windt DA, Koes BW, de Jong BA. Shoulder disorders in general practice. Ann Rheum Dis. 1995;54(12):959-964.
8. Picavet HS, Schouten JS. Musculoskeletal pain in the Netherlands. Pain. 2003;102(1-2):167-178.
9. Macfarlane GJ, Hunt IM, Silman AJ. Predictors of chronic shoulder pain: a population based prospective study. J Rheumatol. 1998;25(8):1612-1615.
10. Pollock RG, Flatow EL. Classification and evaluation. In: Bigliani LU, ed. The Unstable Shoulder. Rosemont, IL: American Academy of Orthopaedic Surgeons; 1996:25-36.
11. Warner JJ. Overview: avoiding pitfalls and managing complications and failures of instability surgery. In: Warner JJ, Iannotti JP, Gerber C, eds. Complex and Revision Problems in Shoulder Surgery. Philadelphia, PA: Lippincott-Raven Publishers; 1997:3-8.
12. Gerber C. Observations on the classification of instability. In: Warner JJ, Iannotti JP, Gerber C, eds. Complex and Revision Problems in Shoulder Surgery. Philadelphia, PA: Lippincott-Raven Publishers; 1997:9-18.
13. Hovelius L. Incidence of shoulder dislocation in Sweden. Clin Orthop. 1982;(166):127-131.
14. Snyder SJ. Labral lesions (non-instability) and SLAP lesions. In: Snyder SJ. Shoulder Arthroscopy. New York, NY: McGraw-Hill; 1994:115-131.
15. Snyder SJ, Karzel RP, Del Pizzo W, Ferkel RD, Friedman MJ. SLAP lesions of the shoulder. Arthroscopy. 1990;6(4):274-279.
16. Guanche CA, Jones DC. Clinical testing for tears of the glenoid labrum. Arthroscopy. 2003;19(5):517-523.
17. Van der Helm FC. A finite element musculoskeletal model of the shoulder mechanism. J Biomech. 1994;27(5):551-569.
18. Mimori K, Muneta T, Nakagawa T, Shinomiya K. A new pain provocation test for superior labral tears of the shoulder. Am J Sports Med. 1999;27(2):137-142.
19. Speer KP, Hannafin JA, Altchek DW, Warren RF. An evaluation of the shoulder relocation test. Am J Sports Med. 1994;22(2):177-183.
20. McFarland EG, Torpey BM, Curl LA. Evaluation of shoulder laxity. Sports Med. 1996;22(4):264-272.
21. Stetson WB, Templin K. The Crank test, the O’Brien test, and routine magnetic resonance imaging scans in the diagnosis of labral tears. Am J Sports Med. 2002;30(6):806-809.
22. McFarland EG, Kim TK, Savino RM. Clinical assessment of three common tests for superior labral anterior-posterior lesions. Am J Sports Med. 2002;30(6):810-815.
23. Kim SH, Ha KI, Ahn JH, Kim SH, Choi HJ. Biceps load test II. Arthroscopy. 2001;17(2):160-164.
24. Deville WL, Buntinx F, Bouter LM, et al. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol. 2002;2:9.
25. Deville WL, Bezemer PD, Bouter LM. Publications on diagnostic test evaluation in family medicine journals. J Clin Epidemiol. 2000;53(1):65-69.
26. Dinnes J, Loveman E, McIntyre L, Waugh N. The effectiveness of diagnostic tests for the assessment of shoulder pain due to soft tissue disorders: a systematic review. Health Technol Assess. 2003;7(29):1-166.
27. Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25.
28. Liu SH, Henry MH, Nuccion SL. A prospective evaluation of a new physical examination in predicting glenoid labral tears. Am J Sports Med. 1996;24(6):721-725.
29. O’Brien SJ, Pagnani MJ, Fealy S, McGlynn SR, Wilson JB. The active compression test. Am J Sports Med. 1998;26(5):610-613.
30. Bennett WF. Specificity of the Speed’s test. Arthroscopy. 1998;14(8):789-796.
31. Zaslav KR. Internal rotation resistance strength test: a new diagnostic test to differentiate intra-articular pathology from outlet (Neer) impingement syndrome in the shoulder. J Shoulder Elbow Surg. 2001;10(1):23-27.
32. Kim SH, Ha KI, Han KY. Biceps load test. Am J Sports Med. 1999;27(3):300-303.
33. Cofield RH, Nessler JP, Weinstabl R. Diagnosis of shoulder instability by examination under anesthesia. Clin Orthop. 1993;(291):45-53.
34. Kibler WB. Specificity and sensitivity of the anterior slide test in throwing athletes with superior glenoid labral tears. Arthroscopy. 1995;11(3):296-300.
35. Gross ML, Distefano MC. Anterior release test. Clin Orthop. 1997;(339):105-108.
36. Berg EE, Ciullo JV. A clinical test for superior glenoid labral or “SLAP” lesions. Clin J Sport Med. 1998;8(2):121-123.
37. Oliashirazi A, Mansat P, Cofield RH, Rowland CM. Examination under anesthesia for evaluation of anterior shoulder instability. Am J Sports Med. 1999;27(4):464-468.
38. T’Jonck L, Staes F, Smet L, Lysens R. The relationship between clinical shoulder tests and the findings in arthroscopic examination. Geneeskunde Sport. 2001;34:15-24.
39. Lyons AR, Tomlinson JE. Clinical diagnosis of tears of the rotator cuff. J Bone Joint Surg Br. 1992;74(3):414-415.
40. Leroux JL, Thomas E, Bonnel F, Blotman F. Diagnostic value of clinical tests for shoulder impingement syndrome. Rev Rheum Engl Ed. 1995;62(6):423-428.
41. Ure BM, Tiling T, Kirchner R, Rixen D. Zuverlassigkeit der klinischen Untersuchung der Schulter im Vergleich zur Arthroskopie: eine prospektive Studie. Unfallchirurg. 1993;96(7):382-386.
42. Hertel R, Ballmer FT, Lombert SM, Gerber C. Lag signs in the diagnosis of rotator cuff rupture. J Shoulder Elbow Surg. 1996;5(4):307-313.
43. Litaker D, Pioro M, El Bilbeisi H, Brems J. Returning to the bedside. J Am Geriatr Soc. 2000;48(12):1633-1637.
44. MacDonald PB, Clark P, Sutherland K. An analysis of the diagnostic accuracy of the Hawkins and Neer subacromial impingement signs. J Shoulder Elbow Surg. 2000;9(4):299-301.
45. Walch G, Boulahia A, Calderone S, Robinson AH. The “dropping” and “hornblower’s” signs in evaluation of rotator-cuff tears. J Bone Joint Surg Br. 1998;80(4):624-628.
46. Calis M, Akgun K, Birtane M, Karacan I, Calis H, Tuzun F. Diagnostic values of clinical diagnostic tests in subacromial impingement syndrome. Ann Rheum Dis. 2000;59(1):44-47.
47. Read JW, Perko M. Shoulder ultrasound. J Shoulder Elbow Surg. 1998;7(3):264-271.
48. Kaneko K, De Mouy EH, Brunet ME. Massive rotator cuff tears. Clin Imaging. 1995;19(1):8-11.
49. Rahme H, Solem-Bertoft E, Westerberg CE, et al. The subacromial impingement syndrome. Scand J Rehabil Med. 1998;30(4):253-262.
50. McFarland EG, Neira CA, Gutierrez MI, Cosgarea AJ, Magee M. Clinical significance of the arthroscopic drive-through sign in shoulder surgery. Arthroscopy. 2001;17(1):38-43.
51. Adolfsson L, Lysholm J. Arthroscopy and stability testing for anterior shoulder instability. Arthroscopy. 1989;5(4):315-320.
52. Nelson MC, Leather GP, Nirschl RP, Pettrone FA, Freedman MT. Evaluation of the painful shoulder. J Bone Joint Surg Am. 1991;73(5):707-716.
53. Westerberg CE, Solem-Bertoft E, Lundh I. The reliability of three active motor tests used in painful shoulder disorders. Scand J Rehabil Med. 1996;28(2):63-70.
54. Nove-Josserand L, Levigne C, Noel E, Walch G. Isolated lesions of the subscapularis muscle: a propos of 21 cases [in French]. Rev Chir Orthop Reparatrice Appar Mot. 1994;80(7):595-601.
55. Gazielly DF, Gleyze P, Montagnon C, Bruyere G, Prallet B. Functional and anatomical results after surgical treatment of ruptures of the rotator cuff [in French]. Rev Chir Orthop Reparatrice Appar Mot. 1995;81(1):8-16.
56. Lerat JL, Chotel F, Besse JL, Moyen B, Brunet Guedj E. Dynamic anterior jerk of the shoulder [in French]. Rev Chir Orthop Reparatrice Appar Mot. 1994;80(6):461-467.

UPDATE: Shoulder Instability

Prepared by Catherine P. Kaminetzky, MD, MPH Reviewed by David L. Simel, MD, MHS, and Jolanda J. Luime, PhD

CLINICAL SCENARIO A 24-year-old man with shoulder pain had a shoulder injury when he was 16 years old. For the last 3 years, he has experienced sudden right shoulder discomfort and felt a pop every time he tried to throw a baseball with excessive force. However, the discomfort always resolved on its own. He has started to play tennis, and shoulder pain is affecting his performance. Inspection and palpation of the shoulder reveal no abnormalities. He has no neck discomfort or limitation in neck range of motion. Although he has full range of external and internal rotation of the shoulders, the right shoulder causes some discomfort throughout the arc of motion. You decide to assess for instability of the shoulder.

UPDATED SUMMARY ON SHOULDER INSTABILITY AND LABRAL TEARS Original Review Luime JJ, Verhagen AP, Miedema HS, et al. Does this patient have an instability of the shoulder or a labrum lesion? JAMA. 2004;292(16):1989-1999.

UPDATED LITERATURE SEARCH Our literature search replicated that of the original article, confined to 2004 to April 2006. We identified 89 potential articles and reviewed the abstracts to find articles that included consecutive, prospectively identified patients whose shoulder problems were suspicious for instability or a labral tear and who were assessed by arthroscopy or surgery. No new studies describe the sensitivity and specificity of symptoms and signs for instability or labral tears. One study described the precision of various maneuvers for anterior instability.

NEW FINDINGS
• The recommended tests for shoulder instability, the relocation and anterior release tests, may also be the most reproducible.
• The reliability improves when apprehension during the maneuver, rather than pain, is used to judge the results as positive vs negative.

Details of the Update

Four members of an orthopedic shoulder clinic team prospectively examined patients referred with shoulder symptoms and a medical history suggestive of instability.1 Each patient had to be able to endure examinations by each member of the team, resulting in 13 of 25 potentially eligible patients undergoing the complete examinations. The final diagnoses were not reported, but the intraclass correlations were reported for 2 laxity tests (load and shift) and 4 provocation tests (apprehension, relocation, augmentation, and release tests). For the laxity tests, the results were reported on an ordinal scale, and for the provocation tests the results were considered “positive” or “negative” according to a response of patient apprehension or pain. The load and shift tests had good reproducibility for motions in the anterior and inferior direction but not the posterior direction. For provocation tests, the assessment of an apprehensive response to each maneuver was more reproducible than the assessment of a response of pain. Among the 4 tests, the relocation test to assess apprehension (intraclass correlation, 0.71) and the release test to assess apprehension (intraclass correlation, 0.63) were the most reproducible.

A study by Holtby and Razmjou2 evaluated a large number of patients referred for shoulder problems (n = 152), of whom 50 patients had their disease status confirmed by arthroscopy. The 2 tests of interest were the Speed test and the Yergason test, both initially described as tests for bicipital tendonitis. The verification bias and the categorization of disease (any biceps tendon lesion or a superior labral anterior posterior lesion) prohibited assessment of isolated labral tears, but the positive likelihood ratio (LR+) and negative likelihood ratio (LR–) for the Speed and Yergason tests had confidence intervals (CIs) that crossed 1. If the data had been corrected for verification bias, the likelihood ratio for a positive Yergason test result (LR+, 2.0; 95% CI, 0.86-4.7) might have appeared more promising.

A systematic review of the incidence and prevalence of shoulder discomfort in the general population provides a context for assessing the likelihood that a patient will have shoulder instability or a labral lesion.3 The annual incidence of shoulder discomfort is 0.9% to 2.5%. However, shoulder discomfort does not immediately resolve, so prevalence rates are much higher. At any given time, shoulder discomfort is present in 6.9% to 26% of the general population.

IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION None.

CHANGES IN THE REFERENCE STANDARD None.

RESULTS OF LITERATURE REVIEW Precision of Tests for Instability or Labral Tears The sulcus sign and load and shift laxity tests have similar reproducibility (Table 44-5). Assessing a patient’s apprehension to maneuvers has greater reliability than assessing his or her pain (Table 44-6).

Table 44-5 Laxity Maneuvers
Tests | Intraclass Correlation Coefficient
Sulcus sign | 0.60
Load and shift (at 0-, 20-, and 90-degree arm positions)
Anterior direction | 0.53-0.72
Posterior direction | 0.42-0.68
Inferior direction | 0.65-0.79

Table 44-6 Provocation Maneuvers
Response to Maneuvers | Intraclass Correlation Coefficient
Apprehensive Response
Apprehension test | 0.47
Relocation | 0.71
Augmentation | 0.48
Release | 0.63
Pain Response
Apprehension test | 0.31
Relocation | 0.31
Augmentation | 0.09
Release | 0.31
Pain or Apprehensive Response
Apprehension test | 0.44
Relocation | 0.44
Augmentation | 0.33
Release | 0.45

EVIDENCE FROM GUIDELINES No governmental guidelines address the evaluation of patients for shoulder instability.

CLINICAL SCENARIO—RESOLUTION Shoulder instability, with or without a labral tear, is a diagnostic consideration for this patient with a history of a shoulder injury. The popping sensation is suggestive of instability, but the physical examination maneuvers are more important. The apprehension maneuver should be performed, followed by the relocation and anterior release tests. Assessing for an apprehensive response to the relocation and anterior release tests is the most reliable approach among the provocation tests. A positive response increases the odds of instability approximately 6- to 8-fold (LR+, 6.5-8.3), whereas a negative response lowers the odds to approximately 0.1 to 0.2 times their pretest value (LR–, 0.09-0.18). Labral tears are assessed through the biceps load tests I and II. These tests differ only by the position of the arm (abduction at 90 degrees for biceps load I and at 120 degrees for biceps load II). An increase in pain on the biceps load tests increases the odds of a labral tear 26- to 29-fold (LR+, 26-29), whereas the lack of increased pain lowers the odds to 0.09 to 0.11 times their pretest value (LR–, 0.09-0.11).

REFERENCES FOR THE UPDATE 1. Tzannes A, Paxinos A, Callanan M, Murrell GAC. An assessment of the interexaminer reliability of tests for shoulder instability. J Shoulder Elbow Surg. 2004;13(1):18-23. 2. Holtby R, Razmjou H. Accuracy of the Speed’s and Yergason’s tests in detecting biceps pathology and SLAP lesions: comparison with arthroscopic findings. Arthroscopy. 2004;20(3):231-236. 3. Luime JJ, Koes BW, Hendriksen IJ, et al. Prevalence and incidence of shoulder pain in the general population: a systematic review. Scand J Rheumatol. 2004;33(2):73-81.

SHOULDER INSTABILITY—MAKE THE DIAGNOSIS

PRIOR PROBABILITY There are no adequate data for assessing the prevalence of these conditions among patients with shoulder discomfort because the existing data come only from patients undergoing surgery or arthroscopy. The incidence of shoulder discomfort is 0.9% to 2.5%. However, because shoulder pain can be chronic, the prevalence at a single point in time is 6.9% to 26%.

POPULATION FOR WHOM SHOULDER INSTABILITY OR LABRAL TEARS SHOULD BE CONSIDERED Patients with shoulder pain should be screened for shoulder instability and labral tears. The annual incidence of shoulder dislocation in the general population may be as high as 1.7%. There are no data for the incidence or prevalence of labral tears.

DETECTING THE LIKELIHOOD OF SHOULDER INSTABILITY OR A LABRAL TEAR The anterior release and relocation tests have the best measurement properties for shoulder instability (Table 44-7 and Figure 44-3). The assessment of apprehension will be more reliable than the assessment of pain for these maneuvers. The biceps load tests should be performed to assess for labral tears (Table 44-7 and Figure 44-3).

Table 44-7 Likelihood Ratios for Tests of Shoulder Instability or a Labral Tear^a

Test                       LR+ (95% CI)      LR– (95% CI)
Shoulder Instability
  Anterior release test    8.3 (3.6-19)      0.09 (0.03-0.27)
  Relocation test          6.5 (3.0-14)      0.18 (0.07-0.45)
Labral Tear
  Biceps load I            29 (7.3-115)      0.09 (0.01-0.58)
  Biceps load II           26 (8.6-80)       0.11 (0.04-0.28)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
^a These data come from examinations done by orthopedists and not generalist physicians.

REFERENCE STANDARD TESTS Arthroscopy or surgery.
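The likelihood ratios in Table 44-7 act on the pretest odds, not directly on the pretest probability. The following is a minimal sketch of that conversion. Because the chapter states that no adequate prevalence data exist for instability among patients with shoulder discomfort, the 20% pretest probability used here is purely a hypothetical value for illustration, and the function and variable names are likewise illustrative.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability through odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Point estimates (LR+, LR-) copied from Table 44-7.
TABLE_44_7 = {
    "Anterior release test": (8.3, 0.09),
    "Relocation test": (6.5, 0.18),
    "Biceps load I": (29.0, 0.09),
    "Biceps load II": (26.0, 0.11),
}

# ASSUMPTION: hypothetical pretest probability; the chapter reports no usable estimate.
ASSUMED_PRETEST = 0.20

if __name__ == "__main__":
    for test_name, (lr_pos, lr_neg) in TABLE_44_7.items():
        after_positive = posttest_probability(ASSUMED_PRETEST, lr_pos)
        after_negative = posttest_probability(ASSUMED_PRETEST, lr_neg)
        print(f"{test_name}: positive {after_positive:.0%}, negative {after_negative:.0%}")
```

Under the assumed 20% pretest probability, a positive anterior release test would move the probability of instability to roughly 67%, and a negative result to roughly 2%. A different pretest probability would shift both figures, and the wide confidence intervals in Table 44-7 apply to any such estimate.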

CHAPTER 45

Does This Patient Have Sinusitis?

John W. Williams, Jr, MD, MHS
David L. Simel, MD, MHS

CLINICAL SCENARIO A patient presents to your office with a “bad cold.” Her symptoms began 5 days ago, when a runny nose, a scratchy throat, generalized malaise, and a nonproductive cough developed. Her symptoms are gradually improving with an over-the-counter cough medicine, but during the past 24 hours a “sinus headache” has developed. The patient is concerned that she may have “sinus.” It is the middle of cold and flu season, and this is the fifth patient you have treated today who has upper respiratory tract symptoms.

WHY IS THIS AN IMPORTANT QUESTION TO ANSWER WITH A CLINICAL EXAMINATION? The patient’s story is familiar to primary care clinicians. Among the most frequent diagnoses made by primary care practitioners are nasal problems such as allergic and infectious rhinitis, vasomotor rhinitis, and bacterial sinusitis.1 Given the constant assault of allergens, environmental pollutants, respiratory viruses, and rapid temperature changes, it is not surprising that nasal complaints are so common. However, not all “sinus” is sinusitis. Sinusitis can be defined simply as inflammation of one or more paranasal sinuses but usually refers to infection of the sinuses. In recent years, many new medications have become available that allow effective medical treatment of sinus problems so that it is important to diagnose nasal complaints accurately to deliver appropriate treatment.2 When this can be accomplished by the clinical examination, it obviates the need for more expensive testing such as radiography. The list of differential diagnoses for patients with nasal congestion or discharge is long (Table 45-1), but a handful of conditions encompass the majority of cases.3 These conditions can be divided into those causing inflammation of the nose (rhinitis) and those causing inflammation of the sinuses (sinusitis). Rhinitis is most frequently due to viral infection, allergens (seasonal or perennial), or vasomotor instability (eg, caused by extreme temperature change or excessive use of vasoconstrictive medications). When these conditions are severe, the sinus ostia may become blocked and the sinuses infected secondarily. However, the implications of diagnosing rhinitis are different from diagnosing sinusitis. 
Rhinitis may respond to antihistamines, nasal decongestants, nasal steroids, or cromolyn sodium, but randomized trials have shown that sinusitis requires antibiotics for rapid resolution.4,5 Sinusitis also occurs as an occult illness that may be associated with asthmatic exacerbations or chronic headache. This overview will focus on the medical history and physical examination findings that distinguish bacterial sinusitis from rhinitis and other conditions.

Copyright © 2009 by the American Medical Association.


Table 45-1 Differential Diagnosis of Nasal Congestion/Rhinorrhea

Allergic
  Seasonal allergic rhinitis (pollens)^a
  Perennial allergic rhinitis (dusts, molds)^a
Vasomotor
  Idiopathic (vasomotor rhinitis)^a
  Abuse of nose drops (rhinitis medicamentosa)^a
  Drugs (reserpine, guanethidine, prazosin, cocaine abuse)
  Psychological stimulation (anger, sexual arousal)
Mechanical
  Polyps
  Tumor
  Deviated septum
  Crusting (as in atrophic rhinitis)
  Hypertrophied turbinates (chronic vasomotor rhinitis)
  Foreign body
  Central nervous system fluid leak
Chronic inflammatory
  Sarcoidosis
  Wegener granulomatosis
  Midline granuloma
Infectious
  Acute viral infection^a
  Acute or chronic bacterial infection of paranasal sinuses^a
  Atrophic rhinitis (secondary infection)
Hormonal
  Pregnancy
  Hypothyroidism

^a Most common causes of nasal symptoms.

Figure 45-1 Sagittal View of Paranasal Sinuses (image not reproduced). Labeled structures: frontal sinus; sphenoidal sinus; superior, middle, and inferior turbinates; orifice of auditory (eustachian) tube; and communications with (1) the nasolacrimal duct, (2) the anterior ethmoidal sinus, (3) the maxillary sinus, and (4) the posterior ethmoidal sinus.

SINUSITIS REQUIRES ANTIBIOTICS FOR RAPID CURE Reference Standard for Diagnosing Sinusitis The reference (or gold) standard for diagnosing infectious sinusitis is sinus aspiration and culture. Its use is particularly appropriate for guiding antibiotic choice in patients with complicated or refractory sinusitis. However, in general practice, sinus radiographs are readily obtained and can be considered a pragmatic reference standard. A 4-view sinus series is highly concordant with a single Waters view,6,7 and when it reveals sinus opacity, an air-fluid level, or 6 mm or more of mucosal thickening, a 4-view sinus series is 72% to 96% as accurate for maxillary sinusitis as aspiration and culture respectively.8,9 The chief limitations of sinus radiographs are poor visualization of the ethmoid air cells and difficulty distinguishing between infection, tumor, and polyp in the completely opacified sinus. Other potentially useful diagnostic tests are ultrasonography and computed tomography. Ultrasonography is nonionizing but correlates only moderately well with sinus radiographs or sinus aspiration.10-12 Computed tomography of the sinuses is superior to sinus radiography for visualizing the ethmoid air cells, for evaluating opacified sinuses or mucoceles, and for differentiating the bony changes of chronic inflammation from osteomyelitis.13 Sinus computed tomography may become the diagnostic test of choice but is not as readily available as radiographs and has not been evaluated against sinus puncture. This caveat is important because computed tomography may be highly sensitive, yet lack specificity.14

Normal Anatomy and Pathophysiology of Sinusitis The nose humidifies, warms, and filters inspired air as it passes through the nasal vestibule and over the nasal turbinates.15 The nasal turbinates promote turbulent air flow that causes particulate matter to fall on the nasal mucosa, where it is swept by ciliated pseudostratified columnar cells to the nasopharynx. Respiratory epithelium also lines the paranasal sinuses and creates drainage into the nasal cavity via the superior meatus (sphenoid and posterior ethmoid) and middle meatus (maxillary and anterior ethmoids) (Figure 45-1).16 Properly functioning ciliated cells are critical because maxillary sinus drainage is uphill (Figure 45-2). Patients predisposed to infectious sinusitis may have mucosal edema (eg, allergic rhinitis, viral rhinitis), mechanical obstruction of the meatus (eg, polyps, deviated nasal septum), or impaired ciliary activity (eg, Kartagener syndrome).3,17 Under these conditions, viruses and bacteria proliferate in the poorly draining sinus and provoke acute sinusitis.

How to Elicit the Relevant Symptoms and Signs Although patients may give a simple description, such as “sinus trouble,” the examiner should seek a more complete medical history. Symptoms that may increase the likelihood of sinusitis include fever, malaise, cough, nasal congestion, maxillary toothache, purulent nasal discharge, little improvement with nasal decongestants, and headache or facial pain exacerbated by bending forward.

Examination of the nostrils can be performed with a short, wide speculum mounted on a handheld otoscope. The speculum should be directed posterolaterally, avoiding the sensitive nasal septum. The nasal mucosa should be inspected for color, edema, character of nasal secretions, polyps, and structure of the nasal septum (Figure 45-3). Purulent secretion from the middle meatus is reported to be highly predictive of maxillary sinusitis but may be difficult to see unless the examiner shrinks the nasal mucosa with a topical vasoconstrictive agent (eg, oxymetazoline hydrochloride) and uses a nasal speculum to enhance visualization.18 Septal deviation or nasal polyps are important findings because they may contribute to nasal obstruction and promote recurrent sinusitis. Palpation for sinus tenderness should be performed over the maxillary and frontal sinuses (Figure 45-4). In addition, checking for tenderness by tapping the maxillary teeth with a tongue blade may be valuable because 5% to 10% of maxillary sinusitis is a result of dental root infection.19 The ethmoid and sphenoid sinuses cannot be adequately evaluated during the routine physical examination. Transillumination of the maxillary sinuses may be performed by 2 methods. The best-studied method is performed by placing a Welch-Allyn-Finnoff transilluminator (Welch Allyn Inc, Skaneateles Falls, New York) over the infraorbital rim, shielding the light source from the observer’s eyes, and judging light transmission between sides through the hard palate (Figure 45-5). The examination must be performed in a completely darkened room after allowing the observer’s vision to adapt fully to darkness. Obviously, the patient’s dentures should be removed. Most experts report the transillumination results as opaque (no light transmission), dull (reduced light transmission), or normal (light transmission typical of a normal subject).
An alternative method is to place a light source in the patient’s mouth and have the patient make a tight seal around the transilluminator; the observer judges light transmitted through the maxillary sinuses. This technique has the advantage of being able to simultaneously compare sides but requires sterilization of the instrument between patient examinations. The frontal sinuses can be examined by placing a light source below the supraorbital rim, but interpretation is difficult because the frontal sinuses naturally develop asymmetrically. This normal variation may falsely suggest sinusitis but is resolved by routine radiography.

Figure 45-2 Coronal View of Paranasal Sinuses (image not reproduced). Labeled structures: frontal sinus, ethmoid sinus, maxillary sinus, and nasal turbinates.

Figure 45-3 Examination of the Nose Through an Otoscope With a Disposable Speculum (image not reproduced). The middle meatus is usually not visible behind the turbinates. Cross section and otoscopic view; labeled structures: otoscope, middle meatus (entrance to maxillary sinus), middle turbinate, septum, and inferior turbinate.

Precision of Symptoms and Signs A total of 111 patients with nasal complaints were examined by a general internist and a second examiner who was a physician assistant, internal medicine resident, or attending internist.20 Agreement was high between examiners for 11 of the 15 historical items, including headache (κ, 0.78); subjective fever, chills, or sweats (κ, 0.71); cough (κ, 0.68); colored nasal discharge (κ, 0.68); facial pain (κ, 0.65); and maxillary toothache (κ, 0.60). (Sackett21 gives a further explanation of the κ statistic and the other special terms and ideas used in this overview.) On physical examination, agreement was high only for sinus tenderness (κ, 0.59) and was fair for maxillary sinus transillumination (simple agreement, 61%; κ, 0.22). In the only other study of observer variability for transillumination, otolaryngologists also had modest agreement between examiners for the maxillary sinuses (simple agreement, 62%), but agreement was good for the frontal sinuses (simple agreement, 95%).22 Observer agreement is high for most patient symptoms, but for the physical examination agreement is high only for sinus tenderness.
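The gap between simple agreement (61%) and κ (0.22) for maxillary transillumination reflects the correction for agreement expected by chance. The sketch below computes both quantities from a 2 × 2 table of paired ratings. The cell counts are hypothetical and were chosen only so that they reproduce the reported summary figures; they are not the study's data, and the function name is illustrative.

```python
def agreement_and_kappa(both_abnormal: int, first_only: int,
                        second_only: int, both_normal: int) -> tuple[float, float]:
    """Return (simple agreement, Cohen's kappa) for two examiners rating a binary finding."""
    total = both_abnormal + first_only + second_only + both_normal
    if total == 0:
        raise ValueError("at least one paired rating is required")
    observed = (both_abnormal + both_normal) / total
    # Marginal proportion of "abnormal" calls for each examiner.
    first_abnormal = (both_abnormal + first_only) / total
    second_abnormal = (both_abnormal + second_only) / total
    # Agreement expected if the two examiners rated independently.
    expected = (first_abnormal * second_abnormal
                + (1.0 - first_abnormal) * (1.0 - second_abnormal))
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


if __name__ == "__main__":
    # HYPOTHETICAL counts per 100 patients, not taken from the cited study.
    observed, kappa = agreement_and_kappa(both_abnormal=30, first_only=20,
                                          second_only=19, both_normal=31)
    print(f"simple agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

With these illustrative counts, chance alone would produce 50% agreement, so observed agreement of 61% corresponds to a κ of only 0.22.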

Figure 45-4 Surface Landmarks for Palpation of Frontal Sinuses (Left) and Maxillary Sinuses (Right) (image not reproduced). Panel A, surface palpation for frontal sinuses; panel B, surface palpation for maxillary sinuses. Some experts recommend palpating the frontal sinuses by placing the fingers on the orbital roof below the eyebrow.

Figure 45-5 Transillumination of the Maxillary Sinus (image not reproduced). The light source should be shielded from the examiner’s vision with the free hand. Labeled structures: transilluminator, infraorbital rim, maxillary sinus, and hard palate.

Accuracy of Symptoms and Signs of Sinusitis

There have been few attempts to systematically evaluate the accuracy of the clinical examination for sinusitis. Three studies assessed the discriminate ability of sinusitis symptoms and signs in adults. One evaluated 69 historical items among 164 consecutive patients with sinusitis suspected by the patient or otolaryngologist.23 These symptoms were compared to a reference standard of 4-view radiography (Caldwell, Waters, lateral, and submental vertex projections). Six symptoms (preceding upper respiratory infection, any nasal discharge or purulent nasal discharge, painful mastication, malaise, cough, and hyposmia) were significantly (P < .01) more common in patients with abnormal radiographs, but no single finding was highly accurate. We compared symptoms to radiograph in 247 consecutive male patients who had rhinorrhea or facial pain unrelated to trauma or who suspected they might have sinusitis.20 Colored nasal discharge, cough, and sneezing were the most sensitive symptoms (72%, 70%, and 70%, respectively) but were not specific (52%, 44%, and 34%, respectively). One symptom, maxillary toothache, was highly specific (93%), but only 11% of patients reported this symptom. Historical items thought to make sinusitis less likely, such as sore throat (sensitivity, 52%; specificity, 56%), itchy eyes (sensitivity, 52%; specificity, 43%), and constitutional symptoms (sensitivity, 56%; specificity, 47%), were not useful. 
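The sensitivity and specificity pairs quoted above can be restated as likelihood ratios, which makes it easier to see why none of these symptoms is decisive alone. The sketch below applies the standard definitions, LR+ = sensitivity / (1 − specificity) and LR– = (1 − sensitivity) / specificity, to the values reported for the 247-patient study. The function name and dictionary are illustrative, and the results are point estimates without confidence intervals.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity expressed as proportions."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie between 0 and 1")
    false_positive_rate = 1.0 - specificity
    lr_positive = math.inf if false_positive_rate == 0.0 else sensitivity / false_positive_rate
    lr_negative = math.inf if specificity == 0.0 else (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# (sensitivity, specificity) as reported in the text for adults with nasal symptoms.
REPORTED_SYMPTOMS = {
    "Colored nasal discharge": (0.72, 0.52),
    "Cough": (0.70, 0.44),
    "Sneezing": (0.70, 0.34),
    "Sore throat": (0.52, 0.56),
}

if __name__ == "__main__":
    for symptom, (sens, spec) in REPORTED_SYMPTOMS.items():
        lr_pos, lr_neg = likelihood_ratios(sens, spec)
        print(f"{symptom}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
```

For colored nasal discharge, the point estimates are an LR+ of 1.5 and an LR– of about 0.54, in line with the values listed for that finding in Table 45-3. All of the ratios fall close to 1, consistent with the statement that no single finding was highly accurate.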
A third study compared symptoms to ultrasonographic findings in 400 general practice patients selected for study because their physician intended to test or treat for sinusitis.24 Results from this study should be interpreted with caution because the reference standard (ultrasonography) was not interpreted independent of the clinical findings and is less accurate than radiography.11,12 In the study by van Duijn et al,24 preceding common cold (sensitivity, 85%; specificity, 28%), pain at bending forward (sensitivity, 65%; specificity, 59%), and purulent rhinorrhea (sensitivity, 62%; specificity, 67%) were the most useful findings. Toothache was found to be highly specific (specificity, 83%). Studies in children are limited to sensitivities for a few clinical findings. Clear or purulent discharge (sensitivity, 76%-84%) and cough (sensitivity, 48%-80%) are the most sensitive findings (Table 45-2), but the discriminating power of these findings is not known.25-28

The most studied but least understood physical examination maneuver is paranasal sinus transillumination.5,8,20,22,25,27,29-32 Since the technique was first described in 1889 by Voltolini,33 its value as a diagnostic test has been hotly debated. Several authors have described transillumination as “highly predictive of disease,” whereas another author has described the use of transillumination as an act of criminal negligence.34 Most studies of transillumination have methodologic limitations, and 2 of the more complete studies had differing results.20,30 Our own study compared the results of transillumination to paranasal sinus radiographs in 247 consecutive patients with nasal symptoms who were treated in general medicine clinics at a Veterans Affairs medical center.20 Transillumination, using a Welch-Allyn-Finnoff transilluminator or Mini MagLite (Mag Instrument Inc, Ontario, California) placed over the infraorbital rim, did little to change the posttest probability of sinusitis. It generated a likelihood ratio (LR) of only 1.6 if either maxillary sinus was dull or opaque and 0.5 if both maxillary sinuses transilluminated normally. Clearly, as a single finding, transillumination could not be relied on to rule in or rule out sinusitis. The second study included 113 patients with nasal symptoms and abnormal sinus radiographs and found different results.30 In the subset of these patients who were examined by an otolaryngologist (using the same transillumination technique as our study), transillumination was highly useful when the sinus was either completely opaque (LR, ∞) or completely normal (LR, 0.04) but less useful when the finding was dull transillumination (LR, 0.41). In contrast to the previous study, opaque transillumination ruled in sinusitis and normal transillumination ruled out sinusitis. Why did these 2 studies yield such disparate results? First, the study populations were different (a primary care walk-in clinic vs an otolaryngology clinic) and may have created different degrees of expectation bias. Second, the examiners’ training was different; otolaryngologists may be better transilluminators than general internists.
These 2 studies suggest that transillumination may be more useful for diagnosing sinusitis when performed by otolaryngologists. Because the paranasal sinuses develop at different rates among children, transillumination may be less reliable than in adult patients. Three studies have examined the value of transillumination in children. In one, the examination could not be performed in 24% of the children because of poor patient cooperation.5 For the remaining children, there was agreement between transillumination and radiographic findings in 53% and disagreement in 27%, and transillumination was nondiagnostic in 20%.5 The other 2 studies reported sensitivities of only 76% (19/25) in one27 and 48% (23/48) in the other, which was performed in children with opaque maxillary sinuses on radiographs who were undergoing sinus drainage for chronic purulent sinusitis.32 The sensitivity of transillumination should have been maximal in this latter patient group with severe disease, but the test nevertheless performed poorly.

Information is limited for other commonly assessed physical examination components. In adults, sinus tenderness was found to have poor sensitivity and specificity (48% to 50% and 62% to 65%, respectively),20,24 but other findings (temperature, nasal mucosal color, and percussion tenderness of the maxillary teeth) have not been well studied. In children, tympanic membrane changes from otitis media (sensitivity, 68%) are the most common physical examination finding associated with sinusitis, whereas a documented temperature higher than 38.3°C (101°F) (sensitivity, 12% to 21%) is uncommon.27,28

Table 45-2 Sensitivities (%) for Signs and Symptoms of Acute Sinusitis in Children

Sign or Symptom            Swischuk et al25   Wald et al26   McClean27   Kogutt and Swischuk28
                           (n = 63)           (n = 30)       (n = 25)    (n = 96)
Nasal discharge            76                 77             84          77
Cough                      60                 80             60          48
Headache                   48                 33             …^a         …
Fever                      46^b               63^b           12^c        21^c
Facial pain or swelling    …                  30             8^d         …
Fetor oris                 …                  50             …           …

^a Ellipses indicate information not available.
^b Fever not defined.
^c Temperature > 38.3°C (101°F).
^d Pain or swelling detected by examination.

Accuracy of Combinations of Symptoms and Signs Despite the poor accuracy of the individual symptoms and signs, these findings used in combination can be diagnostic for sinusitis. We used logistic regression modeling to identify signs and symptoms that best predict sinusitis. This statistical procedure selects findings that independently contribute toward making the diagnosis of sinusitis. Three symptoms (maxillary toothache, poor response to nasal decongestants, and history of colored nasal discharge) and 2 signs (purulent nasal secretion and abnormal transillumination) were the best predictors of sinusitis (Table 45-3).20 When none of these findings were present, sinusitis could be ruled out (LR, 0.1), and when 4 or more were present, the LR was 6.4 (Table 45-4).

Table 45-3 Independent Predictors of Sinusitis^a

Symptom or Sign                       LR+ (95% CI)     LR– (95% CI)
Maxillary toothache                   2.5 (1.2-5.0)    0.9 (0.8-1.0)
Purulent secretion                    2.1 (1.5-3.0)    0.7 (0.5-0.8)
Poor response to decongestants        2.1 (1.4-3.1)    0.7 (0.6-0.9)
Abnormal transillumination            1.6 (1.3-2.0)    0.5 (0.4-0.7)
History of colored nasal discharge    1.5 (1.2-1.9)    0.5 (0.4-0.8)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
^a Data from Williams et al.20

Table 45-4 Likelihood Ratios by Number of Signs and Symptoms Present^a

No. of Symptoms and Signs    Sinusitis Present    Sinusitis Absent    LR
≥4                           16                   4                   6.4
3                            29                   18                  2.6
2                            27                   39                  1.1
1                            14                   48                  0.5
0                            2                    32                  0.1
Total                        88                   141                 …^b

Abbreviation: LR, likelihood ratio.
^a Symptoms and signs comprise maxillary toothache, purulent nasal secretion, poor response to decongestants, transillumination (normal bilaterally vs any abnormality), and colored nasal discharge by medical history. Data from Williams et al.20
^b Ellipses indicate not applicable.

One study compared 11 clinical findings elicited by experienced otolaryngologists with radiograph and maxillary sinus aspiration in 155 patients presenting to an emergency department with suspected sinusitis.35 With similar statistical techniques, a history of purulent rhinorrhea or unilateral sinus pain and the presence of pus in the nasal cavity on examination were highly predictive of sinusitis. Maxillary toothache, response to decongestants, and transillumination were not studied. Physicians appear able to integrate individual signs and symptoms into an overall assessment that accurately diagnoses sinusitis. In our study, an overall impression that sinusitis was “definitely or most likely present” generated an LR of 4.7, and an overall impression that sinusitis was “unlikely or definitely absent” generated a rather low LR of 0.4. When the impression was intermediate, the LR was 1.4.20,36 These findings are in agreement with a study that investigated otolaryngologists’ ability to diagnose purulent sinusitis in patients with chronic symptoms. In the study by Berg et al,37 the overall clinical evaluation was compared with sinus aspiration, with the following results: definitely sinusitis, LR = 19; probably sinusitis, LR = 4; probably not sinusitis, LR = 0.14; definitely not sinusitis, LR = 0.19. The general internist’s overall assessment of the likelihood of sinusitis performs well compared with radiograph or sinus aspiration. To summarize, primary care practitioners frequently evaluate patients with nasal symptoms, and in many instances, sinusitis can be confidently ruled in or ruled out according to the clinical examination. Further studies are needed to examine clinical findings that have not been studied (such as headache when leaning forward) and to test whether the 5 clinical findings found to be useful for adult men can be exported to other patient populations.

THE BOTTOM LINE
1. Sinusitis is insidious in children. Concurrent otitis media is common.
2. Considered in combination, maxillary toothache, poor response to nasal decongestants, abnormal transillumination, and colored nasal discharge by medical history or examination are the most useful clinical findings in primary care populations. When all 5 features are present, the odds of sinusitis increase sharply (LR, 6.4), and when none are present, sinusitis is ruled out.
3. Transillumination requires a completely darkened room, adequate time for dark adaptation, and practice.
4. The overall medical history and physical examination in symptomatic adult patients is accurate.

Author Affiliations at the Time of the Original Publication
Division of General Internal Medicine, University of Texas Health Sciences Center at San Antonio, and the Ambulatory Care Service, Audie L. Murphy Memorial Veterans’ Hospital, San Antonio, Texas (Dr Williams); Ambulatory Care Service and the Center for Health Services Research in Primary Care, Durham Veterans Affairs Medical Center, and the Division of General Internal Medicine, Duke University Medical Center, Durham, North Carolina (Dr Simel).

Acknowledgments

This work was supported in part by a grant from the A. W. Mellon Foundation and Veterans Affairs Health Services Research and Development grant 89-065.A. The authors wish to thank Donald Hitch, MD, for reviewing an early draft of this manuscript.

REFERENCES 1. National Center for Health Statistics. National Ambulatory Medical Care Survey of Visits to General and Family Practitioners, January-December 1975. Advance Data from Vital and Health Statistics, No. 15. DHEW Pub. No. (HRA) 78-1250. Hyattsville, MD: Public Service; 1977. 2. MacKay I, ed. Rhinitis: Mechanisms and Management. Park Ridge, NJ: Parthenon Publishing; 1989:169. 3. Simon HB. Approach to the patient with sinusitis. In: Goroll AH, May LA, Mulley AG, eds. Primary Care Medicine: Office Evaluation of the Adult Patient. Philadelphia, PA: JB Lippincott; 1987:883-885. 4. Axelsson A, Chidekel N, Grebelius N, Jensen C. Treatment of acute maxillary sinusitis: a comparison of four different methods. Acta Otolaryngol (Stockh). 1970;70(1):71-76. 5. Wald ER, Chiponis D, Ledesma-Medina J. Comparative effectiveness of amoxicillin and amoxicillin-clavulanate potassium in acute paranasal sinus infections in children: a double-blind, placebo-controlled trial. Pediatrics. 1986;77(6):795-800. 6. Williams JW Jr, Roberts L, Distell B, Simel DL. Diagnosing sinusitis by xray: comparing a single Waters view to 4-view paranasal sinus radiographs. J Gen Intern Med. 1992;7(5):481-485. 7. Hayward MW, Lyons K, Ennis WP, Rees J. Radiography of the paranasal sinuses: one or three views? Clin Radiol. 1990;41(3):163-164. 8. Evans FO Jr, Sydnor JB, Moore WE, et al. Sinusitis of the maxillary antrum. N Engl J Med. 1975;293(15):735-739. 9. Hamory BH, Sande MA, Sydnor A Jr, Seale DL, Gwaltney JM Jr. Etiology and antimicrobial therapy of acute maxillary sinusitis. J Infect Dis. 1979;139(2):197-202. 10. Berg O, Carenfelt C. Etiological diagnosis in sinusitis: ultrasonography as clinical complement. Laryngoscope. 1985;95(7 pt 1):851-853. 11. Shapiro GG, Furukawa CT, Pierson WE, Gilbertson E, Bierman CW. Blinded comparison of maxillary sinus radiography and ultrasound for diagnosis of sinusitis. J Allergy Clin Immunol. 1986;77(1 pt 1):59-64. 12. 
Rohr AS, Spector SL, Siegel SC, Katz RM, Rachelefsky GS. Correlation between A-mode ultrasound and radiography in the diagnosis of maxillary sinusitis. J Allergy Clin Immunol. 1986;78(1 pt 1):58-61. 13. Unger JM, Shaffer K, Duncavage JA. Computed tomography in nasal and paranasal sinus disease. Laryngoscope. 1984;94(10):1319-1324. 14. Havas TE, Motbey JA, Gullane PJ. Prevalence of incidental abnormalities on computed tomographic scans of the paranasal sinuses. Arch Otolaryngol Head Neck Surg. 1988;114(8):856-859.

CHAPTER 45 15. Guyton AC. Textbook of Medical Physiology. Philadelphia, PA: WB Saunders Co; 1986:477. 16. Hollinshead WH, Rosse C. Textbook of Anatomy. Philadelphia, PA: Harper & Row; 1985:982. 17. Carson JL, Collier AM, Hu SS. Acquired ciliary defects in nasal epithelium of children with acute viral upper respiratory infections. N Engl J Med. 1985;312(8):463-468. 18. Burtoff S. Evaluation of diagnostic methods used in cases of maxillary sinusitis, with a comparative study of recent therapeutic agents employed locally. Arch Otolaryngol Head Neck Surg. 1947;45:516-542. 19. Loyal V, Jones J, Noyek A. Management of odontogenic maxillary sinus disease. Otolaryngol Clin North Am. 1976;9:213-222. 20. Williams JW Jr, Simel DL, Roberts L, Samsa G. Clinical evaluation for sinusitis: making the diagnosis by history and physical examination. Ann Intern Med. 1992;117(9):705-710. 21. Sackett DL. A primer on the precision and accuracy of the clinical examination. JAMA. 1992;267(19):2638-2644. 22. Rachelefsky GS, Spector SL. Sinusitis and asthma. J Asthma. 1990;27(1):1-3. 23. Axelsson A, Runze U. Symptoms and signs of acute maxillary sinusitis. ORL J Otorhinolaryngol Relat Spec. 1976;38(5):298-308. 24. van Duijn NP, Brouwer HJ, Lamberts H. Use of symptoms and signs to diagnose maxillary sinusitis in general practice: comparison with ultrasonography. BMJ. 1992;305(6855):684-687. 25. Swischuk LE, Hayden CK Jr, Dillard RA. Sinusitis in children. Radiographics. 1982;2:241-252. 26. Wald ER, Milmoe GJ, Bowen A, Ledesma-Medina J, Salamon N, Bluestone CD. Acute maxillary sinusitis in children. N Engl J Med. 1981; 304(13):749-754.

27. McClean DC. Sinusitis in children: lessons from twenty-five patients. Clin Pediatr. 1970;9(6):342-345. 28. Kogutt MS, Swischuk LE. Diagnosis of sinusitis in infants and children. Pediatrics. 1973;52(1):121-124. 29. McNeill RA. Comparison of the findings on transillumination, x-ray and lavage of the maxillary sinus. J Laryngol Otol. 1963;77:1009-1013. 30. Gwaltney JM Jr, Sydnor A Jr, Sande MA. Etiology and antimicrobial treatment of acute sinusitis. Ann Otol Rhinol Laryngol Suppl. 1981;90(3 pt 3):68-71. 31. Ballantyne JC, Rowe AR. Some points in the pathology, diagnosis and treatment of chronic maxillary sinusitis. J Laryngol Otol. 1949;63(6):337-341. 32. Otten FWA, Grote JJ. The diagnostic value of transillumination for maxillary sinusitis in children. Int J Pediatr Otorhinolaryngol. 1989;18(1):911. 33. Voltolini FER. Die ersten Operationen in der Kehlkopfshohle vom Munde aus, bei der Durchleuchtung des Kehlkopfes von aussen. Dtsch Med Wochenschr. 1889;15:340-343. 34. Smith MEN. Sinus transillumination: an archaic survival. J Laryngol Otol. 1961;75:93-94. 35. Berg O, Carenfelt C. Analysis of symptoms and clinical signs in the maxillary sinus empyema. Acta Otolaryngol (Stockh). 1988;105(3-4): 343-349. 36. Simel DL, Feussner JR, DeLong ER, Matchar DB. Intermediate, indeterminate, and uninterpretable diagnostic test results. Med Decis Making. 1987;7(2):107-114. 37. Berg O, Bergstedt H, Carenfelt C, Lind MG, Perols O. Discrimination of purulent from nonpurulent maxillary sinusitis: clinical and radiographic diagnosis. Ann Otol Rhinol Laryngol. 1981;90(3 pt 1):272-275.

UPDATE: Sinusitis

Prepared by David L. Simel, MD, MHS, and John W. Williams Jr, MD, MHS Reviewed by Rowena Dolor, MD, MHS

CLINICAL SCENARIO
A 36-year-old woman reports that she has “sinus headaches” about once every 2 to 3 months. On many days, she thinks she is about to get a sinus headache, but the symptoms resolve. On the day of her visit, she reports pressure in the sinuses, a headache, and nasal congestion that occurred when she woke up. There is no fever, cough, or nasal discharge. She requests an antibiotic. Your examination does not reveal pus in the nares or nasal polyps, though you find she does have some discomfort when you apply pressure to the sinuses. Before you turn off the light to transilluminate the sinuses, what additional lines of inquiry could be explored?

UPDATED SUMMARY ON SINUSITIS
Original Review
Williams JW Jr, Simel DL. Does this patient have sinusitis? diagnosing acute sinusitis by history and physical examination. JAMA. 1993;270(10):1242-1246.

UPDATED LITERATURE SEARCH
Our literature search used the parent search strategy for The Rational Clinical Examination series, combined with the subject “exp sinusitis,” published in English from 1992 to June 2004. The results yielded 191 titles, for which we reviewed the titles and abstracts; 22 were selected for additional review. These articles were reviewed to identify studies that assessed the sensitivity and specificity of medical history or physical examination features for sinusitis. We required that the studies be conducted with outpatients, involve prospectively collected data, and use radiologic imaging, endoscopy, or sinus puncture as a criterion standard for acute sinusitis. We excluded studies that had major design biases such as a sample confined to patients with a clinical diagnosis of sinusitis. No new original studies were identified. Two meta-analyses were identified, so the update focuses on those systematic reviews rather than individual studies.

NEW FINDINGS
• Among patients with a suspicion of sinusitis in general medical practice, the prevalence of disease from sinus aspirates is about 50%.
• The radiograph serves as a pragmatic reference standard for primary care practice, correctly diagnosing about 4 of 5 patients.

Details of the Update
Two meta-analyses of essentially the same original studies led their respective authors to distinctly different interpretations about the outcomes, though both reported that the radiographs appeared better than ultrasonography. Engels et al1 took a pragmatic approach to the reference standard for sinusitis and compared radiographs with sinus puncture and clinical examination with both sinus puncture and radiographs. They also evaluated varying thresholds for sinus radiograph positivity (opacity, air-fluid level, or mucosal thickening) and risk score for the clinical examination. Not surprisingly, the radiograph had a slightly better summary receiver operating characteristic curve area than the clinical examination (0.83 vs 0.74, respectively), with the authors concluding that evaluating combinations of individual findings as reported in the original Rational Clinical Examination article may perform better than the overall clinical impressions. A reappraisal of the studies reported by Engels et al1 shows a summary positive likelihood ratio (LR) for radiographs of 4.2 (95% confidence interval [CI], 2.6-6.7) and negative LR of 0.25 (95% CI, 0.17-0.37). From a pragmatic standpoint, using the radiograph as a reference standard will result in the correct classification of 4 of 5 patients compared with sinus puncture. In the 2 original articles comparing radiographs with sinus puncture for general medical patients with a suspicion of sinusitis, the prevalence of sinusitis was 49% and 51%.2,3 Varonen et al4 evaluated essentially the same studies, although they counted one study as 2 separate studies, which produced slightly different results. However, the authors did not evaluate varying thresholds of positivity for the radiographs and did not include risk scores for the clinical examination because the scores have not been compared with puncture.2 In addition, the authors could not evaluate varying levels of the overall clinical examination (eg, high, intermediate, or low probability) because they have not been compared with sinus puncture. Despite results that appear clinically similar, the authors concluded that the clinical examination was not reliable and that ultrasonography or radiography should be used “if a correct diagnosis is considered important.”

IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION
The original data for LRs, based on the number of signs and symptoms, were given without their CIs. The most important findings were maxillary toothache, purulent nasal secretion, poor response to decongestant, abnormal transillumination result, and patient report of colored nasal discharge. We recalculated the LRs for greater than or equal to 4, 3, 2, 1, and 0 findings present. Patients with greater than or equal to 4 findings have an LR of 6.4 (95% CI, 2.2-19), whereas those with 0 findings have an LR of 0.1 (95% CI, 0.02-0.41).
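The value of reporting CIs can be illustrated by carrying the LR limits through to post-test probabilities. The sketch below is illustrative only: it assumes a pretest probability of 50% (the prevalence quoted in this update), uses the two LRs and CI limits given above, and the function names are hypothetical. Converting the CI limits of an LR into a probability range ignores any uncertainty in the pretest probability itself.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


def probability_range(pretest_probability, lr, lr_low, lr_high):
    """Post-test probability at the LR point estimate and at each CI limit."""
    return tuple(
        post_test_probability(pretest_probability, value)
        for value in (lr, lr_low, lr_high)
    )


# Assumption: pretest probability of about 50%, as reported for general
# medical patients with suspected sinusitis.
PRETEST = 0.50

# LR point estimates and 95% CI limits quoted in the text.
findings = {
    ">=4 findings": (6.4, 2.2, 19.0),
    "0 findings": (0.1, 0.02, 0.41),
}

for label, (lr, low, high) in findings.items():
    p, p_low, p_high = probability_range(PRETEST, lr, low, high)
    print(f"{label}: {p:.0%} (range {p_low:.0%} to {p_high:.0%})")
```

Under these assumptions, 4 or more findings correspond to a probability of roughly 86% (about 69% to 95%), and 0 findings to roughly 9% (about 2% to 29%).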

CHANGES IN THE REFERENCE STANDARD
For clinical research, the reference standard is sinus puncture. However, for clinical care, the radiograph will correctly classify 4 of 5 patients and may serve as a pragmatic standard for evaluating the clinical examination.

RESULTS OF LITERATURE REVIEW
Univariate Findings for Sinusitis
Radiographs perform well compared to the reference standard of sinus puncture (Table 45-5).

Table 45-5 Likelihood Ratios of Radiographs Compared to Sinus Puncture

Finding                                             LR+ (95% CI)     LR– (95% CI)
Radiographs vs sinus punctureᵃ (n = 6 studies)      4.2 (2.6-6.7)    0.26 (0.17-0.37)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
ᵃ Calculated from the data in the studies as summarized by Engels et al.1
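The statement that radiographs correctly classify about 4 of 5 patients can be checked against the summary LRs in Table 45-5. The sketch below is a rough consistency check, not the authors' calculation: it assumes that the pooled LR+ and LR– can be treated as if they came from a single sensitivity and specificity pair (pooled estimates need not satisfy this exactly) and that the prevalence is 50%. Function names are hypothetical.

```python
def sens_spec_from_lrs(lr_pos, lr_neg):
    """Solve LR+ = sens / (1 - spec) and LR- = (1 - sens) / spec for sens and spec."""
    specificity = (lr_pos - 1.0) / (lr_pos - lr_neg)
    sensitivity = lr_pos * (1.0 - specificity)
    return sensitivity, specificity


def accuracy(sensitivity, specificity, prevalence):
    """Proportion of all patients classified correctly at a given prevalence."""
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Summary LRs for radiographs vs sinus puncture (Table 45-5).
sens, spec = sens_spec_from_lrs(lr_pos=4.2, lr_neg=0.26)

# Assumption: prevalence of about 50% among patients with suspected sinusitis.
overall = accuracy(sens, spec, prevalence=0.50)

print(f"implied sensitivity: {sens:.2f}")
print(f"implied specificity: {spec:.2f}")
print(f"implied accuracy:    {overall:.2f}")
```

With these inputs the implied sensitivity is about 0.79, the implied specificity about 0.81, and the implied accuracy about 0.80, which is in line with the 4 of 5 figure.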

EVIDENCE FROM GUIDELINES
A panel of experts met to discuss the definition of sinusitis for clinical research and clinical care.5 A useful clinical concept was the preference for the word rhinosinusitis over sinusitis because sinusitis is usually associated with nasal inflammation and rhinitis. The experts did not perform a structured systematic literature review but used consensus-building strategies to derive recommendations. For diagnosing acute bacterial rhinosinusitis, the panel’s expert opinion was based on a combination of 3 “major” findings and 9 “minor” symptoms. The panel accepted a previously proposed case definition for acute rhinosinusitis (that has not been validated), requiring the presence of 2 or more major symptoms (purulent anterior nasal drainage, purulent posterior nasal drainage, or cough) or 1 major and at least 2 minor symptoms (headache, facial pain, periorbital edema, earache, halitosis, tooth pain, sore throat, increased wheeze, fever). The objective documentation of acute bacterial rhinosinusitis requires either visualization of purulent drainage by the clinician or radiographic evidence.

CLINICAL SCENARIO—RESOLUTION
This patient presents with a common set of symptoms. Patients frequently self-diagnose sinusitis, and many will self-medicate or present to their primary care provider with a request for antibiotics. The prevalence of acute bacterial sinusitis among patients who the physician suspects may have the disease is about 50%. However, unless this patient proves to have abnormal maxillary sinus transillumination results, she has none of the symptoms commonly associated with radiographically proven sinusitis. The probability of sinusitis with none of the 5 findings is about 9%. The key to additional lines of inquiry is recognizing that acute bacterial sinusitis is not something that comes and goes within a given day but is more persistent. Migraine headaches frequently begin on awakening and are associated with nasal stuffiness, leading patients to “misdiagnose” themselves. The absence of frank nasal discharge from the medical history and your examination, along with the abrupt onset of symptoms associated with the headache, supports an alternative diagnosis such as vascular headaches. The decision to obtain a sinus radiograph depends on whether you would treat with decongestants, antibiotics, or steroid inhalers for a positive result. If she has an abnormal radiographic result (LR, 4.2), the probability of acute bacterial sinusitis is about 30%, given the absence of clinical findings. You might value a normal radiographic result if it would help in persuading the patient that she does not likely have acute bacterial sinusitis. The probability of acute bacterial sinusitis is less than 3% for a patient with none of the clinical findings and a normal radiographic result.
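The probabilities quoted in the resolution (about 9%, about 30%, and less than 3%) follow from applying the LRs in sequence to the pretest odds. The sketch below reproduces that arithmetic under stated assumptions: a 50% pretest probability, an LR of 0.1 for 0 clinical findings, and LRs of 4.2 and 0.26 for abnormal and normal radiographs. Multiplying LRs treats the clinical findings and the radiograph as conditionally independent, which is an assumption of this sketch. Function and constant names are hypothetical.

```python
def to_odds(probability):
    return probability / (1.0 - probability)


def to_probability(odds):
    return odds / (1.0 + odds)


def apply_likelihood_ratios(pretest_probability, likelihood_ratios):
    """Multiply the pretest odds by each LR in turn, then convert back."""
    odds = to_odds(pretest_probability)
    for lr in likelihood_ratios:
        odds *= lr
    return to_probability(odds)


PREVALENCE = 0.50              # suspected sinusitis in general medical practice
LR_NO_FINDINGS = 0.1           # 0 of the 5 clinical findings
LR_ABNORMAL_RADIOGRAPH = 4.2   # summary LR+ for radiographs
LR_NORMAL_RADIOGRAPH = 0.26    # summary LR- for radiographs

after_exam = apply_likelihood_ratios(PREVALENCE, [LR_NO_FINDINGS])
after_abnormal = apply_likelihood_ratios(
    PREVALENCE, [LR_NO_FINDINGS, LR_ABNORMAL_RADIOGRAPH]
)
after_normal = apply_likelihood_ratios(
    PREVALENCE, [LR_NO_FINDINGS, LR_NORMAL_RADIOGRAPH]
)

print(f"no clinical findings:            {after_exam:.1%}")
print(f"...and an abnormal radiograph:   {after_abnormal:.1%}")
print(f"...and a normal radiograph:      {after_normal:.1%}")
```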

SINUSITIS—MAKE THE DIAGNOSIS

PRIOR PROBABILITY
Among general medical patients with suspected sinusitis, the prevalence of disease as determined by sinus puncture and culture is 50%.

POPULATION FOR WHOM SINUSITIS SHOULD BE CONSIDERED
Sinusitis may be thought of as “rhinosinusitis” to emphasize the role of nasal symptoms, but additional clinical research is required to determine whether the change in terminology requires a change in management approaches. Sinusitis should be considered in patients with nasal stuffiness, nasal discharge, or maxillary facial pain. Many patients will present with a self-suspicion of sinusitis.

DETECTING THE LIKELIHOOD OF SINUSITIS IN ADULTS
The presence of 4 or more findings (maxillary toothache, purulent nasal secretion, poor response to decongestant, abnormal transillumination result, patient report of colored nasal discharge) makes sinusitis much more likely, whereas the absence of any of the findings makes sinusitis unlikely (Table 45-6).

Table 45-6 Likelihood Ratios for Radiographs and the Clinical Findings for Sinusitis

                                                  LR+ (95% CI)     LR– (95% CI)
Radiographs vs sinus puncture (6 studies)         4.2 (2.6-6.7)    0.26 (0.17-0.37)

Clinical findings compared with sinus radiographs (1 study)ᵃ
Number of findings present                        LR (95% CI)
≥4                                                6.4 (2.2-19)
3                                                 2.6 (1.5-4.4)
2                                                 1.1 (0.73-1.7)
1                                                 0.47 (0.27-0.80)
0                                                 0.1 (0.02-0.4)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
ᵃ Maxillary toothache, purulent nasal secretion, poor response to decongestant, abnormal transillumination result, patient report of colored nasal discharge.

REFERENCE STANDARD TESTS
Sinus puncture with culture serves as the reference standard for research. Clinicians will prefer to use sinus radiographs, although some patients (approximately 20%) will be misclassified. A recent panel of experts accepts an abnormal radiographic result as evidence of acute bacterial rhinosinusitis for patients with appropriate symptoms.5

REFERENCES FOR THE UPDATE
1. Engels EA, Terrin N, Barza M, Lau J. Meta-analysis of diagnostic tests for acute sinusitis. J Clin Epidemiol. 2000;53(8):852-862.ᵃ
2. van Buchem FL, Knottnerus JA, Schrijnemaekers VJ, Peeters MF. Primary-care-based randomised placebo-controlled trial of antibiotic treatment in acute maxillary sinusitis. Lancet. 1997;349(9053):683-687.
3. Laine K, Määttä T, Varonen H, Mäkelä M. Diagnosing acute maxillary sinusitis in primary care: a comparison of ultrasound, clinical examination and radiography. Rhinology. 1998;36(1):2-6.
4. Varonen H, Mäkelä M, Savolainen S, Läärä E, Hilden J. Comparison of ultrasound, radiography, and clinical examination in the diagnosis of acute maxillary sinusitis: a systematic review. J Clin Epidemiol. 2000;53(9):940-948.ᵃ
5. Meltzer EO, Hamilos DL, Hadley JA, et al. Rhinosinusitis: establishing definitions for clinical research and primary care. J Allergy Clin Immunol. 2004;114(6 suppl):S155-S212.

ᵃ For the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.

EVIDENCE TO SUPPORT THE UPDATE: Sinusitis

TITLE Meta-analysis of Diagnostic Tests for Acute Sinusitis.

AUTHORS Engels EA, Terrin N, Barza M, Lau J.

CITATION J Clin Epidemiol. 2000;53(8):852-862.

QUESTION With a hierarchy of accuracy based on the reference standard, how well do radiography, ultrasonography, and the clinical examination perform in identifying patients with sinusitis?

DESIGN Systematic review and meta-analysis.

DATA SOURCES Original articles were identified through MEDLINE, along with a review of the reference lists and review articles.

STUDY SELECTION AND ASSESSMENT English-language articles from 1996 to 1998 that met prespecified criteria. Studies had to be among patients with symptoms consistent with sinusitis, and all patients had to undergo evaluation so that verification bias was avoided.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The relevant tests were radiography, ultrasonography, and the clinical examination. Each test was compared with sinus puncture, when studies were available. Following this “ideal” reference standard study, the authors included studies in which ultrasonography or the clinical examination was compared to radiography as a pragmatic reference standard. No adequate studies of computed tomography or magnetic resonance imaging were identified.

MAIN OUTCOME MEASURES The studies were assessed for the country, setting, patient characteristics, adequacy of blinding, definition of tests, and number of cut points assessed. A summary receiver operating characteristic (ROC) curve was generated for each comparison, along with summary sensitivity and specificity estimates. The likelihood ratio (LR) was estimated from the summary sensitivity and specificity, rather than calculated from the original data.

MAIN RESULTS The authors identified 4070 potential articles. From these articles, they found the following that met their inclusion criteria: studies comparing radiology with puncture (n = 6), ultrasonography with puncture (n = 5), clinical examination with puncture (n = 1), ultrasonography with radiology (n = 3), clinical examination with radiology (n = 3) (Table 45-7). All studies were done in Europe, except for an ultrasonography study and a study that compared clinical examination to radiographs, both done in the United States. Of the 4 clinical studies, only 1 study restricted to children was not included in the original Rational Clinical Examination article. In the 2 puncture studies of adults in a generalist clinical practice, the prevalence of sinusitis was 49% and 51%. The diagnostic odds ratio for radiographs is 18 (95% confidence interval [CI], 12-27; P = .09 for heterogeneity, with I² = 48%). Sinus radiographs vs sinus puncture showed an accuracy of 81%, with reasonably narrow CIs, despite statistical heterogeneity (95% CI, 74%-87%). We calculated these results for the odds ratio and accuracy from data in the original reports (see Table 45-7). Although the comparison of a clinical risk score with puncture had a summary ROC area of 0.91, the authors identified potential problems with internal validity. The studies comparing ultrasonography with puncture had too much variability for adequate ROC curve assessment, whereas those compared with radiography were so close together that a curve could not describe the points.

Table 45-7 Sensitivity and Specificity of Radiographs and the Clinical Examination

Test (No.) / Result                                  Summary Sensitivity (95% CI)   Summary Specificity (95% CI)   Summary ROC Curve Area
Radiographs vs puncture (6)                                                                                         0.83
  Opacity                                            0.41 (0.33-0.49)               0.85 (0.76-0.91)
  Fluid or opacity                                   0.73 (0.60-0.83)               0.80 (0.71-0.87)
  Fluid, opacity, or mucous membrane thickening      0.90 (0.68-0.97)               0.61 (0.20-0.91)
Clinical examination vs radiography (3)ᵃ                                                                            0.74

Abbreviations: CI, confidence interval; ROC, receiver operating characteristic.
ᵃ One study compared the overall clinical impression to sinus radiography, one evaluated a risk score for children, and one evaluated a risk score for adults. The 2 risk score studies show similar points on the ROC curve.

CONCLUSIONS
LEVEL OF EVIDENCE Systematic review.

STRENGTHS The inclusion criteria are well specified and inclusive.

LIMITATIONS A quality score was not assigned, and some studies were included that lacked appropriate blinding or description of the factors that defined a positive result. Tests for homogeneity were not done, nor were summary estimates of the LRs given with their CIs. Some readers will be uncomfortable with the decision to pool studies that varied in quality and were potentially biased by the lack of blinding and case definitions.

The authors used the summary sensitivity and specificity to estimate the positive and negative LRs (LR+ and LR–) of radiographs (LR+, 3.7; LR–, 0.34) for “fluid or opacity” compared with sinus puncture. The absence of fluid, opacity, or mucosal thickening decreased the LR– to 0.16, but the specificity had broad CIs. On the other hand, the CIs suggest that the LR– could be much lower and that, even with a prevalence of 50%, a completely normal radiograph result would greatly decrease the probability of sinusitis. From a pragmatic standpoint, using radiographs as the reference standard will result in the misclassification of about 20% of patients. Sinus ultrasonography is a test primarily used in Europe. Because of the substantial variability in results, the authors infer that ultrasonography may require more extensive experience before clinicians can rely on it. No studies of computed tomography were of sufficient quality to meet their inclusion criteria. The authors conclude that the clinical examination does have “moderate” ability to identify patients with sinusitis. They recommend further evaluation of risk scores for children and adults because they are less reliant on the experience of the examining clinician.

Reviewed by David L. Simel, MD, MHS
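The LRs quoted in the limitations can be reproduced from the summary sensitivity and specificity in Table 45-7. The sketch below is a minimal illustration of that conversion. The function name is hypothetical, and the results are point estimates only, because CIs cannot be derived from summary values alone.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test result."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Summary sensitivity and specificity for radiographs vs sinus puncture,
# at two thresholds for calling the radiograph abnormal (Table 45-7).
thresholds = {
    "fluid or opacity": (0.73, 0.80),
    "fluid, opacity, or mucous membrane thickening": (0.90, 0.61),
}

for label, (sens, spec) in thresholds.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{label}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
```

For “fluid or opacity” the computed values are close to the reported LR+ of 3.7 and LR– of 0.34, and for the broadest threshold the LR– is close to the reported 0.16.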

TITLE Comparison of Ultrasound, Radiography, and Clinical Examination in the Diagnosis of Acute Maxillary Sinusitis: A Systematic Review.

AUTHORS Varonen H, Mäkelä M, Savolainen S, Läärä E, Hilden J.

CITATION J Clin Epidemiol. 2000;53(9):940-948.

QUESTION Compared with sinus puncture or computed tomography (CT), how well do radiographs, ultrasonography, and the overall clinical examination perform in diagnosing acute maxillary sinusitis?

DESIGN Systematic review and meta-analysis.

DATA SOURCES Original articles were identified through MEDLINE (1996 to April 1999) and a Finnish database (Medic), along with a review of the reference lists, review articles, and hand searching of 4 relevant journals.

STUDY SELECTION AND ASSESSMENT Included studies compared the tests of interest with sinus puncture or CT for adults with suspected acute maxillary sinusitis and symptoms of fewer than 3 months’ duration.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Radiograph results were considered abnormal when the patient had at least 6-mm mucosal thickening, an air-fluid level, or a complete opacity. Ultrasonographic results were considered abnormal according to previously published criteria.1 The overall clinical impression was evaluated as positive or negative for sinusitis.

MAIN OUTCOME MEASURES The studies were assessed for validity with standard criteria from the Cochrane Collaboration. A summary receiver operating characteristic (ROC) curve was generated for each comparison, along with summary sensitivity and specificity estimates. Fixed-effects summary likelihood ratios (LRs), without confidence intervals (CIs), were provided. The authors did not provide quantitative estimates of heterogeneity.

MAIN RESULTS The authors identified 1054 potential articles. From these articles, they found 11 articles that met their inclusion criteria; all were studies compared with sinus puncture, except for 1 study that used CT. The LRs were similar for radiographs, ultrasonography, and clinical examination (Table 45-8). Despite clinically similar LRs, the authors observed that ultrasonographic findings were heterogeneous. They also report that, as the prevalence of disease decreased in studies, the sensitivity decreased.

Table 45-8 Likelihood Ratio for Sinusitis for Radiographs, Ultrasonography, and the Clinical Examination

Test (No.)                 Summary Sensitivity (95% CI)   Summary Specificity (95% CI)   Summary LR+   Summary LR–
Radiographs (7)            0.87 (0.85-0.88)               0.89 (0.88-0.91)               3.4           0.26
Ultrasonography (7)        0.85 (0.84-0.87)               0.82 (0.80-0.83)               2.8           0.30
Clinical examination (2)   0.69 (0.65-0.73)               0.79 (0.75-0.82)               3.2           0.40

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

CONCLUSIONS
LEVEL OF EVIDENCE Systematic review.

STRENGTHS The inclusion criteria are well specified and inclusive.

LIMITATIONS A quality score was not assigned, and some studies were included that lacked appropriate blinding or description of the factors that defined a positive result. Tests for homogeneity were not reported. For the summary ROC curve, the authors do not allow “gradations” of positivity for the tests of interest. Thus, information may have been lost in studies that dichotomize the overall clinical impression into “positive” or “negative.” In addition, the authors used fixed-effects measures without CIs for the summary LR.

The included studies are almost identical to the meta-analysis published by Engels et al.2 Not surprisingly, the results are similar, with differences explained more by the methods used for summarizing results than the results themselves. For radiographs, the only difference in the studies included is that the data from one study were broken out in this meta-analysis into 2 separate studies with different results for radiographs (likewise, they broke out the ultrasonographic data from this single study as if they were 3 separate studies).1 This likely created some bias in the outcomes. These authors concluded, as did Engels et al,2 that there was too much heterogeneity for ultrasonography and that radiographs may perform better. Despite LRs that are not appreciably different from radiographs, the authors conclude that the clinical examination is not reliable and that radiographs should be used when a “correct diagnosis is required.” Given that the clinical examination was used to select the patients for radiographs and that no CIs were provided for the LR, it is difficult to conclude that the clinical examination is useless. Furthermore, analyzing the clinical examination as either positive or negative may dilute the efficiency of the clinical examination when patients with an estimated intermediate probability of disease are forced into the positive or negative categories.

REFERENCES FOR THE EVIDENCE 1. Revonta M. Ultrasound in the diagnosis of maxillary and frontal sinusitis. Acta Otolaryngol (Stockh). 1980;370(suppl):1-55. 2. Engels EA, Terrin N, Barza M, Lau J. Meta-analysis of diagnostic tests for acute sinusitis. J Clin Epidemiol. 2000;53(8):852-862.

Reviewed by David L. Simel, MD, MHS

CHAPTER 46

Does This Patient Have Splenomegaly?

Steven A. Grover, MD, MPA, FRCPC
Alan N. Barkun, MD, FRCPC
David L. Sackett, MD, FRSC, FRCPC

CLINICAL SCENARIO
Among the patients you are seeing today are the following 3: The first is an elderly woman who complains of easy fatigability, and her conjunctivae and nail beds are pale. You suspect that she is anemic because of gastrointestinal blood loss, but among your differential diagnoses you consider a lymphoproliferative disorder and decide to examine her for splenomegaly. The second is a college student with failing appetite, ability to concentrate, energy, and grades. You think that he is depressed but want to rule out infectious mononucleosis and decide to examine him for splenomegaly. The third is an otherwise healthy man with well-controlled hypertension and a normal cardiovascular examination result. As he lies on the examining table, stripped to his waist, you wonder whether you should take the time to examine him for splenomegaly.

WHY EXAMINE THE SPLEEN?
We examine the spleen to see whether it is palpable. Most palpable spleens are enlarged, and splenomegaly in an adult requires an explanation, for it may be a manifestation of disease. Although splenomegaly has many important causes, including cancers, infections, and connective tissue diseases, many of these diagnoses are relatively uncommon, so isolated splenomegaly in an otherwise healthy adult is most often associated with nonspecific infections or no obvious cause.1

ANATOMIC LANDMARKS AND SPLENIC SIZE
The normal spleen is a curved wedge that follows the course of the bony portion of the left 10th rib (Figure 46-1A). Its narrow posterior pole points back and to the right, toward the spine. Its outer surface is convex and lies just beneath the left side of the diaphragm, and its blunt anterior pole approaches the midaxillary line, pointing toward the left side of the colic flexure. Its inner concave surface bears a large impression from the posterior wall of the stomach, and its inferior edge bears impressions from the upper pole of the left kidney and, occasionally, the tail of the pancreas.

HOW LARGE IS THE NORMAL SPLEEN?
Autopsies after sudden traumatic death in individuals free of disorders likely to lead to splenomegaly have provided information on the usual weight of the spleen. In Philadelphia, Pennsylvania, such spleens exhibited median weights from 90 g (among black women) to 170 g (among young white men), with intermediate values for black men (100 g), white women (115 g), and elderly white men (130 g). The pathologists who conducted these studies stated that the “best rule of thumb is to regard any spleen under 250 g as normal.”2 This biologic variation in average spleen size underscores the need for a criterion standard definition of splenic enlargement that is acceptable to patients (ie, having one's spleen weighed is painful) and reproducible for clinicians. One such standard is the radioisotopic scintiscan, presented (with the most commonly used normal values in parentheses) as maximum values for length (12 cm) and width (7 cm),3 surface area (80 cm²),4 or volume (250 cm³).5 Most recently, an ultrasonographic criterion standard has been suggested, with splenomegaly defined as a cephalocaudad diameter of 13 cm or more.6,7

[Figure 46-1 The Normal Sized Spleen Rests Hidden Under the Rib Cage. Anterolateral views: A, Normal anatomical relationships; B, Splenomegaly. Labels: diaphragm, greater curvature of stomach, spleen, direction of enlargement, tail of pancreas, spine, rib 10, left colic flexure (large intestine).]
A, The normal spleen is a curved wedge that follows the course of the bony portion of the left 10th rib. Its narrow posterior pole points back and to the right, toward the spine. Its outer surface is convex and lies just beneath the left side of the diaphragm, and its blunt anterior pole approaches the midaxillary line, pointing toward the left side of the colic flexure. Its inner concave surface bears a large impression from the posterior wall of the stomach, and its inferior edge bears impressions from the upper pole of the left kidney and occasionally the tail of the pancreas. B, As the spleen enlarges, its anterior pole continues to follow the left 10th rib as the spleen descends below the rib cage and across the abdomen toward the right iliac fossa.

Copyright © 2009 by the American Medical Association.

THE CONSEQUENCES OF SPLENOMEGALY FOR THE CLINICAL EXAMINATION
Because the normal-sized spleen almost always lies entirely within the rib cage, it usually cannot be palpated. However, as it enlarges it displaces the stomach but cannot displace the spine, diaphragm, or kidney. Therefore, its anterior pole continues to follow the projection of the bony portion of the left 10th rib, descending below the rib cage and across the abdomen toward the right iliac fossa (Figure 46-1B).

HOW TO EXAMINE FOR SPLENOMEGALY
Inspection
Inspection of the left upper quadrant might reveal a bulging mass emerging from under the left costal margin and descending on inspiration. There are no published assessments of the accuracy of clinical inspection. Nonetheless, this sign would be expected to have low sensitivity because only massive spleens will distort the abdominal wall sufficiently to be seen. Moreover, because other large masses (a polycystic kidney or gastric or colon cancer) also can distort the abdominal wall and may descend on inspiration, this sign probably does not have perfect specificity either. In the absence of previous documentation or suspicion of massive splenomegaly, this is unlikely to be a useful sign.

Percussion Percussion seeks to identify the loss of tympany as the enlarging spleen impinges on the adjacent air-filled lung, stomach, and colon. Percussion is often claimed to be more sensitive than palpation for lesser degrees of splenomegaly, although evidence to support this claim (described herein) is scant. Three percussion methods have been validated against ultrasonography or scintigraphy: 1. Percussion by Nixon Method (as Modified by Sullivan and Williams)

The patient is placed in the right lateral decubitus position. Percussion is initiated midway along the left costal margin and continued upward along a line perpendicular to the

CHAPTER 46 costal margin (Figure 46-2). In a normal examination, a full stomach can result in initial percussion dullness, but as percussion continues along the perpendicular line tympany then becomes present because of the overlying lung. Splenomegaly is diagnosed when the dullness is present more than 8 cm above the costal margin.8,9

Examiner begins percussion at midpoint of left costal margin in a perpendicular direction towards the midaxillary line.

Right lateral decubitus position

Positive indication: dullness is present more than 8 cm above the costal margin.

2. Percussion by Castell Method

Midaxillary line

The patient is placed in the supine position. Percussion is carried out in the lowest intercostal space in the left anterior axillary line in both expiration and full inspiration (Figure 46-3). In a normal examination result, the percussion note remains resonant throughout this maneuver. Splenomegaly is diagnosed when the percussion note is dull or becomes dull on full inspiration.10


3. Percussion of Traube Space

The patient is supine, with the left arm slightly abducted for access to the entire Traube space (after its description by Ludwig Traube, who ascribed its disappearance to pleural effusion, not an enlarged spleen),11 defined by the sixth rib superiorly, the midaxillary line laterally, and the left costal margin inferiorly (Figure 46-3). With the patient breathing normally, this triangle is percussed across 1 or more levels from its medial to lateral margins. Normal percussion yields a resonant or tympanitic note. Splenomegaly is diagnosed when the percussion note is dull.12

Palpation

Although many methods for palpation of the spleen have been reported in clinical texts and journals, only 3 have had their precision or accuracy documented in the clinical literature and will be described herein. Relaxation of the abdominal wall is a prerequisite for successful palpation and can be assisted by both the examiner (friendly, gentle, and warm hands) and the patient (flexed, supported knees).

Two-Handed Palpation With Patient in Right Lateral Decubitus

With the patient in the right lateral decubitus position, the examiner's left hand is slipped from front to back around the left lower thorax, gently lifting the left lowermost rib cage anteriorly and medially. The tips of the fingers of the examiner's right hand are pressed gently just beneath the left costal margin, and the patient is asked to take a long, deep breath as the palpation of a descending spleen is sought. If none is felt, the procedure is repeated, lowering the right hand 2 cm toward the umbilicus each cycle, until the examiner is confident that a massive spleen has not been missed. (Some authorities suggest starting palpation over the lower abdomen and moving up toward the costal margin.) The same procedure can be carried out with the patient supine.

One-Handed Palpation With Patient Supine

This method is identical to the former one, except that no counterpressure is applied by the left hand to the rib cage. With the patient supine, the tips of the fingers of the examiner's right hand are pressed gently just beneath the left costal margin, and the patient is asked to take a long, deep breath as the palpation of a descending spleen is sought. If none is felt, the procedure is repeated, lowering the right hand 2 cm toward the umbilicus each cycle, until the examiner is confident that a massive spleen has not been missed. Some examiners like to apply counterpressure to the patient's flank with the left hand while palpating with the right.

Figure 46-2 Nixon Method to Detect Splenomegaly
Nixon method of percussion requires that the patient be placed in the right lateral decubitus position. Percussion is started at the midpoint of the left costal margin and proceeds perpendicularly. Splenomegaly is diagnosed when the dullness is present more than 8 cm above the costal margin.

Figure 46-3 Percussion in Traube Space and at Castell Spot to Detect Splenomegaly
Traube space is defined by the sixth rib superiorly, the left anterior axillary line laterally, and the costal margin inferiorly. Castell spot is located at the junction of the lowest intercostal space and the left anterior axillary line. Using Castell’s method, the examiner percusses at the level of Castell’s spot in both expiration and full inspiration. Positive indication: percussion is dull or becomes dull on full inspiration. Using Traube’s space, the examiner percusses across the space at one or more levels from medial to lateral margins while the patient breathes normally. Positive indication: percussion is dull.

Hooking Maneuver of Middleton With Patient Supine

The patient is asked to lie flat with his or her left fist under the left costovertebral angle. The examiner is positioned to the patient's left, facing the patient's feet. The fingers of both the examiner's hands are curled under the left costal margin, and the patient is asked to take a long, deep breath as the palpation of a descending spleen is sought.13

Additional Features of the Palpable Spleen

Given its origin within the rib cage, most texts state that it is never possible to palpate (get above) the upper border of the spleen, helping distinguish it from other abdominal masses that may present an upper border. If a spleen is greatly enlarged, it may be possible to feel a hilar notch along its medial border.

PRECISION OF THE SIGNS FOR SPLENOMEGALY

When groups of inpatients with and without splenomegaly had their Traube spaces percussed by 3 internists, the interexaminer agreement (κ values) ranged from 0.19 to 0.41, which is modest at best.12 However, recent food intake reduced the accuracy of Traube space percussion in this study and probably decreased the test precision when different physicians examined the same patient at various times after meals. Among the same patients, a second study14 showed that the interexaminer agreement for palpation ranged from 0.56 to 0.70, demonstrating that reproducibility between examiners was better for palpation than for percussion. When tested among 50 patients with alcoholism, agreement among different examiners (using 2-handed palpation with the patient in the right lateral decubitus and 1-handed palpation with the patient supine) demonstrated an intraclass correlation coefficient of 0.75 and was as good as that for ascites (and marginally better than that for jaundice, Dupuytren contracture, vascular spiders, gynecomastia, palmar erythema, asterixis, or clubbing).15 Senior gastroenterologists exhibited marginally better agreement than more junior physicians (intraclass correlation coefficients of 0.81 and 0.73, respectively). When different examiners were asked to report the extent to which the spleen tip extended below a specific bony landmark (eg, the xiphisternal-sternal junction), their estimates varied on average by 6 cm.16

Table 46-1 Studies of the Accuracy of Percussion

| No. of Patients | Criterion Standard | Maneuver | Sensitivity, % (No.) | Specificity, % (No.) |
|---|---|---|---|---|
| 118a | Ultrasonography | Traube space percussion, all patients12 | 62 (58/94) | 72 (109/151) |
| | | Traube space percussion, nonobese patients who have not eaten recently12 | 78 (29/37) | 82 (54/66) |
| 65 | Scintigraphy | Nixon method9 | 59 (10/17) | 94 (45/48) |
| | | Castell method9 | 82 (14/17) | 83 (40/48) |

a Each patient was examined by 1 to 3 examiners, for a total of 245 examinations.
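The κ values quoted in the precision discussion above measure agreement between two examiners after correcting for the agreement expected by chance. The following is a minimal Python sketch of that calculation for a dull/not-dull percussion call. The function name `cohen_kappa` and the paired counts are hypothetical illustrations, not data from the cited studies.

```python
def cohen_kappa(both_pos, only_first_pos, only_second_pos, both_neg):
    """Cohen's kappa for two examiners rating the same patients as positive/negative."""
    n = both_pos + only_first_pos + only_second_pos + both_neg
    if n == 0:
        raise ValueError("no paired examinations supplied")
    observed = (both_pos + both_neg) / n            # raw proportion of agreement
    first_pos = (both_pos + only_first_pos) / n     # examiner 1 positive rate
    second_pos = (both_pos + only_second_pos) / n   # examiner 2 positive rate
    # Agreement expected if the two examiners rated independently of each other
    expected = first_pos * second_pos + (1 - first_pos) * (1 - second_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)


# Hypothetical counts for 100 patients percussed by two examiners
kappa = cohen_kappa(both_pos=20, only_first_pos=15, only_second_pos=10, both_neg=55)
print(f"kappa = {kappa:.2f}")
```

In this invented example the examiners agree on 75% of patients, yet κ is well below 0.75 because much of that agreement would be expected by chance alone.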

ACCURACY OF THE SIGNS FOR SPLENOMEGALY

Table 46-1 summarizes studies on the accuracy of percussion. Using ultrasonographic results as the criterion standard, percussion of Traube space had a sensitivity of 62% (95% confidence interval [CI], 51%-72%) and a specificity of 72% (95% CI, 65%-80%).12 Percussion sensitivity was reduced by the presence of obesity (more false-negative results), and its specificity was decreased by recent food intake (more false-positive results). Accordingly, among leaner patients who had not eaten in the previous 2 hours, percussion sensitivity was 78% (95% CI, 62%-90%), and its specificity was 82% (95% CI, 70%-90%). A second study9 examined the sensitivity and specificity, individually and in combination, of the Nixon and Castell methods of percussion (as well as 2-handed palpation in the supine and right lateral decubitus positions). In comparing the Nixon to the Castell method of percussion, the Castell method exhibited a higher sensitivity (82% vs 59%) but lower specificity (83% vs 94%) (Table 46-1).

Table 46-2 summarizes 7 studies of the accuracy of palpation. The first 2 studies17,18 assessed the accuracy of the routine examination for splenomegaly by abstracting the clinical examinations (performed by a large number and range of clinicians) from routine clinical charts. Both studies found low sensitivity (20%-28%) but high specificity (98%-100%). Most enlarged spleens were missed (a high rate of false-negative results, leading to low sensitivity), but few examiners reported palpating spleens that were not there (a low rate of false-positive results, leading to high specificity). When the results of these 2 studies were combined, the routine examination for splenomegaly had a sensitivity of 27% (95% CI, 19%-36%) and a specificity of 98% (95% CI, 96%-100%). In the other 5 palpation studies4,5,9,14,19 (Table 46-2), the examination for splenomegaly was performed as part of the study.
Because the examiners knew that they were under scrutiny, it is not surprising that both their true-positive reports and false-positive reports of splenomegaly increased; that is, the overall sensitivity of palpation was higher and the specificity lower than in the 2 previously described studies that assessed the routine examination as recorded in clinical notes. One study9 compared percussion methods and palpation and demonstrated that the Castell method of percussion may be somewhat more sensitive than palpation (82% vs 71%) (Tables 46-1 and 46-2). Finally, if splenomegaly was declared when any of the 4 signs (2 for percussion and 2 for palpation) were positive, true-positive and false-positive declarations of splenomegaly increased because the increase in sensitivity to 88% (fewer large spleens missed) was accompanied by a decrease in specificity to 83% (more normal-sized spleens mistakenly called large).

Table 46-2 Studies of the Accuracy of Palpation

| No. of Patients | Criterion Standard | Maneuver | Sensitivity, % (No.) | Specificity, % (No.) |
|---|---|---|---|---|
| Based on Routine Examinations Recorded in Clinical Charts | | | | |
| 47 | Autopsy17 | Physical examination | 20 (3/15) | 100 (32/32) |
| 217 | Scintigraphy18 | Clinical impressions | 28 (26/92) | 98 (122/125) |
| | | Overall | 27 (29/107) | 98 (154/157) |
| Based on Specific Examinations Done as Part of the Study | | | | |
| 99 | Scintigraphy4 | Palpation | 57 (31/54) | 100 (45/45) |
| 32 | Operation19 | Palpability | 59 (16/27) | 100 (5/5) |
| 100 | Scintigraphy5 | Supine 2-handed palpation | 56 (47/84) | 69 (11/16) |
| 65 | Scintigraphy9 | Supine and right lateral decubitus palpation | 71 (12/17) | 90 (43/48) |
| 118a | Ultrasonography14 | Supine palpation or Middleton maneuver | 56 (53/94) | 93 (140/151) |
| | | Overall | 58 (159/276) | 92 (244/265) |

a Each patient was examined by 1 to 3 examiners, for a total of 245 examinations.

The final study14 evaluated the accuracy of bedside diagnostic maneuvers, using receiver operating characteristic curve analysis. This analytic technique evaluates the discriminating ability of different tests by comparing the true-positive rate (sensitivity) and false-positive rate (1 – specificity) of each test using different definitions of a positive test result (test thresholds). The discriminating ability refers to the probability of correctly selecting the patient with splenomegaly between 2 patients: one with an enlarged spleen and one with a normal spleen. A test with a discriminating ability of 50% performs no better than chance alone, whereas a perfect test has a discriminating ability of 100%. In this study, supine palpation, right lateral decubitus palpation, and Middleton maneuver all demonstrated similar discriminating abilities (73%-79%). The discriminating ability of palpation and percussion was similar, although the test specificity of palpation appeared to be generally superior to percussion.

The most important finding of this study was that palpation was a better discriminator among patients in whom percussion result was positive. (As might be expected, these patients have the largest spleens.) When percussion dullness was present, palpation discriminated correctly 87% of the time. However, if percussion was not dull, palpation was a poor discriminator (55%) or only slightly better than chance. This confirms that percussion and palpation should be used together because percussion dullness identifies a subset of patients in whom palpation is a useful test. If percussion dullness is absent, there is no need to palpate, because palpation is a poor test among such patients.
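The "discriminating ability" described above can be read as a pairwise comparison: take every pairing of one patient with splenomegaly and one without, and count how often the test ranks the enlarged spleen higher. The sketch below illustrates that idea in Python. The function name and the 5-point examiner confidence ratings are hypothetical and are not taken from the cited study. Ties are counted as half, which is an assumption of this sketch.

```python
from itertools import product


def discriminating_ability(scores_enlarged, scores_normal):
    """Fraction of (enlarged, normal) patient pairs that the test ranks correctly.

    Scores are examiner confidence ratings; higher means more suspicious.
    A tie between the two patients counts as half a correct ranking.
    """
    pairs = list(product(scores_enlarged, scores_normal))
    if not pairs:
        raise ValueError("need at least one patient in each group")
    correct = sum(1.0 if e > n else 0.5 if e == n else 0.0 for e, n in pairs)
    return correct / len(pairs)


# Hypothetical ratings (1 = definitely not palpable, 5 = definitely palpable)
enlarged = [5, 4, 4, 3, 2]
normal = [1, 2, 2, 3, 1]
print(f"discriminating ability = {discriminating_ability(enlarged, normal):.0%}")
```

Under this definition, a test whose ratings are unrelated to spleen size scores about 50%, and a test that always rates the enlarged spleen higher scores 100%.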
Finally, this study also demonstrated that, given a clinical suspicion (the prior probability or disease prevalence) of splenomegaly of 10% to 90% before examining the patient, it is difficult to substantially decrease the likelihood of an enlarged spleen, even if percussion and palpation results were both negative, because the false-negative rate of bedside diagnosis was 28%. On the other hand, when a positive bedside examination result was defined as both percussion and palpation results being positive, the high test specificity of 97% significantly increased the likelihood of splenic enlargement to 60% or more.
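The sensitivity and specificity figures in this section all derive from simple counts. As a minimal sketch, the Python below recomputes them, together with the corresponding likelihood ratios, from the Traube space percussion counts for all patients in Table 46-1 (58 of 94 examinations positive when the spleen was enlarged, 109 of 151 negative when it was not). The function name is an illustrative assumption, and the sketch does not attempt to reproduce the confidence intervals reported in the text.

```python
def accuracy_from_counts(true_pos, n_diseased, true_neg, n_healthy):
    """Return (sensitivity, specificity, LR+, LR-) from raw study counts."""
    sensitivity = true_pos / n_diseased
    specificity = true_neg / n_healthy
    # LR+ is unbounded when no false-positive results were observed
    lr_pos = float("inf") if specificity == 1 else sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return sensitivity, specificity, lr_pos, lr_neg


# Traube space percussion, all patients (counts from Table 46-1)
sens, spec, lr_pos, lr_neg = accuracy_from_counts(58, 94, 109, 151)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```

The same function applied to the chart-review row "Physical examination" of Table 46-2 (3/15 and 32/32) returns an unbounded LR+, a reminder that a specificity of 100% in a small sample should not be read as certainty.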

IS SPLENOMEGALY EVER NORMAL?

About 3% of otherwise healthy students entering a US college were found to have unexplained palpable spleens1 and, on incomplete follow-up, appeared to fare none the worse20; similarly, 12% of otherwise healthy postpartum women at a Canadian hospital had palpable spleens.21

THE BOTTOM LINE

Guidelines for examining for splenic enlargement are summarized in Table 46-3.

1. Splenomegaly is uncommon but occurs in a wide variety of conditions. Given the low sensitivity of the clinical examination, it can be argued that the routine examination for splenomegaly cannot definitively rule in or rule out splenomegaly in normal, asymptomatic patients when the prevalence is less than 10% and additional imaging tests will be required. Rather, the examination for splenomegaly is most useful to rule in the diagnosis of splenomegaly among patients in whom there is a clinical suspicion of at least 10%.

2. The bedside examination of the spleen should start with percussion. If percussion is not dull, there is no need to palpate because the results of palpation will not effectively rule in or rule out splenic enlargement. If the possibility of missing splenic enlargement remains an important clinical concern, then ultrasonography or scintigraphy is indicated. In the presence of percussion dullness, palpation should follow. If both test results are positive, the diagnosis of splenomegaly is established (providing that the clinical suspicion of splenomegaly was at least 10% before examination). If palpation result is negative, diagnostic imaging will be required to confidently rule in or rule out splenomegaly.

Table 46-3 Guidelines for Examining for Splenic Enlargement

Clinical Suspicion (Prior Probability) of Splenic Enlargement: Less than 10%
Recommendations and Rationale:
• Percussion or palpation for splenomegaly of limited usefulness
• Maneuvers are not sufficiently sensitive to rule out splenomegaly
• Given the low pretest probability of splenomegaly, test specificity of clinical examinations is not sufficiently high to rule in splenic enlargement, even if both test results are positive

Clinical Suspicion (Prior Probability) of Splenic Enlargement: 10% or more
Recommendations and Rationale:
• Percussion and palpation can be used to rule in splenomegaly if both results are positive
• Percuss first, and if result is positive, then palpate
• If percussion result is negative but your clinical suspicion remains high, order ultrasonography because palpation in the presence of abdominal tympany is not specific enough to rule in splenomegaly
• If percussion result is positive but palpation result is negative, then ultrasonography is also needed to confidently evaluate spleen size
• To confidently rule out splenomegaly, a radiologic procedure is necessary because of the limited sensitivity of bedside examination

CLINICAL SCENARIO—RESOLUTION

Returning to the 3 patients originally described at the beginning of this article, you may be able to confidently rule in splenic enlargement in the pale elderly woman complaining of fatigue if your preexamination clinical suspicion of splenomegaly is at least 10% and if both percussion and palpation results are positive. Abdominal examination is not sufficiently sensitive to rule out splenic enlargement in the college student with symptoms of depression. Finally, you may choose to examine for splenic enlargement in the asymptomatic man with hypertension, but a negative examination result may be a false negative, and a positive examination result will require radiologic confirmation to rule in splenomegaly.

Author Affiliations at the Time of the Original Publication

Divisions of General Internal Medicine, Gastroenterology, and Clinical Epidemiology, Montreal General Hospital, and the Departments of Medicine and Epidemiology and Biostatistics, McGill University, Montreal, Quebec, Canada (Drs Grover and Barkun); and the Division of General Internal Medicine, Department of Medicine, and the Department of Clinical Epidemiology and Biostatistics of McMaster University and Henderson General Hospital, Hamilton, Ontario, Canada (Dr Sackett).

Acknowledgments

The authors acknowledge, with thanks, helpful comments received from Roger Williams, MD, and Andreas Laupacis, MD. Dr Grover is a research scholar and Dr Barkun is a clinical scholar supported by the Fonds de la recherche en santé du Quebec.

REFERENCES

1. Ebaugh FG, McIntyre OR. Palpable spleens: ten-year follow-up. Ann Intern Med. 1979;90(1):130-131.
2. Myers J, Segal RJ. Weight of the spleen, I: range of normal in a nonhospital population. Arch Pathol. 1974;98(1):33-35.
3. Larson SM, Tuell SH, Moores KD, Nelp WB. Dimensions of the normal adult spleen scan and prediction of spleen weight. J Nucl Med. 1971;12(3):123-126.
4. Westin J, Lanner SO, Larsson A, Weinfeld A. Spleen size in polycythemia: a clinical and scintigraphic study. Acta Med Scand. 1972;191(3):263-271.
5. Zhang B, Lewis SM. Use of radionuclide scanning to estimate size of spleen in vivo. J Clin Pathol. 1987;40(5):508-511.
6. Niederau C, Sonnenberg A, Mueller JE, Erckenbrecht JF, Scholten T, Fritsch WP. Sonographic measurements of the normal liver, spleen, pancreas, and portal vein. Radiology. 1983;149(2):537-540.
7. Koga T, Morikawa Y. Ultrasonographic determination of the splenic size and its clinical usefulness in various liver diseases. Radiology. 1975;115(1):157-161.
8. Nixon RK Jr. The detection of splenomegaly by percussion. N Engl J Med. 1954;250(4):166-167.
9. Sullivan S, Williams R. Reliability of clinical techniques for detecting splenic enlargement. BMJ. 1976;2(6043):1043-1044.
10. Castell DO. The spleen percussion sign: a useful diagnostic technique. Ann Intern Med. 1967;67(6):1265-1267.
11. Talbott JE. A Biographical History of Medicine: Excerpts and Essays on the Men and Their Work. New York, NY: Grune & Stratton; 1970:594-595.
12. Barkun AN, Camus M, Meagher T, et al. Splenic enlargement and Traube’s space: how useful is percussion? Am J Med. 1989;87(5):562-566.
13. Shaw MT, Dvorak V. Palpation of slightly enlarged spleens. Lancet. 1973;1(7798):317.
14. Barkun AN, Camus M, Green L, et al. The bedside assessment of splenic enlargement. Am J Med. 1991;91(5):512-518.
15. Espinoza P, Ducot B, Pelletier G, et al. Interobserver agreement in the physical diagnosis of alcoholic liver disease. Dig Dis Sci. 1987;32(3):244-247.
16. Blendis LM, McNeilly WJ, Sheppard L, Williams R, Laws JW. Observer variation in the clinical and radiological assessment of hepatosplenomegaly. BMJ. 1970;1(5698):727-730.
17. Riemenschneider PA, Whalen JP. The relative accuracy of estimation of enlargement of the liver and spleen by radiologic and clinical methods. Am J Roentgenol. 1965;94:462-468.
18. Halpern S, Coel M, Ashburn W, et al. Correlation of liver and spleen size: determinations by nuclear medicine studies and physical examination. Arch Intern Med. 1974;134(1):123-124.
19. Ingeberg S, Stockel M, Sorensen PJ. Prediction of spleen size by routine radioisotope scintigraphy. Acta Haematol. 1983;69(4):243-248.
20. McIntyre RR, Ebaugh FG Jr. Palpable spleens in college freshmen. Ann Intern Med. 1967;66(2):301-306.
21. Berris B. The incidence of palpable liver and spleen in the postpartum period. CMAJ. 1966;95(25):1318-1319.

UPDATE: Splenomegaly

Prepared by Alan N. Barkun, MD, and Steven A. Grover, MD Reviewed by Andrew Muir, MD

CLINICAL SCENARIO

A 34-year-old man has complained of fatigue and abdominal pain. He presents to the emergency department with vague abdominal pain and fever. The medical history is also that of intermittent sweats and some weight loss. Your examination reveals diffuse adenopathy. Traube space is dull to percussion. You decide to try to palpate the spleen edge but, despite spending a few minutes examining the patient while he is supine and then while he is on his side, you decide that you cannot feel the spleen. According to your findings, how confident should you be that the spleen is not enlarged?

Original Review

Grover SA, Barkun AN, Sackett DL. The rational clinical examination: does this patient have splenomegaly? JAMA. 1993;270(18):2218-2221.

UPDATED LITERATURE SEARCH

Our literature search used the parent search strategy for The Rational Clinical Examination series, combined with the subject “exp splenomegaly,” published in English from 1991 to 2004, and articles that referred to the original review. The results yielded 136 articles, for which we reviewed the titles and abstracts. We found 5 articles suitable for review, although 1 was a duplicate publication. Of the remaining 4 studies, 3 were selected because they had prospective evaluation of patients for splenomegaly, with both sensitivity and specificity data collected independent of an ultrasonograph used as the reference standard test. One of the studies had information on the interobserver variability of examination techniques. We also identified 1 study of the sensitivity of the examination for splenomegaly in athletes.

SUMMARY OF NEW FINDINGS

• Palpation might have greater accuracy than percussion, especially in lean patients. However, assessment of palpation performance may be biased because in a number of studies, palpation followed percussion maneuvers.

• Examiners should become proficient in 1 palpation method and 1 percussion method because the combination of both results may be better than either alone.

Details of the Update

A study from Brazil suggested that combining the results of 2 examiners’ palpation findings (presence of a palpable spleen, presence of a spleen felt more than 4 cm below the costal margin)1 gave good results. When both physicians palpated the spleen, the likelihood ratio (LR) of splenomegaly increased (LR, 7.6; 95% confidence interval [CI], 4.5-12); when neither physician palpated the spleen, the likelihood of splenomegaly decreased (LR, 0.31; 95% CI, 0.14-0.56). The inference is that having a colleague confirm your findings for splenomegaly might be useful.

Two studies from India suggested that palpation maneuvers may have better accuracy for diagnosing splenomegaly than percussion techniques.2,3 However, the order of the maneuvers was not stated, and by western standards the patients were of small stature and size (as measured by body mass index). Furthermore, in one of the studies, false-negative results for Traube space percussion were significantly higher in smaller patients.3 Therefore, although palpation may perform better than percussion in lean patients, we do not know whether the same test characteristics apply to patients with larger body mass.4 The clinical utility appears enhanced when the results of both percussion and palpation are considered but should be confirmed in other studies in which the order of the examination is specified.3

A study performed in a convenience sample of patients with confirmed or suspected human immunodeficiency virus (HIV) suggested that 3 palpation maneuvers and 3 percussion maneuvers were relatively insensitive but had better specificity,5 supporting the findings of the original Rational Clinical Examination article. Although the study sample was small (27 patients), a unique feature of the evaluation was that there were 8 observers, allowing a comparison between observers and an assessment to see whether the various maneuvers performed similarly. However, they also noted significant interobserver variability that did not depend on the years of medical practice. The poor reliability was evident in the broad range of individual assessors’ sensitivity and specificity values. The sensitivity of each of the tests seemed to improve with the length of the trial, but the overall accuracy of all the findings was low. Because the evaluation of such a large number of individual findings may have lacked independence, and because the total number of patients was so small, we did not combine these results with other studies.

The presence of splenomegaly in athletes (often caused by mononucleosis) creates a diagnostic dilemma for clinicians who must decide when the splenomegaly has resolved so that the athlete can return to sports participation. A study of 29 athletes with splenomegaly (length, 12.5-15.5 cm) documented by ultrasonography (normal length < 12 cm) showed that the clinician could detect the spleen in only 17%.6 Many athletes have well-developed abdominal musculature, which makes palpation for splenomegaly even more difficult.

IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION

A reappraisal of the original publication showed that CIs around the signs would help understanding of their potential importance.7 We used the original data, in addition to data from newer articles, to create random-effects summary estimates for the LRs. In addition, we used the diagnostic odds ratios to assess whether the overall accuracy for some maneuvers might be better than others. We used only data from studies that used ultrasonography as the reference standard test.
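For a single study, the diagnostic odds ratio (DOR) is the positive likelihood ratio divided by the negative likelihood ratio, so it condenses both into one measure of overall accuracy. The short Python sketch below applies that definition to the two single-study maneuvers reported in Table 46-4. The function name is an illustrative assumption. Pooled random-effects DORs, such as those for maneuvers combined across several studies, need not equal the ratio of the pooled LRs exactly.

```python
def diagnostic_odds_ratio(lr_pos, lr_neg):
    """DOR = LR+ / LR-; larger values indicate better overall discrimination."""
    if lr_neg <= 0:
        raise ValueError("LR- must be greater than zero")
    return lr_pos / lr_neg


# Point estimates from Table 46-4 (each based on a single study)
for name, lr_pos, lr_neg in [
    ("Nixon sign", 3.6, 0.41),
    ("Middleton hooking maneuver", 6.5, 0.16),
]:
    print(f"{name}: DOR about {diagnostic_odds_ratio(lr_pos, lr_neg):.1f}")
```

Because the published LRs are rounded, the ratios computed this way agree with the tabulated DORs only to within rounding.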

Table 46-4 Likelihood Ratios of Percussion and Palpation Maneuvers for Splenomegaly

| Maneuver (No. of Combined Studies) | LR+ (95% CI) | LR– (95% CI) | DOR (95% CI) |
|---|---|---|---|
| Percussion Maneuvers | | | |
| Nixon sign (1) | 3.6 (1.8-7.3) | 0.41 (0.26-0.64) | 8.9 (3.1-25) |
| Percussion of Traube space (3) | 2.3 (1.8-2.9) | 0.48 (0.39-0.60) | 4.8 (3.2-7.3) |
| Castell sign (1) | 1.2 (0.98-1.6) | 0.45 (0.19-1.1) | 2.8 (0.92-8.3) |
| Palpation Maneuvers | | | |
| Supine, 1-handed palpation (4) | 8.2 (5.8-12) | 0.41 (0.30-0.57) | 22 (13-38) |
| Middleton hooking maneuver (1) | 6.5 (3.1-15) | 0.16 (0.08-0.32) | 40 (11-138) |

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

EVIDENCE FROM GUIDELINES

No federal guidelines discuss the assessment of splenomegaly by using physical examination.

CHANGES IN THE REFERENCE STANDARD

Although radiologic studies have suggested the possible use of competing technologies, such as nuclear scan and specialized computed tomography (CT) examinations, the most widely recognized and available gold standard remains ultrasonography. All articles assessing the utility of clinical examination maneuvers in the detection of splenomegaly published in the past 13 years used ultrasonography as the reference standard for the diagnosis of splenomegaly (a length of 12 or 13 cm).

RESULTS OF LITERATURE REVIEW

Percussion using the Nixon method (Figure 46-2) or Traube’s space (Figure 46-3) works best for detecting splenomegaly (Table 46-4). Supine one-handed palpation has been the most widely studied palpation maneuver, which increases the confidence in the results (Table 46-4).

CLINICAL SCENARIO—RESOLUTION

This patient may have a viral or myeloproliferative syndrome, so you have a good reason to assess for splenomegaly. The physical examination results seem contradictory. You have percussed dullness (which increases the likelihood of splenomegaly), but you cannot palpate the splenic tip (which decreases the likelihood of splenomegaly). The percussion findings have a lower accuracy than the palpation signs (as suggested by the diagnostic odds ratios). You decide you need to know whether the patient has splenomegaly, so you must proceed to additional testing with ultrasonography or a CT scan.
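The effect of these contradictory findings can be illustrated by converting a pretest probability to odds, multiplying by each likelihood ratio, and converting back. The Python sketch below does this with the summary LRs for Traube space percussion (LR+ 2.3) and supine 1-handed palpation (LR– 0.41). Two points are assumptions of this sketch and not statements from the text: the pretest probability of 50% is purely hypothetical, and multiplying the two LRs treats the findings as conditionally independent, which the original review suggests is not strictly true because palpation performs differently depending on the percussion result.

```python
def post_test_probability(pretest, *likelihood_ratios):
    """Apply one or more likelihood ratios to a pretest probability via odds."""
    if not 0 < pretest < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    odds = pretest / (1 - pretest)
    for lr in likelihood_ratios:
        odds *= lr  # assumes the findings are conditionally independent
    return odds / (1 + odds)


pretest = 0.50  # hypothetical clinical suspicion, chosen only for illustration
after_percussion = post_test_probability(pretest, 2.3)
after_both = post_test_probability(pretest, 2.3, 0.41)
print(f"after dull Traube space: {after_percussion:.0%}")
print(f"after dull Traube space and impalpable spleen: {after_both:.0%}")
```

Under these assumptions the two findings nearly cancel, leaving the probability close to where it started, which is consistent with the decision to proceed to imaging.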


SPLENOMEGALY—MAKE THE DIAGNOSIS

During the general physical examination, patients should not be evaluated for splenomegaly.

PRIOR PROBABILITY

The prevalence of palpable splenomegaly in an otherwise healthy student population is low, approximating 3%8; 12% of normal postpartum women had palpable spleens.9 The prevalence of splenomegaly increases significantly among other selected populations, such as HIV patients (up to 66%10), or in areas in which schistosomiasis is prevalent.11

POPULATION FOR WHOM THE PHYSICAL EXAMINATION OF SPLENOMEGALY SHOULD BE SOUGHT

• Suspected or proven viral illness, lymphoproliferative disorder, or malignancy
• Cirrhosis
• Suspected portal hypertension
• Suspected or proven malaria
• Connective tissue disorders associated with splenomegaly

DETECTING SPLENOMEGALY

In cases in which splenomegaly is questioned, the clinical examination is more specific than sensitive and is best used when ruling in the diagnosis among patients for whom the suspicion is at least 10%. Moreover, the examination should start with Traube space percussion, followed, if dull, by supine 1-handed palpation (Table 46-5). These maneuvers have received more extensive evaluation than other maneuvers, allowing us greater confidence in the findings. Middleton maneuver, in which the physician stands to the left of the patient and hooks the examining hand under the ribs, may work as well. Palpation may be superior to percussion, especially in lean patients. When it remains important not to miss splenomegaly, imaging will be necessary because the clinical examination does not provide sufficient clinical certainty.

Table 46-5 Summary Likelihood Ratios for Palpation to Detect Splenomegaly and Percussion of Traube Space

| Test (No.) | LR+ (95% CI) | LR– (95% CI) | DOR (95% CI) |
|---|---|---|---|
| Supine 1-handed palpation (4 studies) | 8.2 (5.8-12) | 0.41 (0.30-0.57) | 22 (13-38) |
| Percussion of Traube space (3 studies) | 2.3 (1.8-2.9) | 0.48 (0.39-0.60) | 4.8 (3.2-7.3) |

Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

REFERENCES FOR THE UPDATE 1. Gerspacher-Lara R, Pinto-Silva RA, Serufo JC, Rayes AA, Drummond SC, Lambertucci JR. Splenic palpation for the evaluation of morbidity due to schistosomiasis mansoni. Mem Inst Oswaldo Cruz. 1998;93(suppl 1):245-248.a 2. Chongtham DS, Singh MM, Kalantri SP, Pathak S. Accuracy of palpation and percussion manoeuvres in the diagnosis of splenomegaly. Indian J Med Sci. 1997;51(11):409-416.a 3. Dubey S, Swaroop A, Jain R, Verma K, Garg P, Agarwal S. Percussion of Traube’s space—a useful index of splenic enlargement. J Assoc Physicians India. 2000;48(3):326-328.a 4. Barkun AN, Camus M, Green L, et al. The bedside assessment of splenic enlargement. Am J Med. 1991;91(5):512-518. 5. Tamayo SG, Rickman LS, Mathews WC, et al. Examiner dependence on physical diagnostic tests for the detection of splenomegaly: a prospective study with multiple observers. J Gen Intern Med. 1993;8(2):69-75.a 6. Dommerby H, Stangerup SE, Stangerup M, et al. Hepatosplenomegaly in infectious mononucleosis, assessed by ultrasonic scanning. J Laryngol Otol. 1986;100(5):573-579.

REFERENCE STANDARD TESTS Ultrasonography, CT, nuclear liver-spleen imaging.

7. Grover SA, Barkun AN, Sackett DL. The rational clinical examination: does this patient have splenomegaly? JAMA. 1993;270(18):22182221. 8. Larson SM, Tuell SH, Moores KD, Nelp WB. Dimensions of the normal adult spleen scan and prediction of spleen weight. J Nucl Med. 1971;12 (3):123-126. 9. Berris B. The incidence of palpable liver and spleen in the postpartum period. CMAJ. 1966;95(25):1318-1319. 10. Furrer H. Prevalence and clinical significance of splenomegaly in asymptomatic human immunodeficiency virus type 1-infected adults: Swiss HIV Cohort Study. Clin Infect Dis. 2000;30(6):943-945. 11. Lambertucci JR, Cota GF, Pinto-Silva RA, et al. Hepatosplenic schistosomiasis in field-based studies: a combined clinical and sonographic definition. Mem Inst Oswaldo Cruz. 2001;96(suppl):147-150.

aFor the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.


46

EVIDENCE TO SUPPORT THE UPDATE: Splenomegaly

TITLE Accuracy of Palpation and Percussion Maneuvers in the Diagnosis of Splenomegaly.
AUTHORS Chongtham DS, Singh MM, Kalantri SP, Pathak S.
CITATION Indian J Med Sci. 1997;51(11):409-416.
QUESTION What are the sensitivity and specificity of palpation and percussion maneuvers in diagnosing splenomegaly?
DESIGN Prospective, independent comparison of nonconsecutive cases.
SETTING Medical ward at Katsurba Hospital, India.

PATIENTS Eighty hospitalized patients (37 female patients) in a general medical ward. Exclusions were patients with left-sided pleural effusion, history of ascites, or splenomegaly. Mean age was 31.5 years, and weight was 45 ± 8 kg.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The test performance characteristics of Traube space percussion (Barkun et al1), and the percussion maneuvers of Castell2 and Nixon3 were evaluated at various percussion note thresholds (1 = definitely tympanitic, 2 = probably tympanitic, 3 = uncertain, 4 = probably dull, 5 = definitely dull1). Supine palpation and Middleton palpation maneuver4 were also assessed on a 5-point scale (1 = spleen definitely not palpable, 2 = spleen probably not palpable, 3 = uncertain, 4 = spleen probably palpable, and 5 = spleen definitely palpable, as previously suggested5). The assessments were carried out by a physician blinded to the patient’s clinical history and laboratory results. The examination was carried out before or at least 2 hours after the patient had eaten. Ultrasonography was performed by an independent operator within 24 hours of the clinical examination.

MAIN OUTCOME MEASURES The sensitivity and specificity of the various maneuvers were described. A spleen was considered enlarged if greater than 13 cm on ultrasonography.

MAIN RESULTS The prevalence of splenomegaly was 52% (42/80). Mean splenic size was 15 cm among those with splenomegaly and 9.9 cm among those without enlargement. The likelihood ratios for the maneuvers are shown in Table 46-6. Receiver operating characteristic (ROC) curve analyses showed a progressive decline in sensitivity from 98% to 50% as the palpation threshold progressed from 1 to 4 (increasing certainty of feeling a spleen), whereas specificity increased from 58% to 95%. Nixon percussion maneuver was correlated with splenic size. The ROC area under the curve for varying thresholds on Traube space percussion was 0.74.

Table 46-6 Likelihood Ratios of Palpation and Percussion Maneuvers for Splenomegaly
Test | Sensitivity, % | Specificity, % | LR+ (95% CI) | LR– (95% CI) | DOR (95% CI)a
Palpation Maneuvers
Supine 1-handed palpation | 79 | 92 | 10 (3.7-29) | 0.23 (0.13-0.40) | 43 (11-163)
Middleton hooking palpation maneuver | 86 | 87 | 6.5 (3.1-15) | 0.16 (0.08-0.32) | 40 (11-138)
Percussion Maneuvers
Nixon percussion | 67 | 82 | 3.6 (1.8-7.3) | 0.41 (0.26-0.64) | 8.9 (3.1-25)
Traube space percussion | 76 | 63 | 2.1 (1.4-3.3) | 0.38 (0.20-0.66) | 5.5 (2.1-14)
Castell percussion | 86 | 32 | 1.2 (0.98-1.6) | 0.45 (0.19-1.1) | 2.8 (0.92-8.30)
Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
aDOR calculated from data provided in the article.

CONCLUSIONS
LEVEL OF EVIDENCE Level 3.
STRENGTHS Independent assessment of well-defined physical examination maneuvers.


LIMITATIONS The order in which the examinations were performed is not described. Furthermore, the generalizability may be questioned, considering the patient population characteristics (the mean Quetelet index of the studied patients was low by western standards, at 17.8 ± 2.6 kg/m2). The overall prevalence of splenomegaly suggests that this population may differ considerably from others or that there may have been some selection bias. The palpation maneuvers appeared to perform appreciably better than percussion methods, as evidenced by the high diagnostic odds ratios. We do not know whether “leanness” as evidenced by a low body mass index creates a bias that favors palpation or percussion.
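The likelihood ratios and diagnostic odds ratios in Table 46-6 follow from each maneuver's sensitivity and specificity. The sketch below shows that arithmetic for point estimates only; the function name is illustrative and not from the source, confidence intervals are not reproduced, and results can differ slightly from the printed values because the table was computed from raw counts rather than rounded percentages.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) for a dichotomous test.

    Inputs are proportions strictly between 0 and 1.
    Illustrative helper, not part of the source text.
    """
    lr_pos = sensitivity / (1 - specificity)   # how much a positive finding raises the odds
    lr_neg = (1 - sensitivity) / specificity   # how much a negative finding lowers the odds
    dor = lr_pos / lr_neg                      # diagnostic odds ratio
    return lr_pos, lr_neg, dor


# Supine 1-handed palpation in Table 46-6: sensitivity 79%, specificity 92%
lr_pos, lr_neg, dor = likelihood_ratios(0.79, 0.92)
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}, DOR {dor:.0f}")
```

A higher diagnostic odds ratio means better overall discrimination, which is the basis for the comparison of palpation with percussion above.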

REFERENCES FOR THE EVIDENCE 1. Barkun AN, Camus M, Meagher TW, et al. How useful is Traube’s space percussion in assessing splenic enlargement? Am J Med. 1989;87(5):562-566. 2. Castell DO. The spleen percussion sign: a useful diagnostic technique. Ann Intern Med. 1967;67(6):1265-1267. 3. Nixon RK Jr. The detection of splenomegaly by percussion. N Engl J Med. 1954;250(4):166-167. 4. Shaw MT, Dvorak V. Palpation of slightly enlarged spleens. Lancet. 1973;1(7798):317. 5. Barkun AN, Camus M, Green L, et al. The bedside assessment of splenic enlargement. Am J Med. 1991;91(5):512-518.

Reviewed by Alan N. Barkun, MD

TITLE Splenic Palpation for the Evaluation of Morbidity Due to Schistosomiasis Mansoni.
AUTHORS Gerspacher-Lara R, Pinto-Silva RA, Serufo JC, Rayes AAM, Drummond SC, Lambertucci JR.
CITATION Mem Inst Oswaldo Cruz (Rio de Janeiro). 1998;93(suppl I):245-248.
QUESTION What are the reliability and validity of 2 methods of palpation in detecting ultrasonographically identified splenomegaly?
DESIGN Prospective assessment of 2 near-complete communities with an independent assessment by ultrasonography.
SETTING Two Brazilian rural communities.

PATIENTS The study population was recruited from 551 individuals (92% of the local population) from Queixadinha, in the district of Caraí, located in the northeast of the State of Minas Gerais, Brazil, an area known to be highly endemic for schistosomiasis. An additional 517 individuals (89% of the total population) were recruited from Capão, a rural community in the district of Presidente Juscelino in the center of the state, where, for unknown reasons, transmission of schistosomiasis probably does not occur and in which other tropical diseases that can cause splenomegaly have never been identified.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Abdominal palpation was performed with patients in the decubitus position during deep inspiration, by 2 independent physicians in a blinded fashion. The greatest distance between the splenic border and the costal margin was also independently measured by the examiners.

MAIN OUTCOME MEASURES The 2 examination maneuvers were considered positive for splenomegaly when 1. the spleen was palpable by both examiners; and 2. the distance between the splenic border and the costal margin was greater than 4 cm, as measured by both examiners. Splenomegaly was defined as a splenic length greater than 120 mm by ultrasonography. Only patients aged 18 years or older were included in the categorization of splenic enlargement because of the lack of widely accepted quantitative criteria in children.

MAIN RESULTS The prevalence of splenomegaly in this patient population was 7%. A spleen was palpated by both physicians in 37 cases (discordance between examiners occurred in 5 cases). Mean splenic lengths in patients with and without palpable spleen were 10.4 cm and 7.1 cm, respectively (P < .001). Table 46-7 shows the likelihood ratios for the results where a positive test required agreement between the examining physicians.

Table 46-7 Likelihood Ratio for the Presence of Palpable Splenomegaly vs the Presence of a Large Palpable Spleena
Test | Sensitivity | Specificity | LR+ (95% CI) | LR– (95% CI) | DOR (95% CI)
Palpable spleen | 0.72 | 0.91 | 7.6 (4.5-12) | 0.31 (0.14-0.56) | 25 (8.5-73)
Distance between splenic border and costal margin > 4 cm | 0.28 | 0.98 | 14 (4.6-41) | 0.74 (0.50-0.89) | 19 (5.2-69)
Abbreviations: CI, confidence interval; DOR, diagnostic odds ratio; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
aThe test was considered positive only when both examiners agreed on the finding.

CONCLUSIONS
LEVEL OF EVIDENCE Level 1.
STRENGTHS The study used a sound design and an accepted gold standard.
LIMITATIONS The methods of palpation are not adequately described. The results are interesting in that a “positive” result required 2 examiners’ agreement. This suggests that clinicians might have better accuracy when they ask for a second opinion about the palpation findings.

Reviewed by Alan N. Barkun, MD

TITLE Percussion of Traube’s Space—A Useful Index of Splenic Enlargement.
AUTHORS Dubey S, Swaroop A, Jain R, Verma K, Garg P, Agarwal S.
CITATION J Assoc Physicians India. 2000;48(3):326-328.
QUESTION Is Traube space percussion useful in assessing splenic enlargement?
DESIGN Prospective, nonconsecutive patients, with an independent assessment by ultrasonography.
SETTING An Indian University hospital.

PATIENTS One hundred medical inpatients.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD After Traube space percussion,1 the findings were labeled as tympanitic (resonant) or dull. Dullness to percussion is an abnormal finding that suggests splenomegaly. In addition, the spleen was palpated with the patient positioned in the supine and right lateral decubitus positions. The clinician assessed the spleen as palpable or not palpable. Each patient was subsequently sent for ultrasonography. Splenomegaly was defined as a splenic longitudinal measurement of 12 cm or more on ultrasonography.

MAIN OUTCOME MEASURES Sensitivity and specificity.

MAIN RESULTS The prevalence of splenomegaly in this patient population was 36%. The splenic lengths among patients with ultrasonographically diagnosed splenomegaly were 13.1 ± 0.96 cm vs 9.42 ± 1.06 cm for those without splenic enlargement. The results of Traube space percussion are shown in Table 46-8. The Quetelet index (a measure of body size) was higher among patients who had false-negative findings. The diagnostic odds ratios (DORs), calculated from data provided in the article, suggest that palpation (DOR, 25; 95% confidence interval [CI], 5.2-117) might be more accurate than percussion (DOR, 6.0; 95% CI, 2.4-15).

Table 46-8 Likelihood Ratios for Percussion, Palpation, and a Combination of the 2 Findings for Splenomegaly
Test | Sensitivity, % | Specificity, % | LR+ (95% CI) | LR– (95% CI)
Traube space percussion | 67 | 75 | 2.7 (1.7-4.3) | 0.44 (0.27-0.68)
Palpation | 44 | 97 | 14 (3.5-58) | 0.57 (0.43-0.77)
Palpation and percussion | LR (95% CI)
Both positive | 14 (0.85-245)
One positive | 2.9 (2.0-4.4)
Both negative | 0.18 (0.10-0.33)
Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

CONCLUSIONS
LEVEL OF EVIDENCE Level 3.
STRENGTHS The study uses, overall, a sound design and an accepted gold standard.
LIMITATIONS The exact sequence of physical examination maneuvers is unclear. Therefore, a comparison of the performance of one technique to another cannot be done precisely, because we do not know the order in which the maneuvers were done. However, the maximal clinical utility appeared to be achieved when both percussion and palpation were considered. These results in a different patient population appear to confirm the findings that Traube space percussion and palpation are useful for identifying splenomegaly. The authors did not report a standardized order of percussion and then palpation or vice versa. However, in considering the results of the findings, clinicians can conclude that both maneuvers should be performed. The presence of dullness to percussion confirms the importance of a palpable spleen. The absence of dullness to percussion modulates the importance of a palpable spleen and suggests that for some patients either the spleen is palpable, though not enlarged, or the examiner’s findings of a palpable spleen are in error. If neither finding is present, the likelihood of splenomegaly is decreased.

REFERENCE FOR THE EVIDENCE 1. Barkun AN, Camus M, Meagher TW, et al. How useful is Traube’s space percussion in assessing splenic enlargement? Am J Med. 1989;87(5):562-566.

Reviewed by Alan N. Barkun, MD


CHAPTER 47

Does This Patient Have Strep Throat?

Mark H. Ebell, MD
Mindy A. Smith, MD
Henry C. Barry, MD
Kathy Ives, BS
Mark Carey, BS

CLINICAL SCENARIOS In each of the following cases, the physician must decide whether the patient has group A β-hemolytic streptococcal pharyngitis (strep throat). In case 1, a 7-year-old boy presents in March without a cough but with 1 day of sore throat accompanied by fever, headache, moderate cervical adenopathy, and a markedly exudative and erythematous pharynx. His brother was recently diagnosed as having strep throat. In case 2, a 16-year-old presents with severe sore throat and anterior adenopathy for 3 days but no tonsillar enlargement, exudate, fever, or cough. In case 3, a 42-year-old woman presents with 5 days of sore throat and cough but no adenopathy, tonsillar enlargement, recent exposure to strep, or exudate.

WHY IS THE DIAGNOSIS IMPORTANT? The 1995 National Ambulatory Medical Care Survey1 found that sore throat is the third most common presenting complaint in office-based practice, accounting for 4.3% of visits. Sore throat is usually caused by direct infection of the pharyngeal tissue (pharyngitis). The differential diagnosis of pharyngitis is summarized in Table 47-1. Sore throat can also be caused by conditions such as gastroesophageal reflux disease, acute thyroiditis, persistent cough, and postnasal drainage because of allergic rhinitis or sinusitis. However, reliable estimates for the likelihood of these conditions among patients with sore throat are not available. Untreated group A β-hemolytic streptococcal pharyngitis typically lasts 8 to 10 days. Patients are infectious during the period of acute illness and for approximately 1 week after. Antibiotic treatment decreases the severity of symptoms, reduces their duration by approximately 1 day,5 reduces the risk of transmission to others after 24 hours of treatment, and reduces the likelihood of suppurative complications and rheumatic fever.6 Suppurative complications include peritonsillar abscess (occurring in

Rule | Population | Accuracy | Validation and comments
 | | Area under the ROC curve 0.79 (good accuracy) | Successfully validated in 3 new adult populations34-36
 | 4 y in a British general practice | 71% sensitive, 71% specific | No prospective validation, relies on one physician’s examination skills
 | 670 US children in original study, 892 children in validation study37 | LR+ = 2, LR– = 0.75 in pediatric validation study37 | The lowest-risk group still has a 6%32 to 16%37 risk of strep throat; similar results in study of adults and children21
 | 1237 US adult patients in 2 emergency departments and 189 patients at a student health service | Area under the ROC curve 0.70 to 0.74, depending on setting; 85% sensitive and 42% specific | No prospective validation
7-Item score, including age | 1783 patients in a Danish general practice | LR+ = 1.3, LR– = 0.2 | Rule has low specificity (26%)
6-Item additive score | 693 US adult outpatients | Results presented as nomogram only | Not prospectively validated
Algorithm based on 4 signs and symptoms (see Figure 47-1) | 418 US adults with sore throat at an HMO outpatient clinic | High risk = 28% strep; moderate risk = 15% strep; low risk = 4% strep | In a prospective study in adults and children by Crawford et al,23 23% of high-risk patients, 12% of moderate-risk patients, and 3% of low-risk patients had a positive throat culture result for strep
Algorithm based on 4 signs and symptoms | 133 Norwegian adults and children with sore throat | High risk = 62% strep; moderate risk = 34% strep; low risk = 10% strep | Not prospectively validated
Algorithm based on 4 signs and symptoms and patient age | 621 Canadian adults and children >3 y old and presenting to family physicians | Risk stratified into 5 levels, from 1%-51% | Good prospective validation in primary care practices, including both children and adults

Abbreviations: HMO, health maintenance organization; LR+, positive likelihood ratio; LR–, negative likelihood ratio; ROC, receiver operating characteristic curve.

Table 47-4 summarizes previous efforts to develop or validate clinical prediction rules for the diagnosis of strep throat. One of the best validated is a simple 4-item clinical prediction rule developed by Centor et al.28 The Centor score has been validated in 3 distinct adult populations34-36 and considers 4 signs and symptoms: tonsillar exudate, swollen tender anterior cervical nodes, absence of cough, and a history of fever. The rule is accurate, with an area under the ROC curve of 0.79. One point is assigned for each of the patient’s signs and symptoms, and the sum is used to determine the likelihood of strep throat (Figure 47-1).28 The presence of 3 or 4 findings increases the probability of strep throat. For example, a patient with a pretest probability of 10% and a Centor score of 4 would have a 41% probability of strep throat. Patients with none or 1 of the cardinal findings have a very low risk of strep throat, and it may be appropriate to forgo testing or treatment in this group. The Centor clinical prediction rule has not been validated in younger patients. Recently, McIsaac et al37 have modified Centor’s score and validated it prospectively in a mixed group of adults and children (Figure 47-2). Another rule, developed by Walsh et al,19 has been validated prospectively in a mixed population of adults and children (Figure 47-3).

The Breese score has been prospectively validated in a large pediatric population.32,38,39 However, a low Breese score does not rule out strep: 14% of children with a score of 20 or less had a positive throat culture result. In addition, it requires the results of a white blood cell count, which may not be immediately available in many outpatient practices.

Figure 47-1 Centor Clinical Prediction Rule for the Diagnosis of Strep Throat in Adults
1. Assign 1 point for each of the following clinical characteristics: (1) history of fever, (2) anterior cervical adenopathy, (3) tonsillar exudate, and (4) absence of cough.
2. Find the column that most closely matches the pretest probability of strep in the patient and look down the column to the row that matches the patient's number of points to determine the probability of strep.

Probability of strep throat (%), by pretest probability of strep throat (%)
Points, No. | Likelihood Ratio | 5 | 10 | 15 | 20 | 25 | 40 | 50
0 | 0.16 | 1 | 2 | 2 | 3 | 5 | 10 | 14
1 | 0.3 | 2 | 3 | 5 | 7 | 9 | 17 | 23
2 | 0.75 | 4 | 8 | 12 | 16 | 20 | 33 | 43
3 | 2.1 | 10 | 19 | 27 | 34 | 41 | 58 | 68
4 | 6.3 | 25 | 41 | 53 | 61 | 68 | 81 | 86
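The cells of Figure 47-1 can be approximated by converting the pretest probability to odds, multiplying by the likelihood ratio for the Centor score, and converting back to a probability. The following is a minimal sketch of that calculation; the function and variable names are illustrative, and the likelihood ratios are those printed in the figure. One or two printed cells may differ from the computed value by a percentage point because of rounding in the source.

```python
# Likelihood ratios by Centor score, as printed in Figure 47-1.
CENTOR_LR = {0: 0.16, 1: 0.3, 2: 0.75, 3: 2.1, 4: 6.3}


def post_test_probability(pretest, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability (both as proportions)."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def centor_probability(pretest, points):
    """Probability of strep throat for a Centor score of 0 to 4."""
    return post_test_probability(pretest, CENTOR_LR[points])


# Worked example from the text: pretest probability 10%, Centor score 4.
print(round(100 * centor_probability(0.10, 4)))  # 41
```

Because the likelihood ratio is applied to the clinician's own estimate of pretest probability, the same score can give quite different post-test probabilities in different settings.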

Figure 47-2 McIsaac Modification of the Centor Strep Score

1. Add Up Points for Patient
Symptom or Sign | Points
History of fever or measured temperature > 38°C | 1
Absence of cough | 1
Tender anterior cervical adenopathy | 1
Tonsillar swelling or exudates | 1
Age < 15 y | 1
Age ≥ 45 y | –1

2. Find Risk of Strep
Points | Likelihood Ratio | % With Strep (Patients With Strep/Total)
–1 or 0 | 0.05 | 1 (2/179)
1 | 0.52 | 10 (13/134)
2 | 0.95 | 17 (18/109)
3 | 2.5 | 35 (28/81)
4 or 5 | 4.9 | 51 (39/77)

Data from a group of 167 children aged 3 years or older and 453 adults in Ontario, Canada. Baseline risk of strep 17% in this population. Reprinted with permission from McIsaac et al.37

CLINICAL SCENARIOS—RESOLUTIONS

Case 1 describes a child with a high likelihood (51%) of streptococcal pharyngitis according to the McIsaac clinical rule (Figure 47-2). In fact, the likelihood of strep throat is probably even higher because of his recent exposure. The physician might wish to treat without further diagnostic confirmation. Children with a low or intermediate probability of strep and a negative rapid antigen test result should still have a backup throat culture. In case 2, an adolescent has a pretest probability (estimate, 15%) falling between that of adults and children. In this age group, infectious mononucleosis is also a relatively common cause of sore throat. Assuming a pretest probability of 15% and 2 points on the Centor score, he has a 12% probability of strep throat (Figure 47-1). The physician should decide whether to recommend a rapid antigen test to clarify the need for treatment. Newer rapid tests have a sensitivity (85%) and specificity (93%) close to that of throat culture.40 If a patient with a 12% probability of strep throat has a negative rapid test result, the likelihood of strep decreases to only 2%, whereas if the results are positive, it increases to 62%.
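The point assignment and risk lookup in Figure 47-2 can be sketched as a small scoring function. The names below are illustrative rather than taken from the source, and the lookup simply returns the likelihood ratio and observed percentage with strep printed in the figure for each score band.

```python
def mcisaac_score(fever, no_cough, tender_nodes, swelling_or_exudate, age_years):
    """Sum the McIsaac points from Figure 47-2 (clinical findings are booleans)."""
    points = sum([fever, no_cough, tender_nodes, swelling_or_exudate])
    if age_years < 15:
        points += 1      # age < 15 y adds a point
    elif age_years >= 45:
        points -= 1      # age >= 45 y subtracts a point
    return points


# (likelihood ratio, % with strep) for each score band, as printed in Figure 47-2.
MCISAAC_RISK = {
    "-1 or 0": (0.05, 1),
    "1": (0.52, 10),
    "2": (0.95, 17),
    "3": (2.5, 35),
    "4 or 5": (4.9, 51),
}


def risk_band(points):
    """Map a point total to the band label used in the figure."""
    if points <= 0:
        return "-1 or 0"
    if points >= 4:
        return "4 or 5"
    return str(points)


# Case 1: 7-year-old with fever, no cough, cervical adenopathy, and exudate.
points = mcisaac_score(True, True, True, True, age_years=7)
print(points, MCISAAC_RISK[risk_band(points)])  # 5 (4.9, 51)
```

The percentages reflect the 17% baseline risk in the Ontario population, so in a setting with a different baseline the likelihood ratio is the more portable number.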

Figure 47-3 Walsh Algorithm for Evaluating Cases of Adults With Sore Throats
[Flowchart: Walsh diagnostic algorithm. Adults with sore throats are assigned to a high-, moderate-, or low-risk group through yes/no questions on enlarged or tender cervical nodes, pharyngeal exudates, recent exposure to strep, recent cough, and oral temperature ≥ 38.3°C.]

Validation Results, % Strep by Culture
Risk Group | Original19 | Validation23
High | 28 | 23
Moderate | 15 | 12
Low | 4 | 3

Finally, the woman in case 3 has none of the cardinal characteristics of strep throat in the Centor score and the Walsh algorithm and therefore has a low (2%-3%) probability of strep throat. It may be appropriate to reassure this patient that strep throat is unlikely and to consider nonbacterial causes of sore throat. Some would argue that a throat culture is always necessary and that treatment should be delayed until culture results become available.41 However, this approach ignores the accuracy of rapid antigen tests, particularly when used in tandem with an accurate assessment of the pretest probability with a clinical score. A recent study of 30 000 episodes of sore throat found that changing from a policy encouraging throat culture to one encouraging the use of a rapid antigen test alone decreased the percentage of patients with sore throat receiving a culture from 65% to 13%, without an increase in suppurative complications.42
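The case 2 figures show how a clinical score and a rapid antigen test combine: the probability after the Centor score becomes the pretest probability for the rapid test. The sketch below reproduces that second step from the stated sensitivity (85%) and specificity (93%); the function names are illustrative, not from the source.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test; inputs are proportions."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


def post_test_probability(pretest, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability (both as proportions)."""
    odds = pretest / (1 - pretest) * likelihood_ratio
    return odds / (1 + odds)


# Rapid antigen test accuracy quoted in the text.
lr_pos, lr_neg = likelihood_ratios(0.85, 0.93)

# Case 2: 12% probability of strep throat after applying the Centor score.
after_positive = post_test_probability(0.12, lr_pos)
after_negative = post_test_probability(0.12, lr_neg)
print(round(100 * after_positive), round(100 * after_negative))  # 62 2
```

Chaining the results this way assumes the rapid test's accuracy does not depend on the clinical score, an assumption made here for illustration only.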

THE BOTTOM LINE This study further confirms that no single element of the medical history or physical examination is powerful enough to confirm the probability of streptococcal pharyngitis. Instead, physicians should consider a combination of findings, including tonsillar exudate, tender or enlarged anterior cervical nodes, the absence of cough, and a history of fever (or measured temperature >38°C). A rational approach to therapy integrates these findings with the patient’s age and the clinical setting, the information from Figures 47-1 and 47-2, the results of rapid antigen testing or throat culture, and the clinician’s own judgment.

Author Affiliations at the Time of the Original Publication

Michigan State University, East Lansing (Drs Ebell, Smith, and Barry, Ms Ives, and Mr Carey); First Consulting Group, Okemos, Michigan (Ms Ives).

Acknowledgments

This work was supported by the Michigan Consortium for Family Practice Research, a research center funded by the American Academy of Family Physicians. We acknowledge the assistance and detailed feedback of Peter Blomgren, MD, Matthew Gillman, MD, and David L. Simel, MD, MHS. We also thank Robert Centor, MD, for his assistance with the calculation of the area under the receiver operating characteristic curve in his clinical prediction rule.

REFERENCES 1. National Center for Health Statistics. 1995 National Ambulatory Medical Care Survey [CD-ROM Series 13, No. 11]. Hyattsville, MD: National Center for Health Statistics; 1995. 2. Komaroff A, Aronson MD, Pass TM, Ervin CT, Branch WT. Serologic evidence of chlamydial and mycoplasmal pharyngitis in adults. Science. 1983;222(4626):927-928. 3. Hoffman S. An algorithm for a selective use of throat swabs in the diagnosis of group A streptococcal pharyngo-tonsillitis in general practice. Scand J Prim Health Care. 1992;10:295-300.

4. Seppala H, Lahtonen R, Ziegler T, et al. Clinical scoring system in the evaluation of adult pharyngitis. Arch Otolaryngol Head Neck Surg. 1993;119(3):288-291. 5. Denny FW. Effect of treatment on streptococcal pharyngitis: is the issue really settled? Pediatr Infect Dis. 1985;4(4):352-354. 6. Snellman L, Stang H, Stang J, et al. Duration of positive throat cultures for group A streptococci after initiation of antibiotic therapy. Pediatrics. 1993;91(6):1166-1170. 7. Dajani A, Taubert K, Ferrieri P, et al. Treatment of acute streptococcal pharyngitis and prevention of rheumatic fever: a statement for health professionals. Pediatrics. 1995;96(4 pt 1):758-764. 8. Centers for Disease Control and Prevention. Summary of notifiable diseases, United States, 1997. Morb Mortal Wkly Rep CDC Surveill Summ. 1998;46(54):1-87. 9. McIsaac WJ, Goel V. Effect of an explicit decision-support tool on decisions to prescribe antibiotics for sore throat. Med Decis Making. 1998;18(2):220-228. 10. Tompkins RK, Burnes DC, Cable WE. An analysis of the cost-effectiveness of pharyngitis management and acute rheumatic fever prevention. Ann Intern Med. 1977;86(4):481-492. 11. Pauker SG, Kassirer J. The threshold approach to clinical decision-making. N Engl J Med. 1980;302(20):1109-1117. 12. Bisno AL. Streptococcus pyogenes. In: Mandell GL, Bennett JE, Dolin R, eds. Principles and Practice of Infectious Diseases. 4th ed. New York, NY: Churchill-Livingstone; 1995. 13. Gunnarsson RK, Holm SE, Soderstrom M. The prevalence of beta-haemolytic streptococci in throat specimens from healthy children and adults: implications for the clinical value of throat cultures. Scand J Prim Health Care. 1997;15(3):149-155. 14. Holleman DR, Simel DL. Does the clinical examination predict airflow limitation? JAMA. 1995;273(4):313-319. 15. Steinhoff MC, Khalek MK, Khallaf N, et al. Effectiveness of clinical guidelines for the presumptive treatment of streptococcal pharyngitis in Egyptian children. Lancet. 1997;350(9082):918-921. 16. Kaplan EL, Top FH, Dudding BA, Wannamaker LW. Diagnosis of streptococcal pharyngitis: differentiation of active infection from the carrier state in the symptomatic child. J Infect Dis. 1971;123(5):490-501. 17. Stillerman M, Bernstein SH. Streptococcal pharyngitis: evaluation of clinical syndromes in diagnosis. Am J Dis Child. 1961;101:476-489. 18. Komaroff AL, Pass TM, Aronson MD, et al. The prediction of streptococcal pharyngitis in adults. J Gen Intern Med. 1986;1(1):1-7. 19. Walsh BT, Bookheim WW, Johnson RC, Tompkins RK. Recognition of streptococcal pharyngitis in adults. Arch Intern Med. 1975;135(11):1493-1497. 20. Kljakovac M. Sore throat presentation and management in general practice. N Z Med J. 1993;106(963):381-383. 21. Reed BD, Huck W, French T. Diagnosis of group A beta-hemolytic streptococcus using clinical scoring criteria, Directigen 1-2-3 group A streptococcal test, and culture. Arch Intern Med. 1990;150(8):1727-1732. 22. McIsaac WJ, White D, Tannenbaum D, Low DE. A clinical score to reduce unnecessary antibiotic use in patients with sore throat. CMAJ. 1998;158(1):75-83. 23. Crawford G, Brancato F, Holmes KK. Streptococcal pharyngitis: diagnosis by gram stain. Ann Intern Med. 1979;90(3):293-297. 24. Kardaun JW, Kardaun OJ. Comparative diagnostic performance of three radiological procedures for the detection of lumbar disk herniation. Methods Inf Med. 1990;29(1):12-22. 25. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med. 1993;12(14):1293-1316. 26. Breese BB, Disney FA. The accuracy of diagnosis of beta streptococcal infections on clinical grounds. J Pediatr. 1954;44(6):670-673. 27. Kreher NE, Hickner JM, Barry HC, Messimer SR. Do gastrointestinal symptoms accompanying sore throat predict streptococcal pharyngitis? an UPRNet study. J Fam Pract. 1998;46(2):159-164. 28. Centor RM, Witherspoon JM, Dalton HP, Brody CE, Link K. The diagnosis of strep throat in adults in the emergency room. Med Decis Making. 1981;1(3):239-246. 29. Dobbs F. A scoring system for predicting group A streptococcal throat infection. Br J Gen Pract. 1996;46(409):461-464. 30. Foong BB, Yassiim M, Chia YC, Kang BH. Streptococcal pharyngitis in a primary care clinic. Singapore Med J. 1992;33:597-599.

31. Kahan E, Appelbaum T, Bograd H, et al. Should nurses in Israeli primary care clinics be expected to manage streptococcal throat infections? Public Health. 1995;109(5):347-351. 32. Breese BB. A simple scorecard for the tentative diagnosis of streptococcal pharyngitis. Am J Dis Child. 1977;131(5):514-517. 33. Centor RM, Meier FA, Dalton HP. Throat cultures and rapid tests for diagnosis of group A streptococcal pharyngitis. Ann Intern Med. 1986;105(6):892-899. 34. Wigton RS, Connor JL, Centor RM. Transportability of a decision rule for the diagnosis of streptococcal pharyngitis. Arch Intern Med. 1986;146(1):81-83. 35. Meland E, Digranes A, Skjaerven R. Assessment of clinical features predicting streptococcal pharyngitis. Scand J Infect Dis. 1993;25(2):177-183. 36. Poses RM, Cebul RD, Collins M, Fager SS. The importance of disease prevalence in transporting clinical prediction rules. Ann Intern Med. 1986;105(4):586-589. 37. McIsaac WJ, Goel V, To T, Low DE. The validity of a sore throat score in family practice. CMAJ. 2000;163(7):811-815. 38. Clancy CM, Centor RM, Campbell MS, Dalton HP. Rational decision making based on history: adult sore throats. J Gen Intern Med. 1988;3(3):213-217. 39. Funamura JL, Berkowitz CD. Applicability of a scoring system in the diagnosis of streptococcal pharyngitis. Clin Pediatr (Phila). 1983;22(9):622-626. 40. Pichichero ME, Disney FA, Green JL, et al. Comparative reliability of clinical, culture, and antigen detection methods for the diagnosis of group A beta-hemolytic streptococcal tonsillopharyngitis. Pediatr Ann. 1992;21(12):798-805. 41. Pichichero ME. Cost-effective management of sore throat: it depends on the perspective. Arch Pediatr Adolesc Med. 1999;153(7):672-680. 42. Webb KH, Needham CA, Kurtz SB. Use of a high-sensitivity rapid strep test without culture confirmation of negative results. J Fam Pract. 2000;49(1):34-38.

UPDATE: Streptococcal Pharyngitis

47

Prepared by David L. Simel, MD, MHS Reviewed by Jane Kim, MD, and Kathy Vanenkevort, BS

CLINICAL SCENARIO A 19-year-old college student has a severe sore throat and a mild fever (temperature 38.3°C), and he feels bad. The symptoms have been present for 4 days and initially started with a dry cough. There is a pharyngeal exudate, but only on the left side of the posterior pharynx. His neck reveals tender adenopathy. Should you assume he has streptococcal pharyngitis and start treatment?

UPDATED SUMMARY ON STREPTOCOCCAL PHARYNGITIS Original Review Ebell MH, Smith MA, Barry HC, Ives KY, Carey M. Does this patient have strep throat? JAMA. 2000;284(22):2912-2918.

UPDATED LITERATURE SEARCH Our literature search used the parent search for The Rational Clinical Examination series, combined with the search term “pharyngitis,” for the years 2000 to August 2005. Because the original publication supported the use of the Centor score, we reviewed studies that further explored and validated the use of this score in patient populations that included adults, rather than children alone, with pharyngitis. We identified 27 potentially relevant articles for further review, of which 6 warranted closer assessments. Three of those studies1-3 validated the impression that the individual symptoms and signs for streptococcal pharyngitis are not diagnostically useful, so multivariate models that include combinations of findings must be used. However, none of these studies included prospective model validation.

NEW FINDINGS
• The Centor score alone is inadequate for making a correct diagnosis in patients with 2 or more symptoms.
• The clinical examination improves markedly when the Centor score is used to identify patients for rapid streptococcal tests.
• Treatment decisions based on the Centor score, without rapid testing, depend more on the prevalence of disease and the benefit/risk of treatment than on useful likelihood ratios (LRs).

Details of the Update
A study of Israeli adults older than 16 years with pharyngitis2 provided a unique patient sample by including those with a Centor score of only 0 to 1. Most other studies exclude these patients from their analysis, focusing only on those with at least 2 of 4 symptoms. However, 38% of the patients had a mild pharyngitis presentation, with a 0 to 1 score; the LR for such a score is 0.16, and only 5% of patients with a score of 0 to 1 will have a positive culture result. When the investigators created a multivariable model (7 symptoms and 10 signs) for this patient group that included a broad spectrum of disease, only pharyngeal exudates remained significant at predicting culture positivity (positive LR, 1.8; 95% confidence interval [CI], 1.5-2.2; negative LR, 0.27; 95% CI, 0.13-0.53).

The Centor score may not work equally well for children. To evaluate this, McIsaac et al4 assembled a population of patients aged 3 to 69 years, and they validated their modified Centor score. The score was modified by age, with points added or subtracted as follows: aged 3 to 14 years, add 1 point; aged 15 to 44 years, add 0 points; aged 45 years or older, subtract 1 point. After adjusting for age, the investigators compared the modified Centor score with culture for those patients with a score greater than or equal to 2. The prevalence of disease was 22% for adults but 34% for children younger than 18 years. The modified Centor score did not appreciably change the LR for adults, but it did have an effect on children. At high scores, the LRs differ, with an LR of 1.6 (95% CI, 0.5-5.0) for a modified Centor score of 4 to 5 in adults that improves to an LR of 4.0 (95% CI, 2.7-6.0) for children younger than 18 years. After evaluating a variety of treatment strategies, the authors reported predictive values based on the modified Centor scores (Table 47-5).
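The age adjustment described above is simple arithmetic, and a short sketch may make it easier to follow. The Python below is illustrative only: the function and parameter names are hypothetical, the four findings are those listed for the Centor score in this chapter (history of fever, tonsillar exudates, swollen anterior cervical lymph nodes, absence of cough), and the rejection of ages younger than 3 years is an assumption based on the age range of the study population, not a rule stated by the authors.

```python
def centor_score(fever, tonsillar_exudates,
                 swollen_anterior_cervical_nodes, cough_absent):
    """Count the Centor findings that are present (1 point each, maximum 4)."""
    findings = (fever, tonsillar_exudates,
                swollen_anterior_cervical_nodes, cough_absent)
    return sum(bool(finding) for finding in findings)


def age_adjustment(age_years):
    """Return the age modification: +1 (3-14 y), 0 (15-44 y), -1 (45 y or older)."""
    if age_years < 3:
        # Assumption: the study enrolled patients aged 3 to 69 years,
        # so younger ages are treated here as out of scope.
        raise ValueError("the modified score was studied in patients aged 3 y or older")
    if age_years < 15:
        return 1
    if age_years < 45:
        return 0
    return -1


def modified_centor_score(age_years, *, fever, tonsillar_exudates,
                          swollen_anterior_cervical_nodes, cough_absent):
    """Centor findings plus the age adjustment.

    Note: a patient aged 45 y or older with no findings computes to -1,
    which belongs with the lowest (0-1) score band.
    """
    base = centor_score(fever, tonsillar_exudates,
                        swollen_anterior_cervical_nodes, cough_absent)
    return base + age_adjustment(age_years)
```

For instance, a 10-year-old with all 4 findings scores 5 under this sketch, whereas a 50-year-old with the same findings scores 3.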
A pragmatic study assessed the use of the Centor score in adults (>18 years) to identify patients for point-of-care testing and throat cultures.5 The study is retrospective, so we cannot determine the number of patients evaluated for acute pharyngitis who did not undergo point-of-care testing. However, the prevalence of streptococcal pharyngitis is similar to that in other published studies. As in other studies, the Centor score alone had an LR of 1.5 (95% CI, 1.2-1.9) for patients with at least 2 findings, and for those with 0 to 1 finding the LR was 0.35 (95% CI, 0.16-0.75). Point-of-care testing was most useful when combined with the Centor score, both for ruling in and for ruling out disease. A positive rapid streptococcal test result in patients with at least 2 Centor score findings had an LR of 179; the Centor score did not affect the LR when the point-of-care testing result was negative, because the LR was 0.09 for a broad range of 0 to 4 symptoms.

Table 47-5 Modified Centor Scores
Score | Centor Score, Adults (≥18 y)a, LR (95% CI) | Modified Centor Score, Children (3-17 y), LR (95% CI)
Score 4 | 1.2 (0.62-2.2) | 4.0 (2.7-6.0)
Score 2-3 | 1.3 (0.85-1.9) | 0.69 (0.59-0.83)b
Score 0-1 | 0.26 (0.14-0.48) | Uncertain
Abbreviations: CI, confidence interval; LR, likelihood ratio.
a Summary LR from McIsaac et al,4 Atlas et al,5 and Chazan et al.2
b Summary LR from McIsaac et al.4

CHANGES IN THE REFERENCE STANDARD
The throat culture continues to be the recognized reference standard for the diagnosis of group A β-hemolytic streptococci. However, a positive throat culture or rapid antigen test result provides adequate confirmation of the presence of group A β-hemolytic streptococci in the pharynx and is accepted as a pragmatic reference standard.

RESULTS OF LITERATURE REVIEW
The modified Centor score can direct the antibiotic treatment strategy (Table 47-6).

Table 47-6 Positive Predictive Value of Treatment According to the Modified Centor Score
Modified Centor Score | Antibiotic Treatment Strategy | Positive Predictive Value of Decision to Treat, % (95% CI) | Negative Predictive Value of Decision to Not Treat, % (95% CI)
Adults (≥18 y)a | | 84 (73-90) | 94 (90-96)
4 | Treat with antibiotics | |
2-3 | Rapid test; treat if positive result | |
Children (3-17 y) | | 98 (94-99) | 100 (98-100)
2-5 (all children with sore throat) | All get rapid test; treat for a positive rapid test result and culture those with a negative rapid test result | |
Abbreviation: CI, confidence interval.
a McIsaac et al.4

EVIDENCE FROM GUIDELINES

IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION

These newer studies confirm the utility of the Centor score for certain patients while highlighting the weaknesses. The disease prevalence is higher in children (

Table 47-7
Patient Group and Score | LR (95% CI)
Adults >18 y |
Centor score 2-4 | ≈1
Centor score 0-1 | 0.26 (0.14-0.48)
Children 3-17 y |
Modified Centor score, 4-5 | 4.0 (2.7-6.0)
Modified Centor score, 2-3 | 0.69 (0.59-0.83)
Modified Centor score, 0-1 | Uncertain
Abbreviations: CI, confidence interval; LR, likelihood ratio.

Table 47-8 Centor Score Combined With Rapid Strep Point-of-Care Test Results, Adults
Centor Score | Point-of-Care Test Result | LR (95% CI)
2-4 Findings | Positive | 179 (110-2861)
0-1 | Positive | 26 (1.4-465)
0-4 | Negative | 0.09 (0.03-0.24)
Abbreviations: CI, confidence interval; LR, likelihood ratio.

POPULATION FOR WHOM STREPTOCOCCAL PHARYNGITIS SHOULD BE CONSIDERED
• Children and adults with sore throat.

DETECTING THE LIKELIHOOD OF STREPTOCOCCAL PHARYNGITIS
The Centor score and modified Centor score perform differently for younger vs older patients (Table 47-7). The Centor score improves greatly when combined with rapid strep test results (Table 47-8).

REFERENCE STANDARD TESTS
Streptococcal throat culture, rapid streptococcal antigen tests.

REFERENCES FOR THE UPDATE
1. Wong MCK, Chung CH. Group A streptococcal infection in patients presenting with a sore throat at an accident and emergency department: prospective observational study. Hong Kong Med J. 2002;8(2):92-98.
2. Chazan B, Shaabi M, Bishara E, Colodner R, Raz R. Clinical predictors of streptococcal pharyngitis in adults. Isr Med Assoc J. 2003;5(6):413-415.a
3. Nawaz H, Smith DS, Mazhari R, Katz DL. Concordance of clinical findings and clinical judgment in the diagnosis of streptococcal pharyngitis. Acad Emerg Med. 2000;7(10):1104-1109.
4. McIsaac WJ, Kellner JD, Aufricht P, Vanjaka A, Low DE. Empirical validation of guidelines for the management of pharyngitis in children and adults [published correction appears in JAMA. 2005;294:2700]. JAMA. 2004;291(13):1587-1595.a
5. Atlas SJ, McDermott SM, Mannone C, Barry MJ. The role of point of care testing for patients with acute pharyngitis. J Gen Intern Med. 2005;20(8):759-761.a
6. Cooper RJ, Hoffman JR, Bartlett JG, et al. Principles of appropriate antibiotic use for acute pharyngitis in adults: background. Ann Intern Med. 2001;134(6):509-517.

a For the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.


EVIDENCE TO SUPPORT THE UPDATE: Streptococcal Pharyngitis

TITLE The Role of Point of Care Testing for Patients With Acute Pharyngitis.
AUTHORS Atlas SJ, McDermott SM, Mannone C, Barry MJ.
CITATION J Gen Intern Med. 2005;20(8):759-761.
QUESTION Does point-of-care (POC) testing with a rapid test for group A β-hemolytic streptococcus improve the performance of the Centor score?
DESIGN Prospective, nonconsecutive study in which every patient who had POC testing also had a throat culture.
SETTING Two primary care practices with data collected during a 12-month period.
PATIENTS Adults (≥18 years) with symptoms of acute pharyngitis. Patients were excluded if their symptom duration was greater than 7 days, they had taken antibiotics within the past 24 hours, they were immunocompromised, or they had an acute pulmonary disease flare-up.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD
The providers collected information for the Centor score and collected the samples for POC tests and throat cultures.

MAIN OUTCOME MEASURES
Sensitivity, specificity, and likelihood ratio (LR) of the Centor score and POC tests. Centor score = history of fever (temperature >38°C), tonsillar exudates, swollen anterior cervical lymph nodes, absence of cough (patient report). A positive response to each finding is given 1 point, so a maximum Centor score is 4.

MAIN RESULTS
The authors completed data forms on 179 patients. They excluded 29 according to their criteria and had a final sample size of 148 after eliminating 2 patients with incomplete data. Thirty-eight patients (26%) had group A β-hemolytic streptococcus by culture. The LRs for the Centor scores improved greatly when combined with the rapid strep test (Tables 47-9 and 47-10).

Table 47-9 Likelihood Ratios for Centor Scores
Centor Score | n | LR+ (95% CI)
4 | 18 | 1.4
3 | 30 | 1.4
2 | 45 | 1.6
0-1a | 55 | 0.35
Collapsed data | |
2-4 Findings | | 1.5 (1.2-1.9)
0-1 | | 0.35 (0.16-0.75)
Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.
a Only 1 patient had 0 findings.

Table 47-10 Likelihood Ratios for Rapid Strep Tests as a Function of Centor Scores
Centor Score | POC | N | LR+ (95% CI)
2-4 | Positive | 31 | 179
0-1 | Positive | 4 | 26
2-4 | Negative | 62 | 0.07
0-1 | Negative | 51 | 0.14
Collapsed data | | |
2-4 Findings | Positive | | 179 (110-2861)
0-1 | Positive | | 26 (1.4-465)
0-4 | Negative | | 0.09 (0.03-0.24)
Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; POC, point-of-care rapid strep test.

CONCLUSIONS
LEVEL OF EVIDENCE Level 3.
STRENGTHS Pragmatic study that took advantage of clinical decisions to perform POC testing, informed by the Centor score, vs the culture reference standard.
LIMITATIONS Nonconsecutive patients, so we do not know how many patients were evaluated for acute pharyngitis without POC testing.

Because not all patients with acute pharyngitis were enrolled, it is possible that the providers selected patients for whom the diagnosis was not clear, making the Centor score perform less efficiently. However, the prevalence of streptococcal pharyngitis in this group is comparable to that in other studies. The Centor score did not perform well for identifying affected patients, but the presence of no more than 1 finding reduces the prior probability of 25% to 10%. The Centor score alone was not adequate for making a diagnosis. The power of POC testing is highlighted by the results. A Centor score modulates the LR in that individuals with a score of 2 to 4 and a positive POC test result have an even higher LR than those with a score of 0 to 1. At the lower end of prior probability of group A β-hemolytic streptococcus by culture in adults with acute pharyngitis (10%), the probability of disease with a Centor score of 0 to 1 and a positive POC test result increased to 74%. A negative POC test result, at any level of Centor score, effectively ruled out group A β-hemolytic streptococcus by culture, with a posttest probability of less than 2%.

Reviewed by David L. Simel, MD, MHS
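The probabilities quoted in this commentary follow from the odds form of Bayes' theorem: convert the prior probability to odds, multiply by the LR, and convert back. The sketch below is a minimal illustration of that arithmetic; the function name is hypothetical, and the inputs used in the usage note are the prior probabilities and LRs reported in this section.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("a likelihood ratio cannot be negative")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)
```

With a prior probability of 0.25 and the LR of 0.35 for 0 to 1 findings, the function returns about 0.10; with a prior of 0.10 and the LR of 26 for a positive POC test result, it returns about 0.74; and with a prior of 0.10 and the LR of 0.09 for a negative POC test result, it returns about 0.01.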

TITLE Clinical Predictors of Streptococcal Pharyngitis in Adults.
AUTHORS Chazan B, Shaabi M, Bishara E, Colodner R, Raz R.
CITATION Isr Med Assoc J. 2003;5(6):413-415.
QUESTION What are the clinical features that predict sore throat caused by group A β-hemolytic streptococcus?
DESIGN Consecutive patients with sore throat.
SETTING Israeli primary care clinics during 4 consecutive winter months.
PATIENTS Adults (>16 years) with sore throat.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD
The clinical symptoms were recorded and a throat swab for culture was obtained. Rapid testing was not done.

MAIN OUTCOME MEASURES
A multivariate model was used to analyze the independent symptoms and signs. The outcome of Centor score vs the culture was reported.

MAIN RESULTS
A total of 207 patients were enrolled, with only 3 dropped because of missing data; 24% of patients had a positive throat culture result. A multivariate analysis of 7 symptoms and 10 signs showed that only pharyngeal exudate was significantly predictive of a positive culture result (positive likelihood ratio [LR], 1.8; 95% confidence interval [CI], 1.5-2.2; negative LR, 0.27; 95% CI, 0.13-0.53). A low Centor score made strep throat much less likely (Table 47-11).

Table 47-11 Likelihood Ratios for Centor Scores
Centor Score | Total (%) | LR+ (95% CI)
4 | 14 (5) | 0.51 (0.12-2.2)
2-3 | 112 (55) | 2.0 (1.6-2.4)
0-1 | 78 (40) | 0.16 (0.06-0.43)
Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.

CONCLUSIONS
LEVEL OF EVIDENCE Level 2.
STRENGTHS Consecutive adults, including those who had a Centor score of 0 to 1. These patients are missing in most other validation studies.
LIMITATIONS The study had few patients who had all 4 findings of the Centor score; thus, the results are unstable for this group.

This prospective study allows us to assess the usefulness of a Centor score of 0 to 1. Only 5% of patients with 1 or no symptoms had a positive culture result. Given that the probability of a positive culture result in this group was at the upper range of probabilities observed in adults, a Centor score of 0 to 1 decreases the probability of strep throat to less than 5% for most adults. The data support the recommendation for obtaining a rapid test for those with a score of 2 to 3, rather than treating empirically, because the LR was only 2. There were so few patients with a Centor score of 4 that the results are not useful for making conclusions about this group of patients. The multivariate analysis selected exudates as the only useful sign or symptom. This finding suggests that the Centor score might be dominated by the presence of exudates when a population of all patients with sore throats is included, rather than just those with more than 1 Centor finding.

Reviewed by David L. Simel, MD, MHS

TITLE Empirical Validation of Guidelines for the Management of Pharyngitis in Children and Adults.
AUTHORS McIsaac WJ, Kellner JD, Aufricht P, Vanjaka A, Low DE.
CITATION JAMA. 2004;291(13):1587-1595.
QUESTIONS What is the likelihood of a group A β-hemolytic streptococcus culture according to a modified Centor score adjusted for patient age? Among a modified Centor score (adjusted for age) (Box 47-1), rapid tests, and the throat culture (the reference standard), which single or combined approach results in the most correct treatment decisions with the fewest rapid tests and cultures?
DESIGN The data were collected prospectively during a 3-year study period, and then the strategies were analyzed retrospectively.
SETTING Family practice clinic in Canada.
PATIENTS Patients with a chief complaint of sore throat, who ranged in age from 3 to 69 years. Patients were enrolled if they had a modified Centor score of 2 or greater and the physician or study nurse believed that a throat swab was necessary.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD
Various treatment strategies that included the modified Centor score alone, rapid tests, or culture were assessed retrospectively to determine whether the strategy led to unnecessary tests or antibiotics. See Box 47-1. Because a culture is the reference standard test, we focused only on strategies that used initial combinations of the modified Centor score or rapid tests, rather than strategies that went straight to culture.

Box 47-1 Modified Centor Score (Range 0 to 5)
History of fever (temperature >38°C), tonsillar exudates, swollen anterior cervical lymph nodes, absence of cough (patient report). (A positive response to each finding is given 1 point, and then modified by age.)
Age | Modification for Age
3-14 y | +1
15-44 y | 0
≥45 y | –1

MAIN OUTCOME MEASURES
The frequency of culture positivity as a function of the modified Centor score allowed us to calculate likelihood ratios (LRs) for the score in predicting culture positivity for group A β-hemolytic streptococcus. We transformed the sensitivity and specificity of each strategy to LRs and predictive values.

MAIN RESULTS
A total of 918 patients were screened, with complete data available for 787 patients. Among the 333 adults, the prevalence of disease was 22%. The children had a prevalence of 34%. The modified Centor score performed differently, depending on the patient’s age (Table 47-12). Treatment could be guided by the score (Table 47-13).

Table 47-12 Likelihood Ratios of Modified Centor Scores as a Function of Age
Modified Centor Score | LR+ (95% CI)
Adults (≥18 y) |
4-5 | 1.6 (0.5-5.0)
3 | 1.3 (1.1-1.6)
2 | 0.53 (0.34-0.82)
Children (3-17 y) |
4-5 | 4.0 (2.7-6.0)
3 | 0.73 (0.61-0.88)
2 | 0.50 (0.31-0.80)
Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.

Table 47-13 Management Strategy for Patients With a Modified Centor Score Greater Than or Equal to 2
Modified Centor Score | Antibiotic Treatment Strategy | Positive Predictive Value of Decision to Treat, % (95% CI) | Negative Predictive Value of Decision to Not Treat, % (95% CI)
Adults (≥18 y) | | 84 (73-90) | 94 (90-96)
4 | Treat with antibiotics | |
2-3 | Rapid test, treat if positive result | |
Children (3-17 y) | | 98 (94-99) | 100 (98-100)
2-5 (all children with sore throat) | All get rapid test; treat for a positive rapid test result and culture for those with a negative rapid test result | |
Abbreviation: CI, confidence interval.

CONCLUSIONS
LEVEL OF EVIDENCE Level 3.
STRENGTHS This is a large study, conducted during a 3-year study period. A large distribution of patient ages helps us evaluate the generalizability of results.
LIMITATIONS The study entrance criteria required that the physician or nurse determine that a throat swab was warranted. Although data were not given on the number of eligible patients who were not enrolled, clinicians should understand that these patients had a chief complaint of sore throat. The inference is that patients with sore throat who were more concerned about other symptoms (eg, fever or nasal congestion) were not enrolled. Furthermore, adults who had a sore throat but only 1 symptom would not have been included, because they had a modified Centor score of 0 to 1. The treatment strategies were not studied prospectively but instead were evaluated after the data were collected.

The modified Centor score (adjusted for age) did not work much better than the original Centor score for adults. For children, the LR of a modified Centor score of 4 to 5 increases the likelihood of group A streptococcus 4-fold, but current treatment recommendations require a rapid test for all children and cultures for those with negative results.1,2 This strategy leads to almost 100% accuracy for treatment decisions in children with sore throats.

For adults, the data apply only to patients with a modified Centor score of at least 2. Those patients with all 4 symptoms can be treated empirically with antibiotics, and those with a score of 2 to 3 can have treatment guided by a rapid test. With this strategy, about 16% of treated patients will not have group A β-hemolytic streptococcus, whereas 6% of patients with infection will not be treated. The only way to eliminate the 6% of patients who go untreated would be to use a strategy that required culture whenever the rapid strep test result is negative.

Reviewed by David L. Simel, MD, MHS
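The study summary above notes that the sensitivity and specificity of each strategy were transformed to LRs and predictive values. A minimal sketch of that transformation follows, using the standard definitions. The class and method names are hypothetical, and the sensitivity and specificity in the usage note are invented for illustration; they are not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticAccuracy:
    """Sensitivity and specificity of a test or strategy (both as proportions)."""
    sensitivity: float
    specificity: float

    def __post_init__(self):
        # Restricting to the open interval avoids division by zero below.
        for name in ("sensitivity", "specificity"):
            value = getattr(self, name)
            if not 0 < value < 1:
                raise ValueError(f"{name} must be strictly between 0 and 1")

    def positive_lr(self):
        """LR+ = sensitivity / (1 - specificity)."""
        return self.sensitivity / (1 - self.specificity)

    def negative_lr(self):
        """LR- = (1 - sensitivity) / specificity."""
        return (1 - self.sensitivity) / self.specificity

    def positive_predictive_value(self, prevalence):
        """Probability of disease given a positive result."""
        true_pos = self.sensitivity * prevalence
        false_pos = (1 - self.specificity) * (1 - prevalence)
        return true_pos / (true_pos + false_pos)

    def negative_predictive_value(self, prevalence):
        """Probability of no disease given a negative result."""
        true_neg = self.specificity * (1 - prevalence)
        false_neg = (1 - self.sensitivity) * prevalence
        return true_neg / (true_neg + false_neg)
```

As a usage example, `DiagnosticAccuracy(sensitivity=0.90, specificity=0.95)` (hypothetical values) can be evaluated at the 22% adult prevalence reported in the study by calling `positive_predictive_value(0.22)` and `negative_predictive_value(0.22)`. Unlike LRs, the predictive values change when the prevalence changes, which is why the same strategy can yield different predictive values in adults and children.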

REFERENCES FOR THE EVIDENCE 1. Bisno AL, Gerber MA, Gwaltney JM Jr, Kaplan EL, Schwartz RH. Practice guidelines for the diagnosis and management of group A streptococcal pharyngitis. Clin Infect Dis. 2002;35(2):113-125. 2. Cooper RJ, Hoffman JR, Bartlett JG, et al. Principles of appropriate antibiotic use for acute pharyngitis in adults: background. Ann Intern Med. 2001;134(6):509-517.

CHAPTER 48

Is This Patient Having a Stroke?
Larry B. Goldstein, MD
David L. Simel, MD, MHS

CLINICAL SCENARIO The wife of a 58-year-old right-handed man calls emergency medical services because her husband abruptly developed difficulty speaking and moving his right arm. Figure 48-1 presents the diagnostic flow of a patient who experiences neurologic symptoms that suggest a stroke.

Copyright © 2009 by the American Medical Association.

WHY IS THE CLINICAL EXAMINATION OF PATIENTS WITH SUSPECTED STROKE IMPORTANT?
Since the original review of stroke published as part of The Rational Clinical Examination series more than a decade ago, much has changed.1 What has not changed is the staggering cost of the personal, societal, and economic consequences of strokes. The estimated direct and indirect cost of stroke in 2005 is $56.8 billion in the United States alone.2 More than 700 000 people in the United States have a stroke each year, of which nearly one-third represent recurrent events.3 About 163 000 annual stroke deaths make it the third leading cause of death in the United States. Between 15% and 30% of stroke survivors become permanently disabled, whereas 20% remain in institutional care 3 months after their stroke.

Not too long ago, the clinical examination functioned primarily to catalog a patient’s neurologic impairments that in turn correlated with the stroke’s vascular territory and likely cause. The inferences about the anatomy and etiology guided secondary preventive strategies and established the prognosis, rather than directing immediate treatment. Despite the advent of modern noninvasive neuroimaging technologies, the clinical examination for stroke is more important than ever because therapeutic interventions for patients with acute stroke and sophisticated approaches to prevent recurrent strokes now exist. Appropriate treatment and prevention depend on accurate interpretation of the patient’s symptoms and clinical examination findings. For example, the risk/benefit balance for carotid endarterectomy requires an accurate assessment of symptoms to identify those with a transient ischemic attack (TIA) or nondisabling stroke.4

The rapid screening of patients with neurologic symptoms begins with prehospital care personnel5 because the effectiveness of reperfusion strategies for acute ischemic stroke is time dependent. The brain can withstand profound ischemia for only limited periods, and the benefits of intravenous tissue plasminogen activator (tPA) lessen as the time from the onset of the patient’s symptoms increases.6 Public education programs have stressed the need to call emergency medical responders (eg, 911) for persons experiencing stroke symptoms. Patients, family members, and prehospital care personnel such as emergency medical technicians must recognize the symptoms and signs of strokes to minimize treatment delays. Arrival at the hospital by emergency medical transport has been associated with more rapid treatment and thereby presumably improved outcomes.7-10 Thus, the accuracy of the clinical examination becomes relevant not just for stroke specialists and emergency physicians but also for paramedics, nursing personnel, and emergency medical technicians who may be the first responders.

When patients with stroke symptoms arrive at the hospital, a standardized neurologic examination, combined with neuroimaging results, determines subgroups of patients who might benefit from intravenous thrombolysis vs those who may be at increased risk from thrombolytic-related bleeding.11-13 Experienced examiners tailor the neurologic examination to address specific clinical questions because a stroke produces different symptoms and signs, depending on the area of affected brain. A variety of other conditions complicate diagnostic efforts by causing symptoms and signs similar to stroke (stroke mimics). In the patient example, emergency medical services were called for a patient with new focal neurologic symptoms. We will observe the example patient through the emergency evaluation and highlight the clinical questions and features of the examination that increase the likelihood of accurately and reliably identifying a stroke, the stroke subtype, and the patient’s prognosis.

Figure 48-1 Diagnostic Flow of a Patient Who Experiences Neurologic Symptoms That Suggest a Stroke
Onset of stroke symptoms
Activate emergency response system
Perform prehospital assessment (emergency medical technician/paramedic)
    Prior probability of stroke ≈10%
    CPSSa: any item present, LR = 5.5; 0 items present, LR = 0.39
    LAPSSb: positive, LR = 31; negative, LR = 0.09
Perform rapid evaluation on arrival in emergency department
    Assess factors associated with increased likelihood of stroke: (1) focal neurologic deficit; (2) persistent neurologic deficit; (3) acute onset during previous week; (4) no history of head trauma
    0 Factors: LR = 0.14; probability of stroke ≈1.5%
    1-3 Factors: LR uncertain; probability of stroke ≥10% (baseline)
    4 Factors: LR = 40; probability of stroke ≈80%
Clinical judgment; condition other than stroke
Assess stroke severity with NIH Stroke Scale; perform neuroimaging; perform laboratory tests to exclude stroke mimics
Begin stroke treatment; establish prognosis according to clinical findings
a Cincinnati Prehospital Stroke Scale (CPSS); facial droop, arm drift, and abnormal speech.
b Los Angeles Prehospital Stroke Scale (LAPSS); medical history (age >45 y, no history of seizures, symptoms <24 h, not wheelchair bound), blood glucose 60-400 mg/dL (3.3-22 mmol/L), and examination showing unilateral facial weakness, grip weakness, and arm weakness.
Abbreviations: LR, likelihood ratio; NIH, National Institutes of Health.

METHODS This review updates a 1994 report on clinical assessment of stroke1 and is based on relevant studies identified through MEDLINE, restricted to the time since the last review. Information on the physical examination and neurologic examination is difficult to identify because the Medical Subject Headings for the articles typically do not include obvious terms. For example, searching the terms “cerebrovascular disorders” limited to human research studies, English-language articles (1994-2005) yields 9029 articles. However, when the results of this global search are crossed with the term “neurological examination,” there are 176 articles, and when crossed with “physical examination,” only 19 articles remain. Eliminating review articles and case reports from this reduced set left only 4 potentially relevant articles. Because of the low yield, we relied heavily on searches of the bibliographies of textbook chapters, review articles, and personal files to identify additional relevant literature for updating the role of the clinical examination since the original Rational Clinical Examination article on stroke in 1994. To examine the accuracy and reliability of the clinical assessment of stroke for either diagnosis or prognosis, the following


general inclusion criteria were used in assessing articles: (1) the article addressed the issue of accuracy or reliability of medical history or physical examination for diagnosis or estimation of short-term prognosis (mortality or functional disability); (2) the study site or participants (clinicians or patients) were described; (3) the data were not limited to case reports or reviews of other studies; and (4) the primary data or appropriate summary statistics were presented. For assessment of the accuracy of diagnosis, references included articles that also described a final diagnosis established by an expert who reviewed all clinical data, neuroimaging, and other relevant laboratory tests. These articles were evaluated for quality according to whether the clinical examination was performed masked to the neuroimaging results (see Table 1-7).14 Articles describing prognosis in terms of functional status were included if the outcome was measured with a scale that is either comparable to a scale in common use or was validated in the context of the study. The sensitivity (how often a diagnostic procedure detects a condition when it is present), specificity (how often a diagnostic procedure result is negative when the condition is absent), and likelihood ratios (LRs) (the odds favoring the diagnosis or outcome vs not having the diagnosis) for each finding or scale were recorded from each article or were calculated according to primary data as necessary.15,16 Table 48-1 summarizes the included studies that gave sensitivity and specificity data for the diagnosis of stroke or TIA. For studies of precision, the κ statistic (describes the agreement between paired observers beyond that predicted by chance) or the intraclass correlation coefficient (when there are more than 2 examiners) is given. Intraclass coefficients range from 0 to 1,

Table 48-1 Summary of Included Studies With Sensitivity/Specificity Data
Source, y | Level of Evidencea | Country | Setting | No. of Participants | Inclusion Criteria
Kothari et al,17 1997 | 2 | United States | ED | 299 | Clinical trial and ED patients
Kothari et al,18 1999 | 3 | United States | ED and neurology service | 171 | Suspected stroke or stroke mimic
Kidwell et al,19 2000 | 1 | United States | Field and ED | 441 | Suspected stroke
Karanjia et al,20 1997 | 2 | United States | Neurology clinics | 381 | Stroke, TIA, or other neurologic condition
von Arbin et al,21 1980 | 3 | Sweden | Hospital | 2252 | Medical admissions
von Arbin et al,22 1981 | 3 | Sweden | Stroke unit | 206 | Stroke unit admission
Panzer et al,23 1985 | 2 | United States | Hospital | 369 | Suspected stroke
Oxbury et al,24 1975 | 3 | United Kingdom | Hospital | 93 | Stroke
Tuthill et al,25 1969 | 3 | United States | Stroke unit/community hospital | 202 | Suspected stroke
Frithz and Werner,26 1976 | 3 | Sweden | Hospital | 344 | Stroke,
Allen,27 1984 | 3 | United Kingdom | Hospital | 148 |
Henley et al,28 1988 | 2 | United Kingdom | Hospital | 172 |
Fullerton et al,29 1988 | 3 | Ireland | Hospital | 206 |
Britton et al,30 1980 | 2 | Sweden | Stroke unit | 200 |

50 mm/h 14 >100 mm/h 10 Anemia 22

Sensitivity (95% CI)

0.86 (0.62-0.97) 0.65 (0.54-0.74) 0.47 (0.40-0.54) 0.45 (0.26-0.66) 0.41 (0.30-0.52) 0.32 (0.29-0.35) 0.31 (0.14-0.54) 0.31 (0.20-0.44) 0.29 (0.10-0.57) 0.16 (0.07-0.28)

0.96 (0.93-0.97) 0.83 (0.75-0.90) 0.39 (0.29-0.50) 0.44 (0.34-0.54)

Abbreviations: CI, confidence interval; ESR, erythrocyte sedimentation rate. a Includes results of all eligible studies, including those that reported clinical features for patients with positive biopsy results only.


CHAPTER 49


When we separately analyzed the pooled data from all studies, only 4% of patients with positive temporal artery biopsy results and data on ESR had a normal value. If one uses a less strict cutoff point, even an ESR of less than 50 mm/h substantially reduces the probability of disease (LR, 0.35). This value is lower than the LR– of any symptom or sign. In contrast to clinical lore, a high ESR was less useful in identifying those with TA among all patients referred for biopsy, which likely relates to the verification bias inherent in patient selection for the eligible studies because referring physicians would have had knowledge of the ESR before recommending a biopsy. Although an ESR of greater than 100 mm/h conferred an LR+ of 1.9, this value is less than the most useful symptoms and signs. In contrast, mean ESR values were similar for patients with and without positive temporal artery biopsy results. Anemia was present in 44% of patients with biopsy-proven TA. This finding was present in a similar number of patients who had negative biopsy results. Mean hemoglobin levels were similar between patients with positive and negative biopsy results (11.6 g/dL vs 12.4 g/dL, respectively); the lack of anemia was not helpful in ruling out disease.

ARE THESE CLINICAL FEATURES EVER NORMAL? The presence of particular symptoms or signs in patients with negative temporal artery biopsy results does not imply that these findings are “normal” or common in patients without disease. Rather, it suggests that other conditions that clinicians may initially confuse for TA have overlapping clinical features. The frequency of such findings in randomly selected individuals of the same age would likely be lower than the frequency among patients in this review with negative biopsy results. Several studies have followed patients with negative biopsy results to determine their ultimate or correct diagnoses. Chmelewski et al35 reported the outcomes of 98 patients undergoing temporal artery biopsies during a 5-year period at their institution. Among the 68 patients with negative biopsy results, 15 proved to have neurologic disorders (including migraine, stroke, and optic neuropathy), 14 had PMR, 10 had other rheumatologic disorders (including vasculitis other than TA, rheumatoid arthritis, and CREST [calcinosis, Raynaud disease, esophageal dysmotility, sclerodactyly, telangiectasia] syndrome), and 4 had fever of unknown origin. Miscellaneous diagnoses included sinusitis, endocarditis, amyloidosis, and malignancy. In another biopsy series, Roth et al40 studied 33 patients with a clinical suspicion of TA but negative biopsy results. The most common diagnoses, in descending order, were joint disease (degenerative or rheumatoid), malignant lymphoma, arteriosclerotic carotid artery disease, diabetes mellitus, and ischemic optic neuropathy. In our first clinical scenario, the history of bitemporal headache and a modestly increased ESR would be among those factors that may lead a clinician to suspect TA. In this setting, one would seek the potential additional history of jaw claudication or diplopia and determine the presence of a 650

prominent, tender, or beaded temporal artery. If present, these factors would substantially increase the likelihood of positive temporal artery biopsy results. In the second scenario, TA is among the diagnostic considerations for transient partial monocular vision loss in the setting of a constitutional illness. The history in this case is sufficiently compelling to justify a temporal artery biopsy. Given the high prior probability and the poor performance of historical and examination features in excluding disease, an otherwise normal medical history and physical examination result would not sufficiently reduce the likelihood of TA to avoid the need for a temporal artery biopsy. A normal ESR would, however, reduce the likelihood of disease by a factor of 0.2 and should prompt consideration of alternative diagnoses.
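Strictly speaking, a likelihood ratio multiplies the odds of disease rather than the probability itself. A minimal Python sketch of that conversion follows. The 50% pretest probability is a hypothetical value chosen only for illustration and does not come from the studies reviewed here, and the function name is likewise an assumption.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability by way of odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    # Probability -> odds, multiply by the LR, then odds -> probability.
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical pretest probability (an assumption for illustration only).
pretest = 0.50

# LR of 0.2 for a normal ESR, as reported in the text.
after_normal_esr = posttest_probability(pretest, 0.2)
print(f"Posttest probability after a normal ESR: {after_normal_esr:.0%}")
```

Under this assumed starting point, the normal ESR lowers the probability substantially but does not eliminate it, which is consistent with the advice to consider alternative diagnoses rather than to regard TA as excluded.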

THE BOTTOM LINE Available data suggest that many of the clinical features commonly found in patients with the disease are unhelpful in predicting the likelihood of positive temporal artery biopsy results. Our study evaluates the predictive value of clinical features among patients who are already clinically suspected of having the disease, as determined by the clinicians who referred them for biopsy. Although we could not determine, from the primary studies, the factors that went into the decision to refer for biopsy, certain clinical features modified the likelihood of disease among these patients. It is likely that these same clinical factors would be useful to consider at initial evaluation, even before the decision to proceed to biopsy. In addition, the verification bias inherent in this analysis makes the significance of our results greater because they help to predict biopsy results even among patients who have a higher prior probability of disease than do unselected patients with any particular clinical feature. When a medical history is taken in a patient with possible TA, jaw claudication and diplopia substantially increase the probability of positive biopsy results (LR+s, 4.2 and 3.4, respectively). No symptoms help rule out the diagnosis by their absence. Among physical examination findings, synovitis makes the diagnosis of TA less likely, whereas beaded, prominent, enlarged, and tender temporal arteries increase the likelihood of positive biopsy results. Beaded, prominent, or enlarged arteries confer the highest positive LRs of any clinical or laboratory feature and substantially increase the probability that a patient with suspected TA will have positive biopsy results. Although these findings increase the chance of having TA, they are variably sensitive, from 16% (beaded temporal artery) to 65% (any temporal artery abnormality). The results of tests of ESR alter the likelihood of positive biopsy results. 
A normal ESR (LR, 0.2) or ESR less than 50 mm/h (LR, 0.35) makes positive biopsy results less likely, but setting the ESR threshold at 100 mm/h is less efficient because patients with an ESR less than 100 mm/h have an LR (0.8) that only slightly decreases the likelihood of disease. Among patients clinically suspected of having disease, those

with an ESR greater than 100 mm/h have a modestly increased likelihood of biopsy-proven TA (LR, 1.9). The clinician faced with a patient who may have TA has a difficult diagnostic challenge. The goal is to rule out other morbid conditions that may mimic TA, to avoid unnecessary evaluation, and to quickly and correctly identify and treat patients who do in fact have the disorder. Given the extreme difference in prevalence of TA between the general population ( 100 (4) ESR 50-100 (5) ESR < 50 (5)

4.3 (3.0-6.1) 3.5 (1.8-6.8) 1.7 (1.1-2.4) 1.7 (1.5-1.9) 1.1 (0.94-1.3) 1.1 (1.0-1.2) 1.9 (1.1-3.3) 1.1 (0.87-1.5) 0.55 (0.38-0.80)

0.72 (0.66-0.79) 0.96 (0.93-0.99) 0.73 (0.66-0.82) 0.67 (0.56-0.80) 0.97 (0.92-1.0) 0.2 (0.08-0.51)

Abbreviations: CI, confidence interval; ESR, erythrocyte sedimentation rate; LR+, positive likelihood ratio; LR–, negative likelihood ratio. a An abnormal ESR was defined by the laboratory analyses of the individual studies. From these data, a normal ESR has a likelihood ratio of 0.2 for temporal arteritis.

add additional information to the other variables. Future studies should reassess the role of the platelet count as a screening test for temporal arteritis among patients with compatible symptoms, especially those with vision complaints.

One retrospective review4 assessed the ethnic background among patients with biopsy-proven temporal arteritis. None of the 40 Hispanic patients in the United States referred for temporal artery biopsy had positive results. A study from a tertiary hospital in Spain5 showed very few differences between these patients compared to the summary data from the original Rational Clinical Examination article. A smaller study of patients in the United Kingdom6 also showed values similar to those in the original Rational Clinical Examination article. The low prevalence in this population should be studied in future case series.

The strict inclusion criteria of our original review required primary data from clinical series, excluding decision analyses from consideration because decision analyses require assumptions about the prevalence of disease and differing clinical features. The published decision analyses preceding our review did not have access to a systematic estimation of these values. However, they provide an alternative strategy about management of patients suspected of having temporal arteritis. We identified 3 such studies7-9 through our literature search. Not surprisingly, using different assumptions, these authors developed differing predictive models. Each study modeled empiric treatment strategies, treatment guided by biopsy results, and treatment of all patients irrespective of biopsy results. The model results changed with differing estimated prior probability of disease. None of these studies, however, estimated the influence of particular clinical or laboratory features on the likelihood of positive biopsy results. Therefore, these provide a complementary analysis but do not add to the information in our review or update on The Rational Clinical Examination.

Box 49-1 Temporal Arteritis Score (for Patients ≥ 50 y)

Score = –240 + 48 × (headache) + 108 × (jaw claudication) + 56 × (scalp tenderness) + 1.0 × (ESR) + 70 × (ischemic optic neuropathy) + 1.0 × (age)
(If symptom present, substitute 1.0; if negative, substitute 0)
Estimated probability = [exp(score/50)]/[1 + exp(score/50)]
If score less than –110, low risk ( 70, high risk (>80% chance of positive biopsy result)
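The temporal arteritis score in Box 49-1 lends itself to a short calculation. The following Python sketch transcribes the published coefficients and the logistic conversion to a probability; the function and parameter names are assumptions introduced here for illustration, and the sketch is not a validated clinical tool.

```python
import math


def temporal_arteritis_score(headache: bool, jaw_claudication: bool,
                             scalp_tenderness: bool, esr_mm_per_h: float,
                             ischemic_optic_neuropathy: bool,
                             age_years: float) -> float:
    """Score from Box 49-1; symptoms count as 1 if present and 0 if absent."""
    return (-240
            + 48 * int(headache)
            + 108 * int(jaw_claudication)
            + 56 * int(scalp_tenderness)
            + 1.0 * esr_mm_per_h
            + 70 * int(ischemic_optic_neuropathy)
            + 1.0 * age_years)


def estimated_probability(score: float) -> float:
    """Estimated probability = exp(score/50) / (1 + exp(score/50))."""
    return math.exp(score / 50) / (1 + math.exp(score / 50))


# A 72-year-old patient with headache and scalp tenderness only,
# evaluated at two ESR values (50 and 100 mm/h).
for esr in (50, 100):
    score = temporal_arteritis_score(True, False, True, esr, False, 72)
    print(f"ESR {esr}: score {score:.0f}, "
          f"probability {estimated_probability(score):.0%}")
```

The patient used in the example matches the one worked through in the Clinical Scenario resolution, so the scores and probabilities can be checked against the values given there.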

IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION New data allowed us to refine our summary estimates for the LRs of clinical features for temporal arteritis. None of the estimates changed appreciably, although the new data generally led to narrower CIs and, therefore, more confidence in the role of each finding.

CHANGES IN THE REFERENCE STANDARD The reference standard for the diagnosis of temporal arteritis remains a temporal artery biopsy.

RESULTS OF LITERATURE REVIEW Table 49-7 details univariate analyses of clinical variables associated with temporal arteritis. As in the original meta-analysis, the presence of jaw claudication or diplopia was associated with the highest LRs. For decreasing the likelihood of temporal arteritis, a normal ESR has the lowest LR.

Multivariate Findings for Temporal Arteritis
Younge et al1 developed a temporal arteritis score, shown in Box 49-1, that estimates the probability of temporal arteritis according to the presence of 6 factors. The authors derived this score from a large sample of 1113 patients undergoing temporal artery biopsy, all of whom were older than 50 years. This is the largest series in the literature that includes patients undergoing temporal artery biopsy with both positive and negative biopsy results (the entire literature from 1966 to 2000 includes only 2680 patients). We were unable to determine the value of combinations of clinical features in our original review because of the limitations of the meta-analytic design and the lack of individual patient specific data. The temporal arteritis score of Younge et al1 is an important contribution that assists clinicians in estimating the likelihood of temporal arteritis among patients suspected of having the disease. However, it was derived from a group of patients who were older than 50 years, and its use should be limited to people of similar age. Prospective validation studies are necessary, but the large patient sample provides some reassurance to clinicians who choose to apply the score to their patients.

Box 49-2 American College of Rheumatology Criteria10 for Temporal Arteritis

1. Age at disease onset at least 50 y
2. New headache
3. Temporal artery abnormality (tenderness, diminished pulsation unrelated to atherosclerosis of cervical arteries)
4. Increased erythrocyte sedimentation rate (at least 50 mm/h by Westergren method)
5. Abnormal artery biopsy result (vasculitis with mononuclear cell predominance or granulomatous inflammation, usually with multinucleated giant cells)

EVIDENCE FROM GUIDELINES There are no well-established consensus guidelines for the evaluation, diagnosis, or treatment of patients with suspected or proven temporal arteritis. Clinicians and researchers generally agree on the American College of Rheumatology (ACR) criteria for the classification of giant-cell (temporal) arteritis.10 These criteria were described as “classification” criteria (rather than “diagnostic”) to make their purpose clear: they are best used among patients with vasculitis to improve standardization and comparability of studies, not necessarily as diagnostic criteria for clinical practice. They are reproduced in Box 49-2. The ACR criteria for all vasculitis syndromes, including temporal arteritis, have been criticized for poor predictive value when applied to individual patients in clinical practice.11 However, other guidelines have not been widely accepted.12 All of these guidelines use clinical factors presented in the original and updated literature reviews. Because there is no clear consensus about the definition or gold standard for the diagnosis of temporal arteritis beyond a positive temporal artery biopsy result, in our meta-analysis we required at least 90% of individuals considered to have the disease to have histologic “proof.”


CLINICAL SCENARIO—RESOLUTION
Our 72-year-old woman has a new onset of temporal and occipital headache that raises the possibility of temporal arteritis. One should seek the presence of those features that confer a high LR+, including diplopia and jaw claudication. In her case, scalp tenderness is present (LR+, 1.7), but she does not have other historical features that confer a high LR+. On examination, one looks for the presence of beaded, tender, or pulseless temporal arteries. Her pulseless temporal arteries confer an LR+ of 2.7, but the CI around this result is broad (95% CI, 0.55-13). An ESR measurement would be helpful: a normal ESR confers an LR of 0.2, whereas an elevated ESR greater than 100 mm/h increases the likelihood of disease (LR, 1.9). Intermediate ESR values, that is, values that are elevated but less than 100 mm/h, occur commonly in patients with temporal arteritis and would increase the likelihood to a lesser degree.

The temporal arteritis score of Younge et al1 provides an alternate strategy for estimating disease risk by combining the most important clinical features. If we enter the data for our patient into this prediction rule, using hypothetical ESR values of 50 and 100, we obtain the following results.

For ESR = 50:
Score = –240 + 48 × (headache = 1) + 108 × (jaw claudication = 0) + 56 × (scalp tenderness = 1) + 1.0 × (ESR = 50) + 70 × (ischemic optic neuropathy = 0) + 1.0 × (age = 72)
Score = –14. Intermediate risk (probability, 43%)

For ESR = 100:
Score = –240 + 48 × (headache = 1) + 108 × (jaw claudication = 0) + 56 × (scalp tenderness = 1) + 1.0 × (ESR = 100) + 70 × (ischemic optic neuropathy = 0) + 1.0 × (age = 72)
Score = 36. Intermediate risk (probability, 67%)

In this case, using the prediction rule of Younge et al,1 the risk is intermediate according to clinical evaluation. The ESR results do not modify the likelihood of temporal arteritis, as determined by clinical evaluation alone.
After this evaluation, temporal arteritis is still a consideration. Previous studies and clinical experience suggest that biopsy should be performed in 7 to 10 days, although the yield of biopsy decreases over time after the initiation of corticosteroid treatment.


TEMPORAL ARTERITIS—MAKE THE DIAGNOSIS

PRIOR PROBABILITY Temporal arteritis is relatively rare, though the disease may be underdiagnosed.13 The prevalence increases with age, and it occurs more commonly among women and whites. One study found that, among white persons 50 years and older, the prevalence of temporal arteritis was 200 cases per 100000; among persons older than 85 years, the prevalence was 1100 per 100000.14 Most published series have been from northern Europe and the northern United States, but the disease has been observed worldwide.

POPULATION FOR WHOM TEMPORAL ARTERITIS DISEASE SHOULD BE CONSIDERED Temporal arteritis should be considered in all adults aged 50 years and older with appropriate symptoms. Although prevalence varies by sex, race, and geographic locale, no single demographic factor among persons older than 50 years decreases the likelihood enough to exclude the diagnosis.

DETECTING THE LIKELIHOOD OF TEMPORAL ARTERITIS One can estimate the likelihood of temporal arteritis by using either single features (and applying the summary LRs from our meta-analysis) or by using combinations of features, as established by the prediction rule of Younge et al1 (see Table 49-8).

Table 49-8 The Single Best Findings or Combinations of Findings Can Be Used to Estimate the Probability of Temporal Arteritis

Single Best Findings Suggesting the Presence of Temporal Arteritis
Jaw claudication: LR+, 4.3 (95% CI, 3.0-6.1)
Diplopia: LR+, 3.5 (95% CI, 1.8-6.8)

Single Best Finding Suggesting the Absence of Temporal Arteritis
ESR < 50 mm/h (n = 5): LR, 0.55 (95% CI, 0.38-0.80)

Combinations of Findingsᵃ (Posterior Probability, %)
Headache + jaw claudication + scalp tenderness at age 60 y: 65
Headache + jaw claudication + scalp tenderness at age 80 y: 74
Headache + jaw claudication + scalp tenderness at age 60 y, ESR = 50 mm/h: 84
Headache + jaw claudication + scalp tenderness at age 80 y, ESR = 50 mm/h: 88
No headache + no jaw claudication + no scalp tenderness at age 60 y, ESR = 50 mm/h: 7
No headache + no jaw claudication + no scalp tenderness at age 80 y, ESR = 50 mm/h: 10

Abbreviations: CI, confidence interval; ESR, erythrocyte sedimentation rate; LR+, positive likelihood ratio; LR–, negative likelihood ratio; OR, odds ratio.
ᵃ These are examples of various combinations of findings for patients with 3 of 3 symptoms vs 0 of 3 symptoms present at various ages. The addition of age and ESR provides important information when combined with the symptoms.

REFERENCE STANDARD TESTS
Temporal artery biopsy and histologic evaluation is the reference standard for the diagnosis of temporal arteritis. Other means of diagnosis have been suggested, including positron emission tomography scanning15,16 and ultrasonography17-22 for imaging of the temporal artery. Although results of small studies have been promising, studies of these tests have been flawed (primarily by incomplete evaluation against the gold standard, temporal artery biopsy) and are not widely accepted. Although they could at some point prove diagnostically useful in the diagnosis of temporal arteritis, studies to date have not provided sufficient, conclusive evidence confirming the diagnostic value of these tests beyond standard clinical information (including medical history, physical examination, and routine measures of inflammation) and biopsy as alternative reference standards. Magnetic resonance angiography, computed tomography, or standard angiography can be helpful for extracranial disease, including inflammatory involvement of the aorta or its proximal branches.22

REFERENCES FOR THE UPDATE
1. Younge BR, Cook BE Jr, Bartley GB, Hodge DO, Hunder GG. Initiation of glucocorticoid therapy: before or after temporal artery biopsy? Mayo Clin Proc. 2004;79(4):483-491.ᵃ
2. Foroozan R, Danesh-Meyer H, Savino PJ, Gamble G, Mekari-Sabbagh ON, Sergott RC. Thrombocytosis in patients with biopsy-proven giant cell arteritis. Ophthalmology. 2002;109(7):1267-1271.ᵃ
3. Gabriel SE, O’Fallon M, Achkar AA, et al. The use of clinical characteristics to predict the results of temporal artery biopsy among patients with suspected giant cell arteritis. J Rheumatol. 1995;22(1):93-96.
4. Liu NH, LaBree LD, Feldon SE, Rao NA. The epidemiology of giant cell arteritis: a 12-year retrospective review. Ophthalmology. 2001;108(6):1145-1149.ᵃ
5. Gonzalez-Gay MA, Garcia-Porrua C, Amor-Dorado JC, Llorca J. Influence of age, sex, and place of residence on clinical expression of giant-cell arteritis in Northwest Spain. J Rheumatol. 2003;30(7):1548-1551.ᵃ
6. Mohamed MS, Bates T. Predictive clinical and laboratory factors in the diagnosis of temporal arteritis. Ann R Coll Surg. 2002;84(1):7-9.ᵃ
7. Elliot DL, Watts WJ, Reuler JB. Management of suspected temporal arteritis: a decision model. Med Decis Making. 1983;3(5):63-68.
8. Buchbinder R, Detsky AS. Management of suspected giant cell arteritis: a decision analysis. J Rheumatol. 1992;19(8):1220-1228.
9. Nadeau SE. Temporal arteritis: a decision-analytic approach to temporal artery biopsy. Acta Neurol Scand. 1988;78(2):90-100.
10. Hunder GG, Bloch DA, Michel BA, et al. The American College of Rheumatology 1990 criteria for the classification of giant cell arteritis. Arthritis Rheum. 1990;33(8):1122-1128.
11. Rao JK, Allen NB, Pincus T. Limitations of the 1990 American College of Rheumatology classification criteria in the diagnosis of vasculitis. Ann Intern Med. 1998;129(5):345-352.
12. Frearson R, Cassidy T, Newton J. Polymyalgia rheumatica and temporal arteritis: evidence and guidelines for diagnosis and management in older people. Age Ageing. 2003;32(4):370-374.
13. Ostberg G. Temporal arteritis in a large necropsy series. Ann Rheum Dis. 1971;30(3):224-235.
14. Lawrence RC, Helmick CG, Arnett FC, et al. Estimates of the prevalence of arthritis and selected musculoskeletal disorders in the United States. Arthritis Rheum. 1998;41(5):778-799.
15. Brodmann M, Lipp RW, Passath A, Seinost G, Pabst E, Pilger E. The role of 2-18F-fluoro-2-deoxy-D-glucose positron emission tomography in the diagnosis of giant cell arteritis of the temporal arteries. Rheumatology (Oxford). 2004;43(2):241-242.
16. Turlakow A, Yeung HW, Pui J, et al. Fludeoxyglucose positron emission tomography in the diagnosis of giant cell arteritis. Arch Intern Med. 2001;161(7):1003-1007.
17. LeSar CJ, Meier GH, DeMasi RJ, et al. The utility of color duplex ultrasonography in the diagnosis of temporal arteritis. J Vasc Surg. 2002;36(6):1154-1160.
18. Murgatroyd H, Nimmo M, Evans A, MacEwen C. The use of ultrasound as an aid in the diagnosis of giant cell arteritis: a pilot study comparing histological features with ultrasound findings. Eye. 2003;17(3):415-419.
19. Nesher G, Shemesh D, Mates M, Sonnenblick M, Abramowitz HB. The predictive value of the halo sign in color Doppler ultrasonography of the temporal arteries for diagnosing giant cell arteritis. J Rheumatol. 2002;29(6):1224-1226.
20. Pfadenhauer K, Weber H. Duplex sonography of the temporal and occipital artery in the diagnosis of temporal arteritis: a prospective study. J Rheumatol. 2003;30(10):2177-2181.
21. Salvarani C, Silingardi M, Ghirarduzzi A, et al. Is duplex ultrasonography useful for the diagnosis of giant-cell arteritis? Ann Intern Med. 2002;137(4):232-238.
22. Stanson AW. Imaging findings in extracranial (giant cell) temporal arteritis. Clin Exp Rheumatol. 2000;18(4 suppl 20):S43-S48.

ᵃ For the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.


EVIDENCE TO SUPPORT THE UPDATE: Temporal Arteritis

TITLE Thrombocytosis in Patients With Biopsy-Proven Giant Cell Arteritis.
AUTHORS Foroozan R, Danesh-Meyer H, Savino PJ, Gamble G, Mekari-Sabbagh ON, Sergott RC.
CITATION Ophthalmology. 2002;109(7):1267-1271.
QUESTION Are the complete blood cell (CBC) count and erythrocyte sedimentation rate (ESR) useful in predicting positive temporal artery biopsy results among patients suspected of having giant-cell arteritis (GCA)?
DESIGN Retrospective, case-control series.
SETTING Specialty eye hospital in Philadelphia, Pennsylvania.
PATIENTS Ninety-one consecutive patients undergoing temporal artery biopsy for suspicion of GCA; biopsy performed within 1 week of presentation. Corticosteroid therapy before biopsy was not allowed; blood tests were conducted within 24 hours of biopsy.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD
Diagnostic (gold) standard was temporal artery biopsy; tests included CBC count and Westergren ESR. Definition of elevated platelet levels (>400 × 10³/μL) was based on reference range greater than 2 SD above the mean; elevated ESR was above age/2 for men and (age + 10)/2 for women. No patients had a clinical course to suggest biopsy-negative GCA.

MAIN RESULTS
Forty-seven patients had a positive biopsy result; 44 had a negative biopsy result. White blood cell counts were no different between patients with positive and negative biopsy results, although patients with positive biopsy results were significantly more anemic. Among patients suspected of having temporal arteritis, thrombocytosis significantly predicts the likelihood of a positive temporal artery biopsy result (see Tables 49-9 and 49-10).

CONCLUSION
LEVEL OF EVIDENCE Level 3 (using criteria from original review).
STRENGTHS The investigators asked a unique question regarding the value of laboratory testing to stratify probability of disease.
LIMITATIONS All patients were evaluated at a subspecialty ophthalmology clinic. The sample size was small.

Table 49-9 Comparison of Laboratory Values Between Those With Positive vs Negative Biopsy Results for Giant-Cell Arteritis
Test | Biopsy Result Positive | Biopsy Result Negative | P Value
Mean ESR level, mm/h | 82 | 70 | .12
Mean hematocrit level, % | 34.8 | 37 | .03
Mean hemoglobin level, g/dL | 11.7 | 12.5 | .01
Mean platelet count, ×10³/μL | 433 | 277 |

Table 49-10
Test | Sensitivity | Specificity | LR+ (95% CI) | LR– (95% CI)
Elevated ESR | 0.79 | 0.27 | 1.1 (0.86-1.4) | 0.78 (0.37-1.6)
Platelet count > 400 × 10³/μL | 0.57 | 0.91 | 6.3 (2.4-17) | 0.47 (0.33-0.66)
Combination of ESR and platelet count > 400 × 10³/μL | 0.51 | 0.91 | 5.6 (2.1-15) | 0.54 (0.40-0.73)

Abbreviations: CI, confidence interval; ESR, erythrocyte sedimentation rate; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
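The likelihood ratios reported for this study follow from sensitivity and specificity by the standard definitions, LR+ = sensitivity/(1 − specificity) and LR– = (1 − sensitivity)/specificity. A brief Python sketch, with a function name assumed here for illustration, applies those definitions to the reported values for a platelet count greater than 400 × 10³/μL (sensitivity 0.57, specificity 0.91).

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Reported values for platelet count > 400 x 10^3/uL.
lr_pos, lr_neg = likelihood_ratios(0.57, 0.91)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

Because the published sensitivities and specificities are rounded, ratios recomputed this way can differ slightly in the last digit from the published LRs, which were presumably derived from the raw counts.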


Commentary This study was performed with high quality, although it was retrospective and selected patients were treated at a specialty eye hospital. Two-thirds of the patients had primarily visual complaints. The results suggest that elevated platelet count may be useful in suggesting the diagnosis of GCA, but LRs may not be helpful enough to preclude biopsy or rule out the need for one. Also, the marginal value of elevated platelet count beyond elements of the medical history, physical examination, and other routine laboratory tests (especially lack of normal ESR) may be small. The authors suggest that platelet count may be better than ESR in predicting results of biopsy, in part because an elevation in ESR is part of what goes into the decision to get a biopsy. However, the definition of elevated ESR (age and sex adjusted) was more restrictive in this study than in many others and may have lessened its predictive power. This study does not examine the value of history-taking or physical examination findings. Reviewed by Robert H. Shmerling, MD
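The age- and sex-adjusted ESR definition used in this study (above age/2 for men and above (age + 10)/2 for women) can be written out explicitly. The Python sketch below is illustrative only; the function names and the "male"/"female" string labels are assumptions made for this example.

```python
def esr_upper_limit(age_years: float, sex: str) -> float:
    """Upper limit of normal ESR (mm/h) under the study's definition."""
    if sex == "male":
        return age_years / 2
    if sex == "female":
        return (age_years + 10) / 2
    raise ValueError("sex must be 'male' or 'female'")


def is_esr_elevated(esr_mm_per_h: float, age_years: float, sex: str) -> bool:
    """True when the measured ESR exceeds the adjusted upper limit."""
    return esr_mm_per_h > esr_upper_limit(age_years, sex)


# Hypothetical 72-year-old woman: the limit is (72 + 10) / 2 = 41 mm/h.
print(esr_upper_limit(72, "female"))
print(is_esr_elevated(50, 72, "female"))
```

For the oldest patients the adjusted limit remains below the fixed cutoffs of 50 and 100 mm/h used elsewhere in this chapter, which is worth bearing in mind when comparing ESR results across studies.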

TITLE Influence of Age, Sex, and Place of Residence on Clinical Expression of Giant-Cell Arteritis in Northwest Spain.
AUTHORS Gonzalez-Gay MA, Garcia-Porrua C, Amor-Dorado JC, Llorca J.
CITATION J Rheumatol. 2003;30(7):1548-1551.
QUESTION Do age, sex, and urban residence influence the clinical expression of giant-cell arteritis?
DESIGN Retrospective chart review.
SETTING Tertiary referral hospital in northwestern Spain that is the only referral center for a mixed urban and rural area encompassing approximately 250000 people.
PATIENTS All patients with biopsy-proven giant-cell arteritis between 1981 and 2001.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Clinical and laboratory features of patients with biopsy-proven giant-cell arteritis represented the diagnostic tests. The diagnostic standard was a positive temporal artery biopsy.

MAIN OUTCOME MEASURES The main outcome measure was sensitivity.

MAIN RESULTS Few differences exist in the clinical presentation of biopsy-proven giant-cell arteritis according to age, sex, and place of residence. The only clinical sex-based difference is a higher prevalence of polymyalgia rheumatica in women (see Table 49-11). Women had a statistically significantly lower hemoglobin level than men. No clinical features differed for urban- or rural-dwelling patients. Age of onset at presentation did not significantly influence the clinical presentation. A trend existed toward more polymyalgia rheumatica in younger patients, but this difference was not significant. Hemoglobin levels were minimally lower in younger patients, and more older patients had an increased alkaline phosphatase level.

CONCLUSIONS
LEVEL OF EVIDENCE Level 4 (using criteria from the original review).
STRENGTHS Consistent data set across all patients.
LIMITATIONS The study population consisted only of patients with giant-cell arteritis. The relatively small number of study subjects limited the power to detect significant differences.

Table 49-11 Most Presenting Features of Giant-Cell Arteritis Are Similar Between Men vs Women and Patients Younger Than 70 Years vs Older Than 70 Years
Variable | Men (n = 97) | Women (n = 113) | Onset Younger Than 70 y of Age (n = 42) | Onset Older Than 70 y of Age (n = 168)
Men, % |  |  | 48 | 46
Age at diagnosis, y | 75 | 75 |  |
Living in urban area, % | 27ᵃ | 46ᵃ | 31 | 39
Delay to diagnosis, wk | 9.7 | 11 | 12 | 9.9
Headache, % | 90 | 85 | 88 | 87
Scalp tenderness, % | 34 | 34 | 26 | 36
Constitutional syndrome, % | 67 | 62 | 76 | 61
Abnormal temporal artery examination, % | 73 | 78 | 67 | 77
Jaw claudication, % | 36 | 45 | 29 | 44
Dysphagia, % | 3 | 7 | 0 | 7
Polymyalgia rheumatica, % | 33ᵃ | 49ᵃ | 52 | 39
Fever, % | 8 | 11 | 12 | 9
Visual manifestations, % | 26 | 21 | 21 | 24
Permanent visual loss, % | 13 | 12 | 12 | 13
Cerebrovascular accident, % | 3 | 1 | 5 | 1
Limb claudication of recent onset, % | 4 | 2 | 7 | 2
ESR, mean, mm/h | 91 | 95 | 100 | 92
Hemoglobin, mean, g/dL | 12.2ᵃ | 11.4ᵃ | 11.3ᵃ | 11.9ᵃ
Platelet count, mean, ×10³/μL | 407 | 412 | 437 | 402
Increased alkaline phosphatase, % | 26 | 28 | 48ᵃ | 22ᵃ

Abbreviation: ESR, erythrocyte sedimentation rate.
ᵃ P < .05 for comparison between men and women or between younger and older patients.

Commentary
This case series provides a detailed summary of clinical and laboratory features among a cohort of patients with biopsy-proven giant-cell arteritis in Spain. The overall prevalence of specific features is similar to that reported in our original review and meta-analysis. Differences include higher incidences of headache and polymyalgia rheumatica and lower incidences of fever and visual manifestations than in our original review. In this study, the authors aimed to identify differences in clinical presentations according to age, sex, and urban location. Remarkably, nearly all features were similar across these patient subsets. The only clinical feature that was statistically significantly different across nearly 60 comparisons was the greater incidence of polymyalgia rheumatica among women compared with men. However, this series may have lacked sufficient statistical power to detect significant differences. Small differences in hemoglobin and the incidence of elevated alkaline phosphatase level existed in these comparisons, but these are not clinically significant. We have previously shown that anemia does not predict positive biopsy results among patients suspected of having the disease (positive likelihood ratio, 1.5 [95% confidence interval, 0.82-2.9]; negative likelihood ratio, 0.79 [95% confidence interval, 0.6-1.0]). This study suggests that clinical suspicion and the value of particular clinical features of giant-cell arteritis do not differ among these selected patient subsets.
Reviewed by Gerald W. Smetana, MD

TITLE The Epidemiology of Giant Cell Arteritis: A 12-Year Retrospective Review.
AUTHORS Liu NH, LaBree LD, Feldon SE, Rao NA.
CITATION Ophthalmology. 2001;108(6):1145-1149.
QUESTION What is the incidence of biopsy-proven giant-cell arteritis among individuals of Hispanic descent?
DESIGN Retrospective chart review.
SETTING Subspecialty academic ophthalmology institute in the United States.
PATIENTS Sequential patients (n = 121) undergoing temporal artery biopsy.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD
The diagnostic tests were demographic factors including age, sex, and ethnicity. The diagnostic standard was a temporal artery biopsy. The authors explicitly stated the pathologic criteria used to classify a temporal artery biopsy result as positive.

MAIN OUTCOME MEASURES
Incidence of temporal arteritis among white, Asian, black, and Hispanic patients undergoing temporal artery biopsy. Hispanic patients self-reported whether they considered themselves to be of white or Latino descent.

MAIN RESULTS
Twenty patients (16.5%) had positive temporal artery biopsy results. The mean age of the study population was 70 ± 8.8 years. White patients were older than Asian, black, and Hispanic patients. The mean age for patients with a positive biopsy result was 75 years, whereas that for patients with a negative biopsy result was 69 years. Giant-cell arteritis is rare among a population of Americans of Hispanic ethnicity (Table 49-12).

Table 49-12 The Incidence of Giant-Cell Arteritis Differs by Race
Race (No.) | Positive Biopsy Result, % | OR (95% CI)
White (66) | 40 | 22 (3.6-133)
Black (6) | 0 | 0 (0-4.2)
Hispanic (40) | 0 | 0 (0-0.38)
Asian (9) | 12 | 0.61 (0.09-4.0)
Abbreviations: CI, confidence interval; OR, odds ratio.

CONCLUSIONS
LEVEL OF EVIDENCE Level 1 (using criteria from the original review).
STRENGTHS Asked a unique question not previously addressed in the literature.
LIMITATIONS No clinical information was recorded and only demographic and laboratory variables were studied.

Commentary
The original review reconfirmed the observation that temporal arteritis is predominantly a disease of whites. Among all eligible studies in that review, 86% of all patients with positive biopsy results were white. Descriptions of blacks with temporal arteritis have been largely restricted to case reports and small series. The incidence among US Hispanics has not been well studied. In this report, the authors determined the race of all patients undergoing temporal artery biopsy at a referral ophthalmology center in Los Angeles, California. Although Hispanics constituted 33% of all patients referred for biopsy, not a single biopsy result was positive in this group of patients (95% confidence interval, 0%-7.2%).
Reviewed by Gerald W. Smetana, MD
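When no events are observed, an upper confidence bound on the true proportion can still be estimated. The source does not state how its interval was calculated; one calculation that reproduces the reported 7.2% upper limit for 0 positive results among 40 patients is the exact binomial bound obtained by solving (1 − p)^n = α with α = 0.05. The Python sketch below shows that calculation, with a function name assumed for illustration.

```python
def upper_bound_zero_events(n: int, alpha: float = 0.05) -> float:
    """Upper bound on a proportion when 0 events are observed in n trials.

    Solves (1 - p) ** n = alpha for p, the exact binomial bound.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return 1.0 - alpha ** (1.0 / n)


# 0 positive biopsy results among 40 Hispanic patients.
print(f"Upper bound: {upper_bound_zero_events(40):.1%}")
```

A two-sided exact interval would use α/2 in place of α and give a somewhat higher upper limit, so the figure depends on the convention chosen.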


TITLE Predictive Clinical and Laboratory Factors in the Diagnosis of Temporal Arteritis.
AUTHORS Mohamed MS, Bates T.
CITATION Ann R Coll Surg. 2002;84(1):7-9.
QUESTION Among patients undergoing temporal artery biopsy, which clinical and laboratory factors predict positive biopsy results?
DESIGN Retrospective chart review.
SETTING Single hospital in the United Kingdom.
PATIENTS All patients (n = 50) who underwent temporal artery biopsy between January 1988 and December 1997.

DESCRIPTION OF THE TEST AND DIAGNOSTIC STANDARD
The diagnostic tests were demographic features, presenting clinical features, laboratory investigation, and the duration of corticosteroid therapy before biopsy. The diagnostic standard was a temporal artery biopsy. The authors did not state the criteria used to determine whether a temporal artery biopsy result was positive.

MAIN OUTCOME MEASURES
The main outcome measures were sensitivity and specificity.

MAIN RESULTS
Seventeen patients had temporal arteritis and 33 patients had a normal biopsy result. The mean age was 73 years (range, 60-82 years) for patients with a positive biopsy result and 67 years (range, 49-85 years) for those with a negative biopsy result. The mean durations of steroid therapy for patients with positive and negative biopsy results were 7 and 10 days, respectively. The mean erythrocyte sedimentation rate (ESR) was 56 mm/h for patients with a positive biopsy result and 38 mm/h for those with a negative biopsy result. Seventeen patients (34%) had a positive temporal artery biopsy result (Table 49-13). Among clinical and laboratory features in a population of 50 patients suspected of having temporal arteritis, an ESR less than 50 mm/h decreased the likelihood of temporal arteritis, whereas an ESR of 50 to 100 mm/h increased the likelihood of temporal arteritis. All other results had a 95% confidence interval that included 1.

CONCLUSIONS
LEVEL OF EVIDENCE Level 1 (using criteria from the original review).

Table 49-13 Likelihood Ratios of Demographic Variables, Symptoms, Signs, and Laboratory Values for Temporal Arteritis (Disease Frequency 17/50)
Feature (No. With Feature)
Jaw pain (6)
History of fever (4)
Polymyalgia rheumatica (4)
Male sex (15)
Neurologic symptoms (21)
Steroid use before biopsy (31)
Headache (44)
Temporal tenderness (36)
Visual symptoms (21)
Ocular signs (8)
ESR > 100 mm/h (2)
ESR 50-100 mm/h (21)

80% chance of positive biopsy result). The model was validated with prospective data on 289 patients; 86% of the high-risk patients had a positive biopsy result, whereas 9% of the low-risk patients had a positive biopsy result.

SETTING Mayo Clinic, Rochester, Minnesota. PATIENTS One thousand one hundred thirteen sequential patients, identified through the Mayo Surgical Index, undergoing temporal artery biopsy between January 1988 and December 1997. Twenty percent of the patients were receiving oral corticosteroids at the biopsy.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Diagnostic (gold) standard was temporal artery biopsy; the authors collected multiple clinical features (by medical history, physical examination, and laboratory studies). Standard Mayo Clinic reference ranges for laboratory values were used, including erythrocyte sedimentation rate (ESR) of 0 to 22 mm/h for men and 0 to 29 mm/h for women.

MAIN OUTCOME MEASURES Sensitivity, specificity, and predictive values of various clinical and laboratory findings with respect to biopsy results were calculated.

MAIN RESULTS
• Three hundred seventy-three patients had positive biopsy results (33.5%); 740 (66.5%) had negative biopsy results.
• The commonly taught combination of headache with an elevated ESR had a likelihood ratio (LR) of 2.4 (95% confidence interval [CI], 2.1-2.7). When neither a headache nor an ESR abnormality was present, the LR for temporal arteritis was 0.42 (95% CI, 0.36-0.49).

Clinical findings (LRs and CIs are calculated from data provided in the article) are shown in Table 49-14.

Table 49-14 Likelihood Ratios for Single Symptoms and in Combination for Temporal Arteritis

| Test/Feature | Sensitivity | Specificity | LR+ (95% CI) | LR– (95% CI) |
|---|---|---|---|---|
| Single Features | | | | |
| Jaw claudication | 0.40 | 0.94 | 6.9 (5.0-9.5) | 0.64 (0.59-0.7) |
| Diplopia | 0.04 | 0.99 | 3.7 (1.5-9.2) | 0.97 (0.95-0.99) |
| Scalp tenderness | 0.33 | 0.89 | 3.1 (2.4-4.0) | 0.75 (0.70-0.81) |
| Myalgia/arthralgia | 0.46 | 0.50 | 2.2 (1.6-3.1) | 0.90 (0.86-0.95) |
| New headache | 0.67 | 0.60 | 1.7 (1.5-1.9) | 0.54 (0.46-0.63) |
| Decreased vision | 0.13 | 0.92 | 1.5 (1.0-2.1) | 0.95 (0.91-0.99) |
| Weight loss | 0.24 | 0.81 | 1.3 (1.0-1.6) | 0.93 (0.87-0.99) |
| Combination of Findings | | | | |
| Jaw claudication and decreased vision | 0.06 | 1.0 | 44 (5.9-322) | 0.98 (0.97-0.99) |
| Jaw claudication and diplopia | 0.02 | 1.0 | 30 (1.7-519) | 0.98 (0.97-0.99) |
| New headache, jaw claudication, and scalp tenderness | 0.15 | 0.99 | 19 (8.1-42) | 0.86 (0.82-0.90) |
| Jaw claudication and scalp tenderness | 0.17 | 0.99 | 18 (8.3-39) | 0.84 (0.80-0.88) |
| New headache and jaw claudication | 0.32 | 0.96 | 8.7 (5.8-13) | 0.71 (0.66-0.76) |
| New headache and decreased vision | 0.06 | 0.99 | 6.2 (2.7-14) | 0.95 (0.93-0.98) |
| New headache and scalp tenderness | 0.29 | 0.93 | 3.9 (2.9-5.3) | 0.77 (0.72-0.82) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
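The likelihood ratios in Table 49-14 follow from sensitivity and specificity by the standard definitions LR+ = sensitivity / (1 − specificity) and LR– = (1 − sensitivity) / specificity. The sketch below applies those definitions to two rows of the table; the function name is illustrative, and small differences from the published values are expected because the table reports sensitivity and specificity rounded to 2 decimal places.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a dichotomous finding.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must be strictly between 0 and 1")
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded sensitivity and specificity taken from Table 49-14.
rows = {
    "New headache": (0.67, 0.60),
    "Scalp tenderness": (0.33, 0.89),
}

for feature, (sens, spec) in rows.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{feature}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

A specificity reported as 1.0, as in the first two combination rows, makes the LR+ formula undefined; in such cases the published LR+ must have been computed from the unrounded counts.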


Table 49-15 Likelihood Ratios of Laboratory Findings for Temporal Arteritis

| Finding | Sensitivity | Specificity | LR+ (95% CI) | LR– (95% CI) |
|---|---|---|---|---|
| Abnormal platelet count | 0.37 | 0.77 | 1.6 (1.3-1.9) | 0.82 (0.75-0.89) |
| Abnormal ESR | 1.0 | 0.16 | 1.2 (1.1-1.2) | 0.02 (0-0.14) |
| Abnormal hemoglobin level | 0.80 | 0.32 | 1.2 (1.1-1.3) | 0.63 (0.50-0.79) |

Abbreviations: CI, confidence interval; ESR, erythrocyte sedimentation rate; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
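A likelihood ratio becomes clinically usable once it is applied to a pretest probability through the odds form of Bayes theorem. The sketch below is a minimal illustration, assuming the study's own biopsy-positive rate (373 of 1113) as the pretest probability; that choice and the function name are assumptions made for the example, not a recommendation from the source.

```python
def posttest_probability(pretest: float, lr: float) -> float:
    """Convert a pretest probability to a posttest probability using a likelihood ratio."""
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1.0 + posttest_odds)


# Assumption: pretest probability equals the prevalence in the biopsy series.
pretest = 373 / 1113

# LR 2.4: headache with an elevated ESR (reported in the main results).
# LR 0.02: normal ESR, the LR- for "abnormal ESR" in Table 49-15.
for label, lr in [("headache with elevated ESR", 2.4), ("normal ESR", 0.02)]:
    print(f"{label}: {posttest_probability(pretest, lr):.1%}")
```

In a population with a different pretest probability the same likelihood ratios yield different posttest probabilities, which is the point the commentary makes about a normal ESR in a patient with strong clinical suspicion.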

A score derived from clinical features and laboratory testing among patients suspected of having GCA can stratify patients into low, intermediate, and high likelihood of a positive temporal artery biopsy result.

CONCLUSIONS LEVEL OF EVIDENCE Level 1 (using criteria from the original review). STRENGTHS The study had a large sample size, standardized data abstraction for all patients, and a temporal biopsy in all patients. LIMITATIONS Retrospective review.

Commentary

This was a high-quality study, although it was retrospective. The results suggest that several readily available clinical features can be combined to establish low, intermediate, and high levels of risk for positive biopsy. Strengths of this study were that the authors separately reported data for patients receiving corticosteroids before biopsy, combined clinical features (as a clinician does in actual practice), and prospectively tested the model derived from the retrospective analysis. An important limitation was the retrospective design.

For identifying patients with temporal arteritis, the data suggest that the findings of headache, jaw claudication, and scalp tenderness have some degree of independence. The independence can be inferred by noticing that multiplying the LR for the presence of each of the findings approximates the LRs when they are assessed in combination. The authors have performed a service for clinical readers by evaluating these variables in a clinical model, confirming that they have independent significance (though jaw claudication is the most important when present), and validating their results by assessing the model prospectively.

Although a normal ESR appeared to rule out disease with a univariate LR of 0.02, the model should be examined for how that finding would work when there is a strong clinical suspicion. For example, a 72-year-old man who has a new headache, but no other signs or symptoms, and an ESR of 20 mm/h would have a score of –100 and should be at low to intermediate risk (probability, 12%). As jaw claudication and scalp tenderness symptoms are added, his risk increases to 78%, even with an ESR of only 20 mm/h. If other investigators validate these data in future research, then age plus clinical findings (headache, scalp tenderness, and jaw claudication in combination) would exceed the importance of the ESR.

Reviewed by Robert H. Shmerling, MD
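The independence argument in the commentary can be checked directly against the LR+ values in Table 49-14. The sketch below multiplies the single-finding LRs and prints each product beside the LR reported for the combination; the dictionary and function names are illustrative, and only point estimates are compared, so the wide confidence intervals around the combined LRs are ignored.

```python
from math import prod

# LR+ point estimates for single findings, as reported in Table 49-14.
SINGLE_LR = {
    "new headache": 1.7,
    "jaw claudication": 6.9,
    "scalp tenderness": 3.1,
}

# LR+ point estimates reported for the same findings in combination.
COMBINED_LR = {
    ("new headache", "jaw claudication"): 8.7,
    ("new headache", "scalp tenderness"): 3.9,
    ("jaw claudication", "scalp tenderness"): 18.0,
    ("new headache", "jaw claudication", "scalp tenderness"): 19.0,
}


def product_of_lrs(findings):
    """Multiply single-finding LRs, which is valid only if the findings are independent."""
    return prod(SINGLE_LR[f] for f in findings)


for findings, reported in COMBINED_LR.items():
    expected = product_of_lrs(findings)
    label = " + ".join(findings)
    print(f"{label}: product {expected:.1f} vs reported {reported:.1f}")
```

The pairwise products land reasonably close to the reported values, whereas the three-finding product (about 36) is well above the reported 19, which fits the commentary's wording of "some degree of independence" rather than complete independence.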

CHAPTER 50

Does This Patient Have an Acute Thoracic Aortic Dissection?

CLINICAL SCENARIOS

CASE 1 A 64-year-old man with a history of hypertension presents to the emergency department after sudden onset of severe, anterior chest pain. On examination, he is alert but uncomfortable. His blood pressure is normal and identical in both arms. His chest is clear, and careful cardiac auscultation fails to reveal a diastolic murmur. A chest radiograph reveals a small pleural effusion but is otherwise unremarkable.

CASE 2 A 59-year-old woman is brought to the emergency department after the sudden onset of tearing chest pain. On examination, she is alert and oriented. Her blood pressure is identical in both arms. Results of her cardiac and pulmonary examinations are normal but she has a dense left-sided motor deficit. A portable chest radiograph raises the question of a widened mediastinum.

Michael Klompas, MD

Copyright © 2009 by the American Medical Association.

WHY IS CLINICAL EXAMINATION IMPORTANT?

A man … was seized with a pain of the right arm and soon after of the left, … after these there appeared a tumor on the upper part of the sternum…. He was ordered to think seriously and piously of his departure from this mortal life, which was very near at hand and inevitable.
—J. B. Morgagni, 1761¹

There is no disease more conducive to clinical humility than aneurysm of the aorta.
—Sir William Osler, c 1900²

Acute thoracic aortic dissection, one of the most common and serious diseases of the aorta, carries a high morbidity and mortality rate when it is not recognized and treated promptly. Autopsy series conducted before the era of modern treatment estimated that 40% to 50% of patients with dissection of the proximal aorta died within 48 hours.3 For those fortunate enough to survive the initial 48 hours, the disease was thought to carry a 90% 1-year mortality rate.3,4 Since the introduction of modern treatment regimens, the fatality rate has declined dramatically. Patients with proximal ascending dissections who rapidly undergo surgery in experienced tertiary centers have a 30-day survival rate of 80% to 85% and a 10-year survival of 55%.4,5 Likewise, patients with dissection of the descending aorta treated with aggressive antihypertensive therapy have a 30-day survival rate greater than 90% and a 10-year survival rate of 56%.4-6

Realization of the dramatic benefits of medical intervention depends on rapid establishment of the diagnosis of dissection. Approximately 4.6 million patients per year present with chest pain to emergency departments in the United States (8.2% of all emergency department visits).7 Although advanced imaging techniques can reliably establish the diagnosis of thoracic aortic dissection in high-risk populations, it is obviously inefficient, uneconomic, and unrealistic to image every patient complaining of chest pain. Indiscriminate use of diagnostic imaging in poorly chosen patients with low pretest probability of having dissection has been predicted to yield up to an 85% rate of false-positive results, depending on the imaging modality chosen.8 On the other hand, misdiagnosis of acute thoracic aortic dissection as unstable angina or myocardial infarction can have disastrous iatrogenic consequences should the patient receive anticoagulants or thrombolytic therapy.9 Physicians are therefore acutely dependent on the clinical history, examination, and chest radiograph to determine which patients require further study.

Traditionally, clinical diagnosis of thoracic aortic dissection has been inaccurate. Physicians correctly suspect the diagnosis in as few as 15% to 43% of presentations when initially evaluating patients with dissection.3,10,11 Diagnostic delay of more than 24 hours after hospitalization occurs in up to 39% of cases.12 When the diagnosis is made, not infrequently it is an incidental discovery made during an advanced imaging procedure intended to assess for other diagnoses.13,14 Autopsies reveal the correct diagnosis is still missed in more than 10% of patients.13

The purpose of this review is to offer physicians an evidence-based foundation for using the clinical history, physical examination, and chest radiograph to assess the likelihood of thoracic aortic dissection.
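The warning about indiscriminate imaging reflects how the share of false-positive results grows as pretest probability falls, even for an accurate test. The sketch below illustrates that relationship with Bayes theorem. The sensitivity (0.98), specificity (0.95), and pretest probabilities are hypothetical values chosen for illustration; they are not taken from the cited analysis or from any particular imaging modality.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a positive test result is a true positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, for illustration only.
SENSITIVITY = 0.98
SPECIFICITY = 0.95

for prevalence in (0.01, 0.10, 0.50):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"pretest probability {prevalence:.0%}: "
          f"{1.0 - ppv:.0%} of positive results are false positives")
```

Under these assumed values, most positive results in a 1% pretest-probability population are false, whereas few are false when the pretest probability is 50%, which is why clinical selection of patients for imaging matters.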

Pathophysiology of Thoracic Aortic Dissection

The aortic wall is composed of 3 contiguous tissue layers in sequence from the vessel lumen proceeding outwards: the intima, media, and adventitia. Weakening of these tissue layers can lead to a tear in the intima, permitting the entry of blood between the intima and adventitia.15 Passage of blood into this space can extend the tear and create a so-called false lumen. The majority of these tears take place in the ascending aorta, usually in the right lateral wall where the greatest shear force on the artery wall is produced by blood expulsed from the heart under high pressure.3 The tear then extends along the greater curve of the aortic arch and down the descending aorta, though retrograde extension of the tear toward the aortic valve is also possible.15 Most aortic tears occurring beyond the ascending aorta originate immediately distal to the left subclavian artery.15

Predisposing factors for the initiation of a thoracic aortic dissection include hypertension,15 bicuspid aortic valve,15 coarctation of the aorta,15 the Marfan syndrome,16 Ehlers-Danlos syndrome,17 Turner syndrome,18 giant cell arteritis,19 third-trimester pregnancy,20 cocaine abuse,21 trauma,22 intraaortic catheterization,23 and history of cardiac surgery, particularly aortic valve replacement.24

The clinical features of thoracic aortic dissection are a consequence of the underlying pathophysiologic changes in the aorta. Patients perceive the initial aortic tear as sudden onset of severe ripping or tearing chest pain. The pain is sometimes described as having a migrating quality, likely corresponding to extension of the tear along the aorta. Depending on the location of the tear and its direction of extension, patients alternately describe the pain as radiating to the neck, back, or abdomen. Occasional presentations of painless dissection have been reported, though these are usually accompanied by other findings.25,26 Retrograde extension of the tear to the aortic valve can result in aortic regurgitation, with its characteristic diastolic murmur. Likewise, if the tear communicates with the pericardial space, patients can present with symptoms of acute pericardial tamponade (hypotension, pulsus paradoxus, jugular venous distention, and muffled heart sounds). Syncope or prolonged unconsciousness can be the initial presentation of patients with pericardial tamponade.

The initial aortic tear and subsequent extension of a false lumen along the aorta can occlude blood flow from the true lumen of the aorta into any of the arteries that originate from the aorta. Depending on which arteries become occluded, patients can present with a variety of corresponding syndromes. These include acute myocardial infarction from occlusion or extension of tear into the coronary arteries (typically the right coronary artery); death, syncope, or hemiplegia after occlusion of one or both carotid arteries; absent peripheral pulses in the major limb vessels secondary to occlusion of the brachiocephalic trunk, left subclavian artery, or distal aorta; anuria from disruption of renal blood flow; and paraplegia or quadriplegia from occlusion of vessels feeding the anterior spinal artery.

Examination for the Signs and Symptoms of Thoracic Aortic Dissection

The classic clinical history for thoracic aortic dissection consists of the sudden onset of severe tearing or ripping chest pain radiating to the interscapular region or low back, occurring in late-middle-aged men with a history of hypertension. Physicians therefore need to ask patients about the onset, quality, radiation, and intensity of their pain. Inquiry should also be made of history or symptoms suggestive of factors that increase the risk of aortic dissection, including hypertension, Marfan syndrome, bicuspid aortic valve, previous aortic valve replacement, and the other syndromes previously listed.

History-taking from patients with thoracic aortic dissection has tended to be poor; however, there is evidence that a more thorough medical history may increase diagnostic yield. A retrospective chart review of 83 patients with subsequently confirmed thoracic aortic dissection revealed that only 42% of conscious patients were asked all of 3 basic questions about their pain (quality, radiation, intensity at onset).14 One-quarter of patients were asked 1 or none of these key questions. If all 3 questions were asked, physicians correctly diagnosed thoracic aortic dissection in 30 of 33 patients (91%); if 1 or more of these questions was omitted, then the correct diagnosis was suspected during the initial evaluation in only 22 of 45 (49%) patients (P < .001). In these patients, the diagnosis was made later, usually as an incidental finding during imaging procedures intended to diagnose alternative conditions. Unfortunately, the retrospective design of this study cannot preclude the possibility that physicians were simply more likely to ask about additional classic findings when they already had a strong clinical suspicion of thoracic aortic dissection derived from other data, including physical examination and chest radiograph.

The physical examination should begin with elicitation of vital signs, particularly the blood pressure and pulses on both sides of the body. While checking the blood pressure, the examiner should evaluate for acute pericardial tamponade by assessing for pulsus paradoxus, particularly in a patient with hypotension or jugular venous distention. Frequent allusion is made to the importance of comparing the blood pressure in both arms. Although it is essential to seek evidence of vascular occlusion in the arms, the complete examination should include comparison of all major arteries, including the carotid and femoral pulses, in addition to the radial pulses. Most of the published series of patients with thoracic aortic dissection comment only on the loss or obvious diminishment of pulses rather than particular blood pressure differentials. Older retrospective autopsy series that do refer to blood pressure differentials arbitrarily designate a difference in systolic pressure between arms of 20 mm Hg or 30 mm Hg as significant.3,27 However, a convenience sample of 610 patients without thoracic aortic dissection presenting to an emergency department showed that 53% had interarm differences of greater than 10 mm Hg and 19% had differences greater than 20 mm Hg.28 Nonetheless, a good-quality, prospective, observational study did find that a blood pressure differential of greater than 20 mm Hg was an independent predictor of dissection.29 Hence, a blood pressure differential of at least 20 mm Hg ought to be present to be considered significant.

Cardiac auscultation should focus on detecting the diastolic murmur of aortic regurgitation.30 A rapid neurologic examination directed toward the detection of gross motor and sensory defects such as hemiplegia and paraplegia should ensue. Rarer clinical findings reported in the literature include pulsatile sternoclavicular joint, hoarseness, dysphagia, superior vena cava syndrome, Horner syndrome, bulbar palsies, acute arterial occlusion, deep vein thrombosis, and bilateral testicular tenderness.31-37

A chest radiograph should be obtained and examined for abnormalities of the aortic silhouette. This is best accomplished with a standing anteroposterior projection. Unfortunately, the majority of chest radiograph findings associated with thoracic aortic dissection are subjective and not defined. Criteria for radiographic features associated with traumatic thoracic aortic dissection have been proposed but have not been adopted or validated in radiologic studies of nontraumatic dissections.38 Radiographic abnormalities may include wide mediastinum, widening of the aortic knob, difference in diameter between the ascending and descending aorta, and blurring of the aortic margin secondary to local extravasation of blood.39 The chest radiograph might also reveal unilateral or bilateral pleural effusions. The calcium sign, consisting of the separation of intimal calcification from the outer border of the aortic knob by 1 cm or more, is highly suggestive of dissection but present in a minority of cases.37,40 Comparison with previous chest radiographs of the same patient can help the examiner detect suggestive new changes in the aortic contour.

METHODS

Literature Search and Selection

A structured MEDLINE search including 1966 through 2000 was conducted to identify English-language articles examining the accuracy of the clinical history, examination, and chest radiograph in the detection of acute thoracic aortic dissection. Key words used in the search included “physical examination,” “medical history taking,” “professional competence,” “reproducibility of results,” “observer variation,” “diagnostic tests,” “decision support techniques,” “Bayes theorem,” “sensitivity,” “specificity,” “thoracic aortic dissection,” “aortic aneurysm,” and “dissecting aneurysm.” Articles focusing only on electrocardiograms (ECGs) were not specifically sought because such analyses document a variety of abnormalities seen with thoracic aneurysm but lack the appropriate clinical information for valid sensitivity and specificity estimates. When studies reported the results of ECGs as part of the overall clinical examination, however, these data were collated. Abstracts were reviewed and the full texts of articles that might meet the inclusion criteria were retrieved. The reference lists of reviewed articles were searched to identify additional sources.

All potential articles were reviewed for explicit inclusion and exclusion criteria. Articles were included if they were original studies describing the clinical findings in a series of 18 or more consecutive patients with confirmed dissection of the thoracic aorta (Table 50-1). Acceptable means of confirmation of diagnosis were surgical exploration, autopsy, aortogram, magnetic resonance imaging, computed tomography, or transesophageal echocardiography. The latter 4 imaging studies were included as acceptable gold-standard investigations according to high sensitivity and specificity.41,42 Articles were excluded if more than 15% of their cohorts included trauma patients, patients with chronic thoracic aortic dissection (defined as a dissection presumed to have occurred more than 14 days before presentation), or patients with abdominal aortic aneurysms or if the study selectively included patients with only proximal or distal dissections.

Retrieved studies were graded for quality using criteria similar to that used in previous articles in this series but modified to include only consecutive series. Level 1 studies were defined as prospective, blinded examinations of a large number (>100) of independently selected consecutive patients. Level 2 studies were of identical criteria but included fewer than 100 patients. Level 3 studies were large, prospective investigations but included nonindependently selected patients. Level 4 studies were retrospective reviews of nonindependently selected patients (see Table 1-7).

Table 50-1 Studies Assessing the Accuracy of Clinical Examination for Thoracic Aortic Dissection

| Source, y | Clinical Setting, Study Dates | Design | No. of Patient Episodes | Age, y, Mean (Range) | Male, % | Type A, %ᵃ | Level of Qualityᵇ |
|---|---|---|---|---|---|---|---|
| Armstrong et al,43 1998 | University hospital, 1992-1994 | Retrospective review of patients with clinically suspected TAD referred for TEE | 75 (34 With TAD) | 57 (20-80) | 74 | 91 | 4 |
| Chan,44 1991 | University hospital, 1987-1989 | Prospective evaluation of utility of transesophageal echocardiography in patients with clinically suspected TAD | 40 (18 With TAD) | 60 | 60 | …ᶜ | 4 |
| Enia et al,45 1989 | Hospital, 1981-1987 | Prospective evaluation of transthoracic echocardiography in patients with clinically suspected TAD | 46 (35 With TAD) | 58 (34-82) | 91 | 66 | 4 |
| Erb and Tullis,46 1960 | University hospital, 1950-1960 | Retrospective chart review | 30 | 56 (36-85) | 67 | … | 4 |
| Hagan et al,5 2000 | 12 Tertiary centers in 6 countries, 1996-1998 | Multinational prospective international registry; cases identified on admission or review of discharge/surgery/radiology records; 60% of cases referred | 464 | 63 | 65 | 62 | 4 |
| Hume and Porter,47 1963 | University hospital and medical examiner's office, 1950-1962 | Retrospective chart reviewᵈ | 68 | 53 (10-79) | 79 | 81 | 4 |
| Itzchak et al,48 1975 | Hospital, 1960-1973 | Retrospective chart review | 24 | 57 (12-86) | 75 | 46 | 4 |
| Jagannath et al,40 1986 | University hospital, 1965-1977 | Retrospective review of radiographsᵉ | 72 (36 With TAD) | 62 (17-85) | Not stated | 1/3 | 4 |
| Levinson et al,27 1950 | University hospital, 1935-1947 | Retrospective chart review of autopsy cases | 58 | 59 (22-90) | 72 | … | 4 |
| Lindsay and Hurst,49 1967 | University hospital, 1949-1966 | Retrospective chart review | 62 | 57 (31-83) | 65 | 65 | 4 |
| Luker et al,50 1994 | Hospital, 1987-1993 | Retrospective review of radiologists’ initial chest radiograph readings in cases with subsequently confirmed TAD | 75 | 61 (24-77) | 49 | 47 | 4 |
| Mészáros et al,10 2000 | 3 Hungarian towns, 1972-1998 | Longitudinal, observational, population-based studyᶠ | 86 | 66 (36-97) | 61 | 86 | 4 |
| Miller et al,51 1979 | University hospital, 1963-1979 | Retrospective review of surgically managed cases | 73 | 57 (20-86) | 70 | 73 | 4 |
| Nielsen,52 1961 | 3 Danish hospitals, 1944-1958 | Retrospective chart reviewᵍ | 40 | 66 (36-83) | 45 | … | 4 |
| Pate et al,53 1976 | Memphis, TN, hospitals, dates not given | Retrospective chart review | 126 | Not reported | 79 | … | 4 |
| Pinet et al,54 1984 | University hospital, 1970-1979 | Retrospective chart review | 191 | 58 (19-90) | 69 | 64 | 4 |
| Slater and DeSanctis,37 1976 | University hospital, 1963-1973 | Retrospective chart review | 124 | 59 (19-81) | 73 | 43 | 4 |
| Strong et al,55 1974 | University hospital and VA hospital, 1960-1973 | Retrospective chart review | 59 | 60 (26-86) | 78 | 46 | 4 |
| Sullivan et al,11 2000 | 3 University hospital EDs, 1992-1996 | Retrospective review of ED patients referred for thoracic imaging | 44 | 65 (36-89) | … | 61 | 4 |
| Viljanen,12 1986 | University hospital, 1964-1985 | Retrospective review of surgically managed cases | 73 | 51 | 66 | 64 | 4 |
| Von Kodolitsch et al,29 2000 | University hospital, 1988-1996 | Prospective study of patients presenting to ED with history suggestive of TAD | 250 (128 With TAD) | 53 | 78 | 61 | 3 |

Abbreviations: ED, emergency department; TAD, thoracic aortic dissection; TEE, transesophageal echocardiography; VA, Veterans Affairs.
ᵃ Type A refers to aortic dissections involving the aorta proximal to the subclavian artery.
ᵇ See Table 1-7.
ᶜ Ellipses indicate information not available.
ᵈ Two cases not confirmed by surgery or autopsy.
ᵉ Does not include data on the frequency of specific radiographic findings but does report interobserver agreement.
ᶠ Eleven percent of cases were chronic.
ᵍ Forty cases in which TAD was considered cause of death; also reports additional 18 cases in which TAD was incidental finding on autopsy.

Study Characteristics

A total of 274 studies were identified by the search strategy, of which 21 studies met inclusion criteria (Table 50-1). No level 1 or level 2 studies were located. One study met level 3 criteria; the remaining 20 were level 4. One large series was self-described as prospective in conception and definition of clinical parameters.5 An unknown percentage of its patients, however, were identified by physician review of discharge records, echocardiography, and surgical databases. This study was consequently classified conservatively as level 4.5

Approximately half the investigations, including the 1 level 3 study, were specifically designed to elucidate the clinical presentation of acute aortic dissection. The remaining reports were either designed to test new imaging modalities or to study the outcomes of medical or surgical management of patients with thoracic aortic dissection. In each case, however, these studies included data on patients' clinical findings at diagnosis. The studies varied considerably in the number and detail of components of the clinical history or examination that were reported. Only the prospective level 3 study explicitly defined the criteria used to establish whether a given clinical finding was present or absent.29 These studies assessed a total of 1848 patients aged 10 to 97 years.

The major limitation of all the studies is that patients were selected for inclusion either retrospectively after confirmation of diagnosis by a reference standard study or prospectively according to the presenting clinical picture. Therefore, in all these studies the reference standard and clinical examination were not applied independently of one another. This biases the results of the studies to overestimate the sensitivity of clinical findings because more obvious cases are preferentially included in such series. In addition, physicians performing the reference standard procedure were not blinded to the results of the clinical examination and vice versa. This too could lead to overestimation of sensitivity.

Only 4 studies included control groups.29,43-45 Although these investigations can be used to generate data for specificity in addition to sensitivity, their estimations of specificity are heavily influenced by their inclusion biases. The specificities derived from these studies should be interpreted with caution because they reflect only the specificity for a given sign or symptom among patients similar to those included in the studies (ie, those with a full clinical syndrome suggestive of thoracic aortic dissection). These studies likely overestimate sensitivity and underestimate specificity by selecting patients for inclusion because of the presence of the particular sign being considered, thereby creating cohorts with artificially high prevalence of the finding.

Data Analysis

Summary measures for the sensitivity for components of the clinical examination for acute thoracic aortic dissection used published raw data from the reported trials that met criteria. Only 4 studies included specificity data that allowed construction of likelihood ratios (LRs). A random-effects model was used to generate conservative summary measures and confidence intervals (CIs) for the sensitivity and LRs.56 For LRs, a summary measure is reported only when there are more than 2 studies. The uncertainty in these measures is reflected in the broad CIs around the estimates. Interobserver agreement was calculated and interpreted using the κ statistic of Landis and Koch.57 Fast Pro version 1.8 software was used for the meta-analysis (Academic Press, San Diego, California).
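For readers unfamiliar with the κ statistic, the sketch below computes Cohen's κ for 2 observers rating a finding as present or absent and labels the result with the commonly quoted Landis and Koch categories. The 2 × 2 counts are hypothetical and chosen only to illustrate the arithmetic; the function names are likewise illustrative, and this is not the procedure implemented in the software named above.

```python
def cohen_kappa(both_yes: int, only_a_yes: int, only_b_yes: int, both_no: int) -> float:
    """Cohen's kappa for two observers rating a finding as present or absent."""
    n = both_yes + only_a_yes + only_b_yes + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + only_a_yes) / n
    b_yes = (both_yes + only_b_yes) / n
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # agreement expected by chance
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa: float) -> str:
    """Descriptive agreement category in the Landis and Koch scheme."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical counts: 50 radiographs read by observers A and B.
kappa = cohen_kappa(both_yes=22, only_a_yes=3, only_b_yes=8, both_no=17)
print(f"kappa = {kappa:.2f} ({landis_koch_label(kappa)} agreement)")
```

Raw agreement in this example is 78%, but half of that would be expected by chance alone, which is why κ is lower than the raw percentage.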

RESULTS

Accuracy of the Clinical History

Risk Factors

Sixteen studies examining 1553 patients report sensitivities for various components of the clinical history in Table 50-2. Most patients with dissection have a documented history of hypertension (sensitivity, 64%); however, the LR+ of this history is 1.6 (95% CI, 1.2-2.0). The pooled prevalence of the Marfan syndrome in this group of studies was 5% (95% CI, 4%-7%). Given that the Marfan syndrome afflicts only 0.02% to 0.03% of the general population,58 the high prevalence of the Marfan syndrome in these series is suggestive of a markedly increased risk associated with this disorder, though the frequency of the Marfan syndrome detected in these series likely reflects the inclusion biases of these studies. The one controlled study that assessed for the Marfan syndrome generated an LR+ of 4.1.29

Symptoms

The majority of patients presented with pain (pooled sensitivity, 90%) of severe intensity (sensitivity, 90%) that occurred suddenly (sensitivity, 84%). All other recorded clinical symptoms were present in a low to moderate proportion of patients (Table 50-2). Patients were most likely to have anterior chest pain (sensitivity, 57%); however, pain was frequently experienced elsewhere, including the posterior chest (32%), back (32%), and abdomen (23%). Likewise, migrating and ripping or tearing pain was present in only 31% and 39% of patients, respectively.

The presence of pain of sudden onset is not diagnostic (LR+, 1.6; 95% CI, 1.0-2.4). The absence of this history, however, substantively decreases the probability of an acute thoracic aortic dissection (LR–, 0.3; 95% CI, 0.2-0.5). Physicians should be cautious about relying too heavily on the absence of sudden pain to exclude aortic dissection because the inclusion biases of these studies likely overestimate the sensitivity.

Pain of a tearing or ripping sensation may also be diagnostically useful. Two studies found almost identical specificities of 94% and 95% for this historical feature.29,43 Although the reported specificities were almost identical, the LR+s generated by these 2 studies differed considerably (1.2 vs 11; Table 50-3) reflecting significant heterogeneity in the sensitivity for this history reported by the 2 investigations. The retrospective study found that only 7% of patients had noted tearing or ripping pain.43 By contrast, the better-quality, larger, prospective study, in which physicians were asked to query predefined clinical symptoms of each patient, reported a sensitivity of 62%.29 This figure is more consistent with the other large study with prospectively defined clinical symptoms in this series5 and with the pooled sensitivity for this symptom (Table 50-2). Therefore, it seems reasonable to suspect that the higher reported sensitivity

CHAPTER 50

The Rational Clinical Examination

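Table 50-2 Sensitivity of the Clinical History in the Diagnosis of Acute Thoracic Aortic Dissection (all values other than No. of Patients are sensitivity, %)

| Source, y | No. of Patients | History of Hypertension | Marfan Syndrome | Any Pain | Chest Pain | Anterior Chest Pain | Posterior Chest Pain | Back Pain | Abdominal Pain | Sudden-Onset Pain | Severe Pain | Ripping or Tearing Pain | Migrating Pain | Syncope |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Armstrong et al,43 1998 | 34 | …a | … | 94 | 74 | … | … | 56 | 27 | 88 | 93 | 7 | … | 6 |
| Chan,44 1991 | 18 | 56 | … | 78 | … | … | … | … | … | 78 | … | … | 39 | … |
| Enia et al,45 1989 | 35 | 80 | … | … | … | … | … | … | … | … | … | … | … | … |
| Erb and Tullis,46 1960 | 30 | 53 | 7 | 70 | 40 | … | … | … | 17 | … | … | … | … | … |
| Hagan et al,5 2000 | 464 | 72 | 5 | 96 | 73 | 61 | 36 | 53 | 30 | 85 | 91 | 51 | 17 | 9 |
| Hume and Porter,47 1963 | 68 | 89 | 4 | 97 | 59 | 59 | 33 | 43 | 49 | … | … | … | … | … |
| Levinson et al,27 1950 | 58 | 59 | … | 78 | 47 | … | 9 | 36 | 40 | … | … | … | … | 14 |
| Lindsay and Hurst,49 1967 | 62 | … | … | 90 | … | 61 | 14 | 13 | 11 | … | … | … | … | … |
| Mészáros et al,10 2000 | 72 | 67 | … | 92 | … | 64 | … | 10 | 10 | … | … | … | … | 14 |
| Nielsen,52 1961 | 40 | 18 | 3 | 65 | … | 54 | … | 8 | 33 | 76 | … | … | … | 16 |
| Pate et al,53 1976 | 126 | … | … | 88 | 63 | … | 38 | 22 | … | 88 | … | … | … | 10 |
| Pinet et al,54 1984 | 191 | 53 | 7 | 96 | 63 | … | 30 | … | … | … | 89 | … | 6 | … |
| Slater and DeSanctis,37 1976 | 124 | 65 | 5 | 94 | 91 | 43 | 38 | 76 | 4 | 93 | 94 | … | 71 | 5 |
| Strong et al,55 1974 | 59 | 75 | 3 | … | … | 32 | … | 25 | 27 | … | … | … | … | … |
| Sullivan et al,11 2000 | 44 | 70 | 0 | 98 | 66 | … | … | … | 34 | … | … | … | … | 2 |
| Von Kodolitsch et al,29 2000 | 128 | 77 | 7 | 100b | … | 76 | … | 50c | 22 | 79 | 86 | 62 | 44 | 10 |
| Summary sensitivity, % (95% CI) | NA | 64 (54-72) | 5 (4-7) | 90 (85-94) | 67 (56-77) | 57 (48-66) | 32 (24-40) | 32 (19-47) | 23 (16-31) | 84 (80-89) | 90 (88-92) | 39 (14-69) | 31 (12-55) | 9 (8-12) |

Abbreviations: CI, confidence interval; NA, not applicable.
a Ellipses indicate data not available.
b Presence of pain inclusion criterion for study.
c Posterior chest or lower back pain.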


Thoracic Aortic Dissection

Table 50-3 Accuracy of Clinical Findings for Thoracic Aortic Dissection in Consecutive Patients Preselected for High Clinical Suspicion of Dissection Referred for Advanced Imaging

| Symptom or Sign | Source, y | LR+ (95% CI) | LR– (95% CI) |
|---|---|---|---|
| History of hypertension | Chan,44 1991a | 1.5 (0.8-3.0) | 0.7 (0.4-1.3) |
| | Enia et al,45 1989b | 1.1 (0.7-1.6) | 0.7 (0.4-2.4) |
| | Von Kodolitsch et al,29 2000c | 1.8 (1.4-2.3) | 0.4 (0.3-0.6) |
| | Summary | 1.6 (1.2-2.0) | 0.5 (0.3-0.7) |
| Sudden chest pain | Chan,44 1991a | 1.0 (0.7-1.4) | 0.98 (0.3-3.1) |
| | Armstrong et al,43 1998d | 1.5 (1.1-1.9) | 0.3 (0.1-0.8) |
| | Von Kodolitsch et al,29 2000c | 2.6 (2.0-3.5) | 0.3 (0.2-0.4) |
| | Summary | 1.6 (1.0-2.4) | 0.3 (0.2-0.5) |
| “Tearing” or “ripping” pain | Armstrong et al,43 1998d | 1.2 (0.2-8.1) | 0.99 (0.9-1.1) |
| | Von Kodolitsch et al,29 2000c | 11 (5.2-22) | 0.4 (0.3-0.5) |
| Migrating pain | Chan,44 1991a | 1.1 (0.5-2.4) | 0.97 (0.6-1.6) |
| | Von Kodolitsch et al,29 2000c | 7.6 (3.6-16) | 0.6 (0.5-0.7) |
| Pulse deficit | Armstrong et al,43 1998d | 2.4 (0.5-12) | 0.93 (0.8-1.1) |
| | Enia et al,45 1989b | 2.7 (0.7-9.8) | 0.63 (0.4-1.0) |
| | Von Kodolitsch et al,29 2000c | 47 (6.6-333) | 0.62 (0.5-0.7) |
| | Summary | 5.7 (1.4-23) | 0.7 (0.6-0.9) |
| Focal neurologic deficit | Armstrong et al,43 1998d | 6.6 (1.6-28) | 0.71 (0.6-0.9) |
| | Von Kodolitsch et al,29 2000c | 33 (2.0-549) | 0.87 (0.8-0.9) |
| Diastolic murmur | Chan,44 1991a | 4.9 (0.6-40) | 0.8 (0.6-1.1) |
| | Armstrong et al,43 1998d | 1.2 (0.4-3.8) | 0.97 (0.8-1.2) |
| | Enia et al,45 1989b | 0.9 (0.5-1.7) | 1.1 (0.6-1.7) |
| | Von Kodolitsch et al,29 2000c | 1.7 (1.1-2.5) | 0.79 (0.6-0.9) |
| | Summary | 1.4 (1.0-2.0) | 0.9 (0.8-1.0) |
| Enlarged aorta or wide mediastinum | Chan,44 1991a | 1.6 (1.1-2.3) | 0.13 (0.02-1.0) |
| | Armstrong et al,43 1998d | 1.6 (1.1-2.2) | 0.42 (0.2-0.9) |
| | Von Kodolitsch et al,29 2000c | 3.4 (2.4-4.8) | 0.31 (0.2-0.4) |
| | Summary | 2.0 (1.4-3.1) | 0.3 (0.2-0.4) |
| Left ventricular hypertrophy on admission electrocardiogram | Chan,44 1991a | 0.2 (0.03-1.9) | 1.2 (0.9-1.6) |
| | Von Kodolitsch et al,29 2000c | 3.2 (1.5-6.8) | 0.84 (0.7-0.9) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
a A total of 18 (n = 40) patients with thoracic aortic dissection.
b A total of 35 (n = 46) patients with thoracic aortic dissection.
c A total of 128 (n = 250) patients with thoracic aortic dissection.
d A total of 34 (n = 75) patients with thoracic aortic dissection.

and LR are the more accurate data. Migratory pain has performance characteristics that are similar to tearing or ripping pain. The LR+ for the presence of this quality was 7.6 (95% CI, 3.6-16) in one study29 but only 1.1 (95% CI, 0.5-2.4) in the other.44 Additional studies of independently selected patients that prospectively ask about the sensation of tearing or ripping and migration of pain are needed to confirm the high LR for these findings. Description of pain as sharp was slightly more prevalent than tearing or ripping; however, this descriptor was elicited in only 2 studies and had an LR+ near unity.5,43
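The gap between the two tearing-pain estimates follows from the definition of the likelihood ratio: with specificity nearly fixed, LR+ scales with sensitivity. The sketch below is a minimal Python illustration using the rounded sensitivities and specificities quoted in the text. The function names are illustrative, and the pairing of each specificity with a particular study is an assumption. Because the inputs are rounded, the results only approximate the published values in Table 50-3.

```python
def positive_lr(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def negative_lr(sensitivity: float, specificity: float) -> float:
    """LR- = (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity


# Rounded figures quoted in the text for "tearing or ripping" pain.
# Which specificity (94% or 95%) belongs to which study is assumed here.
retrospective = positive_lr(sensitivity=0.07, specificity=0.94)
prospective = positive_lr(sensitivity=0.62, specificity=0.95)

print(f"retrospective study: LR+ is about {retrospective:.1f}")
print(f"prospective study:   LR+ is about {prospective:.1f}")
```

A sensitivity of 7% against a 6% false-positive rate gives an LR+ barely above 1, whereas a sensitivity of 62% against a 5% false-positive rate gives an LR+ above 10. That is the pattern reported in Table 50-3.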

Accuracy of the Physical Examination

Physical examination findings classically associated with thoracic aortic dissection are typically present in less than half of all cases (Table 50-4). However, when present, signs of thoracic aortic dissection can be helpful. Among the most useful is a pulse differential between carotid, radial, or femoral arteries. Although the pooled sensitivity for this sign is only 31%, a deficit in 1 of these pulses compared with the contralateral side is strongly suggestive of dissection (LR+, 5.7; 95% CI, 1.4-23).29,43,45 Focal neurologic deficits, though present in only 17% of cases, may also be helpful. Specificity for this sign is high in the 2 studies in which it has been measured (LR+, 6.6-33; Table 50-3).29,43 The absence of a pulse deficit or focal neurologic deficit does not appreciably alter the likelihood of thoracic aortic dissection. The presence or absence of a diastolic murmur is not helpful. Fewer than one-third of patients with thoracic aortic dissection have a diastolic murmur (sensitivity, 28%). The LR+ and LR–


Table 50-4 Sensitivity of the Physical Examination in the Diagnosis of Acute Thoracic Aortic Dissection (all values other than No. of Patients are sensitivity, %)

| Source, y | No. of Patients | Elevated BP | Diastolic Murmur | Pulse Deficit | Pericardial Rub | Congestive Heart Failure | Focal Neurologic Deficit | Shock | New MI on ECG |
|---|---|---|---|---|---|---|---|---|---|
| Armstrong et al,43 1998 | 34 | …a | 15 | 12 | … | … | 32 | 26 | 11 |
| Chan,44 1991 | 18 | … | 22 | … | … | … | … | … | … |
| Enia et al,45 1989 | 35 | … | 49 | 49 | … | … | … | … | … |
| Erb and Tullis,46 1960 | 30 | … | 27 | 72 | 0 | … | 13 | … | 25 |
| Hagan et al,5 2000 | 464 | 49 | 32 | 15 | … | 7 | 5 | 16 | 3 |
| Hume and Porter,47 1963 | 68 | 68 | 4 | 34 | … | … | … | 10 | … |
| Itzchak et al,48 1975 | 24 | … | … | 21 | … | … | 21 | … | … |
| Levinson et al,27 1950 | 58 | 66 | 28 | 19 | 5 | … | 16 | 22 | 32 |
| Lindsay and Hurst,49 1967 | 62 | 29 | 35 | 45 | … | … | 23 | 13 | … |
| Mészáros et al,10 2000 | 66 | 44 | 11 | 20 | 2 | … | 41 | 36 | 9 |
| Miller et al,51 1979 | 73 | 58 | 64 | … | … | 29 | 12 | … | … |
| Nielsen,52 1961 | 40 | … | … | … | … | … | … | 30 | 10 |
| Pate et al,53 1976 | 126 | 37 | 21 | 33 | … | … | 13 | 21 | … |
| Pinet et al,54 1984 | 191 | … | 35 | 55 | 12 | … | … | 38 | … |
| Slater and DeSanctis,37 1976 | 124 | 36 | 32 | 31 | … | … | 19 | 10 | 3 |
| Strong et al,55 1974 | 59 | 66 | 20 | 34 | … | … | … | 5 | … |
| Sullivan et al,11 2000 | 44 | … | … | 12 | … | … | 14 | … | 2 |
| Viljanen,12 1986 | 73 | … | 29 | 37 | … | … | 22 | 30 | … |
| Von Kodolitsch et al,29 2000 | 128 | 41 | 40 | 38 | … | … | 13 | 12 | 2 |
| Summary sensitivity (95% CI) | NA | 49 (41-57) | 28 (21-36) | 31 (24-39) | 6 (3-13) | 15 (4-33) | 17 (12-23) | 19 (15-26) | 7 (4-14) |

Abbreviations: BP, blood pressure; CI, confidence interval; ECG, electrocardiogram; MI, myocardial infarction; NA, not applicable.
a Ellipses indicate data not available.

(LR+, 1.4; 95% CI, 1.0-2.0; LR–, 0.9; 95% CI, 0.8-1.0) are close to 1, suggesting that the presence or absence of a diastolic murmur should not be considered helpful.29,43-45 Unfortunately, these studies do not comment on whether the diastolic murmurs identified were known to be new or old. It is possible that a diastolic murmur known to be new would have greater diagnostic utility.

Patients’ blood pressure on presentation is not helpful. Although approximately half of patients present with elevated blood pressure (pooled sensitivity, 49%; 95% CI, 41%-57%), an equal proportion are either hypotensive or normotensive. Only 1 study permitted calculation of an LR for hypertension; however, this study confirmed its low diagnostic yield (LR+, 1.3 for systolic blood pressure >150 mm Hg).29 Pericardial rub is rarely present (pooled sensitivity, 6%; 95% CI, 3%-13%). Assessment for pulsus paradoxus and jugular venous distention is not enumerated in any of the studies.

Electrocardiographic findings consistent with acute myocardial infarction do not rule out aortic dissection. New Q waves or ST-segment elevation were observed in 7% of admission ECGs (Table 50-4). Similarly, normal ECG results were documented in 8% to 31% (mean, 22%) of patients.5,10,11,37,46,52 The remaining ECGs had a variety of other abnormalities, including left ventricular hypertrophy, atrial fibrillation, and nonspecific ST-segment changes. As part of the clinical evaluation, ECGs have not been studied well but seem to have little utility for detecting or ruling out thoracic aortic dissection.

Accuracy of the Chest Radiograph

Pooling of 13 studies permitted analysis of 1337 radiographs. Only 3 studies commented on the proportion of portable vs conventional radiographs. The proportions of portable radiographs reported in these investigations were 24%, 61%, and 80%.29,43,50 Radiographic findings classically associated with thoracic aortic dissection are not reliably present (Table 50-5). However, most patients with thoracic aortic dissection do tend to have abnormal findings on chest radiographs (sensitivity, 90%) so that a completely normal radiograph result helps to decrease the likelihood of the diagnosis. In particular, absence of wide mediastinum and abnormal aortic contour decreases the probability of disease (LR–, 0.3; 95% CI, 0.2-0.4; Table 50-5).

Interobserver and intraobserver agreement for physician assessment of radiographs has been reported in 2 studies, both using radiologists as participants. Agreement was generally found to be fair (κ = 0.25 for intraobserver agreement on suspicion for aortic dissection50; κ = 0.23-0.33 for interobserver agreement on presence of wide mediastinum, irregularities of the aortic contour, and pleural effusion40). These low rates of interobserver agreement underscore the lack of validated standards for defining the radiographic features of aortic dissection.
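For readers unfamiliar with the κ statistic quoted above, the following is a minimal Python sketch of Cohen's κ for 2 readers making a yes/no call (for example, "wide mediastinum present"). The counts are hypothetical and are not taken from the cited studies. The function name is illustrative.

```python
def cohen_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two raters and a binary finding.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    total = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / total
    a_yes = (both_yes + a_only) / total  # proportion rater A calls positive
    b_yes = (both_yes + b_only) / total  # proportion rater B calls positive
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (observed - expected) / (1 - expected)


# Hypothetical example: 2 readers, 100 radiographs, 60% raw agreement.
kappa = cohen_kappa(both_yes=30, a_only=20, b_only=20, both_no=30)
print(f"kappa = {kappa:.2f}")
```

In this hypothetical example the readers agree on 60 of 100 films, yet half of that agreement is expected by chance. This is why raw agreement can look respectable while κ remains low.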

Table 50-5 Sensitivity of the Chest Radiograph in the Diagnosis of Acute Thoracic Aortic Dissection (all values other than No. of Patients are sensitivity, %a)

| Source, y | No. of Patients | Abnormal Aortic Contour | Pleural Effusion | Displaced Intimal Calcification | Wide Mediastinum | Abnormal Chest Radiograph Findings |
|---|---|---|---|---|---|---|
| Armstrong et al,43 1998 | 34 | … | … | … | 86 | 100 |
| Chan,44 1991 | 18 | … | … | … | 94 | … |
| Earnest et al,39 1979 | 74 | 66 | 27 | 7 | 11 | 93 |
| Hagan et al,5 2000 | 427 | 50 | 19 | 14 | 62 | 88 |
| Itzchak et al,48 1975 | 24 | 88 | 17 | 4 | 83 | … |
| Luker et al,50 1994 | 75 | 76 | … | 8 | … | 85 |
| Pate et al,53 1976 | 87 | … | 10 | … | 70 | 90 |
| Pinet et al,54 1984 | 191 | … | … | … | 56 | … |
| Slater and DeSanctis,37 1976 | 116 | 96 | 9 | 9 | … | 96 |
| Strong et al,55 1974 | 59 | 54 | … | 2 | 34 | 95 |
| Sullivan et al,11 2000 | 31 | 42 | … | … | … | 84 |
| Viljanen,12 1986 | 73 | … | … | … | 75 | … |
| Von Kodolitsch et al,29 2000 | 128 | 76b | 13 | … | … | … |
| Summary sensitivity (95% CI) | NA | 71 (56-84) | 16 (12-21) | 9 (6-13) | 64 (44-80) | 90 (87-92) |

Abbreviations: CI, confidence interval; NA, not applicable.
a Ellipses indicate data not available.
b Mediastinal or aortic widening.

Accuracy of Combinations of Findings

Most clinical findings associated with thoracic aortic dissection are insensitive when considered in isolation. Combinations of findings, though not often found, markedly increase the accuracy of clinical assessment for thoracic aortic dissection. The single level 3 study described increasing accuracy of progressive combinations of findings (Table 50-6).29 For example, aortic pain alone (pain of sudden onset, of tearing or ripping character, or both) has an LR+ of 2.6; the presence of both aortic pain and pulse or blood pressure differentials increases the LR+ to 10 (95% CI, 1.4-80). Further addition of mediastinal or aortic widening on chest radiograph clinches the diagnosis with an LR+ of 66 (95% CI, 4.1-1062). Unfortunately, this diagnostically valuable triad was present in only 27% of patients. Conversely, patients without any findings from the triad (aortic pain, pulse or blood pressure differential, and mediastinal widening) are unlikely to have a thoracic aortic dissection, given an LR– of 0.07 (95% CI, 0.03-0.17). However, 4% of patients in this category, without any of the above signs, were nonetheless ultimately diagnosed with aortic dissection. Given the high morbidity of a missed diagnosis, even such a pronounced LR– is insufficient to defer diagnostic imaging if thoracic aortic dissection is still clinically suspected.

The improved accuracy of combinations of clinical findings may further be inferred from a holistic view of the 4 studies that selected patients for inclusion on the basis of an overall clinical picture suggestive of thoracic aortic dissection. Despite the relative rarity of thoracic aortic dissection compared with other acute causes of pain, approximately half the patients selected for these studies turned out to have thoracic aortic dissection (pooled prevalence, 52%). By comparison, only 0.003% of patients presenting to an emergency department with acute back, chest, or abdominal pain are eventually diagnosed with dissection.29 This implies that a full clinical history, examination, and radiograph substantially select for patients with acute dissection. Furthermore, among patients referred for aortic imaging who turn out not to have an acute dissection, approximately half to three-quarters are diagnosed with alternative serious diseases that can potentially be identified by imaging intended to confirm the diagnosis of thoracic aortic dissection (Table 50-7).29,33,43-45,59 The clinical syndrome suspicious for thoracic aortic dissection, although far from pathognomonic for acute dissection, does detect patients with serious disease who merit advanced diagnostic imaging.
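To see how the likelihood ratios for combinations of findings translate into bedside probabilities, the standard odds form of Bayes' theorem can be applied. The Python sketch below is illustrative only. The pretest probability of 10% is a hypothetical value chosen for the example and is not an estimate from the studies reviewed. The LRs are those quoted in the text for the single level 3 study.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


PRETEST = 0.10  # hypothetical pretest probability, chosen for illustration

# Likelihood ratios quoted in the text (Von Kodolitsch et al).
scenarios = [
    ("aortic pain alone", 2.6),
    ("aortic pain + pulse or BP differential", 10),
    ("all 3 findings", 66),
    ("none of the 3 findings", 0.07),
]

for label, lr in scenarios:
    probability = post_test_probability(PRETEST, lr)
    print(f"{label}: LR {lr} -> post-test probability {probability:.1%}")
```

The same LR moves a patient much further when the pretest probability is intermediate than when it is very low or very high. This is one reason the text cautions against deferring imaging on the basis of a low LR alone when clinical suspicion persists.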

Table 50-6 Positive Likelihood Ratio of Aortic Dissection in Patients With Combinations of Findingsa

| No. of Findings | LR+ (95% CI) |
|---|---|
| 3 | 66 (4.1-1062) |
| 2 | 5.3 (3.0-9.4) |
| 1 | 0.5 (0.3-0.8) |
| 0 | 0.1 (0.0-0.2) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio.
a Data from Von Kodolitsch et al.29 Findings include aortic pain (severe, sudden-onset tearing pain), blood pressure or pulse differential between arms, or wide mediastinum on chest radiograph.

Table 50-7 Final Diagnoses in Patients With Clinical Syndromes Suggestive of Thoracic Aortic Dissection but Without Thoracic Aortic Dissection on Further Study (values are No. [%] of patientsa)

| Diagnosis | Von Kodolitsch et al,29 2000 (n = 122) | Enia et al,45 1989 (n = 11) | Armstrong et al,43 1998 (n = 41)b | Eagle et al,33 1986 (n = 51)c |
|---|---|---|---|---|
| Acute coronary syndrome | 18 (15) | 2 (18) | 8 (20) | 12 (24) |
| Chest wall syndrome | 18 (15) | … | … | … |
| Mediastinal cyst or tumor | … | … | … | 4 (8) |
| Neuroradicular syndrome | 1 (0.8) | … | … | … |
| Pulmonary disease | 1 (0.8) | … | … | … |
| Hypertensive crisis | 11 (9) | … | … | … |
| Gastrointestinal disease (esophagitis, PUD, gastritis, pancreatitis) | 12 (9.8) | … | … | 2 (4) |
| Pneumothorax | 2 (1.6) | … | … | … |
| Pulmonary embolism | 6 (4.9) | 1 (9) | … | 1 (2) |
| Pleuritis | 5 (4.0) | … | … | 1 (2) |
| Pericarditis | 7 (5.7) | 4 (36) | 3 (7) | 3 (6) |
| Nondissecting aneurysm | … | 1 (9) | 13 (32) | 4 (8) |
| Aortic plaque rupture or intramural hemorrhage | … | … | 9 (22) | … |
| Valvular pathology | … | … | 4 (10) | 5 (10) |
| Arteriosclerotic emboli | … | … | … | 1 (2) |
| No definitive diagnosis | 4 (3.3) | 3 (27) | 14 (34) | 14 (28) |

Abbreviation: PUD, peptic ulcer disease.
a Ellipses indicate data not available.
b Some patients without thoracic aortic dissection were given multiple diagnoses.
c Included 55 patients with suspected thoracic aortic dissections but negative aortogram results; 4 patients were false-negative cases and later demonstrated to have thoracic aortic dissection.

THE BOTTOM LINE

Despite the large number of case series describing patients with thoracic aortic dissection, the clinical examination for thoracic aortic dissection has yet to be prospectively scrutinized in an independent, blinded study. The extant data permit estimation of the sensitivity of clinical history, physical examination, and chest radiography but likely overestimate the accuracy of the clinical examination by selectively including more obvious cases. A small number of studies have included control populations and may therefore estimate the specificity of components of the clinical examination; however, the accuracy of these data is again limited by the lack of independence between the selection of patients for study and clinical findings. Given the high, rapid mortality associated with undiagnosed thoracic aortic dissection, prospective, independent studies of the clinical examination are needed to aid physicians in determining which aspects of the clinical examination ought to be relied on to refer patients rationally for further diagnostic studies. Until then, the current literature permits the following limited conclusions about the clinical examination:

• Most patients with thoracic aortic dissection have severe pain of abrupt onset. The absence of pain of sudden onset substantively decreases the probability of dissection (LR–, 0.3; 95% CI, 0.2-0.5); however, the study design of the reports included in this article precludes accurate assessment of the sensitivity and specificity of these features. The presence of tearing or ripping pain (LR+, 1.2-11) or pain that migrates (LR+, 1.1-7.6) may prove useful, but additional data are required to know whether they are reliable features of the clinical history.
• Physical findings associated with thoracic aortic dissection tend to be present in a third or fewer of cases; however, pulse deficits (LR+, 5.7; 95% CI, 1.4-23) or focal neurologic deficits (LR+, 6.6-33) greatly increase the likelihood of thoracic aortic dissection in the appropriate clinical setting. The presence or absence of a diastolic murmur is not useful (LR+, 1.4; LR–, 0.9).
• A normal aorta and mediastinum on chest radiograph helps exclude the diagnosis (LR–, 0.3; 95% CI, 0.2-0.4), but no particular radiographic abnormality is dependably present.
• The presence of the above findings in combination increases the LR+ for dissection, but even the absence of multiple findings does not definitively exclude the diagnosis. Clinical history, examination, and radiography can help rule in aortic dissection but are not sufficiently accurate to rule out the disease.

CLINICAL SCENARIOS—RESOLUTIONS CASE 1 The patient's clinical history of sudden onset of

severe chest pain is worrisome. His history of hypertension slightly increases his risk of a thoracic aortic dissection. The absence of a diastolic murmur, blood pressure differential, neurologic deficit, and widened mediastinum does not reliably exclude the diagnosis of thoracic aortic dissection. Given the high mortality of untreated or mistreated thoracic aortic dissection, this patient merits further advanced imaging.

CASE 2 The presence of a neurologic deficit in a patient with a clinical history consistent with thoracic aortic dissection is a specific finding. This patient has a high likelihood of having an acute thoracic aortic dissection and ought to undergo urgent diagnostic imaging to locate and delineate the suspected lesion.

Author Affiliation at the Time of the Original Publication

Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts.

Acknowledgments

I thank Donald Glower, MD, Manesh Patel, MD, and David Isbell, MD, for thoughtful comments on an earlier version of this work.

REFERENCES

1. Morgagni JB. De sedibus et causis morborum, 1761. In: Rosman HS, Patel S, Borzak S, Paone G, Retter K. Quality of history taking in patients with aortic dissection. Chest. 1998;114(3):793-795.
2. Bean RB, Bean WB. Sir William Osler: Aphorisms From His Bedside Teachings and Writings. Springfield, IL: Charles C Thomas; 1961:138.
3. Hirst AE Jr, Johns VJ Jr, Kime SW Jr. Dissecting aneurysm of the aorta: a review of 505 cases. Medicine (Baltimore). 1958;37(3):217-279.
4. Nienaber CA, von Kodolitsch Y. Meta-analysis of changing mortality pattern in thoracic aortic dissection. Herz. 1992;17(6):398-416.
5. Hagan PG, Nienaber CA, Isselbacher EM, et al. The International Registry of Acute Aortic Dissection (IRAD): new insights into an old disease. JAMA. 2000;283(7):897-903.
6. Masuda Y, Yamada Z, Morooka N, et al. Prognosis of patients with medically treated aortic dissections. Circulation. 1991;84(suppl 5):III7-III13.
7. Burt CW. Summary statistics for acute cardiac ischemia and chest pain visits to United States EDs, 1995-1996. Am J Emerg Med. 1999;17(6):552-559.
8. Barbant SD, Eisenberg MJ, Schiller NB. The diagnostic value of imaging techniques for aortic dissection. Am Heart J. 1992;124(2):541-543.
9. Marian AJ, Harris SL, Pickett JD, et al. Inadvertent administration of rtPA to a patient with type 1 aortic dissection and subsequent cardiac tamponade. Am J Emerg Med. 1993;11(6):613-615.
10. Mészáros I, Morocz J, Szlavi J, et al. Epidemiology and clinicopathology of aortic dissection. Chest. 2000;117(5):1271-1278.
11. Sullivan PR, Wolfson AB, Leckey RD, Burke JL. Diagnosis of acute thoracic aortic dissection in the emergency department. Am J Emerg Med. 2000;18(1):46-50.
12. Viljanen T. Diagnostic difficulties in aortic dissection: retrospective study of 89 surgically treated patients. Ann Chir Gynaecol. 1986;75(6):328-332.
13. Spittell PC, Spittell JA Jr, Joyce JW, et al. Clinical features and differential diagnosis of aortic dissection: experience with 236 cases (1980 through 1990). Mayo Clin Proc. 1993;68(7):642-651.
14. Rosman HS, Patel S, Borzak S, et al. Quality of history taking in patients with aortic dissection. Chest. 1998;114(3):793-795.
15. Larson EW, Edwards WD. Risk factors for aortic dissection: a necropsy study of 161 cases. Am J Cardiol. 1984;53(6):849-855.
16. Murdoch JL, Walker BA, Halpern BL, et al. Life expectancy and causes of death in the Marfan syndrome. N Engl J Med. 1972;286(15):804-808.
17. Mattar SG, Kumar AG, Lumsden AB. Vascular complications in Ehlers-Danlos syndrome. Am Surg. 1994;60(11):827-831.
18. Sybert VP. Cardiovascular malformations and complications in Turner syndrome. Pediatrics. 1998;101(1):E11.
19. Soderbergh J, Malmvall BE, Andersson R, Bengtsson BA. Giant cell arteritis as a cause of death: report of 9 cases. JAMA. 1986;255(4):493-496.
20. Pumphrey CW, Fay T, Weir I. Aortic dissection during pregnancy. Br Heart J. 1986;55(1):106-108.
21. Rashid J, Eisenberg MJ, Topol EJ. Cocaine-induced aortic dissection. Am Heart J. 1996;132(6):1301-1304.
22. Rogers FB, Osler TM, Shackford SR. Aortic dissection after trauma: case report and review of the literature. J Trauma. 1996;41(5):906-908.
23. Ohmoto Y, Ikari Y, Hara K. Aortic dissection during directional coronary atherectomy. Int J Cardiol. 1996;55(3):289-291.
24. Von Kodolitsch Y, Loose R, Ostermeyer J, et al. Proximal aortic dissection late after aortic valve surgery: 119 cases of a distinct clinical entity. Thorac Cardiovasc Surg. 2000;48(6):342-346.
25. Greenwood WR, Robinson MD. Painless dissection of the thoracic aorta. Am J Emerg Med. 1986;4(4):330-333.
26. Gerber O, Heyer EJ, Vieux U. Painless dissections of the aorta presenting as acute neurologic syndromes. Stroke. 1986;17(4):644-647.
27. Levinson DC, Edmeades DT, Griffith GC. Dissecting aneurysm of the aorta: its clinical, electrocardiographic and laboratory features. Circulation. 1950;1(3):360-387.
28. Singer AJ, Hollander JE. Blood pressure: assessment of interarm differences. Arch Intern Med. 1996;156(17):2005-2008.
29. Von Kodolitsch Y, Schwartz AG, Nienaber CA. Clinical prediction of acute aortic dissection. Arch Intern Med. 2000;160(19):2977-2982.
30. Choudhry NK, Etchells EE. The rational clinical examination: does this patient have aortic regurgitation? JAMA. 1999;281(23):2231-2238.
31. Leonard JC, Hasleton PS. Dissecting aortic aneurysms: a clinicopathological study. Q J Med. 1979;48(189):55-63.
32. Chan-Tack KM. Aortic dissection presenting as bilateral testicular pain. N Engl J Med. 2000;343(16):1199.
33. Eagle KA, Quertermous T, Kritzer GA, et al. Spectrum of conditions initially suggesting acute aortic dissection but with negative aortograms. Am J Cardiol. 1986;57(4):322-326.
34. Gleeson H, Hughes T, Northridge D, Prendergast BD. Bulbar palsies and chest pain. Lancet. 2000;356(9232):826.
35. Robertson GS, Macpherson DS. Aortic aneurysm presenting as deep venous thrombosis. Lancet. 1988;1(8590):877-878.
36. Pacifico L, Spodick D. ILEAD—ischemia of the lower extremities due to aortic dissection: the isolated presentation. Clin Cardiol. 1999;22(5):353-356.
37. Slater EE, DeSanctis RW. The clinical recognition of dissecting aortic aneurysm. Am J Med. 1976;60(5):625-633.
38. Fultz PJ, Melville D, Ekanej A, et al. Nontraumatic rupture of the thoracic aorta: chest radiographic features of an often unrecognized condition. AJR Am J Roentgenol. 1998;171(2):351-357.
39. Earnest F IV, Muhm JR, Sheedy PF II. Roentgenographic findings in thoracic aortic dissection. Mayo Clin Proc. 1979;54(1):43-50.
40. Jagannath AS, Sos TA, Lockhart SH, et al. Aortic dissection: a statistical analysis of the usefulness of plain chest radiographic findings. AJR Am J Roentgenol. 1986;147(6):1123-1126.
41. Nienaber CA, Von Kodolitsch Y, Nicolas V, et al. The diagnosis of thoracic aortic dissection by noninvasive imaging procedures. N Engl J Med. 1993;328(1):1-9.
42. Sarasin FP, Louis-Simonet M, Gaspoz JM, Junod AF. Detecting acute thoracic aortic dissection in the emergency department: time constraints and choice of the optimal diagnostic test. Ann Emerg Med. 1996;28(3):278-288.
43. Armstrong WF, Bach DS, Carey LM, et al. Clinical and echocardiographic findings in patients with suspected acute aortic dissection. Am Heart J. 1998;136(6):1051-1060.
44. Chan KL. Usefulness of transesophageal echocardiography in the diagnosis of conditions mimicking aortic dissection. Am Heart J. 1991;122(2):495-504.
45. Enia F, Ledda G, Lo Mauro R, et al. Utility of echocardiography in the diagnosis of aortic dissection involving the ascending aorta. Chest. 1989;95(1):124-129.
46. Erb BD, Tullis IF. Dissecting aneurysm of the aorta: the clinical features of thirty autopsied cases. Circulation. 1960;22:315-325.
47. Hume DM, Porter RR. Acute dissecting aortic aneurysms. Surgery. 1963;53:122-154.
48. Itzchak Y, Rosenthal T, Adar R, et al. Dissecting aneurysm of thoracic aorta: reappraisal of radiologic diagnosis. Am J Roentgenol Radium Ther Nucl Med. 1975;125(3):559-570.
49. Lindsay J, Hurst JW. Clinical features and prognosis in dissecting aneurysm of the aorta: a re-appraisal. Circulation. 1967;35(5):880-888.
50. Luker GD, Glazer HS, Eagar G, et al. Aortic dissection: effect of prospective chest radiographic diagnosis on delay to definitive diagnosis. Radiology. 1994;193(3):813-819.
51. Miller DC, Stinson EB, Oyer PE, et al. Operative treatment of aortic dissections: experience with 125 patients over a sixteen-year period. J Thorac Cardiovasc Surg. 1979;78(3):365-382.
52. Nielsen NC. Dissecting aneurysm of the aorta. Acta Med Scand. 1961;170:117-127.
53. Pate JW, Richardson RL, Eastridge CE. Acute aortic dissections. Am Surg. 1976;42(6):395-404.
54. Pinet F, Froment JC, Guillot M, et al. Prognostic factors and indications for surgical treatment of acute aortic dissections: a report based on 191 observations. Cardiovasc Intervent Radiol. 1984;7(6):257-266.
55. Strong WW, Moggio RA, Stansel HC Jr. Acute aortic dissection: twelve-year medical and surgical experience. J Thorac Cardiovasc Surg. 1974;68(5):815-821.
56. Eddy DM, Hasselblad V, Shachter RD. Meta-Analysis by the Confidence Profile Method: The Statistical Synthesis of Evidence. San Diego, CA: Academic Press; 1992.
57. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159-174.
58. Pyeritz RE. The Marfan syndrome. Annu Rev Med. 2000;51:481-510.
59. Oliver TB, Murchison JT, Reid JH. Spiral CT in acute non-cardiac chest pain. Clin Radiol. 1999;54(1):38-45.

UPDATE: Thoracic Aortic Dissection

Prepared by Michael Klompas, MD
Reviewed by Frank Lederle, MD

CLINICAL SCENARIO

A 64-year-old man with a history of hypertension is treated in the emergency department for a chief complaint of severe chest pain of recent onset, radiating to the abdomen. His heart examination is remarkable for the presence of an S4 but no murmurs. His electrocardiogram (ECG) has changes consistent with acute inferior myocardial infarction. You have drawn up a syringe full of tissue plasminogen activator that you are about to inject when a thought suddenly occurs to you: Could this patient have an acute thoracic aortic dissection?

UPDATED SUMMARY ON THORACIC AORTIC DISSECTION

Original Review

Klompas M. Does this patient have an acute thoracic aortic dissection? JAMA. 2002;287(17):2262-2272.

UPDATED LITERATURE SEARCH

Additional aortic dissection studies were sought with the same parent search criteria used for The Rational Clinical Examination series, combined with the terms “dissecting aneurysm,” “aortic rupture,” “aortic aneurysm, thoracic,” “aneurysm, dissecting,” “aortic diseases/diagnosis,” and the text word “thoracic aortic dissection.” The search was conducted for studies published between 2000 and August 2004. In addition, articles citing the original Rational Clinical Examination articles were reviewed. The search strategy resulted in 468 articles. Titles and abstracts were reviewed with the same limitation criteria as in the original article to find large, consecutive series of patients suspected to have aortic dissection, whose diagnosis was confirmed with a reference standard investigation (computed tomography [CT] angiography, magnetic resonance imaging [MRI], transesophageal echocardiography [TEE], aortogram, surgical exploration, or autopsy). As before, studies limited to proximal or distal aortic dissection or abdominal aortic dissection were excluded. One new study was identified.

NEW FINDINGS • Younger patients with thoracic dissection (100) that appeared likely to meet our review criteria. We also examined all articles mentioned in the most recent American College of Obstetricians and Gynecologists Technical Bulletin.3 Each article was reviewed by at least 1 author and in ambiguous cases by all 3. Included articles and review articles were culled for further references. We attempted to contact the authors of all articles included in this review and to request additional references. We received replies from 7 authors, but no additional references were produced.

Inclusion and Exclusion Criteria

Articles were included if they (1) involved original research performed on symptomatic patients in a primary care setting (including sexually transmitted disease clinics), (2) compared a diagnostic test with a recognized criterion standard, (3) allowed the calculation of sensitivity or specificity, and (4) discussed tests that would provide diagnostic information during the course of the office visit. We excluded articles that reported on women treated in specialty or referral settings, those with recurrent or treatment-refractory vaginitis, or asymptomatic patients (for example, women treated for routine pelvic examination).

Evaluation of Methods

Eighteen articles met our inclusion and exclusion criteria and are listed in Table 52-1.6-12,23,30-39 We graded the articles’ diagnostic methodologic quality on a 3-point scale (highest to lowest quality). The grading and criteria are listed in Box 52-1. A different quality score from other Rational Clinical Examination articles (see Table 1-7) was required because the focus of our study involved 3 different types of vaginitis, each of which has different laboratory criterion standards.

Evaluation of Criterion Standards The diagnostic criterion standard for vaginal candidiasis is a positive culture result or identification of yeast by microscopy. Because many asymptomatic women have vaginal yeast colonization, it is not clear whether a positive culture result or microscopy alone confirms Candida as the cause of symptoms, yet this is the current diagnostic criterion standard. We accepted studies that used microscopy only as a criterion standard but considered these of lower quality.

We used the Amsel criteria40 as the criterion standard for the diagnosis of bacterial vaginosis. Bacterial vaginosis is diagnosed when 3 of 4 findings are present: (1) a thin, homogeneous vaginal discharge; (2) clue cells; (3) positive whiff test; and (4) vaginal pH level higher than 4.5.40 Several articles used either Gram stain or a positive culture for Gardnerella vaginalis as criterion standards, which we also accepted, although we did not consider this optimal.

The criterion standard applied to the diagnosis of trichomoniasis is a positive culture result. Immunofluorescence and polymerase chain reaction are probably equivalent to culture. We accepted studies that included identification of trichomonads by direct microscopy or Papanicolaou tests, although these were considered of lesser quality.
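The 3-of-4 rule for the Amsel criteria described above can be sketched in a few lines. This is an illustration only, not a clinical tool; the class and function names (`AmselFindings`, `meets_amsel_criteria`) are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class AmselFindings:
    """The 4 findings listed in the text (field names are illustrative)."""
    thin_homogeneous_discharge: bool
    clue_cells: bool
    positive_whiff_test: bool
    vaginal_ph: float


def meets_amsel_criteria(findings: AmselFindings) -> bool:
    """Return True when at least 3 of the 4 findings are present."""
    present = [
        findings.thin_homogeneous_discharge,
        findings.clue_cells,
        findings.positive_whiff_test,
        findings.vaginal_ph > 4.5,  # pH must be higher than 4.5, not equal to it
    ]
    return sum(present) >= 3


# Example: clue cells, thin discharge, and pH 4.8, with a negative whiff test
example = AmselFindings(True, True, False, 4.8)
print(meets_amsel_criteria(example))
```

Counting the findings, rather than requiring any specific one, mirrors the way the criteria are stated: no single finding is mandatory.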

Data Extraction Sensitivity, specificity, and likelihood ratios (LRs) were either taken directly from the article or calculated from data provided in the article. All of the authors extracted the data and computed sensitivity and specificity from each article independently. Disagreements were resolved by consensus. All data and any calculations were sent to the primary authors for their review. One author of an article12 we included provided additional data that have been incorporated into this review. A fourth person independently verified all data points. The absence of standard definitions for a variety of symptoms and signs, along with ambiguous phrasing of terms, made it impossible to combine results across studies.
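When an article reports only sensitivity and specificity, the likelihood ratios follow from the standard definitions. The sketch below is a minimal illustration of that calculation; the function name is hypothetical, and the input values (sensitivity 87%, specificity 50%) are taken from the first itching row of Table 52-2, which reports an LR+ of 1.7 and an LR– of 0.26.

```python
def lrs_from_sens_spec(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Compute (LR+, LR-) from sensitivity and specificity given as proportions."""
    if not (0 < specificity < 1) or not (0 <= sensitivity <= 1):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_positive = sensitivity / (1 - specificity)   # true-positive rate / false-positive rate
    lr_negative = (1 - sensitivity) / specificity   # false-negative rate / true-negative rate
    return lr_positive, lr_negative


lr_pos, lr_neg = lrs_from_sens_spec(0.87, 0.50)
print(round(lr_pos, 1), round(lr_neg, 2))  # 1.7 0.26
```

The confidence intervals reported in the tables require the underlying patient counts and are not reproduced by this sketch.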

Statistical Analysis Statistical analysis was performed using SPSS (version 10.0; SPSS Inc, Chicago, Illinois) and Stata (version 8; StataCorp, College Station, Texas) statistical software. When there were no patients in one of the 4 cells of a 2 × 2 table (true positive, false positive, false negative, true negative), the value 0.5 was added to each cell of the 2 × 2 table for calculating the LRs.
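The zero-cell adjustment described above can be sketched as follows. This is an illustrative reimplementation, not the authors' SPSS or Stata code; the function name and the patient counts in the example are hypothetical.

```python
def lrs_from_counts(tp: float, fp: float, fn: float, tn: float) -> tuple[float, float]:
    """Compute (LR+, LR-) from a 2 x 2 table.

    If any cell is 0, add 0.5 to every cell first, as described in the text.
    """
    cells = (tp, fp, fn, tn)
    if any(c < 0 for c in cells):
        raise ValueError("cell counts must be non-negative")
    if any(c == 0 for c in cells):
        tp, fp, fn, tn = (c + 0.5 for c in cells)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)
    false_negative_rate = fn / (tp + fn)
    return sensitivity / false_positive_rate, false_negative_rate / specificity


# Hypothetical counts with an empty false-positive cell
lr_pos, lr_neg = lrs_from_counts(tp=10, fp=0, fn=5, tn=40)
print(round(lr_pos, 1), round(lr_neg, 2))
```

Without the adjustment, an empty false-positive cell would make LR+ undefined (division by zero); with it, the estimate is finite but large, which is consistent with the very wide confidence intervals seen for some findings in the tables.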

Results

Precision

Precision refers to the degree to which independent observers will find the same result when applying the same test. No study reported the precision of the tests reviewed in this article.

Accuracy of Symptoms

Tables 52-2 and 52-3 present the sensitivity, specificity, and LRs for all symptoms. The reviewed articles tested the following symptoms for their usefulness in the diagnosis of vaginal complaints: (1) characteristics of the discharge (quantity,

CHAPTER 52

The Rational Clinical Examination

Table 52-1 Included Studies of Diagnostic Strategies for Vaginal Symptoms

| Source, y | No. of Patients | Setting | Symptoms | Vaginal Candidiasis, No. (%) | Bacterial Vaginosis, No. (%) | Vaginal Trichomoniasis, No. (%) | Quality Score[a] | Criterion Standard |
|---|---|---|---|---|---|---|---|---|
| Abbott,12 1995[b] | 71 | Urban ED or walk-in clinic; Denver, CO | Vaginal itching, discharge, or pain | 23 (32) | 29 (41) | 5 (7) | 2 | Candidiasis: culture only |
| Abu Shaqra,30 2001 | 301 | Private gynecologists; Zarka, Jordan | Vaginal discharge | 78 (26) | 90 (30) | 9 (3) | 2 | Bacterial vaginosis: Nugent criteria[c] |
| Bennett et al,11 1989 | 157 | Urban ED; Kansas City, MO | Vaginal discharge | NA | NA | 55 (35) | 2 | Trichomoniasis: culture, microscopy, immunofluorescence |
| Bleker et al,31 1989[d] | 97 | Urban general hospital gynecology clinic; Amsterdam, The Netherlands | Vaginal discharge | 24 (25) | 37 (38) | 13 (13) | 3 | Bacterial vaginosis: Spiegel criteria[e]; trichomoniasis: microscopy; candidiasis: microscopy |
| Borchardt et al,32 1992 | 69 | 3 Clinics (1 STD clinic); San Jose, Costa Rica | Not indicated | NA | NA | 10 (15) | 2 | Trichomoniasis: culture |
| Briselden and Hillier,23 1994 | 176 | STD clinic; Seattle, WA | Genital complaints | NA | 79 (45) | 19 (11) | 2 | Bacterial vaginosis: clinical criteria; trichomoniasis: culture, microscopy |
| Bro,7 1989 | 361 | General practices (n = 29); Aarhus, Denmark | Increased vaginal discharge, malodor, or pruritus | 141 (39) | NA | NA | 2 | Candidiasis: culture, microscopy |
| Carlson et al,6 2000[f] | 124 | Gynecology outpatient clinic; Helsinki, Finland | Suspected vaginitis | 21 (17) | NA | NA | 2 | Candidiasis: culture |
| Chandeying et al,10 1998 | 240 | University gynecology outpatient clinic; Songkhla, Thailand | Vaginal discharge | 53 (22) | 91 (38) | 10 (4) | 3 | Bacterial vaginosis: Amsel criteria[g]; candidiasis: microscopy; trichomoniasis: microscopy |
| Eckert et al,33 1998 | 774 | STD clinic; Washington state | “A new problem” | 186 (24) | 294 (38) | 116 (15) | 2 | Candidiasis: culture |
| Fule et al,34 1990 | 200 | Hospital gynecology clinic; Solapur, India | Abnormal vaginal discharge | NA | 34 (17) | NA | 2 | Bacterial vaginosis: culture and exclusion of other causes |
| Holst et al,35 1987 | 101 | Community health center; Lund, Sweden | Genital malodor or abnormal vaginal discharge | 23 (23) | 34 (34) | 9 (9) | 2 | Bacterial vaginosis: Amsel criteria[g] |
| Krieger et al,36 1988 | 600 | STD clinic; Seattle, WA | “New problems” | NA | NA | 90 (15) | 2 | Trichomoniasis: culture |
| Livengood et al,37 1990 | 67 | 2 Hospital gynecology clinics | NA | NA | 67 (100) | NA | 2 | Bacterial vaginosis: Amsel criteria[g] |
| O’Dowd and West,9 1987[h] | 162 | Department of General Practice; Nottingham, England | Vaginal symptoms | NA | 81 (50) | NA | 3 | Bacterial vaginosis: culture only |
| Ryu et al,38 1999 | 177 | University obstetrics/gynecology clinic; Seoul, Korea | Vaginal discharge | NA | NA | 18 (10) | 2 | Trichomoniasis: culture |
| Schaaf et al,8 1990[i] | 123 | County hospital family planning clinic or community-based women’s health center; San Francisco, CA | Evaluation for vaginitis | 32 (26) | 27 (22) | 9 (7) | 2 | Bacterial vaginosis: Amsel criteria[g]; trichomoniasis: culture; candidiasis: culture |
| Wathne et al,39 1994[j] | 101 | Swedish community health center; Lund, Sweden | Vaginal discharge or malodor | 23 (23) | 34 (34) | 9 (9) | 2 | Bacterial vaginosis: Amsel criteria[g]; trichomoniasis: culture; candidiasis: culture |

Abbreviations: ED, emergency department; NA, information not reported; STD, sexually transmitted disease.
[a] See Box 52-1 for criteria for quality scoring.
[b] Additional unpublished data from this study were included in this review.
[c] Determined using criteria from Nugent et al.25
[d] Twenty-two patients were not diagnosed.
[e] Determined using criteria from Spiegel et al.50
[f] Seventy-four patients were not diagnosed.
[g] Determined using criteria from Amsel et al.40
[h] Nineteen patients were not diagnosed.
[i] Fifty-one patients were not diagnosed. Women with herpes or urinary tract infections were excluded.
[j] Data appear to be same as in Holst et al.35 Data on bacterial vaginosis were reported differently in this article and have been excluded from our analysis.


color, consistency), (2) presence or absence of itching, (3) irritative symptoms (redness, pain/burning, swelling), (4) odor (present, fishy, or foul), (5) patient’s self-diagnosis, (6) urinary tract symptoms, (7) bleeding, and (8) dyspareunia.

Discharge Characteristics

Patients’ descriptions of their discharge do not appear useful diagnostically, with 1 exception. A “cheesy” discharge increases the likelihood of candidiasis (LR, 2.4; 95% confidence interval [CI], 1.4-4.2), whereas a watery discharge makes it less likely (LR, 0.12; 95% CI, 0.02-0.82).

Itching

Several studies confirm that 70% to 90% of patients with vaginal candidiasis complain of itching (range of LRs, 1.4 to 3.3). Similarly, these studies show LRs ranging from 0.18 to 0.79 for women who do not have itching; thus, lack of itching decreases the likelihood of candidal infection. Itching symptoms are not useful for assessing the likelihood of bacterial vaginosis or trichomoniasis.

Irritative Symptoms

The limited data suggest that irritative symptoms are slightly useful in the diagnosis of candidiasis. Erythema increases the likelihood of candidiasis slightly (LR, 2.0; 95% CI, 1.5-2.8); its absence decreases its likelihood (LR, 0.84; 95% CI, 0.76-0.92).

Odor

The presence of an odor perceived by the patient decreases the likelihood of candidiasis (range of LRs, 0.35 to 0.48), whereas the absence of an odor increases its likelihood (range

Box 52-1 Criteria for Quality Scoring

LEVEL 1

Explicit inclusion and exclusion criteria. More than 95% of patients received specified diagnostic evaluation including criterion standard. More than 2 persons performed the diagnostic test, and a measure was made of interobserver variability. Sensible normal range defined for continuous variables (when applicable) and criterion standards were used (Amsel40 criteria for bacterial vaginosis, culture for vaginal trichomoniasis, and culture for vaginal candidiasis). (No studies met all level 1 criteria.)

LEVEL 2

Level 2 studies failed 1 or more level 1 criteria or used the following criterion standards: for bacterial vaginosis, Amsel40 modification, Spiegel,50 Nugent,25 culture and exclusion of other causes; for vaginal trichomoniasis, polymerase chain reaction, immunofluorescence; and for vaginal candidiasis, culture. (Fifteen studies met level 2 criteria.)

LEVEL 3

Level 3 studies failed 1 or more level 1 criteria or used the following criterion standards: for bacterial vaginosis, Gardnerella culture; for vaginal trichomoniasis, microscopy or Papanicolaou test; and for vaginal candidiasis, microscopy. (Three studies met level 3 criteria.)

Table 52-2 Accuracy of Symptoms for Diagnosis of Vaginal Candidiasis or Bacterial Vaginosis

| Symptom | Diagnosis | No. of Patients With Diagnosis | Sensitivity, % | Specificity, % | LR+ (95% CI) | LR– (95% CI) | Reference |
|---|---|---|---|---|---|---|---|
| Type of discharge described by patient | | | | | | | |
| Any | VC | 32[a] | 72 (NS) | …[b] | … | … | 8 |
| Any | BV | 27[a] | 59 (NS) | … | … | … | 8 |
| Any | BV | 67 | 91 | … | … | … | 38 |
| Cheesy | VC | 23 | 65 | 73 | 2.4 (1.4-4.2) | 0.48 (0.27-0.86) | 12 |
| Increased | VC | 186 | NS | … | … | … | 34 |
| Increased | BV | 34 | 59 | 67 | 1.8 (1.2-2.8) | 0.61 (0.40-0.95) | 36 |
| Watery | VC | 23 | 4 | 63 | 0.12 (0.02-0.82) | 1.5 (1.2-1.9) | 12 |
| White | VC | 32[a] | 41 (NS) | … | … | … | 8 |
| White | VC | 186 | NS | … | … | … | 34 |
| Yellow | VC | 32[a] | 19 (NS) | … | … | … | 8 |
| Yellow | VC | 186 | NS | … | … | … | 34 |
| Yellow | BV | 27[a] | 26 (NS) | … | … | … | 8 |
| Malodor or odor | VC | 23 | 26 | 46 | 0.48 (0.23-1.0) | 1.6 (1.1-2.4) | 12 |
| Malodor or odor | VC | 32[a] | 16 (NS) | … | … | … | 8 |
| Malodor or odor | VC | 23 | 21 | 37 | 0.35 (0.16-0.77) | 2.1 (1.5-3.0) | 40 |
| Malodor or odor | BV | 34 | 97 | 40 | 1.6 (1.3-2.0) | 0.07 (0.01-0.51) | 36 |
| Malodor or odor | BV | 67 | 73 | … | … | … | 38 |
| Malodor or odor | BV | 27[a] | 41 (NS) | … | … | … | 8 |
| Malodor or odor | BV | 34 | 53 | … | … | … | 40 |
| Itching | VC | 23 | 87 | 50 | 1.7 (1.3-2.4) | 0.26 (0.09-0.78) | 12 |
| Itching | VC | 140 | 79 | 58 | 1.8 (1.6-2.2) | 0.38 (0.27-0.53) | 7 |
| Itching | VC | 32[a] | 69 (NS) | … | … | … | 8 |
| Itching | VC | 23 | 91 | 47 | 1.7 (1.4-2.2) | 0.18 (0.05-0.70) | 40 |
| Itching | VC | 186 | 50 | 64 | 1.4 (1.2-1.7) | 0.78 (0.67-0.91) | 34 |
| Itching | BV | 34 | 41 | 37 | 0.66 (0.42-1.0) | 1.6 (1.0-2.4) | 36 |
| Itching | BV | 27[a] | 67 (NS) | … | … | … | 8 |
| Itching, chief complaint | VC | 186 | 27 | 92 | 3.3 (2.4-4.8) | 0.79 (0.72-0.87) | 34 |
| Irritation | BV | 67 | 45 | … | … | … | 38 |
| Irritation | BV | 27[a] | 48 (NS) | … | … | … | 8 |
| Irritation | VC | 32[a] | 69 (NS) | … | … | … | 8 |
| Pain or burning[c] | VC | 186 | 20 | 88 | … | … | 34 |
| Redness[c] | VC | 186 | 28 | 86 | 2.0 (1.5-2.8) | 0.84 (0.76-0.92) | 34 |
| Swelling[c] | VC | 186 | 24 | 92 | 1.4 (1.2-1.7) | 0.78 (0.67-0.91) | 34 |
| Urinary tract | | | | | | | |
| Increased frequency of urination | VC | 32[a] | 16 (NS) | … | … | … | 8 |
| Dysuria | VC | 32[a] | 13 (NS) | … | … | … | 8 |
| Dysuria | BV | 27[a] | 11 (NS) | … | … | … | 8 |
| Dysuria | BV | 34 | 32 | … | … | … | 40 |
| External dysuria | VC | 186 | 33 | 85 | 2.2 (1.6-2.9) | 0.79 (0.71-0.88) | 34 |
| Other | | | | | | | |
| “Another” yeast infection | VC | 23 | 35 | 90 | 3.3 (1.2-9.1) | 0.72 (0.53-1.0) | 12 |
| Abnormal bleeding | BV | 67 | 4 | … | … | … | 38 |

Abbreviations: BV, bacterial vaginosis; CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio; NS, reported by author to be not significantly associated with diagnosis; VC, vaginal candidiasis.
[a] Patient may have had more than 1 diagnosis.
[b] Ellipses indicate data not reported.
[c] Elicited by clinician.

Table 52-3 Accuracy of Symptoms for the Diagnosis of Vaginal Trichomoniasis Symptom Type of discharge described by patient Any White Yellow Malodor or odor Any “Fishy” Itching Irritation Urinary tract Increased frequency of urination Dysuria Postcoital bleeding Dyspareunia

Sensitivity, %

Specificity, %

LR+ (95% CI)

LR– (95% CI)

Reference

8a 17 8a 8a

75 (NS) 65 13 (NS) 50 (NS)

…b 29 … …

… 0.90 (0.63-1.3) … …

… 1.2 (0.62-2.5) … …

8 39 8 8

8a 13 17 8a 8a

50 (NS) 46 35 75 (NS) 63 (NS)

… 45 76 … …

… … 0.84 (0.45-1.6) 1.2 (0.68-2.1) 1.5 (0.74-3.0) 0.85 (0.59-1.2) … … … …

8 32 39 8 8

8a 8a 17 17 17

38 (NS) 38 (NS) 0 0 6

… … 97 97 96

… … 0.64 (0.04-10) 0.9 (0.06-13) 1.4 (0.18-11)

8 8 39 39 39

… … 1.0 (0.85-1.3) 1.0 (0.75-1.4) 0.98 (0.87-1.1)

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio; NS, reported by author to be not significantly associated with diagnosis. aPatient may have had more than 1 diagnosis. bEllipses indicate data not reported.


No. of Patients With Diagnosis

Table 52-4 Accuracy of Signs for the Diagnosis of Vaginal Candidiasis

| Sign | No. of Patients With Diagnosis | Sensitivity, % | Specificity, % | LR+ (95% CI) | LR– (95% CI) | Reference |
|---|---|---|---|---|---|---|
| Type of discharge noted by clinician | | | | | | |
| Any | 32[a] | 87 (NS) | …[b] | … | … | 8 |
| Yellow | 32[a] | 16 (NS) | … | … | … | 8 |
| White | 32[a] | 63 (NS) | … | … | … | 8 |
| Curdy | 140 | 16 | 97 | 6.1 (2.5-14) | 0.86 (0.80-0.93) | 7 |
| Flocculent | 23 | 43 | 84 | 2.7 (1.3-5.5) | 0.67 (0.46-0.98) | 40 |
| Consistency of discharge | | | | | | |
| Thick | 32[a] | 52 | … | … | … | 8 |
| Curdy | 186 | 18 | 99 | 15 (6.4-36) | 0.83 (0.78-0.89) | 34 |
| Curdy | 53 | 72 | 100 | 130 (19-960) | 0.28 (0.19-0.44) | 10 |
| Thin | 32[a] | 48 | … | … | … | 8 |
| Inflammation | | | | | | |
| Any | 140 | 46 | 78 | 2.1 (1.5-2.8) | 0.69 (0.58-0.82) | 7 |
| Perineal edema or erythema | 23 | 57 | 77 | 2.5 (1.3-4.6) | 0.56 (0.35-0.92) | 12 |
| Vulvar edema | 186 | 17 | 98 | 7.8 (4.2-15) | 0.85 (0.79-0.91) | 34 |
| Erythema or edema | 23 | 91 | … | … | … | 40 |
| Vulvar erythema | 186 | 54 | 79 | 2.5 (2.1-3.1) | 0.58 (0.49-0.68) | 34 |
| Vaginal erythema | 186 | 18 | 94 | 2.9 (1.9-4.5) | 0.88 (0.82-0.94) | 34 |
| Vulvar excoriations | 186 | 4 | 99 | 8.4 (2.3-31) | 0.96 (0.93-0.99) | 34 |
| Vulvar fissures | 186 | 17 | 96 | 4.6 (2.7-7.7) | 0.86 (0.80-0.92) | 34 |
| Vaginal wall | 32[a] | 23 | … | … | … | 8 |
| Vulvar | 53 | 40 | 95 | 8.2 (4.0-16) | 0.63 (0.51-0.79) | 10 |
| Cervical mucopus | 186 | 21 | 72 | 0.75 (0.55-1.0) | 1.1 (1.0-1.2) | 34 |
| Odor noted by clinician | | | | | | |
| Any | 32[a] | 6 | … | … | … | 8 |
| “Fishy” | 24 | 0 | 28 | 0.03 (0-0.47) | 2.9 (2.4-5.0) | 32 |
| Combined signs | | | | | | |
| Curdy discharge or vulvar inflammation | 53 | 81 | 95 | 17 (8.8-32) | 0.20 (0.11-0.35) | 10 |
| Curdy discharge in presence of itching | 53 | 77 | 100 | 150 (20-1000) | 0.23 (0.14-0.37) | 10 |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio; NS, reported by author to be not significantly associated with diagnosis.
[a] Patient may have had more than 1 diagnosis.
[b] Ellipses indicate data not reported.

of LRs, 1.6 to 2.1). Complaints of malodor (or odor) are so strongly associated with bacterial vaginosis that absence of malodor virtually ruled out the condition in 1 study (LR, 0.07; 95% CI, 0.01-0.51).35 A fishy odor noticed by the patient is not helpful in diagnosing trichomoniasis.

Self-Diagnosis

Women who complain of having “another yeast infection” are more likely to have candidiasis (LR, 3.3; 95% CI, 1.2-9.1). Urinary tract symptoms were not found to be associated with any of the 3 diagnoses in 1 study,8 whereas Eckert et al33 found “external” dysuria associated with candidiasis.

Bleeding

In one study of 17 patients with trichomoniasis, no patient complained of postcoital bleeding.38 Of 67 patients with bacterial vaginosis in the study by Livengood et al,37 only 4% complained of abnormal bleeding.

Dyspareunia

Only 1 of 17 patients with trichomoniasis complained of dyspareunia, which is a nonsignificant association.38

Accuracy of Signs

Tables 52-4 and 52-5 present the sensitivity, specificity, and LRs for all signs. We evaluated (1) characteristics of the discharge (amount, color, consistency), (2) inflammatory findings (edema, erythema, excoriations, tenderness, mucopus), and (3) odor.

Discharge

The finding of a discharge on examination does not distinguish between the 3 conditions. More than 60% of patients with these diagnoses have a discharge. A thick, curdy, or flocculent white discharge is strongly predictive of candidiasis (range of LRs, 2.7 to 130). The absence of these characteristics makes candidiasis less likely (range of LRs, 0.28 to 0.86). Women whose discharge is judged normal (LR, 0.11; 95% CI, 0.01-0.86) to mild (LR, 0.53; 95% CI, 0.37-0.75) are less likely to have bacterial vaginosis than women with moderate (LR, 2.5; 95% CI, 1.7-3.8) to profuse (LR, 3.0; 95% CI, 0.32-28) discharge. A white discharge makes bacterial vaginosis less likely (range of LRs,


Table 52-5 Accuracy of Signs for the Diagnosis of Bacterial Vaginosis or Vaginal Trichomoniasis Sign

Diagnosis

Type of discharge noted by clinician Any BV Vaginal discharge on vulvae BV Normal BV Mild BV Moderate BV Profuse BV Color or appearance Bloodstained BV Clear BV Green BV Mucoid BV Purulent, frothy BV Yellow BV BV VT VT White BV BV VT Curdy BV Consistency Homogeneous VT Thick BV VT Thin BV VT Transparent BV Inflammation Erythema or edema VT Vulvar BV BV Cervical BV Vaginal BV Vaginal wall BV VT Uterine/ad/nexal tenderness BV Odor noted by clinician Any BV VT VT High cheese BV

No. of Patients With Diagnosis Sensitivity, % Specificity, %

LR+ (95% CI)

LR– (95% CI)

Reference

27a 67 81 81 81 81

100 (NS) 64 1 33 62 4

…b … 89 37 75 99

… … … … 0.11 (0.01-0.86) 1.1 (1.0-1.2) 0.53 (0.37-0.75) 1.8 (1.3-2.5) 2.5 (1.7-3.8) 0.51 (0.38-0.69) 3.0 (0.32-28) 0.98 (0.93-1.0)

8 38 9 9 9 9

81 81 81 33 33 81 27a 8a 9 81 27a 8a 33

1 0 1 3 30 60 30 (NS) 50 (NS) 89 37 41 (NS) 13 (NS) 3

99 85 99 100 51 85 … … 93 32 … … 71

1.0 (0.06-16) 1.0 (0.97-1.0) 0.01 (0-0.16) 2.9 (1.6-5.4) 1.0 (0.06-16) 1.0 (0.97-1.0) 1.6 (0.10-24) 0.99 (0.92-1.1) 0.62 (0.34-1.1) 1.4 (0.96-1.9) 4.1 (2.4-7.1) 0.46 (0.35-0.62) … … … … 14 (6.1-31) 0.12 (0.02-0.75) 0.55 (0.40-0.75) 2.0 (1.4-2.8) … … … … 0.10 (0.01-0.74) 1.4 (1.1-1.7)

9 9 9 35 35 9 8 8 40 9 8 8 35

10 27a 8a 27a 8a 33

100 12 (NS) 0 (NS) 88 (NS) 100 (NS) 0

60 … … … … 96

2.2 (1.7-2.8) … … … … 0.31 (0.02-6.3)

0.15 (0.02-1.0) … … … … 1.0 (0.97-1.1)

10 8 8 8 8 35

17 67 67 67 67 27a 8a 67

18 1 12 10 15 33 (NS) 63 (NS) 12

97 … … … … … … …

6.4 (1.6-26) … … … … … … …

0.85 (0.68-1.1) … … … … … … …

39 38 38 38 38 8 8 38

27a 8a 8a 81

78 (NS) 87 (NS) 50 (NS) 78

… … … 75

… … … 3.2 (2.1-4.7)

… … … 0.30 (0.19-0.45)

8 8 8 9

Abbreviations: BV, bacterial vaginosis; CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio; NS, reported by author to be not significantly associated with diagnosis; VT, vaginal trichomoniasis. aPatient may have had more than 1 diagnosis. bEllipses indicate data not reported.


Vaginitis

Table 52-6 Accuracy of Office Laboratory Tests for the Diagnosis of Vaginal Candidiasis or Bacterial Vaginosis Laboratory Test

Diagnosis

Microscopy Clue cells

Curved rods Mobiluncus-type rods Bacilli with corkscrew motility Lactobacilli scant or absent Yeast seen with potassium hydroxide

Yeast seen with saline Yeast seen with saline and methylene blue Yeast seen with Gram stain Trichomonads seen with saline Leukocytes more than epithelial cells Leukocytes on slide pH Level 4.5) and is incorporated into the case definition. A majority of patients (>90%) with trichomoniasis will have an increased pH level, but the specificity (51%) has been evaluated in only

Table 52-7 Accuracy of Office Laboratory Tests for the Diagnosis of Vaginal Trichomoniasis Laboratory Test Microscopy Clue cells Yeast seen with potassium hydroxide Trichomonads seen with saline

Leukocytes more numerous than epithelial cells Leukocytes on slide pH Level 4.9 >5.4 Whiff test result positive

No. of Patients With Diagnosis

Sensitivity, %

Specificity, %

LR+ (95% CI)

LR– (95% CI)

Reference

13 8a 8a 8a 9 18 10 88 55 9 8a

69 75 (NS) 13 (NS) 75 (NS) 78 67 0 60 49 100 25

33 …b … … … 100 100 100 100 74 …

1.0 (0.70-1.5) … … … … 100 (14-740) 4.5 (0.1-217) 310 (43-2200) 51 (7.1-360) 3.5 (2.3-5.2) …

0.93 (0.39-2.2) … … … … 0.34 (0.17-0.64) 0.96 (0.84-1.1) 0.40 (0.31-0.52) 0.51 (0.40-0.67) 0.14 (0.02-0.87) …

32 8 8 8 40 23 33 37 11 40 8

8a 9 13 8a 9

17 100 92 25 (NS) 67

… … 51 … 65

… … 1.9 (1.4-2.5) … 1.9 (1.1-3.3)

… … 0.15 (0.02-1.0) … 0.51 (0.20-1.3)

8 40 32 8 40

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio; NS, reported by author to be not significantly associated with diagnosis. aA patient may have had more than 1 diagnosis. bEllipses indicate data not reported.

1 study. Unfortunately, given the overlap between the pH levels in various conditions, it is hard to draw firm conclusions from the existing literature.

Whiff Test

A positive whiff test result makes candidiasis less likely (LR, 0.31; 95% CI, 0.12-0.79) but is positively associated with trichomoniasis (LR, 1.9; 95% CI, 1.3-2.7). A positive whiff test result is one of the diagnostic criteria for bacterial vaginosis.

Are These Symptoms and Signs Ever Normal? The distinction between normal and abnormal in terms of vaginal symptoms is problematic. The primary literature on normal vaginal discharge is scant.41 It appears that a normal vaginal discharge increases at midcycle (because of an increase in cervical mucus),42,43 can be malodorous,44 and may be accompanied by irritative symptoms (such as itch).45 This problem is compounded by the fact that the vaginal pathogens identified by the current diagnostic approach can be found in asymptomatic women.46,47 Gardnerella is part of the normal vaginal flora.48 Thus, the identification of microbes in a vaginal discharge does not prove that they create symptoms.

CLINICAL SCENARIOS: RESOLUTIONS

CASE 1 What is the appropriate diagnostic evaluation?

No symptom has enough predictive power to allow the confident diagnosis of any of the 3 main causes of vaginitis. The wet mount examination remains the best way to make a diagnosis. Symptoms and signs can suggest a particular diagnosis. Candidiasis is associated with itching, a cheesy discharge, redness, and self-diagnosis, whereas bacterial vaginosis is associated with increased discharge and a complaint of odor. A watery discharge makes candidiasis unlikely. Inflammatory signs are relatively specific for vaginal candidiasis but are not always present and do occur in trichomoniasis. An absent or mild discharge makes bacterial vaginosis unlikely. Odor observed on examination occurs in bacterial vaginosis but not in candidiasis. Most diagnoses are made by microscopy and the whiff test. Most studies (but not all) would support that candidiasis is associated with a normal pH level. Although the microscopic identification of yeast or trichomonads is diagnostic, these causes cannot be ruled out by negative findings on microscopy. The presence of clue cells makes candidiasis less likely. A lack of lactobacilli and the presence of bacilli with corkscrew motility are 2 findings highly associated with bacterial vaginosis.

CASE 2 What do you do when the diagnostic evaluation fails?

Despite a full medical history, physical examination, and microscopy, the evaluation in this case does not pinpoint a cause of the patient’s symptoms. There are several possibilities to consider in patients for whom the diagnostic evaluation is inconclusive. It is possible that the algorithm has failed to diagnose vaginal candidiasis or trichomoniasis; clinicians should consider empirical therapy or further testing for trichomonads or Candida. Clinicians may want to consider less common causes of vaginal symptoms, including gonorrhea, chlamydia, herpes, or genital warts. Finally, there may be no pathologic condition causing the discharge, and the clinician may elect, after discussion with the patient, an approach of watchful waiting.

THE BOTTOM LINE Our conclusions are subject to 2 important limitations. First, the LRs in these studies are not particularly robust. Second, despite dozens of articles devoted to the diagnosis of vaginal symptoms, we could locate only 18 that were useful in this review and none was of the highest methodologic quality. Current research on vaginitis has a number of weaknesses. Studies on vaginitis often mix together women with symptoms and those presenting for follow-up examinations or routine care. By analyzing data from these distinct patient groups as if they were one, the research fails to address either the question of how to diagnose patients with symptoms or how to screen for asymptomatic disease. The vocabulary of physical findings is not standardized (ie, what is a cheesy discharge?), case definitions for candidiasis and trichomoniasis are not clear, and multiple criterion standards are used. Scant attention has been paid to interobserver variability, which is a key issue in the clinical examination. Furthermore, most studies concentrate on diagnosing one particular etiology. However, the task facing the clinician is to choose among different etiologies. When 2 pathogens are identified in a study (mixed infections), it is conceptually difficult to clarify whether one, both, or neither is responsible for the symptoms. Finally, the studies on trichomonas, with only one exception, had fewer than 20 patients; this is not a good base on which to draw solid conclusions (a fact emphasized by the large 95% CIs of the LRs). In addition to these limitations, the existing diagnostic approach fails to diagnose approximately 30% of women with vaginal symptoms. The time is ripe for new approaches to these complaints. Despite these limitations, primary care clinicians need to be skilled in the diagnosis of vaginal candidiasis, bacterial vaginosis, and trichomoniasis. 
Patients may also have concerns regarding the meaning of these symptoms for their health and personal relationships49 and these concerns need to be addressed sensitively. Recognizing that the clinical examination is a limited tool in this setting presents the problem of finding ways to better diagnose and treat patients with vaginal symptoms. Vaginal symptoms may be the most common gynecologic complaint in primary care, but much remains to be learned about their clinical diagnosis.

Author Affiliations at the Time of the Original Publication

Department of Family and Social Medicine, Albert Einstein College of Medicine, Bronx, New York (Dr Anderson); Center for Family Medicine in the College of Physicians and Surgeons, Columbia University, New York, New York (Dr Klink); and the Department of Family Practice, Beth Israel Medical Center/The Institute for Urban Family Health, New York, New York (Dr Cohrssen).

Acknowledgments

Jean Abbott, MD (University of Colorado, Denver), kindly provided additional data from her study, which we gratefully incorporated in this review. Clyde Schechter, MD, provided invaluable assistance with the statistical analysis. Kathleen O’Toole assisted in reviewing data points from the included articles. Lori Bastian, MD, MPH, Joanne Piscitelli, MD, and David L. Simel, MD, MHS, provided extensive and invaluable advice in the preparation of this article.

REFERENCES
1. Kent HL. Epidemiology of vaginitis. Am J Obstet Gynecol. 1991;165(4 pt 2):1168-1176.
2. Mou S. Vulvovaginitis. In: Rakel RE, Bope ET, eds. Conn’s Current Therapy 2003. Philadelphia, PA: WB Saunders; 2003:1149-1152.
3. Technical Bulletin No. 226: Vaginitis. Washington, DC: American College of Obstetricians & Gynecologists; 1996.
4. Bickley LS. Acute vaginitis. In: Black ER, Bordley DR, Tape TG, Panzer RJ, eds. Diagnostic Strategies for Common Medical Problems. Philadelphia, PA: American College of Physicians–American Society of Internal Medicine; 1999:255-268.
5. Mulley AG. Approach to the patient with a vaginal discharge. In: Goroll AH, Mulley AG, eds. Primary Care Medicine: Office Evaluation and Management of the Adult Patient. Philadelphia, PA: Lippincott Williams & Wilkins; 2000:702-707.
6. Carlson P, Richardson M, Paavonen J. Evaluation of the Oricult-N dipslide for laboratory diagnosis of vaginal candidiasis. J Clin Microbiol. 2000;38(3):1063-1065.
7. Bro F. The diagnosis of Candida vaginitis in general practice. Scand J Prim Health Care. 1989;7(1):19-22.
8. Schaaf VM, Perez-Stable EJ, Borchardt K. The limited value of symptoms and signs in the diagnosis of vaginal infections. Arch Intern Med. 1990;150(9):1929-1933.
9. O’Dowd TC, West RR. Clinical prediction of Gardnerella vaginalis in general practice. J R Coll Gen Pract. 1987;37(295):59-61.
10. Chandeying V, Skov S, Kemapunmanus M, Law M, Geater A, Rowe P. Evaluation of two clinical protocols for the management of women with vaginal discharge in southern Thailand. Sex Transm Infect. 1998;74(3):194-201.
11. Bennett JR, Barnes WG, Coffman S. The emergency department diagnosis of Trichomonas vaginitis. Ann Emerg Med. 1989;18(5):564-566.
12. Abbott J. Clinical and microscopic diagnosis of vaginal yeast infection: a prospective analysis. Ann Emerg Med. 1995;25(5):587-591.
13. Burstein GR, Murray PJ. Diagnosis and management of sexually transmitted diseases among adolescents. Pediatr Rev. 2003;24(4):119-127.
14. Ryan CA, Courtois BN, Hawes SE, Stevens CE, Eschenbach DA, Holmes KK. Risk assessment, symptoms, and signs as predictors of vulvovaginal and cervical infections in an urban US STD clinic: implications for use of STD algorithms. Sex Transm Infect. 1998;74(suppl 1):S59-S76.
15. Stuart-Harris C. The epidemiology and clinical presentation of herpes virus infections. J Antimicrob Chemother. 1983;12(suppl B):1-8.
16. Chiu A, Kelly K, Thomason J, Otte T, Mullins D, Fink J. Recurrent vaginitis as a manifestation of inhaled latex allergy. Allergy. 1999;54(2):184-186.
17. Ludman B. Human seminal plasma protein allergy: a diagnosis rarely considered. J Obstet Gynecol Neonatal Nurs. 1999;28(4):359-363.
18. Bachmann G, Nevadunsky N. Diagnosis and treatment of atrophic vaginitis. Am Fam Physician. 2000;61(10):3090-3096.
19. Berg AO, Heidrich FE, Fihn SD, et al. Establishing the cause of genitourinary symptoms in women in a family practice: comparison of clinical examination and comprehensive microbiology. JAMA. 1984;251(5):620-625.
20. Mayaud P, ka-Gina G, Cornelissen J, et al. Validation of a WHO algorithm with risk assessment for the clinical management of vaginal discharge in Mwanza, Tanzania. Sex Transm Infect. 1998;74(suppl 1):S77-S84.
21. Wiesenfeld HC, Macio I. The infrequent use of office-based diagnostic tests for vaginitis. Am J Obstet Gynecol. 1999;181(1):39-41.
22. Allen-Davis JT, Beck A, Parker R, Ellis JL, Polley D. Assessment of vulvovaginal complaints: accuracy of telephone triage and in-office diagnosis. Obstet Gynecol. 2002;99(1):18-22.
23. Briselden AM, Hillier SL. Evaluation of Affirm VP microbial identification test for Gardnerella vaginalis and Trichomonas vaginalis. J Clin Microbiol. 1994;32(1):148-152.
24. Gwyther RE, Addison LA, Spottswood S, Bentz EJ, Evens S, Abrantes A. An innovative method for specimen autocollection in the diagnosis of vaginitis. J Fam Pract. 1986;23(5):487-488.
25. Nugent RP, Krohn MA, Hillier SL. Reliability of diagnosing bacterial vaginosis is improved by a standardized method of Gram stain interpretation. J Clin Microbiol. 1991;29(2):297-301.
26. Gardner HL, Dukes CD. Haemophilus vaginalis vaginitis: a newly defined specific infection previously classified “nonspecific” vaginitis. Am J Obstet Gynecol. 1955;69(5):962-976.
27. Seattle STD/HIV Prevention Training Network. Examination of vaginal wet preps. http://depts.washington.edu/nnptc/online_training/wet_preps.html. Accessed June 15, 2008.
28. Association of Professors of Gynecology and Obstetrics Educational Series on Women’s Health Issues. Diagnosis of Vaginitis. Washington, DC: Association of Professors of Gynecology and Obstetrics; 1996.
29. Clinical Laboratory Improvements Amendments (CLIA). How to obtain a CLIA Certificate. Centers for Medicare and Medicaid Services. http://www.cms.hhs.gov/CLIA/downloads/HowObtainCLIACertificate.pdf. Accessed June 15, 2008.
30. Abu Shaqra QM. Bacterial vaginosis among a group of married Jordanian women: occurrence and laboratory diagnosis. Cytobios. 2001;105(408):35-43.
31. Bleker OP, Folkertsma K, Dirks-Go SI. Diagnostic procedures in vaginitis. Eur J Obstet Gynecol Reprod Biol. 1989;31(2):179-183.
32. Borchardt KA, Hernandez V, Miller S, et al. A clinical evaluation of trichomoniasis in San Jose, Costa Rica using the InPouch TV test. Genitourin Med. 1992;68(5):328-330.
33. Eckert LO, Hawes SE, Stevens CE, Koutsky LA, Eschenbach DA, Holmes KK. Vulvovaginal candidiasis: clinical manifestations, risk factors, management algorithm. Obstet Gynecol. 1998;92(5):757-765.
34. Fule RP, Kulkarni K, Jahagirdar VL, Saoji AM. Incidence of Gardnerella vaginalis infection in pregnant and non-pregnant women with non-specific vaginitis. Indian J Med Res. 1990;91:360-363.
35. Holst E, Wathne B, Hovelius B, Mardh PA. Bacterial vaginosis: microbiological and clinical findings. Eur J Clin Microbiol. 1987;6(5):536-541.
36. Krieger JN, Tam MR, Stevens CE, et al. Diagnosis of trichomoniasis: comparison of conventional wet-mount examination with cytologic studies, cultures, and monoclonal antibody staining of direct specimens. JAMA. 1988;259(8):1223-1227.
37. Livengood CH III, Thomason JL, Hill GB. Bacterial vaginosis: diagnostic and pathogenetic findings during topical clindamycin therapy. Am J Obstet Gynecol. 1990;163(2):515-520.
38. Ryu JS, Chung HL, Min DY, Cho YH, Ro YS, Kim SR. Diagnosis of trichomoniasis by polymerase chain reaction. Yonsei Med J. 1999;40(1):56-60.
39. Wathne B, Holst E, Hovelius B, Mardh PA. Vaginal discharge: comparison of clinical, laboratory and microbiological findings. Acta Obstet Gynecol Scand. 1994;73(10):802-808.
40. Amsel R, Totten PA, Spiegel CA, Chen KC, Eschenbach D, Holmes KK. Nonspecific vaginitis: diagnostic criteria and microbial and epidemiologic associations. Am J Med. 1983;74(1):14-22.
41. Anderson M, Karasz A, Friedland S. Can vaginal symptoms ever be normal? a review of the literature. Presented at: North American Primary Care Research Group Annual Meeting; November 3, 2002; New Orleans, LA.
42. Moghissi KS, Syner FN, Evans TN. A composite picture of the menstrual cycle. Am J Obstet Gynecol. 1972;114(3):405-418.
43. Billings EL, Brown JB, Billings JJ, Burger HG. Symptoms and hormonal changes accompanying ovulation. Lancet. 1972;1(7745):282-284.
44. Doty RL, Ford M, Preti G, Huggins GR. Changes in the intensity and pleasantness of human vaginal odors during the menstrual cycle. Science. 1975;190(4221):1316-1318.

45. Priestley CJ, Jones BM, Dhar J, Goodwin L. What is normal vaginal flora? Genitourin Med. 1997;73(1):23-28. 46. Bergman JJ, Berg AO. How useful are symptoms in the diagnosis of Candida vaginitis? J Fam Pract. 1983;16(3):509-511. 47. Blake DR, Duggan A, Quinn T, Zenilman J, Joffe A. Evaluation of vaginal infections in adolescent women: can it be done without a speculum? Pediatrics. 1998;102(4 pt 1):939-944. 48. Sobel JD. Vaginitis. N Engl J Med. 1997;337(26):1896-1903. 49. Karasz A, Anderson M. The vaginitis monologues: women’s experiences of vaginal complaints in a primary care setting. Soc Sci Med. 2003;56(5):1013-1021. 50. Spiegel CA, Amsel R, Holmes KK. Diagnosis of bacterial vaginosis by direct Gram stain of vaginal fluid. J Clin Microbiol. 1983;18(1):170-177.

UPDATE: Vaginitis

52

Prepared by Joanne T. Piscitelli, MD, and David L. Simel, MD, MHS Reviewed by Matthew Anderson, MD, MSc

CLINICAL SCENARIO A 25-year-old woman who recently became sexually active presents with concerns about her new vaginal discharge and vaginal itching. She has not noticed an odor. When you do a speculum vaginal examination, should the discharge be examined microscopically for bacterial vaginosis, yeast, and trichomonas or will the appearance of the discharge be sufficient for diagnosis?

UPDATED SUMMARY ON VAGINITIS Original Review Anderson MR, Klink K, Cohrssen A. Evaluation of vaginal complaints. JAMA. 2004;291(11):1368-1379.

UPDATED LITERATURE SEARCH Our literature search replicated that of the original article, confined to 2003 to April 2006. We identified 92 potential articles and reviewed the abstracts to find articles that included consecutive, prospectively identified patients with vaginal complaints in a primary care setting (primary care, general gynecology, or sexually transmitted disease clinics). Our focus was on identifying clinical studies that evaluated symptomatic women. We found 1 new article that met these standards. The literature search also uncovered 2 recent articles that assessed new bedside tests for bacterial vaginosis and trichomoniasis and that had data suitable for summarizing in likelihood ratios (LRs).

NEW FINDINGS
• The patient’s symptom of an abnormal vaginal odor is a useful finding, but distinguishing bacterial vaginosis from vaginal candidiasis is not as efficient as proposed in the original report. Fortuitously, the LRs for bacterial vaginosis when the woman perceives an odor and for candidiasis when an odor is absent make perceived odor a useful symptom for clinical diagnosis. The patient’s perception of an odor increases her likelihood of bacterial vaginosis (summary LR, 2.2; 95% confidence interval [CI], 1.4-3.6), whereas the absence of an odor has the same effect in increasing the likelihood of vaginal candidiasis (summary LR, 2.2; 95% CI, 1.9-2.5).
• When clinicians do not have microscopes, point-of-care testing may prove useful for bacterial vaginosis and vaginal trichomoniasis.

Details of the Update
A recent study1 includes the largest patient sample in which all 3 diagnoses were systematically evaluated. For each of the target conditions, the investigators reported data that allow calculation of the LRs for abnormal discharge, change in discharge, odor, vaginal pruritus, vaginal burning, and dysuria. A vaginal odor is the most useful symptom for distinguishing patients with bacterial vaginosis (odor symptoms present) from those with vaginal candidiasis (no perceived odor). No symptom worked for identifying women with vaginal trichomoniasis, because the LR CI for every symptom (both positive and negative LRs) includes 1.

For both candidiasis and trichomoniasis, microscopic tests by the clinician are much more useful than the symptoms. The presence of yeast on a potassium hydroxide (KOH) preparation had an LR of 7.4 (95% CI, 3.8-15) vs culture, whereas the absence of yeast forms is less useful in identifying women who will have positive yeast culture results (LR, 0.80; 95% CI, 0.74-0.87). The presence of trichomonads on a wet preparation slide was virtually diagnostic (LR, 22; 95% CI, 13-37). The absence of trichomonads does not rule out vaginal trichomoniasis because a culture result can still be positive (LR, 0.39; 95% CI, 0.29-0.53).

Although not reviewed in the original Rational Clinical Examination article on vaginitis, point-of-care testing for both bacterial vaginosis and vaginal trichomoniasis is gathering increased attention. Approved products are now available and marketed toward clinics that do not have access to microscopes or trained personnel for assessing the presence of clue cells (bacterial vaginosis) or trichomonads. Compared with the Amsel criteria,2 the BVBlue Test (Gryphus Diagnostics, LLC, Birmingham, Alabama) has a positive LR of 9.8 (95% CI, 6.0-16) and a negative LR of 0.13 (95% CI, 0.08-0.21).3 The test uses a chromogenic assay for vaginal fluid sialidase produced by bacteria. Although the test takes fewer than 10 minutes to perform, in this study the test kits were taken to a laboratory for processing. The findings require further study in a setting in which the clinic personnel interpret the results as a true “bedside” test, rather than sending the sample to a trained laboratory technician.

A second type of point-of-care test for bacterial vaginosis incorporates a pH test and a test for amines (both of these are part of the Amsel criteria2). In a resource-poor environment, Azerbaijani women at a health fair were screened with the FemExam (Litmus Concepts, Inc, Santa Clara, California).4 Compared with the Amsel criteria,2 a FemExam result positive for both pH and amines has a sensitivity of 92% for bacterial vaginosis, suggesting that it may be a reasonable substitute for the complete Amsel criteria2 (positive LR, 7.5; 95% CI, 4.0-14). However, finding that both the pH and amine results are negative has an LR of 0.45 (95% CI, 0.34-0.57), which is not low enough to rule out bacterial vaginosis, given its high pretest probability. Although most of the women in the study did have an abnormal vaginal discharge, not all were specifically seeking care for vaginitis.

A point-of-care test for trichomoniasis (Xenostrip-Tv; Xenotope Diagnostics, San Antonio, Texas) identifies antigen to the protozoan. The test is highly efficient at confirming infection, with a positive LR of 361 (95% CI, 22-5845), but a normal result does not rule out vaginal trichomoniasis, with a negative LR of 0.52 (95% CI, 0.40-0.67).5
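The LRs reported for the point-of-care tests become clinically meaningful once they are applied to a pretest probability. The following is a minimal sketch in Python of the standard odds conversion, assuming the update's summary prevalence of bacterial vaginosis (34%) as the pretest probability and the BVBlue LRs quoted in the text; the function name `posttest_probability` is illustrative and does not come from the source.

```python
def posttest_probability(pretest: float, lr: float) -> float:
    """Convert a pretest probability to a posttest probability with a likelihood ratio."""
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    if lr < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pretest_odds = pretest / (1.0 - pretest)      # probability -> odds
    posttest_odds = pretest_odds * lr             # apply the likelihood ratio
    return posttest_odds / (1.0 + posttest_odds)  # odds -> probability


# Assumed inputs: pretest 0.34 (summary prevalence of bacterial vaginosis),
# BVBlue positive LR 9.8 and negative LR 0.13 as quoted in the text.
after_positive = posttest_probability(0.34, 9.8)
after_negative = posttest_probability(0.34, 0.13)
print(f"after positive test: {after_positive:.2f}")
print(f"after negative test: {after_negative:.2f}")
```

Under these assumptions, a positive result raises the probability of bacterial vaginosis to roughly 83% and a negative result lowers it to roughly 6%. A clinic with a different local prevalence would substitute its own pretest value.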

IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION With data from the original Table 52-8 and from articles identified in the update,1,6 the prevalence of vaginal candidiasis, bacterial vaginosis, and vaginal trichomoniasis among women with vaginal complaints and presenting for care can be summarized. The summary estimates provide a reasonable anchor for making clinical decisions, though the data suggest geographic variability, which means that each provider needs a sense of prevalence in his or her own practice setting. The summary prevalences are as follows: bacterial vaginosis, 34% (95% CI, 28%-41%); vaginal candidiasis, 26% (95% CI, 22%-30%); and vaginal trichomoniasis, 10% (95% CI, 7%-15%). These prevalences support the notion that approximately 30% of women will have less common infections or remain undiagnosed after their evaluation. We calculated summary LRs for several of the symptoms for which the results were clinically consistent across studies. By considering the CIs associated with these summary LRs, the clinician should have a better sense of the utility of the findings.

CHANGES IN THE REFERENCE STANDARD None.


RESULTS OF LITERATURE REVIEW

Table 52-8 Univariate Findings for Vaginitis

| Finding | Condition (No. of Studies)a | Summary LR+ (95% CI) | Summary LR– (95% CI) |
|---|---|---|---|
| Patient Symptoms | | | |
| Vaginal odor | Bacterial vaginosis (2) | 2.2 (1.4-3.6) | 0.30 (0.24-0.38) |
| Vaginal odor | Candidiasis (3) | 0.29 (0.20-0.43) | 2.2 (1.9-2.5) |
| Vaginal itching | Candidiasis (5) | 1.5 (1.3-1.8) | 0.53 (0.33-0.86) |
| Microscopic Tests | | | |
| Yeast forms on a KOH preparation | Candidiasis (3) | 4.8 (2.7-8.4) | 0.78 (0.71-0.85) |
| Trichomonads seen with a saline preparation | Trichomoniasis (5) | 46 (17-121) | 0.50 (0.36-0.71) |

Abbreviations: CI, confidence interval; KOH, potassium hydroxide; LR+, positive likelihood ratio; LR–, negative likelihood ratio.
a Data are combined from that in Table 2 of the original Rational Clinical Examination article by Anderson et al7 and Table 6 in the article by Landers et al.1

EVIDENCE FROM GUIDELINES The Centers for Disease Control and Prevention funds an online training program developed by the Seattle STD/HIV Prevention Training Center that can be reviewed by clinicians who do office microscopy to diagnose vaginitis (http://depts.washington.edu/nnptc/online_training/wet_preps_video.html; accessed June 15, 2008). Although bacterial vaginosis in pregnancy was not a focus of the review, the US Preventive Services Task Force8 evaluated the condition and found the evidence lacking to recommend for or against screening high-risk pregnant women for bacterial vaginosis. For clinicians who choose to screen, the task force observed that the Amsel criteria2 are the accepted clinical criteria even though the “optimal” test has not been determined.

CLINICAL SCENARIO—RESOLUTION The diagnosis of vaginitis requires microscopic examination of the vaginal discharge. Although you may not be able to determine a diagnosis in about 30% of patients, approximately 33% will have bacterial vaginosis, 25% will have candidiasis, and 10% will have trichomonas. The lack of a perceived odor makes candidiasis more likely (LR, 2.2), but the absence of the symptom is not conclusive. A thick or “curdy” discharge would be compatible with yeast, but women may have multiple infections. Thus, a diagnosis is best established by obtaining a specimen for: (1) measuring the pH; (2) preparing a slide for KOH assessment (evaluate the odor after application of KOH for the whiff test [bacterial vaginosis] and use the microscope to identify yeast forms); and (3) preparing a separate wet saline microscopic slide (for clue cells and trichomoniasis).
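As a worked illustration of why the absent odor is helpful but not conclusive, the LR can be applied to the pretest odds. This sketch assumes the 25% pretest probability of candidiasis quoted in the scenario and the summary LR of 2.2; the intermediate rounding is ours.

```latex
\begin{align*}
\text{pretest odds} &= \frac{0.25}{1 - 0.25} \approx 0.33 \\
\text{posttest odds} &= 0.33 \times 2.2 \approx 0.73 \\
\text{posttest probability} &= \frac{0.73}{1 + 0.73} \approx 0.42
\end{align*}
```

Under these assumptions the probability of candidiasis rises from 25% to roughly 42%, which is consistent with the recommendation to confirm the diagnosis by microscopy.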


VAGINITIS—MAKE THE DIAGNOSIS

PRIOR PROBABILITY Among women with vaginal symptoms, the most common diagnoses are bacterial vaginosis (34%), vaginal candidiasis (26%), and vaginal trichomoniasis (10%). The prevalence changes across regions, so clinicians should be familiar with the findings in their own clinics.

POPULATION FOR WHOM VAGINITIS SHOULD BE CONSIDERED Vaginitis should be considered in any woman with concerns about a vaginal symptom that typically includes a combination of vaginal discharge, odor, irritation, or pruritus.

DETECTING THE LIKELIHOOD OF CAUSES OF VAGINITIS Although the presence of odor helps identify women more likely to have bacterial vaginosis versus candidiasis, no symptoms reliably identify those with trichomoniasis (see Table 52-9). Thus, unless point-of-care tests become validated, a microscopic evaluation is required for identifying clue cells (bacterial vaginosis), yeast forms (vaginal candidiasis), or trichomonads (vaginal trichomoniasis). Clinicians who do office microscopy need appropriate training to recognize the findings (http://depts.washington.edu/nnptc/online_training/wet_preps_video.html; accessed June 15, 2008).

REFERENCE STANDARD TESTS Bacterial Vaginosis

The pragmatic reference standard consists of the Amsel criteria.2 These require 4 different tests, of which at least 3 must have positive results: (1) a thin, homogenous vaginal discharge; (2) clue cells on microscopic examination; (3) positive whiff test; and (4) vaginal pH higher than 4.5.
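The "at least 3 of 4" rule can be expressed as a simple count. The sketch below is illustrative only: the class and field names are hypothetical, and it encodes just the four criteria and the pH threshold stated above.

```python
from dataclasses import dataclass


@dataclass
class AmselFindings:
    """Hypothetical container for the 4 Amsel findings described in the text."""
    thin_homogeneous_discharge: bool
    clue_cells: bool
    positive_whiff_test: bool
    vaginal_ph: float


def meets_amsel_criteria(findings: AmselFindings) -> bool:
    """Return True when at least 3 of the 4 Amsel criteria are positive."""
    positives = [
        findings.thin_homogeneous_discharge,
        findings.clue_cells,
        findings.positive_whiff_test,
        findings.vaginal_ph > 4.5,  # pH must be higher than 4.5, not equal to it
    ]
    return sum(positives) >= 3


# Example: discharge and clue cells present, negative whiff test, pH 5.0
example = AmselFindings(True, True, False, 5.0)
print(meets_amsel_criteria(example))
```

In this example three criteria are positive (discharge, clue cells, and pH), so the function reports that the criteria are met even though the whiff test is negative.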

Vaginal Candidiasis

The reference standard test requires culture, though culture cannot distinguish between infection and colonization.

Trichomoniasis

The reference standard test in clinical research studies typically requires culture. However, in clinical practice the presence of trichomonads on a saline microscopic preparation is considered diagnostic, though the absence of trichomonads does not definitively rule out the condition.

Table 52-9 Likelihood Ratios of Symptoms and Microscopy for Vaginitis

| Finding | Condition | LR+ (95% CI) | LR– (95% CI) |
|---|---|---|---|
| Patient Symptoms | | | |
| Vaginal odor (symptoms) | Bacterial vaginosis | 2.2 (1.4-3.6) | 0.30 (0.24-0.38) |
| Vaginal odor (symptoms) | Candidiasis | 0.29 (0.20-0.43) | 2.2 (1.9-2.5) |
| Vaginal itching | Candidiasis | 1.5 (1.3-1.8) | 0.53 (0.33-0.86) |
| Odor, itching, vaginal burning, dysuria | Trichomoniasis | The LR+ and LR– have narrow CIs that include 1, suggesting they are of no value | |
| Microscopic Tests | | | |
| Yeast forms on a KOH preparation | Candidiasis (n = 3) | 4.8 (2.7-8.4) | 0.78 (0.71-0.85) |
| Trichomonads seen with a saline preparation | Trichomoniasis (n = 5) | 46 (17-121) | 0.50 (0.36-0.71) |

Abbreviations: CI, confidence interval; KOH, potassium hydroxide; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

REFERENCES FOR THE UPDATE
1. Landers DV, Wiesenfeld HC, Heine P, Krohn MA, Hillier SL. Predictive value of the clinical diagnosis of lower genital tract infection in women. Am J Obstet Gynecol. 2004;190(4):1004-1010.a
2. Amsel R, Totten PA, Spiegel CA, Chen KC, Eschenbach D, Holmes KK. Nonspecific vaginitis: diagnostic criteria and microbial and epidemiologic associations. Am J Med. 1983;74(1):14-22.
3. Bradshaw CS, Morton AN, Garland SM, Horvath LB, Kuzevska I, Fairley CK. Evaluation of a point-of-care test, BVBlue, and clinical and laboratory criteria for diagnosis of bacterial vaginosis. J Clin Microbiol. 2005;43(3):1304-1308.
4. Posner SF, Kerimova J, Aliyeva F, Duerr A. Strategies for diagnosis of bacterial vaginosis in a resource-poor setting. Int J STD AIDS. 2005;16(1):52-55.
5. Pillay A, Lewis J, Ballard RC. Evaluation of Xenostrip-Tv, a rapid diagnostic test for Trichomonas vaginalis infection. J Clin Microbiol. 2004;42(8):3853-3856.
6. Luni Y, Munim S, Qureshi R, Tareen AL. Frequency and diagnosis of bacterial vaginosis. J Coll Physicians Surg Pak. 2005;15(5):270-272.
7. Anderson MR, Klink K, Cohrssen A. Evaluation of vaginal complaints. JAMA. 2004;291(11):1368-1379.
8. Berg AO. Screening for bacterial vaginosis in pregnancy: recommendations and rationale. Am J Prev Med. 2001;20(3 suppl):59-61.
a For the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.

EVIDENCE TO SUPPORT THE UPDATE: Vaginitis

52

TITLE Predictive Value of the Clinical Diagnosis of Lower Genital Tract Infection in Women.

AUTHORS Landers DV, Wiesenfeld HC, Heine P, Krohn MA, Hillier SL.

CITATION Am J Obstet Gynecol. 2004;190(4):1004-1010.

QUESTION Can experienced midlevel practitioners correctly diagnose vaginitis among women with vaginal complaints?

DESIGN Prospective, independent.

SETTING Three sites in Pittsburgh, Pennsylvania: a student health center, a public sexually transmitted disease clinic, and a suburban public health clinic. Two of the clinicians were physician assistants and 1 was a nurse practitioner. Each clinician underwent specific instruction for the study and had competency testing in the bedside tests and microscopic studies.

PATIENTS Women aged 18 to 45 years and with untreated genital complaints consisting of vaginal discharge, odor, itching, or lower genital tract burning.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD Each patient filled out a questionnaire and then received a speculum examination. The clinician recorded evidence of mucopurulent cervicitis and evaluated vaginal secretions for color, viscosity, homogeneity, and odor after the addition of potassium hydroxide (KOH) to a sample of the vaginal secretions. The secretions were used to perform a KOH microscopic evaluation, pH testing, Gram stain, and trichomonas and yeast cultures, along with endocervical cultures for sexually transmitted diseases and a Papanicolaou test. A clinical diagnosis for yeast was established from the microscopic KOH slide preparation that showed yeast. Trichomoniasis was diagnosed by observation of motile trichomonads on the microscopic slide. Bacterial vaginosis was established by applying Amsel criteria.1 The laboratory reference standard diagnosis for trichomonas and yeast was established by culture, and bacterial vaginosis was established by Gram stain examined for Nugent criteria.2

MAIN OUTCOME MEASURES Sensitivity and specificity of the clinical diagnosis compared with the laboratory diagnosis. The sensitivity and specificity of the various vaginal complaints for bacterial vaginosis could be calculated from data in the article.
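The likelihood ratios tabulated for this study follow from sensitivity and specificity measured against the reference standard. A minimal sketch of that relationship is shown below; the function name and the example values (80% sensitivity, 90% specificity) are hypothetical and are not data from the study.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) computed from sensitivity and specificity."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    if not 0.0 < specificity < 1.0:
        # Specificity of exactly 0 or 1 would force a division by zero below.
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)   # true-positive rate / false-positive rate
    lr_negative = (1.0 - sensitivity) / specificity   # false-negative rate / true-negative rate
    return lr_positive, lr_negative


# Hypothetical finding with 80% sensitivity and 90% specificity
lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

An LR+ well above 1 argues for the diagnosis when the finding is present, and an LR– well below 1 argues against it when the finding is absent; values near 1 change the probability very little.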

MAIN RESULTS Among these 598 women with vaginal complaints, at least 1 microbiologic diagnosis was established in 79%. The distribution was bacterial vaginosis, 49%; vaginal yeast, 29%; trichomoniasis, 12%; and chlamydia or gonorrhea, 11%. Women could be coinfected by multiple organisms. Tables 52-10, 52-11, and 52-12 show the value of symptoms for bacterial vaginosis, candidiasis, and trichomoniasis. Table 52-13 displays the likelihood ratio (LR) of the clinical diagnosis for each infection compared to a laboratory criterion standard.

CONCLUSIONS

LEVEL OF EVIDENCE Level 2.

STRENGTHS The criteria for the clinicians’ diagnoses are well outlined. Not only can the likelihood ratios (LRs) for the individual symptoms be reported but also the LRs for the bedside tests.

Table 52-10 Likelihood Ratios for Symptoms of Bacterial Vaginosis Compared With Amsel Criteria1

| Symptoms | LR+ (95% CI) | LR– (95% CI) |
|---|---|---|
| Vaginal odor | 3.2 (2.6-3.9) | 0.31 (0.25-0.39) |
| Change in discharge | 2.2 (1.8-2.6) | 0.38 (0.31-0.47) |
| Abnormal discharge | 1.9 (1.7-2.2) | 0.26 (0.19-0.35) |
| Dysuria | 1.5 (0.97-2.3) | 0.95 (0.89-1.1) |
| Vaginal burning | 1.3 (0.96-1.9) | 0.93 (0.86-1.0) |
| Vaginal pruritus | 1.2 (0.97-1.5) | 0.91 (0.81-1.0) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

LIMITATIONS These results were those of experienced midlevel practitioners who were specifically trained to do the clinical and microscopic examination. Not only were they trained but also they demonstrated competency in the performance of the bedside tests. Generalist physicians would have to ensure their competency in microscopic examinations of vaginal secretions to replicate the results. However, the authors provide accuracy data for these 2 microscopic studies compared with cultures.

A patient’s symptom of an abnormal vaginal odor makes bacterial vaginosis more likely, with an LR of 3.2 (95% confidence interval [CI], 2.6-3.9), whereas the absence of the odor makes vaginal candidiasis more likely, with an LR of 2.3 (95% CI, 2.0-2.7). A patient’s symptoms of a “change” in her vaginal discharge worked similarly (though not as well) to the presence of an odor: a change in the vaginal discharge made bacterial vaginosis more likely (LR, 2.2; 95% CI, 1.8-2.6), whereas no change in the discharge despite vaginal complaints increased the likelihood of candidiasis (LR, 1.9; 95% CI, 1.6-2.2). Vaginal pruritus was an inefficient finding for candidiasis. The symptoms have almost no value for diagnosing trichomoniasis. Although trichomoniasis is the least common of the 3 diagnoses, examination of a microscopic preparation for the organism is necessary. The presence of trichomonads on a microscopic specimen makes the diagnosis of trichomoniasis almost certain. The Amsel criteria1 for bacterial vaginosis and the presence of yeast on a KOH preparation are also much more useful than the individual clinical findings.

Reviewed by David L. Simel, MD, MHS

Table 52-11 Likelihood Ratios for Symptoms of Vaginal Candidiasis Compared With Culture

| Symptoms | LR+ (95% CI) | LR– (95% CI) |
|---|---|---|
| Vaginal pruritus | 1.1 (0.87-1.4) | 0.95 (0.84-1.1) |
| Vaginal burning | 0.58 (0.37-0.90) | 1.1 (1.0-1.2) |
| Change in discharge | 0.47 (0.37-0.60) | 1.9 (1.6-2.2) |
| Abnormal discharge | 0.40 (0.32-0.50) | 3.0 (2.4-3.7) |
| Dysuria | 0.37 (0.19-0.71) | 1.1 (1.0-1.2) |
| Vaginal odor | 0.22 (0.15-0.32) | 2.3 (2.0-2.7) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

Table 52-12 Likelihood Ratios for Symptoms of Vaginal Trichomoniasis Compared With Culture

| Symptom | LR+ (95% CI) | LR– (95% CI) |
|---|---|---|
| Vaginal burning | 1.1 (0.86-1.8) | 0.98 (0.87-1.1) |
| Dysuria | 1.1 (0.58-2.1) | 0.99 (0.90-1.1) |
| Vaginal odor | 1.0 (0.78-1.3) | 1.0 (0.79-1.3) |
| Vaginal pruritus | 1.0 (0.70-1.4) | 1.0 (0.84-1.2) |
| Change in discharge | 0.9 (0.7-1.2) | 1.1 (0.87-1.4) |
| Abnormal discharge | 0.81 (0.65-1.0) | 1.3 (1.0-1.8) |

Abbreviations: CI, confidence interval; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

Table 52-13 Likelihood Ratios for the Clinician’s Diagnosis Compared With Laboratory Diagnosis

| Clinical Diagnosis | LR+ (95% CI) | LR– (95% CI) |
|---|---|---|
| Trichomonas (wet preparation) | 21 (13-37) | 0.39 (0.29-0.53) |
| Candidiasis (KOH preparation) | 7.4 (3.8-15) | 0.80 (0.74-0.87) |
| Bacterial vaginosis (Amsel criteria) | 4.0 (3.2-4.9) | 0.11 (0.07-0.17) |

Abbreviations: CI, confidence interval; KOH, potassium hydroxide; LR+, positive likelihood ratio; LR–, negative likelihood ratio.

REFERENCES FOR THE EVIDENCE 1. Amsel R, Totten PA, Spiegel CA, Chen KC, Eschenbach D, Holmes KK. Nonspecific vaginitis: diagnostic criteria and microbial and epidemiologic associations. Am J Med. 1983;74(1):14-22. 2. Nugent RP, Krohn MA, Hillier SL. Reliability of diagnosing bacterial vaginosis is improved by a standardized method of Gram stain interpretation. J Clin Microbiol. 1991;29(2):297-301.

CHAPTER 53

Does This Dizzy Patient Have a Serious Form of Vertigo?

David A. Froehling, MD
Marc D. Silverstein, MD
David N. Mohr, MD
Charles W. Beatty, MD

CLINICAL SCENARIOS

Common Causes of Vertigo

CASE 1 A 52-year-old woman was admitted to the hospital because of nausea, a constant spinning sensation, and vomiting of 24 hours’ duration. Any movement of her head made these symptoms worse. On examination, she had bilateral horizontal spontaneous nystagmus. Two days later, after symptomatic improvement, she was discharged. At follow-up 2 weeks later, her symptoms and nystagmus had completely resolved.

CASE 2 A 70-year-old woman had a 4-month history of an intermittent whirling sensation when turning her head and especially when rolling over in bed. On examination, a left-side-down head-hanging maneuver elicited rotatory nystagmus, with the fast component to the left ear (Figure 53-1). There was a latency of about 3 seconds before the onset of nystagmus, which lasted approximately 10 seconds.

WHY EVALUATE VERTIGO?

Vertigo is defined in Merriam-Webster’s dictionary1 as a disturbance “in which the external world seems to revolve around the individual or in which the individual seems to revolve in space.” Vertigo is an illusion of motion2 and is one of several forms of dizziness. The word dizziness is derived from the old English word dysig, meaning foolish or stupid. The modern usage of the word includes “a whirling sensation in the head with a tendency to fall,” “mentally confused or dazed,” and “giddiness.”1

In one study3 from a general internal medicine outpatient clinic, dizziness was the third most frequent complaint of patients. In a national survey reported in 1989,4 it was the 13th most frequent reason for visits to internists in the United States. Dizziness is often a diagnostic problem in the emergency department.5 Among patients treated in an emergency department,5 in an outpatient clinic,6 and in 2 subspecialty dizziness clinics,7,8 vertigo was the most frequent category of dizziness.

Most patients with dizziness can be classified as having one of the following syndromes:
1. impaired perfusion of the central nervous system or near syncope (eg, orthostatic hypotension, cardiac presyncope)
2. dysequilibrium, a sensation of imbalance when standing or walking6 (eg, multiple sensory deficits)
3. psychogenic dizziness (eg, major depression, anxiety disorder, and somatization disorder)
4. vertigo (eg, Meniere disease and vestibular neuronitis)7

Usually dizziness can be classified according to information obtained from the medical history and physical examination. In this article, we concentrate on the evaluation of vertigo, the most common category of dizziness. Serious forms of vertigo are due to conditions associated with increased mortality or long-term disability. Vertigo severe enough to impair daily functioning and lasting for more than a month would be included as a serious form of vertigo. The importance of recognizing a patient’s complaint of dizziness as vertigo is that it narrows the list of possible causes. Customarily, the causes of vertigo are divided into central causes (lesions of the central nervous system) and peripheral causes (lesions of the vestibular labyrinth or nerve or both) (Table 53-1). Because of the importance of detecting lesions or diagnosing syndromes that can be treated and because of the need to determine prognosis, physicians should attempt to make a specific diagnosis for patients with vertigo. Most cases of vertigo are due to lesions of the vestibular nerve or labyrinth.5-8 In 2 dizziness clinics, the most common cause of vertigo was benign paroxysmal positional vertigo.7,8

Copyright © 2009 by the American Medical Association.

Figure 53-1 How to Test for Positional Nystagmus
Examiner rotates patient’s head laterally and extends patient’s neck.
Examiner lays patient down with head hanging off of table.
Examiner observes patient's eyes for appearance of nystagmus.
Examiner returns patient to seated position and allows rest for 30 seconds. The maneuver is repeated with the head extended and rotated in the opposite direction.
Positive indication: maneuver reproduces patient's vertiginous symptoms and creates nystagmus.
The Dix-Hallpike maneuver for positional vertigo is performed by the examiner, who stands at the head of the bed. As the patient is supported and lowered into a position whereby his or her rotated and extended head hangs off the end of the examining table, the examiner observes for nystagmus. In this view, the patient’s head has been rotated to the left and expresses nystagmus with a slow response to the right and a rapid response to the left. Repeating the maneuver with the head rotated in the opposite direction would reverse the direction of the nystagmus. A maneuver (with positive indication) will reproduce the patient’s symptoms.

Table 53-1 Common Causes of Vertigo
Peripheral
• Benign paroxysmal positional vertigo
• Vestibular neuronitis
• Recurrent vestibulopathy
• Classic Meniere disease
• Head trauma (labyrinthine concussion)
• Acoustic neuroma
• Otosclerosis
• Herpes zoster oticus
• Cholesteatoma
• Perilymph fistula
• Aminoglycoside ototoxicity
Central
• Vertebrobasilar transient ischemic attacks
• Cerebellar or brainstem stroke
• Brain tumors
• Multiple sclerosis
• Vertebrobasilar migraine

PATHOPHYSIOLOGY OF VERTIGO AND NYSTAGMUS

Origins of Vertigo The maintenance of the sense of balance and spatial orientation depends on input from the vestibular labyrinth, visual system, and proprioceptive nerves arising from tendons, muscles, and joints.9 The vestibular nuclei, which are in the medulla and lower pons, receive input from the vestibular labyrinth via the vestibular branch of cranial nerve VIII and from the cerebellum.10 The vestibular nuclei, in turn, send efferent fibers to the cerebellum, the medial longitudinal fasciculus, and the vestibulospinal tract. Visceral manifestations of vertigo (such as nausea and vomiting) are caused by altered input to the dorsal nucleus of the vagus nerve from the vestibular nuclei. Conscious awareness of vertigo resides in the superior temporal gyrus of the cerebral cortex9 and involves a mismatch between input to the cerebral cortex from the visual, proprioceptive, and vestibular systems.11 Lesions in various locations, including the inner ear, brain stem, and cerebellum, may all be manifested as vertigo.


Origins of Nystagmus Nystagmus is the objective accompaniment of vertigo and is defined best as a “rhythmical oscillation of the eyes, with a fast movement in one direction and a slow movement in the other.”12 The fast component may be horizontal, vertical, rotatory, or any combination of these.13 There are 2 clinically relevant kinds of nystagmus in evaluating vertigo: Spontaneous nystagmus is elicited by having the patient look straight ahead, up, down, to the right, and to the left. This type of nystagmus is not influenced by head position.14 It is normal to have a few beats of nystagmus with extreme lateral gaze.13 Positional nystagmus is elicited by a head-hanging maneuver (Figure 53-1).13 Altered input passing from the vestibular nuclei to the nuclei of the extraocular muscles through the medial longitudinal fasciculus and related pathways in the reticular formation produces nystagmus. This input may be modified by information arising from the cerebral cortex and the cerebellum.13 For example, the fast component of spontaneous nystagmus depends on interaction between the vestibular system and the cerebral cortex.15

Vertigo

HOW TO ELICIT THE SYMPTOMS AND SIGNS OF VERTIGO

First, Distinguish Vertigo From Other Causes of Dizziness

Patients often have difficulty describing symptoms of dizziness, and even those who have disorders that produce vertigo may not clearly describe a hallucination of movement. As Olsson and Atkins16 pointed out, “A person is so rarely conscious of his own vestibular system, he has a great deal of trouble describing his symptoms to a doctor.” Thus, clues must be gathered from the medical history and physical examination to classify the dizziness properly.

Dizziness when standing may be due to vertigo, decreased cerebral perfusion,17 or dysequilibrium.6 If the patient reports having symptoms of dizziness primarily while standing, the blood pressure should be checked with the patient in the supine position and also after standing for 5 minutes. If there is an orthostatic decrease in blood pressure, the symptom is likely due to impaired central nervous system perfusion.

Unsteadiness while walking, especially in elderly patients, is often due to dysequilibrium (a feeling of imbalance). The cause is usually multifactorial. On examination, the findings of decreased visual acuity and signs of peripheral neuropathy or abnormal vestibular function support a diagnosis of dysequilibrium.6,7

Dizziness when turning, and especially when rolling over in bed, is usually due to vertigo.

Psychogenic dizziness is a diagnosis of exclusion that should be considered especially in patients with psychiatric illnesses, such as major depression, anxiety disorder, and a somatization disorder. In this setting, the patient should be asked to hyperventilate for 2 minutes and then asked whether the feeling associated with hyperventilation is exactly the same as the dizzy symptom. The physician should initially hyperventilate along with the patient; this approach encourages the patient and demonstrates the desired rate and depth of breathing for the test.18 If hyperventilation reproduces the symptom, the dizziness is often psychogenic. However, the usefulness of hyperventilation in diagnosing psychogenic dizziness is unclear. In a study by Kroenke et al6 of 100 ambulatory patients with a chief complaint of dizziness, symptoms of dizziness were reproduced by hyperventilation in 21; however, only 1 of these patients had hyperventilation as the primary cause of dizziness. Most of them had dizziness inducible by other maneuvers in addition to hyperventilation. Further studies of the hyperventilation maneuver in the evaluation of patients with suspected psychogenic dizziness are needed. In this study of 100 patients, only 16% had pure psychogenic dizziness, but 24% had other causes of dizziness exacerbated by psychiatric illness.6

Second, Take a Proper Medical History From Patients With Vertigo

After it is clear that the patient is describing vertigo, further questions help elicit clues about its specific cause.

Ask When the Dizziness Occurs

It is probably more important to ask a patient about the circumstances in which the dizziness occurs than to ask for a description of the dizziness. Dizziness related to early-morning activities is somewhat helpful in distinguishing between peripheral and central vertigo. Matutinal vertigo (vertigo on first arising in the morning) is usually due to a peripheral vestibular disorder.19

Ask About Other Otologic Symptoms

Associated otologic symptoms can be helpful in identifying a peripheral cause of vertigo. Hearing loss and vertigo are common in patients with otosclerosis.20 Episodes of hearing loss with vertigo, tinnitus, and a sensation of fullness in the ear occur in patients with Meniere disease.21 Patients with acoustic neuromas usually present with hearing loss rather than vertigo. Most of these patients notice dizziness but complain of unsteadiness rather than vertigo.22

Ask About Other Neurologic Symptoms

Symptoms of neurologic disease, such as weakness, difficulty with speech, or diplopia, in addition to vertigo suggest a central cause.

Ask About Symptom Patterns

Patients with vestibular neuronitis (also called labyrinthitis), benign paroxysmal positional vertigo, and recurrent vestibulopathy (also called benign recurrent vertigo and vestibular Meniere disease) have normal hearing.23-26 Patients with benign paroxysmal positional vertigo23 (also called benign paroxysmal positional nystagmus27 and cupulolithiasis28) have intermittent episodes of vertigo with head turning.23,29 Vestibular neuronitis is characterized by a relatively sudden onset of severe, constant vertigo (made worse by head movement) that resolves after days or weeks.23,30 Patients with recurrent vestibulopathy have intermittent episodes of constant vertigo lasting for minutes or hours.24,25 Vertigo (with or without hearing loss) in a patient who has recently received aminoglycoside antibiotics may be due to the toxic effect these agents have on the vestibular labyrinth.31

Table 53-2 Accuracy of Signs and Symptoms for Diagnosing Peripheral Vertigo in an Emergency Department(a)

Finding | No. of Patients With Peripheral Vertigo (Not an Emergency) | No. of Patients With Other Causes of Dizziness That Might Be an Emergency | Total | Predictive Value, % | Likelihood Ratio
Positive cluster of signs and symptoms(b) | 23 | 4 | 27 | Positive 85 (23/27) | 7.6
Lack of one or more elements in cluster | 31 | 67 | 98 | Negative 68 (67/98) | 0.6
Total | 54 | 71 | 125 | …(c) | …

(a) Data from Herr et al.5
(b) Positive cluster includes positive results on head-hanging maneuver plus either vertigo or vomiting.
(c) Ellipses indicate not applicable.

How to Examine Patients With Vertigo

Findings on physical examination can help physicians detect abnormalities that can be used to determine the cause of vertigo.

Perform a Brief Neurologic Examination

Look for cranial nerve palsies, weakness, reflex changes, ataxia, decreased sensation in the feet, and abnormalities of gait and station. Vertical nystagmus is associated with lesions of the vestibular nuclei or of the cerebellar vermis.13 Neurologic findings other than pathologic nystagmus suggest that the lesion is central.

Examine the Ears

Hearing should be checked.32 Cholesteatoma, a complication of chronic otitis media that can present with hearing loss, drainage from the ear, and vertigo, may be found33; the usual treatment for this is surgery. Alternatively, vesicles associated with herpes zoster oticus (also called Ramsay Hunt syndrome) may be present; patients with this condition often have facial palsy and deafness, together with vertigo.34

Check for Spontaneous Nystagmus

Patients with vestibular neuronitis usually have spontaneous horizontal nystagmus or a mixture of spontaneous horizontal nystagmus and rotatory nystagmus.30 Patients with disorders of the central nervous system may also have spontaneous nystagmus.35 In most of the patients examined by Silvoniemi,30 Lachman and Stahle,36 and Aantaa and Virolainen,37 nystagmus was readily apparent, but in some, detection required Frenzel glasses or electronystagmographic monitoring with the patients’ eyes closed. Patients with vestibular neuronitis may also have positional nystagmus.30 Patient 1 in the clinical scenarios had vestibular neuronitis.

Perform a Head-Hanging Maneuver

Most physicians test for positional nystagmus with a method first outlined by Dix and Hallpike23 and more recently by Mohr.29 The head-hanging maneuver begins with the patient in a sitting position, with gaze fixed on the examiner’s forehead (Figure 53-1). The examiner firmly grasps the patient’s head and has the patient quickly lie supine, with the head turned about 30 degrees to one side and about 30 degrees below the level of the examining table. Next, the patient sits up, and the maneuver is repeated with the head turned to the opposite side. In 1979, Baloh et al38 observed that if the maneuver was performed slowly (during a period of 20 seconds), nystagmus was not induced; thus, they recommended performing the position change in about 2 seconds. After each head-hanging maneuver, the physician should observe the patient’s eyes for 5 to 15 seconds to determine whether nystagmus has been induced.29 Overall, it takes about 3 to 5 minutes to explain the head-hanging maneuver to the patient, to perform the position changes, and to observe for nystagmus.

Benign paroxysmal positional vertigo is the most common cause of vertigo7,8 and can usually be suspected on the basis of the medical history alone. Features of this syndrome include vertigo that occurs only with positional changes and an associated positional nystagmus that is usually rotatory, with a vertical or horizontal component. Also, the nystagmus usually begins 5 to 15 seconds after the head-hanging maneuver, lasts 2 to 30 seconds, and, if the patient is repeatedly returned to the provocative position, occurs less and less until it cannot be induced.23,29 Positional nystagmus cannot always be elicited in a patient with a history otherwise compatible with the diagnosis of benign paroxysmal positional vertigo.39-41 Its occurrence during a head-hanging maneuver occasionally makes a vague description of dizziness clearer. Rarely, patients with central nervous system lesions may present with positional vertigo and nystagmus and with no other neurologic abnormality.42 Patient 2 in the clinical scenarios had benign paroxysmal positional vertigo.

Learning how to check for positional nystagmus usually requires practice. Always explain to the patient what you are going to do before performing a head-hanging maneuver. Specifically, ask the patient to keep the eyes open if he or she becomes vertiginous; many patients close their eyes if vertigo develops. The head-hanging maneuver should be performed quickly but not so rapidly as to injure the patient. Be observant because the nystagmus may last only a few seconds.

Accuracy of the Symptoms and Signs of Vertigo

Data are available on 3 clinically relevant questions about the accuracy of the clinical examination in patients with vertigo.

1. Can positional nystagmus identify patients with benign paroxysmal positional vertigo? The answer is, not very well. Only 198 of 255 patients with positional vertigo examined in a dizziness clinic had positional nystagmus during initial and subsequent examinations (sensitivity, 78%).39 In an epidemiologic study of positional vertigo, only 13 of 26 patients tested had positional nystagmus (sensitivity, 50%).41

2. Can matutinal vertigo distinguish peripheral causes from central causes of vertigo? Again, the answer is, not very well. In a study of 100 neurology patients (48 of whom had matutinal vertigo), matutinal vertigo had a sensitivity of 51% and a specificity of 69% for peripheral disorders,43 and in an epidemiologic study, symptoms of vertigo when rolling over in bed generated a sensitivity of 40% for benign paroxysmal positional vertigo.41

3. Can any set of symptoms and signs distinguish urgent causes from nonurgent causes of dizziness? Symptoms and signs can help identify patients in need of an urgent evaluation, as shown in Tables 53-2 and 53-3, which are from a study of 125 emergency department patients with the complaint of dizziness.5 Patients who had the highly specific cluster of positive results on the head-hanging test and either vertigo or vomiting almost always had a nonurgent peripheral vertigo (a finding with high specificity, if positive, tends to rule in the target disorder). In Table 53-3, the high sensitivity (87%) of the absence of vertigo or age older than 69 years or the presence of a neurologic deficit for a serious cause of dizziness meant that younger patients with vertigo but no neurologic deficit were unlikely to have an urgent cause of dizziness (a finding with high sensitivity, if negative, tends to rule out the target disorder). These reassuring results of the accuracy of the clinical examination come from a single study in an emergency department with rates of peripheral vertigo and serious disease characteristic of such settings; they need independent confirmation in different settings. Although the nonurgent causes of dizziness may not require immediate hospitalization, some of the causes of peripheral vertigo (eg, acoustic neuroma) deserve further diagnostic study.

Table 53-3 Accuracy of Signs and Symptoms for Detecting Serious Causes of Dizziness in an Emergency Department(a)

Finding | No. of Patients With Serious Causes of Dizziness(b) | No. of Patients With Nonserious Causes of Dizziness | Total | Predictive Value, % | Likelihood Ratio
Absence of vertigo, age >69 y, or neurologic deficit | 33 | 50 | 83 | Positive 40 (33/83) | 1.5
Presence of vertigo, age ≤69 y, and no neurologic deficit | 5 | 37 | 42 | Negative 88 (37/42) | 0.3
Total | 38 | 87 | 125 | …(c) | …

(a) Data from Herr et al.5
(b) Serious causes of dizziness include medication adverse effects, seizures, stroke, and cardiac arrhythmia.
(c) Ellipses indicate not applicable.
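The predictive values and likelihood ratios reported in Tables 53-2 and 53-3 can be recomputed from the cell counts. The following is a minimal sketch, not part of the original review; the class name TwoByTwo and its field names are illustrative assumptions, and the counts are copied from the two tables (Herr et al5).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of a 2x2 table (finding vs target disorder). Hypothetical helper."""
    tp: int  # finding present, target disorder present
    fp: int  # finding present, target disorder absent
    fn: int  # finding absent, target disorder present
    tn: int  # finding absent, target disorder absent

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Positive predictive value: disorder present given finding present
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Negative predictive value: disorder absent given finding absent
        return self.tn / (self.tn + self.fn)

    @property
    def lr_positive(self) -> float:
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_negative(self) -> float:
        return (1 - self.sensitivity) / self.specificity


# Table 53-2: target disorder is peripheral vertigo
table_53_2 = TwoByTwo(tp=23, fp=4, fn=31, tn=67)
# Table 53-3: target disorder is a serious cause of dizziness
table_53_3 = TwoByTwo(tp=33, fp=50, fn=5, tn=37)

for name, t in (("Table 53-2", table_53_2), ("Table 53-3", table_53_3)):
    print(
        f"{name}: sensitivity={t.sensitivity:.2f} specificity={t.specificity:.2f} "
        f"PPV={t.ppv:.2f} NPV={t.npv:.2f} "
        f"LR+={t.lr_positive:.1f} LR-={t.lr_negative:.1f}"
    )
```

Rounded to the precision used in the tables, these calculations reproduce the published predictive values and likelihood ratios, as well as the 87% sensitivity quoted for Table 53-3. The same sensitivity formula gives 198/255 = 78% and 13/26 = 50% for positional nystagmus in question 1.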

THE BOTTOM LINE

The following are our recommendations on useful symptoms and signs in the evaluation of patients with dizziness:

1. In patients with suspected vertigo, ask whether they have dizziness when changing body position (rolling over in bed, looking up at the ceiling, or bending over to tie shoelaces) and perform a head-hanging maneuver to check for positional nystagmus.
2. In combination with other data (including a brief neurologic examination) in an emergency department setting, the presence of positional nystagmus can be useful when evaluating for serious causes of dizziness.

Author Affiliations at the Time of the Original Publication

Division of Area General Internal Medicine (Drs Froehling, Silverstein, and Mohr), Department of Health Sciences Research (Dr Silverstein), and Department of Otorhinolaryngology (Dr Beatty), Mayo Clinic and Mayo Foundation, Rochester, Minnesota.

REFERENCES

1. Webster’s Third New International Dictionary of the English Language, Unabridged. Springfield, MA: Merriam-Webster Inc; 1986:664.
2. Adams RD, Victor M. Deafness, dizziness, and disorders of equilibrium. In: Principles of Neurology. 3rd ed. New York, NY: McGraw-Hill Book Co; 1985:216-218.
3. Kroenke K, Mangelsdorff AD. Common symptoms in ambulatory care: incidence, evaluation, therapy, and outcome. Am J Med. 1989;86(3):262-266.
4. Woodwell DA. Office visits to internists, 1989. Adv Data. 1992;(209):1-10.
5. Herr RD, Zun L, Mathews JJ. A directed approach to the dizzy patient. Ann Emerg Med. 1989;18(6):664-672.
6. Kroenke K, Lucas CA, Rosenberg ML, et al. Causes of persistent dizziness: a prospective study of 100 patients in ambulatory care. Ann Intern Med. 1992;117(11):898-904.
7. Drachman DA, Hart CW. An approach to the dizzy patient. Neurology. 1972;22(4):323-334.
8. Nedzelski JM, Barber HO, McIlmoyl L. Diagnoses in a dizziness unit. J Otolaryngol. 1986;15(2):101-104.
9. Frederick MW. Central vertigo. Otolaryngol Clin North Am. 1973;6(1):267-285.
10. Kelly JP. Vestibular system. In: Kandel ER, Schwartz JH, eds. Principles of Neural Science. 2nd ed. New York, NY: Elsevier Science Publishing Co; 1985:591-595.
11. Lehrer JF, Poole DC. Diagnosis and management of vertigo. Compr Ther. 1987;13(9):31-40.
12. Rowland LP. Clinical syndromes of the brain stem. In: Kandel ER, Schwartz JH, eds. Principles of Neural Science. 2nd ed. New York, NY: Elsevier Science Publishing Co; 1985:599.
13. Mayo Clinic Department of Neurology. Clinical Examinations in Neurology. 5th ed. Philadelphia, PA: WB Saunders Co; 1981:63-95.
14. Nylén CO. Positional nystagmus: a review and future prospects. J Laryngol Otol. 1950;64(6):295-318.
15. Plum F, Posner JB. The Diagnosis of Stupor and Coma. 3rd ed. Philadelphia, PA: FA Davis Co; 1980:58.
16. Olsson JE, Atkins JS. Vestibular disorders. Otolaryngol Clin North Am. 1987;20(1):83-111.
17. Barber HO. Positional vertigo and nystagmus. Otolaryngol Clin North Am. 1973;6(1):169-187.
18. Magarian GJ. Hyperventilation syndromes: infrequently recognized common expressions of anxiety and stress. Medicine (Baltimore). 1982;61(4):219-236.
19. Fisher CM. Vertigo in cerebrovascular disease. Arch Otolaryngol. 1967;85(5):529-534.
20. Thomas JE, Cody DTR. Neurologic perspectives of otosclerosis. Mayo Clin Proc. 1981;56(1):17-21.
21. Pulec JL. Meniere’s disease: etiology, natural history, and results of treatment. Otolaryngol Clin North Am. 1973;6(1):25-39.
22. Harner SG, Laws ER Jr. Diagnosis of acoustic neurinoma. Neurosurgery. 1981;9(4):373-379.
23. Dix MR, Hallpike CS. The pathology, symptomatology, and diagnosis of certain common disorders of the vestibular system. Proc R Soc Med. 1952;45(6):341-354.
24. Leliever WC, Barber HO. Recurrent vestibulopathy. Laryngoscope. 1981;91(1):1-6.
25. Slater R. Benign recurrent vertigo. J Neurol Neurosurg Psychiatry. 1979;42(4):363-367.
26. Alford BR. Meniere’s disease: criteria for diagnosis and evaluation of therapy for reporting (report of Subcommittee on Equilibrium and Its Measurement). Trans Am Acad Ophthalmol Otolaryngol. 1972;76(6):1462-1464.
27. Harbert F. Benign paroxysmal positional nystagmus. Arch Ophthalmol. 1970;84(3):298-302.
28. Schuknecht HF. Cupulolithiasis. Arch Otolaryngol. 1969;90(6):765-778.
29. Mohr DN. The syndrome of paroxysmal positional vertigo: a review. West J Med. 1986;145(5):645-650.
30. Silvoniemi P. Vestibular neuronitis: an otoneurological evaluation. Acta Otolaryngol Suppl (Stockh). 1988;453:1-72.
31. Jackson GG, Arcieri G. Ototoxicity of gentamicin in man: a survey and controlled analysis of clinical experience in the United States. J Infect Dis. 1971;124(suppl):S130-S137.
32. Bagai A, Thavendiranathan P, Detsky AS. Does this patient have hearing impairment? JAMA. 2006;295(4):416-428.
33. Vernick DM, Branch WT Jr. The painful or discharging ear. In: Branch WT Jr, ed. Office Practice of Medicine. 2nd ed. Philadelphia, PA: WB Saunders Co; 1987:291-293.
34. Adams RD, Victor M. Viral infections of the nervous system. In: Principles of Neurology. 3rd ed. New York, NY: McGraw-Hill Book Co; 1985:552-553.
35. Nylén CO. The oto-neurological diagnosis of tumours of the brain. Acta Otolaryngol. 1939;33(suppl):5-151.
36. Lachman J, Stahle J. Vestibular neuritis: a clinical and electronystagmographic study. Neurology. 1967;17(4):376-380.
37. Aantaa E, Virolainen E. Vestibular neuronitis: a follow-up study. Acta Otorhinolaryngol Belg. 1979;33(3):401-404.
38. Baloh RW, Sakala SM, Honrubia V. Benign paroxysmal positional nystagmus. Am J Otolaryngol. 1979;1(1):1-6.
39. Katsarkas A, Kirkham TH. Paroxysmal positional vertigo: a study of 255 cases. J Otolaryngol. 1978;7(4):320-330.
40. Baloh RW, Honrubia V, Jacobson K. Benign positional vertigo: clinical and oculographic features in 240 cases. Neurology. 1987;37(3):371-378.
41. Froehling DA, Silverstein MD, Mohr DN, Beatty CW, Offord KP, Ballard DJ. Benign positional vertigo: incidence and prognosis in a population-based study in Olmsted County, Minnesota. Mayo Clin Proc. 1991;66(6):596-601.
42. Watson P, Barber HO, Deck J, Terbrugge K. Positional vertigo and nystagmus of central origin. Can J Neurol Sci. 1981;8(2):133-137.
43. Berkowitz BW. Matutinal vertigo: clinical characteristics and possible management. Arch Neurol. 1985;42(9):874-877.

UPDATE: Vertigo

Prepared by David L. Simel, MD, MHS
Reviewed by David A. Froehling, MD, and Richard Bedlack, MD, PhD

CLINICAL SCENARIO

A 58-year-old healthy man presents with dizziness. One week ago, he had an upper respiratory illness consisting of a slight fever, cough, and rhinorrhea. During the previous 2 days, he has had 3 episodes of extreme unbalance lasting less than 3 to 4 minutes, when he felt as if he were “drunk.” During these episodes, he felt nauseated, which caused him to lie down and close his eyes until the symptoms resolved. He has had no hearing loss. Your neurologic examination reveals no focal findings in the cranial or peripheral nerves.

Original Review

Froehling DA, Silverstein MD, Mohr DN, Beatty CW. Does this dizzy patient have a serious form of vertigo? JAMA. 1994;271(5):385-388.

UPDATED LITERATURE SEARCH

The focus of the original Rational Clinical Examination article and this update is on the vestibular disorders characterized by true vertigo. True vertigo creates a sensation of rotation. Although the initial publication approached vertigo from a general perspective, we sought to find updated information on the diagnosis of benign positional vertigo, the most common cause of vertiginous symptoms. We used the search terms “vertigo/di,” “exp dizziness,” and the text words “$Hallpike,” “Eply,” or “benign positional vertigo” to identify English-language articles on vertigo in adults, published between 1993 and November 2004. After excluding case reports, letters, and general reviews, we were left with 154 articles. These were searched to identify studies using prospective data collection and that reported the sensitivity, specificity, or predictive values of clinical findings in patients who presented to their physician with complaints of dizziness. A systematic review1 evaluated the distribution of diagnoses among patients with dizziness. A second general systematic review2 without any quantitative formal research question provides a useful reference list for clinical descriptions of the common causes of vertigo. We found 1 additional article that prospectively evaluated patients in a clinical population, using a patient questionnaire for diagnosing vertigo.

NEW FINDINGS

• The response to the Dix-Hallpike maneuver serves as a reasonable reference standard for benign positional vertigo because it identifies patients who will respond to canalith repositioning maneuvers.
• Hearing loss, part of the examination of the dizzy patient, has been reviewed in The Rational Clinical Examination series and can be assessed with the whispered voice test.3

Details of the Update

Patients with dizziness may have a variety of disorders, so diagnosing benign positional vertigo requires an understanding of its overall incidence in relation to other etiologies. Peripheral vestibular disorders are the most common causes of dizziness (about 40% of patients with dizziness), of which benign positional vertigo and vestibular neuronitis are the most frequent diagnoses. Retrospective studies tend to find a higher incidence of benign positional vertigo than those that enroll dizzy patients prospectively. Clinicians (and patients) may be overly concerned with brain tumors when there is a new symptom of vertigo, but the likelihood that a dizzy patient without hearing loss will have a cerebellopontine angle mass responsible for the symptoms is low (probability, 1 × 10^-4).4 Among patients with dizziness associated with asymmetric hearing loss, a clinician would need to perform 638 scans to detect 1 cerebellopontine angle mass (compared with 9307 scans for dizzy patients without hearing loss). Thus, the approach to clinical diagnosis should more appropriately focus on attempts to rule in less serious causes of vertigo (eg, benign positional vertigo), rather than an initial effort to rule out serious causes such as tumors.

We found a systematic review1 that identified 2 retrospective studies suggesting that the clinical history alone allows proper diagnosis of 69% to 76% of dizzy patients. We also found a prospective study in a small group of patients referred to an otolaryngologist where patient history was collected through a questionnaire.5 The questionnaire directs the clinician to the more common causes of vertigo and would have allowed correct categorization of 61% of the patients with true vertigo according to whether they had episodic (<1 week) or persistent (>1 week) vertigo and hearing loss or no hearing loss. See Box 53-1. The questionnaire requires validation in a much larger population of patients and in different clinical settings (emergency departments and primary care clinics) because the patient may not belong clearly in one category, requiring clinical judgment. However, the questions do provide a reasonable paradigm for the initial line of questioning for the vertiginous patient.

Once the medical history is obtained, perhaps narrowing the diagnosis to the most likely causes, specialists use a variety of clinical maneuvers. The maneuvers assess the vestibuloocular reflex through the nystagmus response to a head thrust, through fixation suppression, after a headshake, through caloric testing, or through visual acuity during head shaking.6 Unfortunately, the maneuvers have not been assessed in primary care clinics or emergency departments to evaluate whether they add information to the Dix-Hallpike during a patient’s initial presentation for care and before referral.

Box 53-1 Establish the Initial Diagnosis After Understanding the Patient’s History

Patient Symptoms | Initial Diagnosis
No hearing loss + episodic vertigo | Benign positional vertigo
No hearing loss + persistent vertigo | Vestibular neuronitis
Hearing loss + episodic vertigo | Meniere disease
Hearing loss + persistent vertigo | Labyrinthitis

IMPROVEMENTS IN THE DATA PRESENTED IN THE ORIGINAL PUBLICATION

A systematic review provides a useful taxonomy for patients with disorders creating dizziness, improving the information provided in Table 53-1 of the original article (Figure 53-2).1 The vestibular disorders are further sorted by those that represent peripheral vestibular problems (“less serious” in terms of the underlying etiology, though often creating a significant problem with activities of daily living) vs central vestibular disorders (Figure 53-3).

CHANGES IN THE REFERENCE STANDARD

The diagnosis of vestibular disorders relies on the direct observation of eye movements during positional testing in a patient with no focal neurologic findings or central nervous system disease. The clinical definition of benign positional vertigo that requires a positive Dix-Hallpike maneuver result is supported by a meta-analysis of randomized trials of canalith repositioning procedures.7 The randomized trials demonstrated that, within 1 month of treatment, patients with a positive Dix-Hallpike maneuver result benefit from the repositioning procedures with symptom resolution (number needed to treat = 3). Furthermore, the positive Dix-Hallpike maneuver result returns to normal at a rate similar to that of the symptom improvement.
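The history-based sorting in Box 53-1 amounts to a lookup on two yes/no answers. The sketch below only illustrates that structure; the function name and the boolean encoding are assumptions, and it is not a validated clinical tool (the questionnaire itself still requires validation, as noted above).

```python
def initial_diagnosis(hearing_loss: bool, episodic: bool) -> str:
    """Return the initial diagnosis suggested by Box 53-1.

    hearing_loss: patient reports hearing loss
    episodic: vertigo is episodic (True) or persistent (False)
    Both parameter names are illustrative assumptions.
    """
    box_53_1 = {
        (False, True): "Benign positional vertigo",
        (False, False): "Vestibular neuronitis",
        (True, True): "Meniere disease",
        (True, False): "Labyrinthitis",
    }
    return box_53_1[(hearing_loss, episodic)]


# The patient in the clinical scenario: no hearing loss, brief episodes
print(initial_diagnosis(hearing_loss=False, episodic=True))
```

Because a patient may not belong clearly in one category, the result is only a starting point for the examination, not a final diagnosis.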

RESULTS OF LITERATURE REVIEW

The Dix-Hallpike maneuver can be done in most patients, but some cannot tolerate it. A small study of patients with benign positional vertigo showed that the maneuver could be performed with a different motion by having the patient lie down on his or her side.8 The examiner supports the head while the patient looks to the left at a 45-degree angle and rapidly lies down on the right side. The maneuver is repeated with the patient looking to the right and rapidly going from the sitting position to lying on the left side. The patient should cross the arms to prevent inadvertently stopping the motion as the physician helps with the maneuver. The agreement with the Dix-Hallpike maneuver is moderate (κ = 0.60; 95% confidence interval, 0.32-0.89). However, patients with back or neck problems may not be able to perform the side-lying maneuver any more easily than the Dix-Hallpike maneuver.9 A partial list of the absolute contraindications to either maneuver includes a history of neck surgery, severe rheumatoid arthritis, cervical myelopathy, cervical radiculopathy, carotid syncope, neck trauma, or vascular diseases of the neck.

Figure 53-2 Dizziness Taxonomy

Dizziness
• Vestibular (~50%): true vertigo, rotational sensation
    • Peripheral (~40% all dizziness): affects inner ear and cranial nerve VIII
    • Central nervous system (~10% all dizziness)
• Psychiatric (~8%): lightheadedness; anxiety or depression
• Presyncope (~9%): impending faint
• Dysequilibrium (~3%): unsteady when walking; no dizziness when sitting or lying down
• Other(a) and undiagnosed (~30%)

(a) “Other” includes drug toxicity, substance abuse, and a variety of medical illnesses.

EVIDENCE FROM GUIDELINES

No federal guidelines address the systematic evaluation of dizzy patients.

CLINICAL SCENARIO—RESOLUTION The patient’s clinical history is informative. He almost certainly has benign positional vertigo or vestibular neuronitis related to his previous viral infection. A Dix-Hallpike maneuver result would likely be positive. No additional laboratory studies or radiologic imaging is necessary with this initial presentation of true vertigo.
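The case against routine imaging rests on the scan yields quoted in the Details of the Update (638 scans per cerebellopontine angle mass with asymmetric hearing loss vs 9307 without hearing loss). As a rough check, the per-scan probability is the reciprocal of the number of scans needed; the function name below is an illustrative assumption.

```python
def probability_from_scans_needed(scans_per_detection: int) -> float:
    """Per-scan probability implied by 'N scans to detect 1 mass' (hypothetical helper)."""
    return 1 / scans_per_detection


with_asymmetric_hearing_loss = probability_from_scans_needed(638)
without_hearing_loss = probability_from_scans_needed(9307)

# How many times more likely a mass is when asymmetric hearing loss is present
relative_yield = with_asymmetric_hearing_loss / without_hearing_loss

print(f"with asymmetric hearing loss: {with_asymmetric_hearing_loss:.2e}")
print(f"without hearing loss:         {without_hearing_loss:.2e}")
print(f"relative yield:               {relative_yield:.1f}")
```

The probability without hearing loss rounds to the 1 × 10^-4 reported in the text, which is why a patient such as the one in this scenario does not need imaging at the initial presentation.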

Figure 53-3 Vestibular Disorders

True vertigo (rotational sensation)

Central vestibular disorder
• Cerebrovascular disease: vertical nystagmus; neurologic examination findings
• Tumors: acoustic neuroma; unilateral hearing loss
• Central nervous system disorder: multiple sclerosis; migraine

Peripheral vestibular disorder (affects inner ear and cranial nerve VIII)
• Benign positional vertigo: brief, intense; associated with changing head position
• Vestibular neuronitis: no hearing loss; severe episodes associated with nausea; lasting hours to days
• Labyrinthitis: hearing loss; severe episodes associated with nausea; lasting hours to days
• Meniere disease: episodic, lasting hours; hearing loss, tinnitus, ear fullness
• Other causes: associated with illness or other disorders; idiopathic

VERTIGO—MAKE THE DIAGNOSIS

PRIOR PROBABILITY

Once the medical history confirms vertigo in a patient with dizziness, most affected patients will have a peripheral vestibular disorder (40%). The prior probability of benign positional vertigo among dizzy patients is 10%.

POPULATION FOR WHOM VERTIGO SHOULD BE CONSIDERED

• Benign positional vertigo should be considered only in patients who volunteer that they have dizziness symptoms.

DETECTING THE LIKELIHOOD OF VERTIGO

The medical history identifies the patient with true vertigo, whereas the clinical examination results identify patients with benign positional vertigo. The responses to the maneuvers are not screening tests with an associated sensitivity and specificity because they define the diagnosis of benign positional vertigo.

REFERENCE STANDARD TESTS

The diagnosis requires direct observation of eye movements during positional testing in a patient with no focal neurologic findings or central nervous system disease. Prospective clinical studies might put more weight on the observations by a specialist, but no comparison studies between generalist physicians and specialist physicians have evaluated the accuracy of generalist clinicians.


REFERENCES FOR THE UPDATE

1. Hoffman RM, Einstadter D, Kroenke K. Evaluating dizziness. Am J Med. 1999;107(5):468-478.(a)
2. Hanley K, O’Dowd T, Considine N. A systematic review of vertigo in primary care. Br J Gen Pract. 2001;51(469):666-671.
3. Bagai A, Thavendiranathan P, Detsky AS. Does this patient have hearing impairment? JAMA. 2006;295(4):416-428.
4. Gizzi M, Riley E, Molinari S. The diagnostic value of imaging the patient with dizziness. Arch Neurol. 1996;53(12):1299-1304.
5. Kentala E, Rauch SD. A practical assessment algorithm of diagnosis of dizziness. Otolaryngol Head Neck Surg. 2003;128(1):54-59.(a)
6. Goebel JA. The ten-minute examination of the dizzy patient. Semin Neurol. 2001;21(4):391-398.
7. Woodworth BA, Gillespie MB, Lambert PR. The canalith repositioning procedure for benign positional vertigo: a meta-analysis. Laryngoscope. 2004;114(7):1143-1146.
8. Cohen HS. Side-lying as an alternative to the Dix-Hallpike test of the posterior canal. Otol Neurotol. 2004;25(2):130-134.
9. Humphriss RL, Baguley DM, Sparkes V, Peerman SE, Moffat DA. Contraindications to the Dix-Hallpike manoeuvre: a multidisciplinary review. Int J Audiol. 2003;42(3):166-173.

(a) For the Evidence to Support the Update for this topic, see http://www.JAMAevidence.com.

EVIDENCE TO SUPPORT THE UPDATE: Vertigo

TITLE Evaluating Dizziness. AUTHORS Hoffman RM, Einstadter D, Kroenke K. CITATION Am J Med. 1999;107(5):468-478. QUESTION What are the frequencies of various causes of dizziness? DESIGN Formal systematic review without meta-analysis.

53

Table 53-5 Frequency of Various Causes of Dizziness Disorder (n = 7 Studies) Peripheral vestibular disorder Central vestibular disorder Presyncope Psychiatric Dysequilibrium

Prevalence (95% CI) 0.40 (0.27-0.54) 0.09 (0.06-0.13) 0.09 (0.06-0.13) 0.08 (0.05-0.12) 0.03 (0.001-0.10)

Abbreviation: CI, confidence interval.

DATA SOURCE MEDLINE database. STUDY SELECTION AND ASSESSMENT The authors identified studies of adults with dizziness, published in English between 1966 and 1996, indexed with the following search terms: “dizziness” and “vertigo” with “vestibular function tests,” “electronystagmography,” “calorics,” “nystagmus,” “Barany,” “Hallpike,” “caloric testing,” and “brainstem auditory evoked responses.” An initial 1755 references were identified and then filtered down to 229 references that met the initial criteria; an additional 44 articles were retrieved from the reference lists. The review was based on 12 etiology studies, 16 prognosis studies, and 38 studies of diagnostic tests. The studies of etiology used a variety of diagnostic tests. Each article was reviewed by 2 investigators; disagreements were resolved by a third person.

MAIN RESULTS The clinical setting, study design, sample size, age and sex of patients, symptom duration, and diagnostic tests used were reported for the 12 etiology studies. Quality scores were not reported. The authors provide a framework for the taxonomy of the dizzy patient (see Figures 53-2 and 53-3). The authors report that the medical history and physical examination led to a probable diagnosis for dizziness in about 75% of patients, but the details of this assessment are not provided. According to 2 retrospective studies, the investigators found that the diagnoses could be based on the history alone in 69% to 76% of patients. Among all patients with dizziness, a positive Dix-Hallpike maneuver (suggesting benign peripheral vertigo) was present in 16% (median), though the range was 7% to 44%. The authors did not conduct a meta-analysis of any results. However, the sample size and frequency of disorders are presented for each etiology study. The data in Table 53-5 represent the prevalence of each disorder for the studies that were done with prospective data collection. The settings for these prospective data were primary care clinics (n = 2 studies, 240 patients), neurology clinics (n = 2 studies, 217 patients), emergency departments (n = 2 studies, 218 patients), or a dizziness clinic (n = 1 study, 104 patients). Approximately 10% of all dizzy patients had benign positional vertigo, whereas 11% had vestibular neuronitis. The combined frequency of other causes of true vertigo, iatrogenic causes, and undiagnosed dizziness is high, approximately 25% to 30%.

CONCLUSIONS

LEVEL OF EVIDENCE Systematic review. STRENGTHS The systematic review included data from primary care clinics, emergency departments, neurology clinics, and specialized dizziness clinics. The sample sizes across these clinics were well balanced, representing a typical spectrum of dizzy patients. LIMITATIONS No quality scores or formal methodologic assessments were reported, though the study design (retrospective vs prospective) is reported. The review required that studies have a reference standard for diagnostic tests, but the reference standard that was used is not reported. The authors acknowledge that there is no objective reference standard for most causes of dizziness. This systematic review provides a useful taxonomy for the dizzy patient. By combining the estimates for the prospective studies only, we find that about 50% of dizzy patients had vestibular disorders. This is compatible with the frequency reported in nonsystematic reviews. Peripheral vestibular disorders include patients with benign positional vertigo, vestibular neuronitis, Meniere disease, and true vertigo of unknown cause. About 10% of dizzy patients will have benign positional vertigo, and a similar number will have vestibular neuronitis.

Reviewed by David L. Simel, MD, MHS
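The arithmetic behind the "about 50%" figure can be checked from the values reported in this summary. The Python sketch below is an illustration added for clarity and is not part of the original review. The names are hypothetical, and it assumes the peripheral and central vestibular categories are mutually exclusive, so their point estimates can simply be added.

```python
# Point estimates for the two vestibular categories, as reported in Table 53-5
# (prospective studies only).
vestibular_prevalence = {
    "peripheral vestibular disorder": 0.40,
    "central vestibular disorder": 0.09,
}

# Prospective studies by setting, as (number of studies, number of patients),
# taken from the MAIN RESULTS text.
prospective_settings = {
    "primary care clinics": (2, 240),
    "neurology clinics": (2, 217),
    "emergency departments": (2, 218),
    "dizziness clinic": (1, 104),
}


def combined_prevalence(estimates):
    """Add point estimates of mutually exclusive categories (an assumption)."""
    return sum(estimates.values())


def totals(settings):
    """Return the total number of studies and patients across all settings."""
    studies = sum(n_studies for n_studies, _ in settings.values())
    patients = sum(n_patients for _, n_patients in settings.values())
    return studies, patients


any_vestibular = combined_prevalence(vestibular_prevalence)
n_studies, n_patients = totals(prospective_settings)

print(f"Any vestibular disorder: {any_vestibular:.2f}")
print(f"Prospective studies: {n_studies}, patients: {n_patients}")
```

The study count produced here can be compared with the "n = 7 Studies" heading of Table 53-5 as a consistency check. Adding point estimates this way says nothing about the confidence interval of the combined figure.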

TITLE A Practical Assessment Algorithm for Diagnosis of Dizziness. AUTHORS Kentala E, Rauch SD. CITATION Otolaryngol Head Neck Surg. 2003;128(1):54-59. QUESTION Does a simple questionnaire do as well as a clinician for diagnosing the cause of vertigo? DESIGN Prospective, nonconsecutive patients. SETTING Otolaryngology clinic with a specialist in vertigo. PATIENTS Fifty-seven patients (42 women and 15 men) referred for dizziness.

DESCRIPTION OF TESTS AND DIAGNOSTIC STANDARD The patient-completed questionnaire followed the paradigm of categorizing dizzy patients presented in the original Rational Clinical Examination article on vertigo.1 The questionnaire involves first asking about the presence of self-assessed hearing loss and vertigo (defined for the patient as “false sense of motion, floating, bobbing, swaying, rocking, tilting, or spinning”). The patients with true vertigo assessed the duration of episodes as episodic (1 week). The questionnaire also asked single questions to assess for (1) dysequilibrium (“Do you have a sense of being off balance, tipsy, wobbly, feeling you might fall?”); (2) presyncope (“Do you have a feeling you might faint, black out, or lose consciousness?”); or (3) psychiatric diagnosis (“Do you feel disconnected or distanced from the world around you, feel panicky, or have tingling about the mouth or hands?”). The otolaryngologist, blinded to the patient’s self-assessed questionnaire results, diagnosed the patient according to the medical history elicited, clinical examination results, and results from audiometric and otoneurologic tests. The specific tests and maneuvers were not reported.

MAIN OUTCOME MEASURES For patients with true vertigo, the clinician’s diagnosis was compared with the patient’s questionnaire, categorized as shown in Box 53-1.

MAIN RESULTS A total of 35 of the 57 patients had true vertigo. The questionnaire alone would have allowed correct categorization of 61% of the patients with true vertigo according to whether they had episodic (1 week) and hearing loss or no hearing loss.

CONCLUSIONS LEVEL OF EVIDENCE Level 4. STRENGTHS Simplified approach to recording the patient medical history. LIMITATIONS Although the clinician did not have the questionnaire answers, the clinician developed the questionnaire and was thus aware of the study hypotheses. This incorporation bias may have made the questionnaire appear to work better than it would once generalized to other settings. The questionnaire requires evaluation in a primary care and emergency department setting. The details of the clinical examination and other tests are not provided. The sample size is small. Although the overall quality of the study means that the results cannot be applied with confidence, the questionnaire does provide a reasonable paradigm for the initial line of questioning the vertiginous patient.

Reviewed by David L. Simel, MD, MHS

REFERENCE FOR THE EVIDENCE 1. Froehling DA, Silverstein MD, Mohr DN, Beatty CW. Does this patient have a serious form of vertigo? JAMA. 1994;271(5):385-388.
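The Kentala and Rauch summary above notes that the sample size is small. One way to see what that means in practice is to put a confidence interval around a reported proportion, for example the 35 of 57 patients who had true vertigo. The Python sketch below is an illustration only. The Wilson score interval is a method chosen here and is not something the reviewed study reports, and the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    Returns (lower, upper) as proportions between 0 and 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p_hat = successes / n
    z2 = z * z
    denominator = 1 + z2 / n
    center = (p_hat + z2 / (2 * n)) / denominator
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denominator
    )
    return center - half_width, center + half_width


# 35 of the 57 referred patients had true vertigo (figures from the summary above).
lower, upper = wilson_interval(35, 57)
print(f"Proportion with true vertigo: {35 / 57:.2f}")
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

The same function could be applied to the 61% correct-categorization figure once the exact numerator is known. The summary gives only the percentage, so no count is assumed here.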

INDEX Page numbers followed by a t or f indicate locations of Tables or Figures, respectively.

A AAA. See abdominal aortic aneurysm AAP. See American Academy of Pediatrics ABCD(E) criteria for melanoma, 392, 393t abdominal aortic aneurysm (AAA), 17-22, 25-27 abdominal palpation for, 17, 18 asymptomatic AAA, 19-20, 21t factors, affecting, 20, 21t method, 20, 22 ruptured AAA, 19 evidence from guidelines, 26 findings for, 25-26 univariate, 26 likelihood ratio, 27t literature research, 25 methods, 18-19 original publication data, improvements in, 26 physical diagnosis of, 17-18 importance of, 18 prior probability, 27 pulsatile mass of, 18 reference standard, changes in, 26, 27 abdominal auscultation, for bruits accuracy of in renovascular hypertension, 31 areas of, 30 precision of, 31 abdominal bruits, 35-37 anatomic and physiologic origin of, 29 auscultatory characteristics of, 32 evidence from guidelines, 36 examination for, 30-31 findings of, 35 literature research, 35 nonrenovascular causes of, 31t original publication data, improvements in, 36 presence of, 31-32 prevalence of, 29-30 prognosis of, 32 reference standard, changes in, 36

abdominal palpation, 25. See also palpation for abdominal aortic aneurysm, 17, 18 asymptomatic AAA, 19-20, 21t factors, affecting, 20, 21t method, 20, 22 ruptured abdominal aortic aneurysm, 19 abdominojugular reflux sensitivity, specificity, or likelihood ratio left ventricular dysfunction, 213t venous waveforms in, 126t central venous pressure assessment, 134t abdominojugular reflux test, 128, 134 abduction, of thumb testing, 113 abduction stress test, 361 abnormal monofilament testing, 112t abnormal vibratory sensation, 112t accuracy of clinical examination, 1, 9 characteristics, 4-5 confidence interval, 12 “good” symptom or sign, 11-12 likelihood ratio, 9-11 meta-analysis, 12-13 pretest probability, 11 “sensitivity-only” studies, 13 ACE. See angiotensin-converting enzyme acetaminophen, 247, 343, 357 acetylcholine receptor (AChR), 449, 450 acetylcholine receptor antibody-positive myasthenia gravis, 450f test for, 450 Achilles tendon reflex. See reflexes AChR. See acetylcholine receptor ACI. See acute cardiac ischemia ACI-TIPI. See Acute Cardiac Ischemia Time-Insensitive Predictive Instrument ACL. See anterior cruciate ligament acoustic neuromas, 711 acoustic reflectometry, 495 action tremor, 506, 507 active compression test for labral tears, 586t

Copyright © 2009 by the American Medical Association.

acute blood loss physical signs, accuracy of, 319-320 acute cardiac ischemia (ACI), 475 multivariate findings for, 473-474 Acute Cardiac Ischemia Time-Insensitive Predictive Instrument (ACI-TIPI), 475, 476 acute chest pain, diagnosis of, 462-463, 463f acute cholecystitis, 137-143, 145-147, 561 definition, 137-138 diagnostic imaging, accuracy of, 138 findings of, 145 guidelines, evidence from, 146 likelihood ratio, 147 literature review, results of, 146 literature search, 145 methods, 138-139 original publication data, improvements in, 146 prior probability, 147 reference standard, changes in, 146, 147 results, 139-142 signs and symptoms, 138 accuracy of, 141 precision of, 140-141 acute otitis media (AOM), in children, 493-499, 501-503 anatomic/physiologic origins, 494 definition of, 494 findings of, 501 guidelines, evidence from, 502 improvement of, 498-499 likelihood ratio of, 503t literature review, results of multivariate findings for, 502, 502t univariate findings for, 502 literature search, 501 and otitis media with effusion, distinguishing between, 493 prior probability, 503 reference standard, changes in, 503 search strategy and quality review, 495-496 symptoms and signs accuracy of, 496-498

acute otitis media (AOM), in children (Continued) elicitation of, 494-495 precision of, 496 acute respiratory illness, 539. See also pneumonia adenoviruses, 344 aerobic exercise, 306 age, 152 as an indicator of prevalence for bruits, 30f perimenopause, 408f postmenopause, 408f sensitivity, specificity, or likelihood ratio in back pain, 76t in breast cancer, 101t in carpal tunnel syndrome, 115t in obstructive airways disease, 161t in osteoporosis, 491t in perimenopause, 416t in temporal arteritis, 654 Agency for Health Care Policy and Research, 47 Agency for Healthcare Research and Quality (AHRQ), 494, 498, 502 agreement calculation of, 8f agreement beyond chance, 3, 4, 15, 104, 294, 333, 617. See also agreement AHRQ. See Agency for Healthcare Research and Quality airflow limitation, 149-156 clinical examination for, 155 accuracy of, 155 measures of, 151-152, 153, 155 pathophysiologic characteristics of, 150-151 signs of accuracy of, 153-155 medical history, 151 physical examination, 151-152 precision of, 153 symptoms of medical history, 151 accuracy of, 152-153 precision of, 152 physical examination, 151-152 airflow obstruction, 153 alcohol abuse, 1-2, 39, 249. See also alcoholism CAGE questionnaire for, 2-3, 4-5, 7, 8t diagnostic standards for, 39-41 diagnostic tests of, 41-42 at-risk drinking, problems of in pregnant women, 44-45

AUDIT questionnaire, 41, 42, 43t, 43-44 biochemical and hematologic tests, 41-42 CAGE questionnaire, 41, 42, 43t, 44 MAST questionnaire, 41, 42, 42t, 44 alcohol dependency. See alcohol abuse alcohol drinking, problems of, 47-52 evidence from guidelines, 49 findings of, 47 literature research, 47 original publication data, improvements in, 48 prior probability, 50 reference standard, changes in, 48, 50 alcohol screening instruments. See AUDIT questionnaire; AUDIT-C questionnaire; CAGE questionnaire; MAST questionnaire; T-ACE questionnaire; TWEAK questionnaire web resources for, 49 Alcohol Use Disorders Identification Test (AUDIT), 41 accuracy of, 43t, 43-44 reliability of, 42 questionnaire, 47, 49, 51t sensitivity, specificity, or likelihood ratio for alcohol abuse, 50t Alcohol Use Disorders Identification Test, Consumption Questions (AUDIT-C), 49, 51t sensitivity, specificity, or likelihood ratio for alcohol abuse, 50t alcohol withdrawal syndromes, 5 alcoholism, 2-3. See also alcohol abuse algorithm-driven analyses for appendicitis, 54 alignment, of knee, 361 Alvarado clinical decision rule (MANTRELS), 61, 62, 62t, 63t. See also clinical prediction rules and scores sensitivity, specificity, or likelihood ratio in appendicitis, 63t Alzheimer disease, 509 ambulatory carotid bruit, 104-105 amenorrhea, 552, 554 American Academy of Family Practice, 502


American Academy of Neurology, 634 American Academy of Pediatrics (AAP), 330, 498, 502 American College of Cardiology, 211, 430, 445 American Heart Association, 302, 430, 445 American Medical Association, 267 American Psychiatric Association, 40 American Society of Clinical Oncologists policy statement, on genetic testing, 266 American Thoracic Society, 155 Amoss sign, 396 amoxicillin, 515, 516, 517, 523 amphetamine withdrawal, 249 ampicillin, 515, 516, 517 Amsel criteria, 694t sensitivity, specificity, or likelihood ratio for vaginitis, 707t, 694t anabolic steroids, 249 anatomic origin, of abdominal bruit, 29 aneroid instruments, 305, 306 angina pectoris, 462 grading of, 462t unstable, 462 angiotensin-converting enzyme (ACE) inhibitor, 35, 133, 179, 183 anhedonia, 249 ankle dorsiflexion. See strength testing ankle dorsiflexor, 78 ankle edema history of, and ascites, relationship between, 5-6 sensitivity, specificity, or likelihood ratio ascites, 6, 6f, 7, 7f, 68t, 73t ankle plantar flexion, 79 ankle reflexes, 78 ankle swelling. See ankle edema ankylosing spondylitis, 77 anorexia sensitivity, specificity, or likelihood ratio in acute cholecystitis, 140t in appendicitis, adult, 57t in temporal arteritis, 648t anterior apprehension test for labral tears, 586t anterior cruciate ligament (ACL), 358, 359, 361, 362 physical examination accuracy of, 363t maneuvers, 363t

anterior drawer test, 360f, 361 sensitivity, specificity, or likelihood ratio knee ligament and meniscus injury, 359f, 370t anterior Q waves, 187, 188 anterior release test for shoulder instability, 581f, 585t, 589, 590 sensitivity, specificity, or likelihood ratio for shoulder instability or labral tear, 585t, 591t anterior slide test for labral tears, 586t anticholinesterase test sensitivity, specificity, or likelihood ratio myasthenia gravis, 460t anticholinesterase tests, for myasthenia gravis, 451-453, 455 anticoagulant therapy, 235, 561, 562 antidepressants, 247, 249 antihypertensive therapy, 174 antimicrobial therapy, 395 AOM. See acute otitis media aortic regurgitation (AR), 419, 430t anatomic and physiologic origins of, 419 cardiac auscultation, 420-421 causes, 420t evidence from guidelines, 430 examination, 419-420, 425 features, 421f findings of, 429 likelihood ratios, 430t of physical examination, 429t literature review, 429-430 literature search, 429 maneuvers, 421 original publication data, improvements in, 429 in patients with renal failure, 425 peripheral hemodynamic signs, 421-422 physical examination accuracy of, 423 precision of, 422-423 physical examination signs in diagnosis, 430 prior probability, 430 reference standard, changes in, 429, 430 aortic stenosis (AS), 437, 445, 446 physical examination accuracy of, 438t, 445t likelihood ratio, 446t apical impulse, 187, 190

Apley compression test, 360f, 361 sensitivity, specificity, or likelihood ratio in knee ligament and meniscus injury, 360f, 364t appendectomy delayed, 62 unneeded, 62 appendiceal anatomy, of appendicitis, 54 appendicitis, 53-58, 61-63 appendiceal anatomy of, 54 diagnostic modalities, accuracy of, 54 findings of, 61-62 likelihood ratio, 63 literature research, 61 original publication data, improvements in, 62 pathophysiology of, 54 prior probability, 63 reference standard, changes in, 62, 63 symptoms and signs, 54-55 accuracy of, 56-57 precision of, 55 apprehension test for shoulder instability, 581f, 583, 585t sensitivity, specificity, or likelihood ratio for shoulder instability, 585t for labral (shoulder) tear, 586t AR. See aortic regurgitation arm drift sensitivity, specificity, or likelihood ratio in stroke, 641t arm span–height difference test for occult vertebral fracture, 479, 482-483 arrhythmias, 304 arterial blood gas analysis, 561 arterial bruit, compared to venous hum, 292t AS. See aortic stenosis ascites, 1, 2, 65, 71 and ankle swelling, relationship between, 5-6 evidence from guidelines, 72 example, 65 findings of, 71 history and symptoms, accuracy of, 67, 68t information, 71-72 literature search, 71 original publication data, improvements in, 71 pathophysiology of, 66 physical examination, 66

likelihood ratios for, 69t sensitivity and specificity of, 68t physical signs, 72t prior probability, 73 reference standard, changes in, 66, 71, 73 signs, 66-67, 73t accuracy of, 68-69 precision of, 67-68 symptoms, 66-67, 71t, 73t aspirin, 183 asthma, 150, 152, 166, 202, 203, 205, 206, 549 asymmetric skin lesion. See ABCD(E) criteria asymptomatic abdominal aortic aneurysm abdominal palpation for, 19-20, 21t asystole, 452 atherosclerosis abdominal bruit in, 30, 32 atrial fibrillation, 304 at-risk drinking, 47 problems of in pregnant women, 44-45 atypical (dysplastic) nevi, 384 AUDIT. See Alcohol Use Disorders Identification Test AUDIT-C. See Alcohol Use Disorders Identification Test, Consumption Questions auscultation, 528 for airflow limitation, 151, 153, 154-155 auscultatory characteristics, of abdominal bruits, 32 auscultatory percussion, 66 sensitivity, specificity, or likelihood ratio for ascites, 72t autoimmune disorders, 249 aztreonam, 517

B bacampicillin, 517 bacterial pneumonia, 540 balloon angioplasty, 35 barium enemas for appendicitis, 54 barrel chest sign, 153-154 sensitivity, specificity, or likelihood ratio in obstructive airways disease, 154t BCDDP. See Breast Cancer Detection Demonstration Project BDI. See Beck Depression Inventory


Beck Depression Inventory (BDI), 250, 251. See also clinical prediction rules and scores bedside ultrasonography testing, for acute cholecystitis, 145, 146 benign moles, 383 benign paroxysmal positional vertigo, 711, 712 benzyl penicilloyl, 520 benzylpenicillin, 517 β-blocker, 133, 506 β-hemolytic streptococcal pharyngitis, 615, 616, 617, 624, 625 likelihood of, 625 β-lactam antibiotics penicillin, cross-reactivity with, 518 bias incorporation, 498 reverse workup, 520 spectrum, 141, 497 in thyroid size estimation, 282 verification, 16, 138, 141, 498, 582, 589 biceps load test I for labral tears, 586t, 590 biceps load test II for labral tears, 582f, 590 biceps load tests for labral tears, 591t biochemical and hematologic tests, 41-42 blood pressure (BP). See also hypertension classification of, 302t, 311t diastolic, 301, 302 errors in measurement, 303t-304t indirect vs direct blood pressure, 304-305 technical inaccuracies of, 305 measurement, 302t office factors, affecting, 303-304t vs usual blood pressure, 305 systolic, 301, 302 blunt abdominal trauma, 72 BMAST. See Brief Michigan Alcoholism Screening Test BMD. See bone mineral density BNP. See brain natriuretic peptide Boas sign, 138 bone mineral density (BMD), 477, 478, 484-485 BP. See blood pressure brachial-popliteal pulse gradient (Hill sign), 422

brachioradial delay for aortic stenosis, 438t bradycardia, 320, 452 bradykinesia, 507 maneuvers, detecting, 507f sensitivity, specificity, or likelihood ratio in Parkinson disease, 507, 514t bradykinin, 616 brain natriuretic peptide (BNP), 196, 197, 203, 209 accuracy of, 201-202 sensitivity, specificity, or likelihood ratio left ventricular dysfunction, 213t Brain Resuscitation Clinical Trials (BRCTs), 220-221 BRCTs. See Brain Resuscitation Clinical Trials breast cancer, 87, 99, 265, 266, 267, 268, 272. See also clinical breast examination evidence from guidelines, 100 findings of, 99-100 incidence of, 88t literature search, 99 pathophysiology of, 99 prior probability, 101 reference standard tests, 101 risk factors for, 88 Breast Cancer Detection Demonstration Project (BCDDP), 88, 92 Breast–ovarian cancer syndrome, 275 breast tenderness, 551, 552 Breathing Not Properly Multinational Study, 202 breath sounds sensitivity, specificity, or likelihood ratio pneumonia, adult, 531t, 536t Breslow-Day test, 554 Brief Michigan Alcoholism Screening Test (BMAST), 41, 44 British Hypertension Society, 312 British Thoracic Society, 574 bronchial lavage, 540 bronchiolitis, 540, 541f bronchitis, 527 bronchoconstriction, 452 bronchodilators, 149 Brudzinski signs, 396, 399 sensitivity, specificity, or likelihood ratio in meningitis, adult, 404t bruit abdominal, 29-32, 35-37 carotid, 103-106 periumbilical, 30


systolic, 35, 36 systolic-diastolic abdominal, 30, 31, 32 sensitivity, specificity, or likelihood ratio of abdominal, in renovascular hypertension, 31t, 37t of carotid, in carotid stenosis, 109f, 110t vascular, 29 bulbar weakness, 451 bulging flanks, 66 sensitivity, specificity, or likelihood ratio ascites, 69t

C C rating, 18. See also evidence, level of CAGE questionnaire. See also cut down, annoyed by criticism, guilty about drinking, eye-opener drinks sensitivity, specificity, or likelihood ratio for alcohol abuse, 5f, 41t calcium-channel blocker, 183 calf vein thrombosis, 227 Canadian Cardiovascular Society, 462 Canadian class II angina, 183 Canadian Hypertension Education Program, 312 Canadian National Breast Screening Study (NBSS), 88 Canadian Preventive Health Services Task Force, 490 Canadian Task Force, 99, 109 on the Periodic Health Examination, 18 on Preventive Health Care, 49, 261, 393 for malignant melanoma, 393 cancer, 76, 86 family history of, 265-272, 275-276 accuracy of, 268-270 data collection, improvement in, 270-272 elicitation of, 266-267 false-negative reports, reasons for, 270 false-positive reports, reasons for, 270 findings of, 275 guidelines, evidence from, 275 information, collection of, 270-272 likelihood ratio, 276

literature review, results of, 275 literature search, 275 methods, 267-268 precision of, 268 prevalence of, 266 prior probability, 276 reference standard, changes in, 275, 276 candidiasis, 706t, 707t, 694t, 697t sensitivity, specificity, or likelihood ratio in pharyngitis, 616t in vaginitis, 707t, 694t capillary refill time, 318, 331 sensitivity, specificity, or likelihood ratio in hypovolemia, adult, 327t in hypovolemia, child, 341t carbapenems, 515, 518, 524 cardiac arrest, 215 comatose survivors of, 215-223, 225-226 literature search, 225 prior probability, 226 reference standard tests, 226 ventricular fibrillation, 215 cardiac bradyarrhythmias, 452 cardiac dullness. See percussion cardiac ischemic chest pain, 462, 463. See also chest pain; myocardial infarction carotid arterial pulse and jugular venous pulse, distinguishing between, 127 carotid bruit, 103-106, 110t ambulatory bruit, 104-105 auscultation, precision of, 104 carotid artery cause, in neck, 103-104 clinical significance, 103 evidence from guidelines, 109 findings of asymptomatic patients, 107 symptomatic patients, 107 likelihood ratio, 108, 109, 110 literature review asymptomatic patients, 108-109 symptomatic patients, 108 literature search, 107 original publication data, improvements in, 107 preoperative bruit, 105 prior probability, 110 reference standard, changes in, 107-108 symptomatic bruit, 105 carotid pulse, for aortic stenosis, 446t carotid volume, for aortic stenosis, 446t

carpal tunnel syndrome (CTS), 111-117 diagnostic standard, 112-113 evidence from guidelines, 123 findings of, 121-122 history and physical examination accuracy of, 114-117 precision of, 114 importance of, 111-112 likelihood ratio, 124 literature review, 123 literature search, 121 methods, 113 normal anatomy of, 112f original publication data, improvements in, 122 prior probability, 124 reference standard, changes in, 122-123, 124 sensitivity and specificity of electrodiagnosis, 112 signs, 113 symptoms, 113 case-control study, 88 case-finding, 248 instruments, 250 performance, in primary care settings, 252-253t Castell method, 607, 607f, 608 Castell sign sensitivity, specificity, or likelihood ratio in splenomegaly, 612t cauda equina syndrome, 80 CBE. See clinical breast examination CDC. See Centers for Disease Control and Prevention Center for Epidemiologic Studies Depression Screen (CES-D), 250, 251, 260. See also clinical prediction rules and scores Centers for Disease Control and Prevention (CDC), 267, 330, 355, 356, 405, 524, 706 US Influenza Sentinel Providers Surveillance Network, 344 Centor clinical prediction rule for sore throat, 619, 619f sensitivity, specificity, or likelihood ratio adults, 619f modified for age, 623-624, 623t central venous pressure (CVP), 125-130, 133-135 abdominojugular reflux, 134t abnormal, 127 clinical assessment of, 126-128 abdominojugular reflux test, 128 accuracy of, 129-130

carotid arterial pulse and jugular venous pulse, distinguishing between, 127 jugular veins, 130 Kussmaul sign, 128 neck veins, position of, 126-127 precision of, 128-129 estimation of, 127, 128f findings of, 133 guidelines, evidence from, 134 likelihood ratio, 135 literature review, results of, 134 literature search, 133 original publication data, improvements in, 134 prior probability, 135 reference standard, changes in, 134, 135 venous waveforms in, 126t cephalosporins, 515, 516, 517, 524 cerebral infarction Oxfordshire classification of, 634 cerebrospinal fluid (CSF), 395-396 CES-D. See Center for Epidemiologic Studies Depression Screen Chadwick sign, 552, 556, 557 for pregnancy, 556t chance agreement, 3-4 chest hyperresonance, 154 chest pain. See also cardiac ischemic chest pain; myocardial infarction; pain acute, 462-463 cardiac ischemic, 462, 463 mechanism of, 463 chest radiograph, 195, 202, 211, 661 accuracy of, 201 for community-acquired pneumonia, 528, 530, 532 for community-required pneumonia, 535, 536, 537 for pneumonia, 540, 548 for pulmonary embolism, 561 for thoracic aortic dissection accuracy of, 666 sensitivity of, 667t chest retractions. See retractions, chest chest x-ray. See radiographic findings χ2 test, 464, 617 for abdominal aortic aneurysm, 19 2-tailed χ2 test, 529 chlamydia, 541 Chlamydia pneumoniae, 344 cholecystectomy, 39, 138 cholesteatoma, 712 chronic bronchitis, 150, 151, 152


chronic obstructive pulmonary disease (COPD), 163, 166, 168, 169, 195, 202, 203, 205, 206, 489, 490 chronic thromboembolic pulmonary hypertension, 235 cigarette smoking, 151 Cincinnati Prehospital Stroke Scale (CPSS), 628, 630, 631 for stroke, 641t cirrhosis, 292 classic essential tremor, 506, 514 clinical agreement, 3-4 clinical assessment accuracy of, 9-16 for airflow limitation, 155 accuracy of, 155 CAGE questionnaire accuracy characteristics of, 4-5 of central venous pressure, 126-128 abdominojugular reflux test, 128 accuracy, 129-130 carotid arterial pulse and jugular venous pulse, distinguishing between, 127 jugular veins, 130 Kussmaul sign, 128 neck veins, position of, 126-127 precision, 128-129 for clubbing accuracy, 168-169 precision, 168 for coma accuracy of, 219 interobserver agreement of, 218t precision of, 219 for congestive heart failure, 195-206 accuracy of, 199, 200t precision of, 199 of deep vein thrombosis, 228-229, 235-236 importance of, 173-174 for internal derangement of knee, 359-361 function, 360-361 inspection, 359 palpation, 359-360 for mitral regurgitation, 438-439 accuracy of, 439t for mitral valve prolapse, 439-440 accuracy of, 440t precision of, 9-16 for spider nevi precision of, 3f for systolic murmurs accuracy of, 436, 437t precision of, 435-436, 436t

for thoracic aortic dissection accuracy of, 662t clinical breast examination (CBE), 87. See also breast cancer accuracy, 90-91 bottom line, 91 anatomic basis, 87-88 bottom line priorities for research, 95-96 resolution of scenarios, 95 effectiveness, 88-90 bottom line, 90 examiner factors bottom line, on accuracy, 92 duration, 91 experience, 91-92 techniques, 91 methods, 88 patient factors age, 92 bottom line on accuracy, 92 of suggested approach, 94 breast boundaries, 93 breast characteristics, 92 cancer characteristics, 92 duration, 94 examiner pattern, 93 fingers, 93-94 inspection, 94 issues, 94 normal from abnormal (cancerous) lumps, distinguish, 95 palpation, 92 patient position, 93 techniques, 94-95 precision, 90 bottom line, 90 sensitivity, specificity, or likelihood ratio for breast cancer, 101t with mammography, 91t techniques, 88 test characteristics of, 88, 90 clinical depression, 259-263 case-finding questionnaires for accuracy of, 250-254 characteristics of, 251t clinical interview for accuracy and reliability of, 254, 255 criterion standard diagnosis, 249 data abstraction, 250 definition, 247-248 diagnostic criteria and questions, 248t findings of, 259 guidelines, evidence from, 260-261 literature review, results of, 259-260 literature search, 259


original publication data, improvements in, 259 patients, evaluating, 248-249 physical illness, effect of, 254-256 prior probability, 262 reference standard, changes in, 259, 262 screening, web resources for, 261 search strategy and inclusion/ exclusion criteria, 249-250 statistical methods, 250 clinical findings for left-sided heart failure detection of, 186-187, 187t precision of, 189 clinical gestalt. See clinical impression clinical impression sensitivity, specificity, or likelihood ratio for acute cholecystitis, 147t for aortic aneurysm, 24t, 134t, 147t for aortic regurgitation, 429t for central venous pressure, 134t for chronic obstructive airways disease, 159t for hypovolemia, child, general appearance, 341t for left ventricular dysfunction, 188t, 213t for pneumonia, infant and child, 548t for pulmonary embolus, 572t, 575t for valvular heart disease, 446t clinical interview, for depression accuracy and reliability of, 254, 255 clinical prediction guide, for deep vein thrombosis, 230 development of, 230-232 clinical prediction rules and scores ABCD(E) criteria, for melanoma, 392t, 393t Alvarado score, for appendicitis, adult, 63t for aortic stenosis, 438t Beck Depression Inventory, for major depression, 252t Center for Epidemiologic Studies Depression (CES-D), for major depression, 252t Cincinnati Prehospital Stroke Scale, for stroke, 641t for deep vein thrombosis, 235-236, 237-238 Glasgow Coma Scale, in recovery from coma, 220t Malnutrition Screening Tool, for malnutrition, adult, 381t

for myocardial infarction, 476t, 468-469 for osteoporosis in men, 490t for osteoporosis in women, 491t Patient Health Questionnaire (PHQ-9) for depression, 261t for dysthymia, 261t for pneumonia, adult, 536t pneumonia, infant and child, 549t PRIME-MD, for depression, 261t of pulmonary embolism, 564-566, 567f, 572t, 575t accuracy of, 565t, 568t components of, 566 validation of, 566-567 for sinusitis, 603t, 625t for sore throat, 618-620, 619t Centor clinical prediction rule, 619, 619f McIsaac clinical prediction rule, 620, 620f Walsh algorithm, 621f subjective global assessment (SGA), for malnutrition, adult, 376t for temporal arteritis, 654 for urinary tract infection, women, 689t Wells Prediction Rule, for deep vein thrombosis, 246t Wicki model, 566t, 567 closed fist sign, 112t clubbing, 163-169 clinical examination for accuracy of, 168-169 precision of, 168 congenital, 163 data analysis, 166 digital, 163 findings of, 171 guidelines, evidence from, 172 inspection general appearance, 164, 165f nailfold angles, 164, 165f palpation of, 165-166 phalangeal depth ratio, 164-165, 165f Schamroth sign, 165, 165f literature review, results of, 172 literature search, 171 methods, 166 original publication data, improvements in, 171 pathophysiology of, 164 prevalence in associated conditions, 172t prevalence of, 162 reference standard, changes in, 171-172, 172t

results quality of evidence, 166 quantitative indices, 166-168 signs of, 164 study characteristics, 166 symptoms of, 164 clunk test for shoulder instability, 585t sensitivity, specificity, or likelihood ratio for shoulder instability, 585t cocaine, 249, 301, 660 cog wheeling, 506. See also rigidity sensitivity, specificity, or likelihood ratio in Parkinsonism, 506 cold, 527, 541f, 593, 596 cold water caloric testing, 217 colon cancer, 265, 267 colors, multiple in a skin lesion. See ABCD(E) criteria coma clinical examination for accuracy of, 219t interobserver agreement of, 218t precision of, 219 hypoxic-ischemic, 216 methods likelihood ratios, 218 search strategy and quality review, 217-218 statistical methods, 218 motor response and brainstem reflexes, 219-221 pathophysiology of, 216 physical examination of, 216-217 postcardiac arrest, 215, 216 search results and quality of evidence, 218-219 combination chemotherapy, 228 community-acquired pneumonia, adult, 527-533, 535-537 diagnosis of clinical history, accuracy of, 530 physical examination findings, accuracy of, 530-532 findings of, 535 guidelines, evidence from, 536 likelihood ratio test for, 537 literature review, results of, 536 literature search, 535 methods data analysis, 529 literature search, 528-529 quality review, of articles, 529 multivariate findings for, 536t original publication data, improvements in, 535

pathophysiology of, 528 prediction of algorithm evaluation, 532-533 prior probability, 537 reference standard, changes in, 535-536, 537 symptoms and signs elicitation of, 528 precision of, 529-530 compliance, 173-174. See also noncompliance clinical measures, accuracy of, 175-176 Compliance Questionnaire Rheumatology, 180 Comprehensive Meta-Analysis software (version 2.197), 236 compression rotation test for labral tears, 586t compression ultrasonography, 230 computed tomography (CT), 17, 18 for appendicitis, 54 computed tomography (CT) angiography for pulmonary embolism, 571, 572 computed tomography (CT) scanning for acute cholecystitis, 138 chest CT, for community-acquired pneumonia, 536 for paranasal sinuses, 594 computer-guided analyses for appendicitis, 54 computerized genograms, 271 confidence interval, 6, 12 congenital clubbing, 163 congestive heart failure, dyspnea in, 195-206 in emergency department patients brain natriuretic peptide, accuracy of, 201-202, 203 chest radiographs, accuracy of, 201, 202 clinical examination and investigations accuracy of, 199, 200t precision of, 199 clinical gestalt, 199, 202 clinician’s assessment, 204 limitations, 204-205 electrocardiogram, accuracy of, 201, 202 historical items, 199, 202 pathophysiology of, 196 physical examination, 200-201, 202 physiological categories and mechanisms of, 196t pulmonary diseased patients, 202 search strategy, 196-197

Page numbers followed by a t or f indicate locations of Tables or Figures, respectively.

congestive heart failure, dyspnea in (Continued) statistical methods, 197 study characteristics, 198, 198-199t study quality, assessment of, 197 study selection, 197 symptoms, 199-200, 202 and signs, elicitation of, 196 COPD. See chronic obstructive pulmonary disease Cope’s Early Diagnosis of the Acute Abdomen, 55, 138 Copenhagen Stroke Scale, 634 coronary heart disease, 249 corneal reflex. See reflexes Corrigan water hammer pulse, 422 sensitivity, specificity, or likelihood ratio in aortic regurgitation, 425t coryza, 571, 574 costovertebral angle tenderness sensitivity, specificity, or likelihood ratio urinary tract infection, 680t cough, 149, 151, 152 in infants differential diagnosis of, 540t sensitivity, specificity, or likelihood ratio in influenza, 355t in obstructive airways disease, 152t in otitis media, child, 497t in pneumonia, adult, 530t in streptococcal pharyngitis, 618t Courvoisier sign, 138 cover-uncover test, 451 CPSS. See Cincinnati Prehospital Stroke Scale crank test for labral tears, 586t crossed straight-leg raising sign (CSLR), 78 sensitivity, specificity, or likelihood ratio for disk herniation, 86t CSF. See cerebrospinal fluid CSLR. See crossed straight-leg raising sign CT. See computed tomography CTS. See carpal tunnel syndrome curtain sign, 451 Cushing disease, 249 cut down, annoyed by criticism, guilty about drinking, eye-opener drinks (CAGE questionnaire) accuracy of, 42 predictive, 44

questionnaire, 41, 43t, 49, 52t accuracy characteristics of, 4-5 for alcohol abuse or dependency, 2-3, 4-5, 7, 8t reliability of, 42 sensitivity, specificity, or likelihood ratio for alcohol abuse, 5f, 41t CVP. See central venous pressure cyanotic congenital heart disease, 163 cystic fibrosis, 163

D DADS. See Duke Anxiety and Depression Scale DBP. See diastolic blood pressure D-dimer assay, 230, 236, 238-239, 561, 562, 563, 566, 567-568, 569, 571, 572, 573, 575. See also laboratory findings in deep vein thrombosis diagnosis, 239 high-sensitivity, 240, 241t likelihood ratio of, 573t moderate-sensitivity, 239-240 de Musset head bobbing sign, 421 decision analytic model, 352 deep vein thrombosis (DVT), 227-232, 245-246 clinical assessment of, 228-229, 235-236 clinical prediction guide, 230 development of, 230-232 clinical prediction rules, 235-236 data extraction, 236 D-dimer testing for, 239 high-sensitivity, 240, 241t moderate-sensitivity, 239-240 diagnosis of, 228, 232 likelihood ratio, 246 objective assessment of, 229-230 original data publication, improvements in, 245 prevalence of, 238f prior probability, 246 reference standard tests, 246 search strategy, 228 statistical analysis, 236-237 study identification, 236 study selection, 236 symptoms and signs, frequency of, 229t ultrasonography testing for, 240-241 dehydration, 315 in children, 329-330

anatomic and physiologic origins of, 330 evidence from guidelines, 340 examination signs, precision of, 333-335 findings of, 339-340 laboratory tests, 335 likelihood ratio, 341 limitations, 335-336 literature review, 340 literature search, 339 methods search strategy and quality review, 331-332 statistical analyses, 332-335 original publication data, improvements in, 340 prior probability, 341 reference standard tests, 341 symptoms and signs, 330-331 accuracy of, 333 precision of, 333 Dehydration Assessment Scale, for hypovolemia, child, 334t delayed menses, for pregnancy, 560 depressed mood, perimenopausal, 409 Depression Scale (DEPS), 250, 251 DEPS. See Depression Scale DerSimonian-Laird random-effects method, 509 deviated nasal septum, 594, 595 diabetes mellitus, 249 Diagnostic and Statistical Manual of Mental Disorders (Third Edition Revised) (DSM-III-R), 40, 259 Diagnostic and Statistical Manual of Mental Disorders (Fourth Edition) (DSM-IV), 48, 247, 249, 259 diagnostic odds ratio (DOR), 12, 13 diaphoresis sensitivity, specificity, or likelihood ratio in myocardial infarction, 467t diastolic blood pressure (DBP), 301, 302, 303 diastolic dysfunction, 184 and systolic dysfunction, difference between, 189, 211 diastolic murmur. See aortic regurgitation digital clubbing, 163 diplopia, 455 sensitivity, specificity, or likelihood ratio temporal arteritis, 656

dipstick urinalysis for urinary tract infection accuracy of, 678 direct blood pressure vs indirect blood pressure, 304-305 diuretic, 179 therapy, 184 Dix-Hallpike maneuver, 710f, 715, 716, 717 dizziness. See also vertigo sensitivity, specificity, or likelihood ratio postural, in hypovolemia, adult, 327t Doppler echocardiography, 430 Doppler effect, 553 DOR. See diagnostic odds ratio drink, 41, 48 dry axilla sensitivity, specificity, or likelihood ratio in hypovolemia, adult, 327t dry mucous membranes sensitivity, specificity, or likelihood ratio in hypovolemia, adult, 327t in hypovolemia, child, 341t DSM-III-R. See Diagnostic and Statistical Manual of Mental Disorders (Third Edition Revised) DSM-IV. See Diagnostic and Statistical Manual of Mental Disorders (Fourth Edition) Duke Anxiety and Depression Scale (DADS), 250, 251 Duroziez double intermittent femoral bruit, 422, 425t sensitivity, specificity, or likelihood ratio in aortic regurgitation, 425t DVT. See deep vein thrombosis dyskinesias, 506 dyspnea, 125, 149, 152, 153, 183, 186, 187, 215, 225 in congestive heart failure, 195-206 sensitivity, specificity, or likelihood ratio in congestive heart failure, 200t in pneumonia, adult, 530t dyspnea on exertion. See dyspnea dysthymia, 248 dysuria sensitivity, specificity, or likelihood ratio in urinary tract infection, women, 689t in vaginitis, 707t

E ear rubbing sensitivity, specificity, or likelihood ratio in otitis media, child, 503t ECG. See electrocardiogram echocardiogram, 209 for left ventricular systolic dysfunction, 210-211, 212 edema, 187, 617 sensitivity, specificity, or likelihood ratio in left ventricular dysfunction, 200t edrophonium chloride, 451-452 edrophonium test, 452, 453t. See also anticholinesterase test effectiveness score, 250, 253 effusion, 359-360 egophony, 528 ejection fraction, detection of, 187-189 electrocardiogram (ECG), 202 accuracy of, 201 left bundle-branch block on, 187 for myocardial infarction accuracy of, 468 precision of, 465-466 for pulmonary embolism, 561-562 sensitivity, specificity, or likelihood ratio for left ventricular dysfunction, 213t for thoracic aortic dissection, 665t for myocardial infarction, 468t ELISA. See enzyme-linked immunosorbent assay ELISA Vidas DD, 573 emphysema, 150 endemic iodine deficiency, 285 endometrial cancer, 265, 267 enhanced ptosis. See curtain sign enlarging skin lesion. See ABCD(E) criteria enzyme-linked immunosorbent assay (ELISA), 230, 239, 572 epiglottitis, 540, 541f erythema, 617 erythrocyte sedimentation rate (ESR), 649 erythromycin, 524 ESR. See erythrocyte sedimentation rate estradiol, perimenopausal, 408, 410 ethmoid sinus, 594, 595, 595f ethnicity sensitivity, specificity, or likelihood ratio in osteoporosis, 491t in perimenopause, 416t

in thoracic aortic dissection, enlarged aorta or wide mediastinum, 673t European Influenza Surveillance Scheme, 344 European Society of Cardiology, 212 European Society of Hypertension, 312 Evaluation du Scanner Spirale dans l’Embolie Pulmonaire study group, 564 evidence, level of, 15t expected agreement. See agreement Extended Wells scoring system, 571, 573 extraocular muscles, asymmetric weakness of, 451 eye movements, in coma, 220t. See also reflexes

F facial paresis sensitivity, specificity, or likelihood ratio in stroke, 641t facial weakness, 451, 455 family history sensitivity, specificity, or likelihood ratio in cancer, 275t in early menopause, 417t female. See sex femoral pistol shot murmur, 422, 425t femur, 358, 359, 360 fever sensitivity, specificity, or likelihood ratio in acute cholecystitis, 140t in appendicitis, adult, 57t in influenza, 356t in meningitis, adult, 404t in otitis media, child, 497t in pneumonia, adult, 536t in pneumonia, infant and child, 550t in streptococcal pharyngitis, 618t in temporal arteritis, 648t in urinary tract infection, women, 679t fibromuscular hyperplasia abdominal bruit in, 30 finger-flicking percussion, 66-67 Fisher exact test, 529 flank dullness, 66, 67. See also percussion flexicurve measurement, 479, 485 flick sign, 112t

Flint murmur, 420-421 in aortic regurgitation, 420 fluctuating weakness. See reduced muscle power fluid loss, subjective global assessment, 374-375 fluid wave, 66, 67. See also percussion follicle-stimulating hormone (FSH), perimenopausal, 408, 410, 416, 417t fontanelle, sunken sensitivity, specificity, or likelihood ratio in hypovolemia, child, 334t Food and Drug Administration, 197, 416 forced expiratory time, 151, 159t in obstructive airways disease, 161t Fracture Intervention Trial, 482 fractures, spinal compression, 77 Framingham study, 463 frequent urination sensitivity, specificity, or likelihood ratio in urinary tract infection, women, 689t frontal sinus, 594f, 595, 595f surface palpation for, 596f FSH. See follicle-stimulating hormone

G gag reflex. See reflexes gait sensitivity, specificity, or likelihood ratio in Parkinsonism heel to toe walking, 514t rising from a chair, 514t shuffling, 514t gastroenterologist, 293 gastrointestinal (GI) symptoms, 374 tract hemorrhage, 316 GCS. See Glasgow Coma Scale GDS. See Geriatric Depression Scale General Health Questionnaire (GHQ-12), 260 genetic testing family history assessment tools for, 270 policy statement on, 266 Geneva rule, 572, 573 Geriatric Depression Scale (GDS), 250, 251, 260 GHQ-12. See General Health Questionnaire

GI. See gastrointestinal girth, increased abdominal sensitivity, specificity, or likelihood ratio in ascites, 68t, 73t glabella tap reflex test, 508, 508f, 510, 513, 514. See also reflexes Glasgow Coma Scale (GCS), 216, 216t, 221. See also clinical prediction rules and scores Glasgow-Pittsburgh Cerebral Performance Categories, 217 in coma, 217 Global Initiative for Chronic Obstructive Lung Disease, 161 glucocorticoids, 249 goiter, 277-282, 285-287 accuracy of, 281-282 anatomic basics of landmarks, 277-278, 278f normal size, 278 examination, 278-279 bias in, 282 false-negative results, 280 false-positive results, 279-280 findings of, 285 guidelines, evidence from, 286 likelihood ratio, 287 literature review, results of, 286 literature search, 285 original publication data, improvements in, 285 precision of interobserver variability, 280-281 intraobserver variability, 281 prior probability, 287 reference standard, changes in, 285-286, 287 size of, 277 Goldman chest pain protocol, 474f, 475 “good” clinical finding, 11-12 Goodell sign, 552 sensitivity, specificity, or likelihood ratio in pregnancy, 556 grades of evidence. See levels of evidence Graves disease, 277 great toe extensor weakness sensitivity, specificity, or likelihood ratio in sciatica, 79t grind test sensitivity, specificity, or likelihood ratio for knee ligament and meniscus injury, 370t

grunting sensitivity, specificity, or likelihood ratio in pneumonia, infant and child, 550t guarding, 55 Guide to Clinical Preventive Services, Third Edition, Periodic Updates, 47

H HADS. See Hospital Anxiety and Depression Scale Haemophilus influenzae, 400, 494, 501 hand diagram. See Katz hand diagram hand grip strength test for occult vertebral fracture, 479, 485 harmful drinking, 41, 47 Hawkins grading scheme, 579 hazardous drinking, 41, 47 HCG test. See human chorionic gonadotropin test headache. See pain head-hanging maneuver, vertigo, 712 Health Canada, 344 Health Insurance Plan (HIP) study, 89 HealthSTAR database, 361 hearing loss in benign positional vertigo, 716 in labyrinthitis, 716 in Ménière disease, 716 in vertigo, 716 in vestibular neuronitis, 716 heart failure, 195 ascites, 65 heart sounds sensitivity, specificity, or likelihood ratio S2 (second heart sound) for aortic stenosis, 446t S3 (third heart sound) for aortic regurgitation, 430t for myocardial infarction, 467t for the breathless emergency patient, 200t for ventricular dysfunction, 213t S4 (fourth heart sound) for aortic stenosis, 438t for ventricular dysfunction, 200t heel-to-toe test, 513 Hegar sign, 552, 553f sensitivity, specificity, or likelihood ratio for pregnancy, 556 height loss in osteoporosis, 482, 483t, 490t

hemagglutination-inhibition method, 555 hematuria, in urinary tract infection, women, 679t hemorrhagic stroke, 635 heparin-induced thrombocytopenia, 235 hepatojugular reflux, 128 hepatomegaly findings of, 299 guidelines, evidence from, 300 likelihood ratio, 300 literature review, results of, 299 literature search, 299 original publication data, improvements in, 299 prior probability, 300 reference standard, changes in, 299, 300 hereditary cancer syndrome, 265, 267 hereditary nonpolyposis colon cancer (HNPCC), 266, 267, 268 herniated disk with radiculopathy in low back pain, 86 herpes zoster oticus, 712 HIP. See Health Insurance Plan study HIV. See human immunodeficiency virus history of penicillin allergy, 525t HNPCC. See hereditary nonpolyposis colon cancer Homans sign, 228 sensitivity, specificity, or likelihood ratio in deep vein thrombosis, 229t home pregnancy test (HPT), 559, 560, 560t accuracy of, 555-556 likelihood ratios of, 559, 560t Hoover sign, 542 Hopkins Symptom Check List (HSCL), 250, 251, 252 hormone replacement therapy (HRT), 411 Hospital Anxiety and Depression Scale (HADS), 259, 260 hot flashes, perimenopausal, 409 Hoyne sign, 396 HPT. See home pregnancy test HRT. See hormone replacement therapy HSCL. See Hopkins Symptom Check List human chorionic gonadotropin (HCG) test, 553, 554, 555, 557 human immunodeficiency virus (HIV), 611

humped back, in osteoporosis, 481t hydrochlorothiazide, 316 hypalgesia, 112t hypernatremia, 316 hypertension, 183, 301-307. See also blood pressure classification of, 311 diagnosis of guidelines for, 301-302 potential improvements in, 306-307 findings of, 311 guidelines, evidence from, 312-313 likelihood ratio, 313 literature research, results of, 312 literature search, 311 measurement of accuracy of, 304 techniques for, 302t variation in, 303-304 original publication data, improvements in, 311-312 prediction, issue of blood pressure now vs blood pressure later, 305-306 palpation, 306 relative risk, 306 prior probability, 313 reference standard, changes in, 312, 313 sensitivity, specificity, or likelihood ratio in left ventricular dysfunction, 211t in thoracic aortic dissection, 666t hyperthyroidism, 277 hypertrophic cardiomyopathy, 439 hypertrophic osteoarthropathy, 163, 164 hypotension in myocardial infarction, 467t postural, 318, 319 supine, 320 hypothyroidism, 249, 277 hypovolemia, 315-316 acute blood loss physical signs, accuracy of, 319-320 clinical study, 317t findings of, 325 likelihood, 327t in ICU patients, 326t literature review, 326 literature search, 325 methods, 316-317 multivariate findings for, 326 pathogenesis, 318-319 physical signs accuracy of, 320-321 precision of, 319 postural vital signs, 317-318

prior probability, 327 reference standard, changes in, 325, 327

I ICD-10. See International Statistical Classification of Diseases, 10th Revision ice pack test, 451, 454 precision of, 455 sensitivity, specificity, or likelihood ratio for myasthenia gravis, 460t ICS. See Innsbruck Coma Scale idiopathic dilated cardiomyopathy, 183 idiopathic penicillin hypersensitivity, 517 IgE antibodies, 519 imipenem, 517 immediate penicillin hypersensitivity, 516-517 impedance plethysmography, 229-230 incorporation bias, 498 increased abdominal girth. See girth incontinence sensitivity, specificity, or likelihood ratio in perimenopause, 412t independence, 13-14 indirect blood pressure vs direct blood pressure, 304-305 technical inaccuracies of, 305 inelastic skin, 331 infiltrative disorders, 292 influenza, 343-344, 355-356 clinical findings, 347t diagnosis test, 350-352 likelihood of, 356 methods diagnostic odds ratio, 346 search strategy and quality review, 344-346 statistical methods, 346 prior probability, 356 reference standard tests, 356 signs and symptoms accuracy of, 346-350 precision of, 346 Innsbruck Coma Scale (ICS), 221 Integrated Management of Childhood Illness Scale, 330 intention tremor, 506 interleukin 1, 616 interleukin 6, 616 internal rotation resistance strength test for labral tears, 582f, 586t

International Registry of Acute Aortic Dissection (IRAD), 671 International Statistical Classification of Diseases, 10th Revision (ICD-10), 40, 249 ipsilateral straight-leg raising sign, 78 IRAD. See International Registry of Acute Aortic Dissection irregular border skin lesion. See ABCD(E) criteria irritability, perimenopausal, 409 ischemic stroke, 635 subtype analysis, 635-636 accuracy of, 635 reliability of, 635-636 transient, 627, 631-633 itching sensitivity, specificity, or likelihood ratio in vaginitis, 707t

J jaw claudication, in temporal arteritis, 656t JNC-VII. See Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure, seventh report of joint line tenderness, 360. See also pain Joint National Committee, 305 Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure, seventh report of (JNC-VII), 312 jolt accentuation of headache, 396, 400 in meningitis, adult, 404t jugular veins, 190 jugular venous distention, 186, 187 sensitivity, specificity, or likelihood ratio in left ventricular dysfunction, 213t jugular venous pressure (JVP), 125, 133, 134 anatomic and physiologic origins of, 125-126 jugular veins, clinical examination of, 130 waveforms, analysis of, 126 abnormalities, 126 jugular venous pulse and carotid arterial pulse, distinguishing between, 127 JVP. See jugular venous pressure

K κ statistic, 152, 153. See also precision calculation of, 8f weighted, 571 Kartagener syndrome, 594 Katz hand diagram, 113, 114f sensitivity, specificity, or likelihood ratio in carpal tunnel syndrome, 115t, 123t, 124 Kernig signs, 396, 399 sensitivity, specificity, or likelihood ratio in meningitis, adult, 404t knee effusion sensitivity, specificity, or likelihood ratio in knee ligament and meniscus injury, 370t knee, meniscal, and ligamentous injuries, 357-358 anatomy, 358 anterior cruciate ligament (ACL) examination, 362 physical examination accuracy of, 363t maneuvers, 363t clinical examination for internal derangement, 359-361 function, 360-361 inspection, 359 palpation, 359-360 epidemiology of, 359 findings of, 369 likelihood of, 370 limitations, 363, 365 literature review, 369 literature search, 369 mechanism, 358-359 methods analysis, 362 search strategy, 361-362 original publication data, improvements in, 369 physical examination, 366t accuracy of, 364t maneuvers, 365t posterior cruciate ligament (PCL) examination, 362 physical examination accuracy of, 364t prior probability, 370 reference standard, changes in, 369, 370 symptoms, 358 Kussmaul sign, 128 kwashiorkor, 372

kyphosis, 477, 478, 479, 485 in osteoporosis, 490t

L laboratory findings sensitivity, specificity, or likelihood ratio for acute cholecystitis, 140t for adult malnutrition, albumin, 380t for deep vein thrombosis, D-dimer, 246t for hypovolemia, adult, urine specific gravity, 327t for hypovolemia, child, 334t for influenza, rapid tests, 356t for left ventricular dysfunction, brain natriuretic peptide, 213t for malnourishment, adult, 380t for perimenopause, 417t for pulmonary embolus, D-dimer, 575t for streptococcal pharyngitis, rapid streptococcal test, 625t for temporal arteritis, erythrocyte sedimentation rate, 656t for urinary tract infection, women, urinalysis, 689t for vaginitis, microscopic tests, 699t, 700t, 707t labral (shoulder) tears, 589-591 findings of, 589 guidelines, evidence from, 590 likelihood ratio for, 591 literature search, 589 original publication data, improvements in, 590 physical examination tests, 580t precision of laxity maneuvers, 590t provocation maneuvers, 590t prior probability, 591 reference standard, changes in, 590, 591 labyrinthitis, 711 Lachman test, 360, 360f, 369 sensitivity, specificity, or likelihood ratio for knee ligament and meniscus injury, 359f, 370t LACS. See lacunar infarction syndrome lacunar infarction syndrome (LACS), 634 laparoscopy for appendicitis, 54

LAPSS. See Los Angeles Prehospital Stroke Scale laryngeal height sensitivity, specificity, or likelihood ratio in obstructive airways disease, 161t, 162t laryngitis, 540, 541f laryngotracheobronchitis, 540, 541f late penicillin hypersensitivity, 517-518 lateral collateral ligament (LCL), 358 lateral pivot shift test, 360f, 361 sensitivity, specificity, or likelihood ratio knee ligaments and menisci, 359f latex agglutination assays, 230 LAW (lymphocyte count, albumin, percentage weight loss) discriminant model, for adult malnutrition, 381 laxity tests for shoulder instability, 579, 580t LCL. See lateral collateral ligament lead pipe rigidity. See rigidity left-sided heart failure, in adults, 183 clinical findings for detection of, 186-187, 187t precision of, 189 definition, 184 ejection fraction, detection of, 187-189, 188t methods data abstraction, 184-186 literature search, 184 pathophysiology of, 184 signs, elicitation of apical impulse, 190 jugular veins, 190 radiographic cardiomegaly, 190 radiographic redistribution, 190 third heart sound, 190 vital signs, 189-190 left ventricular dysfunction findings of, 209 guidelines, evidence from, 211-212 likelihood ratio, 213 literature review, results of, 211 literature search, 209 original publication data, improvements in, 211 prior probability, 213 reference standard, changes in, 211, 213 systolic dysfunction diagnosis of, 210-211 echocardiograms, 210-211 postmyocardial infarction, 210

and diastolic dysfunction, difference between, 211 left ventricular hypertrophy, 189 Legionella, 344 Listeria monocytogenes, 400 levels of evidence, 15t Levine grading system. See murmur intensity levodopa, 505, 506, 508 Li-Fraumeni syndrome, 275 likelihood ratio (LR) test, 218, 529, 617 for abdominal aortic aneurysm, 27t for acute cholecystitis, 147 for acute otitis media, 496, 501, 503t for aortic regurgitation, 429t, 430t for aortic stenosis, 446t for appendicitis, 63 for β-hemolytic streptococcal pharyngitis, 625 calculation of, 8f for cancer, 276 for carpal tunnel syndrome, 124 for central venous pressure, 135 for chest pain protocol, acute cardiac ischemia, 473t for chest pain radiation, 471-472, 472t for clinical assessment, 9-11 of deep vein thrombosis, 229, 231 for community-acquired pneumonia, 530t, 531, 531t, 537 for deep vein thrombosis, 246 for dehydration, 341 for goiter, 287 for home pregnancy test, 559, 560t for hypovolemia, 327t in ICU patients, 326t for influenza, 356 for labral tears, 591 for left ventricular dysfunction, 213 for major depression, 262 for malnutrition, 380t, 381t for medication nonadherence, 182t for meningitis in adults, 404t, 406 for meniscal and ligamentous knee injuries, 370 for mitral regurgitation, 447t for myasthenia gravis, 460t for myocardial infarction, 476 for obstructive airways disease, 162 for osteoporosis, 491 for Parkinson disease, 514 for pediatric pneumonia, 550 for penicillin allergy, 523 for perimenopause, 416t, 417t for Phalen sign, 124t

for pregnancy, 560 for pulmonary embolism, 572t, 575 for reference standard tests, 356t for renal artery stenosis, 37 for shoulder instability, 591 for sinusitis, 603 for splenomegaly, 613 for stroke, 629, 641 for temporal arteritis, 648t, 649t, 654t, 656t for thoracic aortic dissection, 667t, 673, 673t for Tinel sign, 124t for urinary tract infection, 681t for vaginal complaints, 707 for valvular heart disease, 446t for vertigo, 717 limb weakness, 451 liver, physical examination of, 289-296 auscultation of, 291-292 examination of, 290-291 inspection of, 291 palpable liver edge, 292-293 physical findings of, 295 pulsatile liver edge, 293 topography, 289-290 vertical liver span, assessing, 293-295 liver disease ascites, 65 liver edge. See palpation liver span, normal, 291t load and shift anterior test for shoulder instability, 585t load and shift posterior test for shoulder instability, 585t logistic analysis, 14 Los Angeles Prehospital Stroke Scale (LAPSS), 628, 630-631, 633 low back pain anatomic/physiologic origins, 75 causes, 75 evidence from guidelines, 84 findings of, 83 history, 81 literature review, 84 literature search, 83 neurologic compromise, 77-80 cauda equina syndrome, 80 imaging tests, indications for, 80 lumbar disk herniations, 77-78 motor, reflex, and sensory dysfunction, assessment of, 78-80 spinal stenosis, 80 original publication data, improvements in, 83 physical examination, 81 prevalence of diseases, 75-76

low back pain (Continued) prior probability, 86 reference standard, changes in, 84, 86 social or psychological distress, 80-81 systemic disease ankylosing spondylitis, 77 cancer, 76 compression fractures, 77 spinal infections, 77 spine range-of-motion measures, 77 lower-extremity dermatomes, 79f lower respiratory tract illness (LRI), 539, 540. See also pneumonia LR. See likelihood ratio test LRI. See lower respiratory tract illness lumbar disk herniation, 77-78 crossed straight-leg raise test, accuracy, 84t ipsilateral straight-leg raise test, accuracy, 84t physical examination, accuracy sciatica, patients with, 79t lumbar spine low back pain, 75 lung scanning, 562 lymphadenopathy sensitivity, specificity, or likelihood ratio in streptococcal pharyngitis, anterior cervical, 618t lymphocyte count, albumin, percentage weight loss (LAW) discriminant model, for adult malnutrition, 381 lysosomal enzyme, 616

M major depression, 248 likelihood ratio, 262 malaise sensitivity, specificity, or likelihood ratio influenza, 356t male. See sex malignancy, 249 ascites, 65 malignant melanoma ABCD(E) criteria likelihood ratio, 393t multivariate findings, 392t univariate findings, 392t detection and prognosis, 383 epidemiology, 383 evidence from guidelines, 392-393 findings of, 392

literature review, 392 literature search, 391 methods search strategy and quality filter, 385 original publication data, improvements in, 392 prior probability, 393 reference standard, changes in, 392, 393 skin examination accuracy of ABCD(E) checklist, 385t, 385-387 for detecting presence or absence, 387t, 387-388 revised 7-point checklist, 386, 386t checklists as diagnostic aid, 384 criterion standard for diagnosis, 385 historical feature assessment, 384 physical examination technique, 384 precision of, 385 signs and symptoms, 383-384 skin type risk factors, 393t malignant neoplasm, 76. See also cancer malnutrition, 371. See also nutritional status assessment evidence from guidelines, 381 findings of, 379 likelihood ratio, 382 of findings combinations, 380t of low albumin, 380t literature review, 380-381 literature search, 379 multivariate findings, 380t original publication data, improvements in, 379-380 prior probability, 381 reference standard, changes in, 380, 381 subjective global assessment, 379-380 Malnutrition Screening Tool, 381t Mammacare Method, 94, 95 mammography, 89-91 MANTRELS mnemonic, 61, 62t. See also Alvarado clinical decision rule marasmus, 372 Marfan syndrome, 663 in thoracic aortic dissection, 664t marginal cross-products, 3 Massachusetts Women’s Health Study, 408 MAST. See Michigan Alcoholism Screening Test match test, 151-152

maxillary sinus, 594, 595, 595f, 597 surface palpation for, 596f transillumination of, 596f McIsaac clinical prediction rule for sore throat, 620, 620f MCL. See medial collateral ligament McMurray test, 360f, 361, 369 sensitivity, specificity, or likelihood ratio knee ligament and meniscus injury, 359f, 370t MDM. See minor determinant mixture medial collateral ligament (MCL), 358, 359, 361 medial-lateral grind test, 361 Medical Research Council Thrombosis Prevention Trial, 26 medication adherence, assessing, 179-182, 182t findings of, 179 guidelines, changes in, 181 literature review, results of, 180-181 literature search, 179 Morisky questions, 182t original publication data, improvements in, 180 pill count, 176t prior probability, 182 reference standard, changes in, 180, 182 Medication Adherence Self-Report Inventory, 180 Medication Event Monitoring System (MEMS) caps, 180 medication response sensitivity, specificity, or likelihood ratio to anticholinesterase, for myasthenia gravis, 460t to decongestants, for sinusitis, 603t to levodopa, for Parkinsonism, 510t to penicillin skin test, for penicillin allergy, 525t melanocyte, 383-384 MEMS. See Medication Event Monitoring System caps Ménière disease, 316, 711 meningitis, 404t clinical examination, 395-396 clinical history accuracy of, 398 sensitivity of, 398t evidence from guidelines, 405 findings of, 403 likelihood ratios, 404t, 406 literature review, 404 prospective study, 404-405 retrospective study, 404

literature search, 403 methods data analysis, 398 literature search and selection, 396-397 study characteristics, 397-398 original publication data, improvements in, 403-404 pathophysiology of, 396 physical examination accuracy of, 398, 399-400 sensitivity of, 399t prior probability, 406 reference standard, changes in, 404, 406 sensitivity of findings, 404t signs and symptoms, 396 precision of, 398 menopause. See perimenopause meta-analysis of clinical examination, 12-13 methacholine, 150 MI. See myocardial infarction Michigan Alcoholism Screening Test (MAST), 41, 42, 42t, 44. See also Brief Michigan Alcoholism Screening Test; Short Michigan Alcoholism Screening Test accuracy of, 42 questionnaire, 49 reliability of, 42 sensitivity, specificity, or likelihood ratio in problem alcohol drinking, 41 micrographia sensitivity, specificity, or likelihood ratio in Parkinsonism, 509t Middleton hooking maneuver sensitivity, specificity, or likelihood ratio in splenomegaly, 612t Mini MagLite, 597 Mini Nutritional Assessment (MNA), 380-381 Minnesota Multiphasic Personality Inventory, 80 minor depression, 248 minor determinant mixture (MDM), 520 mitral regurgitation (MR), 438-439, 445 clinical examination accuracy of, 439t and mitral valve prolapse, 446 physical examination accuracy of, 445t

mitral stenosis and pulmonic regurgitation, 423, 425 mitral valve prolapse (MVP), 439-440, 445 clinical examination accuracy of, 440t and mitral regurgitation, 446 MNA. See Mini Nutritional Assessment moderate drinking, 48 Modigliani syndrome, 280 monobactams, 515, 518 monofilament testing. See sensory change mood, 249 Moraxella catarrhalis, 494 Morisky measure, 180 sensitivity, specificity, or likelihood ratio for medication adherence, 182t morning sickness, 552, 554, 555 for pregnancy, 560 Movement Disorder Society, 507 MR. See mitral regurgitation multiple nevi, 384 multivariate analysis, 14 murmur intensity, 420 sensitivity, specificity, or likelihood ratio in aortic regurgitation clinical impression, 429t intensity, 430t significant vs insignificant systolic murmurs, 446t in thoracic aortic dissection, diastolic, 665t, 666t, 673t typical murmur, 430t in aortic stenosis carotid pulse, 446t intensity, 446t location of murmur, 446t radiation to carotids, 438t S2 (second heart sound), 446t systolic murmur, 446t timing, 438t in hypertrophic cardiomyopathy change with maneuvers, 439 in mitral regurgitation during myocardial infarction, 439t intensity, 447t location, 439t timing, 439t in tricuspid regurgitation change with abdominal pressure, 439 change with inspiration, 439

Murphy sign, 138, 145, 146 sensitivity, specificity, or likelihood ratio in acute cholecystitis, 147t muscle wasting, subjective global assessment (SGA), 374 MVP. See mitral valve prolapse myalgia sensitivity, specificity, or likelihood ratio in influenza, 348t in pneumonia, adult, 536t myasthenia gravis, 449-456, 459 acetylcholine receptor antibody-positive myasthenia gravis, 449, 450f anticholinesterase tests, 451-453 likelihood ratio of, 460t office tests, accuracy of, 454-455 prior probability, 460 reference standard tests, 460 search strategy and quality review, 453 statistical methods, 454 symptoms and signs of accuracy of, 454 anatomical and physiological origins of, 450-451 elicitation of, 451 Mycoplasma pneumoniae, 344 myeloproliferative syndrome, 612 Myerson sign, 508 myocardial infarction (MI), 195, 461-469. See also cardiac ischemic chest pain; chest pain accuracy of, 466 electrocardiogram, 468 medical history, 467-468 physical examination, 467-468 acute cardiac ischemia, multivariate findings for, 473-474 acute chest pain, diagnosis of, 462-463, 463f cardiac and noncardiac conditions, 463-464 clinical findings of, 468-469 clinical prediction rules, 468-469 diagnostic criteria, 476 findings of, 471 guidelines, evidence from, 474-475 likelihood ratio of, 476 literature review, results of, 472-473 literature search, 471 mechanism of, 463 methods analysis, 465 precision and accuracy, test criteria for, 464


myocardial infarction (MI) (Continued) quality assessment, 464-465 search strategy, 464 selection of articles, 464 multivariate findings for, 476t original publication data, improvements in, 472 precision of electrocardiogram, 465-466 medical history, 465 physical examination, 465 pretest probability, 469 prior probability, 476 reference standard, changes in, 472, 476 symptoms and signs, 463 univariate findings for, 476t myoclonus, 217, 221, 222 sensitivity, specificity, or likelihood ratio in coma, 226t

N nafcillin, 519 nailfold angles, 164, 166 for clubbing, 165f nasal congestion differential diagnosis of, 594t sensitivity, specificity, or likelihood ratio in influenza, 356t in otitis media, child, 497t in pneumonia, adult, 536t in sinusitis, purulent, 603t nasal flaring, in pneumonia, infant and child, 544t nasal turbinates, 594, 595f NASCET. See North American Symptomatic Carotid Endarterectomy Trial National Ambulatory Medical Care Survey, 493, 615 National Center for Health Statistics, 53 National Health Interview Survey data, 99 National Heart, Lung, and Blood Institute, 161 National Institute of Neurological Disorders and Stroke, 634 National Institute on Alcohol Abuse and Alcoholism, 48 National Institutes of Health Stroke Scale (NIHSS), 630, 630t, 633, 637, 637t reliability of, 634, 635t National Osteoporosis Foundation, 478

National Program of Cancer Registries, 267 National Society of Genetic Counselors, 267 nausea. See nausea and vomiting nausea and vomiting sensitivity, specificity, or likelihood ratio in acute cholecystitis, 140t in appendicitis, adult, 57t in meningitis, adult, 406t in myocardial infarction, 576t NBSS. See Canadian National Breast Screening Study neck carotid artery cause, for bruits, 103-104 stiffness, 396, 399 negative likelihood ratio (LR–), 19, 57, 197, 236, 454, 554, 617 in clinical examination, 9, 12 median, 253 for test of Speed, 589 for test of Yergason, 589 negative predictive value, 4, 5 calculation of, 8f Neisseria meningitidis, 400 neostigmine bromide, 452 nephrotic syndrome ascites, 65 nervous tension, perimenopausal, 409 neurologic compromise, low back pain, 77-80 cauda equina syndrome, 80 imaging tests, indications for, 80 lumbar disk herniations, 77-78 motor, reflex, and sensory dysfunction, assessment of, 78-80 spinal stenosis, 80 neurologic deficit sensitivity, specificity, or likelihood ratio in stroke, 641t in thoracic aortic dissection, 673t nevi atypical (dysplastic), 384 clinical assessment for spider nevi, precision of, 3f multiple nevi, 384 night sweats, perimenopausal, 409 NIHSS. See National Institutes of Health Stroke Scale nitric oxide, 616 nitroglycerin, 473 Nixon method, 606-607, 607f, 608 sensitivity, specificity, or likelihood ratio


in splenomegaly, 612t nomogram, 7 noncompliance. See also compliance measurement, 174-175 methods, 175 nature of, 174 normovolemic phlebotomy study, 316t postural vital signs, 318t North American Symptomatic Carotid Endarterectomy Trial (NASCET), 105, 108 Nugent criteria, 694t sensitivity, specificity, or likelihood ratio for vaginitis, 707t, 694t nutrition-associated complications, 372 nutritional status assessment, 371-372. See also malnutrition accuracy of, 375-376 anatomic and physiologic origin, 372 components of, 371-372 precision of, 375 subjective global assessment, features, 373t dietary intake change, 373 functional capacity, 374-375 gastrointestinal symptoms, 374 weight change, 372, 373 symptoms and signs, 376-377 nystagmus origin of, 711

O obesity, 20 observed agreement. See agreement obstructive airways disease, 159-162 findings of, 159-160 guidelines, evidence from, 161 likelihood ratio, 162 literature review, results of, 160-161 literature search, 159 original data publication, improvements in, 160 prior probability, 162 reference standard, changes in, 160, 162 obstructive lung disease, 528 obturator sign in appendicitis, 55 examination for, 55f occult vertebral fracture, 478 hand grip strength test for, 479, 485 rib-pelvis distance test for, 479, 485 skinfold thickness test for, 479, 485

tooth count, 485-486 wall-occiput distance test for, 479, 485 ocular myasthenia, 450 oculocephalic reflex. See reflexes odds ratio (OR), 9, 35, 228, 237 diagnostic, 12 odor sensitivity, specificity, or likelihood ratio in vaginitis, 707t office blood pressure factors, affecting, 303-304 vs usual blood pressure, 305 OME. See otitis media with effusion opening snap sensitivity, specificity, or likelihood ratio murmur, diastolic, 421 OR. See odds ratio ORAI. See osteoporosis risk assessment instrument orbicularis oculi weakness, 451, 452f orthopnea, 153. See also dyspnea orthostatic hypotension, 320t Osler sign, 304-305 OST. See osteoporosis self-assessment screening tool osteomyelitis, 594 osteoporosis, 477-486, 489-491 arm span–height difference, 479, 482-483 definition of, 478 diagnostic accuracy, 480, 483t, 484t elicitation of, 478-480 findings of, 489 hand grip strength test for, 479, 485 height loss, 482 likelihood ratio of, 491 literature review, results of, 490 multivariate findings for, 490t univariate findings for, 490t literature search, 489 methods data analysis, 480 quality assessment, of articles, 480 original publication data, improvements in, 490 pathophysiology of, 478 precision of, 480, 482t prevalence of, 478, 491t prior probability, 491 reference standard, changes in, 490, 491 rib-pelvis distance test for, 479, 485 skinfold thickness test for, 479, 485 study characteristics, 480, 481t, 482t tooth count, 485-486

wall-occiput distance test for, 479, 485 weight, 483-485 osteoporosis risk assessment instrument (ORAI), 490, 491t osteoporosis self-assessment screening tool (OST), 489, 490, 491t otitis media diagnostic criteria, in children, 494 otitis media with effusion (OME), 495 and acute otitis media, distinguishing between, 493 otolaryngologist, 496, 596, 597 otosclerosis, 711 ovarian cancer, 265, 266, 267 ovarian carcinoma, 72 Ovid MEDLINE, 47 Oxfordshire Classification of Subtypes of Cerebral Infarction stroke, symptoms, 634 oxygen free radicals, 616 oxymetazoline hydrochloride, 595

P pachydermoperiostosis, 163 PACS. See partial anterior circulation infarction syndrome pain sensitivity, specificity, or likelihood ratio in acute cholecystitis guarding, 140t rebound, 140t rectal, 140t right upper quadrant, 147t rigidity, 140t in appendicitis, adult guarding, 57t migration, 57t pain before vomiting, 57t rebound tenderness, 57t rectal tenderness, 57t right lower quadrant, 57t rigidity, 57t in back pain duration, 76t positional, 76t in cancer-induced back pain, nocturnal, 86t in coma motor response, 225t withdrawal response, 225t in influenza, 348t-349t nasal congestion, 348t pharyngitis, 349t in knee injury, joint line tenderness, 370t

in meningitis, adult, headache, 406t in myocardial infarction chest wall, 467t pleurisy, 467t positional, 467t radiation to the arms, 576t in otitis media, child, ear, 503t in temporal arteritis headache, 654t jaw, 656t scalp, 654t in thoracic aortic dissection, 673t migratory, 673t sudden onset, 673t “tearing” or “ripping,” 673t in urinary tract infection, women, 679t back pain, 680t flank pain, 679t lower abdominal pain, 679t pain provocation test of Mimori for labral tears, 582f, 586t palpable expansile tumor, 18 palpation, 291, 293, 294, 295 for airflow limitation, 151, 153, 154 of clubbing, 165-166 sensitivity, specificity, or likelihood ratio of abdomen, for abdominal aortic aneurysm, 27t of liver, for hepatomegaly, 299t of spleen, for splenomegaly, 612t, 613t of temporal artery, for temporal arteritis, 649t of thyroid, for goiter, 286t, 287t of spleen, 607-608, 609, 609t paradoxic ptosis. See curtain sign parainfluenza, 344 paranasal sinuses coronal view of, 595f sagittal view of, 594f transillumination of, 596-597 Waters view of, 594 parental suspicion of otitis media, 503t paresthesia sensitivity, specificity, or likelihood ratio in carpal tunnel syndrome, 115t Parkinson disease (PD), 505-511, 513-514 accuracy of, 510 findings of, 513 guidelines, evidence from, 513 likelihood ratio of, 514 literature review, results of, 513 literature search, 513


Parkinson disease (PD) (Continued) methods, 509 original publication data, improvements in, 513 and parkinsonism, distinguishing between, 506 pathophysiologic characteristics of, 506 precision of, 510 prior probability, 514 quality of evidence, 508t, 509-510 reference standard, changes in, 513 signs of, 506-507, 510t elicitation of, 507-508 symptoms of, 506-507, 509t parkinsonian facies, 509 paroxysmal nocturnal dyspnea. See dyspnea parsimonious clinical examination, 13-14 parsimony, 13 partial anterior circulation infarction syndrome (PACS), 634 Pastia sign, 617 patella, 359-360 patella reflex, 79 pathophysiology of appendicitis, 54 of community-acquired pneumonia, 528 patient, 1-2 alcoholic, 1 ascites, 1 patient-generated subjective global assessment (PG-SGA), 380 Patient Health Questionnaire (PHQ-9), 259, 260, 261, 263. See also clinical prediction rules and scores patient’s medical history, information in, 11 PCL. See posterior cruciate ligament PD. See Parkinson disease PDR. See phalangeal depth ratio PE. See pulmonary embolism peak expiratory flow, 155 pediatric pneumonia likelihood ratio for, 550 multivariate findings for, 548 univariate findings for, 548, 549t Pedigree Standardization Task Force, 267 peek sign sensitivity, specificity, or likelihood ratio myasthenia gravis, 460t pelvic appendicitis, 54 penicillin, 174

penicillin allergy, 515-521, 523-525 β-lactam antibiotics, cross-reactivity with, 518 clinical history, 518-519 accuracy of, 519 findings of, 523 guidelines, evidence from, 524 how to take a history for, 518 hypersensitivity reactions, classification of, 516-518, 517t immediate reactions, 516-517 late reactions, 517-518 likelihood ratio of, 525 literature review, results of univariate findings for, 524 literature search, 523 methods, 516 original publication data, improvements in, 524 prior probability, 525 reference standard, changes in, 524, 525 sensitivity, specificity, or likelihood ratio history of, 525t skin testing, 519-520 limitations of, 520-521 penicillin G, 520 penicillin skin test, 519-520, 523, 524, 525t limitations of, 520-521 percussion, 290, 291, 294, 295 advantages of, 295 for airflow limitation, 151, 153, 154 measurement of, 294-295 sensitivity, specificity, or likelihood ratio in ascites flank dullness, 69t fluid wave, 69t, 73t shifting dullness, 69t, 73t in chronic obstructive airways disease cardiac dullness, 154t chest hyperresonance, 154t for splenomegaly, of spleen, 612t, 613t in pneumonia, adult dullness of the lungs, 531t percussion methods, for splenomegaly Castell method, 607, 607f, 608 Nixon method, 606-607, 607f, 608 Traube space, 607, 607f percussive span technique, 295 perimenopause, 407, 416t definition, 408 estimate pretest probability of, 408


evaluation family and medical history age of mother’s menopause, 409 cigarette use, 409 hysterectomy status, 410 laboratory tests estradiol, 410 follicle-stimulating hormone, 410 inhibins, 410 physical signs maturation index, 410 skin thickness, 410 vaginal pH, 410 self-assessment, 408-413 symptoms, 409 depressed mood, 409 hot flashes, 409 nervous tension and irritability, 409 night sweats, 409 urinary incontinence, 409 vaginal dryness, 409 variable sexual interest, 409 evidence from guidelines, 416 findings of, 415 likelihood ratio, 416t, 417t literature review, 415-416 literature search, 415 methods search strategy and quality review, 410-411 statistical methods, 412 original publication data, improvements in, 415 physiology, 408 prior probability, 417 reference standard, changes in, 415 peripheral edema, 72 peripheral hemodynamic signs, 421-422 peritoneal fluid, 71, 72 periumbilical bruits, 30 petechiae, 617 PG-SGA. See patient-generated subjective global assessment phalangeal depth ratio (PDR), 164-165, 166 Phalen sign, 112t, 114 likelihood ratio, 124t sensitivity, specificity, or likelihood ratio in carpal tunnel syndrome, 124t pharyngeal exudate, in streptococcal pharyngitis, 618t pharyngeal weakness, 451 pharyngitis, 615, 623. See also pain differential diagnosis of, 616t streptococcal, 625

PHQ. See PRIME-MD Patient Health Questionnaire PHQ-9. See Patient Health Questionnaire physiologic origin, of abdominal bruit, 29 physiologic tremor, 506, 514 pigmented skin lesion. See ABCD(E) criteria pill count, in medication adherence, 176t pill-rolling tremor, 506 PIOPED. See Prospective Investigation of Pulmonary Embolism Diagnosis study PISA-PED. See Prospective Investigative Study of Acute Pulmonary Embolism Diagnosis plain abdominal radiographs for appendicitis, 54 plain film radiographs for sinusitis likelihood ratio, 603 plaster casts, of fingers, 171 pleuritic chest pain, 215, 225 pneumatic otoscopy, 493, 494, 495, 496, 499 pneumococcal pneumonia, 527 pneumonia. See also acute respiratory illness; lower respiratory tract illness community-acquired, 527-533 pneumonia, in infant and child, 539-545, 547-550. See also tachypnea anatomy and pathophysiology of, 540-541 bacterial, 540 findings of, 547 guidelines, evidence from, 549 literature search, 547 methods, 539-540 original publication data, improvements in, 548 pediatric pneumonia likelihood ratio test for, 550 multivariate findings for, 548 univariate findings for, 548, 549t prior probability, 550 reference standard, changes in, 540, 548, 550 symptoms and signs accuracy of, 543-545 elicitation of, 541-542 precision of, 542-543 pneumonia score, 548 Pneumonia Severity Index, 536

point-of-care testing, 623-624, 625t polyps, 594, 595 popliteal-brachial gradient, in aortic regurgitation, 425t positive likelihood ratio (LR+), 19, 56, 58, 197, 236, 454, 513, 554, 617 in clinical examination, 9, 12 median, 253 for test of Speed, 589 for test of Yergason, 589 positive predictive value, 4 calculation of, 8f posterior circulation infarction syndrome (POCS), 634 posterior cruciate ligament (PCL), 358, 359, 361, 362 physical examination accuracy of, 364t posterior drawer test, 361 posterior probability, 9 postmyocardial infarction, 210 postphlebitic syndrome, 235 posttest probability, 4, 7 calculation of, 8f postural tachycardia. See tachycardia postural vital signs, 316 PR. See pulmonic regurgitation precision. See also κ calculation of, 8f of clinical examination, 1, 3-4, 9 “good” symptom or sign, 11-12 for left-sided heart failure, 189 likelihood ratio, 9-11 meta-analysis, 12-13 pretest probability, 11 “sensitivity-only” studies, 13 pregnancy, 551-557, 559-560 guidelines, evidence from, 559 home pregnancy tests, accuracy of, 555-556 likelihood ratio test for, 560 literature review, results of, 559 literature search, 559 methods search strategy, 553-554 original publication data, improvements in, 559 patient history, accuracy of, 554-555 physical examination, accuracy of, 556-557 prior probability, 560 reference standard, changes in, 559, 560 signs and symptoms, 554-555 elicitation of medical history, 552

physical examination, 552-553 reference standard for, 553 during first trimester anatomic and physiologic origins, 552 uterine height, 553f pregnant women problems in at-risk drinking, 44-45 preoperative carotid bruit, 105 pressure provocation test, 112t pretest probability, 5, 7, 11 calculation of, 8f prevalence, calculation of, 8f Primary Care Evaluation of Mental Disorders (PRIME-MD), 250, 251, 252, 259, 260, 261, 263 PRIME-MD Patient Health Questionnaire (PHQ), 250, 251, 252 PRIME-MD. See Primary Care Evaluation of Mental Disorders; clinical prediction rules and scores problem drinking, 47 Prospective Investigation of Pulmonary Embolism Diagnosis (PIOPED) study, 562, 563, 564 Prospective Investigative Study of Acute Pulmonary Embolism Diagnosis (PISA-PED) study, 563, 564, 565, 566, 572 prostaglandins, 616 prostate cancer, 265 protein-energy malnutrition, 372 provocation test for labral tears, 579, 580t for shoulder instability, 579, 580t pseudohypertension, 304 Psoas sign, of appendicitis, 55 sensitivity, specificity, or likelihood ratio for appendicitis, 57t psychogenic dizziness, 711 ptosis, 451, 455 puddle sign, 66 sensitivity, specificity, or likelihood ratio for ascites, 69t pulmonary crackles. See rales pulmonary embolism (PE), 227, 235, 561-569, 571-575 clinical examination, precision of, 566 clinical gestalt, 563-564 negative result for, 563 positive result for, 563 clinical prediction rules, 564-565 accuracy of, 565t, 568t


pulmonary embolism (PE) (Continued) components of, 566 validation of, 565-566 findings of, 571 guidelines, evidence from, 574 likelihood ratio test for, 572t, 575 literature review, results of, 572-574 literature search, 571 methods data analysis, 563 data sources, 562 study selection and data extraction, 562-563 original publication data, improvements in, 571-572 pretest probability, accuracy of, 564t prior probability, 575 reference standard, changes in computed tomography (CT) angiography, 572 D-dimer assay, 572 enzyme-linked immunosorbent assay, 572 pulmonary fibrosis, 528 pulmonic regurgitation (PR), 421 and mitral stenosis, 423, 425 pulse deficit (arms), in thoracic aortic dissection, 673t pulse pressure, 421 in aortic regurgitation, 425t pulsus paradoxus sensitivity, specificity, or likelihood ratio in chronic obstructive airways disease, 154t pupillary response. See reflexes pyridostigmine bromide, 452

Q quadriceps weakness, 79 sensitivity, specificity, or likelihood ratio in sciatica, 79t QUADAS. See Quality Assessment of Diagnostic Accuracy Studies checklist quality, 15-16 Quality Assessment of Diagnostic Accuracy Studies (QUADAS) checklist, 583 Quebec Task Force on Spinal Disorders, 80 questionnaire for depression Beck Depression Inventory (BDI), 250

Center for Epidemiologic Studies Depression (CES-D), 250 Duke Anxiety and Depression Scale (DADS), 250 Geriatric Depression Scale (GDS), 250 Patient Health Questionnaire (PHQ-9), 263t Primary Care Evaluation of Mental Disorders (PRIME-MD), 250 PRIME-MD, 263t PRIME-MD Patient Health Questionnaire (PHQ), 250 Zung Self-Rating Depression Scale (SDS), 250 for malnutrition, adult, 380t for medication adherence Morisky questions, 182t for problem alcohol drinking Alcohol Use Disorders Identification Test (AUDIT), 51t Alcohol Use Disorders Identification Test, Consumption Questions (AUDIT-C), 51t CAGE questions, 52t T-ACE questions, 52t TWEAK questions, 52t quiver eye movements, in myasthenia gravis, 454t

R radiographic cardiomegaly, 187, 189, 190 radiographic findings sensitivity, specificity, or likelihood ratio in the breathless emergency patient, chest radiograph, 213t in left ventricular dysfunction, chest radiograph, 213t for sinusitis, sinus films, 603t radiographic redistribution, 186, 187, 190 radiographic techniques for appendicitis, 54 radioisotopic scintiscan for splenomegaly, 606 radionuclide scanning for acute cholecystitis, 138 rales sensitivity, specificity, or likelihood ratio in the breathless emergency patient, 213t


in myocardial infarctions, 467t in pneumonia, adult, 536t in pneumonia, infant and child, 550t Ramsay Hunt syndrome, 712 random-effects measure, 13, 19 random-effects model, 316 randomized controlled trials, 88 range of motion, 358 rapid influenza test, 355 likelihood ratio for, 356t reactive airway disease, 452 readers’ guides for diagnostic test, 3t rebound tenderness, 55. See also pain; guarding receiver operating characteristic (ROC) curve, 617 recent weight gain. See weight gain rectal examination, for appendicitis, 55 recurrent vestibulopathy, 711 reduced muscle power, 451, 455 reflexes sensitivity, specificity, or likelihood ratio Achilles tendon, in normal patients, 84t corneal, in coma, 220t cough, in coma, 220t eye movements, in coma, 220t gag, in coma, 220t glabella tap, in Parkinsonism, 514t oculocephalic, in coma, 220t pupillary, in coma, 220t relocation test for labral tears, 586t for shoulder instability or labral tear, 581f, 583, 585t, 589, 590, 591t renal artery stenosis, 35 abdominal bruits in, 30 likelihood ratio, 37 multivariate findings for, 36 prior probability, 37 reference standard tests, 37 univariate findings for, 36t renovascular hypertension abdominal auscultation in, accuracy of, 31 evaluation of, abdominal bruits in, 29-32 prognosis of, 32 reserpine, 249 respiratory distress, 547 respiratory illness. See community-acquired pneumonia; pneumonia respiratory rate, in children, 541 respiratory syncytial virus (RSV), 549

rest test, 451, 455 rest tremor, 506, 507 retractions, chest sensitivity, specificity, or likelihood ratio in pneumonia, infant and child, 550t reverse workup bias, 520 rheumatoid arthritis, 164 rhinitis, 593. See also nasal congestion allergic, 594 viral, 594 rhinorrhea differential diagnosis of, 594t rhinosinusitis, 603. See also sinusitis rhinoviruses, 344 rhonchi, 155 sensitivity, specificity, or likelihood ratio in chronic obstructive airways disease, 154t rib-pelvis distance for occult lumbar vertebral fractures, test, 479, 485 in osteoporosis, 483t rigidity, 55, 506, 507 sensitivity, specificity, or likelihood ratio cog wheeling in Parkinson disease, 506, 514t ROC curve. See receiver operating characteristic curve Rovsing sign, of appendicitis, 55 RSV. See respiratory syncytial virus ruptured abdominal aortic aneurysm abdominal palpation for, 19

S S2 (second heart sound). See heart sounds S3 (third heart sound). See heart sounds S4 (fourth heart sound). See heart sounds SBP. See systolic blood pressure Scandinavian Neurological Stroke Scale, 634 scarlet fever, 617 scattergram, 250 Schamroth sign, 165 for clubbing, 165f Schober test, 77 sciatica, 77-78 physical examination accuracy for lumbar disk herniation among patients, 79t

SCID. See Structured Clinical Interview for DSM-III-R; Structured Clinical Interview for DSM-IV-TR scintigraphy for liver examination, 292, 295 SCORE. See simple calculated osteoporosis risk estimate Scottish International Guidelines Network, 134 scratch test sensitivity, specificity, or likelihood ratio for hepatomegaly, 300t SDDS-PC. See Symptom Driven Diagnostic System for Primary Care SDS. See Zung Self-Rating Depression Scale seizure, 217, 221-222 sensitivity, specificity, or likelihood ratio in coma, 226t self-administered medication therapy, 173 self-diagnosis sensitivity, specificity, or likelihood ratio otitis media, parental suspicion, 503t pregnancy, suspicion of, 560 urinary tract infection, women, 680t vaginal candidiasis, 696t sensitivity, calculation of, 8f “sensitivity-only” studies, 13 sensory change sensitivity, specificity, or likelihood ratio in carpal tunnel syndrome, 115t in sciatica, 79t sex sensitivity, specificity, or likelihood ratio in ventricular dysfunction, 211t sexually transmitted diseases, 676 SGA. See subjective global assessment shadowgrams, 171 shadowgraph method, 166 shifting dullness, 66, 67. See also percussion Short Michigan Alcoholism Screening Test (SMAST), 41 shoulder instability, 577-587, 589-591 anatomy of, 578-579, 578f clinical tests for, 579-581, 580t, 581f findings of, 589 guidelines, evidence from, 590 labral tears, 579

clinical tests for, 579-581, 580t, 582f limitation of, 583 physical examination, diagnostic accuracy of, 583, 586t likelihood ratio for, 591 limitation of, 583 literature search, 589 original publication data, improvements in, 590 physical examination diagnostic accuracy of, 583, 585t tests for, 580t precision of laxity maneuvers, 590t provocation maneuvers, 590t prior probability, 591 reference standard, changes in, 590, 591 signed rank test, 455 silicone models, 92, 94, 95 simple calculated osteoporosis risk estimate (SCORE) questionnaire, 490, 491 Simplified Medication Adherence Questionnaire, 180 Simplified Wells scoring system, 571, 572, 574, 575 for pulmonary embolus, 575t SimpliRed assay, 230 Single Question (SQ), for depression, 250, 251 single-fiber electromyography for myasthenia gravis, 450 single-leg sit-to-stand test, 84 sinusitis, 593-598, 601-603. See also rhinosinusitis anatomy and pathophysiology of, 594 differential diagnosis of, 594t findings of, 601 guidelines, evidence from, 602 likelihood ratio test for, 603 literature search, 601 original publication data, improvements in, 602 paranasal sinuses coronal view of, 595f sagittal view of, 594f transillumination of, 596-597 Waters view of, 594 prior probability, 603 reference standard for, 594 changes in, 602, 603 symptoms and signs accuracy of, 596-598 elicitation of, 594-595 precision of, 595-596 univariate findings for literature review results, 602


sit-to-stand test, 84 sensitivity, specificity, or likelihood ratio back pain upper lumbar herniation, 86t disk herniation, 86t skin examination, for malignant melanoma accuracy of ABCD(E) checklist, 385t, 385-387 for detecting presence or absence, 387t, 387-388 revised 7-point checklist, 386, 386t checklists as diagnostic aid, 384 criterion standard for diagnosis, 385 historical feature assessment, 384 physical examination technique, 384 precision of, 385 signs and symptoms, 383-384 skin turgor, 319, 331 sensitivity, specificity, or likelihood ratio in hypovolemia, child, 334t skinfold thickness test for occult vertebral fracture, 479, 485 SLAP lesion. See superior labrum anterior posterior lesion sleep test, 451, 455 sensitivity, specificity, or likelihood ratio for myasthenia gravis, 460t SLR. See straight-leg raising sign SMAST. See Short Michigan Alcoholism Screening Test smoking. See tobacco use sneezing sensitivity, specificity, or likelihood ratio in influenza, 356t spasticity, 506 sensitivity, specificity, or likelihood ratio in Parkinsonism, 506 specificity, calculation of, 8f spectrum bias, 141, 497 speech sensitivity, specificity, or likelihood ratio in myasthenia gravis, unintelligible, 460t in Parkinsonism, soft voice, 510t in stroke, abnormal, 641t sphenoid sinuses, 594f, 595 sphygmomanometers, 306 spider nevi clinical examination precision of, 3f

Spiegel criteria, 694t sensitivity, specificity, or likelihood ratio for vaginitis, 707t, 694t spinal compression fractures, 77 spinal infections, 77 spinal stenosis, 80 spine range-of-motion measures, 77 spiral computed tomography (CT) scanning for pulmonary embolism, 564 spirometry, 149, 150, 154, 160 spleen. See also splenomegaly palpation of, 607-608, 609, 609t size of, 605-606, 606f splenomegaly, 605-610, 611-613. See also spleen anatomic landmarks, 605 clinical examination for consequences of, 606 guidelines, 609t inspection, 606 palpation, 607-608, 609, 609t percussion, 606-607, 607f, 608 findings of, 611 guidelines, evidence from, 612 likelihood ratio test for, 613 literature review, results for, 612 literature search, 611 original publication data, improvements in, 612 prior probability, 613 reference standard, changes in, 612, 613 signs of accuracy, 608-609 precision, 608 splenic size, 605-606, 606f sputum production, 151, 152 sensitivity, specificity, or likelihood ratio in obstructive airways disease, 152t SQ. See Single Question for depression square wrist sign, 112t stadiometer, 478-479 Staphylococcus aureus endocarditis, 519 sterile stethoscope, 32 straight-leg raising (SLR) sign, 78 sensitivity, specificity, or likelihood ratio for disk herniation, 86t strawberry tongue, 617 strength testing sensitivity, specificity, or likelihood ratio in carpal tunnel syndrome thumb, 115t

Page numbers followed by a t or f indicate locations of Tables or Figures, respectively.

in Parkinsonism difficulty rising from a chair, 514t in sciatica ankle, 79t great toe, 79t quadriceps, 79t strep throat, 615-621, 623-625 clinical prediction rules for, 618-620, 619t Centor clinical prediction rule, 619, 619f McIsaac clinical prediction rule, 620, 620f Walsh algorithm, 621f findings of, 623 guidelines, evidence from, 624 likelihood of, 625 literature review, results for, 624 literature search, 623 methods search strategy and quality review, 616 statistical methods, 617 original publication data, improvements in, 624 pathophysiology of, 616 pretest probability estimation of, 618 prior probability, 625 reference standard, changes in, 624, 625 symptoms and signs, 617 diagnostic accuracy of, 617-618 precision of, 617 Streptococcus pneumoniae, 344, 400, 494, 501 stroke, 627-638 classification of, 635 diagnosis of accuracy, 633 flow, 628f reliability, 633 ischemic stroke subtype analysis, 635-636 likelihood ratio test for, 641 methods, 629-630 prehospital assessment, 630-631 prior probability, 641 prognosis of, 636-637, 636t reference standard tests, 641 severity, assessment of, 634-635 symptoms Oxfordshire Classification of Subtypes of Cerebral Infarction, 634 transient ischemic attack, 627, 631-633 vascular distribution of, 633-634

Structured Clinical Interview for DSM-III-R (SCID), 40, 250, 254 Structured Clinical Interview for DSM-IV-TR (SCID), 259 students’ aneurysm, 18 subcutaneous tissue loss, subjective global assessment (SGA), 374 subjective global assessment (SGA), of nutritional status in adult malnutrition, 373t, 376t dietary intake change, 373 functional capacity, 374-375 loss of fluid from intravascular to extravascular space, 374-375 loss of subcutaneous fat, 374 muscle wasting, 374 gastrointestinal symptoms, 374 and postoperative complications, relationship between, 376t weight change, 372, 373 sublingual nitroglycerin, 183 sulcus sign, 585t sunken eyes sensitivity, specificity, or likelihood ratio in hypovolemia, child, 334t superior labrum anterior posterior (SLAP) lesion, 578, 586t swallowing, in myasthenia gravis, 454t Swan-Ganz catheterization, 203 sweating sensitivity, specificity, or likelihood ratio in pneumonia, adult, night sweats, 536t Swedish Two-County Trial, 89 Symptom Driven Diagnostic System for Primary Care (SDDS-PC), 250, 251, 252 symptomatic carotid bruit, 105 systemic disease, low back pain ankylosing spondylitis, 77 cancer, 76 compression fractures, 77 spinal infections, 77 spine range-of-motion measures, 77 systemic glucocorticoids, 149 systolic blood pressure (SBP), 301, 302, 303 systolic bruits, 35, 36 systolic click, with mitral valve prolapse, 440t systolic dysfunction, 184, 186, 187 diagnosis of, 210-211 and diastolic dysfunction, difference between, 211

echocardiograms, 210-211 postmyocardial infarction, 210 and diastolic dysfunction, difference between, 189 systolic murmurs, abnormal, 420, 436-437 anatomic and physiologic origins of, 433-434 aortic stenosis, 437 causes, 434t clinical examination accuracy of, 436, 437t precision of, 435-436, 436t evidence from guidelines, 445 examination, 434-435, 440 features, 435f findings of, 443 hypertrophic cardiomyopathy, 439 literature review accuracy, 444-445 precision, 444 literature search, 443 mitral regurgitation, 438-439 mitral valve prolapse, 439-440 original publication data, improvements in, 443 prior probability, 446 reference standard, changes in, 444, 447 tricuspid regurgitation, 439 systolic-diastolic abdominal bruits, 30, 31, 32

T TA. See temporal arteritis T-ACE questionnaire, 44, 45, 49, 52t sensitivity, specificity, or likelihood ratio for alcohol abuse, 50t tachycardia sensitivity, specificity, or likelihood ratio in hypovolemia, adult, postural, 327t in pneumonia, adult, 536t for ventricular dysfunction, 211t supine, 320 tachypnea, 543, 547, 548, 549, 561, 571, 574. See also pneumonia, in infant and child World Health Organization criteria for, 548, 548t definition, age based for children, 548t sensitivity, specificity, or likelihood ratio in pneumonia, infant and child, 550t

in pneumonia, adult, 536t TACS. See total anterior circulation infarction syndrome tears absent sensitivity, specificity, or likelihood ratio in hypovolemia, child, 340t telephone diagnosis, urinary tract infection, women, 687 temporal arteritis (TA), 643-644 accuracy, 646 of laboratory evaluation, 649-650 of physical examination, 646-649 of symptoms, 646 elicit signs and symptoms, 644-645 evidence from guidelines, 655 findings of, 653 likelihood ratio, 648t, 649t, 654t, 656t literature review, 654 literature search, 653 methods search strategy and quality review, 645 statistical methods, 645-646 multivariate findings for, 654-655 original publication data, improvements in, 654 pathophysiology of, 644 precision, 646 of medical history and physical examination, 464 prior probability, 656 reference standard, changes in, 654, 656 sensitivity, 648t, 649t temporal artery, in temporal arteritis, 649t tenderness of bicipital groove, 586 tenting, 331 test of Speed, 586t negative likelihood ratio, 589 positive likelihood ratio, 589 test of Yergason, 586t negative likelihood ratio, 589 positive likelihood ratio, 589 test of Zaslav. See internal rotation resistance strength test thenar atrophy, 112t third heart sound, 190 Third National Health and Nutrition Examination Survey, 478 thoracic aortic dissection, acute, 659-660 chest radiograph accuracy of, 666 sensitivity of, 667t clinical examination accuracy of, 662t, 673t, 665t


thoracic aortic dissection, acute (Continued) clinical history accuracy of, 663, 665 sensitivity of, 664t combinations of findings accuracy of, 666-667 diagnosis, 668t evidence from guidelines, 672 findings of, 671 likelihood ratio, 667t, 673, 673t literature review, 672 literature search, 671 methods data analysis, 663 literature search and selection, 661 study characteristics, 663 original publication data, improvements in, 671 pathophysiology of, 660 physical examination accuracy of, 665-666 sensitivity of, 666t prior probability, 673 reference standards, 673 sensitivity of, 672t sensitivity, specificity, or likelihood ratio in congestive heart failure, 666t signs and symptoms of, 660-661 thumb abduction, testing, 113 thyroid size sensitivity, specificity, or likelihood ratio for goiter, 282t thyroid-stimulating hormone (TSH), 286 thyrotoxicosis, 277 TIA. See transient ischemic attack tibia, 358, 359, 361 Tinaquant D-dimer, 573 Tinel sign, 112t, 114, 116 sensitivity, specificity, or likelihood ratio in carpal tunnel syndrome, 124t tissue necrosis factor, 616 tobacco use sensitivity, specificity, or likelihood ratio in myocardial infarction, 476t in obstructive airways disease, 161t, 162t in perimenopause, 416t tolerance, worry, eye opener, amnesia, kut down (TWEAK questionnaire), 49, 52t sensitivity, specificity, or likelihood ratio for alcohol abuse, 50t

tongue weakness, 451 tonsillar enlargement, in streptococcal pharyngitis, 618t tonsils, 617 toothache sensitivity, specificity, or likelihood ratio in sinusitis, 603t tooth count, in osteoporosis, 481t total anterior circulation infarction syndrome (TACS), 634 tourniquet test, 112t TR. See tricuspid regurgitation transient ischemic attack (TIA), 627, 631-633 diagnosis of accuracy, 632 reliability, 632-633 transillumination sensitivity, specificity, or likelihood ratio in maxillary sinusitis, 603t Traube space percussion, 607, 607f sensitivity, specificity, or likelihood ratio for splenomegaly, 613t tremor, 507, 514 action, 506, 507 classic essential, 506, 514 of Parkinson disease, 506 physiologic, 506, 514 rest, 506, 507 sensitivity, specificity, or likelihood ratio in Parkinsonism, 514t tremor syndromes, 506 tricuspid regurgitation (TR), 439 tricuspid valvular dysfunction, 293 TSH. See thyroid-stimulating hormone TWEAK questionnaire. See tolerance, worry, eye opener, amnesia, kut down 2-point discrimination, 112t 2-tailed χ2 test, 529 tympanic membrane, in otitis media, child, 503t tympanocentesis, 495, 496, 498, 501, 503 tympanometry, 495 type I collagen, 478

U ultrashort questionnaire. See Primary Care Evaluation of Mental Disorders ultrasonography, 17-18 for abdominal aortic aneurysm, 26


for acute cholecystitis, 138, 142 for appendicitis, 54 for deep vein thrombosis, 235, 240-241 for liver examination, 292, 295 for paranasal sinuses, 594 unguisometer, 164, 171 Unified Neurological Stroke Scale, 634 unstable angina, 462 upper lumbar disk herniation sit-to-stand test, accuracy, 84t urinary incontinence, perimenopausal, 409 urinary tract infection (UTI), 675 accuracy of dipstick urinalysis, 678 of physical examination, 678 of self-diagnosis, 678 of signs and symptoms, 678, 679t-680t of symptoms combinations, 678 algorithm for evaluating patients with symptoms, 681f, 682-683 definition, 675-676 differential diagnoses, 676 evidence from guidelines, 688 findings of, 687 likelihood ratio for symptoms combinations, 681t literature review, 688 literature search, 687 methods data analysis, 676-677 quality assessment of included articles, 676 multivariate approach, 689t original publication data, improvements in, 688 precision of, 678 pretest probability of, 680-682 prior probability, 689 reference standard, changes in, 688, 689 refining probability dipstick urinalysis, 682 with medical history and physical examination, 682 rule out complicate, 680 sensitivity analysis, 678, 680 study characteristics, 677 univariate findings, 689t urine specific gravity. See laboratory findings US Department of Agriculture, 48 US Department of Health and Human Services, 48

USPSTF. See US Preventive Services Task Force US Preventive Services Task Force (USPSTF), 18, 47, 99, 109, 248, 259, 261, 286, 392, 393, 416, 477, 688, 706 UTI. See urinary tract infection

V vaginal complaints, in vaginitis, 691-692 elicit symptoms and signs, 692-693 evidence from guidelines, 706 findings of, 705 likelihood ratio, 707 literature search, 705 methods criterion standards, evaluation of, 693 data extraction, 693 evaluation of, 693 inclusion and exclusion criteria, 693, 694t search strategy, 693 statistical analysis, 693 microscopic examination, 692f office laboratory tests, accuracy of, 699t, 700t inflammation, microscopic evidence of, 700 microscopy, 700 pH level, 700-701 whiff test, 701 original publication data, improvements in, 706 precision of, 693 prior probability, 707 reference standard tests, 707 signs, accuracy of discharge characteristics, 697, 699 inflammation, 699-700 odor, 700 symptoms, accuracy of, 693-697 bleeding, 697 discharge characteristics, 695 dyspareunia, 697 irritative symptoms, 695 itching, 695 odor, 695, 697 self-diagnosis, 697 univariate findings for, 706t vaginal dryness perimenopausal, 409 sensitivity, specificity, or likelihood ratio in menopause, 412t

vaginal infections, 676, 682 vaginal symptoms sensitivity, specificity, or likelihood ratio in vaginitis, 707t discharge characteristics, 695t-698t in urinary tract infection discharge, 689t irritation, 689t Valsalva maneuver, 128, 196, 420 sensitivity, specificity, or likelihood ratio in heart failure, 200t valvular heart disease physical examination, 444-445 variable sexual interest, perimenopausal, 409 vascular bruits, 29 vascular distribution, of stroke, 633-634 accuracy of, 633 reliability of, 633-634 vasodilator therapy, 184 venography, 229 venous hums, 104 compared to arterial bruit, 292t venous thromboembolism, 227, 561 risk factors for, 562t venous waveforms abnormal, 126t analysis of, 126 in central venous pressure assessment, 126t ventricular fibrillation cardiac arrest, 215 verification bias, 16, 138, 141, 498, 582, 589 vertigo, 709-710 causes, 710t elicit symptoms and signs, 711-712 finding of, 715 likelihood ratio, 717 literature review, 716 literature search, 715 origin of, 710 original publication data, improvements in, 716 prior probability, 717 reference standard, changes in, 716, 717 symptoms and signs accuracy of, 712-713 vestibular neuronitis, 711, 712 Veterans Affairs, 179

visual analog scale, 564 volume depletion, 127, 315, 316, 325, 326, 327, 330, 331 vomiting. See nausea and vomiting

W wall-occiput distance test for occult thoracic vertebral fractures, 479, 485 in osteoporosis, 483t Walsh algorithm for sore throat, 621f water hammer pulse (Corrigan), 425t weak thumb abduction, 112t web resources, for alcohol screening, 49 weight in osteoporosis, 484, 485, 490t sensitivity, specificity, or likelihood ratio gain in ascites, 68t, 73t loss in low back pain, 76t weighted κ statistic, 571 Welch-Allyn-Finnoff transilluminator, 595 Wells scoring system, 565, 566t. See also clinical prediction rules and scores sensitivity, specificity, or likelihood ratio for deep vein thrombosis, simplified, 246t for pulmonary embolus, simplified, 572t, 575t wheezing, 151, 154-155 sensitivity, specificity, or likelihood ratio in obstructive airways disease, 153f, 154t, 161t, 162t in pneumonia, adult, 536t in pneumonia, infant and child, 550t white coat hypertension. See office blood pressure WHO. See World Health Organization WHO-5. See World Health Organization-5 Well-Being Scale whole blood agglutination test, 230 whole blood assays, 239 WinBUGS software, 218 WISE. See Women’s Ischemia Syndrome Evaluation study


Women’s Ischemia Syndrome Evaluation (WISE) study, 415 workup bias. See verification bias World Health Organization (WHO), 41, 48, 161, 247, 330, 393, 408, 462, 478, 539, 547, 548 Flunet, 344 International Influenza Program, 344 World Health Organization-5 Well-Being Scale (WHO-5), 259

X x-ray. See radiographic findings

Y Yale 1-question screen, 259-260

Z Zung Self-Rating Depression Scale (SDS), 250, 251